In Part 1, I covered lessons learned, creating the Data Factory resources, and configuring source control.
Part 2 covers setting up the sample resources, creating your Data Factory pipeline in a Dev environment, and publishing it.
Our Data Factory pipeline is going to be pretty simple. We’re just going to move a file from one location to another in a storage account. It’s a simple sample, but this article is more about DevOps than some really cool Data Factory work.
Preparing for CI/CD
You need to think through all the resources your data factory will need access to, like connection strings, storage accounts, etc. You could create them as parameters and replace them in the CI/CD pipelines. That’s not a bad idea, but if your secrets change, you’ll have to redeploy.
I have seen a sample of Azure DevOps pipelines that take CSV files of secrets for a Data Factory and use them to replace strings in the ARM templates. I really hate that idea. I can’t even believe someone is suggesting it. In my opinion, that solution stores connection strings and secrets in places you don’t really want them. Next thing you know, those CSV files are checked in to the repo, and everyone on your team has the passwords to prod.
Instead, I use Key Vault. It is a much simpler approach. Rather than storing parameters in the CI/CD pipeline, you just tell CI/CD which key vault to use. That simplifies the deployment process, and when a connection string or secret changes, you update it in the appropriate key vault. The next time the pipeline is triggered, it’ll get the new secret from Key Vault. Easy to manage!
Set Up a Storage Account
So let’s go ahead and set up a storage account. I am not going to go into a lot of detail on this. Since this is a sample, we’ll keep it simple. For redundancy I picked “Locally-redundant storage”, which is the cheapest option. Then I clicked through the rest of the steps, accepting all the defaults.
For this sample I created DEV and QA storage accounts.
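If you would rather script this than click through the portal, here is a minimal sketch in Python that only builds the Azure CLI commands for the two accounts. The account names, resource groups, and region are hypothetical; pass each list to `subprocess.run(cmd, check=True)` if you actually want to execute it.

```python
import re

# Storage account names must be 3-24 lowercase letters or digits
# (they must also be globally unique, which this check cannot verify).
STORAGE_NAME = re.compile(r"[a-z0-9]{3,24}")


def storage_account_create_cmd(name: str, resource_group: str, location: str) -> list[str]:
    """Build an `az storage account create` argv for a locally-redundant (LRS) account."""
    if not STORAGE_NAME.fullmatch(name):
        raise ValueError(f"invalid storage account name: {name!r}")
    return [
        "az", "storage", "account", "create",
        "--name", name,
        "--resource-group", resource_group,
        "--location", location,
        "--sku", "Standard_LRS",  # locally-redundant storage, the cheapest option
    ]


if __name__ == "__main__":
    # Hypothetical names for the DEV and QA accounts; print instead of running.
    for env in ("dev", "qa"):
        cmd = storage_account_create_cmd(f"adfsample{env}", f"rg-adf-{env}", "eastus")
        print(" ".join(cmd))
```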
Set Up a Key Vault
Again, I am keeping this pretty simple and will assume you know how to create a Key Vault. I created it with essentially the default configuration.
For this sample I created DEV and QA Key Vaults. Once the Key Vault is set up for each environment, add your secrets to each one, using the same secret name in every vault, with the value appropriate for that environment (for example, the DEV vault holds the DEV storage account’s connection string).
We are ready to set up our data factory. The first step is configuring the “Linked Services”.
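Because every environment must use the same secret names, a quick consistency check before deploying can save you a failed run in QA. Here is a minimal sketch that compares the secret names per environment; you could collect the names with `az keyvault secret list --vault-name <vault> --query "[].name" -o tsv`. The function and the secret name `StorageConnectionString` are hypothetical.

```python
def missing_secrets(vault_secrets: dict[str, set[str]]) -> dict[str, set[str]]:
    """For each environment, return the secret names other vaults have but this one lacks.

    Environments that have every secret are left out of the result.
    """
    all_names: set[str] = set().union(*vault_secrets.values())
    return {
        env: all_names - names
        for env, names in vault_secrets.items()
        if all_names - names
    }


if __name__ == "__main__":
    # Hypothetical example: the secret was added to DEV but forgotten in QA.
    vaults = {
        "dev": {"StorageConnectionString"},
        "qa": set(),
    }
    print(missing_secrets(vaults))  # {'qa': {'StorageConnectionString'}}
```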
Configure “Linked Services” in ADF
Let’s set up a pipeline in our DEV data factory and publish it. As previously stated, we are not going to do anything exciting. We’re just going to create a pipeline that copies a CSV file from one location to another. I know you’re seriously disappointed, but this blog is already long enough.
First, let’s add our “Linked Services”. We want a linked service for our Dev Key Vault resource. Go to “Linked Services” under “Manage” on the left menu.
Next, click “+ New” and search for “Key Vault”. Click the “Azure Key Vault” icon in the search results.
This will open the settings you need to configure for your Azure Key Vault. “Name” is just the name of the linked service. I leave the “Authentication Method” set to “Managed Identity”. Then I select “Enter Manually” for the “Azure key vault selection method”. The “Base URL” is set to a parameter that you define under “Parameters” below, where I set the parameter’s default value to the Dev Key Vault URL. The Azure DevOps pipeline will pass each environment’s Key Vault URL in as this parameter; I’ll show you this in Part 3. But don’t save yet! There is still one critical piece you need to do, so read the next section. We’ll come back and save this in a minute.
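Behind the UI, ADF saves each linked service as a JSON definition in your Git repo. The sketch below builds, as a Python dict, roughly what that definition looks like for a Key Vault linked service whose base URL comes from a linked-service parameter. The linked service name `AzureKeyVault1`, the parameter name `baseUrl`, and the vault URL are hypothetical; use whatever names you chose in the UI. With the managed identity, no credential appears in the definition.

```python
import json


def key_vault_linked_service(name: str, default_base_url: str) -> dict:
    """Build a Key Vault linked service whose base URL is a linked-service parameter."""
    return {
        "name": name,
        "properties": {
            "type": "AzureKeyVault",
            "typeProperties": {
                # Resolved at runtime from the parameter declared below.
                "baseUrl": "@{linkedService().baseUrl}"
            },
            "parameters": {
                # The default points at the Dev vault; CI/CD supplies other environments.
                "baseUrl": {"type": "String", "defaultValue": default_base_url}
            },
        },
    }


if __name__ == "__main__":
    ls = key_vault_linked_service("AzureKeyVault1", "https://kv-adf-dev.vault.azure.net/")
    print(json.dumps(ls, indent=2))
```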
Now open a new tab and head over to your Dev Key Vault to add an access policy for your Dev ADF. This is necessary for your Dev Data Factory to have access to your secrets. In your Key Vault, click “Access policies”.
The next step is adding an access policy. Click “Add Access Policy”. I only give Data Factory the Get and List permissions on secrets. Your scenario may vary, but for this sample we only need to get the secret. For the principal on the left, search for your Data Factory resource, click it, and then click the “Select” button. Now you are ready to click the “Add” button on the left.
Wait a second! There is yet another button to click. You MUST click the “Save” button. I forget to do this all the time. The previous page said “Add”, which to me means add and save. But not here! You still need to click that “Save” button.
Now go back to your Data Factory and test your connection. It should light up green. If so, click the “Create” button.
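If you script the access policy instead, there is no separate “Save” step: `az keyvault set-policy` applies the policy to the vault directly. Here is a minimal sketch that builds that command with only the Get and List secret permissions. The vault name is hypothetical, and the object ID must be the object ID of your Data Factory’s managed identity, shown here as a placeholder.

```python
def set_policy_cmd(vault_name: str, adf_object_id: str) -> list[str]:
    """Build an `az keyvault set-policy` argv granting only Get and List on secrets."""
    return [
        "az", "keyvault", "set-policy",
        "--name", vault_name,
        "--object-id", adf_object_id,
        # Least privilege: the pipeline only needs to read secrets.
        "--secret-permissions", "get", "list",
    ]


if __name__ == "__main__":
    # Placeholder values; substitute your vault and the factory's managed identity object ID.
    print(" ".join(set_policy_cmd("kv-adf-dev", "<adf-managed-identity-object-id>")))
```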
Let’s add a linked service for the storage account. This one is a little easier to create because of how the connection string is handled: in this sample, we link it to the secret we created in Key Vault, through the Azure Key Vault linked service we just created. This is not the most secure way to connect to a storage account, but for the purpose of the sample it works well.
Test your connection. It should light up green. If so, click the “Create” button.
Our linked services are ready! They should look similar to the screenshot below.
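In the same spirit, here is a sketch of the storage linked service definition, assuming Azure Blob Storage. The connection string itself never lives in ADF; it is an `AzureKeyVaultSecret` reference to a secret read through the Key Vault linked service. The names are hypothetical, and the sketch relies on the Key Vault linked service’s default `baseUrl` (a `LinkedServiceReference` can also carry a `parameters` object if you need to pass a value explicitly).

```python
def blob_linked_service(name: str, key_vault_ls: str, secret_name: str) -> dict:
    """Build a Blob Storage linked service whose connection string comes from Key Vault."""
    return {
        "name": name,
        "properties": {
            "type": "AzureBlobStorage",
            "typeProperties": {
                "connectionString": {
                    "type": "AzureKeyVaultSecret",
                    # Points at the Key Vault linked service created earlier.
                    "store": {"referenceName": key_vault_ls, "type": "LinkedServiceReference"},
                    # Same secret name in every environment's vault.
                    "secretName": secret_name,
                }
            },
        },
    }


# Hypothetical names for this sample.
storage_ls = blob_linked_service("AzureBlobStorage1", "AzureKeyVault1", "StorageConnectionString")
```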
Create the Data Factory Pipeline
Let’s start by creating a simple pipeline using the “Copy Data Tool”.
First, set up the source. Notice how we select the linked service for the storage account. You’ll need to configure the “File or Folder” location to copy from.
Next, we pick the target. We’ll use the same storage account. In addition, we enter the “Folder path” to copy to.
The next step requires setting up the file format for the Target.
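The Copy Data Tool generates a source and a target dataset from these choices. As a sketch, assuming CSV files in Blob Storage, the two dataset definitions look roughly like this. The dataset names, the container, and the source folder and file names are hypothetical; only the “copyto” target folder comes from this sample, and the tool may write additional format settings.

```python
from typing import Optional


def csv_dataset(name: str, linked_service: str, container: str,
                folder: str, file_name: Optional[str] = None) -> dict:
    """Build a DelimitedText (CSV) dataset pointing at a Blob Storage location."""
    location = {"type": "AzureBlobStorageLocation", "container": container, "folderPath": folder}
    if file_name:
        # The source points at one file; the target only needs a folder.
        location["fileName"] = file_name
    return {
        "name": name,
        "properties": {
            "linkedServiceName": {"referenceName": linked_service, "type": "LinkedServiceReference"},
            "type": "DelimitedText",
            "typeProperties": {
                "location": location,
                "columnDelimiter": ",",
                "firstRowAsHeader": True,
            },
        },
    }


# Hypothetical source file; "copyto" is the target folder used in this sample.
source = csv_dataset("SourceDataset", "AzureBlobStorage1", "sample", "copyfrom", "data.csv")
target = csv_dataset("DestinationDataset", "AzureBlobStorage1", "sample", "copyto")
```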
Next are the settings, where we set the task name.
Now review the summary…
Then click “Next”, which kicks off creating the copy pipeline.
Once the deployment is complete, the pipeline is created. You can click the “Edit Pipeline” button to view the new pipeline.
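Here is a rough sketch of the generated pipeline: a single Copy activity that reads from the source dataset and writes to the target dataset. The names are hypothetical and the tool may write additional settings; in my understanding, the task name from the settings step is used as the pipeline name.

```python
def copy_pipeline(task_name: str, source_ds: str, sink_ds: str) -> dict:
    """Build a pipeline with one Copy activity from a CSV source dataset to a CSV sink dataset."""
    return {
        "name": task_name,
        "properties": {
            "activities": [
                {
                    "name": "CopyCsv",  # hypothetical activity name
                    "type": "Copy",
                    "inputs": [{"referenceName": source_ds, "type": "DatasetReference"}],
                    "outputs": [{"referenceName": sink_ds, "type": "DatasetReference"}],
                    "typeProperties": {
                        "source": {
                            "type": "DelimitedTextSource",
                            "storeSettings": {"type": "AzureBlobStorageReadSettings", "recursive": True},
                        },
                        "sink": {
                            "type": "DelimitedTextSink",
                            "storeSettings": {"type": "AzureBlobStorageWriteSettings"},
                            "formatSettings": {"type": "DelimitedTextWriteSettings", "fileExtension": ".csv"},
                        },
                    },
                }
            ]
        },
    }


pipeline = copy_pipeline("CopyPipeline", "SourceDataset", "DestinationDataset")
```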
Here is the critical piece: publishing the pipeline. Publishing commits the generated ARM templates to the publish branch in source control (adf_publish by default). Azure DevOps uses this branch to deploy to the QA and Prod environments.
After clicking “OK”, the pipeline and datasets are published along with the linked services.
Check out your repo and you’ll see the commit for the publish you just completed.
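In the publish branch, each factory gets a folder containing `ARMTemplateForFactory.json` and `ARMTemplateParametersForFactory.json`. The parameters file is what the Azure DevOps pipeline overrides per environment, so it is worth checking which parameters ADF generated. Here is a small sketch that lists them; the helper names are hypothetical.

```python
import json
from pathlib import Path


def load_parameters(path: Path) -> dict:
    """Read an ARM template parameters file such as ARMTemplateParametersForFactory.json."""
    return json.loads(path.read_text(encoding="utf-8"))


def arm_parameter_names(doc: dict, contains: str = "") -> list[str]:
    """Return the sorted parameter names, keeping only those that contain `contains`."""
    return sorted(name for name in doc.get("parameters", {}) if contains in name)
```

For example, with the publish branch checked out, `arm_parameter_names(load_parameters(Path("<factory-folder>") / "ARMTemplateParametersForFactory.json"), "baseUrl")` should surface the Key Vault URL parameter, where `<factory-folder>` is the folder named after your factory.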
Now let’s trigger the pipeline, which will copy the file.
After the trigger runs, the file is copied into the “copyto” folder. So amazing, right? OK, it’s still just a quick sample…
Now we have a working pipeline in Dev. Part 3 will demonstrate how to set up Azure DevOps and deploy to your QA environment.