Building CI/CD Pipelines with Azure Data Factory: Part 3

In Part 1, I delved into lessons learned, creating the Data Factory Resources and configuring Source Control. Here is a link to Part 1.

In Part 2, I covered setting up sample resources, creating your Data Factory Pipeline in a Dev Environment and Publishing it. Here is a link to Part 2.

Now it’s time to create our Release Pipeline in Azure DevOps. This will be the CI/CD pipeline that deploys our Azure Data Factories across our environments!

Note: The following resources should have already been created for your QA environment:

Data Factory
Key Vault
Storage Account

Configure Containers in QA Storage Account

Your storage account will need to mimic your DEV Storage Account. Make sure it is identical. For this sample I had to do the following in the QA Storage Account:

Created the “copyto” and “copyfrom” containers
Uploaded the products.csv file to the “copyfrom” container.

Add the Secret to your QA Key Vault

You need to set up your connection string in your QA Key Vault. First, grab your connection string from your QA Storage Account. This is under the “Access Keys” menu.

Just in case you are new to Storage Accounts, take a look at the “Rotate Key” button. Also note that there are two keys: key1 and key2. Why? You can use the connection string for either key1 or key2. This allows you to rotate one key while your resources keep working with the other, so you can rotate keys regularly for security purposes. For instance, you can have a two-week rotation cycle:

Week 1 ->
- rotate key1
- update Key Vault with the new connection string for key1
- wait pre-determined time for key to be propagated to all dependent resources
- rotate key2 since resources are no longer using it
Week 2 ->
- rotate key2
- update Key Vault with the new connection string for key2
- wait pre-determined time for key to be propagated to all dependent resources
- rotate key1 since resources are no longer using it
Repeat
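
The cycle above can be sketched as a small Python function. This is only an illustration of the ordering: the name rotation_steps and the step wording are mine, and a real automation would call Azure (for example from PowerShell, as mentioned below) instead of returning strings.

```python
def rotation_steps(week: int) -> list[str]:
    """Return the ordered steps for one week of the two-week cycle.

    Hypothetical helper: odd weeks start with key1, even weeks with key2.
    """
    if week < 1:
        raise ValueError("week numbers start at 1")
    first, second = ("key1", "key2") if week % 2 == 1 else ("key2", "key1")
    return [
        f"rotate {first}",
        f"update Key Vault with the new connection string for {first}",
        "wait pre-determined time for key to be propagated to all dependent resources",
        f"rotate {second} since resources are no longer using it",
    ]


if __name__ == "__main__":
    # Print three weeks to show that the cycle repeats after week 2.
    for week in (1, 2, 3):
        print(f"Week {week} ->")
        for step in rotation_steps(week):
            print(f"- {step}")
```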

If a key is compromised, it is only a matter of time before it no longer works. This rotation also allows you to quickly change the keys if your system is hacked. This entire process can be scripted and automated using PowerShell.

Now, let’s add the connection string to your QA Key Vault:

The name of the secret must be identical to the name in Dev.
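
If you want to double check this, compare the secret names from both vaults. Here is a minimal sketch, assuming you have already collected the names from each Key Vault by some other means. The function name and the sample secret names are made up for illustration.

```python
def missing_secret_names(dev_names: set[str], qa_names: set[str]) -> set[str]:
    """Return the secret names that exist in Dev but are missing from QA."""
    return dev_names - qa_names


# Hypothetical secret names, for illustration only.
dev = {"StorageConnectionString"}
qa = {"QAStorageConnectionString"}

# Anything printed here still needs to be created in the QA Key Vault.
print(missing_secret_names(dev, qa))
```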

Once you have created the secret, you are ready to move on to adding the access policy.

Add Access Policy in Key Vault

Your QA Data Factory needs to have access to your Key Vault. In Part 2, we set this up for Dev. Now, we need to do it for QA. Start by clicking the “Add Access Policy” button.

This is a repeat of what we did before. I am only setting the access to allow my QA Data Factory to “List” secrets and “Get” secrets. It does not need any other access. Next, select the principal that needs access. This principal is your QA Data Factory. Then click “Add”.

Finally, click “Save” which will save the new access policy for your Key Vault.

Data Factory Pipeline Triggers

In most real-world applications, your Data Factory Pipelines will be running based on scheduled triggers. When deploying, you need to make sure the triggers are stopped before deployment and started after deployment. Luckily we don’t have to write that logic. Microsoft already did. You just need to add a PowerShell script to your “adf_publish” branch in your repo. See the following link:

https://docs.microsoft.com/en-us/azure/data-factory/continuous-integration-delivery-sample-script

This script actually does a lot more than stopping and starting triggers. Directly from the link above:

The following sample script can be used to stop triggers before deployment and restart them afterward. The script also includes code to delete resources that have been removed. Save the script in an Azure DevOps git repository and reference it via an Azure PowerShell task the latest Azure PowerShell version.
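
To make the ordering concrete, here is a toy Python model of what happens around the triggers during a release. It is not the Microsoft script (that one is PowerShell and talks to Azure). Trigger, deploy_with_triggers_paused, and the trigger names are all invented for illustration, and restarting exactly the triggers that were stopped is a simplification of what the real script does.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Trigger:
    name: str
    started: bool


def deploy_with_triggers_paused(
    triggers: list[Trigger], deploy: Callable[[], None]
) -> list[str]:
    """Stop the running triggers, run the deployment, then restart the ones we stopped."""
    log: list[str] = []
    was_running = [t for t in triggers if t.started]

    # Pre-deployment: stop every trigger that is currently running.
    for trigger in was_running:
        trigger.started = False
        log.append(f"stopped {trigger.name}")

    # Deployment: in the release this is the ARM template deployment task.
    deploy()
    log.append("deployed")

    # Post-deployment: start the triggers again.
    for trigger in was_running:
        trigger.started = True
        log.append(f"started {trigger.name}")
    return log
```

Triggers that were already stopped are left alone, so a call with one running and one stopped trigger only touches the running one.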

Now add the script to the “adf_publish” branch in your repo.

Finally we are ready to create our CI/CD Pipeline.

Create your CI/CD Pipeline

Now let’s go set up Azure DevOps to deploy. Start by navigating to “Pipelines” in your project in Azure DevOps. Then head over to “Releases” and click “New Pipeline”.

We need an empty job. This will allow us to configure the pipeline as needed. So click “Empty Job”.

Next we need to configure the stage. Since this is just a sample, I kept the defaults and closed the dialog.

Now add the artifact to the stage. We are deploying what was published by the DEV Data Factory. Remember, in DEV, when the Data Factory publishes the changes, it commits the changes to the “adf_publish” branch. We are not deploying/publishing an actual build like you would when deploying an App Service. Instead, we are just deploying ARM templates that are in the Repo branch so we need to pick the Azure Repo Project and Source. Then select the “Default Branch” which will be “adf_publish”. Finally, save the artifact.

Now the artifact is configured, so it is time to set up the actual job. Click the “1 Job, 0 Task” link.

Now let’s add three tasks to this job. Here are the tasks we will create:

First Task: Azure PowerShell Script (this will stop the triggers)
Second Task: ARM Template Deployment (this will deploy the ARM templates)
Third Task: Azure PowerShell Script (this will start the triggers)

First Task

You will need to click the “+” to search for the task to add.

The first task of this job will be configured to stop the triggers. You’ll need to add an “Azure PowerShell” task.

Once you have added the task, you will see it added under your job.

Now click on the task and edit it. I changed the “Display Name” to include “Pre-Deployment”. Then you’ll need to select your Azure Subscription. The next step is picking the location of the PowerShell script that you added to your repo. You can easily do this by clicking the ellipsis (…) button.

After that you need to fill in the script arguments. When running it for pre-deployment, the syntax for the arguments is the following:

-armTemplate "$(System.DefaultWorkingDirectory)/<your-arm-template-location>" -ResourceGroupName <your-resource-group-name> -DataFactoryName <your-data-factory-name> -predeployment $true -deleteDeployment $false

Your ARM template location is in the “adf_publish” branch in your repo. You need the full path, including the ARMTemplateForFactory.json file name.

Here is the configuration. Also note that I selected the latest version for the Azure PowerShell version.

Now, save the job and let’s add the next task.

Second Task

We’ll need to add an “ARM template deployment” task.

After clicking add, configure the “Azure Details”, including the subscription, resource group and location. The action should be set to the default, which is “Create or update resource group”.

Now scroll to the template section. Using the ellipsis (…) button, select the location of the ARM Template and the Template Parameters. Next, click the ellipsis for the “Override Parameters”. See the next screenshot.

Here is where the secret sauce happens. This is where everything comes together. You will need to adjust the parameters using the Override Parameters dialog. The factory name will be the resource name of your Data Factory in QA. The Key Vault will be the URL for the QA Key Vault. When the release runs, the ARM template is deployed to the factory you set for factoryName, and that factory points to the Key Vault you configured. Pretty sweet!
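
The dialog ends up writing a string of -name "value" pairs into the “Override Parameters” box. Here is a rough Python sketch of that string. Only factoryName comes from this post. The Key Vault parameter name and both values below are hypothetical, so check your own Template Parameters file for the real names.

```python
def build_override_parameters(overrides: dict[str, str]) -> str:
    """Render overrides as space-separated `-name "value"` pairs."""
    return " ".join(f'-{name} "{value}"' for name, value in overrides.items())


# factoryName is the parameter mentioned above. The Key Vault parameter name
# and both values are hypothetical placeholders.
print(build_override_parameters({
    "factoryName": "my-qa-data-factory",
    "AzureKeyVault_properties_typeProperties_baseUrl": "https://my-qa-key-vault.vault.azure.net/",
}))
```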

Finally, “Save” the task.

Third Task

This is really just a clone of the first task along with changing the boolean parameters. First, let’s clone the task.

Then move the ARM Template task in between the “Pre-Deployment” and the “Pre-Deployment copy” as below.

Now update the name to be “Post Deployment”. Then change the -predeployment flag to $false and the -deleteDeployment flag to $true.

Save! Your three tasks should look like this:

We are finally ready to deploy to QA!

Deploy

Let’s deploy it. First, I opened up the Azure Data Factory Studio. As you can see there are no pipelines, datasets, etc…

Then I closed that tab. Now let’s go deploy! We need to create a release.

You will need to select the “Stage” and the “Version” of the “Artifact”. This is basically the Git commit hash of what you want to deploy.

Once you click “Create”, it will queue the release.

Then I go into the “Release” and click “Deploy”.

After a few minutes, your deployment will complete if you did everything properly.

Now open Azure Data Factory Studio (if already open, refresh it) for your QA Data Factory. Notice that your pipeline and datasets were deployed.

Then go check out your Key Vault. Notice that it points to your QA environment so the parameters were properly deployed.

Pretty exciting! We are finally ready to test the Data Factory Pipeline.

Run your QA Data Factory Pipeline

Notice that nothing is in the “copyto” container in the QA Storage Account.

Back in Azure Data Factory Studio, open up your pipeline. Then click “Trigger Now”.

Most pipelines will require parameters; however, our pipeline does not. It’s just a simple sample. In order to run it, just click “Ok”.

You can verify it ran correctly in the “Pipeline Runs” under “Monitor”. Notice that the pipeline ran just fine.

But the proof is seeing that the file now exists in the “copyto” container in the QA Storage Account. Nothing like the sweet smell of success!!!

Finally we have built out a DEV and QA environment with CI/CD for your data factory. You’ll have to do that again for your production environment but that should be a lot simpler now that you have QA working.

Conclusion

I really hope you have found this three-part series helpful for setting up your CI/CD Release Pipelines for your Azure Data Factory. I know this has been a lot to read through and set up.
