Introduction
This is the first part of a 10-part blog article series about my Azure MLOps Template repository. For an overview of the whole series, read this article. I will present each of the 10 parts as a separate tutorial. This will allow an interested reader to easily replicate the setup of the template.
In this first part, I am putting the spotlight on provisioning the Azure resources and an Azure DevOps project which we will use to build CI/CD workflows and version control our code. The Azure resources and the Azure DevOps project together make up our infrastructure and, thus, the foundation of our MLOps solution. We will use a popular Infrastructure as Code (IaC) open source tool called Terraform for the provisioning of our infrastructure.
Prerequisites to follow along:
- You have an Azure Account
- You either have a Pay-As-You-Go subscription (in which case you will incur cost) or you have some free Azure credits
Infra as Code and Terraform Overview
IaC is an integral part of DevOps and enables automation of the infrastructure provisioning process. Automation, in turn, helps to improve the speed, reliability, transparency, reproducibility and robustness of the software delivery process.
As an integral part of DevOps, IaC is also a core component of an enterprise-grade, scalable MLOps solution. Put simply, MLOps can be understood as DevOps for Machine Learning solutions, although Machine Learning solutions have very different DevOps requirements than traditional software solutions. For example, Machine Learning solutions depend heavily on data, so they cannot be considered in isolation from that data.
There are many tools that can be used for infrastructure provisioning in Azure, such as ARM templates and a newer language developed by Microsoft called Bicep. However, Terraform is open source, works across cloud providers and has gained significant traction in the market. Therefore, I have opted for Terraform as my tool of choice. Terraform makes use of declarative configuration files written in the human-readable HashiCorp Configuration Language (HCL).
In the following, I will give a step-by-step walkthrough of the infrastructure provisioning process using Terraform. While in some scenarios you might want to set up a complete enterprise infrastructure environment including private link and secure networking practices, I will only set up a fairly simple infrastructure environment as part of this tutorial series. For more enterprise-ready infrastructure environments, Clemens Siebler has written a good blog post here.
1. Set up an Azure DevOps Organization & Personal Access Token (PAT)
First, we need to set up an Azure DevOps Organization and create a PAT that can be used by Terraform to interact with the Azure DevOps Service API. For this purpose, go to https://dev.azure.com and sign in to Azure DevOps with your account. Then click on “New organization” in the left-side panel and create a new Azure DevOps organization with your desired name.
Note: We don’t need to create an Azure DevOps Project as this will be taken care of by our Terraform configuration files. So you can stop after you see the screen that prompts you for the creation of an Azure DevOps Project.
Within your new Azure DevOps organization, create a Personal Access Token with “Full Access” as follows:
Click on “Personal access token”, then click on “New Token” and create a new Personal Access Token called terraform_pat.
Note: Make sure to store the created token in a text file. You will need to put it into an environment variable in the next step.
2. Clone the Azure MLOps Template Repository to your Workstation
Now that we have created our Azure DevOps Organization and Personal Access Token, we will provision the rest of our Azure environment using Terraform. Open a shell, preferably a bash shell, on your local workstation and clone my Azure MLOps Template repository. Then, navigate to the infrastructure directory of the template repository where all Terraform configuration files are stored (replace <TEMPLATE_ROOT> with the path to your cloned repository):
$ git clone https://github.com/sebastianbirk/pytorch-mlops-template-azure-ml.git
$ cd <TEMPLATE_ROOT>/infrastructure
Note: The setup steps and commands below are based on the Bash Unix shell. Some commands will differ if you use another command-line shell, such as PowerShell.
3. Prepare for Infrastructure Delivery with Terraform
Next, set the two environment variables below, replacing <ADO_ORG_NAME> and <ADO_PAT> with the name of your Azure DevOps Organization and the PAT that you have stored, respectively. It is important that the environment variables are prefixed with “TF_VAR_” so that Terraform reads them as input variables.
$ export TF_VAR_ado_org_service_url="https://dev.azure.com/<ADO_ORG_NAME>"
$ export TF_VAR_ado_personal_access_token="<ADO_PAT>"
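A missing or empty variable only surfaces later, when Terraform asks for the value or fails. As a minimal sketch, a small shell function can check both variables up front. The function name check_tf_vars is my own choice for illustration and is not part of the template repository:

```shell
# Sketch: fail early if the two Terraform input variables from this step are missing.
# check_tf_vars is a hypothetical helper name, not part of the template repository.
check_tf_vars() {
  status=0
  if [ -z "${TF_VAR_ado_org_service_url:-}" ]; then
    echo "TF_VAR_ado_org_service_url is not set" >&2
    status=1
  fi
  if [ -z "${TF_VAR_ado_personal_access_token:-}" ]; then
    echo "TF_VAR_ado_personal_access_token is not set" >&2
    status=1
  fi
  # Return 0 only if both variables are non-empty.
  return "$status"
}
```

Calling check_tf_vars before any Terraform command returns a non-zero exit code and names the variable that is missing.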
If you have not yet installed the Azure CLI on your machine, install it as per this link. You can check whether you have the Azure CLI installed with the following command:
$ az --help
If this command successfully returns the help, then you are good to go.
Once the Azure CLI is installed, log in to your Azure tenant, set the subscription and install the Azure Machine Learning CLI extension. Replace <TENANT_ID> and <SUBSCRIPTION_ID> with the values of your Azure tenant and subscription, respectively. Terraform uses the credentials stored by the Azure CLI to access the Azure Resource Manager API, and the PAT from the environment variable set above to access the Azure DevOps Service API:
$ az login --tenant <TENANT_ID>
$ az account set --subscription <SUBSCRIPTION_ID>
$ az extension add -n azure-cli-ml
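If you work with several subscriptions, it can be worth confirming which one is active before Terraform creates anything. The following is a minimal sketch that compares the output of az account show with the subscription ID you expect. The function name check_subscription is hypothetical:

```shell
# Sketch: confirm that the Azure CLI is pointed at the intended subscription.
# check_subscription is a hypothetical helper name.
check_subscription() {
  expected_id="$1"
  # "az account show" prints the currently active subscription;
  # the query keeps only its ID.
  active_id="$(az account show --query id --output tsv)" || return 1
  if [ "$active_id" != "$expected_id" ]; then
    echo "active subscription is $active_id, expected $expected_id" >&2
    return 1
  fi
}
```

You would call it as check_subscription <SUBSCRIPTION_ID>, using the same ID that you passed to az account set.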
4. Deliver Infrastructure as Code with Terraform
If you have not yet installed the Terraform CLI on your machine, install it as per this link. You can check whether you have the Terraform CLI installed with the following command:
$ terraform -help
If this command successfully returns the help, then you are good to go.
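As an alternative to reading the help output, a script can check whether the required CLIs are on the PATH. This is a minimal sketch. The function name require_cmd is hypothetical:

```shell
# Sketch: check that a command used in this tutorial is available on the PATH.
# require_cmd is a hypothetical helper name.
require_cmd() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "$1 not found on PATH" >&2
    return 1
  fi
}
```

Running require_cmd az && require_cmd terraform succeeds only if both tools are installed.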
Execute the Terraform initialization command to prepare the current working directory for use with Terraform:
$ terraform init
Run the Terraform plan command to check whether the execution plan for the configuration matches your expectations before provisioning or changing infrastructure:
$ terraform plan
Run the Terraform apply command to reach the desired state of the configuration (you will need to type “yes” to approve the execution):
$ terraform apply
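The three commands can also be chained so that Terraform applies exactly the plan you reviewed. The sketch below saves the plan to a file and then applies that file. The function name provision and the file name tfplan are my own choices for illustration:

```shell
# Sketch: run the three Terraform steps with a saved plan file.
# provision and the plan file name "tfplan" are hypothetical choices.
provision() {
  # Each step runs only if the previous one succeeded.
  # Applying a saved plan file does not ask for the interactive "yes" approval.
  terraform init &&
    terraform plan -out=tfplan &&
    terraform apply tfplan
}
```

Because a saved plan is applied without a prompt, review the output of the plan step before letting the apply step run, for example by executing the steps one at a time the first time around.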
Your Azure environment should now be provisioned. You can log in to the Azure portal and search for the resource group “mlopstemplaterg” to see all your provisioned resources.
Note: You will have a different resource suffix.
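If you prefer the command line over the portal, the Azure CLI can list the matching resource groups. This is a minimal sketch, assuming that the suffix is appended to the name “mlopstemplaterg”. The function name list_template_resource_groups is hypothetical:

```shell
# Sketch: list resource groups whose name starts with the template's prefix.
# list_template_resource_groups is a hypothetical helper name.
# Assumption: the resource suffix is appended after "mlopstemplaterg".
list_template_resource_groups() {
  az group list \
    --query "[?starts_with(name, 'mlopstemplaterg')].name" \
    --output tsv
}
```

The function prints one resource group name per line, or nothing if no group matches.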
Outlook
In the next parts of my Azure MLOps Template blog article series, we will have a look at how to use this provisioned infrastructure to develop and train a model. In part 2 specifically, we will cover how to set up reproducible and maintainable Python environments using the popular open source package management system Conda as well as the Azure Machine Learning Environments feature.