
Azure MLOps Template Part 2: Environments

Introduction

This is the second part of a 10-part blog article series about my Azure MLOps Template repository. For an overview of the whole series, read this article. I will present each of the 10 parts as a separate tutorial. This will allow an interested reader to easily replicate the setup of the template.

In the first part, we saw how to provision our Azure infrastructure with the open source Infrastructure as Code (IaC) tool Terraform. In this second part, we will cover how to set up reliable, reproducible and easily maintainable Python environments using the popular open source package management system Conda as well as the Azure Machine Learning (AML) Environments feature.

Prerequisites to follow along:

  • You have an Azure Account
  • You either have a Pay-As-You-Go subscription (in which case you will incur cost) or you have some free Azure credits
  • You have followed along in the first part of the blog article series or already have an Azure Machine Learning workspace provisioned

Conda & AML Environments Overview

One of the main objectives that we should keep in mind when building machine learning systems is to have reliable and reproducible code environments. Although the art of building ML systems has grown more mature over the last couple of years and has adopted various good practices from traditional software engineering, it still happens alarmingly often that Data Scientists and ML practitioners say things like “it worked on my machine”. Many hours of precious time get lost when moving from experimentation to production just because proper code environment management is not in place and it is cumbersome to reproduce environments and identify dependencies. For this reason, creating reproducible code environments is one of the key elements of successful MLOps.

Conda and AML environments are two important tools that I will leverage as part of my Azure MLOps Template repository to guarantee reliable and reproducible code environments. Conda is an open source package management system that was initially created for Python programs. By creating different Conda environments on our machines, we can create isolated environments tailored to the needs of our current ML system. This is especially important since different versions of Python dependencies often create conflicts and might not be compatible with each other. For example, we might need one version of PyTorch for one model but another version for another model. Instead of constantly upgrading and downgrading the package in our machine's Python installation, we can simply create separate Conda environments and switch seamlessly between them. Moreover, Conda environments can be defined in YAML files and easily be recreated from these YAML files. This allows us to easily port our environments from one compute target to another.
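As a minimal illustration of this workflow (the environment names and package versions here are just examples, not taken from the template repository), switching between two isolated environments and porting one of them to another machine could look like this:

$ conda create -n model_a_env -c pytorch python=3.8 pytorch=1.10
$ conda create -n model_b_env -c pytorch python=3.8 pytorch=1.7
$ conda activate model_a_env                  # work on the first model
$ conda activate model_b_env                  # switch to the second model
$ conda env export > model_b_env.yml          # capture the active environment as a YAML file
$ conda env create -f model_b_env.yml         # recreate it on another compute target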

AML environments take the concept of Conda environments one step further, as they allow us to create holistic, dockerized software environments. These environments can include Python dependencies the same way Conda environments do, but also allow us to define environment variables, runtimes and other software settings around our training and scoring scripts. What we will see in my Azure MLOps Template repository is that we can use a Docker image as a base image containing all our desired software settings and then install a Conda environment on top to include our required Python dependencies. The beauty of AML environments is that they can also be created directly from Conda YAML files and can be easily managed, versioned and retrieved via the AML workspace.

In the following, I will give a step-by-step walkthrough of the environment setup process using Conda and AML environments. We will leverage the Azure infrastructure that has been provisioned in part 1 of this blog article series as well as the cloned Git repository that has been created inside your Azure DevOps Project.

1. Clone your Azure MLOps Template Repository to your AML Compute Instance

Log in to the Azure portal, search for the resource group “mlopstemplaterg”, and then click on and launch the Azure Machine Learning workspace:

[Screenshot: azure_aml_workspace]

Alternatively, you can also directly navigate to the Azure Machine Learning website and select the provisioned AML workspace.

In the Azure Machine Learning workspace, navigate to the “Compute” tab and launch Jupyter Lab from your Compute Instance:

[Screenshot: aml_workspace_compute_instance]

Once in Jupyter Lab, open a terminal:

[Screenshot: jupyterlab_terminal]

Then navigate to your User folder and clone your Azure MLOps Template repository from your created Azure DevOps Project to your AML Compute Instance. To do this, open a separate browser tab to retrieve your repository clone URL. Go to dev.azure.com, sign in, navigate to your mlopstemplateproject Azure DevOps project and then to the cloned pytorch-mlops-template-azure-ml repository:

[Screenshot: ado_project_repo_clone]

Click on “Clone” and copy the HTTPS clone URL.

Next, create a new Personal Access Token (PAT) in your Azure DevOps Project and use it as the password when authenticating during the following clone operation. Have a look at the part 1 article if you do not remember how to create a PAT.

To clone the repository, run the following from your terminal:

$ git clone <REPO_CLONE_URL>

where your <REPO_CLONE_URL> will have the following format:

  • https://<ADO_ORG_NAME>@dev.azure.com/<ADO_ORG_NAME>/mlopstemplateproject/_git/pytorch-mlops-template-azure-ml

2. Run the Environment Setup Notebook

Now that you have cloned your repository to your AML Compute Instance, we will dive into the actual environment setup. For this purpose, I have created a notebook called 00_environment_setup.ipynb, which is located in the notebooks directory of the template repository. Open this notebook on your AML Compute Instance (e.g. with Jupyter Lab). Then follow the instructions from the Markdown cells step by step. In the rest of this article, I will be giving supporting explanations and information in addition to the instructions of the Markdown cells. Feel free to skip reading through it if you prefer to go through the notebook yourself.

2.1 Environment Setup Notebook Overview

In the 00_environment_setup.ipynb notebook, I explain how to create a local Conda environment for development with an IDE (such as VS Code) as well as how to create a Jupyter kernel based on this local Conda environment. The Jupyter kernel can then be used to run notebook cells in our newly created environment with all our specified Python dependencies available.
Last but not least, I also show how environments can be registered as AML environments to the AML workspace. From there they can be used by other Data Scientists on other compute targets. This will ensure easy reproducibility and robust development, training and deployment workflows. 
 

In addition to the development environment, I recommend creating two separate environments for training and deployment respectively. The reason behind this recommendation is that different dependencies are needed during different stages of the ML lifecycle. By separating our code environments, we can keep them as lightweight as possible and only include the dependencies that are necessary for each stage. With this approach, environments stay small and less error-prone.

2.2 Creating Conda YAML Files

The backbone of our Python environment management is a set of Conda YAML files that are checked into our Git repository. Instead of just creating a Conda environment locally and then installing dependencies with pip and Conda via the terminal, we include all Python dependencies that are required for each stage in these Conda YAML files. We then create the code environments directly from these Conda YAML files so that no dependencies get lost when we want to install the same environments on other compute targets.

I would also strongly encourage you to pin all packages to a specific version, as failing to do so can lead to broken environments or undesired automatic updates that might cause your code to break.

For the example model that we will train in the subsequent articles, the development environment Conda YAML file looks as follows (this file is created as part of the notebook):

[Screenshot: conda_development_environment_yaml]
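In case the screenshot is hard to read, an illustrative, shortened excerpt of such a Conda YAML file could look like this (the exact packages and versions in the template repository may differ):

name: dogs_clf_dev_env
channels:
  - defaults
  - pytorch
dependencies:
  - python=3.8
  - pip=21.2
  - pytorch=1.10
  - pip:
    - azureml-sdk==1.38.0
    - matplotlib==3.5.1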

Dependencies that are defined in the “dependencies” section are installed directly from Conda channels, whereas packages defined in the “pip” section are installed with pip from the Python Package Index (PyPI). In both cases, however, the packages are installed into our isolated Conda environment.

Note: Versions for Conda dependencies are pinned with a single “=” sign, whereas versions for pip dependencies are pinned with a double “==” sign.

Similar Conda YAML files are created for the training and deployment environments.

2.3 Creating a Conda Development Environment

After we have specified our Conda YAML files, we now install our Conda development environment from our Conda development YAML file. We can then leverage this Conda development environment from our development IDE, e.g. VS Code, or we can use it to create a Jupyter kernel for usage from within notebooks.

In order to create the Conda development environment on the AML Compute Instance, open a terminal, navigate to the template root directory, and create a Conda environment from the YAML file as mentioned in the notebook:

$ cd <TEMPLATE_ROOT>
$ conda env create -f environments/conda/development_environment.yml --force
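To verify that the environment was created successfully, you can list all Conda environments on the Compute Instance:

$ conda env list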

You now have a ready-to-use Conda environment that can be leveraged from VS Code. If you want to use VS Code for your development, click on “VS Code” next to your Compute Instance in the “Compute” tab of the AML workspace:

[Screenshot: aml_workspace_compute_instance_2]

Note: There are a few prerequisites to leveraging VS Code on the AML Compute Instance:

  • You have VS Code installed on your local workstation
  • You have the AML Extension installed in your VS Code
  • You are logged in to your Azure account in your local VS Code

2.4 Creating a Jupyter Kernel

If we want to leverage the Conda environment from within notebooks in Jupyter Lab, Jupyter Notebooks or from the AML Notebooks experience, we need to create a Jupyter kernel from the Conda environment. Creating a Jupyter kernel from a Conda environment is simple. As described in the notebook, simply open a terminal and run the following commands. 

First, activate your Conda environment:

$ conda activate dogs_clf_dev_env

Next, use the ipykernel library to install a Jupyter kernel. Use the same name as for the Conda environment:

$ python -m ipykernel install --user --name=dogs_clf_dev_env

Now, when you open a new notebook, you will be able to select the installed Jupyter kernel.
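If the new kernel does not show up in the kernel picker right away, you can check that it was registered correctly by listing all installed kernels (you might also need to refresh the browser tab):

$ jupyter kernelspec list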

Note: You will not need to change the kernel for this particular notebook. The environment setup notebook should be run with the Python 3.6 – AzureML kernel. However, we will need to select the newly created Jupyter kernel to run the other notebooks. We will see this in the next articles of my Azure MLOps Template blog article series.

2.5 Creating AML Environments

Now we will use the Python AML SDK to create AML environments that can be registered to the AML workspace. We leverage the Environment class to create an environment based on our Conda YAML files for development, training and deployment respectively. The Conda YAML files specify all Python dependencies that we require. In addition, we specify a Docker base image that includes additional required tools alongside our Python dependencies. The environment can thus be understood as a layering of our Python dependencies on top of a Docker base image. We then register the environments to our AML workspace. Here, for example, the development environment is shown:

[Screenshot: docker_base_image_definition]

[Screenshot: aml_environment_creation]
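In case the screenshots are hard to read, the following is a minimal sketch of how such an environment could be created and registered with the AML Python SDK (v1). The base image shown here is a common SDK default and may differ from the exact code in the notebook:

from azureml.core import Environment, Workspace
from azureml.core.runconfig import DEFAULT_CPU_IMAGE

# Connect to the AML workspace (uses the config.json available on the Compute Instance)
ws = Workspace.from_config()

# Create an environment from the development Conda YAML file
dev_env = Environment.from_conda_specification(
    name="dogs_clf_dev_env",
    file_path="environments/conda/development_environment.yml",
)

# Layer the Python dependencies on top of a Docker base image
dev_env.docker.base_image = DEFAULT_CPU_IMAGE

# Register the environment to the AML workspace so it can be versioned and reused
dev_env.register(workspace=ws)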

Similar AML environments are created for the training and deployment environments.

Note: When creating the AML deployment environment, we need to set the inferencing stack version to “latest”. See the notebook for implementation details.
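For reference, in SDK v1 this is a single attribute on the Environment object (shown here for a hypothetical deployment environment object named deploy_env):

deploy_env.inferencing_stack_version = "latest"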

We can now see our registered AML environments in the “Environments” tab of the AML workspace.
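As a quick sanity check, registered environments can also be retrieved programmatically from any compute target that has access to the workspace; a minimal sketch:

from azureml.core import Environment, Workspace

ws = Workspace.from_config()

# Retrieve a single registered environment by name
dev_env = Environment.get(workspace=ws, name="dogs_clf_dev_env")

# List the names of all environments registered in the workspace
print(list(Environment.list(workspace=ws).keys()))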

Outlook

In the next parts of my Azure MLOps Template blog article series, we will have a look at how to use our registered AML environments to develop and train a model. In part 3 specifically, we will cover how to use Python and the Python AML SDK to download public data and upload it to an Azure Blob Storage that is connected to the Azure Machine Learning workspace.

Sebastian Birk

I’m a Data Scientist and Azure Data & AI Consultant. My passion is to solve the world’s toughest problems with data & AI and I love to build intelligent solutions in the cloud.
