Introduction
Customer churn prediction is one of the most common value-generating use cases for machine learning across organizations. Specifically, machine learning models predict how likely a particular customer is to churn based on, for example, demographics and other customer characteristics. Combined with an estimate of customer lifetime value, these predictions help target marketing campaigns at customers who have a high lifetime value but are also likely to churn.
In this blog article, I give a step-by-step walkthrough for setting up an end-to-end customer churn prediction model with Azure Machine Learning. You can find all resources in the following GitHub repository. Along the way, I will show different features of Azure Machine Learning Studio while working through a Kaggle challenge on predicting customer churn. The Kaggle challenge can be found here.
Prerequisites to follow along:
- You have an Azure account
- You either have a Pay-As-You-Go subscription (in which case you will incur costs) or you have some free Azure credits
- You have at least some foundational Azure and data science knowledge
Hands-on Tutorial
Step 1
Create an Azure resource group:
Step 2
Create an Azure Machine Learning workspace:
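If you prefer scripting to clicking through the portal, Steps 1 and 2 can be sketched in Python. This is a minimal sketch, assuming the `azureml-core` SDK (v1) is installed and you can authenticate interactively; the subscription ID, resource group name, workspace name, and region below are placeholders, not values from this tutorial.

```python
def build_workspace_settings(subscription_id, resource_group,
                             workspace_name, location):
    """Collect the keyword arguments passed to Workspace.create."""
    return {
        "name": workspace_name,
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "create_resource_group": True,  # create the resource group if missing
        "location": location,
        "exist_ok": True,               # do not fail if the workspace exists
    }


def create_workspace(settings):
    """Create (or fetch) the workspace; this call needs Azure credentials."""
    # Imported here so the helper above works even without the SDK installed.
    from azureml.core import Workspace

    return Workspace.create(**settings)


# Placeholder values; replace them with your own.
settings = build_workspace_settings(
    subscription_id="<your-subscription-id>",
    resource_group="<your-resource-group>",
    workspace_name="<your-workspace-name>",
    location="westeurope",
)
# workspace = create_workspace(settings)  # uncomment to run against Azure
```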
Step 3
Enter your Azure Machine Learning workspace by clicking “Launch Now”:
Step 4
Create a compute instance in your Azure Machine Learning workspace:
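The same step can be approximated with the `azureml-core` SDK (v1). This is a sketch under assumptions: the instance name and the VM size `STANDARD_DS3_V2` are example choices of mine, not requirements of this tutorial, and `workspace` is an existing `Workspace` object.

```python
def compute_instance_settings(vm_size="STANDARD_DS3_V2"):
    """Keyword arguments for ComputeInstance.provisioning_configuration.

    The default VM size is an assumed example; pick one your quota allows.
    """
    return {"vm_size": vm_size, "ssh_public_access": False}


def create_compute_instance(workspace, name, vm_size="STANDARD_DS3_V2"):
    """Provision a compute instance and wait until it is ready."""
    # Imported here so the settings helper works without the SDK installed.
    from azureml.core.compute import ComputeInstance, ComputeTarget

    config = ComputeInstance.provisioning_configuration(
        **compute_instance_settings(vm_size)
    )
    instance = ComputeTarget.create(workspace, name, config)
    instance.wait_for_completion(show_output=True)
    return instance


# instance = create_compute_instance(workspace, "<your-instance-name>")
```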
Step 5
Open Jupyter Notebooks in your compute instance:
Step 6
Open the terminal, change to your user directory, and clone this repository:
Step 7
Download the two data files from the “data” folder to your local machine:
Step 8
Create an Azure Data Lake Gen2. To do this, create a storage account with hierarchical namespace enabled:
Enable hierarchical namespace:
Create a container called “raw” in your Azure Data Lake Gen2:
Create two directories:
- 2020/03/31
- 2020/04/01
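The directory names follow a year/month/day layout. As a small illustration, the two paths above can be derived from dates in Python; the helper name `partition_path` is my own and not part of the repository.

```python
from datetime import date


def partition_path(day):
    """Format a date as a zero-padded year/month/day directory path."""
    return day.strftime("%Y/%m/%d")


snapshot_days = [date(2020, 3, 31), date(2020, 4, 1)]
directories = [partition_path(day) for day in snapshot_days]
print(directories)  # ['2020/03/31', '2020/04/01']
```

Zero-padding the month and day keeps the directories in chronological order when they are sorted as plain strings.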
Step 9
Upload the two data files to their respective directories in your Azure Data Lake Gen2 (according to the date):
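The upload can also be scripted. The following is a rough sketch, assuming the `azure-storage-file-datalake` package is installed and you authenticate with the storage account key; the local file names are placeholders, because the actual names depend on the files you downloaded in Step 7.

```python
from pathlib import Path, PurePosixPath


def remote_path(partition, local_file):
    """Build the destination path inside the container: <partition>/<file name>."""
    return str(PurePosixPath(partition) / Path(local_file).name)


def upload_files(account_name, account_key, uploads, container="raw"):
    """Upload (partition, local_file) pairs into the given container."""
    # Imported here so remote_path works without the Azure package installed.
    from azure.storage.filedatalake import DataLakeServiceClient

    service = DataLakeServiceClient(
        account_url=f"https://{account_name}.dfs.core.windows.net",
        credential=account_key,
    )
    file_system = service.get_file_system_client(container)
    for partition, local_file in uploads:
        file_client = file_system.get_file_client(
            remote_path(partition, local_file)
        )
        with open(local_file, "rb") as handle:
            file_client.upload_data(handle, overwrite=True)


# Placeholder file names; use the two files from the "data" folder.
uploads = [
    ("2020/03/31", "<first-data-file>"),
    ("2020/04/01", "<second-data-file>"),
]
# upload_files("<storage-account-name>", "<storage-account-key>", uploads)
```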
Step 10
Register your Azure Data Lake Gen2 as a datastore in the Azure Machine Learning workspace. Register it as a storage account datastore, not as an Azure Data Lake Gen2 datastore.
First retrieve your storage account key from the Azure Portal:
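For reference, a comparable registration with the `azureml-core` SDK (v1) might look like the sketch below. I am assuming that the storage account datastore type corresponds to `Datastore.register_azure_blob_container`, which accepts an account key; the environment variable name `STORAGE_ACCOUNT_KEY` and the datastore name are placeholders of mine.

```python
import os


def read_account_key(env=None, variable="STORAGE_ACCOUNT_KEY"):
    """Read the storage account key from the environment instead of hard-coding it."""
    env = os.environ if env is None else env
    key = env.get(variable)
    if not key:
        raise KeyError(f"Set {variable} to your storage account key")
    return key


def register_datastore(workspace, datastore_name, account_name, account_key,
                       container_name="raw"):
    """Register the "raw" container as a blob datastore in the workspace."""
    # Imported here so read_account_key works without the SDK installed.
    from azureml.core import Datastore

    return Datastore.register_azure_blob_container(
        workspace=workspace,
        datastore_name=datastore_name,
        container_name=container_name,
        account_name=account_name,
        account_key=account_key,
    )


# datastore = register_datastore(
#     workspace, "<your-datastore-name>", "<storage-account-name>",
#     read_account_key(),
# )
```

Keeping the key in an environment variable avoids committing it to the cloned repository by accident.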
Step 11
Register a dataset using your datastore. Important: the dataset has to be named “customer-churn”.
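A scripted version of this step could look roughly as follows, again with the `azureml-core` SDK (v1). It assumes the data files are delimited text files (for example CSV) and that the glob pattern below matches the two date directories; both are assumptions on my part, so adjust them to your files. Only the dataset name is fixed by the tutorial.

```python
DATASET_NAME = "customer-churn"  # the notebooks expect exactly this name


def register_dataset(workspace, datastore_name,
                     path_pattern="2020/*/*/*.csv"):
    """Create a tabular dataset from the datastore and register it.

    The default path_pattern is an assumed layout, not taken from the repository.
    """
    # Imported here so the module loads without the SDK installed.
    from azureml.core import Dataset, Datastore

    datastore = Datastore.get(workspace, datastore_name)
    dataset = Dataset.Tabular.from_delimited_files(
        path=[(datastore, path_pattern)]
    )
    return dataset.register(
        workspace=workspace,
        name=DATASET_NAME,
        create_new_version=True,
    )


# dataset = register_dataset(workspace, "<your-datastore-name>")
```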
Step 12
Install all necessary dependencies on your compute instance:
Step 13
You can now run the notebooks. Specific explanations are included as comments in the notebooks. You can skip “03_customer_churn_train_decision_tree” and “04_customer_churn_train_automl” without affecting the downstream workflow. The notebooks cover the ML lifecycle from exploratory data analysis to model deployment.