Machine Learning - MLflow for managing the end-to-end machine learning lifecycle
Gaurav Pahuja
Senior Data Scientist | DatSci 2019 Finalist | Python/Plotly-Dash | R/R-Shiny | Oracle SQL/BI | SQL | Machine Learning | Deep Learning | Techfitlab
MLflow is an open source platform to manage the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry. The main advantage of including MLflow in your ML lifecycle is the transparency and standardisation it brings to training, tuning and deploying ML models. It lets you train, reuse, and deploy models with any library and package them into reproducible steps that other data scientists can use as a “black box,” without even having to know which library you are using.
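If MLflow is not already available in your environment, it can be installed from PyPI:
# Terminal Command
pip install mlflow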
MLflow Setup
In the first step, we will set up the central repository database where we will log all our tracking information, and we will also create an artifacts folder to store our models and other relevant information about them.
# Terminal Command
mlflow server \
    --backend-store-uri sqlite:///mlflow.db \
    --default-artifact-root ./artifacts \
    --host 0.0.0.0 \
    --port 8080
MLflow Tracking
MLflow Tracking is an API and UI for logging parameters, code versions, metrics and output files when running your machine learning code, so that you can visualise them later. With a few simple lines of code, you can track parameters, metrics, and artifacts:
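For instance, a minimal sketch of a tracked run (the names and values here are illustrative); note that calling log_metric repeatedly with the same key builds up the metric history you can plot in the UI:
with mlflow.start_run():
    mlflow.log_param("max_depth", 6)                   # a hyperparameter for this run
    for epoch, loss in enumerate([0.9, 0.6, 0.4]):
        mlflow.log_metric("loss", loss, step=epoch)    # metric history over steps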
Set Tracking URI and Create Experiment
Then, in our notebook, we will connect to our tracking server. In this example we are using localhost, but you can point this at any other server by passing the host and the port in the set_tracking_uri call. Initially, you will have a default experiment created, which you can retrieve with the get_experiment call as shown below, or you can create your own with mlflow.create_experiment.
mlflow.set_tracking_uri("https://127.0.0.1:8080/")
experiment = mlflow.get_experiment('0')
print("Name: {}".format(experiment.name))
print("Artifact Location: {}".format(experiment.artifact_location))
print("Lifecycle_stage: {}".format(experiment.lifecycle_stage))
print("Experiment ID: {}".format(experiment.experiment_id))
Import Packages
Let's import all the packages that we will be using in this example.
import os
import sys
import itertools
import datetime
import time
import warnings as w
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import xgboost as xgb
import sklearn
from sklearn.model_selection import RandomizedSearchCV
from sklearn.metrics import confusion_matrix, precision_score, matthews_corrcoef, recall_score, f1_score, plot_confusion_matrix, roc_auc_score, classification_report
import mlflow
import mlflow.sklearn
import mlflow.xgboost
from mlflow.models.signature import infer_signature
w.filterwarnings('ignore', category=Warning)
sns.set(style="whitegrid")
%matplotlib inline
Create Dataset
In this example, we will create our own dataset through the sklearn.datasets package.
Package: sklearn.datasets.make_classification
from sklearn.datasets import make_classification
X, y = make_classification(
n_classes=2, class_sep=0.5, weights=[0.6, 0.4],
n_informative=3, n_redundant=1, flip_y=0.3,
n_features=20, n_clusters_per_class=3,
n_samples=80000, random_state=11
)
model_dataset = pd.DataFrame(X)
model_dataset['Class'] = y
model_dataset = sklearn.utils.shuffle(model_dataset)
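As a quick sanity check (not required for the pipeline), you can confirm the class balance that make_classification produced:
# Roughly 60/40 from weights=[0.6, 0.4], shifted by the flip_y label noise
print(model_dataset['Class'].value_counts(normalize=True))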
Split Dataset
After shuffling the dataset in the previous step, we will now split it into train, validation and test sets.
n = len(model_dataset)
train_df = model_dataset[0:int(n*0.8)]
val_df = model_dataset[int(n*0.8):int(n*0.9)]
test_df = model_dataset[int(n*0.9):]
X_train, y_train = train_df.iloc[:,:-1], train_df.iloc[:,-1]
X_val, y_val = val_df.iloc[:,:-1], val_df.iloc[:,-1]
X_test, y_test = test_df.iloc[:,:-1], test_df.iloc[:,-1]
Turn On MLflow Auto-Logging
We will now turn on the autolog functionality in MLflow to record all the relevant information about the model run automatically. However, in the final training of the model we will also look at how to log information and artifacts manually.
# enable autologging
mlflow.sklearn.autolog(log_models=True)
mlflow.xgboost.autolog(log_models=True)
Hyperparameter Tuning
Next, we define the search space with all the elements listed below and run RandomizedSearchCV after starting an MLflow run through the mlflow.start_run method. We pass two arguments to mlflow.start_run, experiment_id and run_name, which will be saved in the MLflow front end, i.e. the UI.
Finally, we can perform the optimisation and report the results.
xgb_reg = xgb.XGBClassifier()
params = {
'num_boost_round': [5, 10, 15, 25],
'eta': [0.05, 0.001, 0.1, 0.3],
'max_depth': [3, 6, 5, 8],
'subsample': [0.9, 1, 0.8],
'colsample_bytree': [0.9, 1, 0.8],
'alpha': [0.1, 0.3, 0]
}
with mlflow.start_run(experiment_id=experiment.experiment_id, run_name='debt_probability_model') as run:
    random_search = RandomizedSearchCV(xgb_reg, params, cv=5, n_iter=100, verbose=1)
    start = time.time()
    random_search.fit(X_train,
                      y_train,
                      eval_set=[(X_train, y_train), (X_val, y_val)],
                      early_stopping_rounds=10,
                      verbose=True)
    best_parameters = random_search.best_params_
    print('RandomizedSearchCV Results: ')
    print(random_search.best_score_)
    print('Best Parameters: ')
    for param_name in sorted(best_parameters.keys()):
        print("%s: %r" % (param_name, best_parameters[param_name]))
    end = time.time()
    print('time elapsed: ' + str(end-start))
    print(' ')
    print('Best Estimator: ')
    print(random_search.best_estimator_)
    y_pred = random_search.predict(X_test)
Logging Information Manually
You can also log information manually in mlflow as shown below:
mlflow.log_param() logs a single key-value param in the currently active run. The key and value are both strings. Use mlflow.log_params() to log multiple params at once.
mlflow.log_metric() logs a single key-value metric. The value must always be a number. MLflow remembers the history of values for each metric. Use mlflow.log_metrics() to log multiple metrics at once.
mlflow.log_artifact() logs a local file or directory as an artifact, optionally taking an artifact_path to place it within the run’s artifact URI. Run artifacts can be organised into directories, so you can place the artifact in a directory this way.
mlflow.log_artifacts() logs all the files in a given directory as artifacts, again taking an optional artifact_path.
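Putting these together, a minimal sketch of manual logging inside an active run (the names and values are illustrative, and log_artifact assumes the file already exists on disk):
with mlflow.start_run():
    mlflow.log_param("eta", 0.05)                            # a single param
    mlflow.log_params({"max_depth": 6, "subsample": 0.9})    # several params at once
    mlflow.log_metric("val_auc", 0.91)                       # a single metric
    mlflow.log_metrics({"val_f1": 0.88, "val_mcc": 0.74})    # several metrics at once
    mlflow.log_artifact("roc_curve.png")                     # one local file
    mlflow.log_artifacts("www/xgb_results", artifact_path="plots")  # a whole directory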
Final Data Split
Next, we will create our final train and test set to create the final model.
n = len(model_dataset)
train_df = model_dataset[0:int(n*0.8)]
test_df = model_dataset[int(n*0.8):]
X_train, y_train = train_df.iloc[:,:-1], train_df.iloc[:,-1]
X_test, y_test = test_df.iloc[:,:-1], test_df.iloc[:,-1]
Final Model
Let's train our final model, again starting an MLflow run through the mlflow.start_run method with the experiment_id and run_name arguments, which will be saved in the MLflow front end, i.e. the UI. We will train our model using the best parameters found by our RandomizedSearchCV in the previous step:
xgb.XGBClassifier(**random_search.best_params_)
In this example, we will log the parameters manually in MLflow as shown below:
We will also save the model manually with the help of the log_model method in MLflow; all the files will be saved in the ./artifacts folder. Finally, I calculate some metrics on the test set (accuracy, F1 and MCC) and create some plots to visualise the results of my model (feature importance, ROC curve and confusion matrix), which are saved to a folder in my directory (www/xgb_results). That directory can then be passed to the log_artifacts method, which logs all the visualisations with the model run and makes them visible in the front end of the MLflow UI.
Lastly, I have also logged the feature names with the model run, so that when we use this model in the future I still have the names of all the features that were used in training. This is a useful technique when you are using one-hot encoding on your dataset and your features are dynamic, depending on the values in the dataset.
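One note before the code: metric_graph is not an MLflow or sklearn function but a custom plotting helper that is not defined in this article. A minimal sketch of what such a helper might look like, inferred purely from how it is called below (true labels, predicted probabilities, a metric name, a figure size and an output filename):
from sklearn.metrics import roc_curve, auc

def metric_graph(y_true, y_proba, metric='roc', figsize=(15, 8), filename=None):
    # Hypothetical helper: only the ROC case used in this article is sketched
    if metric != 'roc':
        raise ValueError('only the roc metric is sketched here')
    fpr, tpr, _ = roc_curve(y_true, y_proba)
    fig, ax = plt.subplots(figsize=figsize)
    ax.plot(fpr, tpr, label='ROC curve (AUC = %.3f)' % auc(fpr, tpr))
    ax.plot([0, 1], [0, 1], linestyle='--', label='Chance')
    ax.set_xlabel('False Positive Rate')
    ax.set_ylabel('True Positive Rate')
    ax.legend(loc='lower right')
    if filename:
        fig.savefig(filename, dpi=100, bbox_inches='tight')
    return fig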
# Final Model
with mlflow.start_run(experiment_id=experiment.experiment_id, run_name='roi_xgb') as run:
    xgb_reg_main = xgb.XGBClassifier(**random_search.best_params_)
    xgb_reg_main.fit(X_train, y_train)
    xgb_dict = random_search.best_params_
    mlflow.set_tag('model_name', 'roi_xgb')
    # Log Parameters
    mlflow.log_param('subsample', xgb_dict['subsample'])
    mlflow.log_param('num_boost_round', xgb_dict['num_boost_round'])
    mlflow.log_param('max_depth', xgb_dict['max_depth'])
    mlflow.log_param('eta', xgb_dict['eta'])
    mlflow.log_param('colsample_bytree', xgb_dict['colsample_bytree'])
    mlflow.log_param('alpha', xgb_dict['alpha'])
    # Save Model
    signature = infer_signature(X_train, xgb_reg_main.predict(X_train))
    mlflow.xgboost.log_model(xgb_reg_main, "xgb_roi", signature=signature)
    y_preds = xgb_reg_main.predict(X_test)
    y_preds_proba = xgb_reg_main.predict_proba(X_test)
    # Calculating Metrics
    acc_xgb_main = (y_preds == y_test).sum().astype(float) / len(y_preds)*100
    f1_xgb_main = f1_score(y_test, y_preds, average='micro')
    mcc_xgb_main = matthews_corrcoef(y_test, y_preds)
    features = X_train.columns
    os.makedirs('www/xgb_results', exist_ok=True)  # make sure the plot folder exists
    # Feature Importance
    xgb_importances_main = pd.DataFrame({'Feature': features, 'Importance': xgb_reg_main.feature_importances_})
    xgb_importances_main = xgb_importances_main.sort_values(by='Importance', ascending=False)
    xgb_importances_main = xgb_importances_main.set_index('Feature')
    imp_xgb_main = xgb_importances_main[:25].plot.bar(figsize=(15,8))
    fig = imp_xgb_main.get_figure()
    fig.savefig('www/xgb_results/xgb_main_imp.png', dpi=100, bbox_inches='tight')
    # Test ROC
    roc_xgb_main = metric_graph(y_test, y_preds_proba[:,1], metric='roc', figsize=(15, 8),
                                filename='www/xgb_results/xgb_main_roc.png')
    # Test Confusion Matrix
    class_names = np.unique(y_test)
    disp_xgb = plot_confusion_matrix(xgb_reg_main, X_test, y_test,
                                     display_labels=class_names,
                                     cmap=plt.cm.Blues)
    disp_xgb.ax_.set_title("Model: XGBoost")
    plt.savefig('www/xgb_results/xgb_main_cm.png', dpi=100, bbox_inches='tight')
    # Test Log Metrics
    mlflow.log_metric('test_accuracy', acc_xgb_main)
    mlflow.log_metric('test_f1_score', f1_xgb_main)
    mlflow.log_metric('test_mcc_score', mcc_xgb_main)
    mlflow.log_artifacts('www/xgb_results')
    # Log Features
    pd.DataFrame(columns=X_train.columns).to_csv('roi_features.csv', index=False)
    mlflow.log_artifact('roi_features.csv', artifact_path='features')
MLflow's autologging does not cover XGBoost models trained through the scikit-learn API, which is why we logged all the parameters and metrics manually. Let's also look at an example using the native implementation of XGBoost.
params = random_search.best_params_
params['objective'] = 'binary:logistic'  # xgb.train defaults to regression, so set the objective explicitly
params['eval_metric'] = 'mae'
del params['num_boost_round']
num_boost_round = 200
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)
# Final Model
with mlflow.start_run(experiment_id=experiment.experiment_id, run_name='roi_xgb') as run:
    xgb_reg_main = xgb.train(params, dtrain, num_boost_round=num_boost_round, evals=[(dtest, "Test")], early_stopping_rounds=20)
    mlflow.set_tag('model_name', 'roi_xgb')
    # Log Parameters
    mlflow.log_param('subsample', params['subsample'])
    mlflow.log_param('max_depth', params['max_depth'])
    mlflow.log_param('eta', params['eta'])
    mlflow.log_param('colsample_bytree', params['colsample_bytree'])
    mlflow.log_param('alpha', params['alpha'])
    y_xgb_preds_main_proba = xgb_reg_main.predict(dtest)  # raw probabilities from the booster
    y_xgb_preds_main = [1 if p >= 0.5 else 0 for p in y_xgb_preds_main_proba]  # threshold at 0.5
    # Calculating Metrics
    acc_xgb_main = (np.array(y_xgb_preds_main) == y_test).sum().astype(float) / len(y_xgb_preds_main)*100
    f1_xgb_main = f1_score(y_test, y_xgb_preds_main, average='micro')
    mcc_xgb_main = matthews_corrcoef(y_test, y_xgb_preds_main)
    features = X_train.columns
    # Feature Importance
    ax = xgb.plot_importance(xgb_reg_main, max_num_features=25, height=0.5, importance_type='weight')
    fig = ax.figure
    fig.set_size_inches(15, 8)
    fig.savefig('www/xgb_results/xgb_main_imp.png', dpi=100, bbox_inches='tight')
    # Test ROC
    roc_xgb_main = metric_graph(y_test, y_xgb_preds_main_proba, metric='roc', figsize=(15, 8),
                                filename='www/xgb_results/xgb_main_roc.png')
    # Test Confusion Matrix
    class_names = np.unique(y_test)
    matrix = confusion_matrix(y_test, y_xgb_preds_main)
    plt.clf()
    # place labels at the top
    plt.gca().xaxis.tick_top()
    plt.gca().xaxis.set_label_position('top')
    # plot the matrix itself
    plt.imshow(matrix, interpolation='nearest', cmap=plt.cm.Blues)
    # plot the colorbar to the right
    plt.colorbar()
    fmt = 'd'
    # write the number of predictions in each bucket
    thresh = matrix.max() / 2.
    for i, j in itertools.product(range(matrix.shape[0]), range(matrix.shape[1])):
        # if the background is dark, use a white number, and vice versa
        plt.text(j, i, format(matrix[i, j], fmt),
                 horizontalalignment="center",
                 color="white" if matrix[i, j] > thresh else "black")
    tick_marks = np.arange(len(class_names))
    plt.xticks(tick_marks, class_names, rotation=45)
    plt.yticks(tick_marks, class_names)
    plt.tight_layout()
    plt.ylabel('True label', size=14)
    plt.xlabel('Predicted label', size=14)
    plt.title('Model: XGBoost')
    plt.savefig('www/xgb_results/xgb_main_cm.png', dpi=100, bbox_inches='tight')
    # Test Log Metrics
    mlflow.log_metric('test_accuracy', acc_xgb_main)
    mlflow.log_metric('test_f1_score', f1_xgb_main)
    mlflow.log_metric('test_mcc_score', mcc_xgb_main)
    mlflow.log_artifacts('www/xgb_results')
    # Log Features
    pd.DataFrame(columns=X_train.columns).to_csv('roi_features.csv', index=False)
    mlflow.log_artifact('roi_features.csv', artifact_path='features')
MLflow Models
MLflow Models is a convention for packaging machine learning models in multiple formats called “flavors”. MLflow offers a variety of tools to help you deploy different flavors of models. Each MLflow Model is saved as a directory containing arbitrary files and an MLmodel descriptor file that lists the flavors it can be used in.
artifact_path: xgb_roi
flavors:
  python_function:
    data: model.xgb
    env: conda.yaml
    loader_module: mlflow.xgboost
    python_version: 3.6.10
  xgboost:
    data: model.xgb
    xgb_version: 1.3.3
run_id: 1c39ab98054340a5b14eebf975ae52b0
MLflow Model Flavours
In this example, the model can be used with tools that support either the xgboost or python_function model flavors.
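For example, any tool that understands the python_function flavor can load and score this model without knowing that XGBoost is underneath. A sketch, reusing the run_id from the MLmodel file above (adjust the URI to your own run):
import mlflow.pyfunc

model = mlflow.pyfunc.load_model("runs:/1c39ab98054340a5b14eebf975ae52b0/xgb_roi")
predictions = model.predict(X_test)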
Diverse Platforms
MLflow provides tools to deploy many common model types to diverse platforms.
Compare Model Runs
You can compare and visualise your model runs to see which version of the model is performing better.
Visualise Model Plots
You can visualise the plots we created in the final model run in the MLflow UI.
Register Model and Deploy in Production
You can register your models in the MLflow UI and deploy them to production; a deployed model can then be loaded in other Python code by specifying the name and stage of your model, which we will demonstrate in the next step.
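Registration and stage transitions can also be scripted rather than clicked through in the UI. A sketch, assuming the run object from the final training step is still in scope and using the roi_xgboost registry name that appears in the next section:
from mlflow.tracking import MlflowClient

# register the model logged in the final run under a registry name
result = mlflow.register_model(f"runs:/{run.info.run_id}/xgb_roi", "roi_xgboost")
# promote that version to the Production stage
client = MlflowClient()
client.transition_model_version_stage(
    name="roi_xgboost", version=result.version, stage="Production")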
Load Feature Names and Model from Production
Once our model is in the Production stage, we will open another Python notebook, which has our prediction dataset. We will look for the model version that is in Production using the search_model_versions method. We do this to build the model path and load the features that we saved with our model in the artifacts folder, making sure the feature names are the same in our prediction set.
from mlflow.tracking import MlflowClient
client = MlflowClient()
xgb_dict = '0'
for mv in client.search_model_versions("name='roi_xgboost'"):
    model_version = dict(mv)
    if model_version['current_stage'] == 'Production':
        xgb_dict = model_version
if xgb_dict == '0':
    print('No model version is deployed to production stage..')
else:
    features_filepath = os.path.join('./artifacts/0/', xgb_dict['run_id'], 'artifacts/features/roi_features.csv')
    model_features = pd.read_csv(features_filepath)
    # Find columns from the training set that are missing in the prediction set
    missing_cols = set(model_features.columns) - set(X_pred.columns)
    # Add each missing column to the prediction set with a default value of 0
    for i in missing_cols:
        X_pred[i] = 0
        X_pred[i] = X_pred[i].astype('uint8')
    # Ensure the columns in the prediction set are in the same order as in the training set
    X_pred = X_pred[model_features.columns]
Let's load our model so we can make some predictions. First, we will set the tracking URI and get the experiment.
mlflow.set_tracking_uri("https://127.0.0.1:8080/")
experiment = mlflow.get_experiment('0')
print("Name: {}".format(experiment.name))
print("Artifact Location: {}".format(experiment.artifact_location))
print("Lifecycle_stage: {}".format(experiment.lifecycle_stage))
print("Experiment ID: {}".format(experiment.experiment_id))
We can load the model based on the stage the model is in, i.e. None, Staging, Production or Archived. In our scenario, we deployed our model to Production through the MLflow UI in the previous steps, so we will load the model from the Production stage with the load_model method, which takes one argument, model_uri.
xgb_model_name = "roi_xgboost"
stage = 'Production'
xgb_reg_main = mlflow.xgboost.load_model(
model_uri=f"models:/{xgb_model_name}/{stage}")
Predictions on New Dataset
Finally, we can make predictions on our prediction set.
XGBoost = xgb_reg_main.predict(xgb.DMatrix(X_pred))
debt_prob = pd.DataFrame(XGBoost, columns=['XGB_DEBT_PROBABILITY'])
Automate the Life Cycle
There are two ways I recommend automating your model lifecycle with open source tools: Airflow or cron jobs. You can schedule your training script to run weekly or monthly, and your predictions to run daily, as sketched below.
Airflow:
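A minimal sketch of an Airflow DAG that retrains the model weekly (this assumes Airflow 2.x; the DAG id, schedule and command are placeholders):
from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id='roi_model_training',      # placeholder name
    start_date=datetime(2021, 1, 1),
    schedule_interval='@weekly',      # retrain weekly
    catchup=False,
) as dag:
    retrain = BashOperator(
        task_id='retrain_model',
        bash_command='python /path/to/train_model.py',  # placeholder path
    )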
Cron Jobs:
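Equivalently, two illustrative crontab entries (the paths are placeholders): the first retrains every Monday at 02:00, the second scores new data daily at 06:00.
# Terminal Command (crontab -e)
0 2 * * 1 python /path/to/train_model.py
0 6 * * * python /path/to/predict.py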
Summary
In this article, we covered the basics of MLflow and how to use MLflow for managing the end-to-end machine learning lifecycle.
MLflow provides a powerful way to simplify the deployment of machine learning models within an organisation by tracking, managing and deploying models. Further, MLflow facilitates reproducibility: the same training or production machine learning code is designed to produce the same results regardless of environment, whether in the cloud, on a local machine, or in a notebook.
Framework: Jupyter Notebook, Language: Python, Libraries: os, sys, datetime, time, sklearn, pandas, numpy, xgboost, matplotlib, seaborn and mlflow.
Follow me on Medium — TechFitLab