ML + DEV + OPS
Yashwanth M B
SRE @ Cisco | MLOps | DevOps | AI | Hybrid Multi-Cloud | Ansible | Flutter | AWS | GCP | Docker | Python3
According to Deeplearning.ai, 60-90 percent of machine learning projects are never implemented, and only 22 percent of companies have successfully deployed their ML projects.
So, your company decided to invest in machine learning. You have a talented team of Data Scientists churning out models to solve important problems that were out of reach just a few years ago. All performance metrics are looking great, the demos cause jaws to drop and executives to ask how soon you can have a model in production.
It should be pretty quick, you think. After all, you already solved all the advanced scienc-y, math-y problems, so all that’s left is routine IT work. How hard can it be?
Pretty hard, it turns out. Deeplearning.ai reports that “only 22 percent of companies using machine learning have successfully deployed a model”. What makes it so hard? And what do we need to do to improve the situation?
Challenges
In the world of traditional software development, a set of practices known as DevOps has made it possible to ship software to production in minutes and to keep it running reliably. DevOps relies on tools, automation and workflows to abstract away the accidental complexity and let developers focus on the actual problems that need to be solved. This approach has been so successful that many companies are already adept at it, so why can’t we simply keep doing the same thing for ML?
The root cause is that there’s a fundamental difference between ML and traditional software: ML is not just code, it’s code plus data. An ML model, the artifact that you end up putting in production, is created by applying an algorithm to a mass of training data, which will affect the behavior of the model in production. Crucially, the model’s behavior also depends on the input data that it will receive at prediction time, which you can’t know in advance.
Thus, ML Ops sits at the intersection of these disciplines, and we could define it as follows:
ML Ops is a set of practices that combines Machine Learning, DevOps and Data Engineering, which aims to deploy and maintain ML systems in production reliably and efficiently.
Let’s now see what this actually means in more detail, by examining the individual practices that can be used to achieve ML Ops' goals.
This is the workflow for creating and deploying a machine learning model.
This article is an example of an MLOps use case in real life.
In it, we will integrate ML code with Jenkins to improve the accuracy of the model without doing it manually.
Yes, we can do it: that is automation in machine learning, made possible by DevOps.
So let's start:
Problem statement:
1. Create a container image that has Python3 and Keras or NumPy installed, using a Dockerfile.
2. When we launch this image, it should automatically start training the model inside the container.
3. Create a job chain of job1, job2, job3, job4 and job5 using the Build Pipeline plugin in Jenkins.
4. Job1: Pull the GitHub repo automatically when a developer pushes to GitHub.
5. Job2: By looking at the code or program file, Jenkins should automatically start the container image that has the respective machine learning software and interpreter installed, deploy the code there, and start training (e.g. if the code uses a CNN, Jenkins should start the container that already has all the software required for CNN processing).
6. Job3: Train the model and report its accuracy or metrics.
7. Job4: If the accuracy is less than 80%, tweak the machine learning model architecture.
8. Job5: Retrain the model, or notify that the best model has been created.
9. Create one extra job, job6, for monitoring: if the container where the app is running fails for any reason, this job should automatically restart the container from where the last trained model left off.
Now let's build everything from scratch.
1. Creating a Docker image.
Here we create a Docker image that contains the Python modules needed for our deep learning model, such as Keras and TensorFlow.
This is the Dockerfile used to build our deep learning Docker image.
When we run the Docker image, it automatically searches for our deep learning model in the folder we have mounted and starts training it.
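The original Dockerfile appears only as a screenshot, so here is a minimal sketch of what it could look like; the base image, package list, mount point (/mlops) and script name (mnist.py) are all assumptions, not the author's exact values.

```dockerfile
# Hypothetical Dockerfile -- base image, packages and paths are assumptions
FROM centos:7
RUN yum install -y python3 && pip3 install numpy keras tensorflow
# The host folder containing the model code is mounted at /mlops
WORKDIR /mlops
# On launch, immediately start training the mounted model
CMD ["python3", "mnist.py"]
```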
To build the image, type the command below:
docker build -t name .
Remember, you have to run this command inside the folder where the Dockerfile is present.
This creates the required image.
2. Creating Job1.
In this job we integrate our GitHub repository with Jenkins. It works in such a way that when you push your DL model, it is automatically copied to the host folder where Jenkins is running.
Note: here Jenkins is running on Red Hat 8, which is the host.
In Repository URL, give your Git repository URL.
For the build trigger we can use "GitHub hook trigger for GITScm polling".
Before this, we have to add a webhook to our repository.
You can see how to do this in the "creating job1" section of my previous article, link:
Now we have to execute shell code to copy the DL code from the GitHub repository.
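The shell step itself is shown only in a screenshot, so the sketch below simulates it with stand-in paths (a temp workspace and target folder instead of the real Jenkins workspace and /root/mlops_task3):

```shell
# Simulate a Jenkins workspace that holds one pushed model file
mkdir -p /tmp/mlops_demo_ws /tmp/mlops_demo_target
echo "print('model')" > /tmp/mlops_demo_ws/mnist.py

# Job1's shell step: copy the checked-out repo contents to the host folder
# (in the real job, the source would be $WORKSPACE and the target /root/mlops_task3)
cp -v -r -f /tmp/mlops_demo_ws/* /tmp/mlops_demo_target/
ls /tmp/mlops_demo_target
```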
Thus job1 is created.
Console Output.
3. Creating Job2.
In this job, by looking at the code or program file, Jenkins should automatically start the container image that has the respective machine learning software and interpreter installed, deploy the code there, and start training (e.g. if the code uses a CNN, Jenkins should start the container that already has all the software required for CNN processing).
Here is the DL model file code.
from keras.datasets import mnist
from keras.utils.np_utils import to_categorical
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import RMSprop

# Load MNIST and flatten each 28x28 image into a 784-element vector
dataset = mnist.load_data('mymnist.db')
train, test = dataset
X_train, y_train = train
X_test, y_test = test
X_train = X_train.reshape(-1, 28*28).astype('float32')
X_test = X_test.reshape(-1, 28*28).astype('float32')
y_train_cat = to_categorical(y_train)

model = Sequential()
model.add(Dense(units=512, input_dim=28*28, activation='relu'))
model.add(Dense(units=256, activation='relu'))
model.add(Dense(units=128, activation='relu'))
model.add(Dense(units=32, activation='relu'))
model.add(Dense(units=10, activation='softmax'))
model.compile(optimizer=RMSprop(), loss='categorical_crossentropy', metrics=['accuracy'])

history = model.fit(X_train, y_train_cat, epochs=2)

# Write the final training accuracy (as a percentage) to acc.txt
acc = history.history['accuracy'][-1] * 100
with open('acc.txt', 'w') as acc_file:
    print(acc, file=acc_file)
In this model we have used the MNIST dataset, which contains images of handwritten digits.
We have created the neural network from Dense layers, and after the model is trained the accuracy is written to the file acc.txt.
So let's configure it.
This job has to start building after job1.
The shell commands in the image above start the Docker container from the Docker image created earlier.
Here we have mounted the folder into which the GitHub files are copied.
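As a rough sketch of that shell step: the container and image names (dlmodel, dl-image) are assumptions, and echo stands in for docker so the dry run can execute on a machine without Docker.

```shell
# Create a stand-in for the copied code folder
mkdir -p /tmp/mlops_demo_code
echo "model.add(Dense(units=512, activation='relu'))" > /tmp/mlops_demo_code/mnist.py

DOCKER="echo docker"   # dry run; use plain `docker` on the real Jenkins host
# If the code mentions Keras/Dense/Conv2D, start the deep-learning container
# with the code folder mounted, so training begins automatically
if grep -qiE 'keras|dense|conv2d' /tmp/mlops_demo_code/mnist.py; then
    $DOCKER run --rm --name dlmodel -v /tmp/mlops_demo_code:/mlops dl-image:latest \
        | tee /tmp/mlops_demo_job2.cmd
fi
```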
Console Output
In the console output you can see the DL file content that is being used in this job.
This is the acc.txt file
This ends job2.
4. Creating Job3.
In this job we will test the accuracy of the trained model.
This job is triggered after job2 is successfully built.
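A minimal sketch of the accuracy check, assuming the training step wrote the accuracy to acc.txt; the 85% threshold matches the later jobs, and a sample value of 83.4 stands in for the real training output.

```shell
echo 83.4 > acc.txt    # stand-in for the accuracy written by training

acc=$(cat acc.txt)
# awk compares the floating-point accuracy against the 85% threshold
if awk -v a="$acc" 'BEGIN { exit !(a >= 85) }'; then
    echo "accuracy OK: $acc%"
else
    # in the real job this branch would exit 1 so Jenkins triggers Job4
    echo "accuracy $acc% is below 85%: tweak the model in Job4"
fi
```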
This ends job3
5. Creating Job4.
In this job we have to check the accuracy, and if it is less than 85% we have to tweak the model by adding some more "Dense layers" to it.
I have used the below command to add the Dense layer:
sudo sed "18imodel.add(Dense(units=32, activation='relu'))" /root/mlops_task3/mnist.py > mnist1.py
The changed file content.
Compare it with the code in Job2 to see the changes.
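The same sed trick can be tried on a small stand-in file (the real command targets line 18 of /root/mlops_task3/mnist.py; here line 3 of a three-line demo file plays that role):

```shell
# A three-line stand-in for the model file
cat > mnist_demo.py <<'EOF'
model = Sequential()
model.add(Dense(units=512, input_dim=28*28, activation='relu'))
model.add(Dense(units=10, activation='softmax'))
EOF

# GNU sed "Ni<text>" inserts <text> before line N -- here an extra hidden layer
sed "3imodel.add(Dense(units=64, activation='relu'))" mnist_demo.py > mnist_tweaked.py
cat mnist_tweaked.py
```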
Console output
6. Creating Job5.
Here we retrain the tweaked model and check whether the accuracy is above 85%.
Email Notification
Console Output
Thus we have achieved the required accuracy.
The email was triggered by another job, which runs after a stable build of job5.
7. Creating Job6.
This job monitors the Docker container that we launched and relaunches it if it is not present.
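A rough sketch of such a monitor step; the names dlmodel and dl-image and the mounted path are assumptions, and echo again stands in for docker so the dry run works without Docker installed.

```shell
DOCKER="echo docker"   # dry run; use plain `docker` on the real Jenkins host

# `docker ps -q -f name=dlmodel` prints an ID only while the container runs;
# an empty result means the container has died and must be relaunched
if [ -z "$(docker ps -q -f name=dlmodel 2>/dev/null)" ]; then
    # Remount the same host folder so training resumes from the last saved model
    $DOCKER run --rm --name dlmodel -v /root/mlops_task3:/mlops dl-image:latest \
        | tee /tmp/mlops_demo_job6.cmd
fi
```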
Pipeline View of our Project
You can see how to build the build pipeline in the previous article; the link is provided above.
Thus we have achieved the MLOps use case.
Thank You