ML + DEV + OPS

According to deeplearning.ai, 60 to 90 percent of machine learning projects are never implemented, and only 22 percent of companies have successfully deployed their ML projects.




So, your company decided to invest in machine learning. You have a talented team of Data Scientists churning out models to solve important problems that were out of reach just a few years ago. All performance metrics are looking great, the demos cause jaws to drop and executives to ask how soon you can have a model in production.

It should be pretty quick, you think. After all, you already solved all the advanced science-y, math-y problems, so all that’s left is routine IT work. How hard can it be?

Pretty hard, it turns out. deeplearning.ai reports that “only 22 percent of companies using machine learning have successfully deployed a model”. What makes it so hard? And what do we need to do to improve the situation?


Challenges

In the world of traditional software development, a set of practices known as DevOps have made it possible to ship software to production in minutes and to keep it running reliably. DevOps relies on tools, automation and workflows to abstract away the accidental complexity and let developers focus on the actual problems that need to be solved. This approach has been so successful that many companies are already adept at it, so why can’t we simply keep doing the same thing for ML?

The root cause is that there’s a fundamental difference between ML and traditional software: ML is not just code, it’s code plus data. An ML model, the artifact that you end up putting in production, is created by applying an algorithm to a mass of training data, which will affect the behavior of the model in production. Crucially, the model’s behavior also depends on the input data that it will receive at prediction time, which you can’t know in advance.

ML Ops sits at the intersection of Machine Learning, DevOps and Data Engineering, so we could define it as follows:

ML Ops is a set of practices that combines Machine Learning, DevOps and Data Engineering, which aims to deploy and maintain ML systems in production reliably and efficiently.


Let’s now see what this actually means in more detail, by examining the individual practices that can be used to achieve ML Ops' goals.


This is the workflow for creating and deploying a machine learning model.


This article walks through an MLOps use case from real life.

In this article we will integrate ML code with Jenkins to improve the accuracy of the model without doing it manually.

Yes, we can do it: that is the automation in machine learning, buddy, and it is fulfilled by DevOps.

So let's start:

Problem statement...


1. Create a container image that has Python3 and Keras/numpy installed, using a Dockerfile.

2. When we launch this image, it should automatically start training the model inside the container.

3. Create a job chain of job1, job2, job3, job4 and job5 using the Build Pipeline plugin in Jenkins.

4. Job1: Pull the GitHub repo automatically when a developer pushes to GitHub.

5. Job2: By looking at the code or program file, Jenkins should automatically start the container image that has the required ML software and interpreter installed, deploy the code and start training (e.g. if the code uses a CNN, Jenkins should start the container that already has all the software required for CNN processing).

6. Job3: Train the model and report its accuracy/metrics.

7. Job4: If the accuracy is less than 80%, tweak the machine learning model architecture.

8. Job5: Retrain the model, or notify that the best model has been created.

9. Create one extra job, job6, for monitoring: if the container where the app is running fails for any reason, this job should automatically restart the container from where the last trained model left off.


Now let's build everything from scratch.


1. Creating a Docker image.

Here we create a Docker image that contains the Python modules needed to train our deep learning model, such as Keras and TensorFlow.


This is the Dockerfile to build our deep learning Docker image.

When we run the Docker image, the container automatically picks up our deep learning model from the folder that we have mounted and starts training it.
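Since the Dockerfile itself only appears as a screenshot, here is a minimal sketch of what it could look like, written out via a heredoc so you can try it anywhere. The base image tag, the package list and the file name mnist.py are assumptions, not the article's exact file:

```shell
# Write a minimal Dockerfile of the kind described above (assumed content;
# the article's original Dockerfile is in the missing screenshot).
mkdir -p /tmp/dlimage_demo
cat > /tmp/dlimage_demo/Dockerfile <<'EOF'
# Base image that already ships Python 3
FROM python:3.8
# Libraries needed to train the deep learning model
RUN pip install numpy keras tensorflow
# Folder that the host will mount with the model code
WORKDIR /mlops
# On container start, train the mounted model file
CMD ["python3", "mnist.py"]
EOF
cat /tmp/dlimage_demo/Dockerfile
```

With a file like this in place, running docker build in the same folder produces the image.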

To build the image, run the command below:

docker build -t name .

Remember, you have to type this command inside the folder where dockerfile is present.

This creates the required image.

2. Creating Job1.

In this job we integrate our GitHub repository with Jenkins. It works in such a way that when you push your DL model, it is automatically copied to the host folder where Jenkins is running.

Note: here Jenkins is running on RHEL 8, which is the host.


In Repository URL, enter your Git repository URL.

For the build trigger we can use "GitHub hook trigger for GITScm polling".


Before this, we have to add a webhook to our repository.

You can see how to do this in the "creating job1" section of my previous article, link:

Now we have to execute shell code that copies the DL code from the GitHub repository.
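The copy step shown only in the screenshot can be sketched as below. Paths are simulated with /tmp directories so the sketch runs anywhere; on the real host the source is the Jenkins $WORKSPACE and the target is the mounted folder (/root/mlops_task3 later in this article):

```shell
# Sketch of Job1's "Execute shell" step: copy the pulled repo files into the
# host folder that the training container will mount. Directory names here
# are stand-ins for $WORKSPACE and /root/mlops_task3.
WORKSPACE_DIR=/tmp/job1_workspace
TARGET_DIR=/tmp/mlops_task3_demo
mkdir -p "$WORKSPACE_DIR" "$TARGET_DIR"
echo "print('model code')" > "$WORKSPACE_DIR/mnist.py"   # stand-in for the repo file
cp -rvf "$WORKSPACE_DIR"/* "$TARGET_DIR"/
```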


Thus job1 is created.

Console Output.



3. Creating Job2.

In this job, by looking at the code or program file, Jenkins automatically starts the container image that has the required ML software and interpreter installed, deploys the code and starts training (e.g. if the code uses a CNN, Jenkins starts the container that already has all the software required for CNN processing).
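One simple way for Jenkins to "look at the code" is to grep it for CNN layers and pick the image accordingly. This is a sketch under that assumption; the image names (cnn_image, dl_image) and the sample file are hypothetical:

```shell
# Sketch: pick the container image by inspecting the pushed code.
mkdir -p /tmp/job2_demo
printf "from keras.layers import Dense\n" > /tmp/job2_demo/mnist.py   # sample file
if grep -q "Conv2D" /tmp/job2_demo/mnist.py; then
  IMAGE=cnn_image    # hypothetical image with CNN libraries preinstalled
else
  IMAGE=dl_image     # hypothetical image with plain Keras/numpy
fi
echo "selected image: $IMAGE"
```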

Here is the DL model file code.

from keras.datasets import mnist
# Load the MNIST handwritten-digit dataset
dataset = mnist.load_data('mymnist.db')
train, test = dataset
X_train, y_train = train
X_test, y_test = test
# Flatten the 28x28 images into 784-element vectors
X_train = X_train.reshape(-1, 28*28).astype('float32')
X_test = X_test.reshape(-1, 28*28).astype('float32')
from keras.utils.np_utils import to_categorical
y_train_cat = to_categorical(y_train)
from keras.models import Sequential
from keras.layers import Dense
# Fully connected network for digit classification
model = Sequential()
model.add(Dense(units=512, input_dim=28*28, activation='relu'))
model.add(Dense(units=256, activation='relu'))
model.add(Dense(units=128, activation='relu'))
model.add(Dense(units=32, activation='relu'))
model.add(Dense(units=10, activation='softmax'))
from keras.optimizers import RMSprop
model.compile(optimizer=RMSprop(), loss='categorical_crossentropy', metrics=['accuracy'])
history = model.fit(X_train, y_train_cat, epochs=2)
# Write the final training accuracy (as a percentage) to acc.txt
acc = history.history['accuracy'][-1] * 100
acc_file = open('acc.txt', 'w')
print(acc, file=acc_file)
acc_file.close()

In this model we have used the MNIST dataset, which contains images of handwritten digits.

We build the neural network from Dense layers, and after the model is trained the accuracy is written to the file acc.txt.

So let's configure it.


This job has to start building after job1.


The shell commands in the above image start a Docker container from the image we created earlier.

Here we have mounted the folder into which the GitHub files are copied.
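The launch step behind the screenshot boils down to one docker run with a volume mount. Because Docker may not be installed where you try this, the sketch only builds and prints the command; the image name dl_image is an assumption, and the mount path matches the /root/mlops_task3 folder used later in the article:

```shell
# Build the docker run command Jenkins would execute for Job2 (dry run:
# echoed rather than executed, since docker may be absent here).
CMD="docker run --rm -v /root/mlops_task3:/mlops dl_image python3 /mlops/mnist.py"
echo "$CMD"
```

Dropping the quotes and the echo turns the dry run into the real launch.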

Console Output


In the console output you can see the DL file content that is being used in this job.


This is the acc.txt file

This ends job2.

4. Creating Job3.

In this job we will test the accuracy of the trained model.


This job is triggered after job2 is successfully built.
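A minimal sketch of Job3's accuracy test, assuming training wrote the value to acc.txt as the model code above does. The 78.4 stand-in value and the /tmp path are illustrative, and the threshold follows the 85% used by the later jobs:

```shell
# Read the accuracy written by training and check it against the threshold.
mkdir -p /tmp/job3_demo
echo "78.4" > /tmp/job3_demo/acc.txt    # stand-in for the value training wrote
ACC=$(cat /tmp/job3_demo/acc.txt)
# awk does the floating-point comparison that plain shell arithmetic cannot
if awk -v a="$ACC" 'BEGIN { exit !(a >= 85) }'; then
  echo "accuracy OK: $ACC"
else
  echo "accuracy below 85: $ACC - trigger Job4 to tweak the model"
fi
```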


This ends job3.

5. Creating Job4.

In this job we check the accuracy, and if it is less than 85% we tweak the model by adding some extra Dense layers.


I have used the below command to add the dense layer.

sudo sed "18imodel.add(Dense(units=32, activation='relu'))" /root/mlops_task3/mnist.py > mnist1.py

The changed file content.


Compare it with the code in job2 to see the changes.
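To see exactly what that sed command does, here is a self-contained demo on a three-line toy file (the real job inserts before line 18 of /root/mlops_task3/mnist.py); the "Ni" one-line insert syntax is a GNU sed feature:

```shell
# Demo of GNU sed's "Ni<text>": insert an extra Dense layer before line 3.
mkdir -p /tmp/job4_demo
cat > /tmp/job4_demo/mnist.py <<'EOF'
model.add(Dense(units=512, activation='relu'))
model.add(Dense(units=256, activation='relu'))
model.add(Dense(units=10, activation='softmax'))
EOF
sed "3imodel.add(Dense(units=32, activation='relu'))" /tmp/job4_demo/mnist.py > /tmp/job4_demo/mnist1.py
cat /tmp/job4_demo/mnist1.py
```

The output file has four lines, with the new Dense layer sitting just above the softmax layer.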

Console output


6. Creating Job5.

Here we are retraining the model that we have tweaked and checking whether the accuracy is above 85%.
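Job5's decision after retraining can be sketched the same way as Job3's check: compare the new accuracy to 85% and either declare the best model (letting the email job fire) or loop back to tweaking. The 91.2 value and the /tmp path are stand-ins:

```shell
# Decide, after retraining, whether the best model has been created.
mkdir -p /tmp/job5_demo
echo "91.2" > /tmp/job5_demo/acc.txt    # stand-in for the retrained accuracy
ACC=$(cat /tmp/job5_demo/acc.txt)
if awk -v a="$ACC" 'BEGIN { exit !(a >= 85) }'; then
  RESULT="best model created with accuracy $ACC"
else
  RESULT="accuracy still below 85 - tweak the model again"
fi
echo "$RESULT"
```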



Email Notification


Console Output


Thus we have achieved the required accuracy.


The email was triggered by another job which runs after a stable build of job5.

7. Creating Job6.

This job monitors the Docker container that we have built and relaunches it if it is not present.
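The monitor step can be sketched as below. It is guarded so it degrades to a dry run on machines without Docker, and the container name dl_train is an assumption (the real job would reuse whatever name Job2 launched with):

```shell
# Job6 sketch: restart the training container if it is no longer running.
CONTAINER=dl_train   # assumed container name
if ! command -v docker >/dev/null 2>&1; then
  echo "docker not available - dry run only"
elif [ -z "$(docker ps -q -f "name=$CONTAINER")" ]; then
  # Restart the stopped container so training resumes where it left off
  docker start "$CONTAINER"
else
  echo "container $CONTAINER is running"
fi
```

Scheduling this as a Jenkins job that polls every minute gives the automatic relaunch the problem statement asks for.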



Pipeline View of our Project


You can see how to build the build pipeline in the previous article; the link is already provided above.



Thus we have achieved the MLOps use case.


Thank You
