The Power of MLOps
Shubham Bhalala
Data Scientist @System2 | MS Data Science @Columbia University | Data Science, MLOps Engineer
What is MLOps?
In layman's terms, MLOps is what we get when we integrate the concepts of DevOps (continuous integration & continuous delivery) with machine learning, so that training a model and testing its accuracy are automated. What we will look at in this article is how you can create your machine learning model automatically. By automatic, I mean you don't have to manually adjust the hyper-parameters. What are hyper-parameters? Hyper-parameters are the variables we need to give to our program before it trains the model (a short sketch follows the list below). For example:
- Number of Hidden layers.
- Number of neurons in one particular layer.
- What optimizer to use?
- What learning rate to give?
- The kernel size.
- Which pooling method to use, and many more things like these!
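For instance, in Keras these hyper-parameters appear as plain variables in the training script. Here is a minimal sketch (the names and values are purely illustrative, not taken from the project code):

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from keras.optimizers import Adam

# Each hyper-parameter is just a variable we must pick before training
num_hidden_neurons = 512       # neurons in the hidden Dense layer
learning_rate      = 0.001     # step size for the optimizer
kernel_size        = (3, 3)    # size of each convolution filter
pool_size          = (2, 2)    # max-pooling window

model = Sequential()
model.add(Conv2D(32, kernel_size, activation='relu', input_shape=(64, 64, 3)))
model.add(MaxPooling2D(pool_size=pool_size))
model.add(Flatten())
model.add(Dense(num_hidden_neurons, activation='relu'))
model.add(Dense(10, activation='softmax'))

model.compile(optimizer=Adam(learning_rate=learning_rate),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

Every one of these values can change the final accuracy, and normally a human has to keep adjusting them by hand.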
It is a fact that more than 60% of machine learning projects in the corporate world never reach production; here is an article as proof: HERE. Now the question is: why is this so?
Because changing these hyper-parameters is a huge and tedious task. Training with one set of values might take days, and if at the end we don't get satisfying results, we change the hyper-parameters and train again, which takes a few more days. This process goes on until you are satisfied with the outcome. In the end, either it is too late for production or the accuracy doesn't match the requirement, and the project eventually gets shut down. This can mean a huge loss for a company and its reputation.
So, this article offers a solution, with one great example, tested by me in real life, of how you can automate all of this and still yield great accuracy!
Let's first discuss the requirements of this project. You can follow the same steps to build your own:
- Docker image for running CNN & ANN program.
- Docker image for running Regression & Classification program.
- Jenkins, Docker & Git-GitHub
So, let's talk about building the images we need, because we will be running our programs inside containers hosted by Docker. We will make one image using a Dockerfile, and pull the other directly from Docker Hub, modifying it according to our use case.
Docker image for Regression & Classification
FROM centos:latest
RUN yum install epel-release -y && \
    yum update -y && \
    yum install python36 -y && \
    pip3 install scikit-learn && \
    pip3 install numpy && \
    pip3 install pandas && \
    pip3 install matplotlib && \
    pip3 install pillow && \
    yum update -y
This Dockerfile will help you to create your image. Command to build it is:
docker build -t lrcpython .
This will create an image named lrcpython (short for linear regression & classification in Python). For this you should be in the directory where the Dockerfile was created; since I am in that same directory, I have used " . ". You will see the image come up. For a detailed description of Dockerfiles, feel free to view my previous articles.
Now, let's get one image for CNN & ANN; basically we need TensorFlow & Keras. To make our task easy and avoid using too many of our own resources, we will download it from Docker Hub. The image name is tensorflow/tensorflow. To download it, use the pull command:
docker pull tensorflow/tensorflow
A new image will be downloaded with TensorFlow support only; we need to install the other libraries manually.
You will see this image come up. Now launch one container from the image, install the libraries manually, and then commit it (or again use a Dockerfile to create the image). The simple way is to install manually, because this saves us time troubleshooting. To launch the container:
docker container run -it --name tensor tensorflow/tensorflow
This will land you inside the tensorflow container. Here using pip download the required modules.
pip install keras
pip install numpy
pip install pandas
pip install pillow
pip install scikit-learn
This will prepare your container for all of the CNN & ANN requirements.
Then commit this image:
docker container commit tensor tensorflow/tensorflow:v1
This will create one image with the name tensorflow/tensorflow:v1.
So our first requirement is finished: the images for training our ML models are ready. Now our main job starts, integrating this with Jenkins and Docker.
Before moving further: we will be using the concept of Transfer Learning, with MobileNet, to train our model on a monkey-breed dataset. I have provided the code on my GitHub; you can download it from there.
What is transfer learning?
In my previous article I described this topic in great detail, but I will brief you on it here.
Transfer Learning is a way to use the intelligence of a pre-trained model to train on a new dataset. It is helpful because it requires very few resources and can give high accuracy with a limited number of images. In this example you will see its power, and how it becomes even more robust when we integrate it with our DevOps concepts.
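For intuition, here is a minimal sketch of transfer learning with MobileNet in Keras. This is only the idea, not the exact code from my GitHub; the input shape and the class count (10 breeds) are placeholders:

from keras.applications import MobileNet
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model

# Load MobileNet pre-trained on ImageNet, without its classifier head
base = MobileNet(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# Freeze the pre-trained layers so only our new head gets trained
for layer in base.layers:
    layer.trainable = False

# Build a small new head on top of the frozen base
top_model = base.output
top_model = GlobalAveragePooling2D()(top_model)
top_model = Dense(1024, activation='relu')(top_model)
top_model = Dense(10, activation='softmax')(top_model)  # placeholder: 10 classes

model = Model(inputs=base.input, outputs=top_model)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

Because only the small head is trained while the MobileNet base stays frozen, training is fast even on modest hardware.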
Now the code and our Docker images are ready with us, so let's start integrating everything.
JOB1: download_classify
This job will download the code from GitHub and classify whether it is CNN, ANN, or Regression/Classification code; accordingly, it will copy it into a dedicated folder. Later this folder will be mounted as a volume into the container.
So, my file system looks like this:

/mlops_project/
└── download_classify/
    ├── ann/
    ├── cnn/
    └── lrc/
I already have some files here because I have tested my setup. This is just to understand the file system.
Now create one repository on GitHub and initialize it. If you don't know much about Git & GitHub, please visit my previous articles.
Here we will be uploading our ML code; from here everything will be automated and we will get our desired output.
This is how your repository looks; obviously the name will differ. Now let's create our job.
These are the configurations of JOB1, and this is the build script we used:
sudo cp -rf * /mlops_project/download_classify/
if sudo grep -r Conv2D *
then
    # Conv2D found -> CNN code
    fe=$(sudo grep -r Conv2D * | cut -d ":" -f 1)
    sudo cp -rf $fe /mlops_project/download_classify/cnn
    sudo docker container run -dit -v /mlops_project/download_classify/cnn:/root/ --name cnnmodel tensorflow/tensorflow:latest
elif sudo grep -r Dense * && ! sudo grep -r Conv2D *
then
    # Dense found and no Conv2D anywhere -> ANN code
    fe=$(sudo grep -r Dense * | cut -d ":" -f 1)
    sudo cp -rf $fe /mlops_project/download_classify/ann
    sudo docker container run -dit -v /mlops_project/download_classify/ann:/root/ --name annmodel tensorflow/tensorflow:latest
elif sudo grep -r sklearn *
then
    # sklearn found -> Regression/Classification code
    fe=$(sudo grep -r sklearn * | cut -d ":" -f 1)
    sudo cp -rf $fe /mlops_project/download_classify/lrc
    sudo docker container run -dit -v /mlops_project/download_classify/lrc:/root/ --name lrcmodel lrcpython
else
    echo "We don't support this model"
fi
Here I have used basic Linux commands to classify whether the code is ANN, CNN, or LRC (Linear Regression or Classification).
sudo grep -r Conv2D *
This will search for the Conv2D keyword in all files under the current directory (hence the -r flag, for recursive search).
sudo grep -r Conv2D * | cut -d ":" -f 1
Whatever result we get from grep -r Conv2D we pipe into cut -d ":" -f 1. What this does is: cut takes only the first field before the delimiter ":". Because grep gives output in the form filename:line-where-Conv2D-was-found, we take just the first field, the filename. A file containing Conv2D is classified as CNN; similarly, code containing Dense but not Conv2D is classified as ANN, and a file with the sklearn keyword is classified as Regression or Classification. Since ANN & CNN can be trained on the same image, we don't have to make a separate image for each of them; similarly for Regression or Classification.
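To make the rule fully explicit, here is the same classification logic written out in Python. This is just for clarity; the jobs themselves use the grep and cut commands shown above:

import os

def classify_model_file(path):
    # Apply the same keyword rule the Jenkins job applies via grep
    with open(path) as f:
        code = f.read()
    if 'Conv2D' in code:
        return 'cnn'     # convolutional layers -> CNN container
    elif 'Dense' in code:
        return 'ann'     # Dense without Conv2D -> ANN container
    elif 'sklearn' in code:
        return 'lrc'     # scikit-learn -> Regression/Classification container
    return None          # unsupported code

for fname in os.listdir('.'):
    if fname.endswith('.py'):
        print(fname, '->', classify_model_file(fname))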
Then we mounted the folder into which we copy our code inside the container, using this command:
sudo docker container run -dit -v /mlops_project/download_classify/cnn:/root/ --name cnnmodel tensorflow/tensorflow:latest
Here we have copied the file to the location /mlops_project/download_classify/cnn on our RedHat VM and mounted it to the /root directory of the container. Hence we have given the ML program file to the container; now we have to execute it. Internally, at the end of the code, we have to add a snippet (shown under JOB3 below) which gets the validation accuracy and checks whether it is less than 92%. If it is, our retrain job changes some hyper-parameters automatically and trains the code again. This cycle goes on until we get accuracy greater than 92%.
JOB2: exec
Here we execute the machine learning model code which we mounted inside the container's /root directory, using a normal docker exec command. Here are the configurations of JOB2.
Here is the build code.
if sudo docker ps | grep cnnmodel
then
    fe=$(sudo grep -r Conv2D * | cut -d ":" -f 1)
    sudo docker container exec cnnmodel python /root/$fe
elif sudo docker ps | grep annmodel
then
    fe=$(sudo grep -r Dense * | cut -d ":" -f 1)
    sudo docker container exec annmodel python /root/$fe
elif sudo docker ps | grep lrcmodel
then
    fe=$(sudo grep -r sklearn * | cut -d ":" -f 1)
    sudo docker container exec lrcmodel python /root/$fe
else
    echo "Something is wrong, check the containers"
fi
Here you can see that first we check which container is running, then we execute the code using docker container exec cnnmodel python /root/$fe, where $fe contains the Python file name. Similarly for all the models.
JOB3: retrain
Here what we want to achieve is: if the accuracy is not greater than 92%, we change some hyper-parameters and train again. For this we have one pre-requisite: our ML program should contain the code below.
final_accuracy = history.history["val_accuracy"][-1]
print(final_accuracy)
import os
if final_accuracy < 0.92:
    os.system("curl --user '<jenkins user>:<jenkins password>' https://192.168.99.102:8080/view/mlops/job/retrain/build?token=retrain")
else:
    print("Your new accuracy =", final_accuracy)
Here you can see we get the accuracy and compare it; if it is less than 92%, we trigger the retrain job using a token. For that we also have to generate the token. The trigger URL might differ between you and me, as I am doing it in a separate list view, which is why mine shows some extra path segments.
This is the job configuration.
In the build code I have used basic Linux operations to do this task: it inserts additional hidden (Dense) layers with the same configuration. Here is the code:
if sudo docker ps | grep cnnmodel
then
    # line number, line text and file name of the matched Dense layer
    ln=$(sudo grep -n -r "Dense(1024,activation='relu')" * | cut -d ":" -f 2)
    lt=$(sudo grep -n -r "Dense(1024,activation='relu')" * | cut -d ":" -f 3)
    fi=$(sudo grep -r "Dense(1024,activation='relu')" * | cut -d ":" -f 1)
    let "ln+=1"
    hp=" top_model = Dense(512,activation='relu')(top_model)"
    # insert the new 512-neuron layer, then re-insert the matched line below the original
    sudo sed -i -e $ln'i\\'"$hp" /mlops_project/download_classify/cnn/$fi
    sudo sed -i -e $ln'i'\\"$lt" /mlops_project/download_classify/cnn/$fi
    fe=$(sudo grep -r Conv2D * | cut -d ":" -f 1)
    sudo docker container exec cnnmodel python /root/$fe
elif sudo docker ps | grep annmodel
then
    ln=$(sudo grep -n -r "Dense(1024,activation='relu')" * | cut -d ":" -f 2)
    lt=$(sudo grep -n -r "Dense(1024,activation='relu')" * | cut -d ":" -f 3)
    fi=$(sudo grep -r "Dense(1024,activation='relu')" * | cut -d ":" -f 1)
    let "ln+=1"
    hp=" top_model = Dense(512,activation='relu')(top_model)"
    sudo sed -i -e $ln'i\\'"$hp" /mlops_project/download_classify/ann/$fi
    sudo sed -i -e $ln'i'\\"$lt" /mlops_project/download_classify/ann/$fi
    fe=$(sudo grep -r Dense * | cut -d ":" -f 1)
    sudo docker container exec annmodel python /root/$fe
else
    echo "Something is wrong with the code"
fi
The new thing I have used here is the sed command, which helps insert a line at a particular position. So here I am growing my hidden (Dense) layers. When I show you the results, you will see that with just a little tweak to the existing code we achieve accuracy that people don't normally get with the same code. I have tested and seen multiple versions of this code where people got around 92-93% validation accuracy. So let's see what our setup produces.
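To make the effect of those sed inserts concrete, here is what one retrain pass does to the head of the model, assuming the committed code contains exactly one matched line (this is a fragment of the model head, not a standalone script):

# Before a retrain pass, the head contains the line the job greps for:
#     top_model = Dense(1024,activation='relu')(top_model)
#
# After one retrain pass, the matched line has been duplicated (lt)
# and a new 512-neuron layer (hp) inserted below it:
top_model = Dense(1024,activation='relu')(top_model)   # original line
top_model = Dense(1024,activation='relu')(top_model)   # lt, re-inserted
top_model = Dense(512,activation='relu')(top_model)    # hp, the new layer

The leading space inside hp matters: it preserves the Python indentation so the modified file still runs.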
At last we have one monitoring job watching the running container. It checks every minute; if the container goes down, it restarts it, and the model is not lost because the code is written in such a way that it updates the weights and stores them in a file every time.
JOB4: monitor
There is nothing fancy in this; we could also use a Kubernetes cluster for deployment, which would make our work even simpler.
Here is the build code.
if sudo grep -r Conv2D /var/lib/jenkins/workspace/download_classify/*
then
    if sudo docker container ls | grep cnnmodel
    then
        echo "Alright"
    else
        sudo docker container run -dit -v /mlops_project/download_classify/cnn:/root/ --name cnnmodel tensorflow/tensorflow:latest
    fi
elif ! sudo grep -r Conv2D /var/lib/jenkins/workspace/download_classify/* && sudo grep -r Dense /var/lib/jenkins/workspace/download_classify/*
then
    if sudo docker container ls | grep annmodel
    then
        echo "Alright"
    else
        sudo docker container run -dit -v /mlops_project/download_classify/ann:/root/ --name annmodel tensorflow/tensorflow:latest
    fi
elif sudo grep -r sklearn /var/lib/jenkins/workspace/*
then
    if sudo docker container ls | grep lrcmodel
    then
        echo "Alright"
    else
        sudo docker container run -dit -v /mlops_project/download_classify/lrc:/root/ --name lrcmodel lrcpython
    fi
else
    echo "We don't recognize this coding language"
fi
Overall scenario of JOB scheduling
download_classify ---Downstream---> exec ---Trigger---> retrain ---Downstream---> monitor
Finally, I am very excited to show you the results, which people normally don't get on this dataset, and how I got that much accuracy with just one small tweak.
The accuracy of the default script that we find on the internet is:
Code:
Here you can see they have used three Dense layers: two with 1024 neurons and one with 512 neurons. We are going to tweak this using our MLOps power.
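Since that code appears only as a screenshot, here is roughly what the default head described above looks like (a sketch reconstructed from the description, assuming the MobileNet base from earlier and 10 monkey breeds):

from keras.applications import MobileNet
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model

base = MobileNet(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
for layer in base.layers:
    layer.trainable = False

# The default head: three Dense layers, as in the internet tutorial code
top_model = base.output
top_model = GlobalAveragePooling2D()(top_model)
top_model = Dense(1024,activation='relu')(top_model)
top_model = Dense(1024,activation='relu')(top_model)
top_model = Dense(512,activation='relu')(top_model)
top_model = Dense(10, activation='softmax')(top_model)  # placeholder: 10 classes

model = Model(inputs=base.input, outputs=top_model)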
After training you will get some result. I am running this and showing you just to have a comparison with our MLOps setup. In my case, the first-time accuracy was around 94%.
Now these are some results of our creation. While uploading to GitHub I kept just one Dense layer by default.
And as soon as I commit it, my JOB1 starts:
Then JOB2 automatically starts and begins training our model. Here I have already provided the dataset using WinSCP, because a file this big can't be uploaded to GitHub.
You can see that after JOB1, JOB2 started and began training; you can see each epoch in detail in the Console Output.
So, with one Dense layer we achieved accuracy greater than 92%: around 94% with just one Dense layer. Now you have two options: either manually trigger the retrain job, or increase the threshold value from 92 to 95. Here I have triggered it manually once, to show you that it actually adds one Dense layer, with proper indentation, and runs successfully.
You can see at the end we have two Dense layers with 1024 neurons; previously we had only one. Now we have two! Though I triggered that manually. After this, I will change the threshold so that it automatically triggers the retrain, adds one Dense layer with 1024 neurons, and gives us the accuracy.
You can see here that the accuracy decreased, but it is still above 92%, and it even triggered the monitor job. Now let's change the threshold and see the results. We have also modified the requirement to compulsorily have one Dense layer with 512 neurons along with the one we add by default. So the changes look something like this:
if sudo docker ps | grep cnnmodel
then
    # line number, line text and file name of the matched Dense layer
    ln=$(sudo grep -n -r "Dense(1024,activation='relu')" * | cut -d ":" -f 2)
    lt=$(sudo grep -n -r "Dense(1024,activation='relu')" * | cut -d ":" -f 3)
    fi=$(sudo grep -r "Dense(1024,activation='relu')" * | cut -d ":" -f 1)
    let "ln+=1"
    hp=" top_model = Dense(512,activation='relu')(top_model)"
    # insert the new 512-neuron layer, then re-insert the matched line below the original
    sudo sed -i -e $ln'i\\'"$hp" /mlops_project/download_classify/cnn/$fi
    sudo sed -i -e $ln'i'\\"$lt" /mlops_project/download_classify/cnn/$fi
    fe=$(sudo grep -r Conv2D * | cut -d ":" -f 1)
    sudo docker container exec cnnmodel python /root/$fe
elif sudo docker ps | grep annmodel
then
    ln=$(sudo grep -n -r "Dense(1024,activation='relu')" * | cut -d ":" -f 2)
    lt=$(sudo grep -n -r "Dense(1024,activation='relu')" * | cut -d ":" -f 3)
    fi=$(sudo grep -r "Dense(1024,activation='relu')" * | cut -d ":" -f 1)
    let "ln+=1"
    hp=" top_model = Dense(512,activation='relu')(top_model)"
    sudo sed -i -e $ln'i\\'"$hp" /mlops_project/download_classify/ann/$fi
    sudo sed -i -e $ln'i'\\"$lt" /mlops_project/download_classify/ann/$fi
    fe=$(sudo grep -r Dense * | cut -d ":" -f 1)
    sudo docker container exec annmodel python /root/$fe
else
    echo "Something is wrong with the code"
fi
This is our final modified code. Threshold is still 92%.
Here you can see we have added one Dense layer with 1024 neurons and another with 512 neurons. Now let's see what our setup produces.
Finally, we achieved 94% accuracy!
Remember, we didn't really do anything manually here; I did a few steps by hand only to show you all how the retrain works. Otherwise, we were lucky to get such great accuracy at the very beginning! This is the core reason to choose MLOps: you just have to commit the code, and all the remaining changes our MLOps setup makes for us.