Integration of Machine Learning with DevOps : MLOps Task 3
Divya Raj Lavti
Experienced IT Project Manager | Expert in Agile & Scrum, Risk Management, IT Infrastructure | Cloud Migration Specialist
In the era of automation and fast learning, everybody wants fast, integrated working of projects, especially in Machine Learning.
In the industry today a lot of people know ML, and a separate, smaller group of people understand DevOps well. But have you ever thought about what we could accomplish, and how much time we could save, if these were somehow combined?
We all know how tedious and tiring it can be to keep tuning or tweaking the hyperparameters of an ML model manually in order to achieve the right accuracy. This is possibly the reason approximately 60% of industry projects never get executed or implemented. To solve this issue of speed and of the quality or performance of an ML model, I think I might have just come up with a solution.
Prerequisites to understand what is going on in this project :
- Git
- Github
- Docker
- Jenkins
- Linux OS
- Shell/Unix commands
- ML, CNN, Deep Learning basics, Keras
Steps to be followed to implement this project :
1. Create a container image that has Python3 and Keras/NumPy installed, using a Dockerfile.
2. When we launch this image, it should automatically start training the model inside the container.
3. Create a job chain of job1, job2, job3, job4 and job5 using the Build Pipeline plugin in Jenkins.
4. Job1 : Pull the GitHub repo automatically whenever a developer pushes to GitHub.
5. Job2 : By looking at the code or program file, Jenkins should automatically start a container from the image that has the respective machine learning software and interpreter installed, deploy the code into it and start training ( e.g. if the code uses a CNN, then Jenkins should start the container that already has all the software required for CNN processing ).
6. Job3 : Train the model and predict its accuracy or metrics.
7. Job4 : If the accuracy metric is less than 80%, tweak the machine learning model architecture.
8. Job5 : Retrain the model, or notify that the best model has been created.
9. Create one extra job, Job6, for monitoring : if the container where the app is running fails for any reason, this job should automatically start the container again from where the last trained model left off.
So let's start :
I used the Linux-based RHEL8 operating system in this project.
For neat and clear work, I created a separate folder using these Linux commands and jumped into that folder for the rest of the work :
# mkdir /mlt3
# cd /mlt3
You can create a different folder to download the code from GitHub, but I used this one.
Now, we create a Dockerfile for building an image that runs our CNN model inside a Docker container.
Note that while saving this file you need to save it with this very name, Dockerfile.
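As a reference, here is a minimal sketch of what such a Dockerfile could look like; the base image, the package list and the entry-point script name ( mnist_lenet.py ) are my assumptions, so adapt them to your own code :

FROM centos:latest
# install Python 3 and the libraries the training code needs
RUN yum install -y python3 && \
    pip3 install --no-cache-dir numpy keras tensorflow
# the training code is copied/mounted into this directory
WORKDIR /root/mlt3
# start training automatically as soon as the container launches
CMD ["python3", "mnist_lenet.py"]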
After saving it we need to build the image using this command :
# docker build -t <tag_name>:<version_tag> .
Take note of the dot ( . ) at the end, which tells Docker to use the current directory as the build context.
Hence we are now set to move ahead and create the Jenkins jobs.
Job1 ( GitHub pull ) :
This job basically pulls the code from GitHub and downloads it onto the local OS where Jenkins is installed.
This setting automatically pulls and downloads the GitHub code whenever a newer version of the code is available.
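As a rough sketch, the build step of Job1 can be a shell command like the one below, which copies the freshly checked-out Jenkins workspace into the working folder created earlier ( the path /root/mlt3 and password-less sudo for the jenkins user are assumptions from my setup ) :

# copy everything Jenkins just pulled from GitHub into the working folder
sudo cp -rvf * /root/mlt3/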
Job2 ( Check code ) :
By looking at the code or program file, Jenkins should automatically start a container from the image that has the respective machine learning software and interpreter installed, deploy the code into it and start training ( e.g. if the code uses a CNN, then Jenkins should start the container that already has all the software required for CNN processing ).
This job runs after the successful build of Job1, as we enabled the trigger on a stable build of Job1. Now, using the image built above, a container will be launched if the code belongs to CNN ( if one is not already running ).
The point to note is that it can be any code you wish. Here I am considering a CNN code to show the power of this setup and how finely it tunes the hyperparameters for me until the desired accuracy is achieved.
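A minimal sketch of the Job2 build step, assuming the image was tagged mlt3:v1 and the container is named mlt3env ( both names are my own choice ) :

# check whether the pulled code is CNN/Keras code
if sudo grep -qriE 'keras|conv2d' /root/mlt3/
then
  # launch the training container only if one is not already running
  if ! sudo docker ps | grep -q mlt3env
  then
    sudo docker run -dit --name mlt3env -v /root/mlt3:/root/mlt3 mlt3:v1
  fi
fi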
Job3 ( Train and Predict Accuracy ):
This job triggers automatically after the successful completion of Job2. For the training model, I am using a program as basic as the LeNet model, and I train it on the MNIST dataset ( as we know how well the LeNet model performs on the MNIST dataset ).
But before training and predicting, I intentionally made a few changes to the code ( i.e. tweaked the hyperparameters here and there ), because I want Jenkins to automatically retrain it until the desired accuracy is achieved in the next step.
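For reference, here is a minimal sketch of what such a training script might look like; the exact ( deliberately weakened ) architecture is my own assumption, and the only hard requirements are that epochs sits on its own line ( so the sed command used in Job4 can find it ) and that the final accuracy is written to accuracy.txt for Job4 to read :

# mnist_lenet.py - a deliberately small LeNet-style CNN trained on MNIST
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from keras.utils import to_categorical

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(-1, 28, 28, 1).astype("float32") / 255
x_test = x_test.reshape(-1, 28, 28, 1).astype("float32") / 255
y_train, y_test = to_categorical(y_train), to_categorical(y_test)

# keep epochs on its own line: Job4 appends "epochs=epochs+1" right after it
epochs=1

model = Sequential()
model.add(Conv2D(6, (5, 5), activation="relu", input_shape=(28, 28, 1)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(10, activation="softmax"))
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=epochs, verbose=1)

# write the test accuracy (in percent) to accuracy.txt so Job4 can read it
loss, acc = model.evaluate(x_test, y_test, verbose=0)
with open("/root/mlt3/accuracy.txt", "w") as f:
    f.write(str(acc * 100))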
Job4 ( Retrain ) :
This job is triggered automatically after the successful completion of Job3. Basically, in this job we read the accuracy from the accuracy.txt file into a shell variable. Then, using conditional and loop statements, we ask the job to run the retrain.py and new.py files.
About the files :
- accuracy.txt : This file stores the accuracy obtained from the base model that we ran in the previous step.
- retrain.py : This is an intermediate file that reads the input file ( the main file ) line by line and builds an output file correspondingly, with extra code added along the way ( see the sketch after this list ). The added code in my case is the extra layers ( Conv2D and MaxPooling ); you can also add Hidden/Dense layers using this approach.
- Finally, all these lines are written to a new output file ( new.py in my case ).
- You can also overwrite the same input file if you want.
- new.py : This file is just the input file plus the code appended by retrain.py.
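To give an idea of the approach, here is a rough sketch of what retrain.py might look like; the input file name and the exact insertion point are assumptions from my setup :

# retrain.py - append an extra Conv2D + MaxPooling block to the training script
with open("/root/mlt3/mnist_lenet.py") as src:   # the input (main) file
    lines = src.readlines()

extra_layers = [
    "model.add(Conv2D(32, (3, 3), activation='relu'))\n",
    "model.add(MaxPooling2D(pool_size=(2, 2)))\n",
]

out = []
for line in lines:
    out.append(line)
    # right after the existing MaxPooling layer, insert one more Conv2D + MaxPooling block
    if line.strip().startswith("model.add(MaxPooling2D") and extra_layers:
        out.extend(extra_layers)
        extra_layers = []   # insert the block only once

# write the modified program to new.py (you could overwrite the input file instead)
with open("/root/mlt3/new.py", "w") as dst:
    dst.writelines(out)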
If the accuracy target is not met, another layer is added in this way. You can add as many layers as you want. After that, I retrain it with more epochs, since additional epochs generally give the model more chances to fit the data.
PS:
- You can tweak any hyperparameter this way, such as the number of neurons in the Dense layer, the activation function, etc.
- The syntax used here is very particular and sensitive. I had a real struggle while creating this job ( it required almost 52 builds ).
But I would like to draw your attention to a particular Unix command, which increments the existing epochs value here :
sudo sed -i '/^epochs=/a epochs=epochs+1' /root/mlt3/new.py
This command has various components that need to be understood :
- sed -i : sed is a Unix stream editor; the -i flag makes it edit the file in place.
- /^epochs=/ : the address pattern; it matches any line that begins with epochs=, i.e. the line after which we want to insert.
- a epochs=epochs+1 : the append command; it adds the line epochs=epochs+1 right after every matching line, so the next training run uses one more epoch.
- /root/mlt3/new.py : new.py is the file in which the change is to be made.
Finally, only if the desired accuracy is achieved do we see the next job being triggered; otherwise, this build fails.
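Putting it together, the Job4 build step could look roughly like this ( a sketch under my assumptions about file locations and the container name; the 80% threshold comes from the job description, and how the retraining loop gets re-triggered depends on how you wire the pipeline ) :

# read the accuracy produced by Job3
accuracy=$(sudo cat /root/mlt3/accuracy.txt)

# compare against the 80% threshold (awk handles the floating-point comparison)
if [ $(echo "$accuracy 80" | awk '{print ($1 >= $2)}') -eq 1 ]
then
  echo "Desired accuracy reached"
  exit 0   # success: the downstream notification job gets triggered
else
  # tweak the architecture, bump the epochs, retrain inside the container, then fail this build
  sudo python3 /root/mlt3/retrain.py
  sudo sed -i '/^epochs=/a epochs=epochs+1' /root/mlt3/new.py
  sudo docker exec mlt3env python3 /root/mlt3/new.py
  exit 1   # failure: the notification job is not triggered
fi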
Job5 ( Notification ) :
This job runs on the successful completion of the previous job. It basically sends a mail to the developer. This can be done through the E-mail Notification option under Manage Jenkins, or through a plugin such as the Job Direct Mail Plugin available in the Plugins section.
But to keep it simple, I used a very small Python program based on the smtplib and ssl libraries, which does exactly what can easily be done with the E-mail Notification feature of Jenkins.
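A minimal sketch of such a notification script, assuming a Gmail SMTP server and placeholder addresses ( use your own addresses, and read the password from an environment variable or a Jenkins credential rather than hard-coding it ) :

# notify.py - send a simple "best model is ready" mail (addresses are placeholders)
import os
import smtplib
import ssl
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "MLOps task3: desired accuracy achieved"
msg["From"] = "jenkins.bot@example.com"
msg["To"] = "developer@example.com"
msg.set_content("The model reached the desired accuracy. The best model has been created.")

# send the mail over an SSL connection
context = ssl.create_default_context()
with smtplib.SMTP_SSL("smtp.gmail.com", 465, context=context) as server:
    server.login(msg["From"], os.environ["MAIL_PASSWORD"])  # app password from the environment
    server.send_message(msg)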
Job6 ( Monitor ) :
This job keeps an eye on the container where the training is running and reruns it if it stops for any reason, so that the work continues from where the last trained model left off.
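A minimal sketch of what this monitoring job could run ( assuming the container name mlt3env from the Job2 sketch, and that the job is scheduled with Build periodically to check every few minutes ) :

# restart the training container if it is no longer running
if ! sudo docker ps | grep -q mlt3env
then
  sudo docker start mlt3env
fi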
Results obtained :
Now let us follow up with the output part, because creating the jobs is not enough. In this section I'll show you the results I obtained after each step, so you can better connect them with what I'm trying to do here.
So, as seen in the previous section, as soon as the developer commits changes and pushes them to GitHub, Jenkins automatically downloads the files to the local directory.
Now we see the Jenkins jobs building automatically one after the other; here is the pipeline view.
Here are a few insights on how the code executes in Job3.
Now Job5 is triggered and a corresponding notification is sent to the developer regarding the success of the job.
Here is the GitHub repo link : https://github.com/drlraj2805/mlt3.git
Thanks for the read, guys!