ML-DevOps Integration
Machine learning lets computers learn from data, and DevOps provides the automation that gets those models trained and deployed reliably.
When DevOps tools are integrated with machine learning, we can realize its full potential. The following project, which integrates tools like Git, Docker and Jenkins with machine learning, is a good example of this.
In this project we first write a Dockerfile on our base OS, RHEL 8, and build a container image from it. The image starts from a preloaded CentOS Docker image and adds python3, Keras, TensorFlow and pandas. We need these to run our CNN model inside the container: the code is written in python3 and trains the model using Keras, which uses TensorFlow as its backend.
This gives us our own image, mlos:v1, with all the libraries needed to run our Python code. We can launch any number of containers from this image, which saves us from recreating the environment every time we need it. Each container behaves like an independent OS. When we launch a container from this image, it should automatically start training the model.
Secondly, we create a job chain of JOB_1, JOB_2, JOB_3, JOB_4 and JOB_5 using the Build Pipeline plugin in Jenkins.
JOB_1 :
Pulls the GitHub repo (link: https://github.com/mansigautam777/machine_learning.git) automatically when a developer pushes changes to GitHub.
In the repository named machine_learning the CNN code is written in the cnn1.py file.
Under Source Code Management we paste the link of the GitHub repo we want to pull the code from.
Then we set the build trigger to Poll SCM with a schedule of every minute. Jenkins therefore checks the GitHub repo for changes every minute and pulls the code when it finds any.
Next we add a build step to the Jenkins job that copies the code into a folder named accuracy on the base OS, RHEL 8.
JOB_1 successfully downloads the code to the folder.
JOB_2 :
JOB_2 has a build trigger attached that allows it to run only if JOB_1 was successful.
In this job Jenkins should automatically start a container from the image that has the machine learning software installed, i.e. mlos:v1, to deploy the code and start training, if the container is not already running.
The job runs the Python code, trains the model for all the epochs and reports the accuracy of the model.
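The "start the container only if it is not already running" check can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the actual Jenkins build step of this project: the container name `mlos_train`, the host path `/root/accuracy`, the mount point `/code` and both helper functions are hypothetical. Only the image name mlos:v1 comes from the project.

```python
import subprocess

IMAGE = "mlos:v1"            # image built earlier in the project
CONTAINER = "mlos_train"     # hypothetical container name
HOST_DIR = "/root/accuracy"  # assumed location of the folder JOB_1 copies into
MOUNT_DIR = "/code"          # hypothetical mount point inside the container


def docker_names(args, run=subprocess.run):
    """Return the container names printed by `docker ps <args>`."""
    cmd = ["docker", "ps", *args, "--format", "{{.Names}}"]
    result = run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.split()


def ensure_training_container(run=subprocess.run):
    """Start the training container unless it is already running.

    Returns the docker command that was issued, or None if nothing was needed.
    """
    # The name filter matches substrings, so membership is checked exactly below.
    name_filter = ["--filter", f"name={CONTAINER}"]
    if CONTAINER in docker_names(name_filter, run):
        return None  # already running, leave it alone
    if CONTAINER in docker_names(["-a", *name_filter], run):
        cmd = ["docker", "start", CONTAINER]  # exists but is stopped
    else:
        cmd = ["docker", "run", "-dit", "--name", CONTAINER,
               "-v", f"{HOST_DIR}:{MOUNT_DIR}", IMAGE]
    run(cmd, check=True)
    return cmd
```

The `run` parameter exists only so the logic can be exercised without a Docker daemon. In a Jenkins "Execute shell" step the same three cases would typically be written directly as shell commands.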
The accuracy after all the epochs comes out to be 98.01%.
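For a later job to act on this number, the training script has to expose it somewhere, for example in a small text file. The sketch below shows one way to do that, assuming cnn1.py keeps the `History` object returned by Keras `model.fit` and that the model was compiled with an accuracy metric. The file name `accuracy.txt` and both function names are hypothetical, and the article does not say whether the reported figure is training or validation accuracy, so the sketch simply reads the training `accuracy` series.

```python
from pathlib import Path


def final_accuracy_percent(history):
    """Return the last-epoch accuracy as a percentage.

    `history` is a dict shaped like Keras' `History.history`,
    e.g. {"loss": [...], "accuracy": [...]}.
    """
    # Older Keras versions name the metric "acc" instead of "accuracy".
    key = "accuracy" if "accuracy" in history else "acc"
    return round(history[key][-1] * 100, 2)


def save_accuracy(history, path="accuracy.txt"):
    """Write the percentage to `path` (hypothetical file name) and return it."""
    value = final_accuracy_percent(history)
    Path(path).write_text(f"{value}\n")
    return value
```

In the training script this could be called as `save_accuracy(model.fit(...).history)`, after which a Jenkins job can read the file from the mounted folder.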
JOB_3 :
If the accuracy is less than 90%, this job tweaks the machine learning model architecture and starts training the model again; otherwise it prints the accuracy.
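The retrain-until-good-enough logic of JOB_3 can be sketched as a small loop. This is an illustration under assumptions rather than the project's actual code: the article does not say which part of the architecture is tweaked, so the sketch adds one convolution block and a few more epochs per round. The function name, the starting values and the `train` callback are all hypothetical.

```python
def train_until_threshold(train, threshold=90.0, max_rounds=5):
    """Retrain with a progressively larger model until accuracy meets `threshold`.

    `train(conv_layers, epochs)` must train a model and return its accuracy
    as a percentage. Returns the list of (conv_layers, epochs, accuracy)
    tuples that were tried, in order.
    """
    conv_layers, epochs = 1, 5  # hypothetical starting hyperparameters
    attempts = []
    for _ in range(max_rounds):
        accuracy = train(conv_layers, epochs)
        attempts.append((conv_layers, epochs, accuracy))
        if accuracy >= threshold:
            break              # good enough, stop retraining
        conv_layers += 1       # tweak: add one more convolution block
        epochs += 5            # tweak: train a little longer
    return attempts
```

The `max_rounds` cap is a design choice of the sketch: without it, a model that never reaches 90% would keep the Jenkins job running indefinitely.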
JOB_4 :
This job notifies the user if the build was a failure, i.e. if JOB_3 was not built successfully, using the E-mail Notification option under Post-build Actions.
As JOB_3 was successful, JOB_4 did not send any notification and was built successfully.
JOB_5 :
In this job, if the container where the app is running fails for any reason, the job should automatically start the container again, resuming from where the last trained model left off.
As the container is running smoothly, JOB_5 builds successfully.
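Resuming "from where the last trained model left off" requires that the training script saves checkpoints somewhere that survives a container restart, such as a mounted host folder. The sketch below shows only the lookup step, assuming checkpoints are saved once per epoch under the hypothetical naming scheme `epoch_<number>.h5`. The function name and pattern are illustrative and not taken from the project.

```python
import re
from pathlib import Path

# Hypothetical naming scheme: epoch_01.h5, epoch_02.h5, ...
CHECKPOINT_PATTERN = re.compile(r"epoch_(\d+)\.h5$")


def latest_checkpoint(directory):
    """Return (path, epoch) of the newest checkpoint, or (None, 0) if none exist."""
    best_path, best_epoch = None, 0
    for path in Path(directory).glob("epoch_*.h5"):
        match = CHECKPOINT_PATTERN.search(path.name)
        if match and int(match.group(1)) > best_epoch:
            best_path, best_epoch = path, int(match.group(1))
    return best_path, best_epoch
```

In the training script, a returned path could be loaded with Keras `load_model` and training continued with `model.fit(..., initial_epoch=epoch)`, while a `ModelCheckpoint` callback writes a new file after each epoch. If no checkpoint is found, training simply starts from scratch.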
Finally, the whole job chain can be analyzed using the Build Pipeline view in Jenkins.