MLOps : Integrating Machine Learning with DevOps
Around 90% of machine learning models are reportedly created but never deployed. Integrating Machine Learning with DevOps offers a way to reduce such failures. Training a model usually means finding good hyper-parameters, and doing this manually is very tiring: it is purely a trial-and-error process, and sometimes the desired accuracy is not reached even after a large number of trials. This is one reason why so many models fail.
In Machine Learning, a hyperparameter is a parameter whose value is set before the learning process begins.
This is taken from an article on a well-known data science website.
I have created a small project that integrates Machine Learning with DevOps to automate the model training process. The pipeline does the trial and error over the hyper-parameters on its own and keeps the whole process automatic until the desired accuracy is achieved. I have used a Convolutional Neural Network (CNN) for training my model.
The architecture of a CNN is depicted in the picture below.
Here is the link to my GitHub repository
OUTLINE
1. Create a container image, using a Dockerfile, that has everything required for training your model already installed.
2. Create a job chain of Job1, Job2, Job3, Job4 and Job5 using the Build Pipeline plugin in Jenkins.
3. Job1 : Pull the GitHub repo automatically when a developer pushes to GitHub.
4. Job2 : By looking at the code or program file, Jenkins should automatically start the container image that has the matching machine learning software and interpreter installed, deploy the code and start training (e.g. if the code uses a CNN, Jenkins should start the container that already has all the software required for CNN processing installed).
5. Job3 : Train your model and measure its accuracy or metrics.
6. Job4 : If the accuracy is less than 80%, tweak the machine learning model architecture.
7. Job5 : Retrain the model, or notify that the best model has been created.
8. Create one extra job, Job6, for monitoring : If the container where the model is being trained fails for any reason, this job should automatically start the container again from where the last trained model left off.
DESCRIPTION
I have taken a Dog and Cat Dataset for training my model. This dataset can be anything depending on what you want your machine learning model to be trained for.
- You have to create a container image which contains all the software and libraries necessary to train your model. I have used a CNN model, so I created a container image for it from a Dockerfile.
- After you have written the Dockerfile, you have to build the image with the docker build command.
- Create a Job chain of Job1, Job2, Job3, Job4 and Job5 using Build Pipeline Plugin in Jenkins.
JOB 1 : Pull the GitHub repo automatically when a developer pushes to GitHub.
- To perform this action automatically, Jenkins needs a public IP. If you already have one, you can use it; if not, you can use a tool such as ngrok to make your Jenkins publicly reachable.
- Create a Webhook in the GitHub repository to which the developer pushes the code.
- Create a Job in Jenkins that fetches the code from the GitHub repo whenever the developer pushes, and copies it to a folder on the host OS.
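The copy step of Job 1 can be sketched in Python as follows. This is only an illustration under assumptions: the function name `sync_workspace` and both paths are hypothetical, and a plain shell `cp` in the Jenkins build step would do the same thing.

```python
import shutil
from pathlib import Path
from typing import List


def sync_workspace(workspace: str, target: str) -> List[str]:
    """Copy the checked-out repo into the folder that the containers will use.

    Both paths are assumptions for this sketch. The .git directory is skipped.
    Returns the sorted relative paths of the files now present in `target`.
    """
    shutil.copytree(
        workspace,
        target,
        dirs_exist_ok=True,  # Python 3.8+: allow copying into an existing folder
        ignore=shutil.ignore_patterns(".git"),
    )
    root = Path(target)
    return sorted(str(p.relative_to(root)) for p in root.rglob("*") if p.is_file())
```

Calling it again after a new push simply overwrites the older copies of the files.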
JOB 2 : By looking at the code or program file, Jenkins should automatically start the respective container
- This Job is triggered when the first Job is successfully built.
- You have to check the code to find out which kind of model the developer has uploaded. To do this, you can search the uploaded file for a keyword that is used only by a specific kind of model. I used the keyword 'Convolution2D', as I used a CNN model.
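The keyword check could look roughly like this in Python. Only the keyword 'Convolution2D' comes from this article; the image name `cnn-env:v1`, the extra `Conv2D` entry (the newer Keras name for the same layer) and the function names are assumptions made for the sketch.

```python
from pathlib import Path
from typing import Optional

# Hypothetical mapping from a telltale keyword to a container image name.
# Add one entry per kind of model you want Jenkins to recognise.
KEYWORD_TO_IMAGE = {
    "Convolution2D": "cnn-env:v1",
    "Conv2D": "cnn-env:v1",
}


def pick_image(source_code: str) -> Optional[str]:
    """Return the image for the first keyword found in the code, else None."""
    for keyword, image in KEYWORD_TO_IMAGE.items():
        if keyword in source_code:
            return image
    return None


def pick_image_for_file(path: str) -> Optional[str]:
    """Same check, applied to a file that the developer pushed."""
    return pick_image(Path(path).read_text(encoding="utf-8"))
```

Job 2 can then start a container from the returned image, or stop with a message when the result is `None`.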
JOB 3 : Train your model and measure its accuracy or metrics.
- This Job runs only after the previous Job is built.
- You have to start training your model inside your container, then measure the accuracy of the model and save it for further use.
- My model achieved an accuracy of around 65% with the hyper-parameters that the developer had given in the code.
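One way to extract the accuracy for the next job is to write it to a small text file at the end of training. The sketch below assumes the training script has the Keras history dictionary at hand (the `history` attribute of the object returned by `model.fit`); the file name `accuracy.txt` and the function names are my own placeholders.

```python
from pathlib import Path


def final_accuracy_percent(history: dict) -> float:
    """Return the last-epoch training accuracy as a percentage.

    `history` maps metric names to per-epoch lists, as Keras does.
    Older Keras versions use the key 'acc' instead of 'accuracy'.
    """
    key = "accuracy" if "accuracy" in history else "acc"
    return round(history[key][-1] * 100, 2)


def save_accuracy(history: dict, path: str = "accuracy.txt") -> float:
    """Write the accuracy to a text file so that the next job can read it."""
    accuracy = final_accuracy_percent(history)
    Path(path).write_text(f"{accuracy}\n", encoding="utf-8")
    return accuracy
```

If you track validation accuracy instead, the same idea applies with the key `val_accuracy`.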
JOB 4 : If the accuracy is less than 80%, tweak the machine learning model architecture.
- Job 4 runs after a successful build of Job 3.
- You have to compare the accuracy of the trained model with the percentage you want your model to achieve. If the accuracy is lower than the desired percentage, the job tweaks the model, i.e. the model is trained again with different hyper-parameters, and this goes on until the desired accuracy is achieved. My desired accuracy was 80%, so I compared against 80.
I have changed the hyper-parameters only once, but you can change them as many times as you want and in whatever way suits you. All the conditions are written once, and the model is tweaked according to them until the desired accuracy is achieved.
Here I have made the job fail once the accuracy exceeds 80%, so that it does not go back to Job 3 again, i.e. it does not tweak again.
- I got an accuracy of around 81% by using the new hyper-parameters.
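The decision logic of Job 4 can be sketched as below. The 80% target and the "fail on success" trick come from this article; the hyper-parameter names and values in `CANDIDATES` are placeholders, not the ones from my project, and I treat an accuracy of exactly 80% as reaching the target.

```python
from typing import Dict, List, Optional

TARGET_ACCURACY = 80.0  # percent; the threshold used in this article

# Hypothetical hyper-parameter sets, tried in order.
CANDIDATES: List[Dict[str, int]] = [
    {"epochs": 5, "filters": 32, "dense_units": 64},
    {"epochs": 10, "filters": 64, "dense_units": 128},
    {"epochs": 15, "filters": 64, "dense_units": 256},
]


def target_reached(accuracy: float) -> bool:
    """True once the accuracy (in percent) is at least the target."""
    return accuracy >= TARGET_ACCURACY


def next_hyperparameters(attempt: int) -> Optional[Dict[str, int]]:
    """Return the set to try after run number `attempt` (0-based), or None if none are left."""
    following = attempt + 1
    return CANDIDATES[following] if following < len(CANDIDATES) else None


def job4_exit_code(accuracy: float) -> int:
    """Exit non-zero (the job fails) once the target is reached, so that
    Job 3 is not triggered again; exit 0 to keep tweaking."""
    return 1 if target_reached(accuracy) else 0
```

In a Jenkins shell step the script would end with `sys.exit(job4_exit_code(accuracy))`, since a non-zero exit status marks the build as failed.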
JOB 5 : Retrain the model, or notify that the best model has been created.
- This Job runs only when Job 4 fails; otherwise the pipeline keeps on tweaking and goes back to Job 3.
- You can send an email to the developer, or to anyone else you wish to inform, saying that your model has been trained successfully with an accuracy of more than 80%.
- As the achieved accuracy was more than 80%, the mail was sent.
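A minimal sketch of the notification with Python's standard library is shown below. The addresses, the SMTP host and port, and the function names are all placeholders; your mail provider may need different settings, and Jenkins' own email notification can be used instead.

```python
import smtplib
from email.message import EmailMessage


def build_notification(accuracy: float, sender: str, recipient: str) -> EmailMessage:
    """Compose the 'model is ready' mail without sending it."""
    msg = EmailMessage()
    msg["Subject"] = f"Model trained: accuracy {accuracy:.2f}%"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(
        f"The model has been trained successfully with an accuracy of {accuracy:.2f}%."
    )
    return msg


def send_notification(msg: EmailMessage, host: str, port: int, user: str, password: str) -> None:
    """Send the mail over SMTP with STARTTLS (host, port and login are assumptions)."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        server.login(user, password)
        server.send_message(msg)
```

Keeping the message building separate from the sending makes it easy to check the text before any mail goes out.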
JOB 6 : If the container where the app is running fails for any reason, this job should automatically start the container again from where the last trained model left off.
- You can also create a Job that monitors the container and restarts it whenever it fails for any reason. You can schedule this Job to build periodically, i.e. you can choose how often it goes to the container and checks that it is still working.
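The monitoring check could be written as follows. It relies on the docker CLI commands `docker inspect -f '{{.State.Running}}'` and `docker start`; the container name and the function names are hypothetical. Resuming from where the last trained model left off additionally assumes that the training script saves the model to a folder that survives the restart, which this sketch does not handle.

```python
import subprocess
from typing import Callable, List


def run_docker(args: List[str]) -> str:
    """Run a docker CLI command and return its stdout ('' if the command fails)."""
    result = subprocess.run(["docker", *args], capture_output=True, text=True)
    return result.stdout.strip() if result.returncode == 0 else ""


def ensure_running(container: str, docker: Callable[[List[str]], str] = run_docker) -> bool:
    """Start the container if it is not running.

    Returns True if a start command was issued, False if it was already running.
    The `docker` argument exists so the check can be tried without Docker installed.
    """
    state = docker(["inspect", "-f", "{{.State.Running}}", container])
    if state == "true":
        return False
    docker(["start", container])
    return True
```

The Jenkins job then only needs to call `ensure_running("cnn_train")`, with your own container name, on whatever schedule you set.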
CONCLUSION
You can make the training of your machine learning model fully automatic, especially for a model that requires hyper-parameter tuning, and achieve your desired accuracy without manually changing the hyper-parameters again and again.
Thank you for reading ;)