MLOps : Integrating Machine Learning with DevOps

By some estimates, around 90% of the machine learning models that get built are never deployed. Integrating Machine Learning with DevOps offers a way to reduce that failure rate. Training a model usually means searching for good hyper-parameters, and doing this by hand is tiring: it is a trial-and-error process, and even a large number of trials may not reach the desired accuracy. This is one of the main reasons models are abandoned before deployment.

In Machine Learning, a hyperparameter is a parameter whose value is set before the learning process begins.

[Image: screenshot of an article from a well-known data science website]

I have created a small project that integrates Machine Learning with DevOps to automate the model training process. The pipeline does the trial and error on the hyper-parameters on its own and keeps going until a desired accuracy is achieved. I have used a Convolutional Neural Network (CNN) for my model.

[Figure: architecture of a CNN]

Here is the link to my GitHub repository

OUTLINE

1. Create a container image, using a Dockerfile, that has everything needed to train your model already installed.

2. Create a job chain of Job1, Job2, Job3, Job4 and Job5 using the Build Pipeline plugin in Jenkins.

3. Job1 : Pull the GitHub repo automatically when a developer pushes to GitHub.

4. Job2 : By looking at the code or program file, Jenkins should automatically start the container whose image has the matching machine learning software and interpreter installed, deploy the code in it and start training (e.g. if the code uses a CNN, Jenkins should start the container that already has all the software required for CNN processing).

5. Job3 : Train your model and report its accuracy or other metrics.

6. Job4 : If the accuracy is less than 80%, tweak the machine learning model architecture.

7. Job5 : Retrain the model, or notify that the best model has been created.

8. Job6 (extra, for monitoring) : If the container where the model is being trained fails for any reason, this job should automatically start the container again from where the last trained model left off.

DESCRIPTION

I used a Dog and Cat dataset to train my model. You can use any dataset, depending on what you want your machine learning model to learn.

  • First create a container image that contains all the software and libraries needed to train your model. I used a CNN model and described its image in a Dockerfile.
  • Once the Dockerfile is written, build the image from it with the docker build command.
  • Create a job chain of Job1, Job2, Job3, Job4 and Job5 using the Build Pipeline plugin in Jenkins.

JOB 1 : Pull the GitHub repo automatically when a developer pushes to GitHub.

  • To trigger this automatically, GitHub must be able to reach Jenkins, so Jenkins needs a public IP address. If you have one, use it; if not, you can use a tool such as ngrok to expose Jenkins at a public address.
  • Create a webhook in the GitHub repository that the developer pushes code to.
  • Create a job in Jenkins that fetches the code from the GitHub repo whenever the developer pushes, and copies it to a folder on the host operating system.

JOB 2 : By looking at the code or program file, Jenkins should automatically start the respective container

  • This job is triggered when Job 1 builds successfully.
  • Check the code to find out what kind of model the developer has uploaded. One way is to search the uploaded file for a keyword that appears only in a specific kind of model. I searched for the keyword 'Convolution2D' because I used a CNN model.
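The keyword check described for Job 2 can be sketched in Python. This is only an illustration, not the script used in the project: the image names and the second keyword are hypothetical placeholders, and only 'Convolution2D' comes from the article.

```python
from pathlib import Path
from typing import Optional

# Hypothetical mapping from a telltale keyword to a container image name.
# Only the 'Convolution2D' keyword comes from the article; the image names
# and the second entry are placeholders for illustration.
KEYWORD_TO_IMAGE = {
    "Convolution2D": "cnn-training-image",
    "LinearRegression": "sklearn-training-image",
}


def pick_image(source_text: str) -> Optional[str]:
    """Return the image for the first keyword found in the code, else None."""
    for keyword, image in KEYWORD_TO_IMAGE.items():
        if keyword in source_text:
            return image
    return None


def pick_image_for_file(path: str) -> Optional[str]:
    """Read the uploaded program file and choose an image for it."""
    return pick_image(Path(path).read_text(encoding="utf-8"))
```

A Jenkins build step could call `pick_image_for_file` on the copied file and start a container from the returned image. A plain substring search is crude (it also matches keywords inside comments), but it mirrors the approach described above.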

JOB 3 : Train your model and report its accuracy or other metrics.

  • This job runs only after Job 2 has been built.
  • Start training the model inside the container, then extract the accuracy of the trained model so that later jobs can use it.
  • With the hyper-parameters the developer supplied in the code, my model reached an accuracy of around 65%.
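One simple way to hand the accuracy from the training step to later jobs is through a small text file. The sketch below assumes the training script has the accuracy as a fraction between 0 and 1; the file name `accuracy.txt` and both function names are hypothetical.

```python
from pathlib import Path


def save_accuracy(accuracy: float, path: str = "accuracy.txt") -> None:
    """Store the accuracy as a percentage with two decimals."""
    Path(path).write_text(f"{accuracy * 100:.2f}\n", encoding="utf-8")


def load_accuracy(path: str = "accuracy.txt") -> float:
    """Read back the percentage written by save_accuracy."""
    return float(Path(path).read_text(encoding="utf-8").strip())
```

The training script would call `save_accuracy` at the end of training, and the next job would call `load_accuracy` on the same file, for example from a folder shared between the container and the host.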

JOB 4 : If the accuracy is less than 80%, tweak the machine learning model architecture.

  • Job 4 runs after a successful build of Job 3.
  • Compare the accuracy of the trained model with the accuracy you want it to reach. If it falls short, the job tweaks the model: it is trained again with different hyper-parameters, and this repeats until the desired accuracy is reached. My target was 80%, so I compared against 80.

I changed the hyper-parameters only once, but you can change them as many times as you like. The conditions only need to be written once; the model is then tweaked according to them until the desired accuracy is reached.
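The retrain-until-good-enough logic can be sketched as a loop over candidate hyper-parameter sets. Everything here is illustrative: `fake_train` is a stand-in for real training, and although its return values echo the 65% and 81% reported in this article, the hyper-parameters that produced those numbers are not given, so the mapping from `epochs` to accuracy is invented.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

Hyperparams = Dict[str, int]


def tune_until(
    train: Callable[[Hyperparams], float],
    candidates: Iterable[Hyperparams],
    target: float = 80.0,
) -> Tuple[Optional[Hyperparams], float]:
    """Train with each candidate in turn and stop at the first one that
    reaches the target. Returns the best (hyperparams, accuracy) seen."""
    best_params, best_acc = None, float("-inf")
    for params in candidates:
        acc = train(params)
        if acc > best_acc:
            best_params, best_acc = params, acc
        if acc >= target:
            break  # target reached, no further tweaking needed
    return best_params, best_acc


def fake_train(params: Hyperparams) -> float:
    """Stand-in for real training; the accuracy values are invented."""
    return 81.0 if params["epochs"] >= 5 else 65.0
```

In the pipeline described here the loop is spread across Jenkins jobs (Job 3 trains, Job 4 compares and tweaks) rather than living in one function, but the control flow is the same.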

Here I made the job fail once the accuracy exceeds 80%, so that it does not trigger Job 3 again, i.e. the model is not tweaked again.

  • With the new hyper-parameters I got an accuracy of around 81%.
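The inverted exit status used in Job 4 is easy to get backwards, so here is a minimal sketch of it. The function name is hypothetical; the idea is that Jenkins treats a non-zero exit status of a build step as a failed build.

```python
TARGET = 80.0  # desired accuracy in percent, as used in this article


def job4_exit_code(accuracy: float, target: float = TARGET) -> int:
    """Return 1 (failed build) once the target is met, so the retrain job
    is not triggered again; return 0 (successful build) otherwise."""
    return 1 if accuracy >= target else 0
```

A build step could pass this value to `sys.exit`. With this convention, a successful Job 4 leads back to training, and a failed Job 4 is what triggers the notification job.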

JOB 5 : Retrain the model, or notify that the best model has been created

  • This job runs only when Job 4 fails; otherwise the pipeline goes back to Job 3 and keeps tweaking.
  • You can send an email to the developer, or to anyone else you wish, saying that the model has been trained successfully with an accuracy of more than 80%.
  • Since the achieved accuracy was more than 80%, the mail was sent.
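The notification mail can be composed with Python's standard library. This is a sketch under assumptions: the addresses, SMTP host and credentials are placeholders you would supply, and the project itself may have used a different mechanism.

```python
import smtplib
from email.message import EmailMessage


def build_success_mail(accuracy: float, sender: str, recipient: str) -> EmailMessage:
    """Compose the 'training finished' notification."""
    msg = EmailMessage()
    msg["Subject"] = "Model training finished"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(f"The model reached {accuracy:.2f}% accuracy, above the target.")
    return msg


def send_mail(msg: EmailMessage, host: str, port: int, user: str, password: str) -> None:
    """Send the message over SMTP with STARTTLS. All arguments are
    placeholders for your own mail server settings."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        server.login(user, password)
        server.send_message(msg)
```

Keeping message construction separate from sending makes it possible to check the mail text without a mail server.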

JOB 6 : If the container where the model is being trained fails for any reason, this job should automatically start the container again from where the last trained model left off.

  • You can also create a job that monitors the container and restarts it whenever it fails. You can schedule this job, i.e. set how often it should check whether the container is still running.
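A monitoring check along these lines could look like the sketch below. It assumes the Docker CLI is available on the Jenkins host and uses a hypothetical container name; the command runner is passed in as a parameter so the logic can be tried without Docker.

```python
import subprocess
from typing import Callable, List

Runner = Callable[[List[str]], str]


def run_command(cmd: List[str]) -> str:
    """Run a command and return its stdout, or '' if it exits non-zero."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout.strip() if result.returncode == 0 else ""


def ensure_running(container: str, run: Runner = run_command) -> bool:
    """Start the container if it is not running.
    Returns True if a start was issued, False if it was already running."""
    state = run(["docker", "inspect", "-f", "{{.State.Running}}", container])
    if state == "true":
        return False
    run(["docker", "start", container])
    return True
```

A periodically scheduled Jenkins job could call `ensure_running("cnn-trainer")`, where `cnn-trainer` is a placeholder name. Restarting the container alone does not resume training: for that, the training script would also need to save the model regularly and reload the last saved copy on start.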

CONCLUSION

You can make the training of a machine learning model, especially one that needs hyper-parameter tuning, fully automatic and reach your desired accuracy without changing the hyper-parameters by hand again and again.

Thank you for reading ;)
