MLOPS TASK 3
Ankit Kumar
DevOps | Terraform | Linux | Azure | Azure DevOps | AWS | RH294 (Ansible) | Python | Docker | Kubernetes | Grafana | ELK | Prometheus
During the Corona pandemic, industry has needed employees who know operations work as well as machine learning work. A culture in which developers and operations engineers work together on machine learning systems is called MLOps.
There is also a real need for this, because many startups fail when they do not have employees who can handle both roles, development and operations.
Collaboration is one solution to this problem: the data science team and the DevOps team work together and handle tasks such as deploying, monitoring, and automatically training the model and putting it into the production system.
In this regard, our mentor Mr. Vimal Daga Sir gave us a task: build a setup that automatically trains a model, finds its accuracy, and makes the model easy to monitor.
So I have tried to integrate technologies such as Git, GitHub, Jenkins, and Docker to build such a setup.
Finally, I created a pipeline in which you can see the whole picture of the task.
Detailed explanation of the task
Collectively, the task asks for 5 chained jobs, plus one extra monitoring job, to build the required model.
1. Create a container image that has Python3 and Keras or NumPy installed, using a Dockerfile.
2. When we launch this image, it should automatically start training the model in the container.
3. Create a job chain of job1, job2, job3, job4 and job5 using the Build Pipeline plugin in Jenkins.
4. Job1: Pull the GitHub repo automatically when a developer pushes to GitHub.
5. Job2: By looking at the code or program file, Jenkins should automatically start the container image that has the respective machine learning software and interpreter installed, deploy the code, and start training (e.g. if the code uses a CNN, Jenkins should start the container that already has all the software required for CNN processing installed).
6. Job3: Train your model and predict accuracy or metrics.
7. Job4: If the accuracy metric is less than 80%, tweak the machine learning model architecture.
8. Job5: Retrain the model or notify that the best model has been created.
9. Create one extra job, job6, for monitoring: if the container where the app is running fails for any reason, this job should automatically start the container again from where the last trained model left off.
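For the Job2 requirement above (choosing a container by looking at the code), one possible approach is a small script that scans the program file for library names and maps them to an image. This is only a sketch: the image names `cnn-env:v1`, `sklearn-env:v1` and `python-env:v1` and the marker lists are hypothetical, not the ones used in this project.

```python
from pathlib import Path

# Hypothetical mapping from markers found in the code to prebuilt image names.
IMAGE_RULES = [
    (("Conv2D", "keras", "tensorflow"), "cnn-env:v1"),
    (("sklearn",), "sklearn-env:v1"),
]
DEFAULT_IMAGE = "python-env:v1"  # assumed fallback image


def pick_image(source: str) -> str:
    """Return the first image whose markers appear in the source text."""
    for markers, image in IMAGE_RULES:
        if any(marker in source for marker in markers):
            return image
    return DEFAULT_IMAGE


def pick_image_for_file(path: str) -> str:
    """Read a program file and choose the image for it."""
    return pick_image(Path(path).read_text(encoding="utf-8"))
```

A Jenkins job could call `pick_image_for_file` on the downloaded code and pass the result to `docker run`. Plain substring matching is crude, but it is usually enough to tell a CNN script from a classic machine learning script.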
So, let’s start and see how we reached our destination.
- It all starts with creating a Dockerfile. The Dockerfile contains the base image and the packages required to run the machine learning model, such as the Python interpreter, Keras, TensorFlow, and OpenCV.
- Next, a Docker image is built from it.
- Once the Docker image is created, it is time to work in Jenkins and create the jobs.
JOB 1
- In job 1, the Git trigger downloads the code from GitHub and copies it to my base OS (Red Hat).
- I use Poll SCM, so that whenever the code changes, Jenkins downloads and copies the new code.
JOB 2
- This is the job in which the Docker container is launched, the Python code runs, and the accuracy is saved in the accuracy.txt file.
Because the jobs are chained, job2 runs only after job1 completes successfully.
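Saving the accuracy to a file is what lets the later jobs read it. A minimal sketch of that step, assuming the accuracy comes from the training code as a fraction between 0 and 1 (as Keras reports it) and is stored as a percentage; the helper names are my own for illustration:

```python
from pathlib import Path


def save_accuracy(accuracy: float, path: str = "accuracy.txt") -> None:
    """Write the accuracy as a percentage so later jobs can read it."""
    Path(path).write_text(f"{accuracy * 100:.2f}\n", encoding="utf-8")


def read_accuracy(path: str = "accuracy.txt") -> float:
    """Read the stored percentage back as a float."""
    return float(Path(path).read_text(encoding="utf-8").strip())
```

The training script would call `save_accuracy` once at the end of training, and the next job would call `read_accuracy` to decide what to do.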
GitHub Link of the Machine Learning Code:
https://github.com/ankiiitt/MLOPS-TASK-3/blob/master/MNIST%20machine%20learning%20code.txt
- The output of the second job
JOB 3
- This is the most important part of the whole task. In this job, the machine learning code changes its hyperparameters, such as the number of filters and the number of layers, to get more accuracy, and it continues until it reaches our expected value.
- It checks the accuracy, and if the accuracy is less than 95%, it changes the hyperparameters and retrains until it reaches that accuracy.
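The retrain loop can be sketched roughly as follows. This is an illustration under assumptions, not the exact code from the repository: `train` stands for any function that trains with the given settings and returns the accuracy in percent, the parameter names `filters` and `conv_layers` are hypothetical, and the cap of 5 rounds is added so that the job always ends.

```python
from typing import Callable, Dict, Tuple

TARGET_ACCURACY = 95.0  # percent, the threshold used in this job
MAX_ROUNDS = 5          # assumption: cap the retries so the job always ends


def tune(train: Callable[[Dict[str, int]], float],
         params: Dict[str, int]) -> Tuple[Dict[str, int], float]:
    """Retrain with larger settings until the target accuracy is reached."""
    params = dict(params)  # work on a copy, leave the caller's dict alone
    accuracy = train(params)
    rounds = 0
    while accuracy < TARGET_ACCURACY and rounds < MAX_ROUNDS:
        params["filters"] *= 2      # use more filters per convolution layer
        params["conv_layers"] += 1  # and add one more convolution layer
        accuracy = train(params)
        rounds += 1
    return params, accuracy
```

For example, `tune(train_model, {"filters": 32, "conv_layers": 1})` returns the last settings tried together with the accuracy they reached.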
JOB 4
- In job 4, after getting the desired accuracy, it is time to send a confirmation email so that you are notified about the accuracy and about your machine learning model. This is a way of monitoring.
CODE FOR THE EMAIL
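A mail script of this kind can be written with Python's standard `smtplib` and `email` modules. The version below is a minimal sketch and not necessarily the script used in this project: the function names are my own, and the addresses, SMTP host, port and credentials are placeholders you would have to supply.

```python
import smtplib
from email.message import EmailMessage


def build_report(accuracy: float, sender: str, recipient: str) -> EmailMessage:
    """Compose the notification email for a finished training run."""
    msg = EmailMessage()
    msg["Subject"] = f"Model trained: accuracy {accuracy:.2f}%"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(f"The model reached an accuracy of {accuracy:.2f}%.")
    return msg


def send_report(msg: EmailMessage, host: str, port: int,
                user: str, password: str) -> None:
    """Send the message over SMTP, upgrading the connection with STARTTLS."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        server.login(user, password)
        server.send_message(msg)
```

Keeping the message building separate from the sending makes it possible to check the mail text without contacting a mail server.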
So, after job4 completes successfully, it is time to run job5.
- There are many reasons why a container can fail. Suppose our main container fails somehow; then our whole model stops. To prevent this problem, we create job5. In this job, if the container fails, it launches the container again.
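The check in this job can be sketched in Python around the Docker CLI. This is an assumption-laden illustration: the container name `mlops_model` is hypothetical, the container is assumed to already exist, and continuing from the last trained model would additionally need the model file to live on a mounted volume.

```python
import subprocess
from typing import Callable, List

CONTAINER = "mlops_model"  # hypothetical container name


def docker(args: List[str]) -> str:
    """Run a docker CLI command and return its trimmed standard output."""
    result = subprocess.run(["docker", *args], capture_output=True, text=True)
    return result.stdout.strip()


def ensure_running(name: str = CONTAINER,
                   run: Callable[[List[str]], str] = docker) -> bool:
    """Start the container if it is not running; return True if restarted."""
    state = run(["inspect", "-f", "{{.State.Running}}", name])
    if state == "true":
        return False  # already running, nothing to do
    run(["start", name])  # restart the stopped container
    return True
```

A Jenkins job that runs on a schedule could call `ensure_running()` every few minutes. The `run` parameter exists only so that the logic can be tried out without a Docker daemon.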
The log of Job5
Finally, after lots of hard work, here is the beautiful pipeline.
It shows that all the jobs completed successfully.
I hope you will like this project. Thank you for your precious time.
The project is an integration of Linux, Jenkins, Machine Learning, Python, Git, and GitHub.
GitHub link of the code: https://github.com/ankiiitt/MLOPS-TASK-3