Automating Machine Learning Model Training with DevOps: Increasing Model Accuracy by Tweaking Hyperparameters Automatically
INTRODUCTION:
>>What is Machine Learning??
Machine learning (ML) is the study of computer algorithms that improve automatically through experience. It is seen as a subset of artificial intelligence. Machine learning algorithms build a mathematical model based on sample data, known as training data, in order to make predictions or decisions without being explicitly programmed to do so. Machine learning algorithms are used in a wide variety of applications, such as email filtering and computer vision, where it is difficult or infeasible to develop conventional algorithms to perform the needed tasks.
>>What is DEVOPS?
DevOps is a set of practices that combines software development (Dev) and IT operations (Ops). It aims to shorten the systems development life cycle and provide continuous delivery with high software quality.
>>What are Hyperparameters??
In machine learning, a hyperparameter is a parameter whose value is used to control the learning process (e.g., the learning rate, the number of epochs, or the batch size). By contrast, the values of other parameters (typically node weights) are derived via training.
Task Description :
1. Create a container image that has Python3 and Keras or NumPy installed, using a Dockerfile.
2. When we launch this image, it should automatically start training the model in the container.
3. Create a job chain of Job1, Job2, Job3, Job4, and Job5 using the Build Pipeline plugin in Jenkins.
4. Job1: Pull the GitHub repo automatically when a developer pushes to GitHub.
5. Job2: By looking at the code or program file, Jenkins should automatically start the container image that has the respective machine learning software and interpreter installed, deploy the code, and start training (e.g., if the code uses a CNN, Jenkins should start the container that already has all the software required for CNN processing).
6. Job3: Train the model and report its accuracy or metrics.
7. Job4: If the accuracy metric is less than 80%, tweak the machine learning model architecture.
8. Job5: Retrain the model, or notify that the best model has been created.
9. Create one extra job, Job6, for monitoring: if the container where the app is running fails for any reason, this job should automatically restart the container from where the last trained model left off.
Pre-Requisites:
-Red Hat Linux on a virtual machine
-Jenkins and Docker should be installed on the machine, and their services should be active
-ngrok should be installed on your system
-Here, I have implemented a CNN and used my own choice of data for the model-training code: the Dogs vs. Cats dataset from the Kaggle site https://www.kaggle.com/c/dogs-vs-cats. You will also need some additional software, such as Git Bash, installed on your Windows system.
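As a quick sanity check for these prerequisites, a small script like the sketch below (assuming a systemd-based Red Hat system; service names may differ on yours) can confirm that the Jenkins and Docker services are active:

```shell
#!/bin/bash
# Pre-flight check: confirm the Docker and Jenkins services are active.
# Assumes systemd; adjust the service names if yours differ.
for svc in docker jenkins; do
    if systemctl is-active --quiet "$svc" 2>/dev/null; then
        echo "$svc is active"
    else
        echo "$svc is NOT active -- try: systemctl start $svc"
    fi
done
```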
PROCESS:
1: Create a GitHub repository, set it up in your Git Bash environment, and change the post-commit hook so that it pushes code automatically when you commit. You can check this step in one of my previous articles; here's the link:
https://www.dhirubhai.net/pulse/integrating-jenkinsdocker-git-hub-creating-automated-web-patnaik
and check my GitHub repo for the files I have uploaded (mail.py, code.py, accuracy.txt). GitHub repo link: https://github.com/Pheonix-reaper/Task3_MLOPS
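The post-commit hook mentioned in step 1 can be as small as the sketch below (the branch name `master` is an assumption; use your repo's default branch):

```shell
#!/bin/sh
# Sample .git/hooks/post-commit -- pushes every commit automatically.
# Save it as .git/hooks/post-commit and run: chmod +x .git/hooks/post-commit
git push origin master 2>/dev/null \
    || echo "post-commit: push skipped (no remote configured here)"
```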
2: Open Jenkins in a browser on Windows, using your Red Hat IP and the port Jenkins is allotted. Start ngrok and give it a tunnel to port 80; we get a public link. Use that link as a webhook for our GitHub repo (with the Jenkins GitHub plugin, the webhook endpoint is typically <ngrok-link>/github-webhook/), so that our Red Hat system and GitHub are connected.
3: In your Red Hat system, create a new directory and, inside it, create a Dockerfile. Specify the libraries and tools you need in your Docker OS for this task.
Now create an image from it using the docker build command: docker build -t <imagename>:<tag> <path of dockerfile>
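As an illustration, a minimal Dockerfile for this step could look like the sketch below, written out via a heredoc so the build command can follow it. The base image, package list, and tag (mlops_image:v1) are all assumptions; pick whatever your training code actually needs:

```shell
#!/bin/bash
# Write a minimal Dockerfile for a Python3 + Keras training container.
# Base image and packages are assumptions, not the article's exact file.
cat > Dockerfile <<'EOF'
FROM centos:7
RUN yum install -y python3
RUN pip3 install numpy keras tensorflow pillow
WORKDIR /mlops
CMD ["python3", "code.py"]
EOF
# Build the image (the tag is an example):
docker build -t mlops_image:v1 . 2>/dev/null \
    || echo "docker build skipped (Docker not available here)"
```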
4: Now it's time to create all the jobs in Jenkins. Create Job1 and add a Git SCM trigger so that code is automatically copied into our Red Hat system from the repository. Its build step copies the code to /mlops/ and appends a few lines to code.py so that the final accuracy is written to accuracy.txt:
Code:-
sudo cp * /mlops/
sudo sed -i '$ a\accuracy= int("%2d" % (model.history.history["acc"][no_of_epochs - 1]*100))\nfile1=open("accuracy.txt","w")\nfile1.write("{}".format(accuracy))' /mlops/code.py
5: Create another job, Job2, which checks whether the code we have pushed to GitHub is CNN code. If it is, it automatically launches a Docker OS for us for model training; if it is not, it shows an error message.
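A minimal sketch of what Job2's shell step could look like. The grep-based check, the file path, and the image/container names are my assumptions, not the article's exact script:

```shell
#!/bin/bash
# Job2 sketch: decide whether the pushed code is CNN code and, if so,
# start the matching training container.
detect_cnn() {
    # Treat the file as CNN code if it uses a Keras convolution layer.
    grep -Eq 'Conv2D|Convolution2D' "$1"
}

CODE_FILE="${CODE_FILE:-/mlops/code.py}"
if [ -f "$CODE_FILE" ] && detect_cnn "$CODE_FILE"; then
    echo "CNN code detected -- launching training container"
    docker run -dit --name mlops_container -v /mlops:/mlops mlops_image:v1 \
        || echo "could not start container (is Docker available?)"
else
    echo "ERROR: $CODE_FILE is not CNN code (or does not exist)"
fi
```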
6: Create another job, Job3, which trains the model, finds the accuracy, and stores it in a file called accuracy.txt.
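Job3's build step could be as simple as the following sketch. It assumes the container from Job2 is named mlops_container and that /mlops is bind-mounted from the host, so the accuracy.txt written by code.py lands on the host as well:

```shell
#!/bin/bash
# Job3 sketch: run the training script inside the container. Because
# /mlops is a bind mount, the accuracy.txt that code.py writes is also
# visible on the host, from where it is copied for Job4 to read.
docker exec -w /mlops mlops_container python3 code.py \
    && cp /mlops/accuracy.txt /root/accuracy.txt \
    || echo "training skipped (container not running on this machine)"
```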
7: Install the Downstream plugin to get the extended "build project" action. Create a job, Job4, which checks whether the model has the accuracy we need. If yes, it executes Job5, which sends a mail to the developer; if not, we tweak the hyperparameters and call Job2 again for model training.
Code:
acc=$(sudo cat "/root/accuracy.txt")
acc_req=85
if [ $acc -ge $acc_req ]
then
    echo "Accuracy is good"
    exit 1
else
    sudo sed -i 's/no_of_epochs = 1/no_of_epochs = 3/' /mlops/code.py
    sudo sed -i '/Flatten()/i \model.add(Convolution2D(filters=32,kernel_size=(k_size,k_size),activation="relu"))\nmodel.add(BatchNormalization())\nmodel.add(MaxPooling2D(pool_size= p_size))\nmodel.add(Dropout(0.25))' /mlops/code.py
    sudo sed -i '/sigmoid/i \model.add(Dense(units=256, activation="relu"))\nmodel.add(BatchNormalization())\nmodel.add(Dropout(0.25))' /mlops/code.py
fi
8: Create a job, Job5, which sends a mail to the developer when the model reaches the required accuracy.
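The article's repo uses a mail.py for this notification; an equivalent shell-only sketch (the mail/mailx command and the recipient address are assumptions) could look like:

```shell
#!/bin/bash
# Job5 sketch: mail the developer the final accuracy from /root/accuracy.txt.
# The recipient address is a placeholder.
ACC=$(cat /root/accuracy.txt 2>/dev/null || echo "unknown")
echo "Best model created -- final accuracy: ${ACC}%" \
    | mail -s "MLOps: model training finished" developer@example.com \
    || echo "mail unavailable; fall back to: python3 /mlops/mail.py"
```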
9: Create a job, Job6, which monitors the container we have created. If the container gets destroyed or stops working, it deploys the container again.
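A sketch of Job6's monitor step, under the same naming assumptions as the earlier steps. Because /mlops lives on the host, whatever the model last saved there survives a container crash, which is what lets the redeployed container pick up from the last trained state:

```shell
#!/bin/bash
# Job6 sketch: restart the training container if it has died. The /mlops
# bind mount persists on the host, so saved model state survives a crash.
container_running() {
    docker ps --format '{{.Names}}' 2>/dev/null | grep -qx "$1"
}

NAME="${NAME:-mlops_container}"
if container_running "$NAME"; then
    echo "$NAME is running -- nothing to do"
else
    echo "$NAME is down -- redeploying"
    docker start "$NAME" 2>/dev/null \
        || docker run -dit --name "$NAME" -v /mlops:/mlops mlops_image:v1 \
        || echo "could not redeploy $NAME (is Docker available?)"
fi
```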
10: Install the Build Pipeline plugin and create a pipeline for the jobs we have created:
You can see that after the Docker container is created in Job2, if the container fails at any point from Job3 to Job4, Job6 is executed to deploy the container again.
With this, we have completed the task. It is a very powerful setup, as it automatically finds the best configuration for creating an efficient model. This is, in a very basic form, how Amazon SageMaker works; although SageMaker is far more sophisticated, the working principle is almost the same.
I have explained every step I took for this task and why it matters. If you follow all the steps I have mentioned, you can create this setup too.