MLOps - Automated Tuning: Automating Machine Learning using DevOps for Tuning Hyperparameters
Machine learning is the trend of the time, one that almost everyone in CS/IT knows about or aspires to learn. But a key problem in creating the best model is deciding the hyperparameters for it.
What are Hyperparameters?
A hyperparameter is a parameter that is set before the learning process begins, such as the number of layers or the number of neurons in a layer.
These parameters have a direct impact on the accuracy of the model. Since they are set by us and not learned automatically by the machine, it is absolutely necessary to select the best hyperparameter values so as to increase the overall accuracy of the model.
(Please check the cover image for a better understanding)
Choosing the best values is not easy, especially in deep learning, where many trials are needed to find the best ones for a model; doing this manually is both tiring and time-consuming. To solve this problem, our mentor Vimal Daga Sir gave us a unique project that required using DevOps to automate the trial-and-error process and thus select the best hyperparameters.
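To see why manual trial and error quickly becomes tedious, here is a minimal sketch (plain Python, with purely illustrative hyperparameter ranges that are my own assumption, not values from this project) of how fast the number of combinations grows:

```python
from itertools import product

# Hypothetical search ranges for a small CNN (values are illustrative only)
n_filters = [32, 64, 128]
filter_sizes = [3, 5, 7]
pool_sizes = [2, 3]
fc_neurons = [64, 128, 256, 512]

# Every combination is one full training run of the model
combinations = list(product(n_filters, filter_sizes, pool_sizes, fc_neurons))
print(len(combinations))  # 3 * 3 * 2 * 4 = 72 trainings for just one layer
```

Even these tiny ranges demand dozens of full trainings for a single layer, which is exactly the work the setup below hands over to Jenkins.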
Now that we are clear about the objective of the task, let's begin with the procedure to implement it.
I have used 6 jobs in Jenkins to implement this, which are as follows:
Job 1 : Pull GitHub Code
When the developer pushes any code to GitHub, this job copies it into the local repository on our system. For this, I have used Poll SCM to keep checking the remote repository for changes.
sudo cp -v * /home/jyotirmaya/ws/mlops1
The above command transfers the code files copied to Jenkins from GitHub into my local repository /home/jyotirmaya/ws/mlops1.
Job 2 : See Code and Launch
It will do the following tasks :
1) Check whether the code (stored in program.py) is a CNN or not (checked using the program checkcode.py)
2) If the code is a CNN, launch its container from the image (convoimage:v13) created using the Dockerfile.
checkcode.py, as you can see below, is an extremely simple script based on the observation that any Keras CNN model will definitely contain two words that implement its modules: keras and Conv2D. If these words are present, the program prints kerasCNN.
programfile = open('/home/jyotirmaya/ws/mlops1/program.py', 'r')
code = programfile.read()
if 'keras' in code or 'tensorflow' in code:
    if 'Conv2D' in code or 'Convolution' in code:
        print('kerasCNN')
    else:
        print('not kerasCNN')
else:
    print('not deep learning')
As you can see in the image below, I compared the output of the above program in Jenkins and launched my container using the image created from the Dockerfile.
Here is my Docker File Code
The Dockerfile, as you can see here, has just a few lines of code, but the image it builds is 2.62 GB in size, and it took me 13 versions to arrive at one that satisfied my requirements.
The line below runs "python3 /mlops/program.py" as soon as the container is launched, where /mlops/ is the path of the code file program.py inside the Docker container. The directory was created using Docker's volume-mounting feature, as the mlops folder is linked to the local repository on the base OS.
CMD [ "python3","/mlops/program.py" ]
The program.py code was actually LeNet for the MNIST dataset, but I modified the layers part. The configuration of the Convolution and Fully Connected layers, which are in fact the hyperparameters, is now read from a file, input.txt.
convlayers = int(input())
first_layer_nfilter = int(input())
first_layer_filter_size = int(input())
first_layer_pool_size = int(input())
model.add(Conv2D(first_layer_nfilter, (first_layer_filter_size, first_layer_filter_size), padding = "same", input_shape = input_shape))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size = (first_layer_pool_size, first_layer_pool_size)))

# Subsequent CRP sets
for i in range(1, convlayers):
    nfilters = int(input())
    filter_size = int(input())
    pool_size = int(input())
    model.add(Conv2D(nfilters, (filter_size, filter_size), padding = "same"))
    model.add(Activation("relu"))
    model.add(MaxPooling2D(pool_size = (pool_size, pool_size)))

# Fully connected layers (w/ RELU)
model.add(Flatten())
fc_input = int(input())
for i in range(0, fc_input):
    no_neurons = int(input())
    model.add(Dense(no_neurons))
    model.add(Activation("relu"))
I did this because in subsequent runs, when the hyperparameters need to be tweaked to improve accuracy, Jenkins can simply run a program (tweaker.py) that changes the contents of the input file; the hyperparameters change without touching the main code file.
The program tweaker.py is, for me, the soul of this setup, and it is where I began the entire project. Job 4 contains a proper explanation of what it actually does.
Job 3 : Predict Accuracy
The task it performs is very simple: the accuracy, along with the setup details, is deployed on the Apache web server so that the user can directly access and see it at the following URL:
IPofSystem/display_matter.html
sudo cp /home/jyotirmaya/ws/mlops1/display_matter.html /var/www/html
Just write the above command in the Execute Shell of Jenkins job 3.
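How display_matter.html gets its contents is not shown in the article; here is a minimal sketch, assuming program.py writes the accuracy to accuracy.txt and a small helper renders the page. Only the file names accuracy.txt and display_matter.html come from the article; the rendering code itself is my assumption:

```python
# Hypothetical helper: render the latest accuracy as a simple HTML page
def render_page(accuracy_file, html_file):
    with open(accuracy_file) as f:
        accuracy = f.read().strip()
    html = "<html><body><h1>Model accuracy: {}</h1></body></html>".format(accuracy)
    with open(html_file, "w") as f:
        f.write(html)
    return html

# Example: record an accuracy, then render the page job 3 will copy to Apache
with open("accuracy.txt", "w") as f:
    f.write("0.9921")
print(render_page("accuracy.txt", "display_matter.html"))
```

Job 3 then only has to copy the resulting file into /var/www/html, as the shell command above shows.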
Job 4 : Analyse Accuracy and move
This job performs the following tasks :
1) Checks the accuracy; if it is less than required, tweaks the hyperparameters using tweaker.py and triggers job 2 (See Code and Launch) again to launch the container and run the model once more.
2) If the accuracy requirement is met, triggers job 5 (Model Create Success).
Now let's see how tweaker.py tweaks the hyperparameters...
When tweaker.py is called, it compares the old accuracy (initially 0) with the new accuracy obtained from running the container. If the accuracy has increased, it increases the value of the first hyperparameter (here, the number of filters) of the base convolution layer.
It also replaces the old accuracy in the data.txt file with the new one, for use in the next build's comparison.
As soon as the hyperparameter value is changed, job 2 is re-run to measure the accuracy again.
Now, if the accuracy has increased again, the increase was good and the value can be pushed further, so tweaker.py increases that parameter's value once more.
But if it finds that the accuracy has decreased, tweaker.py reverts the parameter to its previous value and starts changing the value of the next hyperparameter (in our case, the filter size).
On every call, it repeats this process until no more hyperparameters in that layer can be increased; when that happens, it adds another layer and repeats the whole process in the new layer.
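The original tweaker.py appears only in images, but the greedy, one-parameter-at-a-time strategy described above can be sketched as follows. The step sizes, parameter names, and helper function here are my assumptions, not the original code:

```python
# Hypothetical sketch of tweaker.py's strategy: keep pushing one
# hyperparameter while accuracy improves; revert and move to the
# next hyperparameter as soon as accuracy drops.
STEP = {"nfilters": 32, "filter_size": 2, "pool_size": 1}
ORDER = ["nfilters", "filter_size", "pool_size"]

def tweak(params, old_acc, new_acc, current):
    """params: the current layer's hyperparameters.
    current: index into ORDER of the parameter being tuned.
    Returns the updated (params, current) pair."""
    name = ORDER[current]
    if new_acc >= old_acc:
        # Accuracy improved (or first run): push the same parameter further
        params[name] += STEP[name]
    else:
        # Accuracy dropped: revert and move on to the next hyperparameter
        params[name] -= STEP[name]
        current += 1
    return params, current

params = {"nfilters": 32, "filter_size": 3, "pool_size": 2}
params, current = tweak(params, 0.0, 0.91, 0)         # improved: grow nfilters
print(params["nfilters"])                              # 64
params, current = tweak(params, 0.91, 0.88, current)   # dropped: revert, next param
print(params["nfilters"], ORDER[current])              # 32 filter_size
```

In the real pipeline, the updated values would be written back to input.txt and the new accuracy to data.txt, and job 2 would be triggered to evaluate them.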
Here are the detailed images to support the written matter.
Above was the explanation of how tweaker.py actually works.
Now let's see the code used to implement this job in Jenkins.
The code shown in the image above is:
if [[ "$(sudo cat /home/jyotirmaya/ws/mlops1/accuracy.txt)" < "0.9999999" ]]
then
    echo "Tweaking The program"
    sudo python3 /home/jyotirmaya/ws/mlops1/tweaker.py
    curl 192.168.43.250:8080/view/Integrate%20Machine%20Learning%20with%20Jenkins/job/See%20code%20and%20Launch/build?token=tweakedNowRun
else
    echo "Merge and Email"
    curl 192.168.43.250:8080/job/Model%20Create%20Success/build?token=modelCreateSuccess
fi
Here 0.9999999 is the target accuracy the model must achieve to be accepted as successful.
The first curl command triggers job 2, since the hyperparameters have been tweaked and are ready to be tested.
The second curl command triggers job 5 on successful model creation.
Job 5 : Model Create Success
This job is triggered when the required accuracy is met; the input file is mailed to the developer so that they know the correct hyperparameter values.
sudo python3 /home/jyotirmaya/ws/mlops1/email.py
Just write the above command in the Execute Shell of the Jenkins job 5.
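email.py is not reproduced in the article; here is a minimal sketch of what it might look like using Python's standard email and smtplib modules. The addresses and SMTP server are placeholders, and the actual sending is commented out so the message can be built and inspected without a mail server:

```python
import smtplib
from email.message import EmailMessage

def build_mail(input_path, sender, recipient):
    # Attach the tuned hyperparameter file so the developer can read the values
    msg = EmailMessage()
    msg["Subject"] = "Model created successfully: tuned hyperparameters attached"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("The model met the target accuracy. Hyperparameters attached.")
    with open(input_path, "rb") as f:
        msg.add_attachment(f.read(), maintype="text", subtype="plain",
                           filename="input.txt")
    return msg

if __name__ == "__main__":
    msg = build_mail("/home/jyotirmaya/ws/mlops1/input.txt",
                     "jenkins@example.com", "developer@example.com")
    # Sending is environment-specific; with an SMTP-over-SSL server it would be:
    # with smtplib.SMTP_SSL("smtp.example.com", 465) as s:
    #     s.login("jenkins@example.com", "app-password")
    #     s.send_message(msg)
```

The attached input.txt holds exactly the values that produced the accepted model, which is what the developer needs to reproduce it.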
Job 6 : Restart Docker
This is a monitoring job, triggered when job 2 fails, i.e. when for any reason the container kerasCNNos fails to complete execution.
It restarts the Docker engine completely to make sure it is working fine, because in our setup this is the most likely cause of a job 2 failure.
And then it triggers job 2 once again.
Using the above setup, I achieved an accuracy of 99.21% with 5 epochs per training run, within the few hours before I manually stopped the builds.
No. of convolve layers : 2
Layer 1
No of filters : 128
Filter Size : 7
Pool Size : 2
Layer 2
No of filters : 2048
Filter Size : 2
Pool Size : 2
No. of FC Layers : 1
Neurons in Layer 1 : 10
Accuracy achieved : 0.9921000003814697
My GitHub Link for the above codes : https://github.com/JyotirmayaV/mlops1/tree/developer
Thank you for reading. This project was truly a great learning experience and taught me many things, especially the real meaning of MLOps.