Automation of Machine Learning with DevOps
HYPERPARAMETERS
In the practice of machine learning and deep learning, parameters are properties that the classifier or other ML model learns from the training data on its own during training. Examples include the weights and biases of a neural network, or the split points in a decision tree.
Hyperparameters, by contrast, are properties that govern the entire training process. They include the variables that determine the network structure (for example, the number of hidden units) and the variables that determine how the network is trained (for example, the learning rate). Hyperparameters are set before training begins (before the weights and biases are optimized).
For example, here are some common hyperparameters:
- Learning Rate
- Number of Epochs
- Hidden Layers
- Hidden Units
- Activation Functions
Hyperparameters are important because they directly control the behavior of the training algorithm and have a significant impact on the performance of the model being trained.
Choosing appropriate hyperparameters plays a key role in the success of neural network architectures, given their impact on the learned model. For instance, if the learning rate is too low, the model may miss important patterns in the data; conversely, if it is too high, the updates may overshoot the minimum and training may fail to converge.
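A toy gradient-descent run (a sketch for illustration, not part of the original project) shows both failure modes of the learning rate:

```python
# Minimize f(x) = x^2 with gradient descent; the gradient is f'(x) = 2x.
def gradient_descent(lr, steps=50, x0=5.0):
    """Run `steps` updates x <- x - lr * f'(x) and return the final x."""
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x
    return x

too_low = gradient_descent(lr=0.001)   # barely moves away from x0 = 5
good = gradient_descent(lr=0.1)        # converges close to the minimum at 0
too_high = gradient_descent(lr=1.5)    # each step overshoots: |x| blows up
```

With lr=1.5 each update multiplies x by (1 - 2·lr) = -2, so the iterate oscillates with exponentially growing magnitude, which is the "too high" failure described above.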
Choosing good hyperparameters provides two main benefits:
- Efficient search across the space of possible hyperparameters; and
- Easier management of a large set of experiments for hyperparameter tuning.
Task description
1. Create a container image that has Python 3 and Keras or NumPy installed, using a Dockerfile.
2. When we launch this image, it should automatically start training the model inside the container.
3. Create a job chain of job1, job2, job3, job4 and job5 using the Build Pipeline plugin in Jenkins.
4. Job1: Pull the GitHub repo automatically whenever a developer pushes to GitHub.
5. Job2: By looking at the code or program file, Jenkins should automatically start a container from the image that has the respective machine learning software installed, deploy the code there, and start training (e.g., if the code uses a CNN, Jenkins should start the container that already has all the software required for CNN processing).
6. Job3: Train the model and evaluate its accuracy or metrics. If the accuracy is less than 95%, tweak the machine learning model architecture.
7. Job4: Notify that the best model has been created.
8. Create one extra job, job5, for monitoring: if the container where the app is running fails for any reason, this job should automatically restart the container and resume from where the last trained model left off.
Project Description:
1. Building the Docker images with TensorFlow and sklearn installed, using a Dockerfile:
- This is my Dockerfile for the Keras model:
- In this Dockerfile I have used CentOS as the OS and installed all the libraries required for running the Keras code.
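The original post shows the Dockerfile as a screenshot; a minimal version consistent with that description (CentOS base, Python 3, Keras stack; exact package list is an assumption) might look like:

```dockerfile
# Hypothetical reconstruction of the Keras image (m1:v1).
FROM centos:7
# Python 3 and pip
RUN yum install -y python3 && pip3 install --upgrade pip
# Libraries needed by the Keras training code (assumed set)
RUN pip3 install numpy pandas tensorflow keras pillow
# The training script is bind-mounted into /mlops at run time,
# so the container only needs the interpreter as entrypoint.
ENTRYPOINT ["python3"]
```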
Run the command "docker build -t m1:v1 ." to create your image.
Dockerfile for the sklearn model:
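This Dockerfile is also only shown as a screenshot in the original; a plausible minimal sketch (package list assumed) would be:

```dockerfile
# Hypothetical reconstruction of the sklearn image (m2:v1).
FROM centos:7
RUN yum install -y python3 && pip3 install --upgrade pip
# Libraries needed by the sklearn training code (assumed set)
RUN pip3 install numpy pandas scikit-learn joblib
# As with the Keras image, the code is bind-mounted into /mlops.
ENTRYPOINT ["python3"]
```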
Run the command "docker build -t m2:v1 ." to create the image.
Finally, my Docker images are created.
2. Jobs in Jenkins:
Job1 (copy GitHub code):
I use the MNIST dataset for this model. You can check the code here:
Github URL- https://github.com/anurag08-git/ML.git
Now we create the first job of the task. When a developer pushes any code to the GitHub repository, this job automatically detects it and copies the code to the host OS.
This job copies the GitHub code into the directory /root/mlops.
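The build step itself is not shown in the post; a sketch of Job1's "Execute shell" step might look like the following. In Jenkins, WORKSPACE points at the checked-out repo and the target would be /root/mlops; the defaults and the stand-in file below are only there so the sketch is runnable locally.

```shell
# Sketch of Job1's "Execute shell" build step (reconstructed, not from the post).
# Jenkins polls GitHub (Poll SCM), checks the repo out into $WORKSPACE,
# and this step copies the code to the host directory the containers mount.
WORKSPACE=${WORKSPACE:-$(mktemp -d)}   # set by Jenkins; temp dir for local runs
DEST=${DEST:-$(mktemp -d)}             # would be /root/mlops on the real host
echo "print('training')" > "$WORKSPACE/cnn.py"   # stand-in for the repo code
mkdir -p "$DEST"
cp -rvf "$WORKSPACE"/* "$DEST"/
```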
- Job2 (deploy a container to train the model):
If job1 builds successfully, it triggers job2, which launches the container. By looking at the code or program file, this job automatically launches the respective Docker container (either Keras or sklearn).
This is the code I wrote in this job:
# If the code mentions keras, launch a container from the Keras image (m1:v1),
# removing any old container with the same name first.
if sudo grep -q keras /root/mlops/cnn.py
then
    if sudo docker ps -a | grep -q mlops
    then
        sudo docker rm -f mlops
    fi
    sudo docker run -t -v /root/mlops:/mlops --name mlops m1:v1 /mlops/cnn.py
fi

# If the code mentions sklearn, launch a container from the sklearn image (m2:v1).
if sudo grep -q sklearn /root/mlops/cnn.py
then
    if sudo docker ps -a | grep -q sklearn
    then
        sudo docker rm -f sklearn
    fi
    sudo docker run -dit -v /root/mlops:/mlops --name sklearn m2:v1 /mlops/cnn.py
fi
- Job3 (check the accuracy, tweak the code, and run again until the required accuracy is reached):
This is the most important job of the whole project. It checks the accuracy of the model trained in job2 and, if the accuracy is below the required threshold, makes some changes to the code and reruns the container until the required accuracy is found.
I take the number of epochs and the number of convolutional layers as parameters and increase each of them by one every time the dataset is retrained.
# Check the accuracy of the trained model.
read_accuracy=$(sudo cat /root/mlops/accuracy.txt)
final_accuracy=95
compare=$(echo "$read_accuracy > $final_accuracy" | bc)
no_epoch=1
no_layer=1
# This loop stops once the accuracy exceeds the required accuracy.
while [[ $compare != 1 ]]
do
    let no_epoch+=1
    let no_layer+=1
    # Patch the hyperparameters inside the training script.
    sudo sed -i '/no_epoch=/c\no_epoch='$no_epoch /root/mlops/cnn.py
    sudo sed -i '/no_layer=/c\no_layer='$no_layer /root/mlops/cnn.py
    # Relaunch the training container with the updated script.
    sudo docker rm -f mlops
    sudo docker run -t -v /root/mlops:/mlops --name mlops m1:v1 /mlops/cnn.py
    # Re-read the accuracy produced by the new run before comparing again.
    read_accuracy=$(sudo cat /root/mlops/accuracy.txt)
    compare=$(echo "$read_accuracy > $final_accuracy" | bc)
done
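This loop reads /root/mlops/accuracy.txt, so the training script has to write its result there. The post does not show that part of cnn.py; a hypothetical tail of the script, writing the accuracy as a percentage so the shell step can compare it with bc, could be:

```python
# Hypothetical tail of cnn.py (assumed, not shown in the post):
# after training, persist the test accuracy as a percentage.
accuracy = 0.9312  # stand-in for the value model.evaluate() would return
with open("accuracy.txt", "w") as f:
    f.write(str(round(accuracy * 100, 2)))
```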
If anything goes wrong, it triggers job5.
- Job4 (this job sends a mail to the developer on successful training of the model):
After the required accuracy is reached, it sends a mail to the developer.
# mail.py
import smtplib

# create an SMTP session
s = smtplib.SMTP('smtp.gmail.com', 587)
# start TLS for security
s.starttls()
# authentication
s.login("sender_email", "password")
# message to be sent
message = "Hey Developer, finally we got the model trained."
# send the mail
s.sendmail("sender_email", "developer_mail", message)
# terminate the session
s.quit()
- Job5 (this job monitors job2 and job3; if any container fails, it relaunches the container):
This job relaunches the container if any container fails, and also sends an email reporting the failure.
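The monitoring step is not shown in the post; a sketch of the logic (container and image names taken from Job2, the rest assumed) could be:

```shell
# Sketch of Job5's monitoring step (reconstructed, not from the post).
# Succeeds if the named container appears in `docker ps`.
container_running() {
    docker ps --format '{{.Names}}' | grep -qx "$1"
}

if ! command -v docker >/dev/null
then
    # Guard so the sketch also runs on hosts without Docker installed.
    echo "docker not installed; skipping the live check"
elif ! container_running mlops
then
    # /root/mlops is bind-mounted, so weights saved by the training script
    # survive the crash and a fresh container resumes from the last state.
    sudo docker rm -f mlops 2>/dev/null || true   # ignore if already removed
    sudo docker run -t -v /root/mlops:/mlops --name mlops m1:v1 /mlops/cnn.py
fi
```

In Jenkins this job would be scheduled periodically (e.g., with a cron-style build trigger) so the check runs even when no push has occurred.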