Integrating a Deep Learning Model with CI/CD Operations using Jenkins and Docker to Automate the Complete Delivery and Deployment of DL Code

Integration of a Deep Learning Model with DevOps Operations using Jenkins

 

Hey guys, today's article is about automating a machine learning model. Sounds great, doesn't it? Now you must be wondering what automation of machine learning means: once you start your training, the model keeps retraining itself until it reaches the best accuracy, and it can even notify you once all the work is done. Read ahead if you want to learn how to build it.

Technologies Used: TensorFlow, Keras, git, Github, Jenkins, Docker, Red Hat Linux

Let's look at the problem statement. In totality, I have created 6 jobs:

Job1: Pull the GitHub repo automatically when a developer pushes code to GitHub.

Job2: By looking at the code or program file, Jenkins should automatically start a container from an image that already has the respective machine learning software and interpreter installed, deploy the code, and start training (e.g., if the code uses a CNN, Jenkins should start a container that already has all the software required for CNN processing).

Job3: Train the model and record its accuracy or metrics.

Job4: If the accuracy metric is less than the target, tweak the machine learning model architecture.

Job5: Retrain the model, or notify that the best model has been created.

After implementation, the overall structure of the Build Pipeline in Jenkins looks like this:



Before starting, I created a Docker image in which I configured Miniconda, which provides almost all the software and packages needed to implement a deep learning model. You can see my Docker image on Docker Hub.

You can also pull this image by simply typing:

docker pull priyansh9879/centos-miniconda3



Job1 - Pulling the code from GitHub to the RHEL8 machine using the Poll SCM trigger.

On the base RHEL8 machine, we have configured the directory with git and automatic pushing using the hooks concept in git.
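The hooks setup described above can be sketched as follows. This is only an illustration of the concept: the repo name "hook-demo" is an example, and the real hook would push to the actual GitHub remote.

```shell
# Sketch of the git hooks concept: a post-commit hook that pushes every
# new commit, so Jenkins' Poll SCM trigger on Job1 picks up the change.
# The repo name "hook-demo" is only an example.
git init -q hook-demo
cat > hook-demo/.git/hooks/post-commit <<'EOF'
#!/bin/bash
# Runs automatically after every `git commit`
git push origin master
EOF
chmod +x hook-demo/.git/hooks/post-commit
```

With this in place, every `git commit` in the repo also runs `git push origin master` (assuming the `origin` remote is configured), which is what lets Jenkins detect the change via Poll SCM.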



Whenever we commit any file, the code gets pushed to the GitHub repo. Here, we have created a CNN model using the MNIST dataset to predict handwritten digits.
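As a rough idea of what such a model looks like, a minimal Keras CNN for MNIST digit classification might be built as below. This is only an illustration of the approach, not the actual cnn.py from the repo; the layer sizes are assumptions.

```python
# Minimal sketch of an MNIST CNN in Keras (illustrative; not the exact
# cnn.py from the repo). Layer sizes are assumptions.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

def build_model():
    model = Sequential([
        # 28x28 grayscale MNIST images, one channel
        Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
        MaxPooling2D((2, 2)),
        Flatten(),
        Dense(64, activation="relu"),
        Dense(10, activation="softmax"),  # one class per digit 0-9
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```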




You can use the code from the GitHub repo.


Let's look at the configuration of Job1.


In the end, after downloading the code, this job copies all the files to a specific location. We have to use the following command:

sudo cp * /root/deploy-dl-code

         [ /root/deploy-dl-code is the directory where my code will be copied ]


After its completion, Job2 will trigger automatically.

Job2 - Building the Container Specified for the Model Code and its Requirements.

In Job2, I have automated the process of building the container from the Miniconda Docker image. Besides building the container with all the software, I have made a requirements.txt file in which I have specified all the packages required for training our model. The requirements.txt file is also copied from GitHub to the target location. Let's see how Job2 works.
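The article does not show the contents of requirements.txt. Since the file is handed to `conda install` later in this job, it would simply list conda package names, one per line; a plausible version for this CNN job might look like the following (the exact package set is an assumption, not the repo's actual file):

```
keras
numpy
pandas
matplotlib
```

Conda resolves each of these into the tensorflow environment that already exists in the Miniconda image.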

Here, Job2 is the downstream of Job1 and the upstream of Job3.


The complete shell script is described as under:

if ! sudo docker images | grep centos-miniconda3
then
       # pull the image only when it is not already present on the system
       echo
       sleep 2s
       echo "++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
       echo "                          pulling image                             "
       echo "++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
       sudo docker pull priyansh9879/centos-miniconda3:8
       sleep 4s
       echo
fi

sleep 2s

if sudo docker ps -a | grep deploy-dl-code
then
       echo
       sleep 2s
       echo "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
       echo "            Starting Existing Container deploy-dl-code              "
       echo "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
       sudo docker start deploy-dl-code
       sleep 2s
       echo
else
       sleep 2s
       echo "++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
       echo "                 Building Container deploy-dl-code                  "
       echo "++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
       sudo docker run -dit --name deploy-dl-code -v /root/deploy-dl-code/:/dlcode priyansh9879/centos-miniconda3:8
       sleep 5s
       echo
fi

echo "++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
echo "       Installing all required packages from requirements.txt        "
echo "++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
echo
sleep 2s

sudo docker exec deploy-dl-code /root/miniconda3/condabin/conda install -n tensorflow --yes --file /dlcode/requirements.txt



The above logic is designed so that:

1. If the image is not present on the system, it downloads the image from my Docker repo. If it already exists, it moves straight on to building the container.

2. If the container is not present, it launches a container named deploy-dl-code and mounts the volume where our code is present inside the container. If the container already exists, it won't be created again; if it is stopped, it will be started automatically.

3. After the container is up, the last command installs all the required software inside the container, in the tensorflow environment by default.

After its completion, Job3 will trigger automatically.

Job3 - Training our model inside the container.

In Job3, Jenkins starts training the model inside the container.

Here, Job3 is the downstream of Job2 and the upstream of Job4.



The shell command that tells the container to start training the model cnn.py is:

sudo docker exec deploy-dl-code /root/miniconda3/envs/tensorflow/bin/python3 /dlcode/cnn.py

            [ /dlcode/cnn.py is the path of my code ]



After training the model, it stores the accuracy result in the cnn_resultbestaccuracy.txt file.
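The tail of cnn.py that writes this file is not shown in the article; it might look like the sketch below. The variable name and placeholder value are assumptions; in the real script, the accuracy would come from `model.evaluate(...)`.

```python
# Hypothetical tail of cnn.py: persist the final accuracy so that the
# downstream Jenkins jobs can read it back from a plain-text file.
accuracy = 0.8732  # placeholder value; real code gets this from evaluation

# Write it as a bare percentage: the Job4 shell script later pipes this
# number into `bc` to compare it against the accuracy target.
with open("cnn_resultbestaccuracy.txt", "w") as f:
    f.write(f"{accuracy * 100:.6f}")
```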

After its completion, Job4 will trigger automatically.

Job4 - Tweaking the model again to get the highest accuracy.

In Job4, I designed the logic by which we can automatically improve the accuracy of our CNN model without changing the code manually. It took me 12 hours to design this logic, which is the most interesting part of this article.

In Job3, after training the model successfully, it saves the accuracy result in the cnn_resultbestaccuracy.txt file. On the first run, if the model gives an accuracy of less than 90%, Job4 runs another file called filehandling.py, which adds a Dense layer inside our main code. After this, the shell script invokes Job3 again to start building the model once more. For invoking Job3 from the shell, we have used Jenkins' "Trigger builds remotely" feature with an authentication token.
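The article does not show filehandling.py itself; based on the description, it rewrites cnn.py with one extra Dense layer each time the accuracy target is missed. A minimal sketch of that idea follows. The marker line and layer size are assumptions; the real filehandling.py from the repo may work differently.

```python
# Hypothetical sketch of filehandling.py: rewrite cnn.py with one extra
# Dense layer appended after a known marker line. The marker and layer
# size are assumptions, not the repo's actual logic.
import os

MARKER = "model.add(Flatten())"
NEW_LAYER = "model.add(Dense(units=64, activation='relu'))\n"

def add_dense_layer(path):
    with open(path) as f:
        lines = f.readlines()
    out = []
    for line in lines:
        out.append(line)
        if MARKER in line:  # insert the new layer right after the marker
            out.append(NEW_LAYER)
    with open(path, "w") as f:
        f.writelines(out)

# /dlcode/cnn.py is the code path used elsewhere in this article
if os.path.exists("/dlcode/cnn.py"):
    add_dense_layer("/dlcode/cnn.py")
```

Each invocation makes the network one Dense layer deeper, which is what lets the Job3/Job4 loop keep tweaking the architecture without any manual edit.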



In simple words, Job4 works like this:

a. If the model accuracy is less than 90%, the shell script runs the filehandling.py file and then invokes Job3 to train the model again. This process continues until we get an accuracy of more than 90%.

b. Once the accuracy is more than 90%, it comes out of the loop and stops the tweaking process. Here is the shell script:

train=$(sudo cat /root/deploy-dl-code/test.txt)
pred=90.000000

# compare the two floating-point values with bc (bash cannot compare floats itself)
st=`echo "$train < $pred" | bc`
if [ $st -eq 1 ]; then
       ## If accuracy is not the desired accuracy
       echo "Tweaking Model again by triggering Job3"
       sudo docker exec deploy-dl-code /root/miniconda3/envs/tensorflow/bin/python3 /dlcode/filehandling.py
       curl -X POST https://192.168.99.111:8080/job/Job3-Train_Model/build?token=job3 --user priyansh:11cecf8d41bce413ad35249c815a28e2a8
else
       ## If accuracy is greater than the desired one
       echo "Model Successfully tweaked and your Accuracy is improved"
fi


After its completion, Job5 will trigger automatically.

Job5 - Displaying the Best Accuracy.

Finally, in Job5, Jenkins displays the accuracy result. For this, I have used a Jenkins plugin called Summary Display, which displays the summary of the complete job.



The shell script command looks like:

sudo docker exec deploy-dl-code cat /dlcode/cnn_resultbestaccuracy.txt

So guys, this is the final step of my article: a fully automated DevOps CI/CD pipeline for deep learning code. It took me 37 hours to complete this project and write an article on it. I would like to thank my DL and DevOps team members:

  1. Rohan Singh Shekhawat.
  2. Chandra Shekhar Sharma.
  3. Priyansh Magotra.
  4. Sagar Jangid.

