Integrating a Deep Learning Model with CI/CD Operations using Jenkins and Docker to Automate the Complete Delivery and Deployment of DL Code

Integration of a Deep Learning Model with DevOps Operations using Jenkins

 

Hey guys, today's article is about automating a machine learning model. Sounds great, doesn't it? Now you must be wondering what automation of machine learning means: once you start your training, the model keeps retraining itself until it reaches the best accuracy, and it can even notify you once all the work is done. Read ahead if you want to learn how to build it.

Technologies Used: TensorFlow, Keras, git, Github, Jenkins, Docker, Red Hat Linux

Let's look at the problem statement. In totality, I have created 6 jobs:

Job1: Pull the GitHub repo automatically when a developer pushes code to GitHub.

Job2: By looking at the code or program file, Jenkins should automatically start a container from an image that already has the respective machine learning software and interpreter installed, deploy the code, and start training (e.g., if the code uses a CNN, Jenkins should start a container that already has all the software required for CNN processing).

Job3: Train the model and record its accuracy or metrics.

Job4: If the accuracy metric is less than the target, tweak the machine learning model architecture.

Job5: Retrain the model, or notify that the best model has been created.

After implementation, the overall structure of the Build Pipeline in Jenkins looks like this:



Before starting, I created a Docker image in which I configured Miniconda, which provides almost all the software and packages needed to implement a deep learning model. You can see my Docker image on Docker Hub.

You can also pull this image by simply typing:

docker pull priyansh9879/centos-miniconda3



Job1 - Pulling the code from GitHub to the RHEL8 machine using the Poll SCM trigger.

On the base RHEL8 machine, we have configured the directory with git and automatic pushing using the hooks concept in git.
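The hooks setup described above can be sketched as follows. This is only an illustration of the concept: the repo name "hook-demo" is an example, and the real hook would push to the actual GitHub remote.

```shell
# Sketch of the git hooks concept: a post-commit hook that pushes every
# new commit, so Jenkins' Poll SCM trigger on Job1 picks up the change.
# The repo name "hook-demo" is only an example.
git init -q hook-demo
cat > hook-demo/.git/hooks/post-commit <<'EOF'
#!/bin/bash
# Runs automatically after every `git commit`
git push origin master
EOF
chmod +x hook-demo/.git/hooks/post-commit
```

With this in place, every `git commit` in the repo also runs `git push origin master` (assuming the `origin` remote is configured), which is what lets Jenkins detect the change via Poll SCM.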



Whenever we commit any file, the code gets pushed to the GitHub repo. Here, we have created a CNN model using the MNIST dataset to predict handwritten digits.
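As a rough idea of what such a model looks like, a minimal Keras CNN for MNIST digit classification might be built as below. This is only an illustration of the approach, not the actual cnn.py from the repo; the layer sizes are assumptions.

```python
# Minimal sketch of an MNIST CNN in Keras (illustrative; not the exact
# cnn.py from the repo). Layer sizes are assumptions.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

def build_model():
    model = Sequential([
        # 28x28 grayscale MNIST images, one channel
        Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
        MaxPooling2D((2, 2)),
        Flatten(),
        Dense(64, activation="relu"),
        Dense(10, activation="softmax"),  # one class per digit 0-9
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```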




You can use the code from the GitHub repo.


Let's look at the configuration of Job1.


In the end, after downloading the code, this job copies all the files to a specific location. We have to use the following command:

sudo cp * /root/deploy-dl-code

         [ /root/deploy-dl-code is the directory where my code will be copied ]


After its completion, Job2 will trigger automatically.

Job2 - Building the Container Specified for the Model Code and its Requirements.

In Job2, I have automated the process of building the container from the Miniconda Docker image. Besides building the container with all the software, I have made a requirements.txt file in which I have specified all the packages required for training our model. The requirements.txt file is also copied from GitHub to the target location. Let's see how Job2 works.
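The article does not show the contents of requirements.txt. Since the file is handed to `conda install` later in this job, it would simply list conda package names, one per line; a plausible version for this CNN job might look like the following (the exact package set is an assumption, not the repo's actual file):

```
keras
numpy
pandas
matplotlib
```

Conda resolves each of these into the tensorflow environment that already exists in the Miniconda image.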

Here, Job2 is the downstream of Job1 and the upstream of Job3.


The complete shell script is described as under:

if ! sudo docker images | grep centos-miniconda3
then
       # pull the image only when it is not already present on the system
       echo
       sleep 2s
       echo "++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
       echo "                          pulling image                             "
       echo "++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
       sudo docker pull priyansh9879/centos-miniconda3:8
       sleep 4s
       echo
fi

sleep 2s

if sudo docker ps -a | grep deploy-dl-code
then
       echo
       sleep 2s
       echo "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
       echo "            Starting Existing Container deploy-dl-code              "
       echo "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
       sudo docker start deploy-dl-code
       sleep 2s
       echo
else
       sleep 2s
       echo "++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
       echo "                 Building Container deploy-dl-code                  "
       echo "++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
       sudo docker run -dit --name deploy-dl-code -v /root/deploy-dl-code/:/dlcode priyansh9879/centos-miniconda3:8
       sleep 5s
       echo
fi

echo "++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
echo "       Installing all required packages from requirements.txt        "
echo "++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
echo
sleep 2s

sudo docker exec deploy-dl-code /root/miniconda3/condabin/conda install -n tensorflow --yes --file /dlcode/requirements.txt



The above logic is designed so that:

1. If the image is not present on the system, it downloads the image from my Docker repo. If it already exists, it moves straight on to building the container.

2. If the container is not present, it launches a container named deploy-dl-code and mounts the volume where our code is present inside the container. If the container already exists, it won't be created again; if it is stopped, it will be started automatically.

3. After the container is up, the last command installs all the required software inside the container, in the tensorflow environment by default.

After its completion, Job3 will trigger automatically.

Job3 - Training our model inside the container.

In Job3, Jenkins starts training the model inside the container.

Here, Job3 is the downstream of Job2 and the upstream of Job4.



The shell command that tells the container to start training the model cnn.py is:

sudo docker exec deploy-dl-code /root/miniconda3/envs/tensorflow/bin/python3 /dlcode/cnn.py

            [ /dlcode/cnn.py is the path of my code ]



After training the model, it stores the accuracy result in the cnn_resultbestaccuracy.txt file.
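The tail of cnn.py that writes this file is not shown in the article; it might look like the sketch below. The variable name and placeholder value are assumptions; in the real script, the accuracy would come from `model.evaluate(...)`.

```python
# Hypothetical tail of cnn.py: persist the final accuracy so that the
# downstream Jenkins jobs can read it back from a plain-text file.
accuracy = 0.8732  # placeholder value; real code gets this from evaluation

# Write it as a bare percentage: the Job4 shell script later pipes this
# number into `bc` to compare it against the accuracy target.
with open("cnn_resultbestaccuracy.txt", "w") as f:
    f.write(f"{accuracy * 100:.6f}")
```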

After its completion, Job4 will trigger automatically.

Job4 - Tweaking the model again to get the highest accuracy.

In Job4, I designed the logic by which we can automatically improve the accuracy of our CNN model without changing the code manually. It took me 12 hours to design this logic, which is the most interesting part of this article.

In Job3, after training the model successfully, it saves the accuracy result in the cnn_resultbestaccuracy.txt file. On the first run, if the model gives an accuracy of less than 90%, Job4 runs another file called filehandling.py, which adds a Dense layer inside our main code. After this, the shell script invokes Job3 again to start building the model once more. For invoking Job3 from the shell, we have used Jenkins' "Trigger builds remotely" feature with an authentication token.
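The article does not show filehandling.py itself; based on the description, it rewrites cnn.py with one extra Dense layer each time the accuracy target is missed. A minimal sketch of that idea follows. The marker line and layer size are assumptions; the real filehandling.py from the repo may work differently.

```python
# Hypothetical sketch of filehandling.py: rewrite cnn.py with one extra
# Dense layer appended after a known marker line. The marker and layer
# size are assumptions, not the repo's actual logic.
import os

MARKER = "model.add(Flatten())"
NEW_LAYER = "model.add(Dense(units=64, activation='relu'))\n"

def add_dense_layer(path):
    with open(path) as f:
        lines = f.readlines()
    out = []
    for line in lines:
        out.append(line)
        if MARKER in line:  # insert the new layer right after the marker
            out.append(NEW_LAYER)
    with open(path, "w") as f:
        f.writelines(out)

# /dlcode/cnn.py is the code path used elsewhere in this article
if os.path.exists("/dlcode/cnn.py"):
    add_dense_layer("/dlcode/cnn.py")
```

Each invocation makes the network one Dense layer deeper, which is what lets the Job3/Job4 loop keep tweaking the architecture without any manual edit.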



In simple words, Job4 works like this:

a. If the model accuracy is less than 90%, the shell script runs the filehandling.py file and then invokes Job3 to train the model again. This process continues until we get an accuracy of more than 90%.

b. Once the accuracy is more than 90%, it comes out of the loop and stops the tweaking process. Here is the shell script:

train=$(sudo cat /root/deploy-dl-code/test.txt)
pred=90.000000

# compare the two floating-point values with bc (bash cannot compare floats itself)
st=`echo "$train < $pred" | bc`
if [ $st -eq 1 ]; then
       ## If accuracy is not the desired accuracy
       echo "Tweaking Model again by triggering Job3"
       sudo docker exec deploy-dl-code /root/miniconda3/envs/tensorflow/bin/python3 /dlcode/filehandling.py
       curl -X POST https://192.168.99.111:8080/job/Job3-Train_Model/build?token=job3 --user priyansh:11cecf8d41bce413ad35249c815a28e2a8
else
       ## If accuracy is greater than the desired one
       echo "Model Successfully tweaked and your Accuracy is improved"
fi


After its completion, Job5 will trigger automatically.

Job5 - Displaying the Best Accuracy.

Finally, in Job5, Jenkins displays the accuracy result. For this, I have used a Jenkins plugin called Summary Display, which displays the summary of the complete job.



The shell script command looks like:

sudo docker exec deploy-dl-code cat /dlcode/cnn_resultbestaccuracy.txt

So guys, this is the final step of my article: a fully automated DevOps CI/CD pipeline for deep learning code. It took me 37 hours to complete this project and write an article on it. I would like to thank my DL and DevOps team members:

  1. Rohan Singh Shekhawat.
  2. Chandra Shekhar Sharma.
  3. Priyansh Magotra.
  4. Sagar Jangid.

