MLOPS


You might have heard that nearly 90% of Machine Learning models never make it to production. There are numerous obstacles along the way. One such obstacle is that the data science folks and the IT folks rarely get an opportunity to work together.


A solution to this problem is MLOps (a combination of ML and DevOps), which enables both the IT team and the Data Science team to work together to push an ML model into production. With the help of this approach, both teams can deploy the model, monitor it, and manage it in production.


So, here is a task given by Vimal Daga sir in which I have tried to integrate ML with DevOps. I have used some of the most in-demand technologies in this project, such as Git, GitHub, Jenkins, Docker, and Machine Learning. Finally, I have created a delivery pipeline using the Build Pipeline plugin in Jenkins that automates the production of an ML model without any human intervention.


TASK DESCRIPTION


1. Create a container image that has Python3 and Keras or NumPy installed, using a Dockerfile.

2. When we launch this image, it should automatically start training the model inside the container.

3. Create a job chain of Job1, Job2, Job3, Job4, and Job5 using the Build Pipeline plugin in Jenkins.

4. Job1: Pull the GitHub repo automatically when a developer pushes code to GitHub.

5. Job2: By looking at the code or program file, Jenkins should automatically launch the container image that has the respective machine learning software/interpreter installed, deploy the code, and start training (e.g., if the code uses a CNN, Jenkins should start the container that already has all the software required for CNN processing installed).

6. Job3: Train the model and report its accuracy or metrics.

7. Job4: If the accuracy is less than 80%, tweak the machine learning model architecture.

8. Job5: Retrain the model, or notify that the best model has been created.

9. Create one extra job, Job6, for monitoring: if the container where the app is running fails for any reason, this job should automatically restart the container from where the last trained model left off.


So, let’s try to achieve these requirements step by step:


  • The first step is to create a Dockerfile that installs all the required Python libraries, and then build an image from that file.

 

Below is the content of the Dockerfile:

[Screenshot: Dockerfile contents]
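Since the screenshot is not readable here, below is a minimal sketch of what such a Dockerfile could look like. It is only an illustration: the base image, the installed libraries, and the file name model.py are assumptions, not the exact content of my Dockerfile.

# Sketch only: the actual Dockerfile may differ
FROM python:3.7-slim

# Libraries needed to train a Keras CNN
RUN pip install --no-cache-dir numpy keras tensorflow

# /mlops is where the training code will be mounted (assumed path)
WORKDIR /mlops

# Launching the image immediately starts training (model.py is a placeholder name)
CMD ["python3", "model.py"]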


  • After successfully creating the Dockerfile, it is time to build a Docker image from it. The command to do so is:
[Screenshot: docker build command]
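For reference, the build command is roughly the following (the image name cnn:v1 is just an example tag):

# Build the image from the Dockerfile in the current directory
docker build -t cnn:v1 .

# Confirm that the image is now available
docker images | grep cnn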


The creation of the Docker image with all the above requirements has started.


[Screenshot: docker build output]


And hence, the image has been successfully created. 


Now let us move towards the Jobs in Jenkins.


  • The first job is to pull the code from GitHub.


  • For automatic pulling of code from GitHub, I have used the Poll SCM trigger in the first Jenkins job; the pulled code is then copied to the root directory of the RedHat host (or whichever OS/environment you are using).


JOB1:

[Screenshots: Job1 configuration (Poll SCM trigger and shell step)]


Now, this Job will pull the code from the GitHub repository and then copy all the files from the workspace of Jenkins to the root directory of my RedHat8. 
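The shell step of Job1 is only visible as a screenshot, but based on the description above it boils down to something like this (the target directory /mlops is an assumption, and Jenkins is assumed to have permission to write there):

# Job1 (sketch): copy everything Jenkins pulled from GitHub
# to a fixed directory on the RedHat host
mkdir -p /mlops
cp -rvf * /mlops/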


  • Now comes Job2, where a Docker container is launched and the copied Python code is run. The accuracy of the model is then saved in a file named accuracy.txt.

JOB2:

[Screenshot: Job2 build trigger configuration]


The second job will only run when Job1 successfully runs. 


[Screenshot: Job2 shell commands]


This is the code in Job2 that launches the container and runs the Python code in order to create the model; a rough sketch of such a shell step is shown below.
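Since the exact commands are only visible in the screenshot, here is a sketch of what such a Job2 shell step could look like. The image name cnn:v1, the container name mlops_cnn, the directory /mlops, and the file name model.py are all assumptions:

# Job2 (sketch): if the pulled code uses a CNN, launch the CNN image
# built earlier with the code directory mounted inside it
if grep -iq "Conv2D" /mlops/model.py
then
    # remove any container left over from a previous run
    docker rm -f mlops_cnn || true
    # the image's CMD starts training as soon as the container comes up;
    # the training script writes the final accuracy to /mlops/accuracy.txt
    docker run --name mlops_cnn -v /mlops:/mlops cnn:v1
fi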

  • The Machine Learning code that is pulled from GitHub is:


[Screenshot: Machine Learning (CNN) training code]


I have used a loop in which changing the value of the “filt” variable changes the number of filters, and changing the value of the “parameter” variable increases the number of layers in the model; a sketch of the idea follows below.
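The real code lives in the repository linked below; the following is only a minimal sketch of the idea, assuming a Keras CNN trained on MNIST, so the layer sizes and file names are illustrative:

# Sketch only: "filt" sets the number of filters, "parameter" sets how many
# extra convolutional blocks the loop adds; Job3 later edits these two lines
# with sed to tweak the architecture.
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from keras.utils import to_categorical

filt = 32
parameter = 1

# load and prepare the data
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(-1, 28, 28, 1).astype("float32") / 255
x_test = x_test.reshape(-1, 28, 28, 1).astype("float32") / 255
y_train, y_test = to_categorical(y_train), to_categorical(y_test)

# build the CNN; the loop decides how deep it gets
model = Sequential()
model.add(Conv2D(filt, (3, 3), padding="same", activation="relu",
                 input_shape=(28, 28, 1)))
model.add(MaxPooling2D((2, 2)))
for _ in range(parameter):
    model.add(Conv2D(filt, (3, 3), padding="same", activation="relu"))
    model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(128, activation="relu"))
model.add(Dense(10, activation="softmax"))

model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=1, batch_size=128)

# save the test accuracy (in percent) for the Jenkins jobs to read
accuracy = model.evaluate(x_test, y_test)[1] * 100
with open("accuracy.txt", "w") as f:
    f.write(str(accuracy))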


Here is the GitHub link where you can find the above Machine Learning code:


https://github.com/devilsm13/MLOps1


  • The console output of the second Job is:


[Screenshot: console output of Job2]



The accuracy has been retrieved from the model and stored in a text file on the RedHat8 host.


  • Now let us move to the third job. This is the most important part: the accuracy of the model is compared with a target value, and if it is below that value, the hyperparameters (such as the number of filters and layers) are tweaked automatically and the model is retrained until the accuracy reaches the target.


  • In Job3, I have used the Linux “sed” command to make these changes in the Python code that trains the model.


JOB3:

[Screenshot: Job3 shell commands]



  • This job checks whether the accuracy of the model has reached 95%. If it is lower, the job makes the necessary changes in the code (increasing the number of layers and the number of filters). The value of the parameter variable keeps increasing: every time the loop runs, the number of layers is doubled and the number of filters is increased compared to the previous run. A rough sketch of such a shell step is shown below.
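Again, the exact commands are in the screenshot; a sketch of such a Job3 shell step, assuming the file layout and variable names used in the sketches above, would be:

# Job3 (sketch): keep tweaking the hyperparameters with sed and retraining
# until the accuracy written by the training script reaches 95%
accuracy=$(cat /mlops/accuracy.txt)

while [ ${accuracy%.*} -lt 95 ]
do
    # read the current hyperparameter values out of the script
    filt=$(grep '^filt =' /mlops/model.py | tr -d ' ' | cut -d '=' -f2)
    parameter=$(grep '^parameter =' /mlops/model.py | tr -d ' ' | cut -d '=' -f2)

    # double the number of extra layers and add more filters
    sed -i "s/^parameter = .*/parameter = $((parameter * 2))/" /mlops/model.py
    sed -i "s/^filt = .*/filt = $((filt + 32))/" /mlops/model.py

    # retrain with the tweaked architecture (re-runs the same container)
    docker start -a mlops_cnn
    accuracy=$(cat /mlops/accuracy.txt)
done

echo "Target accuracy reached: $accuracy"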


  • The log of Job3 is:
[Screenshot: Job3 console log]


The updated content of the Machine Learning code is:

[Screenshot: updated Machine Learning code]


  • Now, after the expected accuracy is reached, Job3 will trigger Job4, which sends an email to the concerned authority. The content of the fourth Job is:


JOB4:


[Screenshot: Job4 configuration]



  • The content of the email is:


[Screenshot: content of the notification email]
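The mail itself can be sent in several ways; a minimal sketch using Python's smtplib is shown below. The addresses, SMTP server, and credentials are placeholders, and the actual mechanism used in the job may differ:

# Sketch only: notify the authority that the best model has been created
import smtplib
from email.mime.text import MIMEText

msg = MIMEText("The model has reached the desired accuracy; the best model has been created.")
msg["Subject"] = "MLOps pipeline: best model created"
msg["From"] = "sender@example.com"       # placeholder address
msg["To"] = "authority@example.com"      # placeholder address

server = smtplib.SMTP("smtp.gmail.com", 587)
server.starttls()
server.login("sender@example.com", "app-password")   # placeholder credentials
server.send_message(msg)
server.quit()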


  • The content of the received email is:
[Screenshot: received email]


  • Hence, all the jobs have been completed. Now comes the role of the fifth Job, which keeps monitoring the container in which the Machine Learning code is running. If the container is running properly, there is no problem at all!!


  • However, if the container fails at any moment, Job5 will restart it and also send an email to the authority about the issue.


  • The content of the fifth job is:


JOB5:

[Screenshot: Job5 shell commands]
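The screenshot holds the exact commands; conceptually, the Job5 shell step is a loop along these lines (the container name and the check interval are assumptions):

# Job5 (sketch): keep watching the training container; if it goes down,
# start it again and send a notification
while true
do
    if docker ps | grep -q mlops_cnn
    then
        echo "Container is running fine."
    else
        echo "Container is down - restarting it."
        docker start mlops_cnn
        # notify the authority here, e.g. by reusing the smtplib script from Job4
    fi
    sleep 30
done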


  • Here, Job4 triggers Job5 for the very first time; after that, this job keeps monitoring the container until it is stopped manually.


  • The log of Job5 is:
[Screenshot: Job5 console log]



For testing purposes, I manually stopped the container to check whether the second part of the code works. It worked!!


The container was restarted by Jenkins and the received email is:


[Screenshot: email received after the container restart]


  • The overall summary of the project from the perspective of a build pipeline is:
[Screenshot: Build Pipeline view of all the jobs]



And thus, we have successfully achieved our requirements by integrating GitHub, RedHat, Docker, and Jenkins. Here, we created a total of 5 Jobs and then visualized them using the Build Pipeline plugin in Jenkins.


Hope you liked the project. Thanks a lot for your time. 

All the code and screenshots of the project are uploaded at:

https://github.com/devilsm13/MLOPS_Task3_Content

