INTEGRATION OF ML/DL WITH DEVOPS

TASK-3 MLOPS

CHALLENGES:

MLOps level 0 is common in many businesses that are beginning to apply ML to their use cases. This manual, data-scientist-driven process might be sufficient when models are rarely changed or trained. In practice, models often break when they are deployed in the real world. The models fail to adapt to changes in the dynamics of the environment or changes in the data that describes the environment.

SOLUTION:

To address the challenges of this manual process, MLOps practices for continuous integration and continuous delivery (CI/CD) and continuous training (CT) are helpful. By deploying an ML training pipeline, you can enable CT, and you can set up a CI/CD system to rapidly test, build, and deploy new implementations of the ML pipeline.

This project is a basic version of the solution described above.


Task Description:

  1. Create a container image that has Python3 and Keras or NumPy installed
  2. When we launch this image, it should automatically start training the model in the container.
  3. Create a job chain of job1 to job5 using the build pipeline plugin in Jenkins
  4. Job-1: Pull the GitHub repository automatically whenever a developer pushes to it.
  5. Job-2: By looking at the code or program file, Jenkins should automatically launch the container image that has the matching machine learning software and interpreter installed, deploy the code in it, and start training.
  6. Job-3: Train your model and report its accuracy or other metrics
  7. Job-4: If accuracy is less than 80%, then tweak the machine learning model architecture
  8. Job-5: Retrain the model or notify that the best model has been created
  9. Create one extra job, Job-6, to monitor: if the container where the app is running fails for any reason, this job should automatically start the container again from where the last trained model left off

My steps to achieve the above tasks:

  1. I used the PyTorch framework instead of Keras, starting from the prebuilt PyTorch image available at hub.docker.com. This makes writing the Dockerfile very easy.
  2. I created two Dockerfiles:
  • The first builds an image without hyperparameter tuning support:
 FROM pytorch/pytorch

 CMD ["python", "train.py"]
  • The second builds an image with hyperparameter tuning support. It passes train.py the command-line argument -t, a boolean that, when True, activates the hyperparameter tuning defined inside the training code:
 FROM pytorch/pytorch

 CMD ["python", "train.py", "-t", "True"]
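
The article does not show how train.py reads the -t argument, so the following is only a minimal sketch of one way to do it with argparse. The long option name --tune and the helper str_to_bool are assumptions made for illustration. A custom converter is used because argparse's type=bool would treat any non-empty string, including "False", as True.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Convert a command-line string such as 'True' or 'false' to a bool."""
    normalized = value.strip().lower()
    if normalized in {"true", "1", "yes"}:
        return True
    if normalized in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the training script's arguments (hypothetical layout)."""
    parser = argparse.ArgumentParser(description="Train the network")
    parser.add_argument(
        "-t",
        "--tune",  # assumed long name; the article only mentions -t
        type=str_to_bool,
        default=False,
        help="enable hyperparameter tuning",
    )
    return parser.parse_args(argv)


# Example: the arguments that the second Dockerfile's CMD would pass.
args = parse_args(["-t", "True"])
print("hyperparameter tuning enabled:", args.tune)
```

With no arguments, as in the first Dockerfile, args.tune falls back to False and the script would train without tuning.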

3. This step creates a build pipeline of JOB-1 to JOB-5. Final look:

[Screenshot: build pipeline view of JOB-1 to JOB-5]

4. JOB-1: This job pulls the GitHub repository whenever it changes, using a GitHub webhook, and copies the files into the folder /pytorch. Choose this option in the job configuration and update the GitHub webhook accordingly.

[Screenshot: JOB-1 build trigger configuration]

 Command to copy all files pulled from GitHub to the /pytorch folder:

 sudo cp -v -r -f * /pytorch

5. JOB-2: This job builds a Docker image chosen by looking at the code inside train.py. Here it builds the PyTorch image, because my code contains a Convolutional Neural Network implemented using PyTorch.

[Screenshot: JOB-2 Execute shell script]

The script in the screenshot can be changed as below to meet the task requirements:

if grep -q Conv2d /pytorch/network.py
then
  # The code defines a CNN in PyTorch, so the PyTorch training image is needed.
  if sudo docker images | grep -q pytorch_train_without_hyper
  then
    echo "Required image already exists! Next job will run a container using this image"
  else
    echo "Creating the required image..."
    if sudo docker build -t pytorch_train_without_hyper /pytorch-dockerfile/dockerfile1/
    then
      echo "Image created successfully"
    else
      echo "Something went wrong while creating the image!"
      # Fail the build so that downstream jobs do not run without an image.
      exit 1
    fi
  fi
else
  echo "Implement for other types of deep learning and machine learning algorithms using elif statements"
fi
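
The grep-based check above extends to other frameworks by adding more branches. As a rough illustration of the same idea, here is a small Python sketch that maps patterns found in the source code to image names. Only the Conv2d rule and the image name pytorch_train_without_hyper come from the article; the scikit-learn rule and the image name sklearn_train are hypothetical placeholders.

```python
import re

# Each rule pairs a pattern to look for in the model source with the image to use.
# Only the first rule comes from the article; the second is a hypothetical example.
IMAGE_RULES = [
    (re.compile(r"Conv2d"), "pytorch_train_without_hyper"),
    (re.compile(r"^\s*(from|import)\s+sklearn", re.MULTILINE), "sklearn_train"),
]


def select_image(source_code: str):
    """Return the image name for the first matching rule, or None if nothing matches."""
    for pattern, image in IMAGE_RULES:
        if pattern.search(source_code):
            return image
    return None


# Example with an inline snippet standing in for the contents of network.py.
example_source = "self.conv1 = nn.Conv2d(3, 16, 3, padding=1)"
print(select_image(example_source))
```

Keeping the rules in a list means supporting a new framework is a one-line change rather than another nested branch.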

6. JOB-3: This job creates and runs a container from the appropriate Docker image. Running the container automatically starts training the network for a set number of epochs, and the test accuracy is saved in the accuracy.txt file inside the same folder. Add the command below under Build -> Execute shell:

[Screenshot: JOB-3 Execute shell script]

This script can also be changed as below to meet the task requirements:

if grep -q Conv2d /pytorch/network.py
then
  # Mount the code folder into the container at /workspace and start training.
  sudo docker run -v /pytorch:/workspace pytorch_train_without_hyper
else
  echo "Using elif statements and grep, other frameworks can be handled in a similar way"
fi
If everything goes well, the output will look like this. The accuracy is very low because I trained this model for only 1 epoch.

[Screenshot: JOB-3 console output]

7. JOB-4: This job reads the accuracy saved in accuracy.txt and checks whether it is at least 80%. If the accuracy is lower than required, the job builds the Docker image from the Dockerfile saved at /pytorch-dockerfiles/dockerfile2/Dockerfile, unless that image already exists. The contents of this Dockerfile are shown in step 2.

[Screenshot: JOB-4 configuration]

If everything goes well here, the output will look like this:

[Screenshot: JOB-4 console output]
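
The accuracy check in this job can be sketched in Python as follows. This is an illustration, not the script from the screenshot: it assumes that accuracy.txt holds a single number expressed as a percentage, and the function names are made up for the example.

```python
import os
import tempfile
from pathlib import Path

REQUIRED_ACCURACY = 80.0  # percent, taken from the task description


def save_accuracy(path, accuracy_percent: float) -> None:
    """Write the test accuracy as a single number (assumed file format)."""
    Path(path).write_text(f"{accuracy_percent:.2f}\n")


def read_accuracy(path) -> float:
    """Read the accuracy back as a float."""
    return float(Path(path).read_text().strip())


def needs_tuning(path, required: float = REQUIRED_ACCURACY) -> bool:
    """Return True when the saved accuracy is below the required level."""
    return read_accuracy(path) < required


# Example run in a temporary folder standing in for /pytorch.
with tempfile.TemporaryDirectory() as folder:
    accuracy_file = os.path.join(folder, "accuracy.txt")
    save_accuracy(accuracy_file, 45.3)
    print("tuning needed:", needs_tuning(accuracy_file))
```

In a Jenkins shell step, a script like this could signal the result through its exit code so that the job decides whether to build the tuning image.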

8. JOB-5: This job trains the network with hyperparameter tuning. The hyperparameters tuned here are the learning rate and the optimizer. I could have added dropout, epochs, and so on, but I could not find good resources about hyperparameter tuning in PyTorch, so I implemented it with simple for loops over lists of candidate values. My VirtualBox machine cannot use CUDA, so tuning many hyperparameters would take a very long time to train. That is why I tune only a few. This job builds only if the current accuracy is below the required level, which is 80% in this case.

[Screenshot: JOB-5 configuration]

Output if this job builds successfully. This is just a sample output; the same will also be sent to the owner by email.

[Screenshot: JOB-5 console output]

Important note about this job: after it completes successfully, it sends an email to the user/owner regardless of the accuracy reached. The email reports the best accuracy the model has reached so far and the hyperparameters that produced it.

Tuning a model does not guarantee that it will perform well. The accuracy usually improves, but no hyperparameter combination is certain to reach the required level. I designed the job this way to keep the Jenkins jobs from entering an infinite loop.

How could the Jenkins jobs enter an infinite loop? Suppose the jobs were chained so that job2 trains a model and, if the accuracy is not high enough, job4 tunes the hyperparameters and invokes job2 again to retrain. In the worst case, where none of the hyperparameter combinations gives the required accuracy, this would run forever: job2 keeps retraining and job4 keeps invoking job2.
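
The for-loop tuning described for this job can be sketched as a single pass over every learning rate and optimizer pair. The sketch below is not the training code from the repository: train_and_evaluate stands for whatever function trains the network and returns a test accuracy, and fake_train_and_evaluate is a made-up stand-in so that the example runs without PyTorch.

```python
from itertools import product


def grid_search(train_and_evaluate, learning_rates, optimizers):
    """Try every (learning rate, optimizer) pair once and keep the best result."""
    best_accuracy = float("-inf")
    best_params = {}
    for learning_rate, optimizer_name in product(learning_rates, optimizers):
        accuracy = train_and_evaluate(learning_rate, optimizer_name)
        if accuracy > best_accuracy:
            best_accuracy = accuracy
            best_params = {"learning_rate": learning_rate, "optimizer": optimizer_name}
    return best_accuracy, best_params


def fake_train_and_evaluate(learning_rate, optimizer_name):
    """Stand-in for real training: returns a made-up accuracy in percent."""
    base = {"sgd": 60.0, "adam": 70.0}[optimizer_name]
    return base - abs(learning_rate - 0.001) * 1000


best_accuracy, best_params = grid_search(
    fake_train_and_evaluate,
    learning_rates=[0.1, 0.01, 0.001],
    optimizers=["sgd", "adam"],
)
print(best_accuracy, best_params)
```

Because the search visits each combination exactly once and then returns the best result, it always terminates, even when no combination reaches 80%. That is the property that lets the job report by email instead of retraining in a loop.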

EMAIL EXAMPLE:

[Screenshots: example notification emails]


9. JOB-6 (Extra Job): This job runs once every week. It sends a POST request to JOB-2 so that the model is retrained weekly.

[Screenshots: JOB-6 configuration]

Command:

curl -X POST "https://192.168.225.38:8080/view/Deep_learning/job/JOB-2/build?token=YourToken" --user "username:password"
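
For reference, the same request can be prepared from Python with the standard library. This sketch only builds the request object and leaves the actual call commented out. The function name is made up for the example, and the URL, token, and credentials are the placeholders from the command above. Depending on the Jenkins security settings, an API token or CSRF crumb may also be required.

```python
import base64
import urllib.request


def build_trigger_request(jenkins_url, job_path, token, username, password):
    """Build a POST request that asks Jenkins to start a job remotely."""
    url = f"{jenkins_url}/{job_path}/build?token={token}"
    credentials = base64.b64encode(f"{username}:{password}".encode()).decode()
    request = urllib.request.Request(url, method="POST")
    # HTTP basic authentication, equivalent to curl's --user option.
    request.add_header("Authorization", f"Basic {credentials}")
    return request


request = build_trigger_request(
    "https://192.168.225.38:8080",
    "view/Deep_learning/job/JOB-2",
    token="YourToken",
    username="username",
    password="password",
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```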

Dataset Used:

Convolutional Neural Networks


In this task, we train a CNN to classify images from the CIFAR-10 database. The images in this database are small color images that fall into one of ten classes; some example images are pictured below.

[Image: example CIFAR-10 images]

Model Architecture Used

[Image: model architecture diagram]
THANK YOU FOR GIVING YOUR PRECIOUS TIME
GitHub Link -> click here




