MLOps: Integration between Machine Learning and DevOps

Hello everyone, I am Vedansh Shrivastava. Here we will discuss MLOps, which can solve many of the problems faced when building machine learning models.

So let's start:


Problem Statement:

1. Create a container image that has Python3 and Keras or NumPy installed, using a Dockerfile.

2. When we launch this image, it should automatically start training the model in the container.

3. Create a job chain of job1, job2, job3, job4 and job5 using the Build Pipeline plugin in Jenkins.

4. Job1: Pull the GitHub repo automatically when a developer pushes to GitHub.

5. Job2: By looking at the code or program file, Jenkins should automatically start the container image that has the respective machine learning software and interpreter installed, deploy the code and start training (e.g. if the code uses a CNN, Jenkins should start the container that already has all the software required for CNN processing installed).

6. Job3: Train your model and report its accuracy or metrics.

7. Job4: If the accuracy is less than 80%, tweak the machine learning model architecture.

8. Job5: Retrain the model, or notify that the best model has been created.

9. Create one extra job, job6, for monitoring: if the container where the app is running fails for any reason, this job should automatically start the container again from where the last trained model left off.

SOLUTION:

Before rushing to the solution, we must know some keywords and some problems in machine learning:

HYPER-PARAMETER: Hyper-parameters are parameters that cannot be determined by any formula or algorithm, but they are adjustable: we can change them to achieve a higher accuracy or a different result. Examples are the number of epochs, the filter size, the number of neurons, etc.

1) Machine learning models are good at prediction, but their accuracy is a problem. We want the highest accuracy possible, but there are many hyper-parameters, and changing one may increase the accuracy, decrease it, or not affect it at all.

2) Finding the best set of hyper-parameters by hand is hard and costs a lot of time and resources, so we have created an architecture that does this work fully automatically.

Now on to the solution:

1. Following the problem statement, we create Docker images in which we install the Python interpreter and the modules the code needs. If we have CNN code, we install Python, Keras, TensorFlow and any other libraries needed. If the code is linear regression, we install NumPy, sklearn, etc.

2. Next we go to Jenkins and start building the job chain:

Job1: This is the first and simplest job. Jenkins just has to go to GitHub, download the code provided by the developer and copy all of it into a folder in the workspace. Our own Python code, which we will discuss later, is also in this folder.

Here both pieces of code were written by the same developer, so he knows the code. If the machine learning code is written by someone else, we have to set some requirements for it: the one thing the code must provide is its accuracy, which we save in a file. For that we need this code at the end of the file:

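# Evaluate the trained model; scores[0] is the loss, scores[1] the accuracy
scores = model.evaluate(X_test , y_test , verbose=1)
print("test loss" , scores[0])
print("test accuracy" , scores[1])

# Save the accuracy to a file so that the later Jenkins jobs can read it
with open('/mlops/accuracy.txt' , 'w') as accuracy_file:
    accuracy_file.write(str(scores[1]))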


Job2: By looking at the code, we have to identify which type of code it is. There are many ways to do this; the one I used is a Python program that reads the code and searches for specific strings, and if such a string is present we can identify the code type. I have used simple file handling in Python. If the code is a CNN, Jenkins identifies it and launches the respective container from the correct Docker image. It also copies the model code into the attached volume.
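As a rough sketch of this step, the label found by the reader program can be mapped to a Docker image and turned into a `docker run` command. The image names (`cnn_image:v1`, `sklearn_image:v1`), the container name `ml_container` and the host folder are hypothetical placeholders for illustration, not the names used in this project.

```python
# Hypothetical mapping from the detected code type to a Docker image name.
IMAGE_FOR_LABEL = {
    "CNN_CODE": "cnn_image:v1",         # assumed image with keras/tensorflow
    "LinearModel": "sklearn_image:v1",  # assumed image with numpy/sklearn
    "KNN_CODE": "sklearn_image:v1",
}


def docker_run_command(label, host_dir="/root/task3_data", name="ml_container"):
    """Build the `docker run` argument list for the detected code type.

    host_dir (an assumed path) is mounted at /model_files so that the
    container can see model.py and the data.
    """
    image = IMAGE_FOR_LABEL.get(label)
    if image is None:
        raise ValueError("no image configured for label: " + label)
    return [
        "docker", "run", "-dit",           # detached, interactive, with a tty
        "--name", name,
        "-v", host_dir + ":/model_files",  # attach the volume with the code
        image,
    ]
```

Calling `docker_run_command("CNN_CODE")` returns the argument list, which could then be passed to `subprocess.run` or joined into a line for a Jenkins shell step. Returning a list instead of running the command directly keeps the sketch safe to try on a machine without Docker.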

Job3: Here we train our model. Our code must contain a part that takes the accuracy and stores it in a file named "accuracy.txt", and this file must be created in the same directory as the model and all the data. This is where all the magic happens; to understand it, you need to see the code.

import pandas as pd
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Convolution2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten
model=Sequential()
model.add(Convolution2D(filters=32,kernel_size=(3,3),activation='relu',input_shape=(64,64,3)))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Flatten())
model.add(Dense(units=128,activation='relu'))
model.add(Dense(units=64,activation='relu'))
model.add(Dense(units=32,activation='relu'))
model.add(Dense(units=16,activation='relu'))
model.add(Dense(units=8,activation='relu'))
model.add(Dense(units=1,activation='sigmoid'))
print(model.summary())
model.compile(optimizer='adam',loss='binary_crossentropy',metrics=['accuracy'])
from keras_preprocessing.image import ImageDataGenerator
train_datagen = ImageDataGenerator(
        rescale=1./255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True)
training_set = train_datagen.flow_from_directory(
        '/model_files/images/images/train/',
        target_size=(64, 64),
        batch_size=32,
        class_mode='binary')
test_set = train_datagen.flow_from_directory(
        '/model_files/images/images/test/',
        target_size=(64, 64),
        batch_size=32,
        class_mode='binary')
model.fit(training_set,
        steps_per_epoch=100,
        epochs=10,
        validation_data=test_set,
        validation_steps=800)
scores = model.evaluate(test_set , verbose=1)
print("test loss" , scores[0])
print("test accuracy" , scores[1])


accuracy_file = open('/mlops/test_accuracy.txt' , 'w')
accuracy_file.write(str(scores[1]))
accuracy_file.close()
test_accuracy = scores[1]
import os
if test_accuracy < 0.80 :
	print("Accuracy is less than 80 Run accuracy File")
	os.system("sudo python3 accuracy.py")
else:
	print("Just got the Accuracy Greater Than 80 [Accuracy :- {}]".format(test_accuracy))
	model.save("mlops_task3.h5")

This is the code from which we train the model. At the end you can see that if test_accuracy is less than 80%, it runs another script, accuracy.py. That script does not do much: it opens the test_accuracy.txt file and reads the value, and if it is less than 80% it runs another file, code_changer.py, which changes the code as needed by tuning some hyper-parameters, and then runs the model again to retrain it.

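import os

# Read the accuracy written by model.py (a fraction such as 0.8123)
with open('/model_files/test_accuracy.txt') as f:
    content = f.read()
# Convert the fraction to a percentage so it can be compared with 80
accuracy = float(content.strip()) * 100
print("Accuracy is ", accuracy)
if accuracy < 80:
    print("Accuracy is less than 80 Running Hyper Parameter File")
    # Tune the hyper-parameters in model.py, then train the model again
    os.system("python3 /model_files/code_changer.py")
    os.system("python3 /model_files/model.py")
else:
    print("Just got the Accuracy Greater Than 80 [Accuracy :{}]".format(accuracy))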

In this way the process works like a loop until the minimum accuracy is attained. As soon as 80% accuracy is reached, the model is saved under the name "mlops_task3.h5" and the process stops.
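The loop described above can be sketched in a few lines. This is only an illustration under stated assumptions: `fake_train` and `add_epochs` are stand-ins invented for the sketch (real training with Keras would replace them), the accuracy table is made-up demo data, and the `max_rounds` cap is an addition of this sketch so that the loop cannot run forever if 80% is never reached.

```python
def tune_until_target(train, tweak, params, target=0.80, max_rounds=10):
    """Retrain with tweaked hyper-parameters until `target` accuracy is reached.

    train(params) returns an accuracy between 0 and 1;
    tweak(params) returns a new hyper-parameter dict.
    Returns (params, accuracy, history); params is None if the target
    was not reached within max_rounds.
    """
    history = []
    for _ in range(max_rounds):
        accuracy = train(params)
        history.append((dict(params), accuracy))
        if accuracy >= target:
            return params, accuracy, history
        params = tweak(params)
    return None, history[-1][1], history


# Made-up accuracies per epoch count, so the sketch runs without Keras.
DEMO_ACCURACY = {10: 0.70, 15: 0.76, 20: 0.83}


def fake_train(params):
    # Stand-in for real model training (hypothetical).
    return DEMO_ACCURACY.get(params["epochs"], 0.90)


def add_epochs(params):
    # Stand-in for code_changer.py: raise the number of epochs by 5.
    return {**params, "epochs": params["epochs"] + 5}


best, acc, history = tune_until_target(fake_train, add_epochs, {"epochs": 10})
print(best, acc, len(history))
```

With the demo table, the loop stops in the third round, when the stand-in accuracy first reaches the 80% target.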

CONSOLE OUTPUT:

[Figures: console output]


Job4: This is a very simple job; it only notifies the developer that the model is ready. I have used email notification: as soon as Job3 builds stable, this job runs and fails, which sends the email notification to the address we have given.


Job5: This is the last and final job; after it our model is complete. This job checks whether the containers we used for model training are running, and if they are not, it notifies the developer.
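One possible way to do this check is a small Python script that compares the output of `docker ps` with the containers we expect. This is a sketch only: the function names are hypothetical, and it assumes Docker is installed on the Jenkins node.

```python
import subprocess


def running_container_names():
    """Return the names of the running containers, as reported by `docker ps`."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.split()


def stopped_containers(expected, running):
    """Return the expected container names that are not currently running."""
    running = set(running)
    return [name for name in expected if name not in running]


def monitor(expected):
    """Return 1 if any expected container is down, otherwise 0."""
    down = stopped_containers(expected, running_container_names())
    if down:
        print("Containers not running:", ", ".join(down))
        return 1
    print("All training containers are running")
    return 0
```

In a Jenkins shell step the script could end with `sys.exit(monitor(["ml_container"]))`, where `ml_container` is a placeholder name. A non-zero exit status marks the build as failed, and that failure can be used to trigger the notification.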

Now we can talk about the Python code. Our code is not fancy; it uses very basic Python file handling.

BUILD PIPELINE VIEW

[Figure: build pipeline view]

Reader.py :

We just used normal file handling: we open the model.py file in read mode, store its contents in memory, and then search for some special strings. In linear-model code the word "Linear" will certainly be there, in KNN code the class name "KNeighborsClassifier" will certainly be there, and similarly CNN code will contain the string "keras" or "tensorflow" together with "Conv2D". In this way we can identify the code.

# Open the developer's model file and read the whole source into `code`
with open('root/task3_data/model.py') as dev_code:
    code = dev_code.read()

# Linear model: sklearn or pandas is used together with LinearRegression
if ('sklearn' in code or 'pandas' in code) and 'LinearRegression' in code:
    print('LinearModel')
# KNN: the scikit-learn class name KNeighborsClassifier appears in the code
elif 'KNeighborsClassifier' in code:
    print('KNN_CODE')
# Deep learning: keras or tensorflow is used
elif 'keras' in code or 'tensorflow' in code:
    # A convolution layer (Conv2D, or its older name Convolution2D) marks a CNN
    if 'Conv2D' in code or 'Convolution2D' in code:
        print('CNN_CODE')
    else:
        print('NOT_CNN')
else:
    # No known keyword was found (this label is a placeholder)
    print('UNKNOWN_CODE')


code_changer.py :

In this code we use a different approach for each type of model; here we mainly discuss the CNN. We simply play with the hyper-parameters: mostly we increase the number of epochs, increase the number of neurons and change some other parameters.

import re

# Read the current model source and split it into lines
file=open("/model_files/model.py")
content=file.read()
lines = content.split("\n")

# Regex patterns for the hyper-parameters we want to tune
epoch=r"epochs=*\d{1,3}"
pattern2=r"filters=*\d{1,3}"
pattern3=r"kernel_size=\(\d{1,3},\d{1,3}\)"
pattern4=r"pool_size=\(\d{1,3},\d{1,3}\)"

# Working values, filled in while scanning the file
ep=0
kernel_1=0
kernel_2=0
pool_size_1=0
pool_size_2=0
check=0
index=0
layers = []
count=0
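# Find every Convolution2D line that is directly followed by a MaxPooling2D line
for i in range(len(lines)-1):
    if 'model.add(Convolution2D(' in lines[i] and 'model.add(MaxPooling2D(' in lines[i+1]:
        print(lines[i])
        print(lines[i+1])
        layers.extend([lines[i],lines[i+1]])
        count+=1
        index=i+2
print("Count is ",count)
# Keep only the last pair found; it is the template for the new layer pair
layers=layers[-2:]
if 0 < count < 3:
    # Fewer than three pairs: insert a copy of the last pair right after it
    print("Added One CRP Layer")
    print("Now total number of CRP Layers is ",count+1)
    lines.insert(index,layers[0])
    lines.insert(index+1,layers[1])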
elif count ==3:
    for i in range(len(lines)):
        if 'model.add(Convolution2D(' in lines[i]:
            kernel_size=lines[i].index('kernel_size')
            filters=re.findall(pattern2,lines[i])
            kernel=re.findall(pattern3,lines[i])
            temp_line=lines[i]
            if len(filters)>0:
                new_filter=int(filters[0].split("=")[-1])+5
            if len(kernel)>0:
                a=kernel[0].split("=")[-1]
                b=a.split(",")
                kernel_1=int(b[0].split('(')[1])+2
                kernel_2=int(b[1].split(')')[0])+2
            
            if check > 0:
                lines[i]="model.add(Convolution2D(filters={},kernel_size=({},{}),activation='relu'))".format(new_filter,kernel_1,kernel_2)
            
            check+=1
        elif 'model.add(MaxPooling2D(' in lines[i]:
            pool_size=lines[i].index('pool_size')
            result4=re.findall(pattern4,lines[i])
            if len(result4)>0:
                a=result4[0].split("=")[-1]
                b=a.split(",")
                pool_size_1=str(int(b[0].split('(')[1])+2)
                pool_size_2=str(int(b[1].split(')')[0])+2)
            lines[i] = "model.add(MaxPooling2D(pool_size=({},{})))".format(pool_size_1,pool_size_2)
            
        elif 'epochs' in lines[i]:
            result=re.findall(epoch,lines[i])
            if len(result)>0:
                ep=int(result[0].split("=")[-1])+5
            lines[i]="epochs={},".format(ep)  
                
print("Insert CRP layer at Index {} if Number of CRP Layers are less than 3".format(index))
print("Layers ",layers)
file.close()
for line in lines:
    print(line)
content=""
for new_line in lines:
    content=content+new_line+"\n"
file=open("/model_files/model.py","w")
file.write(content)
file.close()
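The core idea of code_changer.py, finding a hyper-parameter with a regular expression and writing back a larger value, can be shown on its own. The following is a minimal sketch for the epochs case only; the function name `bump_epochs` and the default step of 5 are assumptions made for illustration, and it uses `re.sub` with a replacement function instead of the line-by-line rewriting shown above.

```python
import re

# Matches `epochs=<number>` but not `steps_per_epoch=<number>`.
EPOCHS = re.compile(r"(\bepochs\s*=\s*)(\d+)")


def bump_epochs(source, step=5):
    """Return `source` with every `epochs=<n>` rewritten to `epochs=<n + step>`."""
    def replace(match):
        # Keep the `epochs=` prefix as written and replace only the number.
        return match.group(1) + str(int(match.group(2)) + step)
    return EPOCHS.sub(replace, source)
```

For example, `bump_epochs("        epochs=10,")` keeps the indentation and the trailing comma and changes only the number. Because the whole file is passed as one string, the rest of the source stays exactly as it was.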

Github URL : https://github.com/Vedanshshri/abc.git

Thanks for reading. You are all welcome to give reviews in the comments.
