Adding Ops in ML now!

In this article I give a brief introduction to MLOps. You will not find many tool names or much development jargon here, but I hope you will enjoy reading it. You can find all the details about MLOps by just googling it; the aim of this article is to make you realize why MLOps is needed. That is why the major portion of the article is about how a traditional ML project works, and, based on the problems we face in it, we ponder how MLOps can help.

If you have no idea about DevOps, which is a prerequisite for MLOps, let me tell you first that it is not a software product, a tool, or a programming language. It is a set of principles and practices to standardize and streamline projects.

DevOps was about general software development projects; MLOps extends the same idea to ML projects. The overall aim is to integrate the processes of the development team and the operations team, so that the teams collaborate to automate, build, and test ML pipelines.

It is often said that around 80% of ML projects don't make it into production. To see why this happens, we need to understand the traditional ML lifecycle.

So, in summary:

Once the data science team gets the business understanding, they gather the requirements and start the data acquisition. Once that is done, the next step is data analysis, where the analysts preprocess the data and point out meaningful insights from it. After that comes the modelling, which might include trying different models with different parameters, training the models, and evaluating them on the validation set. All these steps are usually performed by data scientists, data analysts, or a team with a mixture of both. Then comes the part of deploying the trained model, using either an API web framework or some web service. Here it is mostly the data engineers who build the pipeline for the model and deploy it, while developers help to integrate the model with an already running application or create a whole new application for it.

Once the model is deployed, the operations team monitors its performance and retrains it from time to time if required.

So, where is the problem coming from?

Let’s take the general perspective of a data scientist. You got the data in CSV format and were asked to build an ML model for some task. You opened your Jupyter notebook and started working: you imported libraries, cleaned the data, did some analysis, scaled your data, and tried some general ML models like a decision tree or a random forest. Let’s say you got a good result with the random forest. Then, to get an even better result, you did some hyperparameter tuning and found the best possible result. Just a reminder: your code is still on your local machine, and it is written based on your own understanding and the libraries you are comfortable with.

Then, after your work, you sent the code to a guy in the data engineering team to help you with the deployment. He copied the code into his repository, trained the model, and then deployed it.

Once it is deployed, the developer works to integrate it with the existing code. As a data scientist you might not have optimized the code that well, so the developer will have to do it. Also, ML algorithms are not something the developer usually deals with, so she might get stuck at some point.

The common solution is that either the developer spends more time understanding the data scientist's toolkit, or she sits together with the data scientist to solve the issues. Both are hectic and time consuming. Moreover, this is not their job.

Once this stage is crossed, the developer has to align with the operations team to get the change released, and the two teams may work on different schedules. So maybe the update will wait until month end.

Now, let’s assume everything is done and the model runs well for some time. After a while, the operations team notices that the model's performance degrades on new data. The general process then is that the data scientist again does the whole thing of cleaning and modelling with the new data, the data engineer runs the pipeline and deploys, and the model is pushed into production.

Again, newer data will come, again the whole process will run!

Time, money, and resources (both computational and human) wasted!

So, to solve this!

Let’s do one thing: instead of deploying the model, we deploy the whole pipeline that creates the model. Then we can also monitor the model and set triggers to retrain it on the deployment server itself, changing its parameters for better performance. The trigger fires automatically when an evaluation metric crosses some threshold value.

So, one problem, that of repeating the cumbersome cycle again and again, is solved.
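To make the idea concrete, here is a minimal sketch in plain Python. It is only an illustration under my own assumptions: the toy one-feature "model", the function names (`train_pipeline`, `monitor_and_retrain`), and the 0.8 accuracy threshold are all hypothetical, and a real setup would use your team's actual training code and monitoring tools.

```python
from statistics import mean

ACCURACY_THRESHOLD = 0.8  # assumed value, purely for illustration


def clean(rows):
    """Cleaning step: drop rows whose feature value is missing."""
    return [(x, y) for x, y in rows if x is not None]


def train_pipeline(rows):
    """The whole pipeline that creates the model: clean, then fit.

    The toy model predicts class 1 when the feature is above the midpoint
    between the two class means (it assumes class 1 has the larger values).
    """
    data = clean(rows)
    mean_0 = mean(x for x, y in data if y == 0)
    mean_1 = mean(x for x, y in data if y == 1)
    return {"cutoff": (mean_0 + mean_1) / 2}


def predict(model, x):
    return 1 if x > model["cutoff"] else 0


def accuracy(model, rows):
    """Fraction of cleaned rows that the model classifies correctly."""
    data = clean(rows)
    return mean(1 if predict(model, x) == y else 0 for x, y in data)


def monitor_and_retrain(model, new_rows, threshold=ACCURACY_THRESHOLD):
    """Evaluate on new data and rerun the pipeline if accuracy is too low.

    Returns the model to serve and a flag saying whether it was retrained.
    """
    if accuracy(model, new_rows) < threshold:
        return train_pipeline(new_rows), True
    return model, False
```

Because the pipeline itself is what lives on the server, the retraining step is just another call to `train_pipeline`; nobody has to redo the cleaning and modelling by hand when new data arrives.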

And let’s also do one more thing: we can standardize the coding pattern, for example by agreeing on a set of common libraries. This will also allow people to focus more on their individual roles.
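One possible way to read "standardizing the coding pattern" is a shared interface that every model in the team follows. The sketch below is my own illustration, not a prescribed MLOps standard: the names `TeamModel`, `MajorityClassModel`, and `deploy` are hypothetical, and the majority-class model is just a placeholder for real modelling code.

```python
from abc import ABC, abstractmethod


class TeamModel(ABC):
    """Hypothetical team-wide contract: every model exposes fit and predict."""

    @abstractmethod
    def fit(self, features, labels):
        """Train on the given data and return self."""

    @abstractmethod
    def predict(self, features):
        """Return one prediction per input row."""


class MajorityClassModel(TeamModel):
    """Placeholder model: always predicts the most frequent training label."""

    def fit(self, features, labels):
        self.majority_ = max(set(labels), key=labels.count)
        return self

    def predict(self, features):
        return [self.majority_ for _ in features]


def deploy(model: TeamModel, features, labels, new_features):
    """Generic deployment code: works with any model that follows the contract."""
    return model.fit(features, labels).predict(new_features)
```

With a contract like this, the data scientist is free to change what happens inside `fit` and `predict`, while the data engineer and the developer only ever depend on those two methods.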

MLOps is the new way to go!
