Technical Debts in ML : MLOps
Only a small fraction of a real-world production ML system is composed of the actual ML code; the required surrounding infrastructure is vast and complex.


DataBuzz(4/n):-

This is the fourth article of the Data Buzz Series (Simplifying AI Research once a week). 4/n represents the fourth article of the n upcoming articles.

You can find my previous articles here

Hidden Technical Debt in Machine Learning Systems

If you have started your journey in the field of Machine Learning Operations (MLOps) as a student or an industry practitioner, this article will help you learn, unlearn or revise the fundamental steps to follow before productionalizing any ML model.

Technical debt refers to the grey areas we should focus on, which can otherwise make us pay a heavy price when an error surfaces in production.

In this article, I will briefly explain such debts in the context of building a Machine Learning model. All the references and a detailed explanation are available in the research paper here:



(I have personally highlighted the important sections and key areas and added references to this wonderful research paper on Hidden Technical Debt in Machine Learning Systems by Google, so do go through it! To save time, skim through the important areas first!)

To save you some time, here are the main highlights of the debts you are likely to pay in ML system development if you are unaware of them:

1. Model Complexity eroding model boundaries -

As the complexity of a Machine Learning model increases, its input signals become entangled: changing the distribution of one feature changes how the model weighs all the others, so no input is ever truly independent. This is called Entanglement, often summarized as CACE: Changing Anything Changes Everything.
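To make entanglement a bit more concrete, here is a minimal, hypothetical sketch (the synthetic data, feature names and choice of a linear model are my own illustration, not from the paper): when two input features are correlated, silently removing one of them changes the weight the model learns for the other, so no change to the inputs is ever truly isolated.

```python
# Minimal illustration of entanglement (CACE): with correlated features,
# changing how one feature is supplied shifts the weight learned for the other.
# Synthetic data; assumes numpy and scikit-learn are installed.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 5_000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.2 * rng.normal(size=n)          # x2 is correlated with x1
y = 3.0 * x1 + 1.0 * x2 + rng.normal(scale=0.1, size=n)

full = LinearRegression().fit(np.column_stack([x1, x2]), y)
print("weights with x1 and x2:", full.coef_)       # roughly [3.0, 1.0]

# An upstream team silently drops x1 from the feature pipeline.
reduced = LinearRegression().fit(x2.reshape(-1, 1), y)
print("weight with x2 only:   ", reduced.coef_)    # x2 now absorbs x1's signal (~4.5)
```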

It is also tempting to solve a slightly different problem by training a small correction model on top of an existing model's output instead of building a fresh model; this creates a chain of system dependencies known as a Correction Cascade.

And in a complex system, a model's predictions might be consumed by many services and applications, not all of which are known to the Machine Learning engineer or developer. These Undeclared Consumers can be affected by changes to the model and make it difficult to trace the source of a problem, so dependencies on the model's outputs should be checked and access-controlled regularly. Refer to Section 2 here on how to solve this issue.

2. Data Dependencies -

A data dependency is often more expensive than a code dependency and harder to detect. Unstable data dependencies (input signals whose behaviour changes over time) and underutilized data dependencies (inputs that add little value but still create risk) both affect the quality of the data ingested for model training. Refer to Section 3 here for a better understanding.
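A practical defence against unstable data dependencies is to validate the ingested data against an explicit, versioned expectation before training, and fail fast when an upstream feed changes. A minimal sketch using pandas; the column names and bounds below are made up for illustration:

```python
# Guard a training job against an unstable data dependency: fail fast if an
# upstream feed silently changes its schema or value ranges.
# Column names ("age", "income") and bounds are illustrative assumptions.
import pandas as pd

EXPECTED_SCHEMA = {"age": "int64", "income": "float64"}
VALUE_BOUNDS = {"age": (0, 120), "income": (0.0, 1e7)}

def validate_training_frame(df: pd.DataFrame) -> None:
    for col, dtype in EXPECTED_SCHEMA.items():
        if col not in df.columns:
            raise ValueError(f"missing expected column: {col}")
        if str(df[col].dtype) != dtype:
            raise ValueError(f"{col}: expected dtype {dtype}, got {df[col].dtype}")
    for col, (lo, hi) in VALUE_BOUNDS.items():
        if df[col].min() < lo or df[col].max() > hi:
            raise ValueError(f"{col}: values outside expected range [{lo}, {hi}]")

# This frame passes the checks; a changed upstream schema would raise instead.
validate_training_frame(pd.DataFrame({"age": [25, 40], "income": [52_000.0, 88_000.0]}))
```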

3. Feedback Loops -

Direct feedback loops and hidden feedback loops are the two ways a model can end up influencing the selection of its own future training data. While a direct loop is relatively easy to spot and assess, the real trick lies in finding and evaluating hidden feedback loops, where two systems influence each other indirectly through the outside world. Refer to Section 4 here to mitigate these issues.
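Direct loops are easier to show in code than hidden ones, which by definition run indirectly through the world. Here is a toy simulation I put together (not from the paper) of a direct feedback loop: a recommender retrains only on the clicks it logs for the items it chose to show, so an early misestimate of one item is never corrected.

```python
# Toy direct feedback loop: the model influences which data gets logged,
# then retrains on that biased log. Item names and click rates are made up.
import random

random.seed(0)
TRUE_CLICK_RATE = {"item_a": 0.30, "item_b": 0.60}   # item_b is genuinely better
shown = {"item_a": 50, "item_b": 10}                  # historical impressions
clicks = {"item_a": 25, "item_b": 1}                  # item_b got unlucky early on

for _ in range(10_000):
    estimates = {k: clicks[k] / shown[k] for k in shown}
    best = max(estimates, key=estimates.get)          # greedy: show the "best" item
    shown[best] += 1
    if random.random() < TRUE_CLICK_RATE[best]:
        clicks[best] += 1

print({k: round(clicks[k] / shown[k], 2) for k in shown})
# item_a converges to ~0.30 while item_b stays stuck at 0.10: the system never
# learns item_b is better, because it only sees the outcomes of its own choices.
```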

4. Anti-patterns in ML system development -

Some common anti-patterns in ML system development are: reusing a general-purpose open-source package and writing large amounts of supporting code around it (Glue Code), letting data preparation sprawl into more pipelines, scrapes and joins than required (Pipeline Jungles), leaving unused experimental branches lying around in the codebase (Dead Experimental Codepaths), and habits such as using multiple programming languages in one system, which make it slow and inconvenient to test (Common Smells). Refer to Section 5 here to solve such issues.
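One common way to keep glue code and pipeline jungles in check is to declare the whole preprocessing-plus-model flow as a single pipeline object instead of ad-hoc scripts and intermediate files. A minimal sketch using scikit-learn (my choice of library for illustration, not something prescribed by the paper):

```python
# Instead of glue scripts that impute in one place, scale in another and dump
# intermediate files in between, declare the flow as one pipeline that can be
# tested, versioned and deployed as a single unit.
from sklearn.datasets import load_breast_cancer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```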

I have added notes and highlighted all the above concepts in the paper so far, so do read it for a faster understanding!

Moving on,

5. System Configuration -

It often becomes difficult to manage and maintain a model unless a systematic, unified configuration process is used across development, productionalization and maintenance. A uniform configuration system makes it easier to specify data ingestion, reproduce model runs, compare configurations side by side and review changes before they reach production. Refer to Section 6 of the paper here to know more.
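A lightweight way to move in that direction is to treat configuration as code: one versioned, validated object that is logged with every run, rather than flags scattered across scripts. A minimal sketch with a Python dataclass; the field names and defaults are illustrative assumptions, not a tool recommended by the paper.

```python
# Treat configuration as a single, versioned, validated artifact.
# Field names and defaults below are illustrative.
from dataclasses import dataclass, asdict
import json

@dataclass(frozen=True)
class TrainConfig:
    train_data_path: str
    learning_rate: float = 1e-3
    batch_size: int = 64
    num_epochs: int = 10

    def __post_init__(self):
        # Basic validation so a bad config fails at load time, not mid-training.
        if self.learning_rate <= 0:
            raise ValueError("learning_rate must be positive")
        if self.batch_size <= 0 or self.num_epochs <= 0:
            raise ValueError("batch_size and num_epochs must be positive")

config = TrainConfig(train_data_path="data/train.parquet", learning_rate=3e-4)
print(json.dumps(asdict(config), indent=2))   # log the exact config for reproducibility
```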

6. Dynamic changes in the external world -

The actual data going into the model in a production environment can change considerably and may have a completely different distribution than the data the model was trained on. This is called Data Drift, and it is highly probable in a dynamic world. Steps to mitigate such issues have been highlighted in the paper; kindly go through Section 7 here to make the model robust against such fluctuating data.
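As a minimal sketch of monitoring for such drift (my own illustration, not taken from the paper), one can compare the distribution of a live feature against the training distribution with a two-sample test and alert when they diverge:

```python
# Simple drift check: compare a production feature's distribution against the
# training distribution with a two-sample Kolmogorov-Smirnov test.
# The synthetic data and the alert threshold are illustrative assumptions.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
train_feature = rng.normal(loc=0.0, scale=1.0, size=10_000)   # what the model saw
live_feature = rng.normal(loc=0.5, scale=1.2, size=2_000)     # what production now sends

stat, p_value = ks_2samp(train_feature, live_feature)
if p_value < 0.01:   # the threshold is a monitoring policy choice, not a law
    print(f"drift detected (KS statistic={stat:.3f}, p={p_value:.2e}) - "
          "consider investigating the upstream data or retraining the model")
else:
    print("no significant drift detected")
```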

To summarize, these are the key areas to be wary of before productionalizing any Machine Learning model. Please feel free to comment on any other steps that could be added to the list!

I'll go through one research paper related to AI and Machine Learning every week and highlight key areas and add notes, so that readers can skim through the important sections quickly.

Do share my article if you like it, and subscribe to my newsletter to stay updated on AI/ML research!

Any suggestions/discussions are most welcome in the comments!


Bonus Tip - For beginners in MLOps with prior experience in building a Machine Learning model, this is a great course from Andrew Ng.


Happy Learning!





