What the hell is post-deployment data science?

Hey, this is Hakim.

I hope you enjoyed the first edition of this newsletter on post-deployment data science and NannyML, where we talked about going open source and building in public. If you haven't read it, please check it out here.

You might be asking yourself, what the hell is post-deployment data science? And isn't it just MLOps or machine learning monitoring? Is it just another buzzword?

In this edition, I explain post-deployment data science, how it's different from monitoring, its challenges, and what's next with NannyML.

What is post-deployment data science?

Post-deployment data science is all data science work done on a model after it has been deployed to production. This means monitoring model performance and business impact, understanding feedback loops, and optimizing the model to perform better over time.

It can also include extracting insights from business processes that a model handles.

For example, if your credit scoring model's performance changes and you know why, you might be able to point to a change in how your customers take out loans. That information helps the business make better decisions.

How is it different from monitoring?

There are two types of machine learning monitoring: operational monitoring and ML-specific monitoring. Operational monitoring looks at uptime, inference time, and deployment status, and falls under MLOps practices. ML-specific monitoring tracks data drift, concept drift, and model performance.

While ML-specific monitoring is fully covered under post-deployment data science, operational monitoring is not.
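To make the ML-specific side concrete, here is a minimal sketch of a univariate data drift check: it compares a feature's distribution in a production window against a reference window with a Kolmogorov-Smirnov test. The column names, data, and alert threshold are made up for illustration; this is not NannyML's API.

```python
# Illustrative univariate data drift check (hypothetical column names and
# threshold; not NannyML's API). Flags features whose production
# distribution differs significantly from the reference window.
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp

def drift_report(reference, production, features, alpha=0.01):
    rows = []
    for col in features:
        stat, p_value = ks_2samp(reference[col], production[col])
        rows.append({"feature": col, "ks_stat": stat,
                     "p_value": p_value, "drift_alert": p_value < alpha})
    return pd.DataFrame(rows)

# Synthetic example: 'loan_amount' drifts upward, 'age' does not.
rng = np.random.default_rng(0)
reference = pd.DataFrame({"age": rng.normal(40, 10, 5_000),
                          "loan_amount": rng.normal(10_000, 2_000, 5_000)})
production = pd.DataFrame({"age": rng.normal(40, 10, 5_000),
                           "loan_amount": rng.normal(12_000, 2_000, 5_000)})
print(drift_report(reference, production, ["age", "loan_amount"]))
```

A real setup would run checks like this per feature and per time chunk, alongside the operational checks handled by MLOps tooling.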

Additionally, model monitoring is reactive: it tracks data drift, concept drift, and performance changes after they have happened and attempts to explain them. Post-deployment data science aims to be as proactive as possible in addressing potential issues with ML models in production.

Post-deployment data science generally covers all the data science work you do once a model has been deployed, including identifying feedback loops, optimizing performance, ensuring the models drive business impact, and, most importantly, fixing issues with ML performance after they have been identified.

Why is it important?

As more ML models go into production, their impact on our society will increase. Automated systems will make critical decisions across businesses and government, affecting every part of our lives. Conversely, failing models will put people's lives at risk and cost companies billions.

We need to be able to manage all these models in production. If a data science project is planned right, then most of a model's life will be spent in the real world making impactful decisions.

Being able to make sure these decisions stay sound across the model's lifecycle, as well as being able to learn from them, will be an essential part of the future.

What are the challenges?

The first challenge is knowing a model's performance. Unfortunately, many models do not have ground truth in production, meaning you do not know the actual outcome of the prediction for a long time, if ever.

For example, in credit scoring, when you predict if someone should get a loan or not, you do not know if this was a good decision until they either pay it back or default, which can be years into the future.
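One way to work around missing or delayed ground truth is to estimate expected performance from the model's own predicted probabilities, assuming they are reasonably well calibrated. The sketch below shows that general idea for binary classification; it is a simplification for illustration, not NannyML's implementation.

```python
# Estimate expected accuracy without labels, assuming the model's predicted
# probabilities are well calibrated. Simplified illustration of the idea,
# not NannyML's implementation.
import numpy as np

def estimate_expected_accuracy(proba, threshold=0.5):
    """If P(y=1|x) = p, the prediction 1[p >= threshold] is correct with
    probability p when predicting 1, and with probability 1 - p otherwise."""
    proba = np.asarray(proba, dtype=float)
    predicted_positive = proba >= threshold
    prob_correct = np.where(predicted_positive, proba, 1.0 - proba)
    return float(prob_correct.mean())

# Hypothetical scores from a credit model on unlabeled production data.
scores = np.array([0.92, 0.85, 0.40, 0.10, 0.05, 0.70])
print(f"Estimated accuracy: {estimate_expected_accuracy(scores):.3f}")
```

The estimate is only as good as the calibration, so in practice you would also monitor whether the probabilities stay calibrated on production data.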

The second challenge is that models fail silently. As long as the data is in the correct format, a model will always make a prediction. If you don't know the quality of those predictions, the model can fail without anyone noticing for a long time.

The third challenge is that a lot of data drift is virtual: most changes to the model's input data do not affect performance, which can make alerting systems very noisy. With noisy alerting, people get used to false alarms and too easily ignore the alerts that matter, sometimes making alerts more problematic than helpful.
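The sketch below illustrates virtual drift on synthetic data: a feature the model barely relies on shifts enough to trigger a statistical drift alert, yet ROC AUC on the drifted data is essentially unchanged. All names and numbers are illustrative.

```python
# Virtual drift on synthetic data: a near-irrelevant feature shifts (drift
# alert fires), but model performance barely moves.
import numpy as np
from scipy.stats import ks_2samp
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)

def make_data(n, shift=0.0):
    signal = rng.normal(0.0, 1.0, n)   # drives the label
    weak = rng.normal(shift, 1.0, n)   # nearly irrelevant to the label
    y = (signal + 0.1 * rng.normal(0.0, 1.0, n) > 0).astype(int)
    return np.column_stack([signal, weak]), y

X_ref, y_ref = make_data(10_000)
model = LogisticRegression().fit(X_ref, y_ref)

X_prod, y_prod = make_data(10_000, shift=2.0)  # only the weak feature drifts

_, p_value = ks_2samp(X_ref[:, 1], X_prod[:, 1])
print(f"KS p-value for the drifted feature: {p_value:.1e}")  # tiny -> alert fires
print(f"Reference ROC AUC:  {roc_auc_score(y_ref, model.predict_proba(X_ref)[:, 1]):.3f}")
print(f"Production ROC AUC: {roc_auc_score(y_prod, model.predict_proba(X_prod)[:, 1]):.3f}")
```

Alerting on performance (realized or estimated) rather than on raw drift statistics alone is one way to keep the signal-to-noise ratio manageable.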

Finally, feedback loops might change the relationship between the technical metrics of the model and the business metrics.

For example, you might build a credit scoring model with a ROC AUC of 0.7 that allows you to keep default rates below 5%. But as you keep making predictions on the same customer base and the business acts on them, those actions change the real-world outcomes of the predictions, and you might eventually need a ROC AUC of 0.9 to achieve the same business results.
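As a toy illustration of how the same technical metric can map to different business results, the simulation below scores two applicant pools with the same ranking quality (similar ROC AUC) but different underlying default rates, for example because feedback from earlier decisions changed who applies. The default rate among approved loans ends up very different. Everything here is synthetic and purely illustrative.

```python
# Same ranking quality, different business outcome: as the applicant pool's
# underlying default rate shifts, the default rate among approved loans
# changes even though ROC AUC stays roughly the same. Synthetic data only.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(7)

def simulate(base_default_rate, n=50_000, approve_share=0.8):
    y = rng.binomial(1, base_default_rate, n)       # 1 = default
    score = y * 1.0 + rng.normal(0.0, 1.2, n)       # imperfect risk score
    auc = roc_auc_score(y, score)
    cutoff = np.quantile(score, approve_share)      # approve the lowest-risk 80%
    approved = score <= cutoff
    return auc, y[approved].mean()

for rate in (0.06, 0.12):  # applicant pool before / after a feedback-driven shift
    auc, default_among_approved = simulate(rate)
    print(f"base default {rate:.0%}: ROC AUC = {auc:.2f}, "
          f"default rate among approved = {default_among_approved:.1%}")
```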

What's next?

In the next edition, I'll share more about the open-source library roadmap of NannyML. We have an exciting new release coming up soon.

We've also been diligently researching performance estimation for regression. Expect more news on that soon ;)

If you're facing some of these challenges in post-deployment data science, check out our open-source tool on GitHub.

I'd love to hear about how you manage your machine learning models once deployed.

Thanks for reading!

Hakim

Evgeniy Kirichenko

Providing services in Staffing and Recruiting with TechDuck.

Hakim, thanks for sharing!

Michelangelo Puliga

A scientist with a long experience in data and models, but also a business developer and product creator. Innovation & ethics are my guides.

Post-atomic data science is the key to surviving the harsh conditions of production environments. Real data emit high levels of radiation, can fool your instruments, go off the scale (like outliers), and destroy your protective equipment. So yes, post-deployment data science is a much-needed tool in such moments. Do not forget to bring a torch, batteries, and a helmet!
