What the hell is post-deployment data science?

Hey, this is Hakim.

I hope you enjoyed the first edition of this newsletter on post-deployment data science and NannyML, where we talked about going open source and building in public. If you haven't read it, please check it out here.

You might be asking yourself, what the hell is post-deployment data science? And isn't it just MLOps or machine learning monitoring? Is it just another buzzword?

In this edition, I explain post-deployment data science, how it's different from monitoring, its challenges, and what's next with NannyML.

What is post-deployment data science?

Post-deployment data science is all data science work done on a model after it has been deployed to production. This means monitoring model performance and business impact, understanding feedback loops, and optimizing the model to perform better over time.

It can also include extracting insights from business processes that a model handles.

For example, if your credit scoring model's performance changes and you know why, you might be able to point to a change in how your customers take out loans. That information helps the business make better decisions.

How is it different from monitoring?

There are two types of machine learning monitoring: operational monitoring and ML-specific monitoring. Operational monitoring looks at uptime, inference time, and deployment status, and falls under MLOps practices. ML-specific monitoring tracks data drift, concept drift, and model performance.

While ML-specific monitoring is fully covered under post-deployment data science, operational monitoring is not.
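To make the ML-specific side concrete, here is a minimal sketch of a univariate data drift check: it compares a feature's distribution in a production window against a reference window with a Kolmogorov-Smirnov test. The column names, data, and alert threshold are made up for illustration; this is not NannyML's API.

```python
# Illustrative univariate data drift check (hypothetical column names and
# threshold; not NannyML's API). Flags features whose production
# distribution differs significantly from the reference window.
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp

def drift_report(reference, production, features, alpha=0.01):
    rows = []
    for col in features:
        stat, p_value = ks_2samp(reference[col], production[col])
        rows.append({"feature": col, "ks_stat": stat,
                     "p_value": p_value, "drift_alert": p_value < alpha})
    return pd.DataFrame(rows)

# Synthetic example: 'loan_amount' drifts upward, 'age' does not.
rng = np.random.default_rng(0)
reference = pd.DataFrame({"age": rng.normal(40, 10, 5_000),
                          "loan_amount": rng.normal(10_000, 2_000, 5_000)})
production = pd.DataFrame({"age": rng.normal(40, 10, 5_000),
                           "loan_amount": rng.normal(12_000, 2_000, 5_000)})
print(drift_report(reference, production, ["age", "loan_amount"]))
```

A real setup would run checks like this per feature and per time chunk, alongside the operational checks handled by MLOps tooling.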

Additionally, model monitoring is reactive: it tracks data drift, concept drift, and performance changes after they have happened and attempts to explain them. Post-deployment data science aims to be as proactive as possible in addressing potential issues with ML models in production.

Post-deployment data science generally covers all the data science work you do once a model has been deployed, including identifying feedback loops, optimizing performance, ensuring the models drive business impact, and, most importantly, fixing issues with ML performance after they have been identified.

Why is it important?

As more ML models go into production, their impact on our society will increase. Automated systems will make critical decisions across businesses and government, affecting every part of our lives. Conversely, failing models will put people's lives at risk and cost companies billions.

We need to be able to manage all these models in production. If a data science project is planned right, then most of a model's life will be spent in the real world making impactful decisions.

Being able to make sure these decisions stay sound across the model's lifecycle, as well as being able to learn from them, will be an essential part of the future.

What are the challenges?

The first challenge is knowing a model's performance. Unfortunately, many models do not have ground truth in production, meaning you do not know the actual outcome of the prediction for a long time, if ever.

For example, in credit scoring, when you predict if someone should get a loan or not, you do not know if this was a good decision until they either pay it back or default, which can be years into the future.
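One way to work around missing or delayed ground truth is to estimate expected performance from the model's own predicted probabilities, assuming they are reasonably well calibrated. The sketch below shows that general idea for binary classification; it is a simplification for illustration, not NannyML's implementation.

```python
# Estimate expected accuracy without labels, assuming the model's predicted
# probabilities are well calibrated. Simplified illustration of the idea,
# not NannyML's implementation.
import numpy as np

def estimate_expected_accuracy(proba, threshold=0.5):
    """If P(y=1|x) = p, the prediction 1[p >= threshold] is correct with
    probability p when predicting 1, and with probability 1 - p otherwise."""
    proba = np.asarray(proba, dtype=float)
    predicted_positive = proba >= threshold
    prob_correct = np.where(predicted_positive, proba, 1.0 - proba)
    return float(prob_correct.mean())

# Hypothetical scores from a credit model on unlabeled production data.
scores = np.array([0.92, 0.85, 0.40, 0.10, 0.05, 0.70])
print(f"Estimated accuracy: {estimate_expected_accuracy(scores):.3f}")
```

The estimate is only as good as the calibration, so in practice you would also monitor whether the probabilities stay calibrated on production data.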

The second challenge is that models fail silently. As long as the data is in the correct format, a model will always make a prediction. If you don't know the quality of those predictions, the model can fail without anyone noticing for a long time.

The third challenge is that a lot of data drift is virtual: most changes to the model's input data do not affect performance, which can make alerting systems very noisy. With noisy alerting, people get used to false alarms and too easily ignore the alerts that matter, sometimes making alerts more problematic than helpful.
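The sketch below illustrates virtual drift on synthetic data: a feature the model barely relies on shifts enough to trigger a statistical drift alert, yet ROC AUC on the drifted data is essentially unchanged. All names and numbers are illustrative.

```python
# Virtual drift on synthetic data: a near-irrelevant feature shifts (drift
# alert fires), but model performance barely moves.
import numpy as np
from scipy.stats import ks_2samp
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)

def make_data(n, shift=0.0):
    signal = rng.normal(0.0, 1.0, n)   # drives the label
    weak = rng.normal(shift, 1.0, n)   # nearly irrelevant to the label
    y = (signal + 0.1 * rng.normal(0.0, 1.0, n) > 0).astype(int)
    return np.column_stack([signal, weak]), y

X_ref, y_ref = make_data(10_000)
model = LogisticRegression().fit(X_ref, y_ref)

X_prod, y_prod = make_data(10_000, shift=2.0)  # only the weak feature drifts

_, p_value = ks_2samp(X_ref[:, 1], X_prod[:, 1])
print(f"KS p-value for the drifted feature: {p_value:.1e}")  # tiny -> alert fires
print(f"Reference ROC AUC:  {roc_auc_score(y_ref, model.predict_proba(X_ref)[:, 1]):.3f}")
print(f"Production ROC AUC: {roc_auc_score(y_prod, model.predict_proba(X_prod)[:, 1]):.3f}")
```

Alerting on performance (realized or estimated) rather than on raw drift statistics alone is one way to keep the signal-to-noise ratio manageable.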

Finally, feedback loops might change the relationship between the technical metrics of the model and the business metrics.

For example, you might build a credit scoring model with a ROC AUC of 0.7 that allows you to keep default rates below 5%. But as you keep making predictions on the same customer base and the business acts on them, those actions change the real-world outcomes of the predictions, and you might eventually need a ROC AUC of 0.9 to achieve the same business results.
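As a toy illustration of how the same technical metric can map to different business results, the simulation below scores two applicant pools with the same ranking quality (similar ROC AUC) but different underlying default rates, for example because feedback from earlier decisions changed who applies. The default rate among approved loans ends up very different. Everything here is synthetic and purely illustrative.

```python
# Same ranking quality, different business outcome: as the applicant pool's
# underlying default rate shifts, the default rate among approved loans
# changes even though ROC AUC stays roughly the same. Synthetic data only.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(7)

def simulate(base_default_rate, n=50_000, approve_share=0.8):
    y = rng.binomial(1, base_default_rate, n)       # 1 = default
    score = y * 1.0 + rng.normal(0.0, 1.2, n)       # imperfect risk score
    auc = roc_auc_score(y, score)
    cutoff = np.quantile(score, approve_share)      # approve the lowest-risk 80%
    approved = score <= cutoff
    return auc, y[approved].mean()

for rate in (0.06, 0.12):  # applicant pool before / after a feedback-driven shift
    auc, default_among_approved = simulate(rate)
    print(f"base default {rate:.0%}: ROC AUC = {auc:.2f}, "
          f"default rate among approved = {default_among_approved:.1%}")
```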

What's next?

In the next edition, I'll share more about the open-source library roadmap of NannyML. We have an exciting new release coming up soon.

We've also been diligently researching performance estimation for regression. Expect more news on that soon ;)

If you're facing some of these challenges in post-deployment data science, check out our open-source tool on GitHub.

I'd love to hear about how you manage your machine learning models once deployed.

Thanks for reading!

Hakim

Evgeniy Kirichenko

Providing services in Staffing and Recruiting with TechDuck.

Hakim, thanks for sharing!

Michelangelo Puliga

A scientist with a long experience in data and models, but also a business developer and product creator. Innovation & ethics are my guides.

Post-atomic data science is the key to surviving the harsh conditions of production environments. Real data emit high levels of radiation, can fool your instruments, go off the scale (like outliers), and destroy your protective equipment. So yes, post-deployment data science is a much-needed tool in such moments. Do not forget to bring a torch, batteries, and a helmet!
