How do people actually operationalize ML in 2022?

Welcome to Continual Learnings

A weekly newsletter for practitioners building ML-powered products.

What we're reading this week

ChatGPT: Woah, did this really come out since our last newsletter? You’ve all seen this already, but here are some of my takeaways:

  • ChatGPT is a data flywheel innovation, not a modeling innovation. OpenAI is betting that LLMs are already pretty good; what’s missing is incorporating human feedback into the loop and improving models iteratively in production
  • This is not AGI (at least not yet). The ChatGPT examples I’ve seen are pretty evenly balanced between staggering displays of competence and incompetence. There are tons of examples online of ChatGPT hallucinating information and getting hacked to change its behavior. As promising as LLMs are, they’re not going to have the impact they’re capable of unless we can develop robust guardrails and iteration loops around models
  • As a corollary, one productive way to think about LLMs like ChatGPT is that they further reduce the cost of modeling. 10 years ago, building deep learning prototypes required months of work and a team of PhDs. Today, anyone with access to a browser can build a sophisticated prototype in minutes. Most of the work in building ML systems in the future will be turning that prototype into a robust, reliable production system

MLOps: A Holistic Approach: There’s a fair amount of W&B marketing fluff here, but also some useful insights about what’s hard today and how to scale up some of the core MLOps processes

Why I’m optimistic about our alignment approach: Jan Leike, a researcher at OpenAI, expands on why they are taking a narrower view of the alignment problem: focusing reinforcement learning from human feedback on the narrow AI systems that are deployed today. While alignment / safety research is often too future-looking to be relevant to today’s practitioners, OpenAI’s approach is pragmatic and, in my opinion, pretty similar to how every company will build ML systems in 4-5 years

Production ML papers to know

In this series, we cover important papers to know if you build ML-powered products.

How is ML operationalized in Industry?

You’ve probably seen countless blogs and courses on how to do MLOps. While they’re a great starting point, at best they represent the views of one company (and at worst, they can be outright misleading).

So what does MLOps look like across the industry? How do ML practitioners really get models into production? And what are the challenges they face?

This recent paper looks to ML engineers across industries to answer these questions. So, what’s really going on with MLOps?

The MLOps Process

The paper defines the MLOps process as “a continual loop of (i) data collection and labeling, (ii) experimentation to improve ML performance, (iii) evaluation throughout a multi-staged deployment process, and (iv) monitoring of performance drops in production”.

Figure from the paper: routine tasks in the ML engineering workflow

These responsibilities are “staggering”, with MLOps “widely considered to be hard” - perhaps because our current understanding is “limited to a fragmented landscape” of white papers, thought pieces, and a “cottage industry” of start-ups aiming to address MLOps issues.

This paper aims to clarify MLOps by identifying what it typically involves and where the gaps are. We’ve summarized the (extensive) findings below, starting with the common practices for successful ML experimentation, deployment, and sustaining production performance, and ending with a summary of MLOps pain points and anti-patterns that need to be addressed.

The Three “Vs” of MLOps

The paper identifies three properties of the ML workflow that dictate success for an ML deployment: Velocity (prototype and iterate on ideas quickly); Validation (test changes, prune bad ideas, and proactively monitor pipelines for bugs as early as possible); and Versioning (manage multiple versions of production models and datasets for querying, debugging, and minimizing production pipeline downtime).

These properties are present - and sometimes in tension - in the common practices and pain points discussed below.

Developing models

ML engineering was found to be very experimental and iterative - it is beneficial to prototype ideas quickly and demonstrate practical benefits early. Here are some keys to successful prototyping:

  • Good Project Ideas Start With Collaborators. Ideas, such as new features, often come from domain experts or data scientists
  • Iterate on the data, not necessarily the model. Experiments that provide more data or context to the model often work better
  • Account for diminishing returns. Work on ideas with the largest gains, as these gains will diminish through the stages of deployment
  • Small changes are preferable to larger changes. Keep code changes as small as possible and support config-driven development to reduce bugs (a minimal sketch follows this list)
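
To make the config-driven point concrete, here is a minimal sketch (our own illustration, not code from the paper) of expressing experiments as small config deltas rather than code edits. The scikit-learn model, parameter names, and dataset are placeholder assumptions.

```python
# A minimal sketch (not from the paper) of config-driven experimentation:
# every change is a small, reviewable config delta instead of an edit to the
# training code. The model and parameters below are illustrative only.

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

BASE_CONFIG = {"n_estimators": 100, "max_depth": 3, "learning_rate": 0.1}

def run_experiment(overrides: dict) -> float:
    """Merge a small override onto the base config, then train and score."""
    config = {**BASE_CONFIG, **overrides}
    X, y = make_classification(n_samples=500, random_state=0)
    model = GradientBoostingClassifier(random_state=0, **config)
    return cross_val_score(model, X, y, cv=3).mean()

# The "small change" is a one-line config delta, not a code change.
baseline = run_experiment({})
candidate = run_experiment({"max_depth": 2})
print(f"baseline={baseline:.3f}  candidate={candidate:.3f}")
```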

Evaluating and deploying models

The goal of model evaluation is to prevent bad models from making it to production without compromising velocity. Here are some keys to successful evaluation and deployment:

  • Validation datasets should be dynamic. Engineers should update validation sets systematically in response to live failure modes and model underperformance on important user subpopulations (see the sketch after this list)
  • Validation systems should be standardized. Bear in mind that this can be difficult given the point above, and that it creates tension with pursuing velocity
  • Spread a deployment across multiple stages. In the study, a ‘shadow stage’ was often helpful in convincing stakeholders of the benefits of a new deployment
  • ML evaluation metrics should be tied to product metrics. This should be an explicit step in an engineer’s workflow, done in alignment with other stakeholders to make sure the right metrics are chosen.
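
As an illustration of the first point, here is a minimal sketch (ours, not the paper’s) of a validation set that grows with production failure cases and reports metrics per user subpopulation. All names here are hypothetical.

```python
# A minimal sketch (ours, not the paper's) of a dynamic validation set:
# production failures get folded back in, tagged by user subpopulation, so
# models are always judged against what they recently got wrong.

from collections import defaultdict

class ValidationSet:
    def __init__(self):
        self.examples = []  # list of (features, label, slice_tag)

    def add_failure_cases(self, cases, slice_tag):
        """Append labeled failures from production, tagged by subpopulation."""
        self.examples.extend((x, y, slice_tag) for x, y in cases)

    def slice_accuracy(self, predict):
        """Accuracy per slice, so regressions on small subpopulations stay visible."""
        hits, totals = defaultdict(int), defaultdict(int)
        for x, y, tag in self.examples:
            hits[tag] += int(predict(x) == y)
            totals[tag] += 1
        return {tag: hits[tag] / totals[tag] for tag in totals}

# Usage: after triaging last week's incidents, add them and re-score the model.
val = ValidationSet()
val.add_failure_cases([({"country": "BR"}, 1), ({"country": "BR"}, 0)], slice_tag="new_market_users")
print(val.slice_accuracy(lambda features: 1))  # -> {'new_market_users': 0.5}
```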

Sustaining Model Performance

According to the paper, sustaining high performance in production pipelines is hacky for a lot of organizations. Instead, sustaining models [should] require deliberate software engineering and organizational practices. These include:

  • Create new versions: frequently label and retrain on live data. It could be every day (“you don’t really need to worry about if your model has gone stale if you’re retraining it every day”), or whenever a pre-defined threshold for pipeline performance is triggered
  • Maintain old versions as fallback models. This reduces downtime when a model breaks by giving you a known-good version to revert to
  • Maintain layers of heuristics. For example, you can add a heuristics layer on top of an anomaly detection model to filter surfaced anomalies based on domain experience
  • Validate data going in and out of pipelines. Continuously monitor production models with checks on expected features, their types, and data completeness (see the sketch after this list)
  • Keep it Simple. This manifested in different ways, with some preferring simple models where possible, and others utilizing higher-capacity deep learning models to simplify their pipeline.
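
For the data validation point, here is a minimal sketch (ours, not the paper’s) of batch-level input checks on expected features, their types, and completeness. The schema, threshold, and use of pandas are illustrative assumptions.

```python
# A minimal sketch (ours, not the paper's) of validating a batch of data before
# it enters a production pipeline: expected features, their types, and
# completeness. The schema and threshold below are hypothetical.

import pandas as pd

EXPECTED_SCHEMA = {"user_id": "int64", "age": "float64", "country": "object"}
MAX_NULL_FRACTION = 0.05

def validate_batch(df: pd.DataFrame) -> list:
    """Return human-readable problems; an empty list means the batch passes."""
    problems = []
    for col, dtype in EXPECTED_SCHEMA.items():
        if col not in df.columns:
            problems.append(f"missing feature: {col}")
            continue
        if str(df[col].dtype) != dtype:
            problems.append(f"{col}: expected {dtype}, got {df[col].dtype}")
        null_frac = df[col].isna().mean()
        if null_frac > MAX_NULL_FRACTION:
            problems.append(f"{col}: {null_frac:.0%} nulls exceeds {MAX_NULL_FRACTION:.0%}")
    return problems

batch = pd.DataFrame({"user_id": [1, 2], "age": [34.0, None], "country": ["US", "DE"]})
issues = validate_batch(batch)
if issues:
    # In practice this might raise an alert or trigger a fallback model version.
    print("blocking batch:", issues)
```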

Persistent MLOps Pain Points

The paper highlights persistent pain points - expressed as tensions and synergies between the three “Vs” covered earlier - and uses them to suggest opportunities for future tooling. The main pain points were:

  • Mismatch Between Development and Production Environments. Examples include data leakage, Jupyter notebook usage, and non-standardized code quality (with production code sometimes not reviewed because ML was “experimental in nature” and reviews were a “barrier to velocity”!)
  • Handling A Spectrum of Data Errors. The spectrum includes hard errors (mixing or swapping columns), soft errors (such as a few null-valued features in a data point), and drift errors. As a side note, the paper found it’s hard to create meaningful alerts, which can lead to alert fatigue for the team
  • Taming the Long Tail of ML Pipeline Bugs. These bugs are long-tailed, which makes them hard to write tests for, and creates a “sense of paranoia”
  • Multi-Staged Deployments Seemingly Take Forever. Multiple participants complained that the process from conception to validating the idea took too long. The upshot: if ideas can be invalidated in earlier stages of deployment, overall velocity will increase

MLOps anti-patterns

The paper highlights some MLOps anti-patterns, like:

  • Industry-Classroom Mismatch. The skills required to do MLOps effectively are learned in “the wild”, not in school
  • Keeping GPUs Warm. Sometimes, teams focus on running a lot of experiments rather than the right ones. Hyperparameter searches are often overrated
  • Retrofitting an Explanation. Engineers sometimes “just try everything and then backfit some nice-sounding explanation for why it works”
  • Undocumented Tribal Knowledge. High-velocity experimentation makes it hard to maintain documentation. We’ve all seen the model everyone is afraid to touch because the developer left the company

Conclusions

This paper has so many useful nuggets of wisdom about production ML, so you should just read it. Here are a few of our high-level conclusions:

  • MLOps is fragmented and changing quickly
  • Production ML is not just about ML, it’s also about the business context. For example: shadow deployment stages to build buy-in, business metrics to validate model value, retrofitted explanations to justify what works, and the continual need for velocity to demonstrate practical progress early
  • Industry approaches diverge from those grounded in academia. For example: in validation (a held-out dataset with one metric versus dynamically updated evaluation sets), and in dealing with distribution shifts (daily retrains on fresh data to generate new models)
  • MLOps differs in practice from software best practices. For example: code isn’t reviewed as frequently, pipelines can be undocumented, and the experimental nature of ML persists into production

This paper is a fascinating snapshot of a nascent field, and well worth reading in order to support the development of your MLOps practices.

The paper can be found here.

Thanks for reading!

Feel free to get in touch if you have any questions: you can message us on socials or simply reply to this email.

You can also find previous issues on our blog and on twitter.

The Gantry team
