KISS at OpenAI, #batchforlife, and data science conspiracies

Keep it simple, stupid. From OpenAI to simple batch processing to the simplicity of AWS SageMaker Algorithms: that's the theme this week.

Our industry is moving fast, and as data scientists we sometimes forget that at the core of our work is simple elegance. We need to strive for simple models, simple architectures, and simple data transformations (when possible, of course).

This week we will look at the conversations that happened about simplicity in data science, learn about batch processing post-deployment (#batchforlife), and see how easy SageMaker Algorithms are to use… oh, and a data science conspiracy that impacts all of us.


KISS at OpenAI

Greg, Elon, and Bojan talking about model complexity
Shots fired

Friendly reminder: the fancier the model, the less likely it is to work. The art of benchmarking with simple models is something junior data scientists (and some seniors) struggle with. Something in the water this week was inspiring conversations about simplicity in ML across LinkedIn and Twitter.
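To make that concrete, here is a minimal benchmarking sketch (the dataset and models are placeholders, not taken from any of the threads above): always score a trivial baseline and a simple model first, and make anything fancier earn its keep against them.

```python
# Minimal baseline-benchmarking sketch with placeholder data.
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Placeholder dataset; swap in your own features and labels.
X, y = make_classification(n_samples=1_000, n_features=20, random_state=42)

baselines = {
    "majority_class": DummyClassifier(strategy="most_frequent"),
    "logistic_regression": LogisticRegression(max_iter=1_000),
}

for name, model in baselines.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name}: mean accuracy = {scores.mean():.3f}")

# If the fancy model can't clearly beat these numbers, it doesn't ship.
```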

One of the things that is unfortunately not simple is evaluating the quality of speech-to-text models, especially in an obscure language like Flemish, with words like muggengeheugen and a new dialect every two streets (ask Niels Nuyttens to pronounce that word, and then ask Wiljan Cools too…).

Normally you need humans to review all the files and calculate a mean opinion score. But Silke Plessers wrote a blog exploring PCA-based reconstruction error to evaluate quality automatically.
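As a rough illustration of the general idea (not Silke's actual code; the feature matrices and the 95th-percentile threshold below are made up), PCA reconstruction error takes only a few lines with scikit-learn:

```python
# Hedged sketch: fit PCA on features from reference audio, then flag
# samples whose reconstruction error is unusually high.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
reference_features = rng.normal(size=(500, 40))  # e.g. per-clip feature summaries (placeholder)
new_features = rng.normal(size=(50, 40))         # clips you want to screen (placeholder)

pca = PCA(n_components=10).fit(reference_features)

def reconstruction_error(X: np.ndarray) -> np.ndarray:
    """Mean squared error between each sample and its PCA reconstruction."""
    reconstructed = pca.inverse_transform(pca.transform(X))
    return ((X - reconstructed) ** 2).mean(axis=1)

errors = reconstruction_error(new_features)
threshold = np.percentile(reconstruction_error(reference_features), 95)
print("clips flagged for human review:", int((errors > threshold).sum()))
```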

Unfortunately for Greg at OpenAI, LLM evaluation is not simple either. Hopefully this research will lead to the automated (quantitative) evaluation of LLMs.

#batchforlife

Deploying models in production is already complex enough. Thankfully most machine learning use cases today need batch processing, not streaming. Maria Vechtomova and co. at Marvelous MLOps wrote a great post about deploying models in batch mode.

Even as streaming use cases become more common, batch processing isn't going anywhere, especially in AWS SageMaker, where batch transforms make it simple.
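As a minimal sketch of what that looks like with the SageMaker Python SDK (the bucket paths, model name, and instance type below are placeholders, not from the Marvelous MLOps post), a batch transform job boils down to:

```python
# Rough sketch of a SageMaker batch transform job with placeholder names.
from sagemaker.transformer import Transformer

transformer = Transformer(
    model_name="my-registered-model",            # a model already created in SageMaker
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/batch-predictions/",
)

# Score a whole S3 prefix of CSV files in one offline job.
transformer.transform(
    data="s3://my-bucket/batch-input/",
    content_type="text/csv",
    split_type="Line",
)
transformer.wait()
```

No endpoint to keep warm and no autoscaling to tune: the job spins up, scores the prefix, writes the results to S3, and shuts down.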

Diagram about batch processing

SageMaker Algorithms and their beautiful simplicity

SageMaker Algorithms let you take a model from training to deployment very simply.

Check out the full blog about SageMaker Algorithms and how to deploy them.
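For a rough feel of that flow, here is a hedged sketch using the SageMaker Python SDK with the built-in XGBoost algorithm (the role ARN, S3 paths, and hyperparameters are placeholders, not taken from the blog):

```python
# Hedged train-to-deploy sketch with a built-in SageMaker algorithm.
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
region = session.boto_region_name
role = "arn:aws:iam::123456789012:role/MySageMakerRole"  # placeholder role ARN

# Built-in algorithm container, no custom training code needed.
image_uri = sagemaker.image_uris.retrieve("xgboost", region, version="1.5-1")

estimator = Estimator(
    image_uri=image_uri,
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/models/",
    sagemaker_session=session,
)
estimator.set_hyperparameters(objective="binary:logistic", num_round=100)

# Train on CSV data already staged in S3 (label in the first column).
estimator.fit({"train": TrainingInput("s3://my-bucket/train/", content_type="text/csv")})

# One call takes the trained model to a real-time endpoint.
predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.large")
```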

Speaking of SageMaker Algorithms…

They took 'er algos

Data science conspiracy

There is a conspiracy that affects all of data science.
… continue reading about the conspiracy on Substack

Shout out to all the great people in the MLOps and Post-Deployment Data Science community

Thanks

Thanks Raphaël Hoogvliets for some great conversations this week and for giving some insight into how you run your batch processes.

Gökhan Çiflikli, thanks for writing a great blog on feature drift. One of the best pieces on the topic; check it out -> https://www.gokhan.io/python/model-monitoring-nannyml/
Bernardo holding it down on Reddit.
Last but not least,

And in the final hour, Raghu Venkat. Indeed it is! Great to have you around.

And of course everyone else mentioned in this edition: Maria Vechtomova, Silke Plessers, Bojan Tunguz, Ph.D. There are of course many more people, but these are the ones I could remember; thanks even if I didn't mention you here.

Don't forget to check out the data science conspiracy:

Read the rest on Substack


Stijn (Stan) Christiaens

Co-founder & Chief Data Citizen at Collibra

1y

I’m here for the generated images

Raphaël Hoogvliets

Tech Lead | Follow me for MLOps stuff | Creating the future's technical debt, today

1y

Thank you too Hakim, it's a pleasure :)

Gökhan Çiflikli

Director of AI & ML | PhD | Text-to-image in Ads space

1y

Thanks for the shoutout in the article, Hakim, happy to be involved!

Maria Vechtomova

MLOps Tech Lead | 10+ years in Data & AI | Databricks MVP | Public speaker

1y

Thanks for the great conversations, Hakim Elakhrass, looking forward to future collaboration!
