DATA Pill #039 - unleashing ML power at Lyft and Spotify, BigData funeral and lots of news
Hi,
Today we prescribe:
A record number of new, powerful ML case studies.
Kafka, Cassandra and Flink tutorials.
A large portion of data content.
ARTICLES
Powering Millions of Real-Time Decisions with LyftLearn Serving | 7 min | ML | Hakan Baba & Mihir Mathur | Lyft Engineering Blog
The key component for our ML platform is LyftLearn Serving. LyftLearn Serving is a robust, performant, and decentralized system for deploying and serving ML models; it can be used by any team at Lyft to easily infer models online through network calls.
In this article you will learn about the major components and important design decisions, key ideas, lessons learned and the next steps.
Unleashing ML Innovation at Spotify with Ray | 10 min | ML | Divita Vohra, Keshi Dai, David Xia & Praveen Ravichandran | Spotify Engineering Blog
The machine learning journey at Spotify: how ML has developed there, what the lifecycle of ML projects looks like, and what the next steps are.
Enabling MLOps in Three Simple Steps | 7 min | MLOps | Dustin Liu | Towards Data Science Blog
Dustin shares his experience from a project that implemented a multi-class classification system on financial transaction data comprising over 10 million records and over 70 classes.
Through this project, he built a simplified MLOps integration, viewed from an end-to-end ML flow perspective, that can be implemented in three steps:
1. Adding Data Extraction & Transformation Governance.
2. Productionalizing ML code.
3. Model Experiment Tracking and Management.
Data integrity vs. Data quality | 5 min | Data Engineering | Saeed Mohajeryami, PhD | Personal Blog
Data integrity and data quality are related yet distinct concepts in data engineering. Saeed digs deeper into both, explains each one, and highlights their differences. The article also presents a few techniques used to ensure integrity and the important checks to consider for data quality.
In MORE LINKS you will find 5 ways to fix broken data lineage, the new GitHub code search, a Kafka use case from Etsy and an article titled "Big Data is Dead".
TUTORIAL
MLOPS | CICD with Airflow | 7 min | MLOps | Tapan Kumar Patro | Personal Blog
This one introduces you to MLOps, demonstrates its usefulness, and explains what processes, flows and pipelines are. It also includes a tutorial on how to build your own pipeline and consolidate each job.
In MORE LINKS you will find Flink SQL and Cassandra tutorials.
NEWS
Microsoft launches Teams Premium with features powered by OpenAI | 2 min | AI | Tom Warren | The Verge Blog
OpenAI’s GPT-3.5 model takes Microsoft Teams to the next level. Notes, mentions and a full transcript are all available, with each speaker’s contributions highlighted in a neat timeline of topics and chapters. Hungry for more? Dive into the text to read about AI-powered intelligent recap features, some existing Teams features and better meeting protections.
In MORE LINKS you will find news from Anaconda, dbt and Google.
PODCAST
Data Update - The best managers look for evidence in data - business intuition is no longer enough when making decisions. Two stories on how data-driven approach helped solve different business problems | 27 min | MLOps | Adrian Dembek, Piotr Menclewicz | Radio DaTa Podcast
If you hear that your company should be data-driven, but you're not sure what this means in practice, this episode shares two stories of data-driven companies. Both are examples of data-literate companies, seen from different perspectives. In the first story, you can learn how a big tech company was able to detect problems and start solving the right ones. The second shows how an e-commerce company prepared more effective promotions and increased revenue.
In MORE LINKS you will find an AI in Healthcare podcast.
CONFS, EVENTS AND MEETUPS
GoDataFest 2023 | 15-17 Feb | In-person, Amsterdam
GoDataFest brings together conversations about the latest data technology into an event experience. Engage with product specialists and experienced practitioners of leading tech. Learn from the experiences of industry-leading enterprises. GoDataFest is proudly powered by Xebia.
Optimizing data in Apache Iceberg: Performance strategies & Foundations of Data Teams | 16 Feb | Double Webinar
Last reminder before next week’s webinar:
________________________
Have any interesting content to share in the DATA Pill newsletter?
Join us on GitHub
Dig into previous editions of DataPill
Adam from GetInData | Part of Xebia