DATA Pill #062 - Netflix's Data Mesh, Lyft’s ML, Uber's lakehouse and (best?) open-source LLM

Hi,

How meaty is the newest DATA Pill?

Cutting-edge innovations from Uber, Lyft, Google and more.

Discover ACID upserts in data lakehouses and real-time ML breakthroughs.

Immerse yourself in the world of data, AI and machine learning.


ARTICLES

Fast Copy-On-Write within Apache Parquet for Data Lakehouse ACID Upserts | 11 min | Data Engineering | Xinli Shang, Kai Jiang, Huicheng Song, Jianchun Xu, Mohammad Islam | Uber Engineering Blog

Dive into Uber's latest innovations. This article discusses the growing trend of building lakehouses on top of table formats such as Apache Hudi, Apache Iceberg and Delta Lake for various use cases, including incremental ingestion. It then explains how Uber implemented a row-level secondary index and modified Apache Parquet to speed up the upsert process, making it significantly faster than the traditional copy-on-write approach used in Delta Lake and Hudi.
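As a rough illustration of why an index helps, here is a minimal Python sketch of a copy-on-write upsert over a toy in-memory "file" made of row groups (lists of dicts). It is not Uber's implementation and does not use the Parquet API; `RowLocation`, `build_index` and `upsert` are hypothetical names. The index maps each record key to its row group and offset, so only row groups that contain updated keys are rewritten, while untouched row groups are reused as-is (standing in for copying data without re-encoding it).

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, object]
RowGroup = List[Row]


@dataclass(frozen=True)
class RowLocation:
    """Hypothetical index entry: where a record key lives in the file."""
    row_group: int
    offset: int


def build_index(row_groups: List[RowGroup], key: str) -> Dict[object, RowLocation]:
    """Row-level secondary index: record key -> (row group, offset)."""
    index: Dict[object, RowLocation] = {}
    for g, group in enumerate(row_groups):
        for i, row in enumerate(group):
            index[row[key]] = RowLocation(g, i)
    return index


def upsert(
    row_groups: List[RowGroup], updates: Iterable[Row], key: str
) -> Tuple[List[RowGroup], List[int]]:
    """Return a new file version and the indices of row groups that were rewritten.

    The input is never mutated (copy-on-write): readers of the old version
    keep seeing the old data.
    """
    index = build_index(row_groups, key)
    touched: Dict[int, Dict[int, Row]] = {}  # row group -> {offset: new row}
    inserts: List[Row] = []
    for row in updates:
        loc = index.get(row[key])
        if loc is None:
            inserts.append(row)  # unknown key: plain insert
        else:
            touched.setdefault(loc.row_group, {})[loc.offset] = row

    new_groups: List[RowGroup] = []
    for g, group in enumerate(row_groups):
        if g in touched:
            patched = touched[g]
            # Only this row group is rebuilt.
            new_groups.append([patched.get(i, r) for i, r in enumerate(group)])
        else:
            # Unchanged row group is reused by reference, not rewritten.
            new_groups.append(group)
    if inserts:
        new_groups.append(inserts)
    return new_groups, sorted(touched)
```

In the real system the saving comes from avoiding decode and re-encode work on unchanged data; this toy only models which row groups need rewriting.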


Building Real-time Machine Learning Foundations at Lyft | 10 min | ML | Konstantin Gizdarski, Martin Liu | Lyft Engineering Blog

Lyft's case study on building the foundations that let its hundreds of ML developers efficiently develop new models and enhance existing ones with streaming data.



The Top 3 Data Architecture Trends (And How LLMs Will Influence Them) | 5 min | Data Architecture | Hanzala Qureshi | Towards Data Science Blog

This one gives the lowdown on three major trends shaping data architecture and explains how LLMs can be useful in each of them. Whether you're interested in context-driven analytics, the importance of data governance, or the buzz around Co-Pilot, the article covers what's happening in the world of data architecture.


Introduction to dbt Cloud - features, capabilities and limitations | 6 min | Data Analytics | Radosław Dziadosz | GetInData | Part of Xebia Blog

Discover how dbt Cloud can streamline data engineering workflows and help teams build scalable, maintainable data pipelines, and where its limitations lie.


In MORE LINKS you will read about Google Bard’s new visual feature, which could be a game changer.

{ MORE LINKS }



TUTORIALS

Falcon LLM: Deploy open source LLM in your private cluster with Hugging Face and GKE Autopilot | 12 min | LLM | Marcin Zabłocki | GetInData | Part of Xebia Blog

This tutorial walks through deploying an open-source large language model (LLM) in a private cluster using Hugging Face and Google Kubernetes Engine (GKE) Autopilot.
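Once a model is served, clients usually call it over HTTP. The sketch below assumes the deployment exposes Hugging Face Text Generation Inference (TGI) behind a Kubernetes Service; the URL `http://falcon-service/generate` is a hypothetical placeholder, and the tutorial itself may use a different serving setup. TGI's `/generate` endpoint accepts a JSON body with `inputs` and an optional `parameters` object (for example `max_new_tokens`) and returns `generated_text`.

```python
import json
import urllib.request

# Hypothetical in-cluster address; replace with your own Service or Ingress URL.
TGI_URL = "http://falcon-service/generate"


def build_request(prompt: str, max_new_tokens: int = 64, temperature: float = 0.7) -> bytes:
    """Serialize a request body for TGI's /generate endpoint."""
    body = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens, "temperature": temperature},
    }
    return json.dumps(body).encode("utf-8")


def generate(prompt: str, url: str = TGI_URL, timeout: float = 60.0) -> str:
    """POST the prompt to the model server and return the generated text."""
    req = urllib.request.Request(
        url,
        data=build_request(prompt),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["generated_text"]
```

From a pod inside the cluster (or through a port-forward pointed at the right URL), `generate("What is GKE Autopilot?")` would return the model's completion as a string.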


In MORE LINKS you will read about using a generative AI foundation model for summarization and question answering on your own data.

{ MORE LINKS }



NEWS

Never Miss a Beat: Announcing New Monitoring and Alerting capabilities in Databricks Workflows | 4 min | Data Engineering | Roland Fäustlin, Frank Wisniewski | Databricks Blog

Databricks Workflows gets enhanced monitoring and observability features: a new real-time insights dashboard that shows all your production job runs in one place, detailed task tracking for every workflow, and new alerting capabilities to help you catch issues before they become problems.


Google Cloud expands availability of enterprise-ready generative AI | 5 min | AI | Warren Barkley | Google Cloud Blog

Google Cloud announces the general availability (GA) of four important foundation models for Vertex AI: Imagen, PaLM 2 for Chat, Codey and Chirp. For each of these models, organizations can access APIs in Model Garden and do prompt design and tuning in Generative AI Studio. The Multimodal Embeddings API is also available in preview; it lets customers combine Vertex AI’s generative AI models with their proprietary data by generating embeddings (vector representations) of their text and image data.
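Because multimodal embeddings place text and images in a shared vector space, a text query can be matched against image embeddings by similarity. The sketch below assumes the vectors have already been obtained from the API; the three-dimensional vectors and file names are made-up placeholders (real embeddings have far more dimensions), and the `rank_by_similarity` helper is hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_similarity(
    query: Sequence[float], candidates: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Rank candidate items (e.g. images) by similarity to a query vector."""
    scored = [(name, cosine(query, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Placeholder vectors, NOT real API output.
text_query = [0.9, 0.1, 0.0]  # stands in for the embedding of "red sports car"
image_embeddings = {
    "car.jpg": [0.8, 0.2, 0.1],
    "cat.jpg": [0.0, 0.3, 0.9],
}
ranking = rank_by_similarity(text_query, image_embeddings)
```

With real embeddings, the same ranking logic would typically be delegated to a vector database or approximate nearest-neighbour index once the collection grows.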



TOOLS

MyMLOps | 2 min | MLOps

Experience the convenience of visualizing your MLOps stack directly in your browser with MyMLOps. This project offers a user-friendly tool stack builder, providing brief insights into various tools and their categories. You can also share your customized stack with others.



DATA TUBE

Architecture of Netflix's Data Mesh. Data mesh use cases | 41 min | Data Mesh | Jordan Lewis, Vlad Sydorenko | CockroachDB

Dig into Netflix's Data Mesh architecture and use cases, and how it delivers data insights without slowing down the source databases. Discover how Netflix's Data Mesh Platform addresses the challenges of multiple writes.

You will learn all about the following:

  • CDC use cases and shortcomings
  • How Netflix uses changefeeds today
  • What Netflix’s data mesh architecture looks like
  • What others can learn from Netflix’s architecture
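To make the CDC idea concrete: instead of an application writing to both its database and a downstream store (multiple writes, which can diverge if one of them fails), a changefeed emits an ordered stream of change events that consumers replay. The sketch below applies such events to an in-memory replica; the `ChangeEvent` shape and the version-based, idempotent apply logic are assumptions for illustration, not CockroachDB's changefeed format or Netflix's implementation.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical change event emitted by a changefeed."""
    key: str
    version: int            # increases per key, e.g. derived from the commit timestamp
    after: Optional[dict]   # new row image; None means the row was deleted


def apply_changes(
    replica: Dict[str, dict], versions: Dict[str, int], events: Iterable[ChangeEvent]
) -> None:
    """Apply events idempotently: stale or redelivered events are skipped."""
    for ev in events:
        if versions.get(ev.key, -1) >= ev.version:
            continue  # already applied (at-least-once delivery may redeliver)
        versions[ev.key] = ev.version  # also acts as a tombstone for deletes
        if ev.after is None:
            replica.pop(ev.key, None)
        else:
            replica[ev.key] = ev.after
```

Tracking the last applied version per key is what makes replays safe: a consumer can restart from an earlier position in the feed without corrupting the replica.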



PODCAST

How Data Engineering Teams Power Machine Learning With Feature Platforms | 1 h 4 min | ML | Tobias Macey, Razi Raziuddin | Data Engineering Podcast

Razi Raziuddin discusses why feature platforms matter for data engineering and machine learning. The conversation centers on the crucial role these platforms play in the ML workflow and on how data engineering teams can help data scientists and ML engineers develop and maintain their features effectively.



CONFS, EVENTS AND MEETUPS

An Intro to LLMs: Key Challenges and Best Practices when Deploying at Scale | Webinar | 5 pm CEST | 27th July

Discover the potential of Generative AI and LLMs for your organization with Seldon's technical team. They will guide you through the opportunities and challenges of these game-changing technologies, demonstrating how leveraging LLMs can automate tasks at scale and in a personalized manner. While these advances in ML have unlocked numerous use cases, the experts will also address important considerations such as data privacy, consistency and ethical concerns, to help you make the most of these innovations.

________________________


Have any interesting content to share in the DATA Pill newsletter?

Join us on GitHub

Dig into previous editions of DATA Pill


Adam from GetInData | Part of Xebia

