DATA Pill #111 - Stream enrichment with Flink SQL, Ray Infrastructure

This issue covers Flink SQL stream enrichment, Pinterest's Ray Infrastructure, video classifiers, data team value, and optimizing Apache Iceberg. Plus, explore our podcast on LLM innovations and AI agent tutorials.

Enjoy the read!

ARTICLES

Video annotator: a framework for efficiently building video classifiers using vision-language models and active learning | 6 min | Machine Learning | Amir Ziai, Aneesh Vartakavi, Kelli Griggs, Eugene Lok, Yvonne Jukes, Alex Alonso, Vi Iyengar, Anna Pulido | Netflix Tech Blog

Read about a framework that leverages active learning and large vision-language models to streamline the annotation process, empowering domain experts and enhancing model efficiency.

Unlocking Business Value and proving the value of data teams | 7 min | Data Engineering | Robert Sahlin | Personal Blog

In this blog post, the author discusses the necessity for data teams to prove their value in today's economic climate, the importance of productionizing data, and practical steps to support operational use cases effectively. The post explores how data teams can demonstrate their impact and navigate the complexities of operationalizing analytical data.

How Apache Iceberg is Built for Open Optimized Performance | 22 min | Data Engineering | Alex Merced | dremio blog

Apache Iceberg is a table format designed for data lakehouses. It offers both ACID transactions and rich metadata to enhance performance. This article explores the powerful mechanisms within Iceberg that enable query engines, such as Dremio, to optimize data queries and improve overall efficiency.

In MORE LINKS you will read about:

  • ETA (Estimated Time of Arrival) Reliability at Lyft

{ MORE LINKS }

TUTORIAL

Stream enrichment with Flink SQL | 14 min | Stream Processing | Marek Maj | GetInData | Part of Xebia Blog

In this article, Marek compares different types of joins available in the Flink SQL engine for effective and efficient stream enrichment.

In MORE LINKS you will read about:

  • Why we no longer use LangChain for building our AI agents
  • Ray Infrastructure at Pinterest

{ MORE LINKS }

PODCAST

Meryem Arik on LLM Deployment, State-of-the-art RAG Apps, and Inference Architecture Stack | 38 min | LLM | Meryem Arik, Srini Penchikala | The InfoQ Podcast

Meryem Arik, Co-founder/CEO at TitanML, discusses innovations in Generative AI and LLMs, covering the current state of LLMs, their deployment, state-of-the-art RAG applications, and the inference architecture stack for LLM applications.

DATA TUBE

Understanding User Behavior using Knowledge Graphs | 7 min | AI | RelationalAI

This demo showcases how RelationalAI, deeply integrated with the Snowflake Data Cloud, can uncover user behavior patterns by running powerful graph algorithms directly on your existing Snowflake data.

CONFS EVENTS AND MEETUPS

Data Summer School | Amsterdam | 12-16th August

Join one of our four specialized Data Science, Data Analysis, Analytics Engineering, or Data Literacy cohorts. Each cohort offers targeted, hands-on training sessions scheduled throughout the week for an immersive learning experience.

________________________

Have any interesting content to share in the DATA Pill newsletter?

Join us on GitHub

Dig into previous editions of DATA Pill

Adam from GetInData | Part of Xebia
