DATA Pill #111 - Stream enrichment with Flink SQL, Ray Infrastructure
This issue covers Flink SQL stream enrichment, Pinterest's Ray Infrastructure, video classifiers, data team value, and optimizing Apache Iceberg. Plus, explore our podcast on LLM innovations and AI agent tutorials.
Enjoy the read!
ARTICLES
Video annotator: a framework for efficiently building video classifiers using vision-language models and active learning | 6 min | Machine Learning | Amir Ziai, Aneesh Vartakavi, Kelli Griggs, Eugene Lok, Yvonne Jukes, Alex Alonso, Vi Iyengar, Anna Pulido | Netflix Tech Blog
Read about a framework that leverages active learning and large vision-language models to streamline the annotation process, empowering domain experts and enhancing model efficiency.
Unlocking Business Value and proving the value of data teams | 7 min | Data Engineering | Robert Sahlin | Personal Blog
The author argues that data teams must prove their value in today's economic climate, explains why productionizing data matters, and outlines practical steps for supporting operational use cases. The post also covers how teams can demonstrate their impact while handling the complexity of operationalizing analytical data.
How Apache Iceberg is Built for Open Optimized Performance | 22 min | Data Engineering | Alex Merced | dremio blog
Apache Iceberg is a table format designed for data lakehouses. It offers both ACID transactions and rich metadata to enhance performance. This article explores the mechanisms within Iceberg that enable query engines, such as Dremio, to optimize data queries and improve overall efficiency.
TUTORIAL
Stream enrichment with Flink SQL | 14 min | Stream Processing | Marek Maj | GetInData | Part of Xebia Blog
In this article, Marek compares different types of joins available in the Flink SQL engine for effective and efficient stream enrichment.
PODCAST
Meryem Arik on LLM Deployment, State-of-the-art RAG Apps, and Inference Architecture Stack | 38 min | LLM | Meryem Arik, Srini Penchikala | The InfoQ Podcast
Meryem Arik, Co-founder/CEO at TitanML, discusses innovations in Generative AI and LLMs, covering the current state of LLMs, their deployment, state-of-the-art RAG applications, and the inference architecture stack for LLM applications.
DATA TUBE
Understanding User Behavior using Knowledge Graphs | 7 min | AI | RelationalAI
This demo showcases how RelationalAI, deeply integrated with the Snowflake Data Cloud, can uncover user behavior patterns by running powerful graph algorithms directly on your existing Snowflake data.
CONFS, EVENTS AND MEETUPS
Data Summer School | Amsterdam | 12-16th August
Join one of our four specialized Data Science, Data Analysis, Analytics Engineering, or Data Literacy cohorts. Each cohort offers targeted, hands-on training sessions scheduled throughout the week for an immersive learning experience.
________________________
Have any interesting content to share in the DATA Pill newsletter?
Join us on GitHub
Dig into previous editions of DATA Pill
Adam from GetInData | Part of Xebia