DATA Pill #107 - dbt 1.8 is just wow, How Twitter processes 4 billion events in real-time daily

Hi,

Pill no. 107 is ready to serve! This edition explores load balancing in Apache Kafka and innovative ways to handle tabular data with RAG systems.

ARTICLES

How We Solve Load Balancing Challenges in Apache Kafka | 11 min | Data Engineering | Yifan Huang | Agoda Engineering Blog

Explore how Kafka's partitioning and load-balancing strategies help efficiently manage our daily data flow while addressing common challenges like workload imbalances and different hardware capabilities.
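As a rough illustration of why key-based partitioning can leave partitions unevenly loaded, the sketch below assigns messages to partitions by hashing their keys and counts the result. It is not the approach from the Agoda article, nor Kafka's actual partitioner: `zlib.crc32` is a stand-in hash, and the function names and sample keys are assumptions made for this example.

```python
import zlib
from collections import Counter


def pick_partition(key: str, num_partitions: int) -> int:
    """Map a key to a partition by hashing (crc32 is a stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_load(keys, num_partitions: int) -> Counter:
    """Count how many messages land on each partition."""
    # Start every partition at zero so idle partitions show up too.
    load = Counter({p: 0 for p in range(num_partitions)})
    for key in keys:
        load[pick_partition(key, num_partitions)] += 1
    return load


if __name__ == "__main__":
    # Hypothetical traffic: one "hot" key dominates the stream.
    keys = ["hot-user"] * 90 + [f"user-{i}" for i in range(10)]
    load = partition_load(keys, num_partitions=4)
    for partition, count in sorted(load.items()):
        print(f"partition {partition}: {count} messages")
```

Because every message with the same key maps to the same partition, a single hot key is enough to skew the load, whatever hash function is used.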

Tabular Data, RAG, & LLMs: Improve Results Through Data Table Prompting | 10 min | LLM | Eduardo Rojas Oviedo, Ezequiel Lanza | Intel Tech Blog

This post explores how a RAG system can help analysts quickly identify market trends, investment opportunities, and economic risks. The focus is handling tabular data embedded within documents to provide accurate and efficient insights.
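One simple way to hand a table to an LLM is to serialize it as Markdown text inside the prompt. The sketch below shows that idea only. It is not the method from the Intel post, and the function names, prompt wording, and figures are made up for illustration.

```python
def table_to_markdown(headers, rows):
    """Render a table as Markdown text so it can be embedded in a prompt."""
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(cell) for cell in row) + " |")
    return "\n".join(lines)


def build_prompt(question, headers, rows):
    """Combine an instruction, the serialized table, and the question."""
    table = table_to_markdown(headers, rows)
    return (
        "Answer the question using only the table below.\n\n"
        f"{table}\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    headers = ["Quarter", "Revenue (USD m)"]
    rows = [["Q1", 120], ["Q2", 135]]  # made-up figures
    print(build_prompt("Which quarter had higher revenue?", headers, rows))
```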

How Twitter processes 4 billion events in real-time daily | 5 min | Real-time analytics | Vu Trinh | Personal Blog

Twitter handles 400 billion real-time events daily, generating a petabyte of data from diverse sources. By transitioning from a Lambda to a Kappa architecture, Twitter has improved latency, throughput, and accuracy in its data processing pipelines.

dbt 1.8 is just wow | 8 min | Data Engineering | Charles Verleyen | Astrafy Blog

Delve into the release's core feature, "unit testing," and explore other notable features, like the "empty" flag. This blog includes code snippets and a public repository, allowing readers to test these new features in a sandbox project immediately.

TOOL

Marimo | Data Engineering

Marimo is a reactive Python notebook: run a cell or interact with a UI element, and Marimo automatically runs dependent cells (or marks them as stale), keeping code and outputs consistent. Marimo notebooks are stored as pure Python, executable as scripts, and deployable as apps.
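To make "reactive" concrete, here is a toy model of the idea in plain Python: when a cell is redefined, every cell that reads it is rerun. This is not Marimo's API or implementation. The `Notebook` class is hypothetical, dependencies are declared by hand here, and the sketch assumes no circular dependencies.

```python
class Notebook:
    """Toy reactive notebook: rerunning a cell reruns its dependents."""

    def __init__(self):
        self.cells = {}   # cell name -> (function, names of cells it reads)
        self.values = {}  # cell name -> latest output

    def cell(self, name, func, reads=()):
        """Define (or redefine) a cell and run it immediately."""
        self.cells[name] = (func, tuple(reads))
        self.run(name)

    def run(self, name):
        func, reads = self.cells[name]
        self.values[name] = func(*(self.values[r] for r in reads))
        # Propagate: rerun every cell that reads this one.
        for other, (_, other_reads) in self.cells.items():
            if name in other_reads:
                self.run(other)


if __name__ == "__main__":
    nb = Notebook()
    nb.cell("x", lambda: 2)
    nb.cell("double", lambda x: x * 2, reads=["x"])
    nb.cell("x", lambda: 5)  # redefining x reruns "double"
    print(nb.values)
```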

TUTORIAL

DREAM: Distributed RAG Experimentation Framework | 7 min | RAG | Aishwarya Prabhat | MLOps Community

DREAM is a Distributed RAG Experimentation Framework that simplifies the complex process of determining the best combination of RAG parameters for your use case. By leveraging a Kubernetes-native architecture and various open-source technologies, DREAM enables efficient experimentation, evaluation, and tracking of RAG methods in a distributed manner.

Rust vs Python: Choosing the Right Language for Your Data Project | 8 min | Data Engineering | Amberle McKee | DataCamp Blog

Let’s compare Rust and Python. We'll look at how they stack up on various topics to help you make an informed decision on which to use for your project.

PODCAST

Data Migration Strategies for Large Scale Systems | 1 h | Data Engineering | Tobias Macey, Sriram Panyam | Data Engineering Podcast

Any software system will eventually need migration or evolution, especially when dealing with the data layer, which adds complexity. Sriram Panyam, with experience in high-traffic data migration projects, shares his insights on ensuring their success.

CONFS, EVENTS AND MEETUPS

The AI Summit London | London | 12-13th June

The AI Summit London unites the most forward-thinking technologists and business professionals to explore the real-world applications of AI. Think unparalleled opportunities for learning, deep-dive discovery, and non-stop networking (not to mention the incredible line-up of heavyweight speakers).

_______________________

Ezequiel Lanza

Open Source AI Evangelist @ Intel | LF AI&Data TAC Chairperson/ Board | Open Ecosystems | AI/ML | Gen AI | Cloud Native | Speaker

9 months ago

Thanks for sharing our work with Eduardo Rojas Oviedo !
