DATA Pill #095 - Real-Time RAG, pick between Kimball, One Big Table, and Relational Modeling
Hi,
Monday evening, it’s time to take your DATA Pill.
As always, great content and a little surprise - a promo code for the conference.
Enjoy!
ARTICLES
Apache Kafka is NOT real real-time data streaming! | 4 min | Data Streaming | Kai Waehner | Personal Blog
This blog post explores the architecture of NASDAQ that combines critical stock exchange trading with low-latency streaming analytics.
Hive Metastore – Did We Replace It With A Vendor Lock? | Oz Katz, Einat Orr | 7 min | Data Engineering | lakeFS blog
This blog considers in what sense Hive’s Metastore is “open” and why we believe the leading candidates to replace it are closed, in a way that is meant to limit us to using a specific vendor’s data ecosystem.
News Recommendation: the challenging area in building recommendation systems | 8 min | Recommendation Systems | Adam Cierlik | GetInData | Part of Xebia Blog
Exploring the ever-changing world of news recommendation systems? This blog dives deep into how to blend user preferences with real-time news context for a genuinely personalized reading experience.
In MORE LINKS you will read about: Evolving from Rule-based Classifier: Machine Learning Powered Auto Remediation in Netflix Data Platform and A Deep Dive into the Latest Performance Improvements of Stateful Pipelines in Apache Spark Structured Streaming
TUTORIALS
Evaluate LLMs with Hugging Face Lighteval on Amazon SageMaker | 8 min | LLM | Philipp Schmid | Personal Blog
Let’s learn how to evaluate LLMs using Hugging Face Lighteval. Lighteval supports the evaluation suite used in the Hugging Face Open LLM Leaderboard.
In MORE LINKS you will read about: Easy Introduction to Real-Time RAG
DATA TUBE
How to pick between Kimball, One Big Table, and Relational Modeling as a data engineer | 42 min | Data Engineering | Data with Zach
We'll be covering:
- When to use One Big Table modeling vs Kimball
- How to use Struct and Array and Array of Struct to get what you want
PODCAST
Optimizing both hardware and software for GenAI | 26 min | Gen AI | Ryan Donovan, Raymond Lo | The Stack Overflow Podcast
Ryan and Ben chat with Raymond Lo, AI software evangelist at Intel, about the AI PC, the software that powers AI breakthroughs, and optimizing hardware and software in unison to improve generative AI performance. Bonus: what’s the difference between a GPU optimized for graphics and a VPU or NPU optimized for AI?
CONFS EVENTS AND MEETUPS
Big Data Technology Warsaw Summit | Warsaw and Online | 10th and 11th April
Join the independent conference whose presentations are arranged into nine categories. Find your most desired topics! There are, for example:
And more! Learn from speakers from companies like Dropbox, IKEA, Cloudera, Allegro, Ververica, and Freenow.
Shhh… Use the DataPill200 code to get a 200 PLN discount!
Journey to the Cloud | Zurich | 20th March
Gain expert insights into migrating sensitive workloads securely and optimizing costs. Dive into detailed case studies, including the migration and modernization journeys of Just Eat Takeaway.com and Truecaller, to see these principles in action.
Don't miss out on this invaluable opportunity to learn from industry leaders and propel your business forward with confidence!
________________________
Have any interesting content to share in the DATA Pill newsletter?
Join us on GitHub
Dig into previous editions of DataPill
Adam from GetInData | Part of Xebia