DATA Pill #131 - Embeddings are underrated, The advent of the Open Data Lake
Hi,
This week’s DATA Pill brings you the latest on data architecture upgrades, dynamic BI solutions, and the Apache Kafka 3.9 release. Check out articles on embeddings for tech docs, Netflix’s partner management overhaul, Demandbase’s switch from ClickHouse, and much more.
ARTICLES
The advent of the Open Data Lake | 7 min | Data Engineering | Julien Le Dem | The Sympathetic Ink Blog
Julien Le Dem maps out the shift from Hadoop to the Open Data Lake, showing how cloud-native architecture eliminates data silos and enhances scalability.
Demandbase Ditches Denormalization By Switching off ClickHouse | 4 min | Data Engineering | StarRocks Engineering
Demandbase moved from ClickHouse to CelerData Cloud, cutting storage costs and simplifying data pipelines to handle real-time updates at scale.
TUTORIALS
Embeddings are underrated | 6 min | ML | Kayce Basques | Technical Writing Blog
Embeddings bring new power to technical docs, enabling content connections without complex models. Learn how these vectors organize data at a massive scale.
In MORE LINKS you will find:
NEWS
Introducing Apache Kafka 3.9 | 5 min | Data Streaming | Confluent Blog
Kafka 3.9 wraps up the 3.x series with flexible KRaft quorum management, streamlined ZooKeeper migration, and production-ready tiered storage.
TOOL
IdentityRAG combines identity resolution with retrieval-augmented generation to provide accurate, unified views of customer data, which is ideal for comprehensive LLM responses.
PODCAST
An Opinionated Look At End-to-end Code Only Analytical Workflows | 56 min | Data Analytics | Tobias Macey, Burak Karakan | Data Engineering Podcast
Burak Karakan explains the benefits of fully code-driven analytics workflows, making integrations faster and more cohesive across the data stack.
CONFS, EVENTS AND MEETUPS
Big Data Technology Warsaw 2025 - CFP | 24th November
The Big Data Technology Warsaw Summit returns on April 9-10, 2025! Submit your speaking proposal and join over 500 professionals as they dive into the latest in data engineering and big data technology.
_______________________
Have any interesting content to share in the DATA Pill newsletter?
Join us on GitHub
Dig into previous editions of DATA Pill
Adam from GetInData | Part of Xebia