DATA Pill #136 - From Apache Iceberg to Real-Time AI: Trends, Tutorials, and Tools for Modern Data Pros
Hi,
This week in DATA Pill: explore the rise of Apache Iceberg as the "modern Hadoop," discover tricks to boost LLM performance by 1000%, and dive into debates on Spark vs. DuckDB and Polars. Don’t miss these insights and more—let’s get started!
ARTICLES
Apache Iceberg: The Hadoop of the Modern Data Stack? | 6 min | Data Engineering | Dani | Data Engineer Things
Apache Iceberg is likened to Hadoop for its role in managing evolving datasets with ACID compliance and schema evolution. However, rapid adoption may lead to technical debt and bottlenecks without proper planning.
My LLM’s outputs got 1000% better with this simple trick | 5 min | LLM | Nikhil Anand | AI Advances
Learn how a technique called "logit transformation" and filtering functions improved LLM accuracy and fluency during an Adobe Research experiment.
TUTORIALS
Should You Ditch Spark for DuckDb or Polars? | 35 min | Data Engineering | Miles Cole | Personal Blog
Benchmarking DuckDB and Polars against Spark for smaller workloads reveals performance and cost advantages—though engine maturity varies.
In MORE LINKS you will read:
WEBINAR ON-DEMAND
LLMOps: from Demo to Production-Ready GenAI Systems | 46 min | LLMops | Marek Wiewiórka | GetInData | Part of Xebia
Explore how LLMOps tackles challenges like prompt sensitivity, cost control, and model tuning for operationalizing GenAI systems.
DATA TUBE
OpenLineage: From operators to hooks | 52 min | Data Engineering | Maciej Obuchowski | Apache Airflow
Dive into Airflow’s latest OpenLineage updates, enhancing data pipeline lineage coverage with AIP-62 and beyond.
CONFS, EVENTS AND MEETUPS
Big Data Technology Warsaw Summit | Warsaw and Online | 9th and 10th April
Join over 600 attendees and 90 speakers for technical sessions, workshops, and networking opportunities in one of the biggest Big Data events of the year.
____________________
Have any interesting content to share in the DATA Pill newsletter?
Join us on GitHub
Dig into previous editions of DataPill
Adam from GetInData | Part of Xebia