DATA Pill #136 - From Apache Iceberg to Real-Time AI: Trends, Tutorials, and Tools for Modern Data Pros

Hi,

This week in DATA Pill: explore the rise of Apache Iceberg as the "modern Hadoop," discover tricks to boost LLM performance by 1000%, and dive into debates on Spark vs. DuckDB and Polars. Don’t miss these insights and more—let’s get started!

ARTICLES

Apache Iceberg: The Hadoop of the Modern Data Stack? | 6 min | Data Engineering | Dani | Data Engineer Things

Apache Iceberg is likened to Hadoop for its role in managing evolving datasets with ACID compliance and schema evolution. However, rapid adoption may lead to technical debt and bottlenecks without proper planning.

My LLM’s outputs got 1000% better with this simple trick | 5 min | LLM | Nikhil Anand | AI Advances

Learn how a technique called "logit transformation" and filtering functions improved LLM accuracy and fluency during an Adobe Research experiment.

TUTORIALS

Should You Ditch Spark for DuckDb or Polars? | 35 min | Data Engineering | Miles Cole | Personal Blog

Benchmarking DuckDB and Polars against Spark for smaller workloads reveals performance and cost advantages—though engine maturity varies.

In MORE LINKS you will find:

  • Microsoft Fabric and Databricks Mirroring
  • Exploring Flink CDC
  • Two ways to perform CI/CD for SQL databases in Fabric using YAML Pipelines
  • Best 5 Frameworks To Build Multi-Agent AI Applications
  • Real-Time AI Stock Advisor with Ollama (Llama 3) & Streamlit

WEBINAR ON-DEMAND

LLMOps: from Demo to Production-Ready GenAI Systems | 46 min | LLMOps | Marek Wiewiórka | GetInData | Part of Xebia

Explore how LLMOps tackles challenges like prompt sensitivity, cost control, and model tuning for operationalizing GenAI systems.

DATA TUBE

OpenLineage: From operators to hooks | 52 min | Data Engineering | Maciej Obuchowski | Apache Airflow

Dive into Airflow’s latest OpenLineage updates, enhancing data pipeline lineage coverage with AIP-62 and beyond.

CONFS, EVENTS AND MEETUPS

Big Data Technology Warsaw Summit | Warsaw and Online | 9th and 10th April

Join over 600 attendees and 90 speakers for technical sessions, workshops, and networking opportunities at one of the biggest Big Data events of the year.

____________________

Have any interesting content to share in the DATA Pill newsletter?

Join us on GitHub

Dig into previous editions of DATA Pill

Adam from GetInData | Part of Xebia
