DATA Pill #108 - Orchestrating 2000+ dbt Models, Databricks + Tabular
This edition includes tips on orchestrating dbt models with Airflow, taking LLM projects from concept to production, and insights from industry experts.
You can also catch up on Databricks and Snowflake news and learn to run multiple notebooks in Microsoft Fabric.
Enjoy!
ARTICLES
How we orchestrate 2000+ DBT models in Apache Airflow | 13 min | Data Engineering | Alexandre Magno Lima Martins | Apache Airflow Blog
This article explains how the team uses Airflow to orchestrate a dbt Core project, creating an intuitive pipeline for data analysts and product owners to develop and maintain their data models. With just SQL and basic Git knowledge, anyone in the business can turn their models into Airflow DAGs within minutes, ready for execution with built-in alerting, data quality tests, and access control. Importantly, they need no Airflow knowledge beyond interacting with the UI. Key areas covered include:
An LLM Journey: From POC to Production | 12 min | LLM | Adva Nakash Peleg | CyberArk Engineering Blog
This blog explores the journey of taking an LLM project from concept to completion, highlighting key steps, tips, and considerations to ensure success.
Data skew in Flink SQL | 10 min | Data Processing | Maciej Maciejko | GetInData | Part of Xebia Blog
Real-time data processing is vital for businesses, and Apache Flink excels in this area. This blog explores strategies to tackle data skew in Flink SQL, ensuring efficient and balanced processing.
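To make the idea of data skew concrete, here is a minimal Python sketch of key salting with two-phase aggregation, one widely used way to spread a hot key across workers. It is a plain-Python simulation and not Flink code, it is not necessarily the strategy the article recommends, and the function name and sample data are made up for illustration.

```python
from collections import Counter


def salted_counts(keys, num_salts):
    """Count events per key in two phases so no single group holds a hot key."""
    # Phase 1: pre-aggregate on (key, salt). The salt splits each key into
    # up to `num_salts` smaller groups that could be processed in parallel.
    partial = Counter()
    for i, key in enumerate(keys):
        partial[(key, i % num_salts)] += 1

    # Phase 2: drop the salt and merge the partial counts per original key.
    final = Counter()
    for (key, _salt), count in partial.items():
        final[key] += count
    return partial, final


# Hypothetical skewed input: one key accounts for 90% of the events.
events = ["hot"] * 900 + ["a"] * 50 + ["b"] * 50

partial, final = salted_counts(events, num_salts=8)
largest_unsalted_group = max(Counter(events).values())
largest_salted_group = max(partial.values())
print(largest_unsalted_group, largest_salted_group, final["hot"])
```

The final counts match a direct count, while the largest group any single worker would handle in the first phase shrinks roughly by the number of salts.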
In MORE LINKS you will read about:
NEWS
Databricks + Tabular | 3 min | Data Engineering | Adam Conway, Ali Ghodsi, Arsalan Tavakoli-Shiraji, Reynold Xin | Databricks blog
Databricks announces its acquisition of Tabular, Inc., bringing together the creators of Apache Iceberg and Delta Lake to lead in data compatibility. This blog outlines Databricks' plans to collaborate with the Iceberg and Delta Lake communities to achieve format compatibility and evolve towards a single open standard of interoperability.
In MORE LINKS you will read about:
TUTORIAL
Run multiple notebooks in parallel using runMultiple in Microsoft Fabric | 7 min | Data Orchestration | Adrian Chodkowski | Seequality blog
Orchestration coordinates multiple systems and tasks so that workflows run smoothly and efficiently. This tutorial shows how to manage and run several notebooks from a main notebook using the runMultiple method in Microsoft Fabric. You'll learn to define and execute notebooks with built-in dependencies, helping streamline your data processing tasks.
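As a rough illustration of what a dependency definition for runMultiple can look like, the sketch below builds a DAG dictionary in plain Python. The notebook names and the build_dag helper are hypothetical, and the field names (activities, name, path, timeoutPerCellInSeconds, dependencies) follow the shape described in Microsoft's notebook utilities documentation as best I recall it, so check them against the tutorial before relying on them. The actual call only works inside a Fabric notebook and is left as a comment.

```python
from typing import Any


def build_dag(notebooks: dict[str, list[str]], timeout_per_cell: int = 90) -> dict[str, Any]:
    """Turn {notebook: [notebooks it depends on]} into a runMultiple-style DAG."""
    # Reject dependencies that point at notebooks missing from the definition.
    unknown = {dep for deps in notebooks.values() for dep in deps} - set(notebooks)
    if unknown:
        raise ValueError(f"Unknown dependencies: {sorted(unknown)}")

    return {
        "activities": [
            {
                "name": name,  # unique activity name
                "path": name,  # notebook path (assumed to equal the name here)
                "timeoutPerCellInSeconds": timeout_per_cell,
                "dependencies": list(deps),  # activities that must finish first
            }
            for name, deps in notebooks.items()
        ]
    }


# Hypothetical notebooks: two independent loads, then a report that needs both.
dag = build_dag(
    {
        "load_customers": [],
        "load_orders": [],
        "build_report": ["load_customers", "load_orders"],
    }
)
print([activity["name"] for activity in dag["activities"]])

# Inside a Microsoft Fabric notebook (not runnable elsewhere), the DAG would
# then be passed to the notebook utilities, for example:
# notebookutils.notebook.runMultiple(dag)
```

Notebooks without dependencies can start in parallel, while build_report waits for both loads to finish.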
DATA TUBE
Data Streaming Platform Demo | 6 min | Data Streaming | Maciej Kluczny | GetInData | Part of Xebia
In this video, you will dive into the platform architecture and see how a real-life streaming application works, based on SQL queries using Apache Flink and Jupyter Notebooks.
ON-DEMAND WEBINAR
Demand Forecasting at Scale | 55 min | AI | Albert Heijn, Ruben van de Geer, Rogier van der Geer, Daniel van Dijk | Xebia
Watch how Albert Heijn optimized their demand forecasting services. Learn why they chose a custom solution, which processes, people, and technology it required, and what challenges they faced in scaling forecasts.
CONFS, EVENTS AND MEETUPS
RADAR AI | Online | 26-27th June
ChatGPT was only the beginning. Generative AI is now revolutionizing every industry. Join us for RADAR: AI Edition, exploring how businesses and individuals can unlock their full potential with AI.
________________________
Have any interesting content to share in the DATA Pill newsletter?
Join us on GitHub
Dig into previous editions of DataPill
Adam from GetInData | Part of Xebia