DATA Pill #108 - Orchestrating 2000+ dbt Models, Databricks + Tabular

This edition includes tips on orchestrating dbt models with Airflow, taking LLM projects from concept to production, and insights from industry experts.

You can also catch up on Databricks and Snowflake news and learn to run multiple notebooks in Microsoft Fabric.

Enjoy!

ARTICLES

How we orchestrate 2000+ DBT models in Apache Airflow | 13 min | Data Engineering | Alexandre Magno Lima Martins | Apache Airflow Blog

This article explains how the team uses Airflow to orchestrate a dbt Core project, giving data analysts and product owners an intuitive pipeline for developing and maintaining their data models. With just SQL and basic Git knowledge, anyone in the business can turn their models into Airflow DAGs within minutes, ready for execution with built-in alerting, data quality tests, and access control. Importantly, they need no Airflow knowledge beyond interacting with the UI. Key areas covered include:

  • Mono vs. Multi DAG approach
  • Project structure and DAGs layout
  • DAG generation pipeline
  • Creation of DBTOperator
  • Conclusion and plans
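
The multi-DAG approach and the DAG generation pipeline listed above can be pictured with a small sketch. It assumes a hypothetical, simplified manifest (each model mapped to a group and its upstream models); the article's actual pipeline, its DBTOperator, and its manifest layout are not reproduced here, so `MANIFEST` and `build_dag_specs` are illustrative names only. The sketch uses just the Python standard library and produces, per group, the ordered list of models a generated DAG would run.

```python
from collections import defaultdict
from graphlib import TopologicalSorter

# Hypothetical, simplified stand-in for a dbt manifest:
# model name -> owning group and upstream models.
MANIFEST = {
    "stg_orders": {"group": "sales", "depends_on": []},
    "stg_customers": {"group": "sales", "depends_on": []},
    "fct_orders": {"group": "sales", "depends_on": ["stg_orders", "stg_customers"]},
    "stg_events": {"group": "product", "depends_on": []},
    "fct_sessions": {"group": "product", "depends_on": ["stg_events"]},
}


def build_dag_specs(manifest):
    """Group models into one DAG spec per group, ordered so upstreams run first."""
    graphs = defaultdict(dict)
    for model, meta in manifest.items():
        group = meta["group"]
        # Keep only dependencies inside the same group; cross-group edges
        # would need separate handling and are ignored in this sketch.
        same_group = [d for d in meta["depends_on"] if manifest[d]["group"] == group]
        graphs[group][model] = set(same_group)
    return {
        f"dbt_{group}": list(TopologicalSorter(graph).static_order())
        for group, graph in graphs.items()
    }


if __name__ == "__main__":
    for dag_id, models in build_dag_specs(MANIFEST).items():
        print(dag_id, "->", models)
```

In a real deployment each spec would be turned into an Airflow DAG with one task per model; that step is left out so the example runs without Airflow installed.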

An LLM Journey: From POC to Production | 12 min | LLM | Adva Nakash Peleg | CyberArk Engineering Blog

This blog explores the journey of taking an LLM project from concept to completion, highlighting key steps, tips, and considerations to ensure success.

Data skew in Flink SQL | 10 min | Data Processing | Maciej Maciejko | GetInData | Part of Xebia Blog

Real-time data processing is vital for businesses, and Apache Flink excels in this area. This blog explores strategies to tackle data skew in Flink SQL, ensuring efficient and balanced processing.
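
One widely used remedy for skew is two-phase aggregation with a salted key: a hot key is split across several sub-keys, partially aggregated, and then merged. The sketch below simulates that idea in plain Python rather than Flink SQL, and it is not taken from the article; `NUM_SALTS`, `salted_partial_counts`, and `merge_partials` are hypothetical names.

```python
from collections import Counter

NUM_SALTS = 4  # assumed number of sub-keys per original key


def salted_partial_counts(events, num_salts=NUM_SALTS):
    """Phase 1: count per (key, salt) so a hot key is spread over several buckets."""
    partial = Counter()
    for i, key in enumerate(events):
        salt = i % num_salts  # round-robin salt; a real job might hash another column
        partial[(key, salt)] += 1
    return partial


def merge_partials(partial):
    """Phase 2: drop the salt and sum the partial counts per original key."""
    total = Counter()
    for (key, _salt), count in partial.items():
        total[key] += count
    return total


if __name__ == "__main__":
    events = ["hot"] * 1000 + ["cold"] * 10
    partial = salted_partial_counts(events)
    print(max(partial.values()))  # size of the largest single bucket after salting
    print(merge_partials(partial))
```

Without salting, a single worker would handle all 1000 "hot" events; with four salts the largest bucket holds a quarter of them, and the final counts stay the same.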

In MORE LINKS you will read about:

  • What 10 Years at Uber, Meta and Startups Taught Me About Data Analytics
  • What We Learned from a Year of Building with LLMs (Part I)
  • Is star-schema a thing in 2024? A closer look at the OBTs

{ MORE LINKS }

NEWS

Databricks + Tabular | 3 min | Data Engineering | Adam Conway, Ali Ghodsi, Arsalan Tavakoli-Shiraji, Reynold Xin | Databricks blog

Databricks announces its acquisition of Tabular, Inc., bringing together the creators of Apache Iceberg and Delta Lake to lead in data compatibility. This blog will outline Databricks' plans to collaborate with the Iceberg and Delta Lake communities to achieve format compatibility and evolve towards a single open standard of interoperability.

In MORE LINKS you will read about:

  • Introducing Polaris Catalog: An Open Source Catalog for Apache Iceberg

{ MORE LINKS }

TUTORIAL

Run multiple notebooks in parallel using runMultiple in Microsoft Fabric | 7 min | Data Orchestration | Adrian Chodkowski | Seequality blog

Orchestration manages multiple systems and tasks to make workflows run smoothly and efficiently. This tutorial shows how to manage and run various notebooks from a main notebook using the runMultiple method in Microsoft Fabric. You'll learn to easily create and execute notebooks with built-in dependencies, helping streamline your data processing tasks.
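
As a rough illustration of "built-in dependencies", runMultiple accepts a DAG description that lists activities and what each one depends on. The sketch below builds such a structure and checks it locally. The notebook names and the `undefined_dependencies` helper are hypothetical, the field names should be verified against the current Fabric documentation, and the actual call is left commented out because `mssparkutils` exists only inside a Fabric notebook session.

```python
# Hypothetical notebooks: two loaders can run in parallel,
# then a report notebook waits for both of them.
DAG = {
    "activities": [
        {"name": "load_orders", "path": "load_orders", "dependencies": []},
        {"name": "load_customers", "path": "load_customers", "dependencies": []},
        {
            "name": "build_report",
            "path": "build_report",
            "args": {"run_date": "2024-06-01"},  # assumed notebook parameter
            "dependencies": ["load_orders", "load_customers"],
        },
    ],
    "concurrency": 2,  # assumed cap on notebooks running at once
}


def undefined_dependencies(dag):
    """Return dependency names that do not match any activity name."""
    names = {activity["name"] for activity in dag["activities"]}
    return sorted(
        dep
        for activity in dag["activities"]
        for dep in activity.get("dependencies", [])
        if dep not in names
    )


if __name__ == "__main__":
    print(undefined_dependencies(DAG))
    # Inside a Fabric notebook the DAG would then be submitted with:
    # mssparkutils.notebook.runMultiple(DAG)
```

Checking the dependency names before submitting catches typos locally instead of at run time in the main notebook.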

DATA TUBE

Data Streaming Platform Demo | 6 min | Data Streaming | Maciej Kluczny | GetInData | Part of Xebia

In this video, you will dive into the platform architecture and see how a real-life streaming application works, based on SQL queries using Apache Flink and Jupyter Notebooks.

ON-DEMAND WEBINAR

Demand Forecasting at Scale | 55 min | AI | Albert Heijn, Ruben van de Geer, Rogier van der Geer, Daniel van Dijk | Xebia

Watch how Albert Heijn optimized its demand forecasting services. Learn why the team chose a custom solution, which processes, people, and technology were needed, and what challenges come with scaling forecasts.

CONFS, EVENTS AND MEETUPS

RADAR AI | Online | 26-27th June

ChatGPT was only the beginning. Generative AI is now revolutionizing every industry. Join us for RADAR: AI Edition, exploring how businesses and individuals can unlock their full potential with AI.

________________________

Have any interesting content to share in the DATA Pill newsletter?

Join us on GitHub

Dig into previous editions of DATA Pill

Adam from GetInData | Part of Xebia
