DATA Pill #075 - 5 Best Data Observability Platforms, to dbt or not to dbt

Adam Kawa

CEO at GetInData, ex-Spotify | Data & AI for banks, telecoms, retail & more.

Published: October 23, 2023

Hi,

Recover after the weekend with a DATA Pill.

Inhale, exhale… your newest dose is here.

ARTICLES

5 Best Data Observability Platforms | 7 min | Data Observability | Shawn Fergus | Shipyard Blog

The pros and cons of the 5 best Data Observability Platforms. In this article, Shawn compares Datadog, Splunk, and what else? Check it out.

Accelerating Innovation at JetBlue Using Databricks | 8 min | DataOps | Sai Ravuru and Yared Gudeta | Databricks Blog

How JetBlue utilizes Azure, Databricks, and generative AI for customer experience. Shifting to Databricks Lakehouse architecture improves scalability and cost efficiency, while their BlueSky AI system boosts operational efficiency and satisfaction.

Data engineering at Meta: High-Level Overview of the internal tech stack | 12 min | Data Engineering | Alex M. | Meta Engineering Blog

This article provides an overview of the internal tech stack that we use on a daily basis as data engineers at Meta. The idea is to shed some light on the work we do and on how the tools and frameworks make our day-to-day data engineering work more efficient, and to share some of the design decisions and technical tradeoffs that we made along the way.

To dbt or not to dbt | 10 min | Data Science | Pragun Bhutani | intercom-rad Blog

This article delves into data management challenges, including SQL complexity, and presents Intercom's use of dbt for data transformation. It highlights advantages like organized code and staging models, but acknowledges challenges like a learning curve and boilerplate code.

TUTORIALS

Have you ever wondered if it’s possible to use Kedro and Airflow together? Dive into this step-by-step tutorial on how to deploy Kedro pipelines on GCP Composer and Airflow.

Building a Real-Time Data Architecture with Apache Kafka, Flink, and Druid | 10 min | Real Time Analytics | David Wang | Data Engineering Things

This article explores how, when combined, Apache Kafka, Flink and Druid create a real-time data architecture that eliminates waiting states and enables various real-time data applications, including alerting, monitoring, dashboards, analytics and personalized recommendations. These tools provide a purpose-built pipeline for real-time data applications and have been used by major companies like Lyft, Pinterest, Reddit and Paytm to achieve the data freshness, scale and reliability required for real-time use cases.

In MORE LINKS you will find a guide to setting up the new dbt Semantic Layer and testing it with DBeaver.

{ MORE LINKS }

NEWS

Manage your big data needs with HDInsight on AKS | 4 min | Cloud | Balaji Sankaran | Microsoft Blog

Microsoft is launching a public preview of HDInsight on Azure Kubernetes Service (AKS), a rearchitected cloud-native big data service for Apache Spark, Apache Flink and Trino workloads. It offers seamless integration with Azure analytics services like Power BI and Azure Data Factory, along with ease of use and robust security. It simplifies library management and resource handling while promoting cost-effective analytics setups.

In MORE LINKS you will find "Docker with Neo4j, LangChain, and Ollama launches new GenAI stack for developers".

{ MORE LINKS }

PODCASTS

Versioning and MLOps for Generative AI | 38 min | AI | Ben Lorica and Yucheng Low | The Data Exchange Podcast

In this talk, Yucheng Low addresses the challenges of managing large-scale machine learning assets and the need for version control, emphasizing their platform's collaborative versioning system for diverse data types and open-source integration. The discussion also touches on data deletion challenges and the value of flexibility and openness in data formats.

In MORE LINKS you can listen to "Do LLMs Make Ethical Choices".

{ MORE LINKS }

CONFS, EVENTS, AND MEETUPS

Introduction to GenAI: How to get the most for your business from the latest AI revolution | Webinar | 9th November 2023

Join our upcoming webinar, where we dive into Large Language Models (LLMs) and GenAI. Discover the new way of interacting with LLMs, gain insights into the current LLM landscape, explore their novel possibilities, learn about GenAI’s branches and modalities, and address potential challenges. We’ll also showcase practical use cases, making this a must-attend event for AI enthusiasts and professionals.

________________________

Have any interesting content to share in the DATA Pill newsletter?

Join us on GitHub

Dig into previous editions of DATA Pill

Adam from GetInData | Part of Xebia
