DATA Pill #076 - Distributed Computing MMA: Ray vs Spark, SQL cookbook for dbt
Hi,
DATA Pillers, we need to cook.
Or maybe just dig into a SQL cookbook for dbt?
Enjoy your newest dose of knowledge!
ARTICLES
Ray vs Spark — The Future of Distributed Computing | 11 min | Data Science | Philippe Dagher | Personal Blog
Let’s explore how Ray, designed for low-latency and high-throughput AI/ML workloads, might be a future-proof choice in the ever-changing world of distributed computing, providing valuable insights for decision-makers, researchers and developers.
The Vestas Data Platform | 4 min | Data Engineering | Peter Enevoldsen | The Vestas Technology Blog
Vestas has introduced a modular, cloud-based Vestas Data Platform to support its sustainable energy solutions. It facilitates digital integration and stream data processing, contributing to the green transition and addressing challenges in the wind-power industry.
Data Observability’s Newest Frontiers: DataFinOps and DataBizOps | 10 min | Data Observability | Sanjeev Mohan | Personal Blog
This article explains the rising importance of data observability, which covers data quality and reliability and now extends to DataFinOps and DataBizOps. DataFinOps keeps expenses under control as data complexity increases, while DataBizOps acts as a map to measure productivity and cost reduction.
In MORE LINKS you will find 5 Lessons Learned from Testing Databricks SQL Serverless + DBT
TUTORIALS
SQL cookbook for dbt: Transforming Big Data with Incremental Models | 8 min | Data Engineering | Hugo Lu | Data Engineer Things
Let's dive into the dbt skills you’ll need to run lots of big data dbt models quickly, and the use cases that models like this apply to.
From pipelines to platform | 13 min | Data Engineering | Robert Sahlin | Data Engineer Things
Let's explore the concept of a "data flywheel" for generating value from analytical data at scale, addressing the challenges faced by data engineers, and advocating for automated communication.
In MORE LINKS you will find a piece on using data contracts with Confluent Schema Registry
NEWS
Overcoming complexity: the biggest new dbt Cloud features from Coalesce 2023 | 6 min | Cloud | Luis Maldonado | dbt Blog
dbt Cloud has introduced major updates to address customer concerns and make data management more efficient. These enhancements, including dbt Mesh, dbt Explorer and the Semantic Layer, help data teams collaborate, track data lineage and control data platform costs more effectively.
In MORE LINKS you will find Docker with GKE Stateful High Availability (HA) Controller
TOOLS
Prompt Engineering Guide | LLM
This prompt engineering guide contains all the latest papers, learning guides, models, lectures, references, new LLM capabilities and tools related to prompt engineering.
In MORE LINKS you will find Ponder
DATA TUBE
OpenLineage in Airflow: A Comprehensive Guide | 25 min | Data Engineering | Maciej Obuchowski | Apache Airflow
This talk will cover the benefits of using OpenLineage, how it is implemented in Airflow, practical examples of how to take advantage of it, and what’s in our roadmap. Whether you’re an Airflow user or provider maintainer, this session will give you the knowledge to make the most of this tool.
In MORE LINKS you will find talks on From Parquet to Arrow to OpenLineage and DAG Authoring without PhD
CONFS, EVENTS, AND MEETUPS
A High-Level Approach for Solving MLOps Challenges | Webinar | 9th November 2023
Key Takeaways:
________________________
Have any interesting content to share in the DATA Pill newsletter?
Join us on GitHub
Dig into previous editions of DATA Pill
Adam from GetInData | Part of Xebia