DATA Pill #076 - Distributed Computing MMA: Ray vs Spark, SQL cookbook for dbt



Hi,

DATA Pillers, we need to cook.

Or maybe just dig into a SQL cookbook for dbt?

Enjoy your newest dose of knowledge!



ARTICLES

Ray vs Spark — The Future of Distributed Computing | 11 min | Data Science | Philippe Dagher | Personal Blog

Let’s explore how Ray, designed for low-latency, high-throughput AI/ML workloads, might be a future-proof choice in the ever-changing world of distributed computing. The article offers insights for decision-makers, researchers and developers.


The Vestas Data Platform | 4 min | Data Engineering | Peter Enevoldsen | The Vestas Technology Blog

Vestas has introduced the modular, cloud-based Vestas Data Platform to support its sustainable energy solutions. By enabling digital integration and stream data processing, the platform contributes to the green transition and helps address challenges in the wind-power industry.


Data Observability’s Newest Frontiers: DataFinOps and DataBizOps | 10 min | Data Observability | Sanjeev Mohan | Personal Blog

The article explains the growing importance of data observability, which ensures data quality and reliability and now extends into two new areas: DataFinOps and DataBizOps. DataFinOps keeps costs under control as data complexity grows, while DataBizOps acts as a map for measuring productivity and cost reduction.


In MORE LINKS you will find 5 Lessons Learned from Testing Databricks SQL Serverless + DBT

{ MORE LINKS }



TUTORIALS

SQL cookbook for dbt: Transforming Big Data with Incremental Models | 8 min | Data Engineering | Hugo Lu | Data Engineer Things

Let's dive into the dbt skills you need to run many big-data dbt models quickly and effectively, and the use cases that models like these apply to.
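The tutorial itself works in dbt's SQL and Jinja. As a language-neutral illustration only, here is a minimal Python sketch of the idea behind an incremental model: on the first run every source row is loaded, and on later runs only rows newer than the target's latest timestamp are selected and then upserted by a unique key (in dbt this corresponds to filtering inside `is_incremental()` against `{{ this }}` and setting `unique_key`). The column names `id` and `updated_at`, the function `incremental_merge` and the sample rows are hypothetical, not taken from the article.

```python
from datetime import datetime

# Hypothetical row type: "id" plays the role of dbt's unique_key and
# "updated_at" is the timestamp column used for the incremental filter.
Row = dict


def incremental_merge(target: list[Row], source: list[Row],
                      unique_key: str = "id",
                      ts_col: str = "updated_at") -> list[Row]:
    """Merge only new or changed source rows into the target (sketch)."""
    if not target:
        # First run: nothing to compare against, so load everything.
        return [dict(r) for r in source]
    # High-water mark, like `select max(updated_at) from {{ this }}`.
    high_water_mark = max(r[ts_col] for r in target)
    new_rows = [r for r in source if r[ts_col] > high_water_mark]
    # Upsert by unique key: new keys are appended, existing keys overwritten.
    merged = {r[unique_key]: dict(r) for r in target}
    for r in new_rows:
        merged[r[unique_key]] = dict(r)
    return list(merged.values())


if __name__ == "__main__":
    target = [{"id": 1, "updated_at": datetime(2023, 11, 1), "v": "a"}]
    source = [
        {"id": 1, "updated_at": datetime(2023, 11, 2), "v": "b"},  # update
        {"id": 2, "updated_at": datetime(2023, 10, 30), "v": "c"},  # late row
        {"id": 3, "updated_at": datetime(2023, 11, 3), "v": "d"},  # new row
    ]
    for row in incremental_merge(target, source):
        print(row["id"], row["v"])
```

Run as a script, this updates row 1, adds row 3 and skips row 2, whose timestamp is older than the high-water mark. That skipped late-arriving row is the usual trade-off of the pattern: scanning only recent data is what makes incremental models fast on big tables, but late data then needs a lookback window or an occasional full refresh.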


From pipelines to platform | 13 min | Data Engineering | Robert Sahlin | Data Engineer Things

Let's explore the concept of a "data flywheel" for generating value from analytical data at scale. The article covers the challenges data engineers face and makes the case for automated communication.


In MORE LINKS you will find using data contracts with Confluent Schema Registry

{ MORE LINKS }



NEWS

Overcoming complexity: the biggest new dbt Cloud features from Coalesce 2023 | 6 min | Cloud | Luis Maldonado | dbt Blog

dbt Cloud has introduced major updates to address customer concerns and make data management more efficient. These enhancements, including dbt Mesh, dbt Explorer and the Semantic Layer, help data teams collaborate, track data lineage and control data platform costs more effectively.

In MORE LINKS you will find Docker with GKE Stateful High Availability (HA) Controller

{ MORE LINKS }



TOOLS

Prompt Engineering Guide | LLM

This prompt engineering guide contains all the latest papers, learning guides, models, lectures, references, new LLM capabilities and tools related to prompt engineering.

In MORE LINKS you will find Ponder

{ MORE LINKS }



DATA TUBE

OpenLineage in Airflow: A Comprehensive Guide | 25 min | Data Engineering | Maciej Obuchowski | Apache Airflow

This talk covers the benefits of using OpenLineage, how it is implemented in Airflow, practical examples of how to take advantage of it, and what’s on the roadmap. Whether you’re an Airflow user or a provider maintainer, this session will give you the knowledge to make the most of this tool.


In MORE LINKS you will hear about From Parquet to Arrow to OpenLineage and DAG Authoring without a PhD

{ MORE LINKS }



CONFS, EVENTS, AND MEETUPS

A High-Level Approach for Solving MLOps Challenges | Webinar | 9th November 2023

Key Takeaways:

  • What signals to watch for that might mean you have MLOps Fatigue
  • How to define the challenge/problem you need to solve in a way that makes finding solutions easier and faster
  • A few examples of how this framework is applied in the real world, so that it’s easy to put into practice

________________________

Have any interesting content to share in the DATA Pill newsletter?

Join us on GitHub

Dig into previous editions of DATA Pill

Adam from GetInData | Part of Xebia
