GroupBy #13: Explaining Kubernetes To My Uber Driver, Data Modelling For Data Engineers
I intended to edit the image to give it a cooler vibe, but that would make the Golden Retriever not “golden” anymore…


Plus: Data Engineering Design Patterns Book Release, Reddit DE project


NOTE

This issue was first published in the GroupBy newsletter.

Original issue: Here

GroupBy is the place where I compile valuable data engineering resources for you to learn and grow.

So, if you find my work valuable and want to receive a weekly issue, subscribe here:

vutr.substack.com


Book

Data Engineering Design Patterns

Simon Späti

What You Will Learn by the End of This Book
If you are a Data Engineer, don’t skip this book. That’s all I want to say.

Side Project

40+ hours of debugging and you still want some more?

Data Engineering with Reddit, Airflow, Celery, Postgres, S3, AWS Glue, Athena, Redshift

Yusuf Ganiyu

In this article, we’ll walk through the process of creating a data pipeline that fetches data from Reddit, uses Apache Airflow for orchestration, stores the data in Amazon S3, processes it with AWS Glue, queries with Amazon Athena, and finally, loads it into Amazon Redshift for analysis.

Learning resource

I love to learn, and I assume you do too.

5 Free Courses to Begin your Data Engineering journey

How I’d learn ML in 2024 (If I Could Start Over)

Boris Meinardus

All you need to learn ML in 2024 is a laptop and a list of the steps you need to take.

Engineering

I have to believe in a world outside my own mind. — Memento (2000)

Explaining Kubernetes To My Uber Driver

Jessica Wang


How Google takes the pain out of code reviews, with 97% dev satisfaction

Erika Bussmann

A study of Google's code review tooling (Critique), AI-powered improvements, and recent statistics

API-First Approach to Kafka Topic Creation

Varun Chakravarthy, Basar Onat, Seed Zeng, Luke Christopherson

DoorDash’s Engineering teams revamped Kafka Topic creation by replacing a Terraform/Atlantis based approach with an in-house API, Infra Service. This has reduced real-time pipeline onboarding time by 95% and saved countless developer hours.

S3 Express speculations

Paul Masurel

This blog post addresses two different subjects:

OneTable: Interoperability for Apache Hudi, Iceberg & Delta Lake

Dipankar Mazumdar

TL;DR: OneTable provides a seamless way to interoperate between different table formats by translating table format metadata.

Data

The one thing that this job has taught me is that truth is stranger than fiction. — Predestination (2014)

Data Pipeline Observability: A Model For Data Engineers

Eitan Chazbani

Data pipeline observability is your ability to monitor and understand the state of a data pipeline at any time.
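To make "understand the state of a pipeline at any time" a bit more concrete, here is a minimal sketch. It is not the model from the linked article; it assumes each pipeline run is logged as a small record (the `RunRecord` fields and the `check_health` function are hypothetical names for illustration) and checks three common signals on the latest run: success, freshness, and volume.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


# Hypothetical record of one pipeline run; field names are illustrative assumptions.
@dataclass
class RunRecord:
    pipeline: str
    finished_at: datetime
    rows_written: int
    succeeded: bool


def check_health(runs, max_staleness, min_rows, now):
    """Return human-readable issues for the latest run of each pipeline."""
    # Keep only the most recent run per pipeline.
    latest = {}
    for run in runs:
        current = latest.get(run.pipeline)
        if current is None or run.finished_at > current.finished_at:
            latest[run.pipeline] = run

    issues = []
    for name, run in sorted(latest.items()):
        if not run.succeeded:  # status signal
            issues.append(f"{name}: last run failed")
        if now - run.finished_at > max_staleness:  # freshness signal
            issues.append(f"{name}: data is stale")
        if run.rows_written < min_rows:  # volume signal
            issues.append(f"{name}: low volume ({run.rows_written} rows)")
    return issues


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    runs = [
        RunRecord("orders", now - timedelta(hours=1), 5000, True),
        RunRecord("users", now - timedelta(hours=30), 10, True),
    ]
    for issue in check_health(runs, timedelta(hours=24), 100, now):
        print(issue)
```

In a real setup the run records would come from your orchestrator or a metadata store, and the thresholds would be set per pipeline rather than globally.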

Data Modelling For Data Engineers

Mike Shakhomirov

The definitive guide for beginners

Understanding Master Data Management integration challenges

Piethein Strengholt

In this post, we’ll explore some of these challenges in detail, offering insights into how they can be effectively managed to ensure your MDM strategy delivers the most value.

What Does Your Data Quality Really Need? Understanding the Data Quality Maturity Curve

Noel Gomez

In this piece, we examine the Data Quality Maturity Curve—a representation of how data quality works itself out at different stages of your organizational and analytical maturity…

The Consumer-Defined Data Contract

Chad Sanderson

This is the consumer-defined data contract. The consumer-defined contract is created by the owners of data applications, with requirements derived from their needs and use cases.
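A rough way to picture a consumer-defined contract: the consuming team writes down the columns, types, and null rules its application depends on, and producer data is checked against that list. The sketch below is only an illustration under that assumption; `ConsumerContract` and `violations` are hypothetical names, not an API from the linked article or any contract tool.

```python
from dataclasses import dataclass, field


# Hypothetical contract written by a data consumer: the columns it relies on,
# their expected Python types, and which of them must never be null.
@dataclass
class ConsumerContract:
    consumer: str
    required_columns: dict[str, type]
    non_nullable: set[str] = field(default_factory=set)


def violations(contract, records):
    """Check producer records (dicts) against the consumer's requirements."""
    problems = []
    for i, record in enumerate(records):
        for column, expected_type in contract.required_columns.items():
            if column not in record:
                problems.append(f"row {i}: missing '{column}'")
                continue
            value = record[column]
            if value is None:
                # Nulls are only a violation if the consumer said so.
                if column in contract.non_nullable:
                    problems.append(f"row {i}: '{column}' is null")
            elif not isinstance(value, expected_type):
                problems.append(
                    f"row {i}: '{column}' expected {expected_type.__name__}, "
                    f"got {type(value).__name__}"
                )
    return problems


if __name__ == "__main__":
    contract = ConsumerContract("churn_model", {"user_id": int, "plan": str}, {"user_id"})
    records = [{"user_id": 1, "plan": "pro"}, {"user_id": None, "plan": "free"}, {"plan": 3}]
    for problem in violations(contract, records):
        print(problem)
```

The design point is that the requirements live with the consumer: the producer can add or change columns freely, and only changes that break a consumer's declared needs show up as violations.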

AI┆ML┆Data Science

You know, Burke, I don’t know which species is worse. — Ripley, Aliens (1986)

MLOps is Mostly Data Engineering

Kostas Pardalis

After a few years, with the hype gone, it has become apparent that MLOps overlaps more with Data Engineering than most people believed.

Semantic Layers are the missing piece for AI-Enabled Analytics

Brian Bickell, David Jayatillake

Semantic layers provide both a knowledge graph and a constrained interface for an LLM.

Recursive Embedding and Clustering

Gustavo Pereira


How we’re experimenting with LLMs to evolve GitHub Copilot

Sara Verdi

Learn how we're experimenting with generative AI models to extend GitHub Copilot across the developer lifecycle.

Catch up

…Next Saturday night, we're sending you back to the future! — Dr. Emmett Brown, Back to the Future (1985)

Introducing Databricks Vector Search Public Preview

Introducing Gemini: our largest and most capable AI model

→ Everyone is excited but the truth is …

Announcing Purple Llama: Towards open trust and safety in the new world of generative AI

Enabling next-generation AI workloads: Announcing TPU v5p and AI Hypercomputer

JetBrains AI Assistant


“Hasta la vista, baby”

- T-800, Terminator 2: Judgment Day (1991)


Before you leave...

I love learning from people who are smarter and more experienced than me by consuming their data engineering resources on the Internet.

Every week, I compile these resources into the GroupBy newsletter, which I first publish on Substack.

Then, I republish it on LinkedIn to make it more accessible to all of you.

So, if you want to learn and grow with me, subscribe to my Substack here:

vutr.substack.com

That would motivate me a lot.

