GroupBy #16: Uber's Anomaly Detection & Alerting System, many layers of data lineage
Robot trying to be a human, human trying to be a robot.

Plus: Data modeling side project, Data Engineer roadmap 2024.


NOTE

This issue was first published in the GroupBy newsletter.

Original issue: HERE

GroupBy is the place where I compile valuable data engineering resources for you to learn and grow.

So, if you find my work valuable and want to receive a weekly issue, subscribe here:

vutr.substack.com


It will steal 37 seconds from you


NEWSLETTER UPDATE.

FOR READERS WHO HAVE ALREADY SUBSCRIBED:

THIS UPDATE WILL NOT AFFECT YOUR READING EXPERIENCE OR THE NUMBER OF EMAILS YOU RECEIVE EACH WEEK.

You will still receive only ONE EMAIL EVERY WEEK:

The GROUPBY WEEKLY issue.

(like the one you’re reading)


From the beginning of 2024, I will launch a sub-newsletter that will co-exist with this newsletter. This means my newsletter will contain two sub-newsletters:

  • GroupBy. Weekly compiled data engineering resources (like the one you’re reading). Every Tuesday.
  • Dimensions. My blog-style writing about what I've learned in the data engineering field. Every Saturday.

Subscribers have control over which newsletter they want to receive.

Side Project

40+ hours of debugging and you still want some more?

Data Modeling Project: Design For Global Superstore Sales

Nnamdi Samuel

This project's central goal is creating a structured database design that includes a central table of facts and the required dimension tables to establish connections between different elements. This will enable meaningful comparisons and analysis.

I am always looking for a data modeling project. Finally, I found one.
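If you are new to this pattern, here is a minimal sketch of the fact-and-dimension design the project describes, using Python's built-in sqlite3 module. The table names, columns, and sample rows are my own assumptions for illustration; they are not taken from the project itself.

```python
import sqlite3

# Hypothetical, simplified schema: names, columns and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, segment TEXT);
CREATE TABLE dim_product  (product_id  INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales (
    sale_id     INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    product_id  INTEGER REFERENCES dim_product(product_id),
    amount      REAL
);
INSERT INTO dim_customer VALUES (1, 'Consumer'), (2, 'Corporate');
INSERT INTO dim_product  VALUES (10, 'Furniture'), (20, 'Technology');
INSERT INTO fact_sales VALUES
    (100, 1, 10, 250.0),
    (101, 1, 20, 100.0),
    (102, 2, 20, 400.0);
""")

# The fact table holds the measures; the dimensions supply the labels to group by.
rows = conn.execute("""
    SELECT p.category, c.segment, SUM(f.amount) AS total
    FROM fact_sales AS f
    JOIN dim_customer AS c ON c.customer_id = f.customer_id
    JOIN dim_product  AS p ON p.product_id  = f.product_id
    GROUP BY p.category, c.segment
    ORDER BY p.category, c.segment
""").fetchall()

for row in rows:
    print(row)
```

Each new comparison (by region, by order date, and so on) would then be one more dimension table joined to the same fact table.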


Learning resource

I love to learn, and I assume you do too.

The Ultimate Roadmap for Data Engineers in 2024

Vishal Barvaliya

In this blog, we'll reveal the layers of the ultimate roadmap for eager newcomers through the essential skills that define data engineering.

I agree with most steps in this roadmap; I would just add data modeling and dbt to it.


Engineering

I have to believe in a world outside my own mind. — Memento (2000)

Understanding Parquet, Iceberg and Data Lakehouses at Broad

David Gomes

I've heard a lot about Avro, Parquet, ORC, Arrow and Feather, but I also keep hearing about Iceberg and Delta Lake. As a "database person", I’ve been struggling to understand all of these different things, and how they relate to Data Lakes and Data Lakehouses (and what exactly are these?). So, I’ve decided to study them, and consolidate my knowledge in writing.

Deployment of Exabyte-Backed Big Data Components

Anuj Maurice

In this post, we'll explain how we built our RU (rolling update) framework to power a frictionless deployment experience on a large-scale Hadoop cluster, achieving a >99% success rate free from interruptions or downtime and reducing significant toil for our SRE and Dev teams.

uVitals - An Anomaly Detection & Alerting System

Uber Engineering Blog

But what about the long tail of issues that lurk in the shadows, sometimes remaining undetected until they cause chaos? For these, traditional strategies may not suffice.

Apache Airflow at Adyen: Our journey and challenges to achieve reliability at scale

Natasha S

In this blogpost, we shared a few challenges that we encountered while aiming to achieve reliability at scale at Adyen with Airflow.

3 years managing Kubernetes clusters, my 10 lessons.

Herve Khg

In this article, I wish to share with you the ten most valuable lessons I've learned as a Kubernetes cluster manager.

Data

The one thing that this job has taught me is that truth is stranger than fiction. — Predestination (2014)

Super Tables: The road to building reliable and discoverable data products

Cliff Leung

Super Tables (ST) are pre-computed, denormalized, and consistently consolidated attributes and insights of entities or events that are optimized for common and efficient analytic use cases.

How to plan your data roadmap for 2024 - elevating your data strategy

Benjamin Rogojan

...I wanted to provide some tips to help those either in leadership positions or who want to break into these positions plan out their data roadmap for 2024.

The many layers of data lineage

Borja Vazquez

In this post we’ll discuss how we can learn from the field of cartography and Google Maps to extract the untapped potential of data lineage, and build this ideal interface to improve data literacy and observability.

Discovery and Consumption of Analytics Data at Twitter

Sriram Krishnan

In this blog, we will discuss the higher-level design and usage of the Data Access Layer, how it fits in within the overall data platform ecosystem, and share some observations and lessons learned.

AI┆ML┆Data Science

You know, Burke, I don’t know which species is worse. — Ripley, Aliens (1986)

[1hr Talk] Intro to Large Language Models

Andrej Karpathy

And so now, we return to the original question that took us down this long and winding path - should we even care about connecting enterprise data to natural language queries by LLMs?

How To Train Your Own GenAI Model

Alessandro Joabar

If I was to summarize the goal of this article, it's that we're going to learn to light a campfire with a lighter (GPT2) and not a flamethrower (GPT3.5).

Running demand forecasting machine learning models at scale

Maarten Sukel

This blog post delves into the learnings and challenges on our journey towards implementing and scaling state-of-the-art deep learning approaches. We’ll shed light on how to use the newest machine-learning approaches in a controlled and reliable manner.

Airbnb at KDD 2023

Alex Deng

Airbnb had a significant presence at KDD 2023 with two papers accepted into the main conference proceedings and 11 talks and presentations. In this blog post, we’ll summarize our team’s contributions and share highlights from an exciting week of research talks, workshops, panel discussions, and more.

Monte Carlo, Puppetry and Laughter: The Unexpected Joys of Prompt Engineering

Ben Bernard

This article will be an exploration of prompt techniques we’ve used for our internal productivity tooling at Instacart.

Before you leave...

I love learning from people who are smarter and more experienced than me by consuming their data engineering resources on the Internet.

I compile these resources every week into the GroupBy newsletter, which I first publish on Substack.

Then, I deliver it again on LinkedIn to make it more accessible to all of you.

So, if you want to learn and grow with me, subscribe to my Substack here:

vutr.substack.com

That would motivate me a lot.


“Hasta la vista, baby”

-T800, Terminator 2: Judgment Day (1991)


