GroupBy #14: What it takes to be a Senior IC at Meta, Netflix Data Engineering Summit


Plus: GCP Data Engineering Project, Conceptual vs logical vs physical data models

NOTE

This issue was first published in the GroupBy newsletter.

Original issue: Here

GroupBy is the place where I compile valuable data engineering resources for you to learn and grow.

So, if you find my work valuable and want to receive a weekly issue, subscribe here:

vutr.substack.com


I won't be able to see you guys until next Tuesday, the 26th, so...

Side Project

40+ hours of debugging and you still want some more?

GCP Data Engineering Project: Building and Orchestrating an ETL Pipeline with Apache Beam and Apache Airflow

Jana Polianskaja

The pipeline is designed to handle batch transactional data and leverages various Google Cloud Platform (GCP) services:

  • GCS is used to store and manage the transactional data
  • Composer, a managed Apache Airflow service, is utilized to orchestrate Dataflow jobs
  • Dataflow, based on Apache Beam, is responsible for data processing, transformation, and loading into BigQuery
  • BigQuery serves as a serverless data warehouse
  • Looker, a business intelligence and analytics platform, is employed to generate daily reports

source
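The flow above (store in GCS, transform with Dataflow, load into BigQuery, report, all orchestrated by Airflow) can be sketched in plain Python. This is only an illustration of the stage boundaries under stated assumptions and is not the author's code. A CSV string stands in for a file in GCS, a plain function stands in for the Beam/Dataflow transform, an in-memory SQLite table stands in for BigQuery, and a single function call stands in for the Airflow DAG. The column names `txn_id`, `amount` and `ts` are hypothetical.

```python
import csv
import io
import sqlite3
from datetime import datetime

# Hypothetical raw batch, standing in for a CSV file stored in a GCS bucket.
RAW_CSV = """txn_id,amount,ts
t1,12.50,2023-12-01T09:15:00
t2,not_a_number,2023-12-01T10:00:00
t3,7.25,2023-12-02T11:30:00
t4,3.00,2023-12-02T18:45:00
"""


def extract(raw):
    """Read the raw rows (plays the role of reading from GCS)."""
    return list(csv.DictReader(io.StringIO(raw)))


def transform(rows):
    """Parse and validate each row (plays the role of the Beam/Dataflow step)."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
            day = datetime.fromisoformat(row["ts"]).date().isoformat()
        except ValueError:
            continue  # drop malformed records instead of failing the whole batch
        clean.append((row["txn_id"], amount, day))
    return clean


def load(conn, rows):
    """Append rows to a table (plays the role of loading into BigQuery)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transactions (txn_id TEXT, amount REAL, day TEXT)"
    )
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows)


def daily_report(conn):
    """Per-day count and total (the kind of query a daily Looker report might run)."""
    return conn.execute(
        "SELECT day, COUNT(*), SUM(amount) FROM transactions GROUP BY day ORDER BY day"
    ).fetchall()


def run_pipeline(raw):
    """Run the steps in order, as an orchestrator such as Airflow would schedule them."""
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(raw)))
    return daily_report(conn)


if __name__ == "__main__":
    for day, count, total in run_pipeline(RAW_CSV):
        print(day, count, total)
```

Each stage takes plain inputs and returns plain outputs. That separation is what lets a real deployment swap the stages out for managed services independently.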

Learning resource

I love to learn, and I assume you do too.

Distributed Systems lecture series

Dr. Martin Kleppmann

These videos form an 8-lecture series on distributed systems, given as part of the undergraduate computer science course at the University of Cambridge.

If you look for data engineering book recommendations (from Google, ChatGPT, Reddit, Twitter, whatever…), Designing Data-Intensive Applications will surely be on the list.

I read the book not too long ago and searched the internet for more resources by the author. Somehow (I forget how), I discovered a YouTube playlist about distributed systems by the author.

It’s a pirate’s treasure.

You won’t regret watching it.

Trust me.


Engineering

I have to believe in a world outside my own mind. — Memento (2000)

The 7 rules for successful job hopping in data engineering

Zach Wilson

Remember, the job hopping rules are:

Kubernetes for Data Engineers

Daniel Beach

We want to give Data Engineers an introduction to Kubernetes. It's a tool everyone talks about, but not that many folks get a chance to get their hands dirty with.

ClickHouse is in the house

zeev - Vimeo Engineering Blog

In this post, I’ll outline our journey from a traditional architecture anchored in Apache Phoenix on HBase to our embrace of ClickHouse just eighteen months ago.

What it takes to be a Senior IC at Meta

Analytics at Meta

At Meta, senior individual contributors (ICs) are an important part of how we think about growing careers and building effective organizations in data science and data engineering.

Our First Netflix Data Engineering Summit

Netflix Technology Blog

Earlier this summer Netflix held our first-ever Data Engineering Forum. Engineers from across the company came together to share best practices on everything from Data Processing Patterns to Building Reliable Data Pipelines.

Thanks for scrolling this far (well, not that far)! Subscribe to my weekly newsletter at vutr.substack.com if you want to read it right in your mailbox :D

Data

The one thing that this job has taught me is that truth is stranger than fiction. — Predestination (2014)

Data Ecosystems, Moats, Semantic Layers, and More w/ Tristan Handy

Joe Reis, Matthew Housley, Tristan Handy

Tristan Handy (CEO of dbt Labs) joins the show to chat about the data tooling landscape, business moats, semantic layers, the data engineering ecosystem, and much more.

Conceptual vs logical vs physical data models

Sonny Rivera

Data modeling is not about creating diagrams for documentation sake. It's about creating a shared understanding between the business and the data teams, building trust, and delivering value with data.
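The three levels named in the title can be made concrete in a small Python sketch. It assumes a toy Customer/Order domain, and the entity names, attributes and SQLite type mapping are all hypothetical rather than taken from the article.

- The conceptual model names only the entities and their relationships.
- The logical model adds attributes and keys but stays engine-independent.
- The physical model turns the logical model into DDL for one specific engine.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional

# Conceptual model: business entities and their relationships only.
CONCEPTUAL = {
    "entities": ["Customer", "Order"],
    "relationships": [("Customer", "places", "Order")],
}


@dataclass(frozen=True)
class Attribute:
    """Logical-model attribute: a name, an abstract type, and key information."""
    name: str
    kind: str  # abstract type: "identifier", "text", "decimal", or "date"
    primary_key: bool = False
    references: Optional[str] = None  # "Entity.attribute" for a foreign key


# Logical model: attributes and keys, still independent of any database engine.
LOGICAL = {
    "Customer": [
        Attribute("customer_id", "identifier", primary_key=True),
        Attribute("name", "text"),
    ],
    "Order": [
        Attribute("order_id", "identifier", primary_key=True),
        Attribute("customer_id", "identifier", references="Customer.customer_id"),
        Attribute("total", "decimal"),
        Attribute("order_date", "date"),
    ],
}

# Physical model: engine-specific choices (SQLite types and table names here).
SQLITE_TYPES = {"identifier": "INTEGER", "text": "TEXT", "decimal": "NUMERIC", "date": "TEXT"}


def table_name(entity):
    return entity.lower() + "s"  # naming convention chosen for this sketch


def to_ddl(entity):
    """Translate one logical entity into a SQLite CREATE TABLE statement."""
    columns = []
    for attr in LOGICAL[entity]:
        col = f"{attr.name} {SQLITE_TYPES[attr.kind]}"
        if attr.primary_key:
            col += " PRIMARY KEY"
        if attr.references:
            ref_entity, ref_attr = attr.references.split(".")
            col += f" REFERENCES {table_name(ref_entity)}({ref_attr})"
        columns.append(col)
    return f"CREATE TABLE {table_name(entity)} ({', '.join(columns)})"


def build_physical(conn):
    """Create one table per conceptual entity."""
    for entity in CONCEPTUAL["entities"]:
        conn.execute(to_ddl(entity))


if __name__ == "__main__":
    for entity in CONCEPTUAL["entities"]:
        print(to_ddl(entity))
```

Business stakeholders can review the conceptual and logical layers without reading SQL. Only the last layer changes if the warehouse engine changes.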


Unstructured Data Unravelled

Sven Balnojan

Key lesson: Never discard data because you think it's “unstructured.” All data is, and no data is.

How we built consistent product launch metrics with the dbt Semantic Layer.

Jordan Stein

This blog post walks through the end-to-end process we used to set up product analytics for the dbt Semantic Layer using the dbt Semantic Layer.

The Data Quality Resolution Process

Mark Freeman

Below I provide details, code examples, diagrams, and communication strategies to help you resolve data quality issues when you are a low-data maturity company.

AI┆ML┆Data Science

You know, Burke, I don’t know which species is worse. — Ripley, Aliens (1986)

Extracting skills from content to fuel the LinkedIn Skills Graph

Ji Yan

In this blog, we'll examine how we use AI to extract skills from various content sources across LinkedIn and map these skills to our Skills Graph.

Why Meta is fighting for Open Source LLMs while Microsoft wants to regulate them.

Devansh

How Open Source vs Licensed Debate became important for Big Tech business strategy

Declarative Feature Engineering at PayPal

Marina Lyan

The idea is to allow data scientists to write a declaration of what their features look like rather than explicitly specify how to construct them on top of different execution platforms.
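A minimal Python sketch of the declarative idea, assuming a hypothetical `FeatureSpec` record. This is not PayPal's actual format or API. The data scientist only declares the source column, the aggregation and the look-back window. Two separate backends then decide how to compute the feature: one evaluates it over in-memory rows, and the other compiles the same declaration to SQL and runs it in SQLite. The `events` table and its columns are made up for illustration.

```python
import sqlite3
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class FeatureSpec:
    """Hypothetical declaration of WHAT a feature is, not HOW to compute it."""
    name: str
    column: str       # source column to aggregate
    agg: str          # "sum", "count", or "max"
    window_days: int  # look-back window ending at (and including) the as-of date


# Backend 1: evaluate the declaration over in-memory Python rows.
PY_AGGS = {"sum": sum, "count": len, "max": max}


def compute_in_python(spec, rows, user_id, as_of):
    start = as_of - timedelta(days=spec.window_days)
    values = [r[spec.column] for r in rows
              if r["user_id"] == user_id and start < r["day"] <= as_of]
    return PY_AGGS[spec.agg](values) if values else 0


# Backend 2: compile the same declaration to SQL for a database engine.
SQL_AGGS = {"sum": "SUM", "count": "COUNT", "max": "MAX"}


def compile_to_sql(spec, table):
    # Column/table names come from trusted specs in this sketch; real code should validate them.
    return (f"SELECT COALESCE({SQL_AGGS[spec.agg]}({spec.column}), 0) FROM {table} "
            "WHERE user_id = ? AND day > ? AND day <= ?")


def compute_in_sqlite(spec, rows, user_id, as_of):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (user_id INTEGER, day TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?, ?)",
        [(r["user_id"], r["day"].isoformat(), r["amount"]) for r in rows],
    )
    start = as_of - timedelta(days=spec.window_days)
    sql = compile_to_sql(spec, "events")
    return conn.execute(sql, (user_id, start.isoformat(), as_of.isoformat())).fetchone()[0]


# Made-up event data and feature declarations.
ROWS = [
    {"user_id": 1, "day": date(2023, 12, 1), "amount": 10.0},
    {"user_id": 1, "day": date(2023, 12, 5), "amount": 5.0},
    {"user_id": 1, "day": date(2023, 11, 1), "amount": 100.0},
    {"user_id": 2, "day": date(2023, 12, 3), "amount": 7.0},
]
SPECS = [
    FeatureSpec("spend_7d", "amount", "sum", 7),
    FeatureSpec("orders_7d", "amount", "count", 7),
    FeatureSpec("max_order_30d", "amount", "max", 30),
]

if __name__ == "__main__":
    as_of = date(2023, 12, 6)
    for spec in SPECS:
        print(spec.name,
              compute_in_python(spec, ROWS, 1, as_of),
              compute_in_sqlite(spec, ROWS, 1, as_of))
```

Both backends return the same values for every spec. Adding another execution platform means writing a new compiler, not rewriting each feature.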

Improving Uber Eats Home Feed Recommendations via Debiased Relevance Predictions

Uber Engineering Blog

In this blog post, we focus on tackling arguably one of the most important such biases: the position bias. Position bias refers to the phenomenon in which users tend to order more from stores ranked higher compared to stores that are ranked lower, irrespective of how relevant that store truly is to the user.
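To see why position bias matters, consider inverse propensity weighting (IPW), one common textbook correction. It is not necessarily the method Uber describes. IPW divides each order by the estimated probability that users examine that slot at all, so orders from rarely seen positions count for more. The propensity values and logs below are invented purely for the arithmetic. In practice the propensities would have to be estimated, for example from randomized ranking experiments.

```python
from collections import defaultdict

# Hypothetical examination propensities: the probability that a user even looks
# at a slot. Real systems have to estimate these, e.g. from randomized experiments.
PROPENSITY = {1: 1.0, 2: 0.5, 3: 0.25}


def order_rates(logs, propensity):
    """Return {store: (naive_rate, ipw_rate)} from (store, position, ordered) logs."""
    impressions = defaultdict(int)
    orders = defaultdict(float)
    weighted = defaultdict(float)
    for store, position, ordered in logs:
        impressions[store] += 1
        if ordered:
            orders[store] += 1
            # Inverse propensity weighting: an order from a rarely examined slot counts more.
            weighted[store] += 1 / propensity[position]
    return {s: (orders[s] / impressions[s], weighted[s] / impressions[s]) for s in impressions}


# Store A is always shown first, store B always third (made-up logs).
LOGS = ([("A", 1, True)] * 4 + [("A", 1, False)] * 6
        + [("B", 3, True)] * 2 + [("B", 3, False)] * 8)

if __name__ == "__main__":
    for store, (naive, debiased) in sorted(order_rates(LOGS, PROPENSITY).items()):
        print(store, naive, debiased)
```

The naive rates put A (0.4) ahead of B (0.2). After weighting, B (0.8) comes out ahead of A (0.4). This is the kind of distortion the quoted paragraph calls position bias: B looked worse only because fewer users scrolled down to it.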

Personalizing the DoorDash Retail Store Page Experience

Luming Chen, Yuan Meng, Anthony Zhou

In this post, we show how we built a personalized shopping experience for our new business vertical stores, which include grocery, convenience, pets, and alcohol, among many others.

Catch up

…Next Saturday night, we're sending you back to the future!

BigQuery


It will steal 7 seconds from you

Random thoughts, ideas.

I’m planning a new kind of content for my newsletter.

I will let you guys know soon.

If one of these things happens in the coming months:

“Soon” is a dangerous word.


“Hasta la vista, baby”

- T-800, Terminator 2: Judgment Day (1991)


Before you leave...

I love learning from people who are smarter and more experienced than me by consuming their data engineering resources on the Internet.

Every week, I compile these resources into the GroupBy newsletter, which I publish first on Substack.

Then, I deliver it again on LinkedIn to make it more accessible to all of you.

So, if you want to learn and grow with me, subscribe to my Substack here:

vutr.substack.com

That will motivate me a lot.




