GroupBy #14: What it takes to be a Senior IC at Meta, Netflix Data Engineering Summit
Plus: GCP Data Engineering Project, Conceptual vs logical vs physical data models
NOTE
This issue was first published in the GroupBy newsletter.
Original issue: Here
GroupBy is the place where I compile valuable data engineering resources for you to learn and grow.
So, if you find my work valuable and want to receive a weekly issue, subscribe here:
I won't be able to see you guys until next Tuesday, the 26th, so...
Side Project
40+ hours of debugging and you still want some more?
The pipeline is designed to handle batch transactional data and leverages various Google Cloud Platform (GCP) services. GCS is used to store and manage the transactional data.
Learning resource
I love to learn, and I assume you do too.
These videos form an 8-lecture series on distributed systems, given as part of the undergraduate computer science course at the University of Cambridge.
If you look for data engineering book recommendations (from Google, ChatGPT, Reddit, Twitter, whatever…), Designing Data-Intensive Applications will surely be on the list.
I read the book not too long ago and searched the internet for more resources by the author. Somehow (I forget how), I discovered a YouTube playlist about distributed systems by the author himself.
It's a pirate's treasure.
You won't regret watching it.
Trust me.
Engineering
I have to believe in a world outside my own mind. — Memento (2000)
Remember, the job hopping rules are:
We want to give Data Engineers an introduction to Kubernetes. It's a tool everyone talks about, but not that many folks get a chance to get their hands dirty with.
In this post, I’ll outline our journey from a traditional architecture anchored in Apache Phoenix on HBase to our embrace of ClickHouse just eighteen months ago.
At Meta, senior individual contributors (ICs) are an important part of how we think about growing careers and building effective organizations in data science and data engineering.
Earlier this summer Netflix held our first-ever Data Engineering Forum. Engineers from across the company came together to share best practices on everything from Data Processing Patterns to Building Reliable Data Pipelines.
Thanks for scrolling this far (not that far)! Subscribe to my weekly newsletter at vutr.substack.com if you want to read it right in your mailbox :D
Data
The one thing that this job has taught me is that truth is stranger than fiction. — Predestination (2014)
Tristan Handy (CEO of dbt Labs) joins the show to chat about the data tooling landscape, business moats, semantic layers, the data engineering ecosystem, and much more.
Data modeling is not about creating diagrams for documentation sake. It's about creating a shared understanding between the business and the data teams, building trust, and delivering value with data.
Key lesson: Never discard data because you think it's “unstructured.” All data is, and no data is.
This blog post walks through the end-to-end process we used to set up product analytics for the dbt Semantic Layer using the dbt Semantic Layer.
Below I provide details, code examples, diagrams, and communication strategies to help you resolve data quality issues when you are a low-data maturity company.
AI┆ML┆Data Science
You know, Burke, I don’t know which species is worse. — Ripley, Aliens (1986)
Ji Yan
In this blog, we'll examine how we use AI to extract skills from various content sources across LinkedIn and map these skills to our Skills Graph.
Devansh
How the open-source vs. licensed debate became important to Big Tech business strategy
The idea is to allow data scientists to write a declaration of what their features look like rather than explicitly specify how to construct them on top of different execution platforms.
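To make the "what, not how" idea concrete, here is a minimal sketch, not the system described in the post. `FeatureSpec`, `to_sql`, and `compute_in_memory` are hypothetical names invented for illustration, and the two "execution platforms" are stand-ins: one compiles the declaration to a SQL string, the other evaluates it directly on Python rows.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical supported aggregations, shared by both backends.
_OPS = {"sum": sum, "count": len, "max": max}


@dataclass(frozen=True)
class FeatureSpec:
    """Hypothetical declarative feature: describes WHAT to compute, not HOW."""
    name: str    # output feature name
    entity: str  # grouping key, e.g. "user_id"
    column: str  # source column to aggregate
    agg: str     # one of "sum", "count", "max"

    def __post_init__(self):
        if self.agg not in _OPS:
            raise ValueError(f"unsupported aggregation: {self.agg}")


def to_sql(spec: FeatureSpec, table: str) -> str:
    """Backend 1 (illustrative): compile the declaration into a SQL query."""
    agg_expr = "COUNT(*)" if spec.agg == "count" else f"{spec.agg.upper()}({spec.column})"
    return (f"SELECT {spec.entity}, {agg_expr} AS {spec.name} "
            f"FROM {table} GROUP BY {spec.entity}")


def compute_in_memory(spec: FeatureSpec, rows: list) -> dict:
    """Backend 2 (illustrative): evaluate the same declaration on a list of dict rows."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[spec.entity]].append(row[spec.column])
    return {key: _OPS[spec.agg](values) for key, values in groups.items()}


if __name__ == "__main__":
    spec = FeatureSpec("total_spend", "user_id", "amount", "sum")
    rows = [{"user_id": 1, "amount": 10}, {"user_id": 1, "amount": 5}, {"user_id": 2, "amount": 7}]
    print(to_sql(spec, "orders"))
    print(compute_in_memory(spec, rows))
```

The data scientist only writes the `FeatureSpec`; adding another platform means adding another compiler function, while every existing declaration stays unchanged.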
In this blog post, we focus on tackling arguably one of the most important such biases: the position bias. Position bias refers to the phenomenon in which users tend to order more from stores ranked higher compared to stores that are ranked lower, irrespective of how relevant that store truly is to the user.
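To show why position bias matters, here is a sketch of one common correction, inverse propensity weighting (IPW). It is a swapped-in, textbook technique and not necessarily the approach in the post. Each order is weighted by 1 / P(user examines that slot). The per-position examination probabilities below are hypothetical; in practice they would have to be estimated, for example from randomized rankings.

```python
from collections import defaultdict


def ipw_order_rate(events, propensity):
    """Position-debiased order rate per store via inverse propensity weighting.

    events: iterable of (store_id, position, ordered) tuples, one per impression.
    propensity: dict mapping position -> assumed probability the user examines it.
    """
    weighted_orders = defaultdict(float)
    impressions = defaultdict(int)
    for store, position, ordered in events:
        impressions[store] += 1
        if ordered:
            # Orders from rarely examined (low) positions count for more.
            weighted_orders[store] += 1.0 / propensity[position]
    return {store: weighted_orders[store] / impressions[store] for store in impressions}


if __name__ == "__main__":
    # Hypothetical examination probabilities: top slot always seen, third slot 25% of the time.
    propensity = {1: 1.0, 2: 0.5, 3: 0.25}
    events = (
        [("A", 1, True)] * 2 + [("A", 1, False)] * 2    # store A: top slot, 2 orders / 4 shows
        + [("B", 3, True)] + [("B", 3, False)] * 3      # store B: third slot, 1 order / 4 shows
    )
    print(ipw_order_rate(events, propensity))
```

On raw order rates, store A (0.5) beats store B (0.25), but the IPW estimates are 0.5 for A and 1.0 for B: once we account for how rarely the third slot is examined, B looks more relevant. Small samples can push IPW estimates above 1, which is one reason real systems smooth or clip the weights.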
In this post, we show how we built a personalized shopping experience for our new business vertical stores, which include grocery, convenience, pets, and alcohol, among many others.
Catch up
…Next Saturday night, we're sending you back to the future!
BigQuery
It will steal 7 seconds from you
Random thoughts, ideas.
I’m planning for a new kind of content for my newsletter.
I will let you guys know soon.
If one of these things happens in the coming months:
“Soon” is a dangerous word.
“Hasta la vista, baby”
- T-800, Terminator 2: Judgment Day (1991)
Before you leave...
I love learning from people who are smarter and more experienced than me by consuming their data engineering resources on the Internet.
Every week, I compile these resources into the GroupBy newsletter, which I publish first on Substack.
Then, I deliver it again on LinkedIn to make it more accessible to all of you.
So, if you want to learn and grow with me, subscribe to my Substack here:
That will motivate me a lot.