DATA Pill #019 - GCP, dbt, AWS and Sociopaths in the Modern Data Stack
Hello!
Before we begin, we have a huge announcement to make!
Our community has become a partner of DataMass, the biggest Big Data Conference in Northern Poland!
This is another step in our project towards knowledge-sharing, so we are kind of… proud ;)
And for this occasion, we have a 10% discount code (ON code, no No-code ;)):
DATAPILL10
More about DataMass in the conference section.
But before we get to that, a looooot about dbt, GCP and “the Office” meme.
ARTICLES
The Modern Data Stack Through ‘The Gervais Principle’ | 17 min read | Data Flow | Lauren Balik from Upright Analytics on Medium
This one gets off to an interesting start: Data doesn’t move left-to-right in an organization, it moves through Losers, the Clueless and Sociopaths.
What if we looked at data flow in terms of the pathological nature of organizations on a vertical axis, not a horizontal one?
Orchestrating dbt Google Cloud PART 2 | 10 min read | dbt & GCP | Enrique Lopez de Lara | Pythian Blog
In this article Enrique defines and demonstrates how to deploy some Google Workflows to orchestrate tasks.
End-to-End DBT project in Google Cloud Platform (Part 1) | 11 min read | dbt & GCP | Mohamed Dhaoui | Dev Genius Blog
One more series of posts - very detailed! All about running dbt projects on GCP and building a dbt-based data platform!
Part 1: Main concepts around DBT and how to organize a DBT project and run it on Google BigQuery
Part 2: How to package the DBT project and deploy it onto the Google Cloud Platform
Part 3: Gives precise details about running dbt with Workflows.
Serverless Messaging: Latency Compared | 5 min | AWS | Bite-Sized Serverless
A comparison of the AWS serverless messaging systems.
SQS Standard can deliver a message to a consumer in as little as 14 ms and is seldom slower than 100 ms, assuming low batch sizes. Kinesis with Enhanced Fan-Out is only slightly slower and allows for multiple consumers and a long history of events.
Since we're talking about AWS, here's a role in an interesting AWS project.
The Modern Metadata Platform: What, Why, and How? | 13 min read | Data Stock | Mars Lan | Metaphor Blog
Metadata management seemed to be a solved problem. With the Modern Data Platform and the democratisation of data, we let a bunch of new folks into this candy store of data, which means new challenges. Metadata started to look and smell like a Big Data problem. The idea for keeping everything intact is a Modern Metadata Platform. Written by the authors of DataHub (now developing their own product: metaphor.io), with a nice walkthrough from the need to the solution.
By clicking MORE LINKS you will find LinkedIn, Allegro and McDonald's case studies.
NEWS
Announcing Public Preview of Data Lineage in Unity Catalog | 5 min read | Data Lineage | Paul Roome, Sachin Thakur and Tao Feng | Databricks Blog
Better late than never ;) Databricks have finally announced the public preview of data lineage in Unity Catalog, available on AWS and Azure.
Announcing the GetInData Modern Data Platform - a self-service solution for Analytics Engineers | 10 min read | Data Platform | Michał Rudko | GetInData Blog
The Modern Data Platform (or Modern Data Stack) is on the lips of basically everyone in the data world right now. The need for a more self-service approach towards data-driven insight development has been observed in many of our clients for some time now.
DATA LIBRARY
Best Resources for DevOps | 5 min read | DevOps | Java Revisited | Twitter
A collection of meaty DevOps materials, like this Road Map by Vrashabh Sontakke
TOOLS
Modin: Scale your Pandas workflows by changing a single line of code | 10 min to dig | GitHub
Modin is a drop-in replacement for pandas. While pandas is single-threaded, Modin lets you instantly speed up your workflows by scaling pandas so it uses all of your cores. Modin works especially well on larger datasets, where pandas becomes painfully slow or runs out of memory.
By simply replacing the import statement, Modin offers users effortless speed and scale for their pandas workflows.
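To make the "single line" concrete, here is a minimal sketch of the import swap. The toy DataFrame and the fallback to plain pandas are our own illustration (assumptions, not taken from the Modin README); the rest of the script is ordinary pandas code and stays the same either way.

```python
# Use Modin if it is installed; otherwise fall back to plain pandas.
# The fallback is only here so the sketch runs anywhere (our assumption).
try:
    import modin.pandas as pd  # the one-line change: same API as pandas
except ImportError:
    import pandas as pd

# Hypothetical toy data, just to show that the calling code is unchanged.
df = pd.DataFrame({"team": ["a", "b", "a", "b"], "events": [1, 2, 3, 4]})

# A regular pandas-style aggregation; Modin aims to spread such work across cores.
totals = df.groupby("team")["events"].sum().to_dict()
print(totals)
```

On a frame this small you will not see any speed-up (the engine start-up can even make it slower); the gain is expected on larger datasets, as described above.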
DataTube
From Nothing to Something: Klarna’s Journey With Recommendation Systems | 24 min | Anil Sharma | GAIA
Klarna’s journey from zero recommendation models to a state of five use cases in one year.
A recording from the GAIA Conference 2022.
CONFS AND MEETUPS
DATAMASS SUMMIT | 29-30 September | Gdańsk
To specify the subject of the summit: Big Data, Data Science, Machine Learning and AI, all in the context of cloud solutions.
One-day workshops, a one-day conference. A lot of case studies are planned for this event.
A few points from the agenda:
Remember the 10% discount with code DATAPILL10!
PS, maybe there will be a chance to meet and network in our community?
___________________________
See you next week!
Adam Kawa from GetInData
PS, are there any “The Office” fans in here?