DATA Pill #033 - 4 ways to optimize BigQuery, 30 data models in DBT, 4 enablers of being data-driven, and a look back at the 2022 predictions

Adam Kawa

CEO at GetInData, ex-Spotify | Data & AI for banks, telecoms, retail & more.

Published: December 26, 2022

Hi,

Holiday time is almost over but I hope you will find some time to read the next Data Pill!

We managed to find some really “meaty” content for you. Enjoy!

ARTICLES

What I Got Wrong: Looking Back at My 2022 Predictions for the Modern Data Stack | 14 min | Modern Data Stack | Prukalpa | Personal Blog

Before the predictions for 2023, let’s take a step back and look at Prukalpa's reflections on the six major trends she predicted at the beginning of 2022. What did we get right? What didn’t quite go as expected? What did we completely miss? Read more about where we started and where we are now with:

1. Data Mesh

2. Metrics Layer

3. Reverse ETL

4. Active Metadata & Third-Gen Data Catalogs

5. Data Teams as Product Teams

6. Data Observability

T-Mobile Supports 5G Rollout with Azure Synapse Analytics and Power BI | 7 min | Data | Microsoft Blog

A short story about building a nationwide 5G network. Read how T-Mobile, a Power BI user, built a centralized source of data while maintaining high levels of performance and functionality, using a data lakehouse supported by Microsoft Azure Data Factory, Azure Synapse Analytics and Azure Databricks.

Migrating over 30 data models from plain SQL to DBT in just 5 days | 4 min | dbt | Ramtin Javanmardi | Mentimeter Blog

The post explains all the reasons why the company felt compelled to migrate over 30 of its models and sunset the old ones. They did the migration in three distinct steps, which you can read about. It is a great example of how big migrations of business-critical models do not have to be boring or stressful.

AWS Disaster Recovery Strategies – PoC with Terraform | 10 min | AWS | Martin Perez Rodrigues | Xebia Blog

In this article you can explore a proof of concept written in Terraform in which the authors, for example, create the front-end layer of a three-tier architecture.

How Einride is taking road freight to new places—on the cloud and on the road | 12 min | Google Cloud | Matt Chaban | Google Cloud Blog

Einride is rethinking every piece of the freight system, from trailers to local deliveries to the remote and autonomous platforms to operate them. If you want to check how they plan to create a sustainable, resilient delivery network using AI and tech, read this blog post.

Improving Video Voice Dubbing Through Deep Learning | 12 min | TensorFlow | Paul McCartney, Vivek Kwatra, Yu Zhang, Brian Colonna, Mor Miller | Google Developers

Did you know that most of the videos on YouTube are in English, but less than 20% of the world’s population speaks English as a first or second language? This is why voice dubbing is increasingly used to bring video into other languages. In this blog post you can read about research on improving voice dubbing quality using deep learning.

{ MORE LINKS }

TUTORIALS

Meshing MLOPS on Azure with MLFlow | 6 min | MLOps | Keshav Singh | Personal Blog

In this blog post Keshav sets up the ML life cycle using MLflow, an open source machine learning platform and framework for managing the ML life cycle. It is a short, hands-on demonstration of MLOps standardization on a Mesh Platform.

4 ways to optimize your BigQuery tables for faster queries | 15 min | BigQuery | Kelvin Gakuo | Airbyte Blog

Read this step-by-step tutorial to explore design patterns for your BigQuery storage that you can use to speed up your queries. To optimize your workloads on BigQuery, you can optimize your storage by:

1. Partitioning your tables.

2. Clustering your tables.

3. Pre-aggregating your data into materialized views.

4. Denormalizing your data.

In this blog post you will also read about BigQuery storage and compute costs, how to investigate BigQuery performance issues, and more.
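To make the partitioning and pre-aggregation ideas from the list above more concrete, here is a toy sketch in plain Python. It is not BigQuery code and does not come from the tutorial: the rows, the function names and the "rows scanned" counter are all assumptions made for illustration. It only shows why grouping data by a date column lets a one-day query skip most of the data, and why a stored per-day total avoids re-reading rows at all.

```python
from collections import defaultdict
from datetime import date

# Hypothetical event rows: (event_date, customer_id, amount).
ROWS = [
    (date(2022, 12, 24), "a", 10.0),
    (date(2022, 12, 24), "b", 5.0),
    (date(2022, 12, 25), "a", 7.5),
    (date(2022, 12, 26), "c", 2.5),
    (date(2022, 12, 26), "a", 1.0),
]


def full_scan_total(rows, day):
    """Unpartitioned layout: every row is read to answer a one-day query."""
    scanned = 0
    total = 0.0
    for event_date, _customer, amount in rows:
        scanned += 1
        if event_date == day:
            total += amount
    return total, scanned


def build_partitions(rows):
    """Partitioned layout: rows are grouped by the date column up front."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[0]].append(row)
    return partitions


def pruned_total(partitions, day):
    """Only the partition matching the filter is read; the rest are skipped."""
    rows = partitions.get(day, [])
    return sum(amount for _date, _customer, amount in rows), len(rows)


def daily_totals(partitions):
    """Pre-aggregation: one stored total per day, in the spirit of a materialized view."""
    return {day: sum(row[2] for row in rows) for day, rows in partitions.items()}
```

Calling `full_scan_total(ROWS, date(2022, 12, 26))` and `pruned_total(build_partitions(ROWS), date(2022, 12, 26))` returns the same total, but the second value in each result (rows read) differs. In a real warehouse that difference is what drives both query time and cost.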

{ MORE LINKS }

NEWS

Snowflake introduces Add-On for Microsoft Visual Studio | 2 min | Snowflake | Christian Lauer | Snowflake

The add-on lets developers access Snowflake from within VS Code. The extension connects the user to Snowflake and lets them write and execute SQL queries and see the results without ever leaving VS Code. After signing in, they can see and change their active database, schema, role and warehouse.

Grafana Releases New Frontend Observability SDK and Backend Profiling Database | 6 min | Grafana | Matt Campbell | InfoQ

Grafana recently announced two additions to its suite of observability and monitoring tools: a frontend observability SDK and a backend profiling database.

Debezium 2.1.0.Final Released | 5 min | Database | Jiri Pechanec | Debezium Blog

You might have noticed that Debezium went a bit silent for the last few weeks. No, we are not going away. In fact, the elves at Google worked furiously to bring you a present under the Christmas tree: the Debezium Spanner connector.

PODCAST

Update your model’s view of the world in Real Time with streaming Machine Learning using River | 1 h 16 min | ML | The Python Podcast.__init__

River is a framework for building streaming machine learning projects that can constantly adapt to new information. Listen to the podcast episode, where Max Halford explains how the project works, why you might (or might not) want to consider streaming ML, and how to get started building with River. You will also find answers to questions such as:

What is "online" machine learning?
How is the River framework implemented?
What are some of the challenges that users of River might run into if they come from a batch learning background?
When is River the wrong choice?
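As a rough illustration of what "online" means in the first question above, the sketch below updates a statistic one observation at a time instead of recomputing it over a stored batch. It is plain Python using Welford's method and deliberately does not use River. The class and function names are hypothetical and chosen only for this example.

```python
class RunningStats:
    """Mean and variance updated one observation at a time (Welford's method)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Fold a single new observation into the statistics, then discard it."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)
        return self

    @property
    def variance(self):
        """Sample variance; 0.0 until at least two observations have arrived."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


def stream_means(values):
    """Return the running mean after each value, as a stream consumer would see it."""
    stats = RunningStats()
    return [stats.update(value).mean for value in values]
```

The state kept between observations is three numbers, regardless of how long the stream runs. That is the property that lets a streaming model keep adapting without retraining on the full history.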

Top 6 Worst Apache Kafka JIRA Bugs | 1 h 10 min | guest: Anna McDonald | Confluent

After listening to this episode you will know the details of how batching works, the replication protocol, how Kafka’s networking stack dances with Linux’s, and which Scala class is the most important to read if you’re only going to read one.

Anna gives Kris the details about the bugs that she found and about some of the scariest, most surprising and most enlightening corner cases.

DATA TUBE

Customer showcase: Miro (hosted by dbt Labs) | 60 min | Modern Data Stack | dbt Labs

In this video, Felipe Leite and Stephen Pastan from Miro unpack their shift to a Modern Data Stack and share the vital technical changes they made to build a scalable and tech-forward data stack. Watch this to discover how to efficiently scale your analytics stack when your data and data team grow 10x in 2 years, and how to prioritize what gets done when there's that much growth.

CONFS, EVENTS AND MEETUPS

Near Real-Time Anomaly Detection With Delta Live Tables and Databricks Machine Learning | 9 January 2023 at 9am GMT; 10am CET | Live webinar

Join the webinar featuring Achraf Hamid, Data Scientist at Mailinblack, who will explore the importance of anomaly detection for businesses. The session will also examine how to solve common anomaly challenges, and achieve a near real-time anomaly detection system using the Databricks Lakehouse Platform.

Speakers:

Achraf Hamid, Data Scientist at MAILINBLACK
Michael Shtelma, Lead Specialist Solutions Architect at DATABRICKS
Alex Ott, Senior Specialist Solutions Architect at DATABRICKS

________________________

Have any interesting content to share in the DATA Pill newsletter?

Join us on GitHub

Adam from GetInData | Part of Xebia
