DATA Pill #014 - Future-Aware Data Engineering & Post-Deployment Data Science

Hi everyone,


Today we have one clickbait,

one “put the cat amongst the pigeons” kinda article,

one podcast that has the potential to go viral, and more.

Let’s take a look ;)



ARTICLES

Keeping track of shipments minute by minute: How Mercado Libre uses real-time analytics for on-time delivery | 12 min read | Data Analytics | Pablo Fernández Osorio | Mercado Libre | Google Cloud Blog

Mercado Libre shares a continuous intelligence framework that enables them to deliver 79% of their shipments in less than 48 hours, despite increased demand.

Data used to support decision-making in key processes:

  1. Carrier Capacity Optimization - monitors the percentage of network capacity utilized across every delivery zone and identifies, in near real time, where delivery targets are at risk.
  2. Outbound Monitoring - enables them to identify places with lower delivery efficiency and drill into the status of individual shipments.
  3. Air Capacity Monitoring - provides capacity usage monitoring for the aircraft running each of their shipping routes.
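As a rough illustration of the first use case, here is a minimal Python sketch of flagging delivery zones whose capacity utilization puts targets at risk. All names and the 90% threshold are assumptions for the example, not Mercado Libre's actual implementation:

```python
# Hypothetical sketch of a "Carrier Capacity Optimization" style check:
# flag delivery zones whose network capacity utilization puts delivery
# targets at risk.

AT_RISK_THRESHOLD = 0.9  # assumed: >=90% utilization means targets are at risk

def at_risk_zones(zone_stats, threshold=AT_RISK_THRESHOLD):
    """Return zones whose used/total capacity ratio meets or exceeds the threshold."""
    flagged = {}
    for zone, (used, total) in zone_stats.items():
        utilization = used / total
        if utilization >= threshold:
            flagged[zone] = round(utilization, 2)
    return flagged

stats = {
    "zone-north": (950, 1000),   # (shipments in flight, zone capacity)
    "zone-east": (600, 1000),
    "zone-central": (990, 1000),
}
print(at_risk_zones(stats))  # {'zone-north': 0.95, 'zone-central': 0.99}
```

In a real-time setup this check would run continuously against streaming capacity metrics rather than a static dict.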



Airflow's Problem | 7 min read | Airflow | Stephen Bailey | Data People Etc.

Let’s put the cat amongst the pigeons ;) The author explains why he doesn’t like Airflow and argues that in the data mesh era we should seek an alternative.


Google Introduces Zero-ETL Approach to Analytics on Bigtable Data Using BigQuery | 7 min read | Cloud | Steef-Jan Wiggers | InfoQ Blog

Previously, customers had to use ETL tools such as Dataflow or self-developed Python tools to copy data from Bigtable into BigQuery; now they can query Bigtable data directly with BigQuery SQL.

{ MORE LINKS }


NEWS

Python models | 10 min read | Databricks Blog

An update on an upcoming dbt feature: Python models.

A dbt Python model is a function that reads in dbt sources or models, applies a series of transformations and returns a transformed dataset. DataFrame operations define the starting points, the end state and each step along the way. This is similar to the role of CTEs in dbt SQL models.
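As a rough sketch of that shape — not dbt's actual runtime; in real dbt the function receives a `dbt` context whose `dbt.ref()` returns a DataFrame, while here a stub object and plain lists of dicts keep the example self-contained:

```python
# Simplified stand-in for a dbt Python model. `FakeDbt` is a stub playing the
# role of dbt's runtime context; in real dbt, `dbt.ref()` returns a DataFrame.

class FakeDbt:
    """Stub for dbt's context object; `ref` loads an upstream source/model."""
    def __init__(self, models):
        self._models = models

    def ref(self, name):
        return self._models[name]

def model(dbt, session=None):
    # Read in a dbt source or model, like `dbt.ref("orders")`.
    orders = dbt.ref("orders")
    # Apply a series of transformations -- each step plays the role of a CTE.
    paid = [o for o in orders if o["status"] == "paid"]
    totals = {}
    for o in paid:
        totals[o["customer"]] = totals.get(o["customer"], 0) + o["amount"]
    # Return the transformed dataset (the model's end state).
    return [{"customer": c, "total": t} for c, t in sorted(totals.items())]

dbt = FakeDbt({"orders": [
    {"customer": "a", "status": "paid", "amount": 10},
    {"customer": "a", "status": "paid", "amount": 5},
    {"customer": "b", "status": "refunded", "amount": 7},
]})
print(model(dbt))  # [{'customer': 'a', 'total': 15}]
```

Each intermediate variable (`paid`, `totals`) corresponds to a step you would otherwise express as a CTE in a dbt SQL model.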

{ MORE LINKS }


TUTORIALS

Iceberg Tables: Powering Open Standards with Snowflake Innovations | 7 min read | Data Lake | James Malone | Snowflake

Snowflake aims to solve three challenges commonly associated with large data sets: control, cost, and interoperability. Iceberg Tables combine unique Snowflake capabilities with the Apache Iceberg and Apache Parquet open source projects to do so. This article explains how Iceberg Tables are supposed to help.

{ MORE LINKS }


PODCAST

Future-Aware Data Engineer | 42 min | Data Engineering | Paweł Leszczyński | GetInData

Will this go viral? It’s already widely commented on and shared.

It is the story of past and current inventions, like Facebook by Mark Zuckerberg vs. the airplane by the Wright brothers. What is the Dunning–Kruger effect and what does it have in common with Wikipedia? Why did Jacek Kuroń not have to pay his phone bills? We're going to look at these inventions through the lens of Yuval Noah Harari, Daniel Kahneman, and Slavoj Žižek. Seems like the perfect trio of authors for the ideal data-related holiday podcast.


Post-Deployment Data Science | 33 min | ML | Hakim Elakhrass | DataCamp

Many machine learning practitioners dedicate most of their attention to creating and deploying models that solve business problems. However, what happens post-deployment? Moreover, how should data teams go about monitoring models in production?

Takeaway: Data scientists need to cultivate a thorough understanding of a model’s potential business impacts, as well as the technical metrics of the model.


DataTube

Whoops, the Numbers Are Wrong! Scaling Data Quality @ Netflix | 30 min | Michelle Ufford | Netflix | DataWorks Summit

We just found out that there is a named development pattern for data pipeline DAGs concerning data quality, called “Write-Audit-Publish”.

It’s like “blue-green deployment but for data”. I know, it’s obvious, but hey, it’s good to have names for simple things ;)
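As a hedged sketch of the idea (an illustrative toy, not the Netflix implementation — real pipelines would stage tables in a warehouse or lakehouse and swap partitions or pointers):

```python
# Hypothetical sketch of Write-Audit-Publish: WRITE new data to a staging area,
# AUDIT it there, and only PUBLISH (swap into production) if the audits pass --
# blue-green deployment, but for data.

def audit(rows):
    """Example audit checks (assumed): non-empty, no null or negative amounts."""
    if not rows:
        return False
    return all(r.get("amount") is not None and r["amount"] >= 0 for r in rows)

def write_audit_publish(new_rows, tables):
    tables["staging"] = new_rows              # WRITE: land data in staging only
    if not audit(tables["staging"]):          # AUDIT: validate before exposure
        raise ValueError("audit failed; production left untouched")
    tables["production"] = tables["staging"]  # PUBLISH: swap audited data in
    return tables

tables = {"production": []}
write_audit_publish([{"amount": 42}], tables)
print(tables["production"])  # [{'amount': 42}]
```

The key property is that bad data never reaches consumers: a failed audit leaves the production table exactly as it was.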

The original name shows up in this Netflix presentation.

You’re probably curious about how people apply this pattern in tools like dbt.

We only found one video and some slides - you will find them by clicking the MORE LINKS button.

If you know of some interesting sources on this subject, please leave a comment ;)

{ MORE LINKS }


CONFS AND MEETUPS

How to simplify data and AI governance | 16 August | Online | Databricks & Milliman

  • How to manage user identities, set up access permissions and audit controls, discover quality data and leverage automated lineage across all workloads
  • How to securely share live data across organizations without any data replication
  • How Databricks customer Milliman is leveraging Unity Catalog to simplify access management and reduce storage complexity

Speakers: Paul Roome, Liran Bareket, Dan McCurley


---

That’s it for today! Please don't hesitate to forward this on.

See you next week!

Adam Kawa from GetInData

