DATA Pill #030 - news from AWS and GitHub, creative testing, Search Pipeline and more

Adam Kawa

CEO at GetInData, ex-Spotify | Data & AI for banks, telecoms, retail & more.

Published: December 5, 2022

Hi,

Today will be without anecdotes, memes and funny comparisons.

There is no time for this.

We have plenty of topics for you.

Let's get started right away!

ARTICLES

Using MLOps to Build a Real-time End-to-End Machine Learning Pipeline | 8 min | MLOps | Binance Blog

This article shows how Binance solves various business problems, including fraud, P2P scams, and stolen payment details. Read on to better understand why they use MLOps and how they ensure the production model reflects the latest data patterns, and to see their standard operating procedure for real-time model development with a feature store.

Setting the Table: Benchmarking Open Table Formats | 6 min | Modern Data Stack | Brooklyn Data Co. Blog

The Modern Data Stack is growing rapidly, and open table storage formats are getting more attention. Take a look at the article on the Brooklyn Data Co. blog and read how they ran a set of comprehensive workloads against each format to test the performance of inserts and deletes, and the effect of these updates on the performance of subsequent reads.

Creative Testing: AI is the new A/B | 8 min | AI | Team Twigeo | Twigeo Blog

New limits on mobile tracking have led publishers like Meta and Google to shift from A/B testing to dynamic creative formats that rely on algorithms. It may look like marketers are losing creative control, but algorithms are reactive and the results sustain better campaign performance over time.

In part 1 Stuart discusses the challenges Canva faced with current search architecture, the requirements needed for a new architecture and the considerations to take into account in designing a new solution. In the second part, we’ll dive into the details of our new search pipeline architecture.

Why Should I Care About Table Formats Like Apache Iceberg? | 7 min | Apache Iceberg | Alex Merced | Dremio Blog

Reducing your data warehouse footprint with an Apache Iceberg-based data lakehouse will open up your data to best-in-breed tools, reduce redundant storage/compute costs, and enable cutting-edge features like partition evolution/catalog branching to enhance your data architecture.

In the past, the Hive table format did not go far enough to make this a reality, but today Apache Iceberg offers robust features and performance for querying and manipulating your data on the lake.

Now is the time to turn your data lake into a data lakehouse and start seeing the time to insight shrink along with your data warehouse costs.

{ MORE LINKS }

NEWS

Exciting new GitHub features powering machine learning | 5 min | ML | Seth Juarez | GitHub Blog

In November, GitHub made a series of announcements at Universe. How do they affect ML? Here are the findings from building machine learning projects directly on GitHub.

Jupyter Notebooks: Not only can I see the cells that have been added, but I can also see side-by-side the code differences within the cells, as well as the literal outputs. I can see at a glance the code that has changed and the effect it produces thanks to NbDime running under the hood.

While the rendering additions to GitHub are fantastic, there’s still the issue of executing the things in a reliable way when I’m away from my desk. Here’s a couple of gems to make these issues go away:

GPUs for Codespaces
Zero-config notebooks in Codespaces
Edit your notebooks from VS Code, PyCharm, JupyterLab, on the web, or even using the CLI (powered by Codespaces)

AWS Announces DataZone, a New Data Management Service to Govern Data | 2 min | AWS | Daniel Dominguez | InfoQ Blog

At AWS re:Invent, Amazon Web Services announced Amazon DataZone, a new data management service that makes it faster and easier for customers to catalog, discover, share and govern data stored across AWS, on-premises and third-party sources.

{ MORE LINKS }

NEWS

How to create a Devcontainer for your Python project | 8 min | MLOps & Docker | Jeroen Overschie | GoDataDriven Blog

Dev Containers can help us:

Get a reproducible development environment
Instantly onboard new team members onto your project
Better align the environments between team members
Keep your dev environment up-to-date & reproducible, which saves your team time when going into production later

Let’s explore how we can set up a Dev container for your Python project!

{ MORE LINKS }

PODCAST

Data Journey with Kevin Goldsmith (Anaconda) - Data & analytics used internally at Anaconda, SQL vs. Python, Layoffs and hiring in the tech sector, Agile data projects | 50 min | Analytics & Data | host: Adam Kawa ; guest: Kevin Goldsmith | Radio DaTa

Data and analytics used internally by Anaconda
The role and responsibilities of CTO at Anaconda
SQL vs. Python in data science
Hiring and layoffs in the tech industry
An agile approach to data engineering and data science projects

Data Analytics Career Orientation | 1 h | Analytics | host: Jon Krohn; guest: Luke Barousse | Super Data Science

A talk with Luke Barousse, a full-time YouTuber who produces content to help aspiring data scientists and the founder of MacroFit, a data-driven company that helps with meal planning.

How data science can help you while working on a submarine
Helpful hacks for data science beginners

{ MORE LINKS }

DataTube

Trino at Apple | 23 min | Analytics | Vinitha Gankid | Trino

Listen to engineers from Apple describe how Trino is currently used at their company. They discuss how they support Trino as a service for multiple end-users, and the critical features that drew Apple to Trino. They wrap up with some challenges they faced and some development work they plan to contribute to Trino.

{ MORE LINKS }

CONFS EVENTS AND MEETUPS

move(data) The Data Practitioner Conference | 1-8 December | Online

The conference features speakers who have spent countless hours working on data integration. Expect best practices, horror stories, tools and workflows that will improve the way you work.

Security Best Practices with Databricks | 14 December | Live Webinar

How to build a secure Databricks environment which complies with industry best practices;
Where to find the best practices for your chosen Cloud provider;
How to stay informed proactively about security risks before they manifest;
About staying vigilant on any settings changes to remain compliant.

________________________

Have any interesting content to share in the DATA Pill newsletter?

Join us on GitHub

Adam Kawa from GetInData
