
DATA Pill #025 - Data Mesh's Missing Element, All-in-One Data Stack Renaissance and More

Adam Kawa

CEO at GetInData, ex-Spotify | Data & AI for banks, telecoms, retail & more.

Published: October 31, 2022

Hi,

The new week has already started, so I am back with the next dose of DATA PILL.

Get ready: in today’s newsletter you will find a ranking, a report, a lot about cloud, and much more.

Let's not waste time. Here we go:

ARTICLES

Data Contracts: The Mesh Glue | 8 min | Data & ML | Luis Velasco | Towards Data Science Blog

Data Mesh + Open source components + data contracts

In this article Luis explains the “data contract” concept, which ensures that information spread across different data products can be shared and reused. He also walks through a couple of technical implementations, built from open source components, of one fundamental process in the data contract lifecycle: its evaluation.

With the ultimate goal of building trust in “someone else's” data products, and of easing and promoting data sharing, data contracts are artifacts that sit at the intersection of (a) a business glossary providing rich semantics, (b) a metadata catalog providing information about the structure, and (c) a data quality repository setting expectations about the content across different dimensions.
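
To make the three ingredients concrete, here is a minimal sketch of a data contract and its evaluation in plain Python. It is not the implementation from Luis's article: the `FieldSpec` and `DataContract` classes, the `orders` product and its sample rows are all hypothetical names invented for illustration. Each field carries a description (semantics), a type (structure) and a list of checks (quality expectations).

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class FieldSpec:
    """One field of a hypothetical data contract."""
    name: str          # structure: column name, as a metadata catalog would list it
    dtype: type        # structure: expected Python type
    description: str   # semantics: business glossary definition
    # quality expectations: each check returns True when the value is acceptable
    checks: list[Callable[[Any], bool]] = field(default_factory=list)


@dataclass
class DataContract:
    product: str
    fields: list[FieldSpec]

    def evaluate(self, rows: list[dict[str, Any]]) -> list[str]:
        """Return readable violations; an empty list means the contract holds."""
        violations = []
        for i, row in enumerate(rows):
            for spec in self.fields:
                if spec.name not in row:
                    violations.append(f"row {i}: missing field '{spec.name}'")
                    continue
                value = row[spec.name]
                if not isinstance(value, spec.dtype):
                    violations.append(
                        f"row {i}: '{spec.name}' is not {spec.dtype.__name__}"
                    )
                    continue
                for check in spec.checks:
                    if not check(value):
                        violations.append(
                            f"row {i}: '{spec.name}' failed a quality check"
                        )
        return violations


# Hypothetical contract for an "orders" data product.
orders_contract = DataContract(
    product="orders",
    fields=[
        FieldSpec("order_id", str, "Unique identifier of a customer order",
                  [lambda v: len(v) > 0]),
        FieldSpec("amount", float, "Order total in EUR",
                  [lambda v: v >= 0]),
    ],
)

# Made-up sample rows: one valid, one failing both checks, one missing a field.
rows = [
    {"order_id": "A-1", "amount": 19.99},
    {"order_id": "", "amount": -5.0},
    {"order_id": "A-3"},
]

violations = orders_contract.evaluate(rows)
for v in violations:
    print(v)
```

In a real setup the consumer of a data product would run such an evaluation before relying on the data, and the checks would more likely live in a dedicated data quality tool than in inline lambdas.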

State of AI Report 2022 | 10 min | AI | Nathan Benaich & Ian Hogarth | State of AI

The State of AI Report 2022 has been released. Just wow: the report summarizes and analyzes a great deal of interesting content and recent developments (though not much of it will be new to someone who follows the AI field). There is also an investor's view on AI, which is especially interesting.

The China-US AI research gap has continued to widen
Safety is gaining awareness among major AI research entities
AI-driven scientific research continues to lead to breakthroughs, but major methodological errors like data leakage need to be interrogated further.

The MLSecOps Top 10 | 11 min | MLOps | The Institute for Ethical AI & Machine Learning

An initiative that aims to further the field of machine learning security by identifying the top 10 most common vulnerabilities in the machine learning life cycle. It also includes a set of practical hands-on examples of each of these vulnerabilities, as well as the best practices to address them - all the content is available open source.

The well of MLOps knowledge has not dried up yet. BTW, there's an interesting senior MLOps position available at GetInData! Check it out here

Why we're leaving the cloud | 6 min | Cloud | David Heinemeier Hansson | Basecamp

Renting computers is (mostly) a bad deal for medium-sized companies with stable growth, like Basecamp. The savings promised in reduced complexity never materialized.

The cloud excels at two ends of the spectrum:

The first end is when your application is so simple and low traffic that you really do save on complexity by starting with fully managed services.
The second is when your load is highly irregular. When you have wild swings or towering peaks in usage. When the baseline is just a sliver of your largest needs. Or when you have no idea whether you need ten servers or a hundred.

The Next Generation Of All-In-One Data Stacks | 11 min read | Data Stack | Ben Rogojan | Seattle Data Guy Blog

Is the modern data stack even modern?

Isn’t it just a piecemeal collection of components from solutions we have known forever, like SAP or Informatica?

Isn’t it just an unbundled version of Airflow?

All-in-one data stacks are on the rise.

Ben shares examples of all-in-one solutions: Incorta, Keboola, Nexla, Mozart Data, Rivery.

{ MORE LINKS }

TOOLS AND TUTORIALS

Cube: API-First Business Intelligence | 5 min | BI

A very nice semantic layer tool that is open source. Top features:

integration with dbt: https://cube.dev/blog/dbt-metrics-meet-cube
SQL interface
caching: https://cube.dev/docs/caching

NEWS

Scaling PyTorch models on Cloud TPUs with FSDP | 6 min | ML & MLOps | PyTorch Blog

To support model scaling on TPUs, we implemented the widely-adopted Fully Sharded Data Parallel (FSDP) algorithm for XLA devices as part of the PyTorch/XLA 1.12 release. This FSDP interface allowed us to easily build models with e.g. 10B+ parameters on TPUs and has enabled many research explorations.
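
The core idea behind fully sharded data parallelism can be illustrated with a toy sketch in plain Python. This is not the PyTorch/XLA API: the `shard` and `all_gather` helpers, the parameter values and the world size of 4 are all assumptions made up for illustration. Each worker stores only its own slice of the parameters, and the full set is reassembled only when it is needed for computation.

```python
def shard(params, world_size):
    """Split a flat parameter list into world_size contiguous shards."""
    size = -(-len(params) // world_size)  # ceiling division
    return [params[r * size:(r + 1) * size] for r in range(world_size)]


def all_gather(shards):
    """Rebuild the full parameter list from every worker's shard."""
    return [p for s in shards for p in s]


# Hypothetical model with 10 parameters spread over 4 workers.
params = [0.1 * i for i in range(10)]
shards = shard(params, world_size=4)

# Each worker keeps only its shard in memory between steps...
for rank, s in enumerate(shards):
    print(f"rank {rank} holds {len(s)} of {len(params)} parameters")

# ...and the full parameters are gathered only for the duration of a computation.
full = all_gather(shards)
```

The memory saving comes from the fact that, outside of the gathered window, no single worker holds more than roughly 1/N of the parameters. Real implementations apply the same idea layer by layer and also shard gradients and optimizer state.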

{ MORE LINKS }

DATA LIBRARY

Data on Kubernetes 2022 | 17 pages | Kubernetes | DoK Community

A report from the DoK Community, with insights from over 500 executives and technology leaders on how data on Kubernetes has a transformative impact on organizations, regardless of size or tech maturity.

Respondents see a direct link between running DoK and making big gains: the majority of them (83%) attribute over 10% of their revenue to running data on Kubernetes. One-third of organizations saw their productivity increase two-fold.

PODCAST

Project Lightspeed: Next-generation Spark Streaming | 41 min | Streaming | host: Ben Lorica; guest: Karthik Ramasamy | The Data Exchange Podcast

41 minutes about faster and simpler tools for new streaming applications.

CONFS AND MEETUPS

Art of Scala | 16 November | Scala | Warsaw

A non-commercial conference organized by Scala enthusiasts for Scala engineers.

A Review of the Presentations at the DataMass Gdańsk Summit 2022 | Grzegorz Kołpuć, Maciej Maciejko, Sylwia Kołpuć | GetInData

This conference has passed, but you can get many takeaways from this review. Crème de la crème of DataMass 2022.

________________________

Have any interesting content to share in the DATA Pill newsletter?

Join us on GitHub

Adam Kawa from GetInData
