DATA Pill #121 - Local & Free Multi-Agent RAG Superbot, Data Mesh - Where Are We Now?

Adam Kawa

CEO at GetInData, ex-Spotify | Data & AI for banks, telecoms, retail & more.

Published: September 9, 2024

Hi,

Get ready for your data fix.

This week's DATA Pill dives into GPU memory for LLMs, Kafka on object storage, and more. Time to geek out!

ARTICLES

How do we run Kafka 100% on the object storage? | 13 min | Data Engineering | Vu Trinh | The Deep Hub Blog

This article explains how AutoMQ runs Kafka entirely on object storage, improving scalability and performance by separating storage from compute. It covers cache management, the Write Ahead Log (WAL), object storage, recovery processes, and metadata management.

How Much GPU Memory is Needed to Serve a Large Language Model (LLM)? | 4 min | LLM | Mastering LLM Blog

Understanding how to estimate GPU memory requirements is crucial for deploying Large Language Models (LLMs) like GPT or LLaMA. This article provides a formula to calculate the necessary GPU memory based on model parameters, precision, and overhead, ensuring efficient hardware utilization and avoiding bottlenecks during model deployment.
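
To make the idea concrete, here is a minimal sketch of that kind of estimate. It assumes the common rule of thumb (parameter count times bytes per parameter, plus a fractional overhead) rather than quoting the article's exact formula. The function name and the 20% default overhead are illustrative assumptions, and the result covers model weights only, not the KV cache or batch size.

```python
def estimate_serving_memory_gb(params_billion: float,
                               bits_per_param: int = 16,
                               overhead: float = 0.2) -> float:
    """Rough GPU memory (in GB) needed to hold a model's weights for inference.

    Assumption: memory ~= parameters * bytes per parameter * (1 + overhead).
    The 20% default overhead is a rule-of-thumb placeholder, not a measured value.
    """
    if params_billion <= 0 or bits_per_param <= 0 or overhead < 0:
        raise ValueError("parameters and precision must be positive, overhead non-negative")

    bytes_per_param = bits_per_param / 8
    # 1e9 parameters * bytes per parameter / 1e9 bytes per GB: the 1e9 factors cancel.
    weights_gb = params_billion * bytes_per_param
    return weights_gb * (1 + overhead)


if __name__ == "__main__":
    # Compare precisions for a hypothetical 70B-parameter model.
    for bits in (32, 16, 8, 4):
        estimate = estimate_serving_memory_gb(70, bits)
        print(f"70B model at {bits}-bit: ~{estimate:.0f} GB")
```

Under these assumptions, a 70B-parameter model at 16-bit precision comes out at roughly 168 GB, and halving the precision halves the estimate, which is why quantization matters so much for serving cost.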

Why Did Databricks Open-Source Unity Catalog? | 6 min | Data Engineering | StarRocks Engineering Blog

Databricks open-sourced Unity Catalog to strengthen the open data ecosystem and highlight the maturity of lakehouse architecture. This move, alongside their acquisition of Tabular, is poised to significantly impact the data analytics landscape and boost the importance of open-source solutions.

TUTORIALS

Pinot for Low-Latency Offline Table Analytics | 16 min | Data Engineering | Ankit Sultana, Caner Balci | Uber Engineering Blog

Explore how Uber uses Apache Pinot for over 100 low-latency analytics use cases. Read about Pinot's integration with batch sources like Apache Hive, enabling high-performance queries on large datasets through a self-serve platform for seamless data ingestion.

In MORE LINKS you will read:

Data Quality in Streaming: A Deep Dive into Apache Flink
Kimball dimensional data warehouse modelling: enabling simplicity at scale

{ MORE LINKS }

PODCAST

Generative AI in the Enterprise with Steve Holden, Senior Vice President and Head of Single-Family Analytics at Fannie Mae | 39 min | Gen AI | Adel Nehme, Steve Holden | DataFramed

In the episode, Adel and Steve explore generative AI opportunities, building a GenAI program, use-case prioritization, fostering an AI-first culture, skills transformation, governance as a competitive edge, scaling challenges, future AI trends, and more.

DATA TUBE

Dagster, SDF, & the Evolution of the Data Platform (A Dagster Deep Dive) | 42 min | Data Platform | Lukas Schulte, Pedram Navid | Dagster

Explore how the combined strengths of Dagster’s orchestration and SDF’s transformation capabilities can enhance your developer experience, streamline your data pipelines, reduce costs, and enhance data quality and reliability.

Key Takeaways:

Unified Workflow Management: Seamlessly integrate and manage your data workflows.
Enhanced Data Quality: Ensure consistent and reliable data through advanced transformation techniques.
Improved Developer Experience: Experience lightning-fast execution and robust SQL validation with SDF.

CONFS, EVENTS AND MEETUPS

Harnessing DuckDB in the Cloud | Webinar | 13th September

Explore MotherDuck's innovative features powered by DuckDB. Learn how it enhances the data stack, use cases, and upcoming integrations.

Data Mesh - Where Are We Now? | Webinar | 16th September

Zhamak Dehghani introduced data mesh principles five years ago to decentralize data ownership and improve scalability. Organizations have since experimented with this approach. Join a webinar to learn about early insights, practical versions, and tips for successful implementation.

_______________________

Have any interesting content to share in the DATA Pill newsletter?

Join us on GitHub

Dig into previous editions of DataPill

Adam from GetInData | Part of Xebia

Taís L. Pereira

Data Analyst | Analytics Engineer

6 months ago

Thank you for the mention!

2 reactions

Godwin Josh

Co-Founder of Altrosyn and Director at CDTECH | Inventor | Manufacturer

6 months ago

This DATA PILL is brimming with insights into the cutting edge of data technology! From exploring the potential of object storage for Kafka to delving into the complexities of streaming data quality, these topics are shaping the future of how we handle and analyze information. With advancements like Pinot enabling low-latency analytics and Dagster revolutionizing data platforms, what innovative use cases can you envision emerging from this convergence of technologies?
