DATA Pill #121 - Local & Free Multi-Agent RAG Superbot, Data Mesh - Where Are We Now?

Hi,

Get ready for your data fix.

This week's DATA Pill dives into GPU memory for LLMs, Kafka on object storage, and more. Time to geek out!

ARTICLES

How do we run Kafka 100% on the object storage? | 13 min | Data Engineering | Vu Trinh | The Deep Hub Blog

This article explains how AutoMQ runs Kafka entirely on object storage, improving scalability and performance by separating storage from compute. It covers key aspects such as cache management, the Write Ahead Log (WAL), object storage, recovery processes, and metadata management.

How Much GPU Memory is Needed to Serve a Large Language Model (LLM)? | 4 min | LLM | Mastering LLM Blog

Understanding how to estimate GPU memory requirements is crucial for deploying Large Language Models (LLMs) like GPT or LLaMA. This article provides a formula to calculate the necessary GPU memory based on model parameters, precision, and overhead, ensuring efficient hardware utilization and avoiding bottlenecks during model deployment.
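As a rough illustration of this kind of estimate, here is a minimal sketch in Python. It assumes a commonly cited rule of thumb (parameter count times bytes per parameter, plus a flat overhead factor); the exact formula in the linked article may differ. The function name, the 20% default overhead, and the example model sizes are assumptions made for illustration only.

```python
def estimate_gpu_memory_gb(params_billions: float,
                           bits_per_param: int = 16,
                           overhead: float = 1.2) -> float:
    """Rough GPU memory estimate (decimal GB) for serving a model.

    Assumption: memory ~= parameters * bytes per parameter * overhead.
    The overhead factor (hypothetical default: 20%) stands in for
    activations, KV cache, and runtime buffers; real usage depends on
    batch size, context length, and the serving framework.
    """
    if params_billions <= 0 or bits_per_param <= 0 or overhead < 1.0:
        raise ValueError("expected positive sizes and overhead >= 1.0")
    bytes_per_param = bits_per_param / 8
    # 1 billion parameters at 1 byte each is about 1 GB.
    return params_billions * bytes_per_param * overhead


if __name__ == "__main__":
    # Hypothetical example: a 70B-parameter model at three precisions.
    for bits in (16, 8, 4):
        gb = estimate_gpu_memory_gb(70, bits_per_param=bits)
        print(f"70B params @ {bits}-bit: ~{gb:.0f} GB")
```

Under these assumptions, a 70B-parameter model at 16-bit precision needs roughly 168 GB, which is why lower precision (8-bit or 4-bit quantization) is often used to fit a model on fewer GPUs.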

Why Did Databricks Open-Source Unity Catalog? | 6 min | Data Engineering | StarRocks Engineering Blog

Databricks open-sourced Unity Catalog to strengthen the open data ecosystem and highlight the maturity of lakehouse architecture. This move, alongside their acquisition of Tabular, is poised to significantly impact the data analytics landscape and boost the importance of open-source solutions.

TUTORIALS

Pinot for Low-Latency Offline Table Analytics | 16 min | Data Engineering | Ankit Sultana, Caner Balci | Uber Engineering Blog

Explore how Uber uses Apache Pinot for over 100 low-latency analytics use cases. Read about Pinot's integration with batch sources like Apache Hive, enabling high-performance queries on large datasets through a self-serve platform for seamless data ingestion.


In MORE LINKS you will read:

  • Data Quality in Streaming: A Deep Dive into Apache Flink
  • Kimball dimensional data warehouse modelling: enabling simplicity at scale

{ MORE LINKS }

PODCAST

Generative AI in the Enterprise with Steve Holden, Senior Vice President and Head of Single-Family Analytics at Fannie Mae | 39 min | Gen AI | Adel Nehme, Steve Holden | DataFramed

In the episode, Adel and Steve explore generative AI opportunities, building a GenAI program, use-case prioritization, fostering an AI-first culture, skills transformation, governance as a competitive edge, scaling challenges, future AI trends, and more.

DATA TUBE

Dagster, SDF, & the Evolution of the Data Platform (A Dagster Deep Dive) | 42 min | Data Platform | Lukas Schulte, Pedram Navid | Dagster

Explore how the combined strengths of Dagster’s orchestration and SDF’s transformation capabilities can enhance your developer experience, streamline your data pipelines, reduce costs, and enhance data quality and reliability.

Key Takeaways:

  • Unified Workflow Management: Seamlessly integrate and manage your data workflows.
  • Enhanced Data Quality: Ensure consistent and reliable data through advanced transformation techniques.
  • Improved Developer Experience: Experience lightning-fast execution and robust SQL validation with SDF.

CONFS, EVENTS AND MEETUPS

Harnessing DuckDB in the Cloud | Webinar | 13th September

Explore MotherDuck's innovative features powered by DuckDB. Learn how it enhances the data stack, its use cases, and upcoming integrations.

Data Mesh - Where Are We Now? | Webinar | 16th September

Zhamak Dehghani introduced data mesh principles five years ago to decentralize data ownership and improve scalability. Organizations have since experimented with this approach. Join a webinar to learn about early insights, practical versions, and tips for successful implementation.

_______________________

Have any interesting content to share in the DATA Pill newsletter?

Join us on GitHub

Dig into previous editions of DATA Pill

Adam from GetInData | Part of Xebia

Taís L. Pereira

Data Analyst | Analytics Engineer

6 months ago

Thank you for the mention!

Godwin Josh

Co-Founder of Altrosyn and Director at CDTECH | Inventor | Manufacturer

6 months ago

This DATA PILL is brimming with insights into the cutting edge of data technology! From exploring the potential of object storage for Kafka to delving into the complexities of streaming data quality, these topics are shaping the future of how we handle and analyze information. With advancements like Pinot enabling low-latency analytics and Dagster revolutionizing data platforms, what innovative use cases can you envision emerging from this convergence of technologies?
