DATA Pill #147 - Are you ready for MLOps? DuckDB goes distributed?

Hi,

This week, we dive into MLOps, scaling DuckDB, DeepSeek-R1’s cost, and PayPal’s causal inference. Plus, meaty tutorials on RAG accuracy, caching, automation, and more. Let’s go!

ARTICLE

DuckDB goes distributed? DeepSeek’s smallpond takes on Big Data | 5 min | Data Engineering | Mehdi Ouazza | Personal Blog

DeepSeek’s smallpond extends DuckDB to distributed computing using Ray and a custom storage system, balancing scalability with added complexity.

How much does it cost to run DeepSeek-R1 locally? | 3 min | AI | Mehul Gupta | Data Science in your pocket

Running DeepSeek-R1 (671B params) locally? It’ll set you back ~$106K in hardware alone—GPUs, RAM, storage, and cooling make it an enterprise-scale investment.

In MORE LINKS you will read:

  • Estimating Incremental Lift in Customer Value (Delta CV) using Synthetic Control

{ MORE LINKS }

TUTORIALS

Step-by-Step Guide to Boosting Enterprise RAG Accuracy | 8 min | RAG | Madhukar Kumar | Software, AI and Marketing

Improve retrieval from PDFs using semantic chunking, entity extraction, and knowledge graphs—enhancing RAG/KAG performance.

FacetController: How we made infrastructure changes at Lyft simple | 7 min | DevOps | Miguel Molina, Arvind Subramanian | Lyft Engineering Blog

Lyft’s Kubernetes-based FacetController automates deployments, scales infra efficiently, and eliminates mass redeployments.

In MORE LINKS you will read:

  • The caching strategy of our Teads SSP
  • Are you ready for MLOps?
  • A practical n8n workflow example from A to Z — Part 1: Use Case, Learning Journey and Setup

{ MORE LINKS }

DATA TUBE

Agentic AI: A Progression of Language Model Usage | 57 min | AI | Insop Song | Stanford Online

A webinar on agentic LMs—covering planning, tool usage, and iterative workflows to enhance AI performance.

CONFS, EVENTS AND MEETUPS

OpenLineage Meetup @ Google | Warsaw | 31st March

Discuss the challenges of data lineage and how OpenLineage is simplifying metadata collection across pipelines.

PINNACLE PICKS

Your top picks from last week:

AI-Ready Organization: How AI is Changing the Hiring Process | 3 min | AI | Giovanni Lanzani | Xebia Blog

AI is transforming recruitment by automating screening and improving efficiency. However, human judgment remains irreplaceable. This article explores how organizations can optimize AI for fair and ethical hiring decisions.

Cloud Native Warsaw - March 2025 Edition | Warsaw | 12th March

Learn how to deploy AI inference workloads on Amazon EKS using Terraform, Triton Inference Server, and Prometheus Adapter for autoscaling, monitoring, and optimization.

SQL is all you need! | 5 min | Data Analytics | Paul Marcombes | Google Cloud - Community Blog

SQL is at the heart of modern data operations, eliminating the need for external tools and custom scripts. Learn how Nickel’s approach enables self-service analytics through a governed SQL function catalog using BigFunctions, an open-source framework.

____________________

Have any interesting content to share in the DATA Pill newsletter?

Join us on GitHub

Dig into previous editions of DATA Pill

Adam from GetInData | Part of Xebia
