DATA Pill #147 - Are you ready for MLOps? DuckDB goes distributed?
Hi,
This week, we dive into MLOps, scaling DuckDB, DeepSeek-R1’s cost, and PayPal’s causal inference. Plus, meaty tutorials on RAG accuracy, caching, automation, and more. Let’s go!
ARTICLE
DuckDB goes distributed? DeepSeek’s smallpond takes on Big Data | 5 min | Data Engineering | Mehdi Ouazza | Personal Blog
DeepSeek’s smallpond extends DuckDB to distributed computing using Ray and a custom storage system, balancing scalability with added complexity.
How much does it cost to run DeepSeek-R1 locally? | 3 min | AI | Mehul Gupta | Data Science in your pocket
Running DeepSeek-R1 (671B params) locally? It’ll set you back ~$106K in hardware alone—GPUs, RAM, storage, and cooling make it an enterprise-scale investment.
TUTORIALS
Step-by-Step Guide to Boosting Enterprise RAG Accuracy | 8 min | RAG | Madhukar Kumar | Software, AI and Marketing
Improve retrieval from PDFs using semantic chunking, entity extraction, and knowledge graphs—enhancing RAG/KAG performance.
FacetController: How we made infrastructure changes at Lyft simple | 7 min | DevOps | Miguel Molina, Arvind Subramanian | Lyft Engineering Blog
Lyft’s Kubernetes-based FacetController automates deployments, scales infra efficiently, and eliminates mass redeployments.
DATA TUBE
Agentic AI: A Progression of Language Model Usage | 57 min | AI | Insop Song | Stanford Online
A webinar on agentic LMs—covering planning, tool usage, and iterative workflows to enhance AI performance.
CONFS, EVENTS AND MEETUPS
OpenLineage Meetup @ Google | Warsaw | 31st March
Discuss the challenges of data lineage and how OpenLineage is simplifying metadata collection across pipelines.
PINNACLE PICKS
Your top picks from last week:
AI-Ready Organization: How AI is Changing the Hiring Process | 3 min | AI | Giovanni Lanzani | Xebia Blog
AI is transforming recruitment by automating screening and improving efficiency. However, human judgment remains irreplaceable. This article explores how organizations can optimize AI for fair and ethical hiring decisions.
Cloud Native Warsaw - March 2025 Edition | Warsaw | 12th March
Learn how to deploy AI inference workloads on Amazon EKS using Terraform, Triton Inference Server, and Prometheus Adapter for autoscaling, monitoring, and optimization.
SQL is all you need! | 5 min | Data Analytics | Paul Marcombes | Google Cloud - Community Blog
SQL is at the heart of modern data operations, eliminating the need for external tools and custom scripts. Learn how Nickel’s approach enables self-service analytics through a governed SQL function catalog using BigFunctions, an open-source framework.
____________________
Have any interesting content to share in the DATA Pill newsletter?
Join us on GitHub
Dig into previous editions of DATA Pill
Adam from GetInData | Part of Xebia