Subject: DATA Pill #098 - Deploy an LLM in Your Private Kubernetes Cluster, The Real Cost of Self-Hosting MLflow
Hi,
Another week, another portion of data meat ready to serve.
The Skill Lake section strikes again! Join Data Learning Week.
Also, enjoy the tutorial on deploying an LLM in your private Kubernetes cluster in 5 steps, plus more content we found this week.
ARTICLES
Data Quality Error Detection powered by LLMs | 17 min | LLM | Simon Grah | Towards Data Science Blog
Start with the introductory article on the Data Dirtiness Score, which explains the score's key assumptions and shows how to calculate it. This piece is the second in a series about cleaning data with Large Language Models (LLMs), focusing on identifying errors in tabular datasets.
Unlocking Kafka's Potential: Tackling Tail Latency with eBPF | 7 min | Data Engineering | Maciej Mościcki, Piotr Rżysko | Allegro Tech Blog
This blog post describes the Allegro team's journey: how they used Kafka protocol sniffing and eBPF to identify and remove a performance bottleneck.
Evaluating Large Language Model (LLM) systems: Metrics, challenges, and best practices | 11 min | LLM | Jane Huang, Kirk Li, Daniel Yehdego | Data Science at Microsoft
This article takes a thorough look at LLM system evaluation. It distinguishes model evaluation from system evaluation and compares online and offline strategies, with particular attention to using AI to assess AI and to Responsible AI metrics. It also explains why different application scenarios call for different evaluation tools and frameworks, and encourages readers to keep up with evolving metrics and frameworks.
In MORE LINKS you will read about: How we expose data in BigQuery, The Real Cost of Self-Hosting MLflow
SKILL LAKE
Data Learning Week | Online | 8-11th April
Would you like to test one of our courses before investing money in it? Then come to our Data Learning Week, a series of 4 free hands-on workshops. Each session is a free trial lesson from the full training. We will also have a special bonus from the Academy for all workshop participants.
Choose your topic, check the agenda, and sign up.
TUTORIALS
Deploy a custom Docker image on Azure ML using a blue-green deployment with Python | 13 min | ML | Timo Uelen | Xebia Blog
This tutorial walks through a custom solution: deploying your own Docker image on Azure ML with a blue-green deployment, driven from Python.
DATA TUBE
How to Deploy LLM in your Private Kubernetes Cluster in 5 STEPS | 17 min | LLM | Marcin Zabłocki | GetInData | Part of Xebia
In this tutorial, Marcin Zabłocki shows how to deploy an LLM in your private Kubernetes cluster in 5 simple steps, using Mistral as the example.
In MORE LINKS you will read about: Streams Forever: Kafka Summit London 2024 Keynote
PODCAST
ML for Finance and Storytelling through Data | 1 h 7 min | ML | Daniel Bashir, Ben Wellington
A conversation about the challenges of ML in quantitative trading and investing, and about telling stories through data.
CONFS, EVENTS AND MEETUPS
Big Data Technology Warsaw Summit | Warsaw and Online | 10th and 11th April
Join the independent conference whose agenda groups presentations into nine categories, so you can easily find the topics you care about most. Examples include:
And more! Learn from speakers from companies like Dropbox, IKEA, Cloudera, Allegro, Ververica, and Freenow.
Shhh… Use the DataPill200 code to get a 200 PLN discount!
________________________
Have any interesting content to share in the DATA Pill newsletter?
Join us on GitHub
Dig into previous editions of DATA Pill
Adam from GetInData | Part of Xebia