Better Visualizations, Advanced ETL Techniques, RAG Pain Points, and Other February Must-Reads

Towards Data Science

Your home for data science. A publication sharing concepts, ideas and codes.

Published: February 29, 2024

February might be the shortest month, but it certainly didn’t feel this way here at TDS, where our authors have been on top of their game, sharing strong contributions on timely topics, including some of the longest and most-read articles of the year so far.

Now that most of us have settled into the flow of things in 2024, we see our readers focus slightly less on career moves and more on core skills and concrete solutions to common issues. Our most-read and -discussed articles of the past month reflect that, and below you’ll find a representative sample of our February standouts.

Monthly Highlights

The Math Behind the Adam Optimizer. In a clear, accessible, and widely shared explainer, Cristian Leo unpacks the mathematical inner workings of the Adam (Adaptive Moment Estimation) optimizer and, along the way, helps us understand why it’s become such a popular choice among deep learning practitioners.
12 RAG Pain Points and Proposed Solutions. While retrieval-augmented generation continues to make waves as a powerful option for boosting LLMs’ performance, its shortcomings are becoming clearer, too. Wenqi Glantz offers a useful resource for anyone who’s felt stuck implementing a RAG system recently, compiling 12 common pitfalls as well as suggested workarounds.
Data Visualization 101: Playbook for Attention-Grabbing Visuals. For anyone looking to create “clearer, sharper and smarter visuals”—and who isn’t, really?—the latest data-visualization guide by Mariya Mansurova is essential reading, as it leverages numerous concrete examples (in Plotly) to showcase essential design principles in action.

Advanced ETL Techniques for Beginners. If you’re an early-stage data engineer who’d like to give your data-ingestion skills a boost, Mike Shakhomirov MBA’s new tutorial is one you should definitely explore (and bookmark): it covers typical ingestion patterns and provides code snippets you can use to start tinkering on your own.
Advanced Retrieval-Augmented Generation: From Theory to LlamaIndex Implementation. Interested in diving further into the exciting world of RAG? Leonie Monigatti explains the nitty-gritty details of pre-retrieval, retrieval, and post-retrieval optimizations, before walking us through the process of transforming a “naive” RAG pipeline into an advanced one.
Top Evaluation Metrics for RAG Failures. We turn to RAG one final time this week, this time for Amber R.’s most recent contribution: a handy resource on troubleshooting unexpected or underwhelming performance, and on applying robust response and retrieval evaluation metrics to ensure all the pieces in your pipeline are working in harmony.
Building a Data Platform in 2024. Three years after first tackling this topic, we were thrilled to welcome back Dave Melillo, whose new post reevaluates the key components of efficient data platforms. He shares valuable insights based on his experience navigating the data challenges of various industries, and having worked with both “large corporations and nimble startups.”

An Extra Dose of Python

Some of our most popular posts in the past few weeks covered the always-timely topic of Python programming for data and ML professionals. In case you missed them:

How should you go about learning Python as a total beginner? Egor Howell offers a clear and practical roadmap.
If you’re not familiar with the @property decorator yet, you definitely will be by the time you finish reading Siavash Yasini’s comprehensive introduction.
Anyone into AI app-building should take a look at Naomi Kriger’s hands-on tutorial on creating a speech-to-text-to-speech program with the pyttsx3 library.
Taking inspiration from Robert C. Martin’s time-tested Clean Code book, Patrick Brus outlines the core principles behind writing—you guessed it—clean and effective Python code.
For even more Python tutorials and project walkthroughs, don’t miss our recent roundup of advanced and niche use cases.

Our latest cohort of new authors

Every month, we’re thrilled to see a fresh group of authors join TDS, each sharing their own unique voice, knowledge, and experience with our community. If you’re looking for new writers to explore and follow, just browse the work of our latest additions, including Sarthak Handa, Vadim Arzamasov, Mahyar Aboutalebi, Ph.D., James W, Mohammed Mohammed, Kirsten Jiayi Pan, Matthew Chak, Ugur Yildirim, Mikayil Ahadli, Hamza Gharbi, Sami Abboud, Matthew Gunton, Eivind Kjosbakken, Eva Revear, Nithhyaa Ramamoorthy, Rami Krispin, Kennedy Selvadurai, PhD, Vassily Morozov, Patrick Beukema, Thomas Rouch, Ritanshi Agarwal, Rohan Nanda, Nikolaus Correll, Mert Ersoz, Dani Lisle, Roberta Rocca, Adil Rizvi, Matthew Turk, Celia Banks, Ph.D., Skylar Jean Callis, Ryan McDermott, Anand Subramanian, Aayush Agarwal, P.G. Baumstarck, Jose D. Hernandez-Betancur, Khin Yadanar Lin, and Daniel Kang, among others.

Thank you for supporting the work of our authors! If you’re feeling inspired to join their ranks, why not write your first post? We’d love to read it.

Until the next Variable,

TDS Team

