Rust: Python’s New Best Friend – A Data Scientist’s Journey

As Python continues to dominate data science, a quiet revolution is happening beneath the surface. Increasingly, Rust is powering our most critical Python tools—bringing unprecedented performance while maintaining the Python interface we know and love. This hybrid approach transforms our work as data scientists, enabling rapid development and production-grade performance.

My journey with Rust began six years ago as a distant curiosity. I heard the name in conference talks and saw it climbing GitHub’s language popularity charts, but it remained just another programming language on my “maybe someday” list.

That changed when Hugging Face released their tokenizers package—a blazingly fast NLP preprocessing library written in Rust with Python bindings. The performance gains were impossible to ignore: what took seconds in pure Python implementations was now completed in milliseconds.
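To give a feel for what this looks like in practice, here is a minimal sketch of calling the Rust core through the Python API; the model name and the toy corpus are placeholders of mine for illustration, not a benchmark from any specific project.

```python
# Minimal sketch of the Rust-backed tokenizers API from Python.
# The model name and texts are illustrative placeholders.
from tokenizers import Tokenizer

# Downloads a prebuilt tokenizer definition from the Hugging Face Hub.
tokenizer = Tokenizer.from_pretrained("bert-base-uncased")

texts = ["Rust does the heavy lifting.", "Python keeps the interface friendly."] * 10_000

# encode_batch hands the whole batch to the Rust core, which processes it in parallel;
# this is where the seconds-to-milliseconds difference tends to come from.
encodings = tokenizer.encode_batch(texts)
print(len(encodings), encodings[0].tokens[:8])
```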

The Two Modes of Data Science Work

Data scientists typically work in one of two distinct modes:

  1. Tool Users: This is where we conduct experiments, build models, analyze data, and generate insights. In this mode, we’re focused on delivering business value directly through our analysis and models.
  2. Tool Makers: This is where we create infrastructure, utilities, and frameworks that enable more efficient work in the first mode. Here, we’re building the foundations that make our primary work sustainable and scalable.

About 95% of our time is spent in the first mode, where we see software as a means to an end, not the end itself. Our primary goal is delivering insights and solutions, with code being the vehicle rather than the destination.

This distinction explains why the Python+Rust combination is so powerful. When we’re in “tool user” mode, Python’s expressiveness and ecosystem are hard to beat. But Rust becomes an exceptional partner when we shift to “tool maker” mode—building components that must be fast, reliable, and resource-efficient.

The tools I’ve been gravitating toward—tokenizers, Ruff, Polars, and uv—exemplify this second mode. They’re not replacing Python for data science work but rather enhancing the infrastructure that makes that work possible and productive.

The Toolmaker’s Path vs. The One-Language Approach

Although I’m not a hardcore Rustacean, these developments inevitably drew me toward Rust. I’ve started learning the language and examining the source code of several high-quality projects. The Rust+Python combination feels like what I’d call “the toolmaker’s way”—build your performance-critical infrastructure in Rust, then expose a friendly Python API for widespread adoption.

This approach contrasts with Julia’s strategy. Julia aims to solve the two-language problem with a single language—an elegant, theoretically more cohesive approach. It’s gaining momentum, particularly in academic and research settings. The syntax feels natural for mathematical expressions, and the ability to go from high-level abstractions to low-level optimizations within the same language is appealing.

Yet, for now, the Python+Rust combination offers something uniquely practical: it leverages Python’s vast ecosystem while strategically replacing performance bottlenecks with Rust components. This hybrid approach doesn’t require wholesale migration to a new language—you can adopt it incrementally, one tool at a time.

The Two-Language Problem

Python has long faced what’s known as the “two-language problem”: we love Python for its readability and extensive ecosystem, but when performance matters, we’ve traditionally had to drop down to C, C++, or Fortran. This creates a significant cognitive load—maintaining expertise in two languages and managing their boundaries.

For decades, this was just the cost of doing business in the Python world:

  • NumPy, pandas, and SciPy? C and Fortran under the hood.
  • spaCy for NLP? C++ doing the heavy lifting.
  • Want to speed up your code? Learn Cython or write C extensions.
  • Building ML frameworks? Better get comfortable with C++.
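A toy example makes the trade-off concrete: the vectorized NumPy call below runs inside compiled C code, while the pure-Python loop pays interpreter overhead on every element. The array size is arbitrary and the exact numbers will vary on your machine; it is only meant to illustrate why we keep reaching for compiled extensions.

```python
# Toy comparison: summing squares with a pure-Python loop vs. NumPy's compiled core.
import time

import numpy as np

values = np.random.rand(5_000_000)

start = time.perf_counter()
total_py = sum(v * v for v in values)     # interpreted loop, one Python object per element
py_seconds = time.perf_counter() - start

start = time.perf_counter()
total_np = float(np.dot(values, values))  # a single call into compiled C/BLAS code
np_seconds = time.perf_counter() - start

print(f"pure Python: {py_seconds:.2f}s  NumPy: {np_seconds:.4f}s")
```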

Another approach is PyPy, an alternative implementation of Python with a JIT compiler that can significantly speed up pure Python code. While PyPy offers impressive performance gains, it has compatibility challenges with certain C extensions.

These approaches worked, but they brought their own challenges: memory management headaches, segmentation faults, and build system complexity.

Enter Rust: The Game-Changer

Rust is addressing this problem in a uniquely compelling way. It offers C-like performance with memory safety guarantees and a modern developer experience. The transformation in my workflow over the past few years has been remarkable:

My Rust-Powered Python Toolkit

Hugging Face Tokenizers: My first encounter with Rust-powered Python. The performance difference was so dramatic it made me take notice of what Rust could offer.

Ruff: The Python linter that changed everything. Before Ruff, I was a huge fan of Black. What made me switch to Ruff wasn’t just its speed (though it is remarkably fast) but how it’s like Black on steroids. I love its extensive configuration options while still maintaining sensible defaults. Ruff’s comprehensive approach combines linting, formatting, and code quality checks in one tool, making it an essential part of my workflow.

Polars: A pandas-like DataFrame library that handles larger-than-memory datasets with ease. Operations that would bring pandas to its knees complete in seconds with Polars.
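As a sketch of the lazy, out-of-core style I mean (the file name and columns below are made up for illustration):

```python
# Hypothetical lazy query over a CSV that may not fit in memory
# ("events.csv" and the column names are placeholders).
import polars as pl

query = (
    pl.scan_csv("events.csv")                     # lazy scan: nothing is read yet
    .filter(pl.col("status") == "ok")
    .group_by("user_id")
    .agg(pl.col("latency_ms").mean().alias("avg_latency_ms"))
)

# streaming=True asks Polars to process the data in chunks
# instead of materializing the whole file at once.
result = query.collect(streaming=True)
print(result.head())
```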

uv: My recent switch from poetry+pyenv to uv has been transformative. What I appreciate most isn’t just the speed (though installations that took minutes now complete in seconds) but having one cohesive tool for dependency management. It’s not about those saved seconds—they’re nice but not critical. What matters is having a clean, reliable, and coherent tool for managing my Python environments. uv delivers this with sensible defaults and straightforward usage patterns, making dependency management feel like less of a chore.

Astral (creators of Ruff and uv) is now working on a new Rust-based static type checker for Python. This development is particularly interesting as type checking becomes increasingly important in the Python ecosystem. Meta’s Pyre (written in OCaml) offers excellent performance and precision, while Google’s pytype provides another robust option.

Astral’s approach focuses on minimizing false positives on untyped code, making it easier for projects to adopt typing gradually. This addresses key limitations of existing solutions, such as mypy’s performance issues and pyright’s dependency on Node.js.

Rediscovering the Joy of Coding with Zed

Perhaps most surprisingly, my editor itself is now Rust-powered. After years in PyCharm (which followed a long stint with Emacs), I’ve switched to Zed.

PyCharm served me well, but it became increasingly heavy and complex over time. While VS Code and PyCharm are excellent IDEs with comprehensive features, they can consume significant system resources. Despite having visual Git tools available, I often returned to the terminal for version control operations.

Zed brought back that “hacky feeling” I remembered from my Emacs days but with modern sensibilities and incredible performance. It’s a proper editor rather than a full IDE, which aligns perfectly with my workflow as someone who still appreciates command-line tools. The responsiveness creates an entirely different relationship with the code—there’s no waiting, just coding.

Why Good Developer Tools Matter for Data Scientists

The era when data scientists could get away with writing spaghetti-code POCs and MVPs is over—if it was ever truly acceptable. Structuring your code and maintaining a clean, reproducible development environment make your projects manageable and sustainable. We write code not just for computers but for our peers and our future selves!

Many of us in data science don’t come from computer science backgrounds. This lack of formal software engineering training is often used as an excuse: “OK, my code is messy, but it gets the job done.” What often distinguishes good data scientists from exceptional ones is their maturity in recognizing the need to follow software engineering best practices. The willingness to learn and apply these practices reflects a more profound understanding that sustainability matters as much as immediate results.

OOP principles, type hints, code formatters, and dependency management tools aren’t just software engineering niceties—they’re vital for sustainable data science work. This is why I’ve become increasingly invested in the quality of my development tools.

Looking Forward

We’re still in the early days of this transformation. Projects like Candle (Hugging Face’s Rust ML framework) suggest a future where more of our computational stack might benefit from Rust’s performance and safety guarantees.

For data scientists like myself, these tools provide immediate productivity boosts without requiring a complete reorientation of our workflow. Zed exemplifies this benefit—it’s not just a Rust showcase but a tool that makes me more productive while rekindling that “hacky feeling” I missed from my Emacs days.

If you haven’t explored these Rust-powered Python tools yet, I highly recommend giving them a try. Your development experience might never be the same again.
