You can’t and shouldn’t do Everything in Python

With all the buzz around AI and machine learning, lots of people have learned Python. Which is great, because Python is an excellent programming language for that: you have libraries like Pandas for statistical calculations, and loads of powerful NLP libraries.

However, since what people really want is to process data at scale and serve the results of that processing – there’s a lot to build around the Python programs themselves.

And here’s where it gets problematic: when inexperienced yet overconfident Python programmers decide that they can just do Everything in Python, and convince their non-technical managers that it’s possible. When it really, really isn’t.

Let’s start with the fact that Python is simply not good at data typing, data conversions, applying and maintaining a schema, or even copying a vector.

As an example, here's something that sounds easy enough but isn't:

Suppose you have a list of integers to process, and you want to put them in a vector. But you’re reading them from a text file. So first you have to convert the strings to integers, but to which kind? Python’s default integer? PyArrow’s int64? NumPy’s int32?

Each of these options can produce different results, depending on the library version and the system architecture your code is running on. For example: Python’s own int doesn’t overflow (it’s arbitrary-precision in Python 3, and even Python 2’s bounded int auto-promoted to long), but NumPy’s default integer type maps to the C long, which is 32 bits on Windows and 64 bits on most Linux systems. So if you were counting on 64-bit integers – you’ll get a silent overflow for large enough values on Windows.
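To make this concrete, here’s a minimal sketch (the input values are hypothetical) of how the same numbers stay exact as Python ints and as NumPy int64, but silently wrap around once they’re cast to int32:

```python
import numpy as np

# Values read from a text file (hypothetical input).
values = ["3000000000", "4000000000"]

# Python's own int is arbitrary-precision: always exact.
as_python = [int(v) for v in values]

# NumPy's fixed-width integers are not. int64 holds these values,
# but casting to int32 silently wraps around (no exception is raised).
as_int64 = np.array(as_python, dtype=np.int64)
as_int32 = as_int64.astype(np.int32)

print(as_python)  # [3000000000, 4000000000]
print(as_int64)   # [3000000000 4000000000]
print(as_int32)   # [-1294967296  -294967296]  <- silent overflow
```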

And if this didn’t deter you because you work with Python 3.x on Linux – in the Pandas documentation you’ll find that indexing can return either a view or a copy of your data, so copying or mutating values doesn’t always behave as expected and can silently fail to change anything at all. If you’ve seen SettingWithCopyWarning messages while building your code – that’s exactly what the warning refers to.
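Here’s a minimal sketch of the pattern that triggers the warning – chained indexing, where it’s ambiguous whether the assignment hits a view or a throwaway copy (the dataframe is hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Chained indexing: df[df["a"] > 1] may be a view or a copy, so this
# assignment may silently do nothing - pandas emits
# SettingWithCopyWarning for exactly this pattern.
df[df["a"] > 1]["b"] = 0

# The unambiguous spelling goes through a single .loc call:
df.loc[df["a"] > 1, "b"] = 0
```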

So if your plan is to load data as text into a Pandas dataframe and then convert columns to numbers – just don’t. Even if you manage to convert the data, every value that fails to parse becomes a “Not a Number” (NaN), the column is silently promoted to floats to hold it, and those NaNs then propagate through any mathematical operation you apply to the vectors...
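For illustration, a small sketch (with a hypothetical column) of how one unparseable value becomes a NaN that then rides along through the arithmetic:

```python
import pandas as pd

# One malformed value in an otherwise numeric text column.
df = pd.DataFrame({"x": ["1", "2", "oops", "4"]})

# errors="coerce" turns anything unparseable into NaN instead of
# raising, and the column silently becomes float64 to hold the NaN.
df["x"] = pd.to_numeric(df["x"], errors="coerce")

print(df["x"])      # 1.0, 2.0, NaN, 4.0 - note the dtype change
print(df["x"] + 1)  # the NaN propagates through elementwise arithmetic
```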

There’s a reason why Spark provides compile-time typed Datasets only in Java and Scala, but not in Python. Ingestion of data, typing and applying a schema should all be done in one of those strongly-typed languages, which run on a JVM and hence come far closer to “write once, run anywhere”: the same code producing the same results on every platform. Python does not guarantee this, and don’t let your software developers find this out by trying and failing to deliver even the ETL part of the system. Once they have clean data in the correct schema as output from the Java/Scala part of the pipeline – then they can pass it through the Python code.
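To be clear about what Python does give you here: PySpark lets you declare a schema, but it’s enforced only at runtime – there’s no compile-time-checked Dataset[T] as in Scala or Java. A sketch, assuming a local Spark installation and a hypothetical events.csv:

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, LongType, StringType

spark = SparkSession.builder.appName("schema-demo").getOrCreate()

# The schema is declared up front, but violations only surface when
# the job actually runs - as nulls or runtime errors, never as
# compile-time errors the way a Scala Dataset[T] would catch them.
schema = StructType([
    StructField("id", LongType(), nullable=False),
    StructField("name", StringType(), nullable=True),
])

df = spark.read.schema(schema).csv("events.csv")  # hypothetical path
df.printSchema()

spark.stop()
```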

And last but not least – when it comes to the web services around your data:

Do you want things like input validation, JSON support, schema-evolution support, connections to pretty much any storage type, CSRF protection, CORS, any type of authentication method, roles and permissions, seamless back-end/front-end integration – all out of the box? Do you want your developers to spin up a full-blown web application in minutes and then just have to configure it and add your business logic? Then you want Java’s Spring Boot. There’s no such thing in Python – or in any other programming language that I know of.

Bottom line – my strong advice is to always use best practices, even if there’s a learning curve for your developers. In the long run it will really save you a lot of time and money.

