How Python optimizes your code
Python is a high-level language that does a lot of work under the hood to make life easier for its users. Knowing these optimizations helps us take full advantage of them, and it also satisfies the curiosity of anyone who wants to dig deeper into the mechanics of Python.
Interning
This is Python's way of not reinventing the wheel. It lets Python reuse existing objects instead of creating new ones every time.
For example, Python caches (pre-loads) the integers in the range [-5, 256]. Because these small integers are used so often, Python uses the cached object each time instead of creating a new one from scratch.
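A minimal sketch of such a check, assuming the CPython interpreter (where id() returns the object's memory address). The names x and y and the value 1000 are only illustrative additions, and the addresses printed will differ from run to run:

```python
# Two names bound to the same small integer
a = 10
b = 10

# On CPython, id() is the memory address of the object
print(hex(id(a)), hex(id(b)))

# Illustrative contrast: 1000 lies outside [-5, 256], so integers built
# at runtime are separate objects on CPython
x = int("1000")
y = int("1000")
print(x is y)
```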
Output: 0x7ffebc48adf0 0x7ffebc48adf0
We can see that although 'a' and 'b' are different variables with the same value 10, the address of the object they point to in memory is also the same. Python avoided putting 10 in a new memory location and pointing 'b' to it.
String interning
Strings, like tuples, are immutable objects: they cannot be altered once created. Python takes advantage of this by caching short, simple strings, such as those that look like identifiers.
For example,
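A minimal sketch, assuming CPython; the exact addresses will differ from run to run:

```python
# Two names bound to the same short string literal
a = "hello"
b = "hello"

# On CPython both names end up pointing at one cached string object
print(hex(id(a)), hex(id(b)))
```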
Output: 0x15ec0041db0 0x15ec0041db0
As we can see, Python created the string 'hello' once and put it in memory, and the second time it reused the cached version. Note that Python doesn't pre-load all strings; this is more a case of reusing them on the fly.
Forceful string interning
Using the sys module, we can forcefully intern a string with sys.intern, meaning that every interned instance of that string points to the same memory address.
Let's see an example,
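A minimal sketch, assuming CPython. The string contents and the variable names are illustrative; the strings are built at runtime so that they are not interned automatically:

```python
import sys

parts = ["hello", " ", "world"]

# Built at runtime: two equal strings, but separate objects on CPython
a = "".join(parts)
b = "".join(parts)

# sys.intern returns the single shared copy for equal strings
c = sys.intern(a)
d = sys.intern(b)

print(hex(id(a)), hex(id(b)), hex(id(c)), hex(id(d)))
```

The first two addresses differ, while the last two are the same, because both interned names refer to one object.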
Output: 0x15ec004dcf0 0x15ec004da70 0x15ec004d970 0x15ec004d970
Forceful string interning is useful when we need to do many string comparisons and the same strings occur repeatedly. It can help in many NLP problems, where the same tokens appear again and again.
For comparison's sake, I compared two strings, once without interning and once with it. The speedup is substantial for long strings: when two distinct string objects are compared with the == operator, Python may have to match them character by character, which is tedious for very long strings. After interning, we can use the is operator to compare just the addresses of the two strings instead of comparing them character by character.
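A rough sketch of how such a comparison could be timed, assuming CPython. The string length, repetition count and variable names are arbitrary choices for illustration, not the setup used for the original measurement, and the timings depend on the machine:

```python
import sys
import timeit

n = 1_000_000      # length of the test strings (arbitrary)
a = "a" * n        # built at runtime, so a and b are distinct objects
b = "a" * n

# Equality on two distinct but equal strings has to compare their contents
eq_time = timeit.timeit(lambda: a == b, number=200)

# After interning, both names refer to the same object
ia = sys.intern(a)
ib = sys.intern(b)

# Identity only compares the two addresses
is_time = timeit.timeit(lambda: ia is ib, number=200)

print(f"==: {eq_time:.6f}s   is: {is_time:.6f}s")
```

The is comparison is only a safe replacement for == when both strings have been interned; for strings that are not interned, is can return False even when the contents are equal.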
Peephole optimization
As the name suggests, Python looks at the code through a peephole and identifies the numerical expressions, strings and tuples whose values can be worked out ahead of time. The results of such constant expressions are pre-calculated at compile time and stored in the bytecode before execution.
Let's take an example to better understand,
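A minimal sketch, assuming CPython. The function name my_func is hypothetical, and the exact contents and order of the co_consts tuple vary between Python versions:

```python
def my_func():
    a = 5 * 6               # numeric expression made of constants
    b = "quick test" * 10   # constant string repeated
    c = (4, 5) * 5          # constant tuple repeated
    d = [1, 3] * 6          # list: mutable, so not pre-calculated


# Constants stored in the compiled function body
consts = my_func.__code__.co_consts
print(consts)
```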
Output: (None, 30, 'quick testquick testquick testquick testquick testquick testquick testquick testquick testquick test', (4, 5, 4, 5, 4, 5, 4, 5, 4, 5), 1, 3, 6)
co_consts, available on any function's __code__ object, returns a tuple of the constants that occur inside the function body. The optimization applied to each variable:
- a - instead of storing the individual literals and calculating the result at runtime, Python pre-calculates the result of the expression
- b - strings are immutable objects, so Python stores the result of the string multiplication
- c - tuples are immutable too, so the result is pre-calculated
- d - a list is a mutable object, so Python does not store the result of the expression and instead stores the individual literals being used
Membership tests
Python also speeds up membership tests against literal containers by converting mutable literals to their immutable versions. A list or set written inline in a membership test would otherwise have to be rebuilt every time the test runs. Because the immutable version can never change, Python can build it once at compile time and store it as a single constant.
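A minimal sketch, assuming CPython. The function name membership and its parameter x are hypothetical, and the exact contents of the co_consts tuple vary between Python versions:

```python
def membership(x):
    in_set = x in {1, 2, 3, 4, 5}     # set literal in a membership test
    in_list = x in [6, 7, 8, 9, 10]   # list literal in a membership test
    return in_set, in_list


# Constants stored in the compiled function body
consts = membership.__code__.co_consts
print(consts)
```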
Output: (None, frozenset({1, 2, 3, 4, 5}), (6, 7, 8, 9, 10))
We can see that the set literal was converted to a frozenset and the list literal to a tuple.
To conclude, we covered several optimizations that Python performs under the hood to speed up our code. A better understanding of these will help us not only make full use of them but also improve our coding skills.