Python Optimization Basics 1 - Profiling

Knowing when and where to start

This is part 1 in a series of articles about my experiences with, and lessons learned from, optimizing Python code.

See the Table of Contents

The git repository for code samples for this and the following articles can be found here

In this series I shall be going through how I approach (and have approached) various major performance improvements in the codebases I work on, and how you should go about thinking through these problems yourself.

tldr: it's all about first principles

I don't think I need to outline what optimization is, but why you need it is important. In general, there are two main reasons to optimize code:

  1. Speeding up an extremely slow bottleneck
  2. Speeding up a compounding controllable operation

The first reason is fairly obvious, and it is the one we usually mean when speaking about optimization. Some function or piece of code is really slow: slow enough to be noticeable, to cause a programmer to sit around waiting for those lines to complete. You roll up your sleeves, grit your teeth and dive in.

The second point, however, is more delicate. First, let me explain what it means: it is a set of operations (maybe a function) that is easily controllable, in that it does not depend on variable factors like network speeds or database read/write speeds; these operations depend primarily on the CPU. Observed in isolation, they also seem rather fast, maybe 500 ms or even 1 second (I am being very liberal with what counts as fast here). When repeated n times a day, however, these operations compound and cause delays that in certain cases exceed even the slow bottlenecks.

The dangerous thing about these slowdowns is that their impact depends on a couple of things. First, the programmer's threshold and propensity for what counts as "slow", which depends a lot on their experience and their knowledge of the underlying operations of their language; the definition of slow is not the same between two programmers of differing expertise. Second, the frequency of these operations: running an operation even 10 times a day is fine, but what about 100, or 1,000?

The Problem of Compounding Delays

I'll illustrate this issue further with an example. Take a small function that parses a result; let's call it normal_op_parse(). This function takes 5 seconds for a list of size 10,000. Now, this is a contrived example: it really does not make sense for a parsing function to take that long in everyday cases, and 5 seconds is quite slow for a list of that small a size. Since I only mentioned that it takes 5 seconds, you probably did not think much of it, but the problem creates hidden catastrophes depending on two factors:

  • The size of the input list
  • How frequently the function runs

We don't know how this function would scale with the size of the list, and while we can reason about its time complexity, that often isn't very helpful if you are trying to get exact measurements (something I will not get into here).

The real issue lies in how frequently the function runs. Say this function runs in 10 cron jobs, and each of those cron jobs runs every hour of every day. You can probably guess where I am going with this, but in case you haven't:

10 (cron jobs) * 24 (runs per day) * 5 (seconds per run)
= 1200 seconds, or 20 minutes

I did use a made-up function, but these numbers and this scenario are based on experience. To go a step further, you can also assume that the more time an operation takes, the longer other jobs in the pipeline sit idling, waiting for it to complete.

In the most common case, you quite often see programmers and teams scaling up their machines, incurring unnecessary expenses, and sometimes even throwing in the threading and multiprocessing libraries to overcomplicate a simple operation. Those libraries are not easy or simple to add correctly, even before we consider how much they impact the maintainability and testability of the program.

In the worst case, they build up very complicated architectures that, more often than not, are doomed to fail and take months to set up, all under an illusion of productivity. In most cases none of that is necessary, something I shall try to show in future articles. That's not to say there isn't any use for such architectures; there always is. It's just way ... way further along the road than you think.

For now, let's proceed to how you go about determining any of this.

Profiling

Judging performance by how long a sequence of print statements takes to appear, or by how long it takes to step from one function to the next while debugging, is a pretty bad idea. It is, however, a start, and it is quite often how you first become aware of certain issues in your codebase. This is when you would take out a profiler: you need to measure exactly what is slow, and how long it takes. Ideally, profiling is a practice one adopts as a healthy habit whenever working on a section of a codebase.

Luckily, with Python all of this is fairly easy. What I mean is that you can throw a few start and end markers into your code and get a detailed, structured list of the slowest operations between those markers, how long they take, and how many times they are called. And when I say slowest operations, I mean that you get each individual operation inside even the most nested of functions. I hope you can see why this is so powerful (and cool).
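
As a quick taste, you don't even need to touch the code: cProfile ships with Python and can profile a whole script from the command line (main.py here is a hypothetical stand-in for your entry point):

python -m cProfile -s cumulative main.py

Profiling an entire program this way tends to bury the part you care about, though, so below we'll place markers around just the code we want to measure.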

Examples

To illustrate this, I've set up a basic test program that just builds a list; to keep things simple, I have added an arbitrary delay to slow things down. It's not a good example by any means, but it should get the point across.

This is the piece of code we shall be profiling:

list_size = 100
list_data = []
for i in range(list_size):
    if i % 2 == 0:
        num = add_operation(i)       # defined somewhere
    else:
        num = subtract_operation(i)  # defined somewhere

    list_data.append(num)

Let's assume we know that something is causing delays in this example. The simplest thing we can do here to get an idea of how long this operation takes is to set up some markers for the start and end times. Let's do that and run the program.

from datetime import datetime

# profiling: start marker
start_time = datetime.now()
print("======= Log Start ======= ")
print(f"Start time: {start_time}")

list_size = 100
list_data = []
for i in range(list_size):
    if i % 2 == 0:
        num = add_operation(i)       # defined somewhere
    else:
        num = subtract_operation(i)  # defined somewhere

    list_data.append(num)

# profiling: end marker
end_time = datetime.now()
print(f"End time: {end_time}")
print(f"Duration: {end_time - start_time}")
print("======= Log End ======= ")

Running this, we get the following output:

======= Log Start ======= 
Start time: 2024-05-12 17:14:25.631500
End time: 2024-05-12 17:14:30.653710
Duration: 0:00:05.022210
======= Log End ======= 
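
As an aside, datetime.now() is fine for rough numbers, but the standard time module's perf_counter() is the clock designed specifically for measuring intervals; a minimal sketch of the same markers using it:

import time

start = time.perf_counter()
# ... the code being measured ...
duration = time.perf_counter() - start
print(f"Duration: {duration:.3f} seconds")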

Now we have a log, and we also have a duration, so we have some idea of how long it's taking. Python has a few built-in modules that make this measuring process much better. I'm going to improve upon this logging and introduce what I think is a pretty cool feature.

I'll use cProfile, a built-in profiling module in Python, alongside pstats, a module that lets us parse cProfile results and display them (amongst a variety of other options). Let's replace the markers in the previous code with something different: a simple class for maintaining a timer. We are doing this so we can call timer.start() and timer.end() every time we want to measure a piece of code. This could be done without creating the class; the benefit is that you end up with a reusable piece of code that can be added to your own codebase.

import cProfile
import pstats
from pstats import SortKey


class CustomTimer:

    def __init__(self):
        # create the profiler
        self.pr = cProfile.Profile()

    def start(self):
        # start collecting profiling data
        self.pr.enable()

    def end(self):
        # stop collecting profiling data
        self.pr.disable()

        # define the sorting criteria for the results:
        # sort by the time spent in a function and all of its
        # subfunctions, from invocation until exit
        sortby = SortKey.CUMULATIVE

        # create a pstats.Stats object from the profiler instance
        # using the sorting criteria
        ps = pstats.Stats(self.pr).sort_stats(sortby)

        # and print the 10 slowest operations to the console
        ps.print_stats(10)

You can (and should) read through the official Python documentation on this; get started at https://docs.python.org/3/library/profile.html
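
As a side note: since Python 3.8, cProfile.Profile can also be used as a context manager, so if you prefer scoping over explicit start and end calls, an equivalent sketch looks like this (run_workload is a hypothetical stand-in for the code under measurement):

import cProfile
import pstats
from pstats import SortKey

with cProfile.Profile() as pr:
    run_workload()  # profiling is active only inside this block

# print the 10 slowest operations, sorted by cumulative time
pstats.Stats(pr).sort_stats(SortKey.CUMULATIVE).print_stats(10)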

Now that we have created this, we can update our main program and run it.

timer = CustomTimer()

list_size = 100
list_data = []

# timer: start marker
timer.start()

for i in range(list_size):
    if i % 2 == 0:
        num = add_operation(i)       # defined somewhere
    else:
        num = subtract_operation(i)  # defined somewhere

    list_data.append(num)

# timer: end marker
timer.end()

Running this, you'll be pleasantly surprised with a detailed output of exactly what we need.
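
If you have never seen cProfile output before, the table's header row looks like this, with one row beneath it for every function called while the profiler was active:

ncalls  tottime  percall  cumtime  percall filename:lineno(function)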

The first line returns the total execution time, and the table can be read as follows:

  • ncalls: the number of calls made to each listed operation
  • tottime: the total time spent in the function itself (excluding any subfunctions)
  • percall: the time taken per call (it appears twice: tottime divided by ncalls, then cumtime divided by ncalls)
  • cumtime: the total time spent inside a function (including all subfunctions)

You need only consider cumtime and ncalls in case you get overwhelmed by this info.

From here, you'll see that all of the slowdown is coming from subtract_operation, and within that, from the time.sleep function. You can also see how many calls to each function were made, and how much time each call took.

There is a reason I did not show the subtract_operation and add_operation functions. The point is that quite often you'll be better served by just throwing in profiling markers and reading the results objectively, rather than going through your code trying to identify and judge each line. This becomes especially crucial when the program is large and complicated; at that point it's highly likely that you'll feel overwhelmed and won't really have enough time to go through everything manually. This is also why I find profiling so exciting: it gives a relatively painless, structured log of each and every function call inside your code. That log can at times be difficult to read due to its granularity, but that is something you will eventually learn to deal with.
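
One trick for taming that granularity: print_stats accepts restrictions, so you can filter the report down to just the functions you care about. For example, with the ps object from the CustomTimer class above (the "subtract" pattern is just an illustrative regex):

# keep only rows whose filename:lineno(function) matches the regex,
# then cap the output at the 10 slowest of those
ps.print_stats("subtract", 10)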

Conclusion

Now that you know about profiling, you also know why it is so important and how you can very easily extract a lot of information from even the most obscure codebase. In the next articles in this series I plan to go through different kinds of optimization scenarios I have come across, and none of them will have anything to do with the threading or multiprocessing libraries. I have found that quite often you can find creative but very simple ways of dealing with the majority of issues.

To leave you with a taste of what follows, it all depends on two factors:

  • your understanding (or lack) of how your language works
  • the data your program is working with
