Profiling a Containerized Python Service with Pyflame
[Image: a sample flame graph]

Profiling is one of the coolest techniques in the software engineering arsenal: it can be used to find CPU-demanding tasks, bottlenecks, and the places in a program where optimization effort should be focused. I sometimes find lines in my programs that do nothing important but consume a lot of CPU power, and occasionally I am able to improve performance just by changing the order of statements, using another data structure, or even eliminating unnecessary lines such as log messages. In one of those cases, I improved the performance of a program dramatically by disabling a compile-time flag, which removed from the program the log messages whose level was lower than the configured log level. This simple fact, which I had never even imagined, was revealed only after an hour of profiling, but it saved my day.

Profiling is often done in two phases: gathering statistics and analyzing them. There are many tools for gathering profiling statistics from programs in different languages, and some languages ship official ones. Some of these tools are hard to use and require instrumenting the code; others are really easy to use, letting you run your executable under them and get the information you need for analysis without touching the code. The analysis phase is almost always performed with some sort of visualization tool.

Default Profilers for Python

I often use the standard profilers for Python services. They are easy to use because they are shipped with Python and are completely compatible with it. Any experienced Python developer has probably heard of them or used them. You can find more information about them here, and I recommend using them for any Python program, even if you are satisfied with its performance. Just postpone profiling to the end of the development process so that you are not trapped by premature optimization.
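For completeness, here is a minimal sketch of how the built-in cProfile and pstats modules can be driven from Python code; the main() function below is only a placeholder for whatever entry point your program has:

import cProfile
import pstats

def main():
    # placeholder for the real work of your program
    return sum(i * i for i in range(1_000_000))

# run main() under the profiler and dump the raw statistics to a file
cProfile.run("main()", filename="profile.out")

# load the statistics and print the 10 functions with the highest cumulative time
stats = pstats.Stats("profile.out")
stats.sort_stats("cumulative").print_stats(10)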

The cProfile and profile tools generate profiling statistics, but they are not good at visualization. They dump statistics into the terminal, and it is not straightforward to find bottlenecks that way. There are other tools that can help visualize the output of the Python profilers, and I found them helpful. Personally, I use this technique to visualize profiler output, and it has worked perfectly for me. It uses kcachegrind for visualization, which I am familiar with after using Valgrind for C/C++ debugging and profiling for years.
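The linked technique is not reproduced here; as one possible illustration (my assumption, not necessarily the exact tool referenced above), the pyprof2calltree package converts cProfile output into a format that kcachegrind can open:

$ pip install pyprof2calltree
$ pyprof2calltree -i profile.out -k

The -i option selects the cProfile output file, and -k opens the converted data directly in kcachegrind.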

Profiling a Containerized Python Service

My problems with the classic Python profilers began when I started containerizing Python services with Docker. The problems were:

  1. How to generate profiling statistics inside a container?
  2. How to transfer them to my desktop for visualization and analysis?

The first problem seemed simple at first glance, because any container with Python installed has the standard profilers ready to use. The only change I had to make was to the container's run command, starting the profiler instead of the regular Python interpreter. I used something like this as the run command of the container:

$ python -m cProfile -o /root/profiler/output -s time my_service.py

I found it a little awkward to switch between regular execution and profiler execution, and I had to maintain two separate docker-compose files to ease the switching.

The second problem seemed even easier, because you can use a volume mount to collect the profiling statistics. I mounted a volume inside the container, let the profiler write its output file there, and then used that file to analyze the service. The volume mount was something like this:

volumes:
 - /path/to/profiler/output:/root/profiler
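As a rough sketch, the profiling variant could live in a docker-compose override file like the one below; the service name my_service and the paths are hypothetical, since the article does not show the actual files:

# docker-compose.profile.yml (hypothetical override file)
version: "3"
services:
  my_service:
    command: python -m cProfile -o /root/profiler/output -s time my_service.py
    volumes:
      - /path/to/profiler/output:/root/profiler

Starting the service with this override applied on top of the base file would then look like:

$ docker-compose -f docker-compose.yml -f docker-compose.profile.yml up -d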

I succeeded in solving both problems after some trial and error, but the solution did not satisfy me: it sometimes failed for no apparent reason, and it was hard to maintain. I found running a profiler inside containers fragile.

Profiling with Pyflame

Recently I found another way to profile a containerized Python service, and fortunately, this time it has neither of the two problems mentioned above. The new solution is so easy and nice that I would like to share it. I found a profiler, named Pyflame, that can attach to a process inside a container. The profiler was written by Uber engineers and the source code is here. It gathers data from any process executing Python, and its output is compatible with FlameGraph, another cool visualization tool for profiling data. Pyflame is introduced by its creators this way:

Pyflame is a high performance profiling tool that generates flame graphs for Python. Pyflame is implemented in C++, and uses the Linux ptrace(2) system call to collect profiling information. It can take snapshots of the Python call stack without explicit instrumentation, meaning you can profile a program without modifying its source code. Pyflame is capable of profiling embedded Python interpreters like uWSGI. It fully supports profiling multi-threaded Python programs.
Pyflame usually introduces significantly less overhead than the builtin profile (or cProfile) modules, and emits richer profiling data. The profiling overhead is low enough that you can use it to profile live processes in production.

As you can see, Pyflame is exactly what I need, and profiling with it is easy: start your containerized service with no modification, find the PID of the process running the Python service inside the container, attach Pyflame to it to gather data, and pump the data into FlameGraph for visualization. Here are the detailed steps that I took:

  1. Install Pyflame (described here) and get FlameGraph from GitHub. I did not install FlameGraph and used it directly.

  2. Run the containerized Python service. I use docker-compose to start my multi-service application, so for me it is something like this:

$ docker-compose up -d

  3. Find the container Id of the Python service. Be careful if the application is multi-service:

$ docker ps

  4. Find the PID (process Id) of the containerized Python service using the container Id. Be careful and check the CMD column to find the right process (see also the alternative shown after this list). In my case, it is like this:

$ docker top <container_id>

  5. Attach Pyflame to the containerized Python service using the PID and wait for it to complete. I combined this step with the next one in a single command. The pyflame command can be customized with some command-line arguments that are introduced later.

  6. Create an SVG file with FlameGraph. I piped the output of the previous command directly into it:

$ sudo pyflame -p <PID> | ./flamegraph.pl > my_service_profile.svg

  7. Open the SVG file (I used Firefox) and explore the flame graph. It is interactive, and you can easily zoom in and out to see which parts of your service consume the most CPU. The picture above is a snapshot of such an SVG. That is all!
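As an aside for step 4, if the Python service is the main process of its container, you can also ask Docker for its PID directly instead of scanning the docker top output (this assumes the service is the container's entry process):

$ docker inspect --format '{{.State.Pid}}' <container_id>

This prints the PID as seen from the host, which is what Pyflame needs when it is run on the host.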

Final Notes

The pyflame command has two useful arguments that I often use. You can change the sample time with -s and the sampling interval with -r; both are measured in seconds. The sample time is the duration over which your service is inspected, and the sampling interval determines how often a snapshot of your service is taken. The default behavior is to sample for 1 second (equivalent to -s 1), taking a snapshot every millisecond (equivalent to -r 0.001). I used something like this to profile my service for 10 minutes with 1-second snapshots:

$ sudo pyflame -s 600 -r 1 -p <PID> | ./flamegraph.pl > my_service.svg

The sample time and sampling interval should be chosen based on your service and its load. Use a sample time large enough to capture all of your service's behavior, and test the service with a realistic load during profiling; how closely the test load resembles the actual load determines the quality of the profile. Use a sampling interval small enough that the random behavior of your service is averaged out by taking many samples. Infrequent sampling over a short sample time can produce misleading profiling data.
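For example, for a short-lived or bursty workload you might prefer a shorter run with a much finer sampling interval; the numbers below are only an illustration, not a recommendation:

$ sudo pyflame -s 60 -r 0.01 -p <PID> | ./flamegraph.pl > my_service_1min.svg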

My application changes its behavior after a while, so I profiled it and stored the SVG files regularly to see how the run-time behavior of the service changes over time. Remember that profiling is a repetitive task: you have to do it again after every improvement to see its effect and to find the new bottlenecks that were hidden behind the previous ones.



