Profiling Python Code


Profiling is a technique to figure out how time is spent in a program. With these statistics, we can find the “hot spot” of a program and think about ways of improvement. Sometimes, a hot spot in an unexpected location may hint at a bug in the program as well.

In this tutorial, we will see how we can use the profiling facility in Python. Specifically, you will see:

  • How we can compare small code fragments using the timeit module
  • How we can profile the entire program using the cProfile module
  • How we can invoke a profiler inside an existing program
  • What the profiler cannot do

Let’s get started.

Profiling Python Code. Photo by Prashant Saini. Some rights reserved.

Tutorial Overview

This tutorial is in four parts; they are:

  • Profiling small fragments
  • The profile module
  • Using profiler inside code
  • Caveats

Profiling small fragments

When you are asked about the different ways of doing the same thing in Python, one perspective is to check which one is more efficient. In Python’s standard library, we have the timeit module that allows us to do some simple profiling.

For example, to concatenate many short strings, we can use the join() function from strings or use the + operator. So how do we know which is faster? Consider the following Python code:

This will produce a long string 012345.... in the variable longstr. An alternative way to write this is:
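The two listings referred to here are not reproduced in this text. A minimal sketch that is consistent with the description, assuming the short strings come from the integers 0 to 999 (the range and the second variable name are assumptions), might look like this:

```python
# Approach 1: build the string with the + operator inside a loop
longstr = ""
for x in range(1000):
    longstr += str(x)

# Approach 2: build the same string with str.join()
# (stored under a different, hypothetical name so the two can be compared)
longstr_joined = "".join([str(x) for x in range(1000)])

print(longstr[:10])               # first few characters of the result
print(longstr == longstr_joined)  # both approaches build the same string
```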

To compare the two, we can do the following on the command line:

These two commands will produce the following output:

The above commands load the timeit module and pass on a single line of code for measurement. In the first case, we have two lines of statements, and they are passed on to the timeit module as two separate arguments. By the same rationale, the first command can also be presented as three lines of statements (by breaking the for loop into two lines), but the indentation of each line needs to be quoted correctly:

The output of timeit reports the best performance among multiple runs (5 by default). Each run executes the provided statements a number of times (which is dynamically determined). The time is reported as the average time to execute the statements once in the best run.
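To make the “best of 5 runs, average per loop” arithmetic concrete, the following sketch reproduces it with the timeit Python API. The statement being timed is an illustrative choice, not taken from the text, and the printed line only imitates the general shape of the report:

```python
import timeit

# Statement to measure (an illustrative choice)
timer = timeit.Timer('"".join([str(x) for x in range(1000)])')

# autorange() increases the loop count until one run takes at least 0.2 seconds
loops, _ = timer.autorange()

# Five runs, each executing the statement `loops` times;
# the result is a list of total seconds, one entry per run
totals = timer.repeat(repeat=5, number=loops)

# Take the best (shortest) run and convert it to the time of a single execution
best_per_loop = min(totals) / loops
print(f"{loops} loops, best of 5: {best_per_loop * 1e6:.1f} usec per loop")
```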

While it is true that the join function is faster than the + operator for string concatenation, the timing above is not a fair comparison. It is because we use str(x) to make short strings on the fly during the loop. The better way to do it is the following:
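The original commands are not shown in this text. The same idea can be sketched with the timeit Python API, where the setup argument holds code that runs before the measurement and is not timed. The 1000 strings and the loop count below are assumptions:

```python
import timeit

# Setup is executed before timing starts: prepare the short strings first
setup = "strings = [str(x) for x in range(1000)]"

# Statement 1: concatenate with the + operator
plus_stmt = """
longstr = ""
for s in strings:
    longstr += s
"""

# Statement 2: concatenate with str.join()
join_stmt = 'longstr = "".join(strings)'

n = 2000  # executions per measurement (arbitrary choice)
plus_time = timeit.timeit(plus_stmt, setup=setup, number=n)
join_time = timeit.timeit(join_stmt, setup=setup, number=n)

print(f"+ operator: {plus_time / n * 1e6:.2f} usec per loop")
print(f"join():     {join_time / n * 1e6:.2f} usec per loop")
```

Because the list of strings is built in the setup code, only the concatenation itself contributes to the two figures.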

which produces:

The -s option allows us to provide the “setup” code, which is executed before the profiling and not timed. In the above, we create the list of short strings before we start the loop. Hence the time to create those strings is not measured in the “per loop” timing. From the above, we see that the join() function is two orders of magnitude faster than the + operator. The more common use of the -s option is to import libraries. For example, we can compare the square root function from Python’s math module, from numpy, and using the exponential operator ** as follows:
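The commands for this comparison are not reproduced here. A rough equivalent with the timeit Python API is sketched below, with each import placed in the setup code so that it is not timed. The input value 1234 and the loop count are assumptions, and the numpy case is skipped when numpy is not installed:

```python
import importlib.util
import timeit

n = 100_000  # executions per measurement (arbitrary choice)

# Each entry is (setup code, statement to time)
candidates = {
    "math.sqrt": ("import math; x = 1234", "math.sqrt(x)"),
    "** 0.5": ("x = 1234", "x ** 0.5"),
}
if importlib.util.find_spec("numpy") is not None:
    candidates["numpy.sqrt"] = ("import numpy; x = 1234", "numpy.sqrt(x)")

results = {}
for label, (setup, stmt) in candidates.items():
    # Average seconds per execution of the statement
    results[label] = timeit.timeit(stmt, setup=setup, number=n) / n
    print(f"{label:12s} {results[label] * 1e9:8.1f} nsec per loop")
```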

The above produces the following measurement, in which we see that math.sqrt() is fastest while numpy.sqrt() is slowest in this particular example:

If you wonder why numpy is slowest, it is because numpy is optimized for arrays. You will see its exceptional speed in the following alternative:
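The original command is not shown in this text. One way to illustrate the point is to time a single numpy.sqrt() call on a whole array against a Python loop that calls math.sqrt() once per element. The array size of 10000 and the loop count are assumptions, and the numpy part is skipped when numpy is not installed:

```python
import importlib.util
import timeit

n = 200  # executions per measurement (arbitrary choice)

# Pure Python: call math.sqrt() once per element
loop_time = timeit.timeit(
    "[math.sqrt(x) for x in range(10000)]",
    setup="import math",
    number=n,
) / n
print(f"math.sqrt in a loop: {loop_time * 1e6:.1f} usec per loop")

# numpy: one call that processes the whole array
array_time = None
if importlib.util.find_spec("numpy") is not None:
    array_time = timeit.timeit(
        "numpy.sqrt(arr)",
        setup="import numpy; arr = numpy.arange(10000)",
        number=n,
    ) / n
    print(f"numpy.sqrt on array: {array_time * 1e6:.1f} usec per loop")
```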

where the result is:

If you prefer, you can also run timeit in Python code. For example, the following will be similar to the above, but give you the raw total timing for each run:
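The original listing is not reproduced here. A minimal sketch using timeit.repeat() is given below; the statement being timed is an illustrative choice, so the numbers it prints will not match the figures quoted in this tutorial:

```python
import timeit

# Five runs, each executing the statement 10000 times
measurements = timeit.repeat(
    "[x ** 0.5 for x in range(100)]",
    number=10000,
    repeat=5,
)
print(measurements)  # raw total seconds, one entry per run

# Dividing the best run by the loop count gives the "per loop" figure
print(f"best: {min(measurements) / 10000 * 1e6:.2f} usec per loop")
```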

In the above, each run executes the statement 10000 times; the result is as follows, from which you can see a result of roughly 98 usec per loop in the best run:

The profile module

Focusing on a statement or two for performance is a microscopic perspective. Chances are, we have a long program and want to see what is causing it to run slow. That happens before we can consider alternative statements or algorithms.

A program running slow can generally be due to two reasons: a part is running slow, or a part is running too many times and that adds up to take too much time. We call these “performance hogs” the hot spot. Let’s look at an example. Consider the following program that uses a hill climbing algorithm to find hyperparameters for a perceptron model:

Assuming we saved this program in a file, we can run the profiler on the command line as follows:
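The command itself is not reproduced in this text. The standard library profiler can be launched as python -m cProfile followed by the script name. A rough equivalent from within Python is sketched below. It uses a toy stand-in workload instead of the perceptron program: the function names mirror those discussed in this tutorial, but the bodies are hypothetical:

```python
import cProfile
import random

def objective(x):
    # Toy stand-in for evaluating a model: a simple quadratic score
    return -(x - 3.0) ** 2

def hillclimbing(n_iter=100, step_size=0.1):
    # Start from a random point and keep any candidate that is not worse
    random.seed(1)
    solution = random.uniform(-5, 5)
    best = objective(solution)
    for _ in range(n_iter):
        candidate = solution + random.gauss(0, step_size)
        score = objective(candidate)
        if score >= best:
            solution, best = candidate, score
    return solution, best

# Profile a single call and print the statistics table to the screen
profiler = cProfile.Profile()
result = profiler.runcall(hillclimbing)
profiler.print_stats()
```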

and the output will be the following:

The normal output of the program will be printed first, and then the profiler’s statistics will be printed. From the first row, we see that the function objective() in our program has run 101 times and took a total of 4.89 seconds. But these 4.89 seconds are mostly spent on the functions it called; the total time spent in that function itself is merely 0.001 second. The functions from dependent modules are also profiled. Hence you see a lot of numpy functions above too.

The above output is long and may not be useful to you, as it can be difficult to tell which function is the hot spot. Indeed, we can sort the above output. For example, to see which function is called the most number of times, we can sort by ncalls:
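On the command line, the sort key is passed to cProfile with the -s option (for example, -s ncalls). The same sorting can be sketched in Python with the pstats module. The workload below is a hypothetical toy function, chosen so that dict lookups dominate the call count:

```python
import cProfile
import io
import pstats

def work():
    # Toy workload: many cheap dict lookups and a few sorts
    table = {i: i * i for i in range(1000)}
    total = 0
    for i in range(20000):
        total += table.get(i % 1000, 0)
    for _ in range(5):
        sorted(table.values(), reverse=True)
    return total

profiler = cProfile.Profile()
profiler.runcall(work)

# Collect the report in a string, sorted by number of calls, top 5 rows only
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("ncalls").print_stats(5)
report = buffer.getvalue()
print(report)
```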

Its output is as follows, which says the get() function from a Python dict is the most used function (but it only consumed 0.03 seconds in total out of the 5.6 seconds to finish the program):

The other sort options are as follows:

Sort string   Meaning
calls         Call count
cumulative    Cumulative time
cumtime       Cumulative time
file          File name
filename      File name
module        File name
ncalls        Call count
pcalls        Primitive call count
line          Line number
name          Function name
nfl           Name/file/line
stdname       Standard name
time          Internal time
tottime       Internal time

If the program takes some time to finish, it is not reasonable to run the program many times just to find the profiling result in a different sort order. Indeed, we can save the profiler’s statistics for further processing, as follows:

Similar to above, it will run the program. But this will not print the statistics to the screen but save them into a file. Afterwards, we can use the pstats module like the following to open up the statistics file and provide us a prompt to manipulate the data:
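On the command line, cProfile writes the statistics to a file when given the -o option, and python -m pstats followed by that file name opens the interactive browser. The same save-then-analyze workflow can be sketched without the prompt, as below. The workload and the file name are hypothetical:

```python
import cProfile
import os
import pstats
import tempfile

def work():
    # Toy workload standing in for a long-running program
    return sum(len(str(i)) for i in range(50000))

# Run once and save the raw statistics to a file instead of printing them
stats_path = os.path.join(tempfile.mkdtemp(), "work.stats")  # hypothetical name
profiler = cProfile.Profile()
profiler.runcall(work)
profiler.dump_stats(stats_path)

# Later, possibly in another session: load the file and view it
# in different sort orders without running the program again
stats = pstats.Stats(stats_path)
stats.sort_stats("cumulative").print_stats(5)
stats.sort_stats("tottime").print_stats(5)
```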

For example, we can use the sort command to change the sort order and use stats to print what we saw above:

You will notice that the stats command above allows us to provide an extra argument. The argument can be a regular expression to search for the functions such that only those matched will be printed. Hence it is a way to provide a search string to filter.
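The same kind of filter is available outside the interactive browser: a string passed to print_stats() is treated as a regular expression and matched against the file:line(function) text of each row. A small sketch, with hypothetical function names and input data:

```python
import cProfile
import io
import pstats
import re

def parse(line):
    # Extract all integers from a line of text
    return [int(tok) for tok in re.findall(r"\d+", line)]

def summarize(lines):
    # Add up every integer found in every line
    return sum(sum(parse(line)) for line in lines)

profiler = cProfile.Profile()
total = profiler.runcall(summarize, ["1 2 3", "40 50", "600"] * 1000)

buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer).sort_stats("tottime")
# Only rows whose function name is parse or summarize are printed
stats.print_stats(r"\((parse|summarize)\)")
report = buffer.getvalue()
print(report)
```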

This pstats browser allows us to see more than just the table above. The callers and callees commands show us which function calls which function, how many times it is called, and how much time it spent. Hence we can consider that as a breakdown of the function-level statistics. It is useful if you have a lot of functions that call each other and want to know how the time is spent in different scenarios. For example, this shows that the objective() function is called only by the hillclimbing() function, but the hillclimbing() function calls several other functions:
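The browser session is not reproduced in this text. The corresponding methods on a pstats.Stats object are print_callers() and print_callees(). The sketch below applies them to a toy stand-in whose function names follow the tutorial but whose bodies are hypothetical:

```python
import cProfile
import io
import pstats

def objective(x):
    # Toy score function
    return -(x - 3.0) ** 2

def step(x):
    # Toy candidate generator
    return x + 0.1

def hillclimbing(n_iter=100):
    x = 0.0
    best = objective(x)
    for _ in range(n_iter):
        candidate = step(x)
        score = objective(candidate)
        if score >= best:
            x, best = candidate, score
    return x, best

profiler = cProfile.Profile()
profiler.runcall(hillclimbing)

buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.print_callers("objective")     # which functions call objective()
stats.print_callees("hillclimbing")  # which functions hillclimbing() calls
report = buffer.getvalue()
print(report)
```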

Using profiler inside code

The above example assumes you have the complete program saved in a file and profile the entire program. Sometimes, we focus on only a part of the entire program. For example, if we load a large module, it takes time to bootstrap and we want to ignore this from profiling. In this case, we can invoke the profiler only for certain lines. An example is as follows, which is modified from the program above:
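The modified program is not reproduced in this text. The general pattern is to create a cProfile.Profile object and call enable() and disable() around the lines of interest. A minimal sketch with hypothetical stand-in functions:

```python
import cProfile
import io
import pstats

def load_data():
    # Stand-in for slow start-up work that should not be profiled
    return [i % 7 for i in range(100000)]

def process(data):
    # The part of the program we actually want to measure
    return sum(v * v for v in data)

data = load_data()     # runs before the profiler is switched on

profiler = cProfile.Profile()
profiler.enable()      # start collecting statistics
result = process(data)
profiler.disable()     # stop collecting statistics

# Print the top 10 rows by cumulative time, with directory names stripped
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer).strip_dirs().sort_stats("cumtime")
stats.print_stats(10)
report = buffer.getvalue()
print(report)
```

Only the code executed between enable() and disable() appears in the report, so the start-up work is left out of the statistics.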

It will output the following:


Caveats

Using the profiler with TensorFlow models may not produce what you would expect, especially if you have written your own custom layer or custom function for the model. If you did it correctly, TensorFlow is supposed to build the computation graph before your model is executed, and hence the logic will be changed. The profiler output will therefore not show your custom classes.

The same applies to some advanced modules that involve binary code. The profiler can see that you called some functions and marks them as “built-in” methods, but it cannot go any further into the compiled code.
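This effect can be seen without any external library. In the sketch below (the function name and data size are assumptions), nearly all the work happens inside sorted(), which is implemented in C, so the profiler reports it as a single row with no further breakdown:

```python
import cProfile
import io
import pstats
import random

def sort_numbers(values):
    # All the real work happens inside sorted(), which is compiled code
    return sorted(values)

random.seed(0)
values = [random.random() for _ in range(200000)]

profiler = cProfile.Profile()
profiler.runcall(sort_numbers, values)

buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("tottime").print_stats(5)
report = buffer.getvalue()
print(report)  # sorted() appears as one "built-in method" row
```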

Below is a short code of the LeNet5 model for the MNIST classification problem. If you try to profile it and print the top 15 rows, you will see that a wrapper is occupying the majority of the time and nothing can be shown beyond that:

In the result below, TFE_Py_Execute is marked as a “built-in” method and it consumes 30.1 sec out of the total run time of 39.6 sec. Note that the tottime is the same as the cumtime, meaning that from the profiler’s perspective, it seems all time is spent at this function and it doesn’t call any other functions. This illustrates the limitation of Python’s profiler.

Finally, Python’s profiler gives you only the statistics on time but not memory usage. You may need to look for another library or tool for this purpose.
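One option that is not covered in this tutorial is the tracemalloc module from the standard library, which tracks memory allocated by Python code. The following is only a minimal sketch of how it might be used, with an arbitrary list as the allocation being measured:

```python
import tracemalloc

tracemalloc.start()  # begin tracking Python memory allocations

squares = [i * i for i in range(100000)]  # allocate something measurable

current, peak = tracemalloc.get_traced_memory()  # sizes in bytes
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

print(f"current: {current / 1024:.0f} KiB, peak: {peak / 1024:.0f} KiB")
# Show the three source lines responsible for the most allocated memory
for stat in snapshot.statistics("lineno")[:3]:
    print(stat)
```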

Additional Readings

The standard library modules timeit, cProfile, and pstats are covered in Python’s official documentation.

The standard library’s profiler is very powerful but not the only one. If you want something more visual, you can try out the Python Call Graph module. It can produce a picture of how functions call each other using the GraphViz tool.

The limitation of not being able to dig into the compiled code can be solved by not using Python’s profiler but instead using one for compiled programs. My favorite is Valgrind, but to use it, you may have to recompile your Python interpreter to turn on debugging support.


Summary

In this tutorial, we learned what a profiler is and what it can do. Specifically,

  • We know how to compare small code with the timeit module
  • We see Python’s cProfile module can provide us detailed statistics on how time is spent
  • We learned to use the pstats module against the output of cProfile to sort or filter