Profiling is a technique to figure out how time is spent in a program. With these statistics, we can find the "hot spot" of a program and think about ways of improvement. Sometimes, a hot spot in an unexpected location may hint at a bug in the program as well.
In this tutorial, we will see how we can use the profiling facility in Python. Specifically, you will see:
- How we can compare small code fragments using the `timeit` module
- How we can profile the entire program using the `cProfile` module
- How we can invoke a profiler inside an existing program
- What the profiler cannot do
Let's get started.

Profiling Python Code. Photo by Prashant Saini. Some rights reserved.
Tutorial Overview
This tutorial is in four parts; they are:
- Profiling small fragments
- The profile module
- Using the profiler inside code
- Caveats
Profiling small fragments
When you are asked about different ways of doing the same thing in Python, one perspective is to check which one is more efficient. In Python's standard library, we have the `timeit` module that allows us to do some simple profiling.
For example, to concatenate many short strings, we can use the `join()` function from strings or use the `+` operator. So how do we know which is faster? Consider the following Python code:
```python
longstr = ""
for x in range(1000):
    longstr += str(x)
```
This will produce a long string `012345....` in the variable `longstr`. An alternative way to write this is:
```python
longstr = "".join([str(x) for x in range(1000)])
```
To compare the two, we can do the following on the command line:
```
python -m timeit 'longstr=""' 'for x in range(1000): longstr += str(x)'
python -m timeit '"".join([str(x) for x in range(1000)])'
```
These two commands will produce the following output:
```
1000 loops, best of 5: 265 usec per loop
2000 loops, best of 5: 160 usec per loop
```
The above commands load the `timeit` module and pass a single line of code to it for measurement. In the first case, we have two lines of statements, and they are passed to the `timeit` module as two separate arguments. By the same rationale, the first command can also be presented as three lines of statements (by breaking the for-loop into two lines), but the indentation of each line needs to be quoted correctly:
```
python -m timeit 'longstr=""' 'for x in range(1000):' ' longstr += str(x)'
```
The output of `timeit` is the best performance among multiple runs (default to be 5). Each run runs the provided statements multiple times (which is determined dynamically). The time is reported as the average to execute the statements once in the best run.
While it is true that the `join()` function is faster than the `+` operator for string concatenation, the timing above is not a fair comparison, because we use `str(x)` to make short strings on the fly during the loop. The better way to do it is the following:
```
python -m timeit -s 'strings = [str(x) for x in range(1000)]' 'longstr=""' 'for x in strings:' ' longstr += str(x)'
python -m timeit -s 'strings = [str(x) for x in range(1000)]' '"".join(strings)'
```
which produces:
```
2000 loops, best of 5: 173 usec per loop
50000 loops, best of 5: 6.91 usec per loop
```
The `-s` option allows us to provide the "setup" code, which is executed before the profiling and not timed. In the above, we create the list of short strings before we start the loop. Hence the time to create those strings is not measured in the "per loop" timing. From the above, we see that the `join()` function is two orders of magnitude faster than the `+` operator. The more common use of the `-s` option is to import libraries. For example, we can compare the square root function from Python's math module, from numpy, and using the exponential operator `**` as follows:
```
python -m timeit '[x**0.5 for x in range(1000)]'
python -m timeit -s 'from math import sqrt' '[sqrt(x) for x in range(1000)]'
python -m timeit -s 'from numpy import sqrt' '[sqrt(x) for x in range(1000)]'
```
The above produces the following measurement, in which we see that `math.sqrt()` is the fastest while `numpy.sqrt()` is the slowest in this particular example:
```
5000 loops, best of 5: 93.2 usec per loop
5000 loops, best of 5: 72.3 usec per loop
200 loops, best of 5: 974 usec per loop
```
If you wonder why numpy is the slowest, it is because numpy is optimized for arrays. You will see its exceptional speed in the following alternative:
```
python -m timeit -s 'import numpy as np; x=np.array(range(1000))' 'np.sqrt(x)'
```
where the result is:
```
100000 loops, best of 5: 2.08 usec per loop
```
If you prefer, you can also run `timeit` in Python code. For example, the following will be similar to the above, but give you the raw total timing for each run:
```python
import timeit
measurements = timeit.repeat('[x**0.5 for x in range(1000)]', number=10000)
print(measurements)
```
In the above, each run executes the statement 10000 times; the result is as follows, in which you can see a result of roughly 98 usec per loop in the best run:
```
[1.0888952040000106, 0.9799715450000122, 1.0921516899999801, 1.0946189250000202, 1.2792069260000005]
```
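If we want the familiar "per loop" figure from these raw totals, we can compute it ourselves. Below is a minimal sketch mirroring how `python -m timeit` reports the best run:

```python
import timeit

number = 10000
measurements = timeit.repeat('[x**0.5 for x in range(1000)]', number=number)
# the smallest total divided by the loop count approximates the per-loop time
best = min(measurements) / number
print('%.2f usec per loop' % (best * 1e6))
```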
The profile module
Focusing on a statement or two for performance is a microscopic view. Chances are, we have a long program and want to see what is causing it to run slow. That happens before we can consider alternative statements or algorithms.
A program running slow can generally be due to two reasons: a part is running slow, or a part is running too many times, adding up to take too much time. We call these "performance hogs" the hot spot. Let's look at an example. Consider the following program that uses a hill climbing algorithm to find hyperparameters for a perceptron model:
```python
# manually search perceptron hyperparameters for binary classification
from numpy import mean
from numpy.random import randn
from numpy.random import rand
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.linear_model import Perceptron

# objective function
def objective(X, y, cfg):
    # unpack config
    eta, alpha = cfg
    # define model
    model = Perceptron(penalty='elasticnet', alpha=alpha, eta0=eta)
    # define evaluation procedure
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
    # evaluate model
    scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
    # calculate mean accuracy
    result = mean(scores)
    return result

# take a step in the search space
def step(cfg, step_size):
    # unpack the configuration
    eta, alpha = cfg
    # step eta
    new_eta = eta + randn() * step_size
    # check the bounds of eta
    if new_eta <= 0.0:
        new_eta = 1e-8
    if new_eta > 1.0:
        new_eta = 1.0
    # step alpha
    new_alpha = alpha + randn() * step_size
    # check the bounds of alpha
    if new_alpha < 0.0:
        new_alpha = 0.0
    # return the new configuration
    return [new_eta, new_alpha]

# hill climbing local search algorithm
def hillclimbing(X, y, objective, n_iter, step_size):
    # starting point for the search
    solution = [rand(), rand()]
    # evaluate the initial point
    solution_eval = objective(X, y, solution)
    # run the hill climb
    for i in range(n_iter):
        # take a step
        candidate = step(solution, step_size)
        # evaluate candidate point
        candidate_eval = objective(X, y, candidate)
        # check if we should keep the new point
        if candidate_eval >= solution_eval:
            # store the new point
            solution, solution_eval = candidate, candidate_eval
            # report progress
            print('>%d, cfg=%s %.5f' % (i, solution, solution_eval))
    return [solution, solution_eval]

# define dataset
X, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1)
# define the total iterations
n_iter = 100
# step size in the search space
step_size = 0.1
# perform the hill climbing search
cfg, score = hillclimbing(X, y, objective, n_iter, step_size)
print('Done!')
print('cfg=%s: Mean Accuracy: %f' % (cfg, score))
```
Assume we saved this program in the file `hillclimb.py`; we can run the profiler on the command line as follows:
```
python -m cProfile hillclimb.py
```
and the output will be the following:
```
>10, cfg=[0.3792455490265847, 0.21589566352848377] 0.78400
>17, cfg=[0.49105438202347707, 0.1342150084854657] 0.79833
>26, cfg=[0.5737524712834843, 0.016749795596210315] 0.80033
>47, cfg=[0.5067828976025809, 0.05280380038497864] 0.80133
>48, cfg=[0.5427345321546029, 0.0049895870979695875] 0.81167
Done!
cfg=[0.5427345321546029, 0.0049895870979695875]: Mean Accuracy: 0.811667
         2686451 function calls (2638255 primitive calls) in 5.500 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      101    0.001    0.000    4.892    0.048 hillclimb.py:11(objective)
        1    0.000    0.000    5.501    5.501 hillclimb.py:2(<module>)
      100    0.000    0.000    0.001    0.000 hillclimb.py:25(step)
        1    0.001    0.001    4.894    4.894 hillclimb.py:44(hillclimbing)
        1    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(<module>)
      303    0.000    0.000    0.008    0.000 <__array_function__ internals>:2(all)
      303    0.000    0.000    0.005    0.000 <__array_function__ internals>:2(amin)
        2    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(any)
        4    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(atleast_1d)
     3333    0.003    0.000    0.018    0.000 <__array_function__ internals>:2(bincount)
      103    0.000    0.000    0.001    0.000 <__array_function__ internals>:2(concatenate)
        3    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(copyto)
      606    0.001    0.000    0.010    0.000 <__array_function__ internals>:2(cumsum)
        6    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(dot)
        1    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(empty_like)
        1    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(inv)
        2    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(linspace)
        1    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(lstsq)
      101    0.000    0.000    0.005    0.000 <__array_function__ internals>:2(mean)
        2    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(ndim)
        1    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(outer)
        1    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(polyfit)
        1    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(polyval)
        1    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(prod)
      303    0.000    0.000    0.002    0.000 <__array_function__ internals>:2(ravel)
        2    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(result_type)
      303    0.001    0.000    0.001    0.000 <__array_function__ internals>:2(shape)
      303    0.000    0.000    0.035    0.000 <__array_function__ internals>:2(sort)
        4    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(trim_zeros)
     1617    0.002    0.000    0.112    0.000 <__array_function__ internals>:2(unique)
...
```
The normal output of the program will be printed first, and then the profiler's statistics will be printed. From the first row, we see that the function `objective()` in our program has run 101 times, taking a total of 4.89 seconds. But these 4.89 seconds are mostly spent in the functions it called; the total time spent in the `objective()` function itself is merely 0.001 second. The functions from dependent modules are also profiled. Hence you see a lot of numpy functions above too.
The profiler output for the hill climbing program is long and may not be useful to you, as it can be difficult to tell which function is the hot spot. Indeed we can sort the output. For example, to see which function is called the most number of times, we can sort by `ncalls`:
```
python -m cProfile -s ncalls hillclimb.py
```
Its output is as follows, which says the `get()` function from a Python dict is the most used function (but it only consumed 0.03 seconds in total out of the 5.6 seconds it took to finish the program):
```
         2685349 function calls (2637153 primitive calls) in 5.609 seconds

   Ordered by: call count

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   247588    0.029    0.000    0.029    0.000 {method 'get' of 'dict' objects}
   246196    0.028    0.000    0.028    0.000 inspect.py:2548(name)
   168057    0.018    0.000    0.018    0.000 {method 'append' of 'list' objects}
   161738    0.018    0.000    0.018    0.000 inspect.py:2560(kind)
   144431    0.021    0.000    0.029    0.000 {built-in method builtins.isinstance}
   142213    0.030    0.000    0.031    0.000 {built-in method builtins.getattr}
...
```
The other sort options are as follows:
Sort string | Meaning
---|---
calls | Call count
cumulative | Cumulative time
cumtime | Cumulative time
file | File name
filename | File name
module | File name
ncalls | Call count
pcalls | Primitive call count
line | Line number
name | Function name
nfl | Name/file/line
stdname | Standard name
time | Internal time
tottime | Internal time
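For example, sorting by cumulative time often surfaces the hot spot the fastest; using the same script as above:

```
python -m cProfile -s cumtime hillclimb.py
```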
If the program takes some time to finish, it would not be reasonable to run it many times just to see the profiling result in a different sort order. Indeed, we can save the profiler's statistics for further processing, as follows:
```
python -m cProfile -o hillclimb.stats hillclimb.py
```
Similar to the above, this will run the program. But instead of printing the statistics to the screen, it will save them into a file. Afterwards, we can use the `pstats` module like the following to open up the statistics file and get a prompt to manipulate the data:
```
python -m pstats hillclimb.stats
```
For example, we can use the sort command to change the sort order and use stats to print what we saw above:
```
Welcome to the profile statistics browser.
hillclimb.stat% help

Documented commands (type help <topic>):
========================================
EOF  add  callees  callers  help  quit  read  reverse  sort  stats  strip

hillclimb.stat% sort ncall
hillclimb.stat% stats hillclimb
Thu Jan 13 16:44:10 2022    hillclimb.stat

         2686227 function calls (2638031 primitive calls) in 5.582 seconds

   Ordered by: call count
   List reduced from 3456 to 4 due to restriction <'hillclimb'>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      101    0.001    0.000    4.951    0.049 hillclimb.py:11(objective)
      100    0.000    0.000    0.001    0.000 hillclimb.py:25(step)
        1    0.000    0.000    5.583    5.583 hillclimb.py:2(<module>)
        1    0.000    0.000    4.952    4.952 hillclimb.py:44(hillclimbing)

hillclimb.stat%
```
You will notice that the `stats` command above allows us to provide an extra argument. The argument can be a regular expression to search for the functions, such that only those matched will be printed. Hence it is a way to provide a search string to filter.
This `pstats` browser allows us to see more than just the table above. The `callers` and `callees` commands show us which function calls which function, how many times it is called, and how much time is spent. Hence we can consider that as a breakdown of the function-level statistics. It is useful if you have a lot of functions that call one another and want to know how the time is spent in different scenarios. For example, this shows that the `objective()` function is called only by the `hillclimbing()` function, but the `hillclimbing()` function calls several other functions:
```
hillclimb.stat% callers objective
   Ordered by: call count
   List reduced from 3456 to 1 due to restriction <'objective'>

Function                    was called by...
                                ncalls  tottime  cumtime
hillclimb.py:11(objective)  <-     101    0.001    4.951  hillclimb.py:44(hillclimbing)

hillclimb.stat% callees hillclimbing
   Ordered by: call count
   List reduced from 3456 to 1 due to restriction <'hillclimbing'>

Function                       called...
                                   ncalls  tottime  cumtime
hillclimb.py:44(hillclimbing)  ->     101    0.001    4.951  hillclimb.py:11(objective)
                                      100    0.000    0.001  hillclimb.py:25(step)
                                        4    0.000    0.000  {built-in method builtins.print}
                                        2    0.000    0.000  {method 'rand' of 'numpy.random.mtrand.RandomState' objects}

hillclimb.stat%
```
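The callers and callees breakdown is available programmatically as well, which is handy for saving a report. Below is a minimal sketch, again assuming the `hillclimb.stats` file from above:

```python
import pstats

stats = pstats.Stats('hillclimb.stats')
stats.sort_stats('calls')
stats.print_callers('objective')     # which functions call objective()
stats.print_callees('hillclimbing') # which functions hillclimbing() calls
```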
Using the profiler inside code
The above example assumes you have the complete program saved in a file and profile the entire program. Sometimes, we want to focus on only a part of the entire program. For example, if we load a large module, it takes time to bootstrap, and we want to exclude this from profiling. In this case, we can invoke the profiler only for certain lines. An example is as follows, which is modified from the program above:
```python
# manually search perceptron hyperparameters for binary classification
import cProfile as profile
import pstats
from numpy import mean
from numpy.random import randn
from numpy.random import rand
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.linear_model import Perceptron

# objective function
def objective(X, y, cfg):
    # unpack config
    eta, alpha = cfg
    # define model
    model = Perceptron(penalty='elasticnet', alpha=alpha, eta0=eta)
    # define evaluation procedure
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
    # evaluate model
    scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
    # calculate mean accuracy
    result = mean(scores)
    return result

# take a step in the search space
def step(cfg, step_size):
    # unpack the configuration
    eta, alpha = cfg
    # step eta
    new_eta = eta + randn() * step_size
    # check the bounds of eta
    if new_eta <= 0.0:
        new_eta = 1e-8
    if new_eta > 1.0:
        new_eta = 1.0
    # step alpha
    new_alpha = alpha + randn() * step_size
    # check the bounds of alpha
    if new_alpha < 0.0:
        new_alpha = 0.0
    # return the new configuration
    return [new_eta, new_alpha]

# hill climbing local search algorithm
def hillclimbing(X, y, objective, n_iter, step_size):
    # starting point for the search
    solution = [rand(), rand()]
    # evaluate the initial point
    solution_eval = objective(X, y, solution)
    # run the hill climb
    for i in range(n_iter):
        # take a step
        candidate = step(solution, step_size)
        # evaluate candidate point
        candidate_eval = objective(X, y, candidate)
        # check if we should keep the new point
        if candidate_eval >= solution_eval:
            # store the new point
            solution, solution_eval = candidate, candidate_eval
            # report progress
            print('>%d, cfg=%s %.5f' % (i, solution, solution_eval))
    return [solution, solution_eval]

# define dataset
X, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1)
# define the total iterations
n_iter = 100
# step size in the search space
step_size = 0.1
# perform the hill climbing search with profiling
prof = profile.Profile()
prof.enable()
cfg, score = hillclimbing(X, y, objective, n_iter, step_size)
prof.disable()
# print program output
print('Done!')
print('cfg=%s: Mean Accuracy: %f' % (cfg, score))
# print profiling output
stats = pstats.Stats(prof).strip_dirs().sort_stats("cumtime")
stats.print_stats(10)  # top 10 rows
```
It will output the following:
```
>0, cfg=[0.3776271076534661, 0.2308364063203663] 0.75700
>3, cfg=[0.35803234662466354, 0.03204434939660264] 0.77567
>8, cfg=[0.3001050823005957, 0.0] 0.78633
>10, cfg=[0.39518618870158934, 0.0] 0.78633
>12, cfg=[0.4291267905390187, 0.0] 0.78633
>13, cfg=[0.4403131521968569, 0.0] 0.78633
>16, cfg=[0.38865272555918756, 0.0] 0.78633
>17, cfg=[0.38871654921891885, 0.0] 0.78633
>18, cfg=[0.4542440671724224, 0.0] 0.78633
>19, cfg=[0.44899743344802734, 0.0] 0.78633
>20, cfg=[0.5855375509507891, 0.0] 0.78633
>21, cfg=[0.5935318064858227, 0.0] 0.78633
>23, cfg=[0.7606367310048543, 0.0] 0.78633
>24, cfg=[0.855444293727846, 0.0] 0.78633
>25, cfg=[0.9505501566826242, 0.0] 0.78633
>26, cfg=[1.0, 0.0244821888204496] 0.79800
Done!
cfg=[1.0, 0.0244821888204496]: Mean Accuracy: 0.798000
         2179559 function calls (2140124 primitive calls) in 4.941 seconds

   Ordered by: cumulative time
   List reduced from 581 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.001    0.001    4.941    4.941 hillclimb.py:46(hillclimbing)
      101    0.001    0.000    4.939    0.049 hillclimb.py:13(objective)
      101    0.001    0.000    4.931    0.049 _validation.py:375(cross_val_score)
      101    0.002    0.000    4.930    0.049 _validation.py:48(cross_validate)
      101    0.005    0.000    4.903    0.049 parallel.py:960(__call__)
      101    0.235    0.002    3.089    0.031 parallel.py:920(retrieve)
     3030    0.004    0.000    2.849    0.001 _parallel_backends.py:537(wrap_future_result)
     3030    0.020    0.000    2.845    0.001 _base.py:417(result)
     2602    0.016    0.000    2.819    0.001 threading.py:280(wait)
    12447    2.796    0.000    2.796    0.000 {method 'acquire' of '_thread.lock' objects}
```
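As a side note, since Python 3.8, `cProfile.Profile` can also be used as a context manager, which saves us the explicit `enable()` and `disable()` calls. Below is a minimal sketch of the same pattern on a stand-in workload:

```python
import cProfile as profile
import pstats

# profile only the code inside the with-block
with profile.Profile() as prof:
    result = sum(x * x for x in range(1000000))

stats = pstats.Stats(prof).strip_dirs().sort_stats('cumtime')
stats.print_stats(10)  # top 10 rows
```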
Caveats
Using the profiler with TensorFlow models may not produce what you would expect, especially if you have written your own custom layer or custom function for the model. If you did it correctly, TensorFlow is supposed to build the computation graph before your model is executed, and hence the logic will be changed. The profiler output will therefore not show your custom classes.
Similarly for some advanced modules that involve binary code. The profiler can see that you called some functions and will mark them as "built-in" methods, but it cannot go any further into the compiled code.
Below is a short code of the LeNet5 model for the MNIST classification problem. If you try to profile it and print the top 15 rows, you will see that a wrapper is occupying the majority of the time, and nothing can be shown beyond that:
```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Dense, AveragePooling2D, Flatten
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.callbacks import EarlyStopping

# Load and reshape data to shape of (n_sample, height, width, n_channel)
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = np.expand_dims(X_train, axis=3).astype('float32')
X_test = np.expand_dims(X_test, axis=3).astype('float32')

# One-hot encode the output
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

# LeNet5 model
model = Sequential([
    Conv2D(6, (5,5), input_shape=(28,28,1), padding="same", activation="tanh"),
    AveragePooling2D((2,2), strides=2),
    Conv2D(16, (5,5), activation="tanh"),
    AveragePooling2D((2,2), strides=2),
    Conv2D(120, (5,5), activation="tanh"),
    Flatten(),
    Dense(84, activation="tanh"),
    Dense(10, activation="softmax")
])
model.summary(line_length=100)

# Training
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
earlystopping = EarlyStopping(monitor="val_loss", patience=2, restore_best_weights=True)
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=20, batch_size=32, callbacks=[earlystopping])

# Evaluate
print(model.evaluate(X_test, y_test, verbose=0))
```
In the result below, `TFE_Py_Execute` is marked as a "built-in" method, and it consumes 30.1 sec out of the total run time of 39.6 sec. Note that its tottime is the same as its cumtime, meaning that, from the profiler's perspective, it seems all time is spent at this function and it does not call any other functions. This illustrates the limitation of Python's profiler.
```
         5962698 function calls (5728324 primitive calls) in 39.674 seconds

   Ordered by: cumulative time
   List reduced from 12295 to 15 due to restriction <15>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   3212/1    0.013    0.000   39.699   39.699 {built-in method builtins.exec}
        1    0.003    0.003   39.699   39.699 mnist.py:4(<module>)
     52/4    0.005    0.000   35.470    8.868 /usr/local/lib/python3.9/site-packages/keras/utils/traceback_utils.py:58(error_handler)
        1    0.089    0.089   34.334   34.334 /usr/local/lib/python3.9/site-packages/keras/engine/training.py:901(fit)
11075/9531    0.032    0.000   33.406    0.004 /usr/local/lib/python3.9/site-packages/tensorflow/python/util/traceback_utils.py:138(error_handler)
     4689    0.089    0.000   33.017    0.007 /usr/local/lib/python3.9/site-packages/tensorflow/python/eager/def_function.py:882(__call__)
     4689    0.023    0.000   32.771    0.007 /usr/local/lib/python3.9/site-packages/tensorflow/python/eager/def_function.py:929(_call)
     4688    0.042    0.000   32.134    0.007 /usr/local/lib/python3.9/site-packages/tensorflow/python/eager/function.py:3125(__call__)
     4689    0.075    0.000   30.941    0.007 /usr/local/lib/python3.9/site-packages/tensorflow/python/eager/function.py:1888(_call_flat)
     4689    0.158    0.000   30.472    0.006 /usr/local/lib/python3.9/site-packages/tensorflow/python/eager/function.py:553(call)
     4689    0.034    0.000   30.152    0.006 /usr/local/lib/python3.9/site-packages/tensorflow/python/eager/execute.py:33(quick_execute)
     4689   30.105    0.006   30.105    0.006 {built-in method tensorflow.python._pywrap_tfe.TFE_Py_Execute}
  3185/24    0.021    0.000    3.902    0.163 <frozen importlib._bootstrap>:1002(_find_and_load)
  3169/10    0.014    0.000    3.901    0.390 <frozen importlib._bootstrap>:967(_find_and_load_unlocked)
  2885/12    0.009    0.000    3.901    0.325 <frozen importlib._bootstrap_external>:844(exec_module)
```
Finally, Python's profiler gives you only the statistics on time, but not memory usage. You may need to look for another library or tools for this purpose.
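That said, the standard library does include the `tracemalloc` module for tracing memory allocations, which may be a starting point. Below is a minimal sketch on a stand-in workload:

```python
import tracemalloc

tracemalloc.start()
# the code whose allocations we want to measure
data = [str(x) for x in range(100000)]
current, peak = tracemalloc.get_traced_memory()
print('current: %.1f KiB, peak: %.1f KiB' % (current / 1024, peak / 1024))
tracemalloc.stop()
```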
Further Reading
The standard library modules `timeit`, `cProfile`, and `pstats` are covered in Python's official documentation.
The standard library's profiler is very powerful, but not the only one. If you want something more visual, you can try out the Python Call Graph module. It can produce a picture of how functions call one another using the GraphViz tool. The limitation of not being able to dig into compiled code can be worked around by not using Python's profiler but instead one made for compiled programs. My favorite is Valgrind, but to use it, you may need to recompile your Python interpreter to turn on debugging support.
Summary
In this tutorial, we learned what a profiler is and what it can do. Specifically,
- We know how to compare small code with the `timeit` module
- We saw that Python's `cProfile` module can provide us detailed statistics on how time is spent
- We learned to use the `pstats` module on the output of `cProfile` to sort or filter