________________________________________________________________________

PYBENCH - A Python Benchmark Suite

________________________________________________________________________

Extendable suite of low-level benchmarks for measuring the performance
of the Python implementation (interpreter, compiler or VM).
pybench is a collection of tests that provides a standardized way to
measure the performance of Python implementations. It takes a very
close look at different aspects of Python programs and lets you decide
which factors are more important to you than others, rather than
wrapping everything up in one number, like other performance tests do
(e.g. pystone, which is included in the Python Standard Library).
pybench has been used in the past by several Python developers to
track down performance bottlenecks or to demonstrate the impact of
optimizations and new features in Python.
The command line interface for pybench is the file pybench.py. Run
this script with option '--help' to get a listing of the possible
options. Without options, pybench will simply execute the benchmark
and then print out a report to stdout.

Micro-Manual
------------
Run 'pybench.py -h' to see the help screen. Run 'pybench.py' to run
the benchmark suite using default settings and 'pybench.py -f <file>'
to have it store the results in a file too.

It is usually a good idea to run pybench.py multiple times to see
whether the environment, timers and benchmark run-times are suitable
for doing benchmark tests.
You can use the comparison feature of pybench.py ('pybench.py -c
<file>') to check how well the system behaves in comparison to a
reference run.
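
A typical workflow using the -f and -c options might look like this
(the file name 'reference.pybench' is just an illustration; any file
name will do):

    # Save a reference run first:
    python pybench.py -f reference.pybench

    # Later, run the suite again and compare against the reference:
    python pybench.py -c reference.pybench
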
If the differences are well below 10% for each test, then you have a
system that is good for doing benchmark testing. If you get random
differences of more than 10% or significant differences between the
values for minimum and average time, then you likely have some
background processes running which cause the readings to become
inconsistent. Examples include: web browsers, email clients, RSS
readers, music players, backup programs, etc.
If you are only interested in a few tests of the whole suite, you can
use the filtering option, e.g. 'pybench.py -t string' will only
run/show the tests that have 'string' in their name.
This is the current output of pybench.py --help:
""" | |
------------------------------------------------------------------------ | |
PYBENCH - a benchmark test suite for Python interpreters/compilers. | |
------------------------------------------------------------------------ | |
Synopsis: | |
pybench.py [option] files... | |
Options and default settings: | |
-n arg number of rounds (10) | |
-f arg save benchmark to file arg () | |
-c arg compare benchmark with the one in file arg () | |
-s arg show benchmark in file arg, then exit () | |
-w arg set warp factor to arg (10) | |
-t arg run only tests with names matching arg () | |
-C arg set the number of calibration runs to arg (20) | |
-d hide noise in comparisons (0) | |
-v verbose output (not recommended) (0) | |
--with-gc enable garbage collection (0) | |
--with-syscheck use default sys check interval (0) | |
--timer arg use given timer (time.time) | |
-h show this help text | |
--help show this help text | |
--debug enable debugging | |
--copyright show copyright | |
--examples show examples of usage | |
Version: | |
2.0 | |
The normal operation is to run the suite and display the | |
results. Use -f to save them for later reuse or comparisons. | |
Available timers: | |
time.time | |
time.clock | |
systimes.processtime | |
Examples: | |
python2.1 pybench.py -f p21.pybench | |
python2.5 pybench.py -f p25.pybench | |
python pybench.py -s p25.pybench -c p21.pybench | |
""" | |

License
-------

See LICENSE file.

Sample output
-------------

"""
-------------------------------------------------------------------------------
PYBENCH 2.0
-------------------------------------------------------------------------------
* using Python 2.4.2
* disabled garbage collection
* system check interval set to maximum: 2147483647
* using timer: time.time

Calibrating tests. Please wait...

Running 10 round(s) of the suite at warp factor 10:

* Round 1 done in 6.388 seconds.
* Round 2 done in 6.485 seconds.
* Round 3 done in 6.786 seconds.
...
* Round 10 done in 6.546 seconds.

-------------------------------------------------------------------------------
Benchmark: 2006-06-12 12:09:25
-------------------------------------------------------------------------------

    Rounds: 10
    Warp:   10
    Timer:  time.time

    Machine Details:
        Platform ID:  Linux-2.6.8-24.19-default-x86_64-with-SuSE-9.2-x86-64
        Processor:    x86_64

    Python:
        Executable:   /usr/local/bin/python
        Version:      2.4.2
        Compiler:     GCC 3.3.4 (pre 3.3.5 20040809)
        Bits:         64bit
        Build:        Oct 1 2005 15:24:35 (#1)
        Unicode:      UCS2

Test                             minimum  average  operation  overhead
-------------------------------------------------------------------------------
          BuiltinFunctionCalls:    126ms    145ms     0.28us   0.274ms
           BuiltinMethodLookup:    124ms    130ms     0.12us   0.316ms
                 CompareFloats:    109ms    110ms     0.09us   0.361ms
         CompareFloatsIntegers:    100ms    104ms     0.12us   0.271ms
               CompareIntegers:    137ms    138ms     0.08us   0.542ms
        CompareInternedStrings:    124ms    127ms     0.08us   1.367ms
                  CompareLongs:    100ms    104ms     0.10us   0.316ms
                CompareStrings:    111ms    115ms     0.12us   0.929ms
                CompareUnicode:    108ms    128ms     0.17us   0.693ms
                 ConcatStrings:    142ms    155ms     0.31us   0.562ms
                 ConcatUnicode:    119ms    127ms     0.42us   0.384ms
               CreateInstances:    123ms    128ms     1.14us   0.367ms
            CreateNewInstances:    121ms    126ms     1.49us   0.335ms
       CreateStringsWithConcat:    130ms    135ms     0.14us   0.916ms
       CreateUnicodeWithConcat:    130ms    135ms     0.34us   0.361ms
                  DictCreation:    108ms    109ms     0.27us   0.361ms
             DictWithFloatKeys:    149ms    153ms     0.17us   0.678ms
           DictWithIntegerKeys:    124ms    126ms     0.11us   0.915ms
            DictWithStringKeys:    114ms    117ms     0.10us   0.905ms
                      ForLoops:    110ms    111ms     4.46us   0.063ms
                    IfThenElse:    118ms    119ms     0.09us   0.685ms
                   ListSlicing:    116ms    120ms     8.59us   0.103ms
                NestedForLoops:    125ms    137ms     0.09us   0.019ms
          NormalClassAttribute:    124ms    136ms     0.11us   0.457ms
       NormalInstanceAttribute:    110ms    117ms     0.10us   0.454ms
           PythonFunctionCalls:    107ms    113ms     0.34us   0.271ms
             PythonMethodCalls:    140ms    149ms     0.66us   0.141ms
                     Recursion:    156ms    166ms     3.32us   0.452ms
                  SecondImport:    112ms    118ms     1.18us   0.180ms
           SecondPackageImport:    118ms    127ms     1.27us   0.180ms
         SecondSubmoduleImport:    140ms    151ms     1.51us   0.180ms
       SimpleComplexArithmetic:    128ms    139ms     0.16us   0.361ms
        SimpleDictManipulation:    134ms    136ms     0.11us   0.452ms
         SimpleFloatArithmetic:    110ms    113ms     0.09us   0.571ms
      SimpleIntFloatArithmetic:    106ms    111ms     0.08us   0.548ms
       SimpleIntegerArithmetic:    106ms    109ms     0.08us   0.544ms
        SimpleListManipulation:    103ms    113ms     0.10us   0.587ms
          SimpleLongArithmetic:    112ms    118ms     0.18us   0.271ms
                    SmallLists:    105ms    116ms     0.17us   0.366ms
                   SmallTuples:    108ms    128ms     0.24us   0.406ms
         SpecialClassAttribute:    119ms    136ms     0.11us   0.453ms
      SpecialInstanceAttribute:    143ms    155ms     0.13us   0.454ms
                StringMappings:    115ms    121ms     0.48us   0.405ms
              StringPredicates:    120ms    129ms     0.18us   2.064ms
                 StringSlicing:    111ms    127ms     0.23us   0.781ms
                     TryExcept:    125ms    126ms     0.06us   0.681ms
                TryRaiseExcept:    133ms    137ms     2.14us   0.361ms
                  TupleSlicing:    117ms    120ms     0.46us   0.066ms
               UnicodeMappings:    156ms    160ms     4.44us   0.429ms
             UnicodePredicates:    117ms    121ms     0.22us   2.487ms
             UnicodeProperties:    115ms    153ms     0.38us   2.070ms
                UnicodeSlicing:    126ms    129ms     0.26us   0.689ms
-------------------------------------------------------------------------------
Totals:                           6283ms   6673ms
"""
________________________________________________________________________

Writing New Tests

________________________________________________________________________
pybench tests are simple modules defining one or more pybench.Test
subclasses.

Writing a test essentially boils down to providing two methods:
.test(), which runs .rounds test rounds of .operations operations
each, and .calibrate(), which does the same except that it doesn't
actually execute the operations.

Here's an example:
------------------
from pybench import Test

class IntegerCounting(Test):

    # Version number of the test as float (x.yy); this is important
    # for comparisons of benchmark runs - tests with unequal version
    # number will not get compared.
    version = 1.0

    # The number of abstract operations done in each round of the
    # test. An operation is the basic unit of what you want to
    # measure. The benchmark will output the amount of run-time per
    # operation. Note that in order to raise the measured timings
    # significantly above noise level, it is often required to repeat
    # sets of operations more than once per test round. The measured
    # overhead per test round should be less than 1 second.
    operations = 20

    # Number of rounds to execute per test run. This should be
    # adjusted to a figure that results in a test run-time of between
    # 1-2 seconds (at warp 1).
    rounds = 100000

    def test(self):

        """ Run the test.

            The test needs to run self.rounds test rounds, executing
            self.operations operations each.
        """
        # Init the test
        a = 1

        # Run test rounds
        #
        # NOTE: Use xrange() for all test loops unless you want to face
        # a 20MB process!
        #
        for i in xrange(self.rounds):

            # Repeat the operations per round to raise the run-time
            # per operation significantly above the noise level of the
            # for-loop overhead.

            # Execute 20 operations (a += 1):
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1

    def calibrate(self):

        """ Calibrate the test.

            This method should execute everything that is needed to
            setup and run the test - except for the actual operations
            that you intend to measure. pybench uses this method to
            measure the test implementation overhead.
        """
        # Init the test
        a = 1

        # Run test rounds (without actually doing any operation)
        for i in xrange(self.rounds):

            # Skip the actual execution of the operations, since we
            # only want to measure the test's administration overhead.
            pass

Registering a new test module
-----------------------------

To register a test module with pybench, the classes need to be
imported into the pybench.Setup module. pybench will then scan all the
symbols defined in that module for subclasses of pybench.Test and
automatically add them to the benchmark suite.
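
As a sketch, assuming the example above lives in a hypothetical module
file MyTests.py next to pybench.py, registration amounts to a single
import added to the Setup module:

    # In pybench's Setup module - pybench scans this module's
    # namespace for pybench.Test subclasses, so importing the test
    # classes is enough. 'MyTests' is a hypothetical module name
    # used for illustration.
    from MyTests import *

No explicit registration call is needed; any pybench.Test subclass
visible in the Setup module's namespace is picked up automatically.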

Breaking Comparability
----------------------

If a change is made to any individual test that means it is no
longer strictly comparable with previous runs, the '.version' class
variable should be updated. Thereafter, comparisons with previous
versions of the test will be listed as "n/a" to reflect the change.
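
For example, if the IntegerCounting test above were changed so that
each operation counted by 2 instead of 1 (a hypothetical change that
alters what one "operation" means), its version should be bumped:

    class IntegerCounting(Test):

        # Bumped from 1.0: the per-operation work changed, so runs
        # of the old and new test are no longer comparable.
        version = 1.1

        ...

Runs recorded with version 1.0 will then show up as "n/a" when
compared against runs made with version 1.1.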

Version History
---------------

  2.0: rewrote parts of pybench which resulted in more repeatable
       timings:
       - made timer a parameter
       - changed the platform default timer to use high-resolution
         timers rather than process timers (which have a much lower
         resolution)
       - added option to select timer
       - added process time timer (using systimes.py)
       - changed to use min() as timing estimator (average
         is still taken as well to provide an idea of the difference)
       - garbage collection is turned off per default
       - sys check interval is set to the highest possible value
       - calibration is now a separate step and done using
         a different strategy that allows measuring the test
         overhead more accurately
       - modified the tests to each give a run-time of between
         100-200ms using warp 10
       - changed default warp factor to 10 (from 20)
       - compared results with timeit.py and confirmed measurements
       - bumped all test versions to 2.0
       - updated platform.py to the latest version
       - changed the output format a bit to make it look nicer
       - refactored the APIs somewhat

  1.3+: Steve Holden added the NewInstances test and the filtering
        option during the NeedForSpeed sprint; this also triggered a
        long discussion on how to improve benchmark timing and
        finally resulted in the release of 2.0

  1.3: initial checkin into the Python SVN repository

Have fun,
--
Marc-Andre Lemburg
mal@lemburg.com