Preliminary, summarized throughput numbers from wuffs bench -mimic for various codecs are below. Higher is better.
“Mimic” tests check that Wuffs' output mimics (i.e. exactly matches) other libraries' output. “Mimic” benchmarks give the numbers for those other libraries, as shipped with the OS. These were measured on a Debian Testing system as of November 2017, which means these compiler versions:
and these mimic library versions:
Unless otherwise stated, the numbers below were measured on an Intel x86_64 Broadwell, and were taken as of git commit “693af47 Rename some bcheckOptimizeXxx methods.”
The benchmark programs aim to be runnable “out of the box” without any configuration or installation. For example, to run the std/zlib benchmarks:

    git clone https://github.com/google/wuffs.git
    cd wuffs/test/c/std
    gcc -O3 zlib.c
    ./a.out -bench
    rm a.out
A comment near the top of that .c file says how to run the mimic benchmarks.
The output of those benchmark programs is compatible with the benchstat tool. For example, that tool can calculate confidence intervals based on multiple benchmark runs, or calculate p-values when comparing numbers before and after a code change. To install it, first install Go, then run go get golang.org/x/perf/cmd/benchstat.
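As a rough sketch of how multiple runs might be collected for benchstat, the helper below runs a command several times and appends its output to one file. The function name collect_runs and the file names old.txt and new.txt are hypothetical; only the -bench flag and the benchstat tool come from the text above.

```shell
#!/bin/sh
# collect_runs N OUTFILE CMD [ARGS...]
# Hypothetical helper: runs CMD N times and appends each run's stdout
# to OUTFILE, so that benchstat sees several samples per benchmark.
collect_runs() {
    n="$1"
    out="$2"
    shift 2
    : > "$out"                      # start from an empty file
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" >> "$out" || return 1  # stop early if a run fails
        i=$((i + 1))
    done
}

# Assumed usage, with ./a.out built as in the zlib example (before "rm a.out"):
#   collect_runs 5 old.txt ./a.out -bench
#   ... edit the code and rebuild ...
#   collect_runs 5 new.txt ./a.out -bench
#   benchstat old.txt new.txt
```

Given two files, benchstat reports the change between them; given one file, it summarizes that file's runs.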
As mentioned above, individual benchmark programs can be run manually. However, the canonical way to run the benchmarks (across multiple compilers and multiple packages like GIF and PNG) is to use the wuffs command line tool, as it will also re-generate (transpile) the C code whenever you edit the *.wuffs code. Running go install -v github.com/google/wuffs/cmd/... will install the Wuffs tools. After that, you can say
wuffs bench
or
wuffs bench -mimic std/flate
or
wuffs bench -ccompilers=gcc -reps=3 -focus=Benchmarkwuffs_gif_lzw std/gif
CPU power management can inject noise in benchmark times. On a Linux system, power management can be controlled with:
    # Query.
    cpupower --cpu all frequency-info --policy
    # Turn on.
    sudo cpupower frequency-set --governor powersave
    # Turn off.
    sudo cpupower frequency-set --governor performance
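If cpupower is not installed, the current governor can usually also be read straight from sysfs. The sketch below assumes the standard Linux cpufreq layout under /sys/devices/system/cpu; the function name list_governors is hypothetical, and the directory is a parameter only so the function can be tried against a scratch directory.

```shell
#!/bin/sh
# list_governors [CPU_DIR]
# Hypothetical helper: prints "cpuN governor" for each CPU that exposes
# a cpufreq scaling_governor file. Returns non-zero if none were found
# (for example, inside a container or VM without cpufreq support).
list_governors() {
    base="${1:-/sys/devices/system/cpu}"
    status=1
    for f in "$base"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -r "$f" ] || continue
        # The CPU name is the directory two levels above the file.
        cpu="$(basename "$(dirname "$(dirname "$f")")")"
        printf '%s %s\n' "$cpu" "$(cat "$f")"
        status=0
    done
    return "$status"
}

# Assumed usage before a benchmark run:
#   list_governors || echo "no cpufreq information available"
```

Seeing performance on every CPU before a run suggests that frequency scaling is less likely to add noise to the timings.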
The 1k, 10k, etc. numbers are approximately how many bytes there are in the decoded output.

    name                           speed
    wuffs_adler32_10k/clang        2.42GB/s ± 3%
    wuffs_adler32_100k/clang       2.42GB/s ± 3%
    wuffs_flate_decode_1k/clang     133MB/s ± 3%
    wuffs_flate_decode_10k/clang    199MB/s ± 3%
    wuffs_flate_decode_100k/clang   229MB/s ± 4%
    wuffs_zlib_decode_10k/clang     185MB/s ± 1%
    wuffs_zlib_decode_100k/clang    210MB/s ± 3%

    wuffs_adler32_10k/gcc          3.22GB/s ± 1%
    wuffs_adler32_100k/gcc         3.23GB/s ± 0%
    wuffs_flate_decode_1k/gcc       152MB/s ± 1%
    wuffs_flate_decode_10k/gcc      250MB/s ± 1%
    wuffs_flate_decode_100k/gcc     298MB/s ± 1%
    wuffs_zlib_decode_10k/gcc       238MB/s ± 2%
    wuffs_zlib_decode_100k/gcc      270MB/s ± 2%

    mimic_adler32_10k              3.00GB/s ± 1%
    mimic_adler32_100k             2.91GB/s ± 2%
    mimic_flate_decode_1k           211MB/s ± 1%
    mimic_flate_decode_10k          270MB/s ± 2%
    mimic_flate_decode_100k         285MB/s ± 1%
    mimic_zlib_decode_10k           250MB/s ± 2%
    mimic_zlib_decode_100k          294MB/s ± 2%
The 1k, 10k, etc. numbers are approximately how many bytes of pixel data there are in the decoded image. For example, the test/data/harvesters.* images are 1165 × 859 (approximately 1000k pixels) and a GIF image (a paletted image) is 1 byte per pixel.
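The arithmetic behind the "1000k" label can be checked with a one-line calculation. The function name pixel_bytes is hypothetical; the dimensions and the 1 byte per pixel figure come from the paragraph above.

```shell
#!/bin/sh
# pixel_bytes WIDTH HEIGHT BYTES_PER_PIXEL
# Hypothetical helper: prints the size in bytes of the decoded pixel data.
pixel_bytes() {
    echo $(( $1 * $2 * $3 ))
}

# harvesters.* as a GIF: 1165 x 859 pixels at 1 byte per pixel.
pixel_bytes 1165 859 1   # prints 1000735, i.e. roughly 1000k
```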
    name                           speed
    wuffs_gif_decode_1k/clang       346MB/s ± 1%
    wuffs_gif_decode_10k/clang      137MB/s ± 0%
    wuffs_gif_decode_100k/clang     118MB/s ± 0%
    wuffs_gif_decode_1000k/clang    120MB/s ± 0%

    wuffs_gif_decode_1k/gcc         399MB/s ± 1%
    wuffs_gif_decode_10k/gcc        141MB/s ± 0%
    wuffs_gif_decode_100k/gcc       128MB/s ± 0%
    wuffs_gif_decode_1000k/gcc      131MB/s ± 0%

    mimic_gif_decode_1k             147MB/s ± 0%
    mimic_gif_decode_10k           90.7MB/s ± 0%
    mimic_gif_decode_100k          95.4MB/s ± 0%
    mimic_gif_decode_1000k         97.8MB/s ± 0%
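One way to read these numbers is as a ratio of Wuffs throughput to mimic throughput for the same input size. The sketch below uses awk for the floating point division; the function name speedup is hypothetical, and it ignores the ± error margins, so the result is only a rough indication.

```shell
#!/bin/sh
# speedup WUFFS_MBPS MIMIC_MBPS
# Hypothetical helper: prints the throughput ratio to two decimal places.
speedup() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# wuffs_gif_decode_1000k/gcc (131MB/s) versus mimic_gif_decode_1000k (97.8MB/s).
speedup 131 97.8   # prints 1.34
```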
TODO: investigate why gcc 4.8 (Ubuntu Trusty) seems to generate faster code than gcc 7.2 (Debian Testing):
    wuffs_gif_decode_1k/gcc         411MB/s ± 2%
    wuffs_gif_decode_10k/gcc        162MB/s ± 0%
    wuffs_gif_decode_100k/gcc       138MB/s ± 0%
    wuffs_gif_decode_1000k/gcc      141MB/s ± 0%
Updated in November 2017.