Benchmarks

Preliminary throughput numbers for various codecs, summarized from wuffs bench -mimic runs, are below. Higher is better.

“Mimic” tests check that Wuffs' output mimics (i.e. exactly matches) other libraries' output. “Mimic” benchmarks give the numbers for those other libraries, as shipped with the OS. These were measured on a Debian Testing system as of November 2017, which means these compiler versions:

  • clang/llvm 5.0.0
  • gcc 7.2.0

and these mimic library versions:

  • libgif 5.1.4
  • zlib 1.2.8

Unless otherwise stated, the numbers below were measured on an Intel x86_64 Broadwell, and were taken as of git commit “693af47 Rename some bcheckOptimizeXxx methods.”

Reproducing

The benchmark programs aim to be runnable “out of the box” without any configuration or installation. For example, to run the std/zlib benchmarks:

git clone https://github.com/google/wuffs.git
cd wuffs/test/c/std
gcc -O3 zlib.c
./a.out -bench
rm a.out

A comment near the top of that .c file says how to run the mimic benchmarks.

The output of those benchmark programs is compatible with the benchstat tool. For example, that tool can calculate confidence intervals based on multiple benchmark runs, or calculate p-values when comparing numbers before and after a code change. To install it, first install Go, then run go get golang.org/x/perf/cmd/benchstat.
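A before-and-after comparison with benchstat might be scripted along the following lines. This is a minimal sketch, not part of the Wuffs repository: collect_runs is a hypothetical helper, the file names old.txt and new.txt are arbitrary, and it assumes each invocation of the benchmark program prints its benchstat-compatible lines to standard output.

```shell
# collect_runs OUTFILE COUNT COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND COUNT times and gathers all of its
# standard output in OUTFILE, so benchstat sees several samples per benchmark.
collect_runs() {
  out="$1"
  n="$2"
  shift 2
  : > "$out"              # start from an empty file
  i=0
  while [ "$i" -lt "$n" ]; do
    "$@" >> "$out"        # append this run's benchmark lines
    i=$((i + 1))
  done
}

# Usage sketch (assumes ./a.out was built as shown above and that
# benchstat is installed and on PATH):
#   collect_runs old.txt 5 ./a.out -bench
#   ... edit the code, rebuild ...
#   collect_runs new.txt 5 ./a.out -bench
#   benchstat old.txt new.txt
```

Collecting several runs per file matters because benchstat needs multiple samples of each benchmark to report a meaningful confidence interval or p-value.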

wuffs bench

As mentioned above, individual benchmark programs can be run manually. However, the canonical way to run the benchmarks (across multiple compilers and multiple packages like GIF and PNG) is to use the wuffs command line tool, as it will also re-generate (transpile) the C code whenever you edit the *.wuffs code. Running go install -v github.com/google/wuffs/cmd/... will install the Wuffs tools. After that, you can say

wuffs bench

or

wuffs bench -mimic std/flate

or

wuffs bench -ccompilers=gcc -reps=3 -focus=Benchmarkwuffs_gif_lzw std/gif

CPU Scaling

CPU power management can inject noise into benchmark times. On a Linux system, power management can be controlled with:

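# Query the current policy.
cpupower --cpu all frequency-info --policy
# Turn power management on (the powersave governor).
sudo cpupower frequency-set --governor powersave
# Turn power management off (the performance governor), e.g. before benchmarking.
sudo cpupower frequency-set --governor performance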
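
Before a benchmark run, it can be useful to confirm that every CPU really is using the intended governor. The following is a minimal sketch that reads the standard Linux cpufreq sysfs files instead of calling cpupower. The function name check_governor is hypothetical, and the sysfs root is a parameter (defaulting to /sys/devices/system/cpu) so that the function can be pointed at a different directory tree.

```shell
# check_governor WANTED [SYSFS_ROOT]
# Hypothetical helper: prints one line for every CPU whose scaling governor
# differs from WANTED. Returns 0 if all readable governors match, 1 otherwise.
check_governor() {
  want="$1"
  root="${2:-/sys/devices/system/cpu}"
  bad=0
  for f in "$root"/cpu*/cpufreq/scaling_governor; do
    [ -r "$f" ] || continue          # skip CPUs without a cpufreq entry
    got=$(cat "$f")
    if [ "$got" != "$want" ]; then
      echo "$f: $got (want $want)"
      bad=1
    fi
  done
  return "$bad"
}

# Usage sketch:
#   check_governor performance || echo "CPU scaling may add noise"
```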

Deflate (including gzip and zlib)

The 1k, 10k, etc. numbers are approximately how many bytes there are in the decoded output.

name                             speed

wuffs_adler32_10k/clang          2.42GB/s ± 3%
wuffs_adler32_100k/clang         2.42GB/s ± 3%
wuffs_flate_decode_1k/clang       133MB/s ± 3%
wuffs_flate_decode_10k/clang      199MB/s ± 3%
wuffs_flate_decode_100k/clang     229MB/s ± 4%
wuffs_zlib_decode_10k/clang       185MB/s ± 1%
wuffs_zlib_decode_100k/clang      210MB/s ± 3%

wuffs_adler32_10k/gcc            3.22GB/s ± 1%
wuffs_adler32_100k/gcc           3.23GB/s ± 0%
wuffs_flate_decode_1k/gcc         152MB/s ± 1%
wuffs_flate_decode_10k/gcc        250MB/s ± 1%
wuffs_flate_decode_100k/gcc       298MB/s ± 1%
wuffs_zlib_decode_10k/gcc         238MB/s ± 2%
wuffs_zlib_decode_100k/gcc        270MB/s ± 2%

mimic_adler32_10k                3.00GB/s ± 1%
mimic_adler32_100k               2.91GB/s ± 2%
mimic_flate_decode_1k             211MB/s ± 1%
mimic_flate_decode_10k            270MB/s ± 2%
mimic_flate_decode_100k           285MB/s ± 1%
mimic_zlib_decode_10k             250MB/s ± 2%
mimic_zlib_decode_100k            294MB/s ± 2%

GIF

The 1k, 10k, etc. numbers are approximately how many bytes of pixel data there are in the decoded image. For example, the test/data/harvesters.* images are 1165 × 859 (approximately 1000k pixels) and a GIF image (a paletted image) uses 1 byte per pixel.
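
The "approximately 1000k" figure for the harvesters images follows directly from the dimensions quoted above. As a quick check using shell arithmetic (the helper name pixel_bytes is only illustrative):

```shell
# pixel_bytes WIDTH HEIGHT
# At 1 byte per pixel, the decoded size in bytes equals WIDTH * HEIGHT.
pixel_bytes() {
  echo $(( $1 * $2 ))
}

pixel_bytes 1165 859   # prints 1000735
```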

name                             speed

wuffs_gif_decode_1k/clang         346MB/s ± 1%
wuffs_gif_decode_10k/clang        137MB/s ± 0%
wuffs_gif_decode_100k/clang       118MB/s ± 0%
wuffs_gif_decode_1000k/clang      120MB/s ± 0%

wuffs_gif_decode_1k/gcc           399MB/s ± 1%
wuffs_gif_decode_10k/gcc          141MB/s ± 0%
wuffs_gif_decode_100k/gcc         128MB/s ± 0%
wuffs_gif_decode_1000k/gcc        131MB/s ± 0%

mimic_gif_decode_1k               147MB/s ± 0%
mimic_gif_decode_10k             90.7MB/s ± 0%
mimic_gif_decode_100k            95.4MB/s ± 0%
mimic_gif_decode_1000k           97.8MB/s ± 0%
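
To read the table above as relative speed, each Wuffs number can be divided by the corresponding mimic number. The sketch below does that for the gcc rows, using only the figures listed above. The function name speedup is hypothetical, and the ± ranges are ignored, so the ratios are indicative only.

```shell
# speedup WUFFS_MBPS MIMIC_MBPS
# Hypothetical helper: prints the throughput ratio to two decimal places.
speedup() {
  awk -v w="$1" -v m="$2" 'BEGIN { printf "%.2f\n", w / m }'
}

speedup 399 147    # gif_decode_1k:    wuffs/gcc vs mimic
speedup 141 90.7   # gif_decode_10k
speedup 128 95.4   # gif_decode_100k
speedup 131 97.8   # gif_decode_1000k
```

This prints 2.71, 1.55, 1.34 and 1.34 respectively. The same function can be applied to rows of the Deflate table.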

TODO: investigate why gcc 4.8 (Ubuntu Trusty) seems to generate faster code than gcc 7.2 (Debian Testing):

wuffs_gif_decode_1k/gcc           411MB/s ± 2%
wuffs_gif_decode_10k/gcc          162MB/s ± 0%
wuffs_gif_decode_100k/gcc         138MB/s ± 0%
wuffs_gif_decode_1000k/gcc        141MB/s ± 0%

Updated in November 2017.