Benchmarks

Preliminary throughput numbers for various codecs, as summarized by puffs bench -mimic, are below. Higher is better.

“Mimic” tests check that Puffs' output mimics (i.e. exactly matches) other libraries' output. “Mimic” benchmarks give the numbers for those other libraries, as shipped with the OS. These numbers were measured on a Debian Testing system as of November 2017, with the following compiler versions:

  • clang/llvm 5.0.0
  • gcc 7.2.0

and these mimic library versions:

  • libgif 5.1.4
  • zlib 1.2.8

Unless otherwise stated, the numbers below were measured on an Intel x86_64 Broadwell, and were taken as of git commit “693af47 Rename some bcheckOptimizeXxx methods.”

Reproducing

The benchmark programs aim to be runnable “out of the box” without any configuration or installation. For example, to run the std/flate benchmarks:

git clone https://github.com/google/puffs.git
cd puffs/test/c/std
gcc -O3 flate.c
./a.out -bench
rm a.out

A comment near the top of that .c file says how to run the mimic benchmarks.

The output of those benchmark programs is compatible with the benchstat tool. For example, that tool can calculate confidence intervals based on multiple benchmark runs, or calculate p-values when comparing numbers before and after a code change. To install it, first install Go, then run go get golang.org/x/perf/cmd/benchstat.
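As a rough illustration of what a benchstat-compatible result line contains, here is a minimal C sketch. The bench_result type and the bench_mb_per_s and format_bench_line functions are hypothetical names, not Puffs' actual harness. The sketch assumes the usual Go benchmark line layout (name, iteration count, then value/unit pairs) and takes 1 MB as 10^6 bytes, as Go's benchmark output does.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical summary of one benchmark run; not Puffs' real API. */
typedef struct {
  const char* name;         /* e.g. "Benchmarkpuffs_flate_decode_10k/gcc" */
  uint64_t iters;           /* how many times the work loop ran */
  uint64_t bytes_per_iter;  /* decoded bytes produced per iteration */
  uint64_t elapsed_ns;      /* total wall-clock time for all iterations */
} bench_result;

/* Throughput in MB/s, taking 1 MB = 1e6 bytes (the Go benchmark convention). */
static double bench_mb_per_s(const bench_result* r) {
  assert(r->elapsed_ns > 0);
  double bytes = (double)r->bytes_per_iter * (double)r->iters;
  double seconds = (double)r->elapsed_ns / 1e9;
  return bytes / seconds / 1e6;
}

/* Formats one line in the Go benchmark format that benchstat parses:
 * name, iteration count, then value/unit pairs, separated by tabs.
 * Returns snprintf's result: the untruncated length, or negative on error. */
static int format_bench_line(char* buf, size_t n, const bench_result* r) {
  assert(r->iters > 0);
  double ns_per_op = (double)r->elapsed_ns / (double)r->iters;
  return snprintf(buf, n, "%s\t%llu\t%.0f ns/op\t%.2f MB/s", r->name,
                  (unsigned long long)r->iters, ns_per_op, bench_mb_per_s(r));
}
```

For example, 1000 iterations that each decode 10000 bytes in a total of 40 ms format as 40000 ns/op and 250.00 MB/s. A real harness would measure elapsed_ns itself (for instance with clock_gettime(CLOCK_MONOTONIC, ...)) and print one such line per run; benchstat then summarizes several runs in the same name/speed style as the tables below.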

puffs bench

As mentioned above, individual benchmark programs can be run manually. However, the canonical way to run the benchmarks (across multiple compilers and multiple packages like GIF and PNG) is to use the puffs command line tool, as it will also re-generate (transpile) the C code whenever you edit the *.puffs code. Running go install -v github.com/google/puffs/cmd/... will install the Puffs tools. After that, you can run

puffs bench

or

puffs bench -mimic std/flate

or

puffs bench -ccompilers=gcc -reps=3 -focus=Benchmarkpuffs_gif_lzw std/gif

CPU Scaling

CPU power management can inject noise into benchmark times. On a Linux system, power management can be controlled with:

# Query.
cpupower --cpu all frequency-info --policy
# Turn on.
sudo cpupower frequency-set --governor powersave
# Turn off.
sudo cpupower frequency-set --governor performance
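A benchmark script could also check the governor before running. The following C sketch reads a cpufreq sysfs file; the path /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor is an assumption (it covers only cpu0 and exists only when a cpufreq driver is loaded), and read_governor and governor_is_performance are hypothetical helper names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Reads the first line of a cpufreq governor file (for example
 * /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor) into buf,
 * without the trailing newline. Returns 0 on success, -1 on failure. */
static int read_governor(const char* path, char* buf, size_t n) {
  assert(buf != NULL && n > 0);
  FILE* f = fopen(path, "r");
  if (f == NULL) {
    return -1;  /* no cpufreq support, or the path does not exist */
  }
  char* ok = fgets(buf, (int)n, f);
  fclose(f);
  if (ok == NULL) {
    return -1;
  }
  buf[strcspn(buf, "\n")] = '\0';  /* strip the trailing newline */
  return 0;
}

/* Returns 1 if the governor is "performance" (frequency scaling turned off,
 * as with the "Turn off." command above), 0 for any other governor, and -1
 * if the file could not be read. */
static int governor_is_performance(const char* path) {
  char gov[64];
  if (read_governor(path, gov, sizeof gov) != 0) {
    return -1;
  }
  return strcmp(gov, "performance") == 0;
}
```

A caller might print a warning and continue when this returns 0, rather than refusing to run, since the numbers are still usable, just noisier.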

Flate (including gzip and zlib)

The 1k, 10k, etc. numbers are approximately how many bytes there are in the decoded output.

name                             speed

puffs_adler32_10k/clang          2.42GB/s ± 3%
puffs_adler32_100k/clang         2.42GB/s ± 3%
puffs_flate_decode_1k/clang       133MB/s ± 3%
puffs_flate_decode_10k/clang      199MB/s ± 3%
puffs_flate_decode_100k/clang     229MB/s ± 4%
puffs_zlib_decode_10k/clang       185MB/s ± 1%
puffs_zlib_decode_100k/clang      210MB/s ± 3%

puffs_adler32_10k/gcc            3.22GB/s ± 1%
puffs_adler32_100k/gcc           3.23GB/s ± 0%
puffs_flate_decode_1k/gcc         152MB/s ± 1%
puffs_flate_decode_10k/gcc        250MB/s ± 1%
puffs_flate_decode_100k/gcc       298MB/s ± 1%
puffs_zlib_decode_10k/gcc         238MB/s ± 2%
puffs_zlib_decode_100k/gcc        270MB/s ± 2%

mimic_adler32_10k                3.00GB/s ± 1%
mimic_adler32_100k               2.91GB/s ± 2%
mimic_flate_decode_1k             211MB/s ± 1%
mimic_flate_decode_10k            270MB/s ± 2%
mimic_flate_decode_100k           285MB/s ± 1%
mimic_zlib_decode_10k             250MB/s ± 2%
mimic_zlib_decode_100k            294MB/s ± 2%
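The adler32 rows measure the Adler-32 checksum, which the zlib format (RFC 1950) adds as a trailer after a raw DEFLATE stream; verifying it is part of the extra work zlib_decode does compared to flate_decode. For reference, here is a minimal, deliberately unoptimized C sketch of the checksum. It is not the Puffs implementation, and adler32_simple is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Straightforward Adler-32 (RFC 1950). a starts at 1 and accumulates the
 * bytes; b accumulates each successive value of a. Both are reduced modulo
 * 65521, the largest prime below 2^16, and the result is (b << 16) | a. */
static uint32_t adler32_simple(const uint8_t* p, size_t n) {
  assert(p != NULL || n == 0);
  const uint32_t mod = 65521;
  uint32_t a = 1;
  uint32_t b = 0;
  for (size_t i = 0; i < n; i++) {
    a = (a + p[i]) % mod;  /* reduce on every byte: simple but slow */
    b = (b + a) % mod;
  }
  return (b << 16) | a;
}
```

For example, the checksum of the 9 bytes "Wikipedia" is 0x11E60398. Optimized implementations avoid the per-byte modulo; zlib, for instance, defers it across blocks of up to 5552 bytes (its NMAX constant).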

GIF

The 1k, 10k, etc. numbers are approximately how many bytes of pixel data there are in the decoded image. For example, the test/testdata/harvesters.* images are 1165 × 859, which is approximately 1000k pixels, and a GIF image (a paletted image) uses 1 byte per pixel.

name                             speed

puffs_gif_decode_1k/clang         346MB/s ± 1%
puffs_gif_decode_10k/clang        137MB/s ± 0%
puffs_gif_decode_100k/clang       118MB/s ± 0%
puffs_gif_decode_1000k/clang      120MB/s ± 0%

puffs_gif_decode_1k/gcc           399MB/s ± 1%
puffs_gif_decode_10k/gcc          141MB/s ± 0%
puffs_gif_decode_100k/gcc         128MB/s ± 0%
puffs_gif_decode_1000k/gcc        131MB/s ± 0%

mimic_gif_decode_1k               147MB/s ± 0%
mimic_gif_decode_10k             90.7MB/s ± 0%
mimic_gif_decode_100k            95.4MB/s ± 0%
mimic_gif_decode_1000k           97.8MB/s ± 0%
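Combining the sizing rule with the table gives a per-image decode time. In this C sketch, paletted_pixel_bytes and decode_ms are hypothetical helper names, and 1 MB is taken as 10^6 bytes. The harvesters image is 1165 × 859 = 1,000,735 bytes of pixel data, so at 131MB/s (gcc) it decodes in roughly 7.6 ms, versus roughly 10.2 ms at 97.8MB/s for the mimic library.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded pixel data for a paletted image such as a GIF: one palette-index
 * byte per pixel. Hypothetical helper name. */
static uint64_t paletted_pixel_bytes(uint32_t width, uint32_t height) {
  return (uint64_t)width * (uint64_t)height;
}

/* Milliseconds to decode `bytes` of output at `mb_per_s`, taking
 * 1 MB = 1e6 bytes. Hypothetical helper name. */
static double decode_ms(uint64_t bytes, double mb_per_s) {
  assert(mb_per_s > 0.0);
  return (double)bytes / (mb_per_s * 1e6) * 1e3;
}
```

These are estimates from summarized throughput figures, so they inherit the ± noise of the rows they are computed from.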

TODO: investigate why gcc 4.8 (Ubuntu Trusty) seems to generate faster code than gcc 7.2 (Debian Testing):

puffs_gif_decode_1k/gcc           411MB/s ± 2%
puffs_gif_decode_10k/gcc          162MB/s ± 0%
puffs_gif_decode_100k/gcc         138MB/s ± 0%
puffs_gif_decode_1000k/gcc        141MB/s ± 0%
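To size the gap in that TODO, the relative difference per row can be computed as below (percent_higher is a hypothetical helper name, and the MB/s values are treated as exact, ignoring their ± noise). By this arithmetic, the gcc 4.8 numbers are about 3%, 15%, 8% and 8% higher than the gcc 7.2 numbers for the 1k, 10k, 100k and 1000k cases.

```c
#include <assert.h>

/* Percentage by which `faster` exceeds `baseline`, e.g. gcc 4.8 vs gcc 7.2
 * throughput for the same benchmark. */
static double percent_higher(double faster, double baseline) {
  assert(baseline > 0.0);
  return (faster - baseline) / baseline * 100.0;
}
```

For instance, percent_higher(162.0, 141.0) is about 14.9 for the 10k case.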

Updated in November 2017.