Summarized throughput numbers from wuffs bench -mimic, for various codecs, are below. Higher is better.
“Mimic” tests check that Wuffs' output mimics (i.e. exactly matches) other libraries' output. “Mimic” benchmarks give the numbers for those other libraries, as shipped with Debian. These were measured on a Debian Testing system as of October 2019, which meant these compiler versions:
and these “mimic” library versions, all written in C:
Unless otherwise stated, the numbers below were measured on an Intel x86_64 Broadwell, and were taken as of Wuffs git commit ffdce5ef “Have bench-rust-gif process animated / RGBA images”.
The benchmark programs aim to be runnable “out of the box” without any configuration or installation. For example, to run the std/zlib benchmarks:
    git clone https://github.com/google/wuffs.git
    cd wuffs
    gcc -O3 test/c/std/zlib.c
    ./a.out -bench
    rm a.out
A comment near the top of that .c file says how to run the mimic benchmarks.
The output of those benchmark programs is compatible with the benchstat tool. For example, that tool can calculate confidence intervals based on multiple benchmark runs, or calculate p-values when comparing numbers before and after a code change. To install it, first install Go, then run go install golang.org/x/perf/cmd/benchstat.
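As a rough illustration of the kind of input benchstat aggregates, the sketch below parses lines in the Go benchmark text format (a name starting with Benchmark, an iteration count, then value/unit pairs) and averages the MB/s figures per benchmark across runs. The sample lines, the benchmark name and the helper names are made up for illustration, and the exact fields that the Wuffs benchmark programs print are an assumption here. benchstat itself does considerably more (confidence intervals, p-values).

```python
from collections import defaultdict
from statistics import mean


def parse_bench_line(line):
    """Parse one Go-benchmark-format line into (name, {unit: value}).

    Returns None for lines that are not benchmark results (e.g. "PASS").
    """
    fields = line.split()
    if len(fields) < 4 or not fields[0].startswith("Benchmark"):
        return None
    try:
        int(fields[1])  # Second field is the iteration count.
    except ValueError:
        return None
    metrics = {}
    # The remaining fields come in (value, unit) pairs, e.g. "250.0 MB/s".
    for value, unit in zip(fields[2::2], fields[3::2]):
        try:
            metrics[unit] = float(value)
        except ValueError:
            return None
    return fields[0], metrics


def mean_throughput(lines):
    """Return {benchmark name: mean MB/s} over all runs found in lines."""
    samples = defaultdict(list)
    for line in lines:
        parsed = parse_bench_line(line)
        if parsed and "MB/s" in parsed[1]:
            samples[parsed[0]].append(parsed[1]["MB/s"])
    return {name: mean(values) for name, values in samples.items()}


# Hypothetical output from two runs of the same benchmark.
SAMPLE = """\
BenchmarkExample_decode_10k/gcc   10000   40000 ns/op   250.0 MB/s
BenchmarkExample_decode_10k/gcc   10000   41000 ns/op   246.0 MB/s
PASS
"""

if __name__ == "__main__":
    for name, speed in mean_throughput(SAMPLE.splitlines()).items():
        print(f"{name}: {speed:.1f} MB/s")
```

Averaging alone hides run-to-run variance, which is why feeding several runs to benchstat is preferable when comparing before and after a code change.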
As mentioned above, individual benchmark programs can be run manually. However, the canonical way to run the benchmarks (across multiple compilers and multiple packages like GIF and PNG) for Wuffs' standard library is to use the wuffs command line tool, as it will also re-generate (transpile) the C code whenever you edit the std/*/*.wuffs code. Running go install -v github.com/google/wuffs/cmd/... will install the Wuffs tools. After that, you can say
    wuffs bench
or
    wuffs bench -mimic std/deflate
or
    wuffs bench -ccompilers=gcc -reps=3 -focus=wuffs_gif_decode_20k std/gif
On some of the benchmarks below, clang performs noticeably worse (e.g. 1.3x slower) than gcc, on the same C code. A relatively simple reproduction was filed as LLVM bug 35567.
CPU power management can inject noise in benchmark times. On a Linux system, power management can be controlled with:
    # Query.
    cpupower --cpu all frequency-info --policy

    # Turn on.
    sudo cpupower frequency-set --governor powersave

    # Turn off.
    sudo cpupower frequency-set --governor performance
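Before a benchmark run, it can be handy to check the governor programmatically rather than by eye. The sketch below assumes the standard Linux cpufreq sysfs layout (/sys/devices/system/cpu/cpuN/cpufreq/scaling_governor), which is not described in this document and may be absent on some systems (for example inside containers or virtual machines). The function names are hypothetical.

```python
from pathlib import Path


def read_governors(cpu_root="/sys/devices/system/cpu"):
    """Return {cpu name: governor} for every CPU exposing a cpufreq policy.

    Returns an empty dict when no cpufreq files are found under cpu_root.
    """
    governors = {}
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for path in sorted(Path(cpu_root).glob(pattern)):
        # path is .../cpuN/cpufreq/scaling_governor, so cpuN is two levels up.
        governors[path.parent.parent.name] = path.read_text().strip()
    return governors


def noisy_cpus(governors, wanted="performance"):
    """Return the CPUs whose governor differs from the wanted one."""
    return sorted(cpu for cpu, gov in governors.items() if gov != wanted)


if __name__ == "__main__":
    found = read_governors()
    if not found:
        print("no cpufreq information found")
    for cpu in noisy_cpus(found):
        print(f"{cpu}: governor is {found[cpu]}, timings may be noisy")
```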
The 1k, 10k, etc. numbers are approximately how many bytes are hashed.
    name                       speed     vs_mimic

    wuffs_adler32_10k/clang8   2.41GB/s  0.84x
    wuffs_adler32_100k/clang8  2.42GB/s  0.84x

    wuffs_adler32_10k/gcc9     3.24GB/s  1.13x
    wuffs_adler32_100k/gcc9    3.24GB/s  1.12x

    mimic_adler32_10k          2.87GB/s  1.00x
    mimic_adler32_100k         2.90GB/s  1.00x
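The vs_mimic column appears to be the Wuffs speed divided by the mimic library's speed for the same benchmark, which is why the mimic rows read 1.00x. A small sketch of that arithmetic follows, under that assumption. The helper names parse_speed and vs_mimic are hypothetical, and 1GB/s is taken to be 1000MB/s.

```python
def parse_speed(text):
    """Convert a speed such as '3.24GB/s' or '160MB/s' to MB/s."""
    units = {"GB/s": 1000.0, "MB/s": 1.0}
    for suffix, scale in units.items():
        if text.endswith(suffix):
            return float(text[: -len(suffix)]) * scale
    raise ValueError(f"unrecognized speed: {text!r}")


def vs_mimic(wuffs_speed, mimic_speed):
    """Return the Wuffs-to-mimic speed ratio, rounded to two decimals."""
    return round(parse_speed(wuffs_speed) / parse_speed(mimic_speed), 2)


if __name__ == "__main__":
    # Figures from the 10k rows of the Adler-32 table above.
    print(vs_mimic("3.24GB/s", "2.87GB/s"))  # gcc9 versus mimic
    print(vs_mimic("2.41GB/s", "2.87GB/s"))  # clang8 versus mimic
```

Ratios recomputed this way from the rounded speeds can differ from the published column in the last digit, presumably because the published ratios were computed from unrounded measurements.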
The 1k, 10k, etc. numbers are approximately how many bytes are hashed.
    name                          speed     vs_mimic

    wuffs_crc32_ieee_10k/clang8   2.85GB/s  2.11x
    wuffs_crc32_ieee_100k/clang8  2.87GB/s  2.13x

    wuffs_crc32_ieee_10k/gcc9     3.38GB/s  2.50x
    wuffs_crc32_ieee_100k/gcc9    3.40GB/s  2.52x

    mimic_crc32_ieee_10k          1.35GB/s  1.00x
    mimic_crc32_ieee_100k         1.35GB/s  1.00x
The 1k, 10k, etc. numbers are approximately how many bytes there are in the decoded output.
The full_init vs part_init suffixes indicate whether WUFFS_INITIALIZE__LEAVE_INTERNAL_BUFFERS_UNINITIALIZED is unset or set, respectively.
    name                                             speed     vs_mimic

    wuffs_deflate_decode_1k_full_init/clang8         160MB/s   0.74x
    wuffs_deflate_decode_1k_part_init/clang8         199MB/s   0.92x
    wuffs_deflate_decode_10k_full_init/clang8        255MB/s   0.94x
    wuffs_deflate_decode_10k_part_init/clang8        263MB/s   0.97x
    wuffs_deflate_decode_100k_just_one_read/clang8   306MB/s   0.93x
    wuffs_deflate_decode_100k_many_big_reads/clang8  250MB/s   0.98x

    wuffs_deflate_decode_1k_full_init/gcc9           164MB/s   0.76x
    wuffs_deflate_decode_1k_part_init/gcc9           207MB/s   0.95x
    wuffs_deflate_decode_10k_full_init/gcc9          247MB/s   0.91x
    wuffs_deflate_decode_10k_part_init/gcc9          254MB/s   0.94x
    wuffs_deflate_decode_100k_just_one_read/gcc9     333MB/s   1.01x
    wuffs_deflate_decode_100k_many_big_reads/gcc9    261MB/s   1.02x

    mimic_deflate_decode_1k                          217MB/s   1.00x
    mimic_deflate_decode_10k                         270MB/s   1.00x
    mimic_deflate_decode_100k_just_one_read          329MB/s   1.00x
    mimic_deflate_decode_100k_many_big_reads         256MB/s   1.00x
32-bit ARMv7 (2012 era Samsung Exynos 5 Chromebook), Debian Stretch (2017):
    name                                             speed     vs_mimic

    wuffs_deflate_decode_1k_full_init/clang5         30.4MB/s  0.60x
    wuffs_deflate_decode_1k_part_init/clang5         37.9MB/s  0.74x
    wuffs_deflate_decode_10k_full_init/clang5        72.8MB/s  0.81x
    wuffs_deflate_decode_10k_part_init/clang5        76.2MB/s  0.85x
    wuffs_deflate_decode_100k_just_one_read/clang5   96.5MB/s  0.82x
    wuffs_deflate_decode_100k_many_big_reads/clang5  81.1MB/s  0.90x

    wuffs_deflate_decode_1k_full_init/gcc6           31.6MB/s  0.62x
    wuffs_deflate_decode_1k_part_init/gcc6           39.9MB/s  0.78x
    wuffs_deflate_decode_10k_full_init/gcc6          69.6MB/s  0.78x
    wuffs_deflate_decode_10k_part_init/gcc6          72.4MB/s  0.81x
    wuffs_deflate_decode_100k_just_one_read/gcc6     87.3MB/s  0.74x
    wuffs_deflate_decode_100k_many_big_reads/gcc6    73.8MB/s  0.82x

    mimic_deflate_decode_1k                          51.0MB/s  1.00x
    mimic_deflate_decode_10k                         89.7MB/s  1.00x
    mimic_deflate_decode_100k_just_one_read          118MB/s   1.00x
    mimic_deflate_decode_100k_many_big_reads         90.0MB/s  1.00x
For comparison, here are miniz 2.1.0's numbers.
    name                                            speed     vs_mimic

    miniz_deflate_decode_1k/clang8                  174MB/s   0.80x
    miniz_deflate_decode_10k/clang8                 245MB/s   0.91x
    miniz_deflate_decode_100k_just_one_read/clang8  309MB/s   0.94x

    miniz_deflate_decode_1k/gcc9                    158MB/s   0.73x
    miniz_deflate_decode_10k/gcc9                   221MB/s   0.82x
    miniz_deflate_decode_100k_just_one_read/gcc9    250MB/s   0.76x
To reproduce these numbers, look in test/c/mimiclib/deflate-gzip-zlib.c.
For comparison, here are Go 1.12.10's numbers, using Go's standard library's compress/flate package.
    name                    speed     vs_mimic

    go_deflate_decode_1k    45.4MB/s  0.21x
    go_deflate_decode_10k   82.5MB/s  0.31x
    go_deflate_decode_100k  94.0MB/s  0.29x
To reproduce these numbers:
    git clone https://github.com/google/wuffs.git
    cd wuffs/script/bench-go-deflate/
    go run main.go
For comparison, here are Rust 1.37.0's numbers, using the alexcrichton/flate2-rs and Frommi/miniz_oxide crates, which this file suggests is the fastest pure-Rust Deflate decoder.
    name                      speed     vs_mimic

    rust_deflate_decode_1k    104MB/s   0.48x
    rust_deflate_decode_10k   202MB/s   0.75x
    rust_deflate_decode_100k  218MB/s   0.66x
To reproduce these numbers:
    git clone https://github.com/google/wuffs.git
    cd wuffs/script/bench-rust-deflate/
    cargo run --release
The 1k, 10k, etc. numbers are approximately how many pixels there are in the decoded image. For example, the test/data/harvesters.* images are 1165 × 859, approximately 1000k pixels.
The bgra vs indexed suffixes indicate whether to decode to 4 bytes (BGRA or RGBA) or 1 byte (a palette index) per pixel, even if the underlying file format gives 1 byte per pixel.
The full_init vs part_init suffixes indicate whether WUFFS_INITIALIZE__LEAVE_INTERNAL_BUFFERS_UNINITIALIZED is unset or set, respectively.
The libgif library doesn't export any API for decode-to-BGRA or decode-to-RGBA, so there are no mimic numbers to compare to for the bgra suffix.
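To make the pixel counts and the bgra vs indexed distinction concrete, here is a small sketch that works out the pixel count and the decoded buffer size for the harvesters image dimensions quoted above. The function and table names are hypothetical. Whether the benchmark's MB/s figures are measured against these output sizes is not stated here, so treat the byte counts as illustrative only.

```python
# Bytes written per decoded pixel, per the suffix descriptions above.
BYTES_PER_PIXEL = {"indexed": 1, "bgra": 4}


def decoded_size(width, height, pixel_format):
    """Return (pixel count, decoded byte count) for one frame."""
    pixels = width * height
    return pixels, pixels * BYTES_PER_PIXEL[pixel_format]


if __name__ == "__main__":
    for fmt in ("indexed", "bgra"):
        pixels, size = decoded_size(1165, 859, fmt)
        print(f"{fmt}: {pixels} pixels, {size} bytes")
```

For 1165 × 859 this gives 1,000,735 pixels, which is where the "1000k" label comes from. Decoding to bgra writes four times as many bytes as decoding to indexed.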
    name                                        speed     vs_mimic

    wuffs_gif_decode_1k_bw/clang8               461MB/s   3.18x
    wuffs_gif_decode_1k_color_full_init/clang8  141MB/s   1.85x
    wuffs_gif_decode_1k_color_part_init/clang8  189MB/s   2.48x
    wuffs_gif_decode_10k_bgra/clang8            743MB/s   n/a
    wuffs_gif_decode_10k_indexed/clang8         200MB/s   2.11x
    wuffs_gif_decode_20k/clang8                 245MB/s   2.50x
    wuffs_gif_decode_100k_artificial/clang8     531MB/s   3.43x
    wuffs_gif_decode_100k_realistic/clang8      218MB/s   2.27x
    wuffs_gif_decode_1000k_full_init/clang8     221MB/s   2.25x
    wuffs_gif_decode_1000k_part_init/clang8     221MB/s   2.25x
    wuffs_gif_decode_anim_screencap/clang8      1.07GB/s  6.01x

    wuffs_gif_decode_1k_bw/gcc9                 478MB/s   3.30x
    wuffs_gif_decode_1k_color_full_init/gcc9    148MB/s   1.94x
    wuffs_gif_decode_1k_color_part_init/gcc9    194MB/s   2.54x
    wuffs_gif_decode_10k_bgra/gcc9              645MB/s   n/a
    wuffs_gif_decode_10k_indexed/gcc9           203MB/s   2.14x
    wuffs_gif_decode_20k/gcc9                   244MB/s   2.49x
    wuffs_gif_decode_100k_artificial/gcc9       532MB/s   3.43x
    wuffs_gif_decode_100k_realistic/gcc9        214MB/s   2.23x
    wuffs_gif_decode_1000k_full_init/gcc9       217MB/s   2.21x
    wuffs_gif_decode_1000k_part_init/gcc9       218MB/s   2.22x
    wuffs_gif_decode_anim_screencap/gcc9        1.11GB/s  6.24x

    mimic_gif_decode_1k_bw                      145MB/s   1.00x
    mimic_gif_decode_1k_color                   76.3MB/s  1.00x
    mimic_gif_decode_10k_indexed                94.9MB/s  1.00x
    mimic_gif_decode_20k                        98.1MB/s  1.00x
    mimic_gif_decode_100k_artificial            155MB/s   1.00x
    mimic_gif_decode_100k_realistic             96.1MB/s  1.00x
    mimic_gif_decode_1000k                      98.4MB/s  1.00x
    mimic_gif_decode_anim_screencap             178MB/s   1.00x
32-bit ARMv7 (2012 era Samsung Exynos 5 Chromebook), Debian Stretch (2017):
    name                                        speed     vs_mimic

    wuffs_gif_decode_1k_bw/clang5               49.1MB/s  1.76x
    wuffs_gif_decode_1k_color_full_init/clang5  22.3MB/s  1.35x
    wuffs_gif_decode_1k_color_part_init/clang5  27.4MB/s  1.66x
    wuffs_gif_decode_10k_bgra/clang5            157MB/s   n/a
    wuffs_gif_decode_10k_indexed/clang5         42.0MB/s  1.79x
    wuffs_gif_decode_20k/clang5                 49.3MB/s  1.68x
    wuffs_gif_decode_100k_artificial/clang5     132MB/s   2.62x
    wuffs_gif_decode_100k_realistic/clang5      47.8MB/s  1.62x
    wuffs_gif_decode_1000k_full_init/clang5     46.4MB/s  1.62x
    wuffs_gif_decode_1000k_part_init/clang5     46.4MB/s  1.62x
    wuffs_gif_decode_anim_screencap/clang5      243MB/s   4.03x

    wuffs_gif_decode_1k_bw/gcc6                 46.6MB/s  1.67x
    wuffs_gif_decode_1k_color_full_init/gcc6    20.1MB/s  1.22x
    wuffs_gif_decode_1k_color_part_init/gcc6    24.2MB/s  1.47x
    wuffs_gif_decode_10k_bgra/gcc6              124MB/s   n/a
    wuffs_gif_decode_10k_indexed/gcc6           34.8MB/s  1.49x
    wuffs_gif_decode_20k/gcc6                   43.8MB/s  1.49x
    wuffs_gif_decode_100k_artificial/gcc6       123MB/s   2.44x
    wuffs_gif_decode_100k_realistic/gcc6        42.7MB/s  1.44x
    wuffs_gif_decode_1000k_full_init/gcc6       41.6MB/s  1.45x
    wuffs_gif_decode_1000k_part_init/gcc6       41.7MB/s  1.45x
    wuffs_gif_decode_anim_screencap/gcc6        227MB/s   3.76x

    mimic_gif_decode_1k_bw                      27.9MB/s  1.00x
    mimic_gif_decode_1k_color                   16.5MB/s  1.00x
    mimic_gif_decode_10k_indexed                23.4MB/s  1.00x
    mimic_gif_decode_20k                        29.4MB/s  1.00x
    mimic_gif_decode_100k_artificial            50.4MB/s  1.00x
    mimic_gif_decode_100k_realistic             29.5MB/s  1.00x
    mimic_gif_decode_1000k                      28.7MB/s  1.00x
    mimic_gif_decode_anim_screencap             60.3MB/s  1.00x
For comparison, here are Go 1.12.10's numbers, using Go's standard library's image/gif package.
    name                           speed     vs_mimic

    go_gif_decode_1k_bw            107MB/s   0.74x
    go_gif_decode_1k_color         39.2MB/s  0.51x
    go_gif_decode_10k_bgra         117MB/s   n/a
    go_gif_decode_10k_indexed      57.8MB/s  0.61x
    go_gif_decode_20k              67.2MB/s  0.69x
    go_gif_decode_100k_artificial  151MB/s   0.97x
    go_gif_decode_100k_realistic   67.2MB/s  0.70x
    go_gif_decode_1000k            68.1MB/s  0.69x
    go_gif_decode_anim_screencap   206MB/s   1.16x
To reproduce these numbers:
    git clone https://github.com/google/wuffs.git
    cd wuffs/script/bench-go-gif/
    go run main.go
For comparison, here are Rust 1.37.0's numbers, using the image-rs/image-gif crate, easily the top crates.io result for “gif”.
    name                             speed     vs_mimic

    rust_gif_decode_1k_bw            89.2MB/s  0.62x
    rust_gif_decode_1k_color         20.7MB/s  0.27x
    rust_gif_decode_10k_bgra         74.5MB/s  n/a
    rust_gif_decode_10k_indexed      20.4MB/s  0.21x
    rust_gif_decode_20k              28.9MB/s  0.29x
    rust_gif_decode_100k_artificial  79.1MB/s  0.51x
    rust_gif_decode_100k_realistic   27.9MB/s  0.29x
    rust_gif_decode_1000k            27.9MB/s  0.28x
    rust_gif_decode_anim_screencap   144MB/s   0.81x
To reproduce these numbers:
    git clone https://github.com/google/wuffs.git
    cd wuffs/script/bench-rust-gif/
    cargo run --release
The 1k, 10k, etc. numbers are approximately how many bytes there are in the decoded output.
    name                           speed     vs_mimic

    wuffs_gzip_decode_10k/clang8   238MB/s   1.05x
    wuffs_gzip_decode_100k/clang8  273MB/s   1.03x

    wuffs_gzip_decode_10k/gcc9     239MB/s   1.06x
    wuffs_gzip_decode_100k/gcc9    297MB/s   1.12x

    mimic_gzip_decode_10k          226MB/s   1.00x
    mimic_gzip_decode_100k         265MB/s   1.00x
The 1k, 10k, etc. numbers are approximately how many bytes there are in the decoded output.
The libgif library doesn't export its LZW decoder in its API, so there are no mimic numbers to compare to.
    name                          speed     vs_mimic

    wuffs_lzw_decode_20k/clang8   263MB/s   n/a
    wuffs_lzw_decode_100k/clang8  438MB/s   n/a

    wuffs_lzw_decode_20k/gcc9     266MB/s   n/a
    wuffs_lzw_decode_100k/gcc9    450MB/s   n/a
The 1k, 10k, etc. numbers are approximately how many bytes there are in the decoded output.
    name                           speed     vs_mimic

    wuffs_zlib_decode_10k/clang8   237MB/s   0.96x
    wuffs_zlib_decode_100k/clang8  272MB/s   0.92x

    wuffs_zlib_decode_10k/gcc9     242MB/s   0.98x
    wuffs_zlib_decode_100k/gcc9    294MB/s   0.99x

    mimic_zlib_decode_10k          247MB/s   1.00x
    mimic_zlib_decode_100k         296MB/s   1.00x
Updated in December 2019.