Make a small s/uint/uint32/ decoder optimization.
I'm not entirely sure why the benchmark numbers improve as much as they
do, but I'll take it.
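For illustration only, here is a minimal sketch of the kind of substitution meant, assuming it applies to the arithmetic that decodes a literal's length from its tag byte. This is not the actual diff: literalLen is a hypothetical helper, and the comment marking the "before" form is an assumption.

```go
package main

import "fmt"

// literalLen is a hypothetical helper (not taken from the actual change).
// It decodes the length of a Snappy literal element that starts at src[0]
// and returns that length plus the number of header bytes consumed.
func literalLen(src []byte) (length, n int, ok bool) {
	// The low two bits of the tag byte must be 00 for a literal.
	if len(src) == 0 || src[0]&0x03 != 0 {
		return 0, 0, false
	}
	// Assumed "before" form: x := uint(src[0] >> 2).
	// A fixed-width uint32 is enough, since a length header is at most 4 bytes.
	x := uint32(src[0] >> 2)
	switch {
	case x < 60:
		// The length minus one is stored in the tag byte itself.
		n = 1
	case x == 60:
		if len(src) < 2 {
			return 0, 0, false
		}
		x = uint32(src[1])
		n = 2
	case x == 61:
		if len(src) < 3 {
			return 0, 0, false
		}
		x = uint32(src[1]) | uint32(src[2])<<8
		n = 3
	case x == 62:
		if len(src) < 4 {
			return 0, 0, false
		}
		x = uint32(src[1]) | uint32(src[2])<<8 | uint32(src[3])<<16
		n = 4
	default: // x == 63
		if len(src) < 5 {
			return 0, 0, false
		}
		x = uint32(src[1]) | uint32(src[2])<<8 | uint32(src[3])<<16 | uint32(src[4])<<24
		n = 5
	}
	length = int(x) + 1
	if length <= 0 {
		// The conversion overflowed, which can happen where int is 32 bits.
		return 0, 0, false
	}
	return length, n, true
}

func main() {
	// Tag 0xf0 means "one extra length byte follows"; 0x63 encodes length-1.
	length, n, ok := literalLen([]byte{0xf0, 0x63})
	fmt.Println(length, n, ok)
}
```

Whether the narrower type accounts for the speedup is not something this sketch demonstrates; it only shows the shape of the substitution.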
benchmark                     old MB/s     new MB/s     speedup
BenchmarkWordsDecode1e1-8     482.12       485.84       1.01x
BenchmarkWordsDecode1e2-8     372.28       421.86       1.13x
BenchmarkWordsDecode1e3-8     482.21       525.44       1.09x
BenchmarkWordsDecode1e4-8     339.46       360.87       1.06x
BenchmarkWordsDecode1e5-8     264.90       270.42       1.02x
BenchmarkWordsDecode1e6-8     284.27       290.98       1.02x
Benchmark_UFlat0-8            511.15       544.02       1.06x
Benchmark_UFlat1-8            431.52       450.03       1.04x
Benchmark_UFlat2-8            15208.70     15099.07     0.99x
Benchmark_UFlat3-8            805.02       871.78       1.08x
Benchmark_UFlat4-8            2631.19      2980.30      1.13x
Benchmark_UFlat5-8            501.62       535.45       1.07x
Benchmark_UFlat6-8            271.30       278.13       1.03x
Benchmark_UFlat7-8            265.19       272.14       1.03x
Benchmark_UFlat8-8            282.54       288.80       1.02x
Benchmark_UFlat9-8            256.39       262.69       1.02x
Benchmark_UFlat10-8           590.37       640.96       1.09x
Benchmark_UFlat11-8           339.13       357.01       1.05x
1 file changed