Make a small decoder optimization: use uint32 instead of uint (s/uint/uint32/).

I'm not entirely sure why the benchmark numbers improve as much as they
do, but I'll take it.

benchmark                     old MB/s     new MB/s     speedup
BenchmarkWordsDecode1e1-8     482.12       485.84       1.01x
BenchmarkWordsDecode1e2-8     372.28       421.86       1.13x
BenchmarkWordsDecode1e3-8     482.21       525.44       1.09x
BenchmarkWordsDecode1e4-8     339.46       360.87       1.06x
BenchmarkWordsDecode1e5-8     264.90       270.42       1.02x
BenchmarkWordsDecode1e6-8     284.27       290.98       1.02x
Benchmark_UFlat0-8            511.15       544.02       1.06x
Benchmark_UFlat1-8            431.52       450.03       1.04x
Benchmark_UFlat2-8            15208.70     15099.07     0.99x
Benchmark_UFlat3-8            805.02       871.78       1.08x
Benchmark_UFlat4-8            2631.19      2980.30      1.13x
Benchmark_UFlat5-8            501.62       535.45       1.07x
Benchmark_UFlat6-8            271.30       278.13       1.03x
Benchmark_UFlat7-8            265.19       272.14       1.03x
Benchmark_UFlat8-8            282.54       288.80       1.02x
Benchmark_UFlat9-8            256.39       262.69       1.02x
Benchmark_UFlat10-8           590.37       640.96       1.09x
Benchmark_UFlat11-8           339.13       357.01       1.05x
1 file changed