Restrict the scope of the tableSize variable.

This is purely a style change, not an optimization. For the record, the
benchmark numbers below show no strong change either way: the largest
delta is +1.13%, and the differences could easily just be noise.

name              old speed      new speed      delta
WordsEncode1e1-8  667MB/s ± 0%   665MB/s ± 0%   ~       (p=0.190 n=5+4)
WordsEncode1e2-8  85.1MB/s ± 0%  85.0MB/s ± 0%  ~       (p=0.556 n=5+4)
WordsEncode1e3-8  235MB/s ± 0%   234MB/s ± 2%   ~       (p=0.690 n=5+5)
WordsEncode1e4-8  234MB/s ± 0%   233MB/s ± 0%   ~       (p=0.151 n=5+5)
WordsEncode1e5-8  216MB/s ± 0%   214MB/s ± 1%   -0.61%  (p=0.008 n=5+5)
WordsEncode1e6-8  258MB/s ± 0%   258MB/s ± 0%   -0.29%  (p=0.024 n=5+5)
RandomEncode-8    13.2GB/s ± 1%  13.1GB/s ± 1%  ~       (p=0.056 n=5+5)
_ZFlat0-8         629MB/s ± 0%   630MB/s ± 0%   ~       (p=0.111 n=5+4)
_ZFlat1-8         325MB/s ± 0%   326MB/s ± 0%   +0.27%  (p=0.016 n=5+4)
_ZFlat2-8         13.7GB/s ± 5%  13.9GB/s ± 1%  ~       (p=0.310 n=5+5)
_ZFlat3-8         177MB/s ± 0%   177MB/s ± 1%   ~       (p=0.690 n=5+5)
_ZFlat4-8         6.15GB/s ± 2%  6.19GB/s ± 1%  ~       (p=0.222 n=5+5)
_ZFlat5-8         614MB/s ± 0%   615MB/s ± 0%   ~       (p=0.310 n=5+5)
_ZFlat6-8         231MB/s ± 2%   231MB/s ± 0%   ~       (p=0.690 n=5+5)
_ZFlat7-8         215MB/s ± 2%   215MB/s ± 1%   ~       (p=0.222 n=5+5)
_ZFlat8-8         246MB/s ± 0%   246MB/s ± 0%   ~       (p=0.190 n=4+5)
_ZFlat9-8         202MB/s ± 0%   202MB/s ± 0%   ~       (p=0.683 n=4+5)
_ZFlat10-8        794MB/s ± 2%   803MB/s ± 0%   +1.13%  (p=0.008 n=5+5)
_ZFlat11-8        350MB/s ± 0%   351MB/s ± 0%   +0.25%  (p=0.032 n=4+5)
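
For readers who want to check that the restructured loop computes the same shift, here is a minimal standalone sketch. The helper name hashShift is hypothetical, and the value of maxTableSize (1 << 14) is an assumption, since the diff does not show its definition.

```go
package main

import "fmt"

// maxTableSize is assumed to be 1<<14 for this sketch; the diff below
// only shows that it is a constant, not its value.
const maxTableSize = 1 << 14

// hashShift is a hypothetical helper that isolates the loop from the
// patch. tableSize is scoped to the for statement, so only shift
// survives the loop.
func hashShift(srcLen int) uint32 {
	shift := uint32(32 - 8)
	for tableSize := 1 << 8; tableSize < maxTableSize && tableSize < srcLen; tableSize *= 2 {
		shift--
	}
	return shift
}

func main() {
	for _, n := range []int{100, 1000, 100000} {
		fmt.Printf("len(src)=%d shift=%d\n", n, hashShift(n))
	}
}
```

Under that assumed maxTableSize, inputs of 256 bytes or fewer keep the initial shift of 24, and the shift bottoms out at 18 once the table size reaches the maximum.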
diff --git a/encode.go b/encode.go
index 77eb280..da5fb2b 100644
--- a/encode.go
+++ b/encode.go
@@ -103,11 +103,14 @@
 		// checks.
 		tableMask = maxTableSize - 1
 	)
-	shift, tableSize := uint32(32-8), 1<<8
-	for tableSize < maxTableSize && tableSize < len(src) {
+	shift := uint32(32 - 8)
+	for tableSize := 1 << 8; tableSize < maxTableSize && tableSize < len(src); tableSize *= 2 {
 		shift--
-		tableSize *= 2
 	}
+	// In Go, all array elements are zero-initialized, so there is no advantage
+	// to a smaller tableSize per se. However, it matches the C++ algorithm,
+	// and in the asm versions of this code, we can get away with zeroing only
+	// the first tableSize elements.
 	var table [maxTableSize]uint16
 
 	// sLimit is when to stop looking for offset/length copies. The inputMargin