Restrict the scope of the tableSize variable.

This is a style change, not an optimization. For the record, the
benchmarks show no strong change either way: the few statistically
significant deltas are small (at most 1.13%), point in both directions,
and could easily be noise.

name              old speed      new speed      delta
WordsEncode1e1-8   667MB/s ± 0%   665MB/s ± 0%    ~     (p=0.190 n=5+4)
WordsEncode1e2-8  85.1MB/s ± 0%  85.0MB/s ± 0%    ~     (p=0.556 n=5+4)
WordsEncode1e3-8   235MB/s ± 0%   234MB/s ± 2%    ~     (p=0.690 n=5+5)
WordsEncode1e4-8   234MB/s ± 0%   233MB/s ± 0%    ~     (p=0.151 n=5+5)
WordsEncode1e5-8   216MB/s ± 0%   214MB/s ± 1%  -0.61%  (p=0.008 n=5+5)
WordsEncode1e6-8   258MB/s ± 0%   258MB/s ± 0%  -0.29%  (p=0.024 n=5+5)
RandomEncode-8    13.2GB/s ± 1%  13.1GB/s ± 1%    ~     (p=0.056 n=5+5)
_ZFlat0-8          629MB/s ± 0%   630MB/s ± 0%    ~     (p=0.111 n=5+4)
_ZFlat1-8          325MB/s ± 0%   326MB/s ± 0%  +0.27%  (p=0.016 n=5+4)
_ZFlat2-8         13.7GB/s ± 5%  13.9GB/s ± 1%    ~     (p=0.310 n=5+5)
_ZFlat3-8          177MB/s ± 0%   177MB/s ± 1%    ~     (p=0.690 n=5+5)
_ZFlat4-8         6.15GB/s ± 2%  6.19GB/s ± 1%    ~     (p=0.222 n=5+5)
_ZFlat5-8          614MB/s ± 0%   615MB/s ± 0%    ~     (p=0.310 n=5+5)
_ZFlat6-8          231MB/s ± 2%   231MB/s ± 0%    ~     (p=0.690 n=5+5)
_ZFlat7-8          215MB/s ± 2%   215MB/s ± 1%    ~     (p=0.222 n=5+5)
_ZFlat8-8          246MB/s ± 0%   246MB/s ± 0%    ~     (p=0.190 n=4+5)
_ZFlat9-8          202MB/s ± 0%   202MB/s ± 0%    ~     (p=0.683 n=4+5)
_ZFlat10-8         794MB/s ± 2%   803MB/s ± 0%  +1.13%  (p=0.008 n=5+5)
_ZFlat11-8         350MB/s ± 0%   351MB/s ± 0%  +0.25%  (p=0.032 n=4+5)
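
As a sanity check on the "style change only" claim, here is a minimal
sketch comparing the old and new loop forms from the diff. The function
names shiftOld and shiftNew are hypothetical, and the value 1<<14 for
maxTableSize is an assumption made for this sketch rather than something
taken from this patch.

```go
package main

import "fmt"

// maxTableSize mirrors the constant named in the diff. The value 1<<14 is
// an assumption made for this sketch.
const maxTableSize = 1 << 14

// shiftOld follows the removed lines: tableSize is declared alongside shift
// and stays in scope after the loop. (Hypothetical helper name.)
func shiftOld(srcLen int) uint32 {
	shift, tableSize := uint32(32-8), 1<<8
	for tableSize < maxTableSize && tableSize < srcLen {
		shift--
		tableSize *= 2
	}
	return shift
}

// shiftNew follows the added lines: tableSize is scoped to the for
// statement, so only shift is visible afterwards. (Hypothetical helper name.)
func shiftNew(srcLen int) uint32 {
	shift := uint32(32 - 8)
	for tableSize := 1 << 8; tableSize < maxTableSize && tableSize < srcLen; tableSize *= 2 {
		shift--
	}
	return shift
}

func main() {
	// Compare both forms on a few input lengths around the loop boundaries.
	for _, n := range []int{0, 1, 256, 257, 1000, 1 << 14, 1 << 20} {
		if shiftOld(n) != shiftNew(n) {
			panic(fmt.Sprintf("mismatch at len(src)=%d", n))
		}
		fmt.Printf("len(src)=%d shift=%d\n", n, shiftNew(n))
	}
}
```

Both forms run the same number of iterations for a given len(src), so
they produce the same shift; the only difference is where tableSize is
visible.
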
diff --git a/encode.go b/encode.go
index 77eb280..da5fb2b 100644
--- a/encode.go
+++ b/encode.go
@@ -103,11 +103,14 @@
 		// checks.
 		tableMask = maxTableSize - 1
 	)
-	shift, tableSize := uint32(32-8), 1<<8
-	for tableSize < maxTableSize && tableSize < len(src) {
+	shift := uint32(32 - 8)
+	for tableSize := 1 << 8; tableSize < maxTableSize && tableSize < len(src); tableSize *= 2 {
 		shift--
-		tableSize *= 2
 	}
+	// In Go, all array elements are zero-initialized, so there is no advantage
+	// to a smaller tableSize per se. However, it matches the C++ algorithm,
+	// and in the asm versions of this code, we can get away with zeroing only
+	// the first tableSize elements.
 	var table [maxTableSize]uint16
 
 	// sLimit is when to stop looking for offset/length copies. The inputMargin