Skip multiple bytes if the last match was >= 32 bytes prior.

benchmark                     old MB/s     new MB/s     speedup
BenchmarkWordsEncode1e3-8     137.99       132.57       0.96x
BenchmarkWordsEncode1e4-8     173.30       156.26       0.90x
BenchmarkWordsEncode1e5-8     137.16       132.59       0.97x
BenchmarkWordsEncode1e6-8     165.45       164.47       0.99x
BenchmarkRandomEncode-8       140.04       12260.44     87.55x
Benchmark_ZFlat0-8            334.14       335.84       1.01x
Benchmark_ZFlat1-8            168.93       168.19       1.00x
Benchmark_ZFlat2-8            134.42       8763.96      65.20x
Benchmark_ZFlat3-8            48.04        47.36        0.99x
Benchmark_ZFlat4-8            151.86       2578.12      16.98x
Benchmark_ZFlat5-8            344.43       341.94       0.99x
Benchmark_ZFlat6-8            149.21       147.24       0.99x
Benchmark_ZFlat7-8            140.87       138.72       0.98x
Benchmark_ZFlat8-8            155.95       155.89       1.00x
Benchmark_ZFlat9-8            135.05       136.07       1.01x
Benchmark_ZFlat10-8           380.98       379.77       1.00x
Benchmark_ZFlat11-8           227.48       226.59       1.00x

Thanks to Klaus Post for the original suggestion. Unfortunately,
https://github.com/golang/snappy/pull/19 was abandoned.
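To make the heuristic concrete, here is a small, self-contained Go sketch of how the new step `1 + (s-lit)>>5` grows. It is a simplified model and not the encoder itself. `stepSize` and `probes` are hypothetical helper names. `probes` assumes input that never matches, so `lit` stays at 0, and it counts loop iterations only, ignoring hashing and literal emission.

```go
package main

import "fmt"

// stepSize mirrors the expression added in this change: advance one byte,
// plus one extra byte for every 32 bytes scanned since the pending literal
// run began at lit (i.e. since the last emitted match).
func stepSize(s, lit int) int {
	return 1 + (s-lit)>>5
}

// probes is a hypothetical model that counts how many candidate positions a
// match loop would examine in n bytes of input that never matches (the worst
// case, e.g. random data). Because nothing matches, lit stays at 0 for the
// whole scan. With skipping disabled, every position is probed, as with the
// old `s++`.
func probes(n int, skipping bool) int {
	count, lit := 0, 0
	for s := 0; s < n; count++ {
		if skipping {
			s += stepSize(s, lit)
		} else {
			s++
		}
	}
	return count
}

func main() {
	// The step grows by one for each additional 32 bytes without a match.
	for _, d := range []int{0, 31, 32, 63, 64, 128} {
		fmt.Printf("bytes since last match %3d -> step %d\n", d, stepSize(d, 0))
	}
	// Compare probe counts with and without skipping on unmatched input.
	for _, n := range []int{64, 1 << 10, 1 << 16} {
		fmt.Printf("n=%6d probes: old=%6d new=%5d\n", n, probes(n, false), probes(n, true))
	}
}
```

In this model, the scan position grows by a factor of at least 33/32 per step once the step exceeds 1, so in an unmatched region the probe count grows roughly logarithmically in the region's length instead of linearly. That is consistent with the large gains on BenchmarkRandomEncode. Once a match is emitted, the pending literal starts afresh, so `s-lit` and the step fall back toward 1 on compressible input.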
diff --git a/encode.go b/encode.go
index f3b5484..d80185d 100644
--- a/encode.go
+++ b/encode.go
@@ -122,7 +122,8 @@
 		t, *p = *p-1, s+1
 		// If t is invalid or src[s:s+4] differs from src[t:t+4], accumulate a literal byte.
 		if t < 0 || s-t >= maxOffset || b0 != src[t] || b1 != src[t+1] || b2 != src[t+2] || b3 != src[t+3] {
-			s++
+			// Skip multiple bytes if the last match was >= 32 bytes prior.
+			s += 1 + (s-lit)>>5
 			continue
 		}
 		// Otherwise, we have a match. First, emit any pending literal bytes.