Write the encoder's emitCopy in asm.

name              old speed      new speed      delta
WordsEncode1e1-8   690MB/s ± 0%   665MB/s ± 0%  -3.64%  (p=0.008 n=5+5)
WordsEncode1e2-8  83.7MB/s ± 1%  83.8MB/s ± 1%    ~     (p=0.421 n=5+5)
WordsEncode1e3-8   230MB/s ± 1%   231MB/s ± 1%    ~     (p=0.421 n=5+5)
WordsEncode1e4-8   233MB/s ± 1%   232MB/s ± 1%    ~     (p=0.151 n=5+5)
WordsEncode1e5-8   212MB/s ± 0%   212MB/s ± 1%    ~     (p=1.000 n=5+5)
WordsEncode1e6-8   255MB/s ± 0%   257MB/s ± 0%  +0.57%  (p=0.008 n=5+5)
RandomEncode-8    13.2GB/s ± 1%  13.2GB/s ± 1%    ~     (p=0.151 n=5+5)
_ZFlat0-8          623MB/s ± 0%   629MB/s ± 0%  +0.93%  (p=0.008 n=5+5)
_ZFlat1-8          319MB/s ± 1%   324MB/s ± 0%  +1.65%  (p=0.008 n=5+5)
_ZFlat2-8         13.9GB/s ± 1%  13.9GB/s ± 1%    ~     (p=0.548 n=5+5)
_ZFlat3-8          176MB/s ± 0%   176MB/s ± 1%    ~     (p=0.690 n=5+5)
_ZFlat4-8         6.05GB/s ± 0%  6.12GB/s ± 0%  +1.20%  (p=0.008 n=5+5)
_ZFlat5-8          603MB/s ± 0%   614MB/s ± 0%  +1.71%  (p=0.008 n=5+5)
_ZFlat6-8          228MB/s ± 0%   230MB/s ± 0%  +0.83%  (p=0.008 n=5+5)
_ZFlat7-8          212MB/s ± 0%   214MB/s ± 0%  +0.74%  (p=0.008 n=5+5)
_ZFlat8-8          242MB/s ± 0%   244MB/s ± 0%  +0.99%  (p=0.008 n=5+5)
_ZFlat9-8          199MB/s ± 1%   200MB/s ± 0%  +0.57%  (p=0.008 n=5+5)
_ZFlat10-8         796MB/s ± 1%   797MB/s ± 0%    ~     (p=1.000 n=5+5)
_ZFlat11-8         348MB/s ± 0%   351MB/s ± 1%    ~     (p=0.056 n=5+5)

I'm not overly worried about the WordsEncode1e1-8 regression: its time/op
is around 15 nanoseconds, so a 3.64% slowdown costs only about half a
nanosecond per op. In comparison, _ZFlat0-8 takes around 163 microseconds
per op (note µs, not ns).
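
As a rough check of the arithmetic behind that judgment, here is a minimal Go sketch that converts the reported throughput into time/op. It assumes that the "1e1" in WordsEncode1e1 means a 10-byte input and that MB/s means 1e6 bytes per second, as Go's testing package reports it. The helper name nsPerOp is made up for illustration and is not part of this change.

```go
package main

import "fmt"

// nsPerOp converts a throughput in MB/s (1 MB = 1e6 bytes) and an input
// size in bytes into nanoseconds per operation.
// Hypothetical helper, for illustration only.
func nsPerOp(inputBytes, mbPerSec float64) float64 {
	return inputBytes / (mbPerSec * 1e6) * 1e9
}

func main() {
	// Assumption: WordsEncode1e1 encodes a 10-byte input.
	const inputBytes = 10

	oldNs := nsPerOp(inputBytes, 690) // old speed from the table above
	newNs := nsPerOp(inputBytes, 665) // new speed from the table above
	delta := newNs - oldNs

	fmt.Printf("old %.1f ns/op, new %.1f ns/op, delta %.2f ns\n", oldNs, newNs, delta)

	// The message quotes about 163 µs (163e3 ns) per op for _ZFlat0-8.
	fmt.Printf("delta relative to 163 µs: %.1e\n", delta/163e3)
}
```

Under those assumptions the regression comes to roughly half a nanosecond per call, several orders of magnitude below the per-op cost of the _ZFlat benchmarks.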
4 files changed