Rearrange the extendMatch register allocation.
This minimizes the diff of a follow-up commit that performs manual inlining.
It's not an optimization per se, but for the record:
name              old speed      new speed      delta
WordsEncode1e1-8   700MB/s ± 1%   701MB/s ± 0%    ~     (p=0.393 n=10+10)
WordsEncode1e2-8   460MB/s ± 1%   460MB/s ± 0%    ~     (p=0.393 n=10+10)
WordsEncode1e3-8   478MB/s ± 2%   480MB/s ± 0%    ~     (p=0.912 n=10+10)
WordsEncode1e4-8   414MB/s ± 0%   416MB/s ± 0%  +0.64%  (p=0.000 n=9+10)
WordsEncode1e5-8   296MB/s ± 1%   297MB/s ± 0%    ~     (p=0.113 n=9+10)
WordsEncode1e6-8   345MB/s ± 0%   345MB/s ± 0%    ~     (p=0.949 n=8+10)
RandomEncode-8    14.4GB/s ± 2%  14.4GB/s ± 2%    ~     (p=0.278 n=9+10)
_ZFlat0-8          888MB/s ± 1%   891MB/s ± 1%  +0.35%  (p=0.010 n=10+9)
_ZFlat1-8          471MB/s ± 1%   471MB/s ± 0%    ~     (p=0.447 n=10+9)
_ZFlat2-8         16.2GB/s ± 3%  16.2GB/s ± 3%    ~     (p=0.912 n=10+10)
_ZFlat3-8          675MB/s ± 1%   676MB/s ± 0%    ~     (p=0.150 n=9+10)
_ZFlat4-8         8.31GB/s ± 1%  8.36GB/s ± 1%  +0.65%  (p=0.035 n=10+10)
_ZFlat5-8          850MB/s ± 0%   852MB/s ± 0%    ~     (p=0.182 n=9+10)
_ZFlat6-8          316MB/s ± 0%   316MB/s ± 0%    ~     (p=0.762 n=10+8)
_ZFlat7-8          294MB/s ± 1%   296MB/s ± 0%  +0.51%  (p=0.006 n=9+8)
_ZFlat8-8          330MB/s ± 1%   331MB/s ± 1%    ~     (p=0.881 n=9+9)
_ZFlat9-8          273MB/s ± 0%   274MB/s ± 0%  +0.23%  (p=0.043 n=10+8)
_ZFlat10-8        1.17GB/s ± 1%  1.17GB/s ± 0%    ~     (p=0.922 n=10+9)
_ZFlat11-8         461MB/s ± 0%   462MB/s ± 0%    ~     (p=0.219 n=10+9)
Also:
name           old time/op  new time/op  delta
ExtendMatch-8  7.92µs ± 2%  7.80µs ± 2%  -1.51%  (p=0.002 n=10+9)
Note that this is time/op rather than MB/s, so a negative delta is better,
although it's quite possibly all just noise.
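
For context, here is a minimal pure-Go sketch of what extendMatch is assumed
to compute: starting from two offsets i < j into the same buffer, it advances
both while the bytes agree and returns where j stopped. This is an
illustration only. The commit itself changes the assembly implementation's
register usage, which is not shown here, and the semantics below are an
assumption based on the function's name and usual role in the encoder.

```go
package main

import "fmt"

// extendMatch is an illustrative sketch (assumed semantics, not the assembly
// touched by this commit). Given 0 <= i < j <= len(src), it returns the
// largest k <= len(src) such that src[i:i+k-j] equals src[j:k].
func extendMatch(src []byte, i, j int) int {
	// Advance both cursors together while the bytes keep matching.
	for ; j < len(src) && src[i] == src[j]; i, j = i+1, j+1 {
	}
	return j
}

func main() {
	src := []byte("abcabcabd")
	// "abcab" at offset 0 matches "abcab" at offset 3; the match ends at 8.
	fmt.Println(extendMatch(src, 0, 3)) // prints 8
}
```

The loop body is tiny, which is why register allocation in a hand-written
assembly version can plausibly show up in a micro-benchmark such as
ExtendMatch-8 while remaining within noise for whole-encoder benchmarks.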
1 file changed