Rearrange the extendMatch register allocation.

This minimizes the diff in a follow-up commit that manually inlines extendMatch.
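
Assume extendMatch has the usual semantics: it takes two positions i < j in src, advances both while the bytes agree, and returns the final j. Under that assumption, a minimal pure-Go sketch of what it computes looks like the following. It is an illustrative, hypothetical reference and is not the amd64 assembly whose register allocation this commit changes.

```go
package main

import "fmt"

// extendMatch is a hypothetical pure-Go reference (an assumption, not the
// assembly touched by this commit). It returns the largest k such that
// k <= len(src) and src[i:i+k-j] equals src[j:k].
//
// It assumes 0 <= i && i < j && j <= len(src). Because i < j, src[i] is
// always in bounds whenever j < len(src).
func extendMatch(src []byte, i, j int) int {
	// Walk both positions forward while the bytes at i and j agree.
	for ; j < len(src) && src[i] == src[j]; i, j = i+1, j+1 {
	}
	return j
}

func main() {
	src := []byte("abcabcabX")
	// Starting at i=0, j=3, bytes agree through j=7; at j=8 'c' != 'X'.
	fmt.Println(extendMatch(src, 0, 3))
}
```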

It's not an optimization per se, but for the record:
name                  old speed      new speed   delta
WordsEncode1e1-8   700MB/s ± 1%   701MB/s ± 0%       ~  (p=0.393 n=10+10)
WordsEncode1e2-8   460MB/s ± 1%   460MB/s ± 0%       ~  (p=0.393 n=10+10)
WordsEncode1e3-8   478MB/s ± 2%   480MB/s ± 0%       ~  (p=0.912 n=10+10)
WordsEncode1e4-8   414MB/s ± 0%   416MB/s ± 0%  +0.64%  (p=0.000 n=9+10)
WordsEncode1e5-8   296MB/s ± 1%   297MB/s ± 0%       ~  (p=0.113 n=9+10)
WordsEncode1e6-8   345MB/s ± 0%   345MB/s ± 0%       ~  (p=0.949 n=8+10)
RandomEncode-8    14.4GB/s ± 2%  14.4GB/s ± 2%       ~  (p=0.278 n=9+10)
_ZFlat0-8          888MB/s ± 1%   891MB/s ± 1%  +0.35%  (p=0.010 n=10+9)
_ZFlat1-8          471MB/s ± 1%   471MB/s ± 0%       ~  (p=0.447 n=10+9)
_ZFlat2-8         16.2GB/s ± 3%  16.2GB/s ± 3%       ~  (p=0.912 n=10+10)
_ZFlat3-8          675MB/s ± 1%   676MB/s ± 0%       ~  (p=0.150 n=9+10)
_ZFlat4-8         8.31GB/s ± 1%  8.36GB/s ± 1%  +0.65%  (p=0.035 n=10+10)
_ZFlat5-8          850MB/s ± 0%   852MB/s ± 0%       ~  (p=0.182 n=9+10)
_ZFlat6-8          316MB/s ± 0%   316MB/s ± 0%       ~  (p=0.762 n=10+8)
_ZFlat7-8          294MB/s ± 1%   296MB/s ± 0%  +0.51%  (p=0.006 n=9+8)
_ZFlat8-8          330MB/s ± 1%   331MB/s ± 1%       ~  (p=0.881 n=9+9)
_ZFlat9-8          273MB/s ± 0%   274MB/s ± 0%  +0.23%  (p=0.043 n=10+8)
_ZFlat10-8        1.17GB/s ± 1%  1.17GB/s ± 0%       ~  (p=0.922 n=10+9)
_ZFlat11-8         461MB/s ± 0%   462MB/s ± 0%       ~  (p=0.219 n=10+9)

Also:
name           old time/op  new time/op  delta
ExtendMatch-8  7.92µs ± 2%  7.80µs ± 2%  -1.51%  (p=0.002 n=10+9)
Note that this is time/op rather than MB/s, so a negative delta is
better, although it may well all be noise.
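
Since throughput is bytes processed divided by time, a drop in time/op corresponds to a rise in MB/s. The Go sketch below does that conversion, assuming benchstat's delta is the relative change of the means, (new-old)/old. Fed the rounded means printed above, it gives about -1.52% rather than -1.51%, presumably because benchstat works from the unrounded samples. The helper names are hypothetical.

```go
package main

import "fmt"

// percentDelta returns the relative change from oldVal to newVal in percent.
// This (new-old)/old convention is assumed to match benchstat's delta column.
func percentDelta(oldVal, newVal float64) float64 {
	return (newVal - oldVal) / oldVal * 100
}

// throughputDelta converts a percent change in time/op into the matching
// percent change in throughput. With the bytes processed per op held fixed,
// throughput is proportional to 1/time, so a negative time delta maps to a
// positive throughput delta.
func throughputDelta(timeDeltaPct float64) float64 {
	return (1/(1+timeDeltaPct/100) - 1) * 100
}

func main() {
	// Rounded ExtendMatch-8 means from the table above, in µs/op.
	d := percentDelta(7.92, 7.80)
	fmt.Printf("time/op delta:         %+.2f%%\n", d)
	fmt.Printf("throughput-equivalent: %+.2f%%\n", throughputDelta(d))
}
```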