util/bufferiszero: improve avx2 accelerator

By increasing avx2 length_to_accel to 128, we can simplify its logic and reduce a

The authorship of this patch actually belongs to Richard Henderson
<richard.henderson@linaro.org>; I only fixed a boundary case in his
original patch.

