Make SSE2/Neon convolution functions not read extra bytes

This change makes the SSE2/Neon horizontal convolution functions stop
reading extra pixels past the end of the buffer. As a result, we can remove
all the SIMD-specific logic in SkConvolver that deals with the last couple
of rows, and we no longer need to pad the convolution filters.
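
For illustration only, here is a minimal scalar sketch of the general idea,
not the code in this change. It assumes a single-channel row and a
hypothetical helper named convolve_row_no_overread; the real functions use
SSE2/Neon intrinsics on RGBA pixels. Full groups of four taps are read
directly, and the final partial group is copied into zero-filled
temporaries so nothing past the end of either buffer is touched.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of Skia): accumulates coeffs[i] * src[i]
// over `length` taps, four taps at a time as a SIMD loop would.
int32_t convolve_row_no_overread(const uint8_t* src,
                                 const int16_t* coeffs,
                                 size_t length) {
    assert(length == 0 || (src != nullptr && coeffs != nullptr));

    int32_t acc = 0;
    size_t i = 0;

    // Main loop: only runs while a full group of four taps is in bounds.
    for (; i + 4 <= length; i += 4) {
        for (size_t k = 0; k < 4; ++k) {
            acc += static_cast<int32_t>(coeffs[i + k]) * src[i + k];
        }
    }

    // Tail: copy the remaining 1..3 taps into zero-filled temporaries
    // instead of reading a full group past the end of the buffers.
    const size_t rem = length - i;
    if (rem > 0) {
        uint8_t px[4] = {0, 0, 0, 0};
        int16_t cf[4] = {0, 0, 0, 0};
        std::memcpy(px, src + i, rem * sizeof(uint8_t));
        std::memcpy(cf, coeffs + i, rem * sizeof(int16_t));
        for (size_t k = 0; k < 4; ++k) {
            acc += static_cast<int32_t>(cf[k]) * px[k];
        }
    }
    return acc;
}
```

With this shape the caller needs neither padded filters nor a special path
for the last rows, because the zero-filled lanes contribute nothing to the
sum.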

Performance impact is small. Nanobench time change:
                              SSE2    NEON
bitmap_scale_filter_64_256     1%     -2%
bitmap_scale_filter_256_64     1%      2%
bitmap_scale_filter_90_10      1%     -1%
bitmap_scale_filter_90_30      1%      0%
bitmap_scale_filter_90_80      1%      0%
bitmap_scale_filter_90_90      1%      1%
bitmap_scale_filter_80_90      0%      0%
bitmap_scale_filter_30_90      3%      6%
bitmap_scale_filter_10_90      0%      2%

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2481733003
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review-Url: https://codereview.chromium.org/2481733003