vmx: optimize vmx_composite_over_n_8888_8888_ca
This patch optimizes vmx_composite_over_n_8888_8888_ca by removing use
of expand_alpha_1x128, unpack/pack and in_over_2x128 in favor of
splat_alpha, in_over and MUL/ADD macros from pixman_combine32.h.
Running "lowlevel-blt-bench -n over_8888_8888" on POWER8, 8 cores,
3.4GHz, RHEL 7.2 ppc64le gave the following results:
reference memcpy speed = 23475.4MB/s (5868.8MP/s for 32bpp fills)
Before After Change
--------------------------------------------
L1 244.97 474.05 +93.51%
L2 243.74 473.05 +94.08%
M 243.29 467.16 +92.02%
HT 144.03 252.79 +75.51%
VT 174.24 279.03 +60.14%
R 109.86 149.98 +36.52%
RT 47.96 53.18 +10.88%
Kops/s 524 576 +9.92%
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Acked-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
1 file changed