857880f0e4d1d42a8508ac77be33556cc6f7f546 - third_party/pixman

commit	857880f0e4d1d42a8508ac77be33556cc6f7f546
author	Oded Gabbay <oded.gabbay@gmail.com>	Sun Sep 06 10:58:30 2015 +0300
committer	Oded Gabbay <oded.gabbay@gmail.com>	Fri Sep 18 10:06:50 2015 +0300
tree	b0b64884dc83ef1a87fee9e1c770b9d434bd49c9
parent	73e586efb3ee149f76f15d9e549bffa15d8e30ec

vmx: optimize scaled_nearest_scanline_vmx_8888_8888_OVER

This patch optimizes scaled_nearest_scanline_vmx_8888_8888_OVER and all
the functions it calls (combine1, combine4 and
core_combine_over_u_pixel_vmx).

The optimization is done by removing the use of expand_alpha_1x128 and
expand_alpha_2x128 in favor of splat_alpha and the MUL/ADD macros from
pixman-combine32.h.
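
For readers unfamiliar with the OVER operator, the per-pixel arithmetic
that the MUL/ADD macros perform can be sketched in plain scalar C. This is
only an illustration, assuming premultiplied a8r8g8b8 pixels; the names
mul_un8, add_un8 and over_pixel are hypothetical and are not the pixman
macros or the VMX code touched by this patch.

```c
#include <stdint.h>

/* Hypothetical helper: (a * b) / 255 with rounding, on 8-bit channels. */
static inline uint8_t mul_un8(uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t)a * b + 0x80;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* Hypothetical helper: saturating 8-bit add. */
static inline uint8_t add_un8(uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t)a + b;
    return (uint8_t)(t > 255 ? 255 : t);
}

/*
 * OVER for one premultiplied a8r8g8b8 pixel:
 *   result = src + dest * (255 - alpha(src)) / 255, per channel.
 */
static inline uint32_t over_pixel(uint32_t src, uint32_t dest)
{
    uint8_t inv_alpha = (uint8_t)(255 - (src >> 24));
    uint32_t result = 0;

    for (int shift = 0; shift < 32; shift += 8) {
        uint8_t s = (uint8_t)(src >> shift);
        uint8_t d = (uint8_t)(dest >> shift);
        result |= (uint32_t)add_un8(s, mul_un8(d, inv_alpha)) << shift;
    }
    return result;
}
```

An opaque source replaces the destination and a fully transparent source
leaves it unchanged. The sketch handles one channel at a time for clarity
and makes no attempt at the packed arithmetic a real implementation uses.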

Running "lowlevel-blt-bench -n over_8888_8888" on POWER8, 8 cores,
3.4GHz, RHEL 7.2 ppc64le gave the following results:

reference memcpy speed = 24847.3MB/s (6211.8MP/s for 32bpp fills)

                Before          After           Change
              --------------------------------------------
L1              182.05          210.22         +15.47%
L2              180.6           208.92         +15.68%
M               180.52          208.22         +15.34%
HT              130.17          178.97         +37.49%
VT              145.82          184.22         +26.33%
R               104.51          129.38         +23.80%
RT              48.3            61.54          +27.41%
Kops/s          430             504            +17.21%

v2: Check that pm is not NULL before dereferencing it in combine1()
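
The v2 note concerns an optional mask pointer. A minimal sketch of that
pattern follows, assuming the mask's alpha channel scales every channel of
the source pixel; combine1_sketch and scale_un8 are hypothetical names and
this is not the code from pixman-vmx.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: (a * b) / 255 with rounding, on 8-bit channels. */
static inline uint8_t scale_un8(uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t)a * b + 0x80;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/*
 * Hypothetical stand-in for a combine1()-style helper: return the source
 * pixel, scaled by the mask's alpha only when a mask is supplied.
 */
static inline uint32_t combine1_sketch(const uint32_t *ps, const uint32_t *pm)
{
    uint32_t s = *ps;

    if (pm != NULL) {                       /* the mask is optional */
        uint8_t mask_alpha = (uint8_t)(*pm >> 24);
        uint32_t out = 0;

        for (int shift = 0; shift < 32; shift += 8)
            out |= (uint32_t)scale_un8((uint8_t)(s >> shift), mask_alpha) << shift;
        s = out;
    }
    return s;
}
```

With the check in place, callers that have no mask can pass NULL and get
the source pixel back unmodified.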

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Acked-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>

pixman/pixman-vmx.c

1 file changed