vmx: implement fast path vmx_composite_over_n_8888

Running "lowlevel-blt-bench over_n_8888" on a PlayStation 3 (3.2GHz,
Gentoo ppc, 32-bit userland) gave the following results:

before:  over_n_8888 =  L1: 147.47  L2: 205.86  M:121.07
after:   over_n_8888 =  L1: 287.27  L2: 261.09  M:133.48

Cairo non-trimmed benchmarks on POWER8, 3.4GHz, 8 cores:

ocitysmap          659.69  -> 611.71   :  1.08x speedup
xfce4-terminal-a1  2725.22 -> 2547.47  :  1.07x speedup
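
For reference, the operation this fast path accelerates can be sketched
in scalar C as below. This is only an illustrative sketch, not the VMX
code in this patch: it assumes premultiplied a8r8g8b8 pixels and a solid
source, and the names over_pixel and over_n_8888_ref are hypothetical
helpers that do not exist in pixman.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: OVER for one premultiplied a8r8g8b8 pixel.
 * Per channel: result = src + dst * (255 - alpha(src)) / 255,
 * with a rounded division by 255 and a saturating add. */
static uint32_t over_pixel(uint32_t src, uint32_t dst)
{
    uint32_t ia = 255 - (src >> 24);   /* inverse source alpha */
    uint32_t out = 0;

    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t s = (src >> shift) & 0xff;
        uint32_t d = (dst >> shift) & 0xff;
        uint32_t t = d * ia + 0x80;
        uint32_t r = s + ((t + (t >> 8)) >> 8);   /* s + round(d * ia / 255) */

        if (r > 255)
            r = 255;                   /* saturate on invalid input */
        out |= r << shift;
    }
    return out;
}

/* Hypothetical scalar reference: composite one solid color over n
 * destination pixels. A vectorized path would handle several pixels
 * per iteration instead of one. */
static void over_n_8888_ref(uint32_t src, uint32_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = over_pixel(src, dst[i]);
}
```

A fully transparent source (0x00000000) leaves the destination
unchanged, and a fully opaque source replaces it.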

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
1 file changed