commit | 4ad050b5ecc04925d2938df722c7b684e88647b7 | |
---|---|---|
author | George Steed <george.steed@arm.com> | Wed May 01 17:07:11 2024 +0100 |
committer | Frank Barchard <fbarchard@chromium.org> | Fri Jul 19 15:55:59 2024 +0000 |
tree | 693312ce83a0836003e6f60bc2e068281e183581 | |
parent | b5f9d7cb76a1e31f1893df0d903a8a421f2fbba0 | |
[AArch64] Unroll {I422,I422Alpha}ToARGBRow_SVE2

Since the UV components are duplicated in I422, we end up wasting half of the vector bandwidth processing the same elements twice. By unrolling the kernel to process two vectors of Y per iteration, we can fill a whole vector of U/V components.

Rather than packing RGBA components into pairs during the narrowing, we now narrow into individual component vectors and use ST4B instead. This change by itself is slower on some micro-architectures, such as Cortex-A510, but the benefit from unrolling significantly outweighs it.

Micro-architecture | I422AlphaToARGBRow_SVE2 | I422ToARGBRow_SVE2
---|---|---
Cortex-A510 | -46.2% | -48.8%
Cortex-A720 | -20.8% | -21.0%
Cortex-X2 | -11.3% | -7.5%
Cortex-X4 | -15.4% | -15.5%

Bug: libyuv:973
Change-Id: I69389c4279861f7a460ae0c28186f023c728c4e8
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5725173
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
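The two ideas in this change can be modeled in scalar C. This is a sketch, not libyuv's SVE2 kernel. `i422_to_argb_row_sketch`, `st4b_model` and `BLOCK` are hypothetical names, and `BLOCK` stands in for the vector length in bytes. The coefficients are the common BT.601 limited-range integer approximation, assumed here in place of libyuv's own YUV constant tables. The sketch shows that each U/V sample's chroma terms are computed once and applied to two Y samples. It also shows that B, G, R and A stay in separate arrays until a single ST4B-style interleaving store writes them out, in libyuv's ARGB memory order (B, G, R, A). Passing a non-NULL alpha plane models the I422Alpha variant.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical chunk size standing in for the SVE vector length in bytes.
   Each chunk covers 2 * BLOCK luma samples and BLOCK chroma samples. */
#define BLOCK 16

/* Scalar model of ST4B semantics: element i of four separate component
   arrays is written to four consecutive bytes, dst[4*i + 0..3]. */
static void st4b_model(uint8_t *dst, const uint8_t *c0, const uint8_t *c1,
                       const uint8_t *c2, const uint8_t *c3, size_t n) {
  for (size_t i = 0; i < n; ++i) {
    dst[4 * i + 0] = c0[i];
    dst[4 * i + 1] = c1[i];
    dst[4 * i + 2] = c2[i];
    dst[4 * i + 3] = c3[i];
  }
}

/* Converts an 8.8 fixed-point value (rounding bias already added) to a
   byte, saturating to [0, 255] without shifting a negative value. */
static uint8_t clamp_shift8(int v) {
  if (v < 0) return 0;
  v >>= 8;
  return (uint8_t)(v > 255 ? 255 : v);
}

/* Hypothetical I422 row to ARGB converter; src_a == NULL means opaque.
   Assumed coefficients: BT.601 limited-range integer approximation. */
void i422_to_argb_row_sketch(const uint8_t *src_y, const uint8_t *src_u,
                             const uint8_t *src_v, const uint8_t *src_a,
                             uint8_t *dst_argb, int width) {
  uint8_t b[2 * BLOCK], g[2 * BLOCK], r[2 * BLOCK], a[2 * BLOCK];
  for (int x = 0; x < width; x += 2 * BLOCK) {
    int n = width - x < 2 * BLOCK ? width - x : 2 * BLOCK;
    /* One pass over up to BLOCK chroma samples feeds up to 2 * BLOCK
       luma samples, so no chroma work is duplicated. */
    for (int i = 0; i < (n + 1) / 2; ++i) {
      int d = src_u[x / 2 + i] - 128;
      int e = src_v[x / 2 + i] - 128;
      /* Chroma-dependent terms, computed once per U/V sample. */
      int b_uv = 516 * d;
      int g_uv = -100 * d - 208 * e;
      int r_uv = 409 * e;
      /* Apply them to the two horizontally adjacent pixels sharing U/V. */
      for (int k = 0; k < 2 && 2 * i + k < n; ++k) {
        int j = 2 * i + k;
        int c = 298 * (src_y[x + j] - 16) + 128; /* +128 rounds the >> 8 */
        b[j] = clamp_shift8(c + b_uv);
        g[j] = clamp_shift8(c + g_uv);
        r[j] = clamp_shift8(c + r_uv);
        a[j] = src_a ? src_a[x + j] : 255;
      }
    }
    /* Components stay planar until one interleaving store. */
    st4b_model(dst_argb + 4 * (size_t)x, b, g, r, a, (size_t)n);
  }
}
```

In the real kernel the inner loops are vector operations, and the unroll means one full vector of U/V serves two full vectors of Y. The scalar chunking above only mirrors that ratio.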
libyuv is an open source project that includes YUV scaling and conversion functionality.