commit | 6f1d8b1e11868bdcff72eeaf7e0a80fd82fde929 | |
---|---|---|
author | George Steed <george.steed@arm.com> | Fri Apr 12 16:44:54 2024 +0100 |
committer | Frank Barchard <fbarchard@chromium.org> | Wed May 01 19:46:43 2024 +0000 |
tree | eaf9d02b921cbd553c5cdc61059bac88f7c76160 | |
parent | 67e5e79dbe6745bf2b0c25c1a56acf97e2b8966a | |

[AArch64] Add SVE2 implementations for ARGBToUVRow and similar

By maintaining the interleaved format of the data we can use a common kernel for all input channel orderings and simply pass a different vector of constants instead.

A similar approach is possible with only Neon by making use of multiplies and repeated application of ADDP to combine channels, however this is slower on older cores like Cortex-A53 so is not pursued further.

For odd problem sizes we need a slightly different implementation for the final element, so introduce an "any" kernel to address that rather than bloating the code for the common case.

Observed effect on runtimes compared to the existing Neon kernels:

Kernel | Cortex-A510 | Cortex-A720 | Cortex-X2 |
---|---|---|---|
ABGRToUVJRow | -15.5% | +5.4% | -33.1% |
ABGRToUVRow | -15.6% | +5.3% | -35.9% |
ARGBToUVJRow | -10.1% | +5.4% | -32.7% |
ARGBToUVRow | -10.1% | +5.4% | -29.3% |
BGRAToUVRow | -15.5% | +4.6% | -32.8% |
RGBAToUVRow | -10.1% | +4.2% | -36.0% |

Bug: libyuv:973
Change-Id: I041ca44db0ae8a2adffcdf24e822eebe962baf33
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5505537
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>

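The shared-kernel idea in the commit message can be illustrated with a scalar C sketch. This is not the SVE2 code from the change: the names `UVConstantsSketch`, `InterleavedToUVRowSketch`, `kSketchARGB` and `kSketchABGR` are hypothetical, and the BT.601-style weights, the `0x8080` bias and the rounding are assumptions chosen for illustration. The point is only that one loop can serve every channel ordering when the per-byte weights are passed in, with the odd final column handled separately.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical per-ordering constants: one signed weight for each of the
// four interleaved bytes of a pixel, for U and for V.
typedef struct {
  int16_t u[4];
  int16_t v[4];
} UVConstantsSketch;

// Illustrative weights (assumed, not taken from the commit).
// Bytes in memory are B, G, R, A.
static const UVConstantsSketch kSketchARGB = {{112, -74, -38, 0},
                                              {-18, -94, 112, 0}};
// Bytes in memory are R, G, B, A.
static const UVConstantsSketch kSketchABGR = {{-38, -74, 112, 0},
                                              {112, -94, -18, 0}};

// Weighted sum of one averaged pixel, biased so the result fits in 8 bits.
static uint8_t DotSketch(const int avg[4], const int16_t w[4]) {
  int32_t sum = 0x8080;
  for (int ch = 0; ch < 4; ++ch) sum += w[ch] * avg[ch];
  return (uint8_t)(sum >> 8);
}

// Common kernel: averages each 2x2 block per byte lane, keeping the data
// interleaved, then applies whichever constants the caller supplies.
static void InterleavedToUVRowSketch(const uint8_t* row0, const uint8_t* row1,
                                     uint8_t* dst_u, uint8_t* dst_v, int width,
                                     const UVConstantsSketch* c) {
  assert(width >= 0);
  int avg[4];
  int x = 0;
  for (; x + 1 < width; x += 2) {
    for (int ch = 0; ch < 4; ++ch) {
      avg[ch] = (row0[ch] + row0[ch + 4] + row1[ch] + row1[ch + 4] + 2) >> 2;
    }
    *dst_u++ = DotSketch(avg, c->u);
    *dst_v++ = DotSketch(avg, c->v);
    row0 += 8;
    row1 += 8;
  }
  if (width & 1) {
    // Odd width: the final column has no horizontal neighbour, so only the
    // two rows are averaged.
    for (int ch = 0; ch < 4; ++ch) {
      avg[ch] = (row0[ch] + row1[ch] + 1) >> 1;
    }
    *dst_u = DotSketch(avg, c->u);
    *dst_v = DotSketch(avg, c->v);
  }
}
```

Calling the sketch with `&kSketchARGB` or `&kSketchABGR` changes only the weights, not the loop, which mirrors the "different vector of constants" approach described above. In the actual change the odd final element is handled by a separate "any" kernel rather than a tail branch like the one here.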
libyuv is an open source project that includes YUV scaling and conversion functionality.
See Getting started for instructions on how to get started developing.
You can also browse the docs directory for more documentation.