commit | c1fe5663f5c386d03892c9d9b82cbb169ddea171 | |
---|---|---|
author | George Steed <george.steed@arm.com> | Thu Apr 18 08:58:29 2024 +0100 |
committer | Frank Barchard <fbarchard@chromium.org> | Wed Jul 10 23:12:43 2024 +0000 |
tree | 02c5ea9fb8a597cef62c2705a8893bf247c170e9 | |
parent | 5bac99fe09055ff6866d052669c123d00a86fae0 | |

[AArch64] Use full vectors in ARGB4444To{Y,UV}Row_NEON

The existing ARGB4444TORGB macro only makes use of 64 bit wide vectors rather than the full 128 bits available, so unroll it to allow us to process more data per instruction.

For ARGB4444ToUVRow_NEON we already have enough data available each iteration to make use of full vectors, but for ARGB4444ToYRow_NEON we also need to adjust the "any" kernel to allow us to process 16 elements per iteration.

Reduction in runtimes observed compared to the existing Neon kernels:

CPU | ARGB4444ToUVRow | ARGB4444ToYRow |
---|---|---|
Cortex-A55 | -27.8% | -34.6% |
Cortex-A510 | -37.0% | -44.4% |
Cortex-A76 | -40.2% | -22.0% |
Cortex-A720 | -33.4% | -35.5% |
Cortex-X1 | -34.1% | -19.7% |
Cortex-X2 | -32.1% | -26.3% |

Bug: libyuv:976
Change-Id: I08f6286bab0ebf5e24d5d5803f8c45ec6ba776ee
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5631541
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
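
For readers unfamiliar with the "any" kernel pattern mentioned in the commit message, the following is a minimal portable C sketch of the idea, not the code from this change. It assumes a block kernel that only accepts widths that are a multiple of 16 pixels, and a wrapper that pads the leftover pixels into a temporary buffer so the same kernel can handle them. The function names `Argb4444ToYRow_Block16` and `Argb4444ToYRow_AnySketch`, the scalar inner loop standing in for the Neon code, the assumed ARGB4444 byte layout, and the BT.601-style luma coefficients are all illustrative assumptions.

```c
#include <stdint.h>
#include <string.h>

enum { kStep = 16 }; /* pixels per kernel iteration, per the commit text */

/* Hypothetical scalar stand-in for the vector kernel. Like a 128-bit Neon
 * loop, it requires width to be a multiple of kStep. */
static void Argb4444ToYRow_Block16(const uint8_t* src, uint8_t* dst_y,
                                   int width) {
  for (int x = 0; x < width; x += kStep) {
    for (int i = 0; i < kStep; ++i) {
      const uint8_t* p = src + 2 * (x + i);
      /* Assumed layout: byte 0 holds G (high) and B (low) nibbles,
       * byte 1 holds A (high) and R (low) nibbles. Alpha is ignored. */
      int b = p[0] & 0x0f;
      int g = p[0] >> 4;
      int r = p[1] & 0x0f;
      /* Replicate each nibble to widen 4-bit channels to 8 bits. */
      b |= b << 4;
      g |= g << 4;
      r |= r << 4;
      /* Assumed BT.601 limited-range integer luma approximation. */
      dst_y[x + i] = (uint8_t)((66 * r + 129 * g + 25 * b + 0x1080) >> 8);
    }
  }
}

/* "Any"-style wrapper sketch: run the block kernel on the largest multiple
 * of kStep, then process the remainder through a zero-padded temporary. */
static void Argb4444ToYRow_AnySketch(const uint8_t* src, uint8_t* dst_y,
                                     int width) {
  int n = width & ~(kStep - 1);  /* pixels handled directly */
  int rem = width & (kStep - 1); /* leftover pixels */
  if (n > 0) {
    Argb4444ToYRow_Block16(src, dst_y, n);
  }
  if (rem > 0) {
    uint8_t tmp_in[kStep * 2] = {0}; /* 2 bytes per ARGB4444 pixel */
    uint8_t tmp_out[kStep];
    memcpy(tmp_in, src + 2 * n, (size_t)rem * 2);
    Argb4444ToYRow_Block16(tmp_in, tmp_out, kStep);
    memcpy(dst_y + n, tmp_out, (size_t)rem); /* copy only the valid pixels */
  }
}
```

In this sketch, raising the block size from 8 to 16 pixels is what forces the wrapper's mask and temporary buffers to change, which mirrors why the commit message says the "any" kernel had to be adjusted alongside the Y-row kernel.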
libyuv is an open source project that includes YUV scaling and conversion functionality.