commit | 5236846b648418089d9d88b797ed1b7a5e03e907 | |
---|---|---|
author | George Steed <george.steed@arm.com> | Wed Apr 24 18:03:20 2024 +0100 |
committer | Frank Barchard <fbarchard@chromium.org> | Mon Jul 08 20:26:23 2024 +0000 |
tree | d20ee38ffb791cc072142d2e1c25918460ab2c7e | |
parent | 555f80f3ce4b789f4e91e98e0848c3d8f51d19ef | |

[AArch64] Keep UV interleaved in some *ToARGBRow_SVE2 kernels

The existing I4XXTORGB_SVE macro operates only on even byte lanes of the loaded U/V vectors. This is sub-optimal since we are effectively wasting half of the vector in any pre-processing steps before the conversion. In particular, where the UV components are loaded from interleaved data we can save a TBL instruction by maintaining the interleaved format.

This commit introduces a new NVTORGB_SVE macro to handle the case where U/V components are interleaved into even/odd bytes of a vector, mirroring a similar macro in the AArch64 Neon implementation.

Reduction in runtimes observed compared to the existing SVE2 code:

Kernel | Cortex-A510 | Cortex-A720 | Cortex-X2 |
---|---|---|---|
NV12ToARGBRow_SVE2 | -5.3% | -0.2% | -4.4% |
NV21ToARGBRow_SVE2 | -5.3% | -0.2% | -4.4% |
UYVYToARGBRow_SVE2 | -5.6% | 0.0% | -4.6% |
YUY2ToARGBRow_SVE2 | -5.5% | -0.1% | -4.2% |

Bug: libyuv:973
Change-Id: I418de2e684e0b6b0b9e41c39b564438531e44671
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5622133
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>

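
The lane layouts described in the commit message can be illustrated with a small scalar model in C. This is only a sketch of the idea, not the SVE2 assembly from the commit: the names `MODEL_LANES`, `model_load_uv_interleaved`, `model_split_uv`, and `model_check_layouts` are hypothetical, and a "vector" is modelled as an array of 16-bit lanes. In the interleaved layout, one load places U in the even (low) byte and V in the odd (high) byte of each lane. The split layout needs an additional shuffle step and leaves the odd byte of every lane unused.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model parameter: number of 16-bit lanes per "vector". */
#define MODEL_LANES 4

/* Interleaved layout: a single load of 2 * MODEL_LANES bytes from an
 * NV12-style UV plane (U0 V0 U1 V1 ...). Lane i carries U in its even
 * (low) byte and V in its odd (high) byte, so every byte is used. */
static void model_load_uv_interleaved(const uint8_t* src_uv,
                                      uint16_t uv[MODEL_LANES]) {
  for (int i = 0; i < MODEL_LANES; i++) {
    uv[i] = (uint16_t)(src_uv[2 * i] | (src_uv[2 * i + 1] << 8));
  }
}

/* Split layout: an extra shuffle step separates U and V into two
 * vectors. Each result uses only the even byte of every lane, so half
 * of each vector carries no data. */
static void model_split_uv(const uint16_t uv[MODEL_LANES],
                           uint16_t u[MODEL_LANES],
                           uint16_t v[MODEL_LANES]) {
  for (int i = 0; i < MODEL_LANES; i++) {
    u[i] = (uint16_t)(uv[i] & 0xff); /* U in even byte, odd byte zero */
    v[i] = (uint16_t)(uv[i] >> 8);   /* V in even byte, odd byte zero */
  }
}

/* Both layouts carry the same samples; the interleaved one simply
 * skips the shuffle. Returns 1 if every lane agrees. */
static int model_check_layouts(const uint8_t* src_uv) {
  uint16_t uv[MODEL_LANES], u[MODEL_LANES], v[MODEL_LANES];
  model_load_uv_interleaved(src_uv, uv);
  model_split_uv(uv, u, v);
  for (int i = 0; i < MODEL_LANES; i++) {
    assert((u[i] >> 8) == 0 && (v[i] >> 8) == 0); /* odd bytes unused */
    if (u[i] != src_uv[2 * i] || v[i] != src_uv[2 * i + 1]) return 0;
  }
  return 1;
}
```

In this model, a conversion step that reads U and V directly from the two bytes of each `uv` lane has no need for `model_split_uv`, which corresponds to the shuffle the commit message says can be saved.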
libyuv is an open source project that includes YUV scaling and conversion functionality.
See Getting started for instructions on how to get started developing.
You can also browse the docs directory for more documentation.