commit | f00c43f4d6e812b581f64edc53a655f8e2413938 | |
---|---|---|
author | George Steed <george.steed@arm.com> | Tue May 07 13:41:47 2024 +0100 |
committer | Frank Barchard <fbarchard@chromium.org> | Wed Oct 30 17:58:29 2024 +0000 |
tree | 7ac0170c9f89bffaf220a7f488e2349c88248893 | |
parent | 51d07554a039004ff278009852f9d33c0f76bf91 | |
[AArch64] Unroll HalfFloat{,1}Row_NEON

The existing C implementation compiled with a recent LLVM is auto-vectorised and unrolled to process four vectors per loop iteration, making the Neon implementation slower than the C implementation on little cores. To avoid this, unroll the Neon implementation to also process four vectors per iteration.

Reduction in cycle counts observed compared to the existing Neon implementation:

Core | HalfFloat1Row_NEON | HalfFloatRow_NEON |
---|---|---|
Cortex-A510 | -37.1% | -40.8% |
Cortex-A520 | -32.3% | -37.4% |
Cortex-A720 | 0.0% | -10.6% |
Cortex-X2 | 0.0% | -7.8% |
Cortex-X4 | +0.3% | -6.9% |

Bug: b/42280945

Change-Id: I12b474c970fc4355d75ed924c4ca6169badda2bc

Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5872805

Reviewed-by: Frank Barchard <fbarchard@chromium.org>

Reviewed-by: Justin Green <greenjustin@google.com>
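The loop shape described in the commit message can be sketched in portable C. This is only an illustration of unrolling by four, not the Neon assembly from this change: the function names (`to_half_bits`, `convert_block`, `half_float_row_unrolled`, `half_float_row_reference`) are hypothetical, each "vector" is assumed to hold eight 16-bit lanes (one 128-bit register), and the exponent-rebias conversion is an assumed stand-in for whatever the real kernels do.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { kLanes = 8, kUnroll = 4 }; /* assumed: 8 x uint16 per 128-bit vector */

/* Hypothetical helper: convert one sample to IEEE half-float bits.
 * Scaling by 2^-112 moves the float exponent bias (127) to the half
 * bias (15); shifting the float bits right by 13 then leaves the
 * exponent and top 10 mantissa bits in half layout (truncating). */
static inline uint16_t to_half_bits(uint16_t v, float mult) {
  float f = (float)v * mult;
  uint32_t bits;
  memcpy(&bits, &f, sizeof(bits)); /* type-pun without aliasing issues */
  return (uint16_t)(bits >> 13);
}

/* Stands in for the work done on one vector register. */
static inline void convert_block(const uint16_t* src, uint16_t* dst,
                                 float mult) {
  for (int lane = 0; lane < kLanes; ++lane) {
    dst[lane] = to_half_bits(src[lane], mult);
  }
}

/* Straightforward one-element-per-iteration loop, used as a reference. */
void half_float_row_reference(const uint16_t* src, uint16_t* dst,
                              float scale, int width) {
  const float mult = 0x1p-112f * scale;
  for (int i = 0; i < width; ++i) {
    dst[i] = to_half_bits(src[i], mult);
  }
}

/* Unrolled variant: four independent blocks per iteration, then a
 * scalar tail for the remaining (width % 32) elements. */
void half_float_row_unrolled(const uint16_t* src, uint16_t* dst,
                             float scale, int width) {
  assert(width >= 0);
  const float mult = 0x1p-112f * scale;
  const int step = kLanes * kUnroll;
  int i = 0;
  for (; i + step <= width; i += step) {
    convert_block(src + i + 0 * kLanes, dst + i + 0 * kLanes, mult);
    convert_block(src + i + 1 * kLanes, dst + i + 1 * kLanes, mult);
    convert_block(src + i + 2 * kLanes, dst + i + 2 * kLanes, mult);
    convert_block(src + i + 3 * kLanes, dst + i + 3 * kLanes, mult);
  }
  for (; i < width; ++i) {
    dst[i] = to_half_bits(src[i], mult);
  }
}
```

Because the four blocks in each iteration have no data dependencies on one another, an in-order core can overlap their load, convert, and store latencies, which is a plausible reading of why the largest gains in the table appear on the little cores.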
libyuv is an open source project that includes YUV scaling and conversion functionality.
See Getting started for instructions on how to start developing.
You can also browse the docs directory for more documentation.