commit | 8f039f639c44448eb16c9544b7d00dad71aa7011 | |
---|---|---|
author | George Steed <george.steed@arm.com> | Wed May 15 21:47:21 2024 +0100 |
committer | Frank Barchard <fbarchard@chromium.org> | Fri Jul 19 19:52:21 2024 +0000 |
tree | 3add1ce968c1b882b72ac05811d079c23aed65bd | |
parent | dc392094fc77ecc00e12cb9f47ba7168fcac1dcd | |

[AArch64] Unroll ScaleRowDown4Box_NEON

We can use wider load/store instructions and avoid the need to waste half of the ADDP/RSHRN vector data. The duplicated UADDLP and UADALP instructions also provide a good improvement on little cores due to their limited out-of-order capability.

The mask in the "any" kernel definition is already set up to handle an unrolling of eight so no change to scale_any.cc is needed.

Reduction in runtimes observed compared to the existing Neon implementation:

Cortex-A55: -19.5%
Cortex-A520: -38.3%
Cortex-A76: -36.0%
Cortex-A715: -18.1%
Cortex-A720: -17.9%
Cortex-X1: -25.4%
Cortex-X2: -18.5%
Cortex-X3: -8.2%
Cortex-X4: -3.8%

Bug: b/42280945
Change-Id: Iebba5da4db5e25af4b9fa5651c7396364dedffba
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5725172
Reviewed-by: Frank Barchard <fbarchard@chromium.org>

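
For readers unfamiliar with the kernel, the following is a minimal scalar sketch of what a 4x4 box downscale computes and of how an "any" wrapper with a mask of 7 could split a row between an eight-wide kernel and a scalar tail. It assumes the box filter is the rounded mean of each 4x4 block, which is consistent with the rounding narrowing shift (RSHRN) named in the message. All function names below are hypothetical and are not libyuv's API, and the eight-wide kernel is only a scalar stand-in for the Neon code.

```c
#include <stddef.h>
#include <stdint.h>

// Scalar reference (hypothetical name): each output byte is the rounded
// average of a 4x4 block of source bytes, i.e. (sum of 16 values + 8) >> 4.
static void scale_row_down4_box_ref(const uint8_t* src, ptrdiff_t src_stride,
                                    uint8_t* dst, int dst_width) {
  for (int x = 0; x < dst_width; ++x) {
    unsigned sum = 0;
    for (int row = 0; row < 4; ++row) {
      const uint8_t* p = src + row * src_stride + 4 * x;
      sum += p[0] + p[1] + p[2] + p[3];
    }
    dst[x] = (uint8_t)((sum + 8) >> 4);
  }
}

// Stand-in for a vector kernel that requires dst_width to be a multiple of
// eight. In this sketch it simply reuses the scalar code; it does not model
// the UADDLP/UADALP/ADDP/RSHRN sequence.
static void scale_row_down4_box_x8(const uint8_t* src, ptrdiff_t src_stride,
                                   uint8_t* dst, int dst_width) {
  scale_row_down4_box_ref(src, src_stride, dst, dst_width);
}

// Assumed mask for an unrolling of eight output pixels per iteration.
enum { kMask = 7 };

// Hypothetical "any width" wrapper: the largest multiple of eight goes to the
// eight-wide kernel, and the remaining 0..7 output pixels go to the scalar
// reference. Each output pixel consumes four source columns.
static void scale_row_down4_box_any(const uint8_t* src, ptrdiff_t src_stride,
                                    uint8_t* dst, int dst_width) {
  int r = dst_width & kMask;  // leftover output pixels
  int n = dst_width - r;      // multiple of eight
  if (n > 0) {
    scale_row_down4_box_x8(src, src_stride, dst, n);
  }
  if (r > 0) {
    scale_row_down4_box_ref(src + 4 * n, src_stride, dst + n, r);
  }
}
```

With a mask of 7, a destination width of 11 would be handled as eight pixels by the wide kernel plus three by the scalar tail, which illustrates why an unroll of eight needs no change to the wrapper's mask.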
libyuv is an open source project that includes YUV scaling and conversion functionality.
See Getting started for instructions on how to get started developing.
You can also browse the docs directory for more documentation.