commit | faade2f73f4a33ead7e19f9506a588074d3154f9 | |
---|---|---|
author | George Steed <george.steed@arm.com> | Wed Sep 18 13:05:07 2024 +0100 |
committer | Frank Barchard <fbarchard@chromium.org> | Thu Oct 24 20:52:08 2024 +0000 |
tree | ceba5bad4fe237d83430b665f22ceceb3802b45e | |
parent | 0dce974ca0680db8ca1b1c42d807a97c90bb75ce | |
[AArch64] Avoid partial vector stores in ScaleRowDown38_NEON

The existing code performs a pair of stores since there is no AArch64 instruction in Neon to store exactly 12 bytes from a vector register. It is guaranteed to be safe to write full vectors until the last iteration of the loop, since the extra four bytes will be over-written by subsequent iterations. This allows us to avoid duplicating the store instruction and address arithmetic.

Reduction in runtime observed relative to the existing Neon implementation:

Cortex-A55: +2.0%
Cortex-A510: -25.3%
Cortex-A520: -15.1%
Cortex-A76: -32.2%
Cortex-A715: -19.7%
Cortex-A720: -19.6%
Cortex-X1: -31.6%
Cortex-X2: -27.1%
Cortex-X3: -25.9%
Cortex-X4: -24.7%
Cortex-X925: -35.8%

Bug: b/42280945
Change-Id: I222ed662f169d82f5f472bebb1bcfe6d428ccae2
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5872843
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
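
The store pattern described in the commit message can be sketched in portable C. This is only an illustration under stated assumptions, not the Neon assembly from this change: `produce_block` and `store_row_overlapping` are hypothetical names, `memcpy` stands in for a vector store, and the 3-of-8 point sampling is assumed purely to give the blocks some content. Each iteration yields 16 bytes of which 12 are valid. Every iteration except the last writes all 16, and the 4 junk bytes are overwritten by the next iteration. Only the last iteration restricts itself to 12 bytes.

```c
#include <stdint.h>
#include <string.h>

enum { kValidBytes = 12, kVectorBytes = 16, kSrcStep = 32 };

/* Hypothetical stand-in for the vector computation: fills a 16-byte block
 * whose first 12 bytes are meaningful and whose last 4 bytes are junk.
 * Assumed sampling: bytes 0, 3 and 6 of every 8 source bytes. */
static void produce_block(const uint8_t* src, uint8_t block[kVectorBytes]) {
  for (int i = 0; i < kValidBytes; ++i) {
    block[i] = src[(i / 3) * 8 + (i % 3) * 3];
  }
  for (int i = kValidBytes; i < kVectorBytes; ++i) {
    block[i] = 0xEE; /* junk lanes */
  }
}

/* Assumes dst_width is a positive multiple of 12, dst holds dst_width bytes,
 * and src holds 32 bytes per 12 output bytes. */
static void store_row_overlapping(const uint8_t* src, uint8_t* dst,
                                  int dst_width) {
  uint8_t block[kVectorBytes];
  while (dst_width > kValidBytes) {
    produce_block(src, block);
    /* Full 16-byte store: at least 12 more output bytes follow, so the 4
     * extra bytes stay in bounds and are overwritten next iteration. */
    memcpy(dst, block, kVectorBytes);
    src += kSrcStep;
    dst += kValidBytes;
    dst_width -= kValidBytes;
  }
  /* Last iteration: nothing follows, so write exactly 12 bytes. */
  produce_block(src, block);
  memcpy(dst, block, kValidBytes);
}
```

The point of the sketch is the bounds argument: a full-width store is used only when another 12-byte block is guaranteed to follow, so no byte past `dst + dst_width` is ever written.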
libyuv is an open source project that includes YUV scaling and conversion functionality.
See Getting started for instructions on how to get started developing.
You can also browse the docs directory for more documentation.