commit | 5bac99fe09055ff6866d052669c123d00a86fae0 | |
---|---|---|
author | George Steed <george.steed@arm.com> | Tue May 14 17:38:43 2024 +0100 |
committer | Frank Barchard <fbarchard@chromium.org> | Wed Jul 10 23:10:43 2024 +0000 |
tree | 9d664d5dba265df007b12a01516b075d30df821e | |
parent | a425b559bdf8c2a7c3567af42ffe970d238cb72e | |
[AArch64] Rework data loading in ScaleARGBFilterCols_NEON

The existing code makes use of lane-indexed LD2 instructions to load the input data; however, this creates a strong dependency chain between consecutive load instructions. We can reduce this dependency chain by instead loading two vectors with wider lane-indexed LD1 instructions and then performing a permute to unzip the data.

We can also avoid the need for a complex sequence of DUP + EXT instructions by using TBL to permute the data exactly as we want it.

Reduction in runtimes observed compared to the existing Neon implementation:

Cortex-A55: =0.0%
Cortex-A510: -44.2%
Cortex-A520: -47.6%
Cortex-A76: -45.8%
Cortex-A715: -58.3%
Cortex-A720: -58.4%
Cortex-X1: -66.7%
Cortex-X2: -68.0%
Cortex-X3: -67.9%
Cortex-X4: -70.0%

Change-Id: I8a1d1fe08d8a2ddb0b86d4a44f0d49b69ab03ece
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5683126
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
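The data movement described in the commit message can be modelled in portable scalar C. The sketch below is an illustration only, not the code from this change: the function names are hypothetical, the real implementation is AArch64 inline assembly, and the 16.16 fixed-point stepping of `x` by `dx` is an assumption about how source positions are computed. It shows that loading each pair of adjacent 32-bit pixels as one 64-bit lane into two vectors and then unzipping the even and odd elements (as UZP1/UZP2 do) gives the same result as de-interleaving lane by lane (as lane-indexed LD2 does). A model of single-register TBL is included as well; the index vector a caller passes to it is likewise an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of four lane-indexed LD2 loads on 32-bit lanes: each load
 * de-interleaves one pixel pair directly into lane n of two vectors. */
static void load_pairs_ld2_model(const uint32_t* src, int x, int dx,
                                 uint32_t left[4], uint32_t right[4]) {
  for (int n = 0; n < 4; ++n) {
    const uint32_t* p = src + (x >> 16); /* integer part of 16.16 position */
    left[n] = p[0];
    right[n] = p[1];
    x += dx;
  }
}

/* Model of the reworked scheme: four 64-bit lane loads fill two vectors
 * (v0 holds pairs 0 and 1, v1 holds pairs 2 and 3), then one unzip step
 * separates the even (left) and odd (right) 32-bit elements. */
static void load_pairs_ld1_unzip_model(const uint32_t* src, int x, int dx,
                                       uint32_t left[4], uint32_t right[4]) {
  uint32_t v0[4], v1[4];
  for (int n = 0; n < 4; ++n) {
    uint32_t* lane = (n < 2) ? &v0[2 * n] : &v1[2 * (n - 2)];
    memcpy(lane, src + (x >> 16), 2 * sizeof(uint32_t)); /* 64-bit lane */
    x += dx;
  }
  /* UZP1 semantics: even elements of v0 followed by even elements of v1. */
  left[0] = v0[0]; left[1] = v0[2]; left[2] = v1[0]; left[3] = v1[2];
  /* UZP2 semantics: odd elements of v0 followed by odd elements of v1. */
  right[0] = v0[1]; right[1] = v0[3]; right[2] = v1[1]; right[3] = v1[3];
}

/* Model of single-register TBL: each output byte is selected from the
 * 16-byte table by index; out-of-range indices produce zero. */
static void tbl1_model(const uint8_t table[16], const uint8_t idx[16],
                       uint8_t out[16]) {
  for (int i = 0; i < 16; ++i) {
    assert(table != out); /* the model does not support in-place use */
    out[i] = (idx[i] < 16) ? table[idx[i]] : 0;
  }
}
```

In the scalar model both loaders do the same amount of work; the benefit reported in the commit comes from the hardware, where the wider lane loads are said to shorten the dependency chain between consecutive loads.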
libyuv is an open source project that includes YUV scaling and conversion functionality.
See Getting started for instructions on how to get started developing.
You can also browse the docs directory for more documentation.