commit | 0bce5120f674b55d23c2a481aab8efd8d1972848 | |
---|---|---|
author | George Steed <george.steed@arm.com> | Tue Sep 17 13:41:02 2024 +0100 |
committer | Frank Barchard <fbarchard@chromium.org> | Mon Oct 28 17:03:54 2024 +0000 |
tree | 58db15f4659743c9a813df73bfc1daabfc778581 | |
parent | 22ac86800edc2b6419a815a300e6d63d0708a35b | |
[AArch64] Optimize ScaleRowDown38_2_Box_NEON

Replace LD4 and TRN instructions with LD1s and TBL, since LD4 is known to be slow on some micro-architectures, and remove other unnecessary permutes.

Reduction in run times:

Core | Run-time change |
---|---|
Cortex-A55 | -17.9% |
Cortex-A510 | -28.7% |
Cortex-A520 | -31.8% |
Cortex-A76 | -40.8% |
Cortex-A715 | -46.1% |
Cortex-A720 | -46.1% |
Cortex-X1 | -44.3% |
Cortex-X2 | -40.1% |
Cortex-X3 | -46.3% |
Cortex-X4 | -40.2% |
Cortex-X925 | -42.3% |

Co-authored-by: Cosmina Dunca <cosmina.dunca@arm.com>
Bug: b/42280945
Change-Id: I84e2cd04912fc11d59b4407a1836f047b74a4c92
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5872802
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
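For context, the following C sketch shows what a 3/8 horizontal, two-row box downscale computes for each group of 8 source columns: two outputs average a 3x2 box and the third averages a 2x2 box. It is modeled on libyuv's portable C path (`ScaleRowDown38_2_Box_C`), but the function name `scale_row_down38_2_box_ref` is hypothetical, and the fixed-point reciprocals (`65536 / 6`, `65536 / 4`, then `>> 16`) are an assumption about the reference, not something stated in this commit. The commit message describes changing how the NEON routine loads and permutes source bytes (LD1 plus TBL instead of LD4 and TRN); this sketch only illustrates the row operation itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar sketch of a 3/8 two-row box downscale, modeled on
 * (not copied from) libyuv's C reference. Each group of 8 source columns
 * from two rows produces 3 output pixels. Division by 6 and by 4 is
 * assumed to use fixed-point reciprocals followed by a 16-bit shift. */
static void scale_row_down38_2_box_ref(const uint8_t *src_ptr,
                                       ptrdiff_t src_stride,
                                       uint8_t *dst_ptr, int dst_width) {
  const uint8_t *s = src_ptr;              /* top source row */
  const uint8_t *t = src_ptr + src_stride; /* bottom source row */
  assert(dst_width > 0 && dst_width % 3 == 0);
  for (int i = 0; i < dst_width; i += 3) {
    /* Columns 0-2 of both rows: 3x2 box, divided by 6 in fixed point. */
    dst_ptr[0] = (uint8_t)(((s[0] + s[1] + s[2] + t[0] + t[1] + t[2]) *
                            (65536 / 6)) >> 16);
    /* Columns 3-5 of both rows: 3x2 box, divided by 6 in fixed point. */
    dst_ptr[1] = (uint8_t)(((s[3] + s[4] + s[5] + t[3] + t[4] + t[5]) *
                            (65536 / 6)) >> 16);
    /* Columns 6-7 of both rows: 2x2 box, divided by 4 (exact here). */
    dst_ptr[2] = (uint8_t)(((s[6] + s[7] + t[6] + t[7]) * (65536 / 4)) >> 16);
    s += 8;
    t += 8;
    dst_ptr += 3;
  }
}
```

Because `65536 / 6` truncates to 10922, six-pixel averages in this sketch round down: a flat input of 60 produces 59 for the two 3x2 outputs and 60 for the 2x2 output. Each 8-column group is independent of its neighbours, which is what allows a SIMD version to process several groups per iteration.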
libyuv is an open source project that includes YUV scaling and conversion functionality.
See Getting started for instructions on how to get started developing.
You can also browse the docs directory for more documentation.