| Field | Value | Date |
|---|---|---|
| commit | 775fd92e599ae9f758cce06e4573ef7145a94f1f | |
| author | George Steed <george.steed@arm.com> | Tue Sep 17 13:41:08 2024 +0100 |
| committer | Frank Barchard <fbarchard@chromium.org> | Mon Oct 28 17:04:22 2024 +0000 |
| tree | d4baf9154196e12154f4abd89eff47c81753978f | |
| parent | 0bce5120f674b55d23c2a481aab8efd8d1972848 | |
[AArch64] Optimize ScaleRowDown38_3_Box_NEON

Replace LD4 and TRN instructions with LD1s and TBL, since LD4 is known to be slow on some micro-architectures, and remove other unnecessary permutes.

Reduction in run times:

- Cortex-A55: -24.8%
- Cortex-A510: -32.7%
- Cortex-A520: -37.7%
- Cortex-A76: -51.8%
- Cortex-A715: -58.9%
- Cortex-A720: -58.9%
- Cortex-X1: -54.8%
- Cortex-X2: -50.3%
- Cortex-X3: -57.1%
- Cortex-X4: -49.8%
- Cortex-X925: -52.0%

Co-authored-by: Cosmina Dunca <cosmina.dunca@arm.com>
Bug: b/42280945
Change-Id: Ie96bac30fffbe41f8d1501ee289795830ab127e5
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5872803
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Justin Green <greenjustin@google.com>
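For context, `ScaleRowDown38_3_Box` scales a row down horizontally by 3/8 while box-filtering over three source rows. The commit changes how the NEON kernel loads and rearranges bytes (LD1 plus TBL instead of LD4 plus TRN), not what it computes. The scalar sketch below models that computation under stated assumptions: each group of 8 input pixels yields 3 outputs, the first two averaging 3x3 boxes and the third averaging the remaining 2-wide by 3-tall box, with division done by a 16-bit fixed-point reciprocal. The function name `scale_row_down38_3_box_ref` is hypothetical, and this is not libyuv's actual source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference (not libyuv's code): 3/8 horizontal box
 * downscale over three source rows. Each group of 8 input pixels per row
 * yields 3 output pixels: outputs 0 and 1 average 3x3 boxes, output 2
 * averages the remaining 2-wide, 3-tall box. */
static void scale_row_down38_3_box_ref(const uint8_t* src, ptrdiff_t stride,
                                       uint8_t* dst, int dst_width) {
  assert(dst_width > 0 && dst_width % 3 == 0);
  for (int i = 0; i < dst_width; i += 3) {
    /* Sum each of the 8 columns vertically across the three rows. */
    int col[8];
    for (int x = 0; x < 8; ++x) {
      col[x] = src[x] + src[stride + x] + src[2 * stride + x];
    }
    /* Divide by the box area using a 16-bit fixed-point reciprocal:
     * multiply by 65536 / n (integer-truncated), then shift right by 16. */
    dst[0] = (uint8_t)(((col[0] + col[1] + col[2]) * (65536 / 9)) >> 16);
    dst[1] = (uint8_t)(((col[3] + col[4] + col[5]) * (65536 / 9)) >> 16);
    dst[2] = (uint8_t)(((col[6] + col[7]) * (65536 / 6)) >> 16);
    src += 8; /* advance to the next group of 8 input pixels */
    dst += 3; /* three outputs written per group */
  }
}
```

Because `65536 / 9` and `65536 / 6` truncate in integer arithmetic, a flat input of 255 comes out as 254 in this sketch; the reciprocal multiply trades exact rounding for avoiding a per-pixel division.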
libyuv is an open source project that includes YUV scaling and conversion functionality.
See Getting started for instructions on setting up a development environment.
You can also browse the docs directory for more documentation.