commit | 23a6a412e5d3d10c3bbd79b147c1eab4d284bc77 |
---|---|
author | George Steed <george.steed@arm.com> | Wed May 15 21:30:25 2024 +0100 |
committer | Frank Barchard <fbarchard@chromium.org> | Mon Sep 16 15:37:27 2024 +0000 |
tree | fd7ecddc12795171aaa2ff817161be0124ec6f1f |
parent | d5303f4f779a6baf66bbac8a97937e507ffaaaba |
[AArch64] Unroll and use TBL in ScaleRowDown34_NEON

ST3 is known to be slow on a number of modern micro-architectures. By unrolling the code we are able to use TBL to shuffle elements into the correct indices without needing to use LD4 and ST3, giving a good improvement in performance across the board.

Reduction in runtimes observed compared to the existing Neon implementation:

Cortex-A55: -14.4%
Cortex-A510: -66.0%
Cortex-A520: -50.8%
Cortex-A76: -60.5%
Cortex-A715: -63.9%
Cortex-A720: -64.2%
Cortex-X1: -74.3%
Cortex-X2: -75.4%
Cortex-X3: -75.5%
Cortex-X4: -48.1%

Bug: b/42280945
Change-Id: Ia1efb03af2d6ec00bc5a4b72168963fede9f0c83
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5785971
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
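The shuffle described in the commit message can be illustrated with a portable C sketch. This is not the Neon code from this change: it emulates a single-register TBL in scalar C, and it assumes the point-sampled 3/4 downscale keeps bytes 0, 1 and 3 of every group of 4 source bytes. The names `Tbl16Emulated`, `kShuffle34` and `ScaleRowDown34_TblSketch` are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar emulation of a single-register AArch64 TBL lookup: each output
 * byte is table[idx[i]], or 0 when the index is out of range (>= 16). */
static void Tbl16Emulated(const uint8_t table[16], const uint8_t idx[16],
                          uint8_t out[16]) {
  for (int i = 0; i < 16; i++) {
    out[i] = idx[i] < 16 ? table[idx[i]] : 0;
  }
}

/* Hypothetical index table (an assumption, not taken from the commit):
 * keep bytes 0, 1 and 3 of each group of 4. The last four lanes use an
 * out-of-range index, so they become 0 and are never stored. */
static const uint8_t kShuffle34[16] = {0, 1,  3,  4,   5,   7,   8,   9,
                                       11, 12, 13, 15, 255, 255, 255, 255};

/* Consumes 16 source bytes and writes 12 destination bytes per iteration.
 * One table lookup packs the kept bytes contiguously, so a plain store
 * suffices and no de-interleaving load or interleaving store is needed. */
static void ScaleRowDown34_TblSketch(const uint8_t* src, uint8_t* dst,
                                     int dst_width) {
  assert(dst_width > 0 && dst_width % 12 == 0);
  for (int x = 0; x < dst_width; x += 12) {
    uint8_t shuffled[16];
    Tbl16Emulated(src, kShuffle34, shuffled);
    memcpy(dst, shuffled, 12); /* store only the 12 valid bytes */
    src += 16;
    dst += 12;
  }
}
```

On AArch64 the emulated lookup would correspond to a TBL instruction operating on vector registers; how far the real implementation unrolls and how many table registers it uses is not stated in the commit message, so the 16-in/12-out loop shape here is only one possible arrangement.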
libyuv is an open source project that includes YUV scaling and conversion functionality.