swscale/x86/ops: add AVX2/SSE4 path for SWS_UOP_READ_PALETTE

The AVX2 is a fairly straightforward vpgatherdd + 4x4 transpose. The SSE4
fallback is an unrolled scalar loop, for lack of anything better to do.

checkasm:
 - CPU: AMD Ryzen 9 9950X3D 16-Core Processor (00B40F40)
 - Timing source: x86 (rdtsc)
 - Bench duration: 10000 µs per function (45898205 cycles)
 - Random seed: 2518020648

Benchmark results:
  name                              cycles (vs ref)
  u8_read_palette_xyzw_c:           2877.5
  u8_read_palette_xyzw_x86_sse4:    1951.9 ( 1.47x)
  u8_read_palette_xyzw_x86_avx2:    1051.6 ( 2.74x)

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>

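
The "gather + 4x4 transpose" idea from the commit message can be modelled in plain C. The sketch below is not the libswscale implementation: the function names, the flat 256-entry palette layout, and the x, y, z, w byte order of each packed entry are assumptions made for illustration. It fetches four packed 4-byte palette entries by index (the scalar analogue of a gather), transposes the resulting 4x4 byte matrix, and stores one row per component plane.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: transpose a 4x4 byte matrix stored row-major in 16 bytes.
 * Input rows are packed entries (x, y, z, w); output rows are single components. */
static void transpose4x4_u8(const uint8_t *in, uint8_t *out)
{
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            out[j * 4 + i] = in[i * 4 + j];
}

/* Scalar model of a palette read (hypothetical name and signature).
 * idx:     n 8-bit palette indices
 * palette: 256 packed entries of 4 bytes each, assumed order x, y, z, w
 * x..w:    output planes, n bytes each */
static void read_palette_xyzw_sketch(const uint8_t *idx, const uint8_t *palette,
                                     uint8_t *x, uint8_t *y,
                                     uint8_t *z, uint8_t *w, size_t n)
{
    size_t i = 0;

    /* Main loop: four pixels per iteration. */
    for (; i + 4 <= n; i += 4) {
        uint8_t packed[16], planar[16];

        /* "Gather": load four packed entries selected by the indices. */
        for (int k = 0; k < 4; k++)
            memcpy(packed + 4 * k, palette + 4 * (size_t)idx[i + k], 4);

        /* Packed-to-planar conversion via the 4x4 transpose. */
        transpose4x4_u8(packed, planar);

        memcpy(x + i, planar + 0,  4);
        memcpy(y + i, planar + 4,  4);
        memcpy(z + i, planar + 8,  4);
        memcpy(w + i, planar + 12, 4);
    }

    /* Tail: remaining pixels, one at a time. */
    for (; i < n; i++) {
        const uint8_t *e = palette + 4 * (size_t)idx[i];
        x[i] = e[0];
        y[i] = e[1];
        z[i] = e[2];
        w[i] = e[3];
    }
}
```

A SIMD version would replace the inner gather loop with a single gather instruction and the transpose with byte shuffles; the data flow stays the same.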
FFmpeg is a collection of libraries and tools to process multimedia content such as audio, video, subtitles and related metadata.
* libavcodec provides implementations of a wide range of codecs.
* libavformat implements streaming protocols, container formats and basic I/O access.
* libavutil includes hashers, decompressors and miscellaneous utility functions.
* libavfilter provides means to alter decoded audio and video through a directed graph of connected filters.
* libavdevice provides an abstraction to access capture and playback devices.
* libswresample implements audio mixing and resampling routines.
* libswscale implements color conversion and scaling routines.
* Additional small tools such as aviocat, ismindex and qt-faststart.

The offline documentation is available in the doc/ directory.

The online documentation is available on the main website and in the wiki.
Coding examples are available in the doc/examples directory.
The FFmpeg codebase is mainly LGPL-licensed, with optional components licensed under the GPL. Please refer to the LICENSE file for detailed information.
Patches should be submitted to the ffmpeg-devel mailing list using git format-patch or git send-email. GitHub pull requests should be avoided because they are not part of our review process and will be ignored.