This directory contains the libjpeg-turbo SIMD extensions, hand-coded assembly or compiler intrinsics modules that plug into the modular interfaces of the libjpeg API library and use various SIMD instruction sets to accelerate most of the 8-bit-per-sample lossy JPEG compression and decompression algorithms.
(Note that, since the TurboJPEG API library wraps the libjpeg API library, it uses the libjpeg-turbo SIMD extensions implicitly.)
The following 8-bit-per-sample lossy JPEG compression algorithms are currently implemented as SIMD modules:
The following 8-bit-per-sample lossy JPEG decompression algorithms are currently implemented as SIMD modules:
Refer to https://libjpeg-turbo.org/About/SIMDCoverage for a list of SIMD modules that are implemented for the algorithms above using specific SIMD instruction sets.
Legacy features are features that were designed to work around hardware performance limitations that no longer exist. They generally have little or no utility on modern hardware and are retained only for backward compatibility with libjpeg.
Infrequently used features may be useful for specific applications but are not the “common case” for JPEG compression and decompression.
When initializing a particular compression or decompression module (which occurs during jpeg_start_compress(), jpeg_start_decompress(), tj3Compress*(), or tj3Decompress*()), the libjpeg API library calls a SIMD dispatcher function (jsimd_set_*()) for each algorithm listed above. The SIMD dispatcher functions are defined in jsimd.h, jsimddct.h, and jsimd.c. Each function
You can use environment variables to override the dispatchers' choice of SIMD modules:
JSIMD_FORCENONE=1 disables all SIMD modules.
JSIMD_NOHUFFENC=1 disables only the Huffman encoding SIMD modules.
JSIMD_FORCESSE2=1 (x86) enables only the SSE2 and SSE SIMD modules, even if the CPU supports newer instruction sets.
JSIMD_FORCESSE=1 (i386) enables only the SSE and MMX SIMD modules, even if the CPU supports newer instruction sets.
JSIMD_FORCEMMX=1 (i386) enables only the MMX SIMD modules, even if the CPU supports newer instruction sets.
JSIMD_FORCENEON=1 (AArch32) force-enables the Neon SIMD modules, bypassing /proc/cpuinfo feature detection (which may be unreliable in QEMU and other emulation/virtualization environments.)
JSIMD_FORCEDSPR2=1 (MIPS) force-enables the DSPr2 SIMD modules, bypassing /proc/cpuinfo feature detection (which may be unreliable in QEMU and other emulation/virtualization environments.)
JSIMD_FORCEMMI=1 (Loongson) force-enables the MMI SIMD modules, bypassing /proc/cpuinfo feature detection (which may be unreliable in QEMU and other emulation/virtualization environments.)
The simdcoverage program reports which SIMD modules will be used, taking into account the current architecture, detected CPU features, and aforementioned overrides.
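As a rough illustration of how the x86 overrides can narrow the set of usable SIMD modules before a module is chosen, here is a minimal C sketch. It is not the code in jsimd.c. The feature bits, the function names (override_set(), apply_overrides(), choose_module(), force_scalar_only()), the exact-match test against "1", and the order in which multiple overrides are resolved are all assumptions made for illustration. Only the environment variable names and their described effects come from the list above.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* for setenv() */
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical feature bits for illustration; these are not the constants
   that the library uses. */
enum { SIMD_MMX = 1, SIMD_SSE = 2, SIMD_SSE2 = 4, SIMD_AVX2 = 8 };

/* Return 1 if the named environment variable is set to exactly "1". */
static int override_set(const char *name)
{
  const char *value;

  assert(name != NULL);
  value = getenv(name);
  return value != NULL && strcmp(value, "1") == 0;
}

/* Narrow a detected x86 feature mask according to the overrides.  The order
   in which multiple overrides are resolved is an assumption. */
static unsigned apply_overrides(unsigned detected)
{
  if (override_set("JSIMD_FORCENONE"))
    return 0;
  if (override_set("JSIMD_FORCESSE2"))
    detected &= SIMD_SSE2 | SIMD_SSE;
  if (override_set("JSIMD_FORCESSE"))
    detected &= SIMD_SSE | SIMD_MMX;
  if (override_set("JSIMD_FORCEMMX"))
    detected &= SIMD_MMX;
  return detected;
}

/* Name the best remaining module, falling back to the scalar/C module. */
static const char *choose_module(unsigned supported)
{
  if (supported & SIMD_AVX2) return "avx2";
  if (supported & SIMD_SSE2) return "sse2";
  if (supported & SIMD_SSE) return "sse";
  if (supported & SIMD_MMX) return "mmx";
  return "scalar";
}

/* Request scalar-only operation from within an application (POSIX only).
   This sketch assumes the variable is read when SIMD support is first
   initialized, so it must be set before that point. */
static int force_scalar_only(void)
{
  return setenv("JSIMD_FORCENONE", "1", 1);
}
```

With all four hypothetical feature bits detected and no overrides set, choose_module(apply_overrides(15)) yields "avx2" in this sketch. With JSIMD_FORCESSE2=1 it yields "sse2", and after force_scalar_only() it yields "scalar". The overrides only ever remove bits from the detected mask here, which mirrors the "enables only" wording above. The force-enable variables for Neon, DSPr2, and MMI behave differently (they bypass detection) and are not modeled.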
When built with the WITH_PROFILE CMake variable enabled, the libjpeg API library measures the average throughput of each lossy JPEG algorithm as an image is compressed or decompressed, and it prints the results to the command line when jpeg_destroy_compress(), jpeg_destroy_decompress(), or tj3Destroy() is called. This allows developers to easily study the performance of each SIMD module in isolation and compare it to the performance of the corresponding scalar/C module (or a previous implementation of the same SIMD module.)
The most effective way to use the profiling feature is with the TJBench application and a suitably large image, such as one of the 8-bit RGB images from imagecompression.info. Use one of the following command lines to obtain the performance of specific algorithms. (Adjust the warmup and benchmark times to suit your needs.)
tjbench {image}.ppm 95 -rgb -quiet -nowrite -benchtime 10 -warmup 10 -subsamp 422
tjbench {image}.ppm 95 -rgb -quiet -nowrite -benchtime 10 -warmup 10 -subsamp gray
tjbench {image}.ppm 95 -rgb -quiet -nowrite -benchtime 10 -warmup 10 -subsamp 420
tjbench {image}.ppm 95 -rgb -quiet -nowrite -benchtime 10 -warmup 10 -subsamp 440
tjbench {image}.ppm 95 -cmyk -quiet -nowrite -benchtime 10 -warmup 10 -subsamp 422 -nosmooth
tjbench {image}.ppm 95 -cmyk -quiet -nowrite -benchtime 10 -warmup 10 -subsamp 420 -nosmooth
tjbench {image}.ppm 95 -rgb -quiet -nowrite -benchtime 10 -warmup 10 -subsamp 440 -nosmooth
tjbench {image}.ppm 95 -rgb -quiet -nowrite -benchtime 10 -warmup 10 -subsamp 422 -nosmooth
tjbench {image}.ppm 95 -rgb -quiet -nowrite -benchtime 10 -warmup 10 -subsamp 420 -nosmooth
tjbench {image}.ppm 95 -rgb -quiet -nowrite -benchtime 10 -warmup 10 -subsamp 422 -progressive
tjbench {image}.ppm 95 -rgb -quiet -nowrite -benchtime 10 -warmup 10 -subsamp 422 -scale 1/4
tjbench {image}.ppm 95 -rgb -quiet -nowrite -benchtime 10 -warmup 10 -subsamp 422 -scale 1/2
tjbench {image}.ppm 95 -rgb -quiet -nowrite -benchtime 10 -warmup 10 -subsamp 422 -scale 3/4
tjbench {image}.ppm 95 -rgb -quiet -nowrite -benchtime 10 -warmup 10 -subsamp 422 -scale 3/2
tjbench {image}.ppm 95 -rgb -quiet -nowrite -benchtime 10 -warmup 10 -subsamp 422 -dct fast