Adjust performance claims

Document the latest benchmarks on the Nexus 5X and change the overall
"2-4x" speedup claim to "2-6x".  The peak speedup on x86 platforms was
already closer to 5x, and the addition of SIMD-accelerated Huffman
encoding gave it that extra push over the cliff.
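The figures in this commit mix two notations: speedup ratios ("2-6x",
"2-2.5x") and "N% faster" ("about 30%", "about 75%").  The following is a
minimal Python sketch of how the two relate, and of how a benchmark
harness might run an A/B comparison with SIMD-accelerated Huffman encoding
disabled by setting JSIMD_NOHUFFENC=1 in a child process's environment.
The helper names are hypothetical and not part of libjpeg-turbo.  The
sketch assumes the benchmark runs as a separate process, so the variable
is already set when libjpeg-turbo initializes its SIMD support, and that
the platform is one where the library honors the variable.

```python
import os
from typing import Dict, Mapping, Optional


def speedup(baseline_seconds: float, accelerated_seconds: float) -> float:
    """Speedup ratio: how many times as fast the accelerated run is (2.0 == "2x")."""
    if baseline_seconds <= 0 or accelerated_seconds <= 0:
        raise ValueError("timings must be positive")
    return baseline_seconds / accelerated_seconds


def percent_faster(ratio: float) -> float:
    """Convert a speedup ratio to an "N% faster" figure (1.75x -> 75%)."""
    return (ratio - 1.0) * 100.0


def env_without_simd_huffman(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with JSIMD_NOHUFFENC=1 set.

    Hypothetical helper: the caller's mapping (or os.environ) is copied,
    never modified, so only the child process sees the variable.
    """
    env = dict(os.environ if base is None else base)
    env["JSIMD_NOHUFFENC"] = "1"
    return env
```

Pass the dict returned by env_without_simd_huffman() as the env= argument
of subprocess.run() when launching the benchmark program, time that run
and a run with the default environment, and feed both timings to
speedup() and percent_faster().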
diff --git a/ChangeLog.txt b/ChangeLog.txt
index a0e39d1..16b264d 100644
--- a/ChangeLog.txt
+++ b/ChangeLog.txt
@@ -69,21 +69,28 @@
 
 [13] Added SIMD acceleration for Huffman encoding on NEON-capable ARM 32-bit
 platforms.  This speeds up the compression of full-color JPEGs by about 30% on
-average.  For the purposes of benchmarking or regression testing,
-SIMD-accelerated Huffman encoding can be disabled by setting the
-JSIMD_NOHUFFENC environment variable to 1.
+average on a Cortex-A9 core (iPhone 4S) and by about 6-7% on average on
+Cortex-A53 and Cortex-A57 cores.  For the purposes of benchmarking or
+regression testing, SIMD-accelerated Huffman encoding can be disabled by
+setting the JSIMD_NOHUFFENC environment variable to 1.
 
 [14] Added ARM 64-bit (ARMv8) NEON SIMD implementations of the commonly-used
 compression algorithms (including the slow integer forward DCT and h2v2 & h2v1
 downsampling algorithms, which are not accelerated in the 32-bit NEON
-implementation.)  This speeds up the overall 64-bit compression performance by
-about 2x on ARMv8 processors.
+implementation.)  This speeds up the compression of full-color JPEGs by about
+75% on average on a Cavium ThunderX processor and by about 2-2.5x on average on
+Cortex-A53 and Cortex-A57 cores.
 
 [15] pkg-config (.pc) scripts are now included for both the libjpeg and
 TurboJPEG API libraries on Un*x systems.  Note that if a project's build system
 relies on these scripts, then it will not be possible to build that project
 with libjpeg or with a prior version of libjpeg-turbo.
 
+[16] Optimized the ARM 64-bit (ARMv8) NEON SIMD decompression routines to
+improve performance on CPUs with in-order pipelines.  This speeds up the
+decompression of full-color JPEGs by nearly 2x on average on a Cavium ThunderX
+processor and by about 15% on average on a Cortex-A53 core.
+
 
 1.4.2
 =====
diff --git a/README.md b/README.md
index 1d3c3da..ad614ca 100755
--- a/README.md
+++ b/README.md
@@ -4,7 +4,7 @@
 libjpeg-turbo is a JPEG image codec that uses SIMD instructions (MMX, SSE2,
 NEON, AltiVec) to accelerate baseline JPEG compression and decompression on
 x86, x86-64, ARM, and PowerPC systems.  On such systems, libjpeg-turbo is
-generally 2-4x as fast as libjpeg, all else being equal.  On other types of
+generally 2-6x as fast as libjpeg, all else being equal.  On other types of
 systems, libjpeg-turbo can still outperform libjpeg by a significant amount, by
 virtue of its highly-optimized Huffman coding routines.  In many cases, the
 performance of libjpeg-turbo rivals that of proprietary high-speed JPEG codecs.
@@ -36,7 +36,7 @@
 libjpeg-turbo includes two APIs that can be used to compress and decompress
 JPEG images:
 
-- **TurboJPEG API**  
+- **TurboJPEG API**
   This API provides an easy-to-use interface for compressing and decompressing
   JPEG images in memory.  It also provides some functionality that would not be
   straightforward to achieve using the underlying libjpeg API, such as
@@ -44,7 +44,7 @@
   transforms on an image.  The Java interface for libjpeg-turbo is written on
   top of the TurboJPEG API.
 
-- **libjpeg API**  
+- **libjpeg API**
   This is the de facto industry-standard API for compressing and decompressing
   JPEG images.  It is more difficult to use than the TurboJPEG API but also
   more powerful.  The libjpeg API implementation in libjpeg-turbo is both
@@ -135,17 +135,17 @@
 
 #### Fully supported
 
-- **libjpeg: IDCT scaling extensions in decompressor**  
+- **libjpeg: IDCT scaling extensions in decompressor**
   libjpeg-turbo supports IDCT scaling with scaling factors of 1/8, 1/4, 3/8,
   1/2, 5/8, 3/4, 7/8, 9/8, 5/4, 11/8, 3/2, 13/8, 7/4, 15/8, and 2/1 (only 1/4
   and 1/2 are SIMD-accelerated.)
 
 - **libjpeg: Arithmetic coding**
 
-- **libjpeg: In-memory source and destination managers**  
+- **libjpeg: In-memory source and destination managers**
   See notes below.
 
-- **cjpeg: Separate quality settings for luminance and chrominance**  
+- **cjpeg: Separate quality settings for luminance and chrominance**
-  Note that the libpjeg v7+ API was extended to accommodate this feature only
+  Note that the libjpeg v7+ API was extended to accommodate this feature only
   for convenience purposes.  It has always been possible to implement this
   feature with libjpeg v6b (see rdswitch.c for an example.)
@@ -174,15 +174,15 @@
 but it is the general belief of our project that these features have not
 demonstrated sufficient usefulness to justify inclusion in libjpeg-turbo.
 
-- **libjpeg: DCT scaling in compressor**  
-  `cinfo.scale_num` and `cinfo.scale_denom` are silently ignored.  
+- **libjpeg: DCT scaling in compressor**
+  `cinfo.scale_num` and `cinfo.scale_denom` are silently ignored.
   There is no technical reason why DCT scaling could not be supported when
   emulating the libjpeg v7+ API/ABI, but without the SmartScale extension (see
   below), only scaling factors of 1/2, 8/15, 4/7, 8/13, 2/3, 8/11, 4/5, and
   8/9 would be available, which is of limited usefulness.
 
-- **libjpeg: SmartScale**  
-  `cinfo.block_size` is silently ignored.  
+- **libjpeg: SmartScale**
+  `cinfo.block_size` is silently ignored.
   SmartScale is an extension to the JPEG format that allows for DCT block
   sizes other than 8x8.  Providing support for this new format would be
   feasible (particularly without full acceleration.)  However, until/unless
@@ -194,15 +194,15 @@
   interest in providing this feature would be as a means of supporting
   additional DCT scaling factors.
 
-- **libjpeg: Fancy downsampling in compressor**  
-  `cinfo.do_fancy_downsampling` is silently ignored.  
+- **libjpeg: Fancy downsampling in compressor**
+  `cinfo.do_fancy_downsampling` is silently ignored.
   This requires the DCT scaling feature, which is not supported.
 
-- **jpegtran: Scaling**  
+- **jpegtran: Scaling**
   This requires both the DCT scaling and SmartScale features, which are not
   supported.
 
-- **Lossless RGB JPEG files**  
+- **Lossless RGB JPEG files**
   This requires the SmartScale feature, which is not supported.
 
 ### What About libjpeg v9?
diff --git a/release/ReadMe.txt b/release/ReadMe.txt
index 2f00e8a..7fb8d0f 100644
--- a/release/ReadMe.txt
+++ b/release/ReadMe.txt
@@ -1,4 +1,4 @@
-libjpeg-turbo is a JPEG image codec that uses SIMD instructions (MMX, SSE2, NEON, AltiVec) to accelerate baseline JPEG compression and decompression on x86, x86-64, ARM, and PowerPC systems.  On such systems, libjpeg-turbo is generally 2-4x as fast as libjpeg, all else being equal.  On other types of systems, libjpeg-turbo can still outperform libjpeg by a significant amount, by virtue of its highly-optimized Huffman coding routines.  In many cases, the performance of libjpeg-turbo rivals that of proprietary high-speed JPEG codecs.
+libjpeg-turbo is a JPEG image codec that uses SIMD instructions (MMX, SSE2, NEON, AltiVec) to accelerate baseline JPEG compression and decompression on x86, x86-64, ARM, and PowerPC systems.  On such systems, libjpeg-turbo is generally 2-6x as fast as libjpeg, all else being equal.  On other types of systems, libjpeg-turbo can still outperform libjpeg by a significant amount, by virtue of its highly-optimized Huffman coding routines.  In many cases, the performance of libjpeg-turbo rivals that of proprietary high-speed JPEG codecs.
 
 libjpeg-turbo implements both the traditional libjpeg API as well as the less powerful but more straightforward TurboJPEG API.  libjpeg-turbo also features colorspace extensions that allow it to compress from/decompress to 32-bit and big-endian pixel buffers (RGBX, XBGR, etc.), as well as a full-featured Java interface.
 
diff --git a/release/deb-control.tmpl b/release/deb-control.tmpl
index 1a6242b..681721d 100644
--- a/release/deb-control.tmpl
+++ b/release/deb-control.tmpl
@@ -11,7 +11,7 @@
  libjpeg-turbo is a JPEG image codec that uses SIMD instructions (MMX, SSE2,
  NEON, AltiVec) to accelerate baseline JPEG compression and decompression on
  x86, x86-64, ARM, and PowerPC systems.  On such systems, libjpeg-turbo is
- generally 2-4x as fast as libjpeg, all else being equal.  On other types of
+ generally 2-6x as fast as libjpeg, all else being equal.  On other types of
  systems, libjpeg-turbo can still outperform libjpeg by a significant amount,
  by virtue of its highly-optimized Huffman coding routines.  In many cases, the
  performance of libjpeg-turbo rivals that of proprietary high-speed JPEG
diff --git a/release/libjpeg-turbo.spec.in b/release/libjpeg-turbo.spec.in
index 3e0dfa2..74ee300 100644
--- a/release/libjpeg-turbo.spec.in
+++ b/release/libjpeg-turbo.spec.in
@@ -46,7 +46,7 @@
 libjpeg-turbo is a JPEG image codec that uses SIMD instructions (MMX, SSE2,
 NEON, AltiVec) to accelerate baseline JPEG compression and decompression on
 x86, x86-64, ARM, and PowerPC systems.  On such systems, libjpeg-turbo is
-generally 2-4x as fast as libjpeg, all else being equal.  On other types of
+generally 2-6x as fast as libjpeg, all else being equal.  On other types of
 systems, libjpeg-turbo can still outperform libjpeg by a significant amount, by
 virtue of its highly-optimized Huffman coding routines.  In many cases, the
 performance of libjpeg-turbo rivals that of proprietary high-speed JPEG codecs.