Complete the ARM64 NEON SIMD implementation

This adds 64-bit NEON coverage for all of the algorithms that are
covered by the 32-bit NEON implementation, except for h2v1 (4:2:2) fancy
upsampling (used when decompressing 4:2:2 JPEG images.)  It also adds
64-bit NEON SIMD coverage for:

* slow integer forward DCT (compressor)
* h2v2 (4:2:0) downsampling (compressor)
* h2v1 (4:2:2) downsampling (compressor)

which are not covered in the 32-bit implementation.

Compression speedups relative to libjpeg-turbo 1.4.2:
Apple A7 (iPhone 5S), iOS, 64-bit: 113-150% (reported)
48-core ThunderX (RunAbove ARM Cloud), Linux, 64-bit: 2.1-33% (avg. 15%)

Refer to #44 and #49 for discussion

This commit also removes the unnecessary

    if (simd_support & JSIMD_ARM_NEON)

statements from the jsimd* algorithm functions.  Since the jsimd_can*()
functions check for the existence of NEON, the corresponding algorithm
functions will never be called if NEON isn't available.

Based on:
https://github.com/mayeut/libjpeg-turbo/commit/dcd9d84f10fae192c0e3935818dc289bca9c3e29
https://github.com/mayeut/libjpeg-turbo/commit/b0d87b811f37bd560083deea8c6e7d704e5cd944
https://github.com/mayeut/libjpeg-turbo/commit/70cd5c8a493a67f4d54dd2067ae6dedb65d95389
https://github.com/mayeut/libjpeg-turbo/commit/3e58d9a064648503c57ec2650ee79880f749a52b
https://github.com/mayeut/libjpeg-turbo/commit/837b19542f53fa81af83e6ba002d559877aaf597
https://github.com/mayeut/libjpeg-turbo/commit/73dc43ccc870c2e10ba893e9764b8e48d6836585
https://github.com/mayeut/libjpeg-turbo/commit/a82b71a261b4c0213f558baf4bc745f1c27356d8
https://github.com/mayeut/libjpeg-turbo/commit/c1b1188c2106d6ea7b76644b6023b57edeb602e1
https://github.com/mayeut/libjpeg-turbo/commit/305c89284e1bb222b34fbc7261f697a0cc452a41
https://github.com/mayeut/libjpeg-turbo/commit/7f443f99950b4d7d442b9b879648eca5273209bd
https://github.com/mayeut/libjpeg-turbo/commit/4c2b53b77da5a20e30e2aadaeddb0efbfe24e06d

Unified version with fixes:
https://github.com/mayeut/libjpeg-turbo/commit/1004a3cd05870612a194b410efeaa1b4da76d246
4 files changed
tree: 6e9a8c91e2a336d54792f9bf63fe17fd2b6db901
  1. cmakescripts/
  2. doc/
  3. java/
  4. md5/
  5. release/
  6. sharedlib/
  7. simd/
  8. testimages/
  9. win/
  10. .gitignore
  11. acinclude.m4
  12. bmp.c
  13. bmp.h
  14. BUILDING.md
  15. cderror.h
  16. cdjpeg.c
  17. cdjpeg.h
  18. change.log
  19. ChangeLog.txt
  20. cjpeg.1
  21. cjpeg.c
  22. CMakeLists.txt
  23. coderules.txt
  24. configure.ac
  25. djpeg.1
  26. djpeg.c
  27. doxygen-extra.css
  28. doxygen.config
  29. example.c
  30. jaricom.c
  31. jcapimin.c
  32. jcapistd.c
  33. jcarith.c
  34. jccoefct.c
  35. jccolext.c
  36. jccolor.c
  37. jcdctmgr.c
  38. jchuff.c
  39. jchuff.h
  40. jcinit.c
  41. jcmainct.c
  42. jcmarker.c
  43. jcmaster.c
  44. jcomapi.c
  45. jconfig.h.in
  46. jconfig.txt
  47. jconfigint.h.in
  48. jcparam.c
  49. jcphuff.c
  50. jcprepct.c
  51. jcsample.c
  52. jcstest.c
  53. jctrans.c
  54. jdapimin.c
  55. jdapistd.c
  56. jdarith.c
  57. jdatadst-tj.c
  58. jdatadst.c
  59. jdatasrc-tj.c
  60. jdatasrc.c
  61. jdcoefct.c
  62. jdcoefct.h
  63. jdcol565.c
  64. jdcolext.c
  65. jdcolor.c
  66. jdct.h
  67. jddctmgr.c
  68. jdhuff.c
  69. jdhuff.h
  70. jdinput.c
  71. jdmainct.c
  72. jdmainct.h
  73. jdmarker.c
  74. jdmaster.c
  75. jdmerge.c
  76. jdmrg565.c
  77. jdmrgext.c
  78. jdphuff.c
  79. jdpostct.c
  80. jdsample.c
  81. jdsample.h
  82. jdtrans.c
  83. jerror.c
  84. jerror.h
  85. jfdctflt.c
  86. jfdctfst.c
  87. jfdctint.c
  88. jidctflt.c
  89. jidctfst.c
  90. jidctint.c
  91. jidctred.c
  92. jinclude.h
  93. jmemmgr.c
  94. jmemnobs.c
  95. jmemsys.h
  96. jmorecfg.h
  97. jpeg_nbits_table.h
  98. jpegcomp.h
  99. jpegint.h
  100. jpeglib.h
  101. jpegtran.1
  102. jpegtran.c
  103. jquant1.c
  104. jquant2.c
  105. jsimd.h
  106. jsimd_none.c
  107. jsimddct.h
  108. jstdhuff.c
  109. jutils.c
  110. jversion.h
  111. libjpeg.map.in
  112. libjpeg.txt
  113. LICENSE.md
  114. Makefile.am
  115. rdbmp.c
  116. rdcolmap.c
  117. rdgif.c
  118. rdjpgcom.1
  119. rdjpgcom.c
  120. rdppm.c
  121. rdrle.c
  122. rdswitch.c
  123. rdtarga.c
  124. README.ijg
  125. README.md
  126. structure.txt
  127. tjbench.c
  128. tjbenchtest.in
  129. tjbenchtest.java.in
  130. tjexampletest.in
  131. tjunittest.c
  132. tjutil.c
  133. tjutil.h
  134. transupp.c
  135. transupp.h
  136. turbojpeg-jni.c
  137. turbojpeg-mapfile
  138. turbojpeg-mapfile.jni
  139. turbojpeg.c
  140. turbojpeg.h
  141. usage.txt
  142. wizard.txt
  143. wrbmp.c
  144. wrgif.c
  145. wrjpgcom.1
  146. wrjpgcom.c
  147. wrppm.c
  148. wrrle.c
  149. wrtarga.c