Enable the use of unaligned loads and stores for ARM-based architectures 
where they are available (ARMv7 and higher). This gives a significant 
speed boost on ARM, both for compression and decompression. 
It should not affect x86 at all. 
 
There are more changes possible to speed up ARM, but it might not be 
that easy to do without hurting x86 or making the code uglier. 
Also, we do not try to use NEON yet.
 
Microbenchmark results on a Cortex-A9 1GHz, using g++ 4.6.2 (from Ubuntu/Linaro), 
-O2 -DNDEBUG -Wa,-march=armv7a -mtune=cortex-a9 -mthumb-interwork: 
 
Benchmark            Time(ns)    CPU(ns) Iterations
---------------------------------------------------
BM_UFlat/0             524806     529100        378 184.6MB/s  html            [+33.6%]
BM_UFlat/1            5139790    5200000        100 128.8MB/s  urls            [+28.8%]
BM_UFlat/2              86540      84166       1901 1.4GB/s  jpg               [ +0.6%]
BM_UFlat/3             215351     210176        904 428.0MB/s  pdf             [+29.8%]
BM_UFlat/4            2144490    2100000        100 186.0MB/s  html4           [+33.3%]
BM_UFlat/5             194482     190000       1000 123.5MB/s  cp              [+36.2%]
BM_UFlat/6              91843      90175       2107 117.9MB/s  c               [+38.6%]
BM_UFlat/7              28535      28426       6684 124.8MB/s  lsp             [+34.7%]
BM_UFlat/8            9206600    9200000        100 106.7MB/s  xls             [+42.4%]
BM_UFlat/9            1865273    1886792        106 76.9MB/s  txt1             [+32.5%]
BM_UFlat/10           1576809    1587301        126 75.2MB/s  txt2             [+32.3%]
BM_UFlat/11           4968450    4900000        100 83.1MB/s  txt3             [+32.7%]
BM_UFlat/12           6673970    6700000        100 68.6MB/s  txt4             [+32.8%]
BM_UFlat/13           2391470    2400000        100 203.9MB/s  bin             [+29.2%]
BM_UFlat/14            334601     344827        522 105.8MB/s  sum             [+30.6%]
BM_UFlat/15             37404      38080       5252 105.9MB/s  man             [+33.8%]
BM_UFlat/16            535470     540540        370 209.2MB/s  pb              [+31.2%]
BM_UFlat/17           1875245    1886792        106 93.2MB/s  gaviota          [+37.8%]
BM_UValidate/0         178425     179533       1114 543.9MB/s  html            [ +2.7%]
BM_UValidate/1        2100450    2000000        100 334.8MB/s  urls            [ +5.0%]
BM_UValidate/2           1039       1044     172413 113.3GB/s  jpg             [ +3.4%]
BM_UValidate/3          59423      59470       3363 1.5GB/s  pdf               [ +7.8%]
BM_UValidate/4         760716     766283        261 509.8MB/s  html4           [ +6.5%]
BM_ZFlat/0            1204632    1204819        166 81.1MB/s  html (23.57 %)   [+32.8%]
BM_ZFlat/1           15656190   15600000        100 42.9MB/s  urls (50.89 %)   [+27.6%]
BM_ZFlat/2             403336     410677        487 294.8MB/s  jpg (99.88 %)   [+16.5%]
BM_ZFlat/3             664073     671140        298 134.0MB/s  pdf (82.13 %)   [+28.4%]
BM_ZFlat/4            4961940    4900000        100 79.7MB/s  html4 (23.55 %)  [+30.6%]
BM_ZFlat/5             500664     501253        399 46.8MB/s  cp (48.12 %)     [+33.4%]
BM_ZFlat/6             217276     215982        926 49.2MB/s  c (42.40 %)      [+25.0%]
BM_ZFlat/7              64122      65487       3054 54.2MB/s  lsp (48.37 %)    [+36.1%]
BM_ZFlat/8           18045730   18000000        100 54.6MB/s  xls (41.34 %)    [+34.4%]
BM_ZFlat/9            4051530    4000000        100 36.3MB/s  txt1 (59.81 %)   [+25.0%]
BM_ZFlat/10           3451800    3500000        100 34.1MB/s  txt2 (64.07 %)   [+25.7%]
BM_ZFlat/11          11052340   11100000        100 36.7MB/s  txt3 (57.11 %)   [+24.3%]
BM_ZFlat/12          14538690   14600000        100 31.5MB/s  txt4 (68.35 %)   [+24.7%]
BM_ZFlat/13           5041850    5000000        100 97.9MB/s  bin (18.21 %)    [+32.0%]
BM_ZFlat/14            908840     909090        220 40.1MB/s  sum (51.88 %)    [+22.2%]
BM_ZFlat/15             86921      86206       1972 46.8MB/s  man (59.36 %)    [+42.2%]
BM_ZFlat/16           1312315    1315789        152 86.0MB/s  pb (23.15 %)     [+34.5%]
BM_ZFlat/17           3173120    3200000        100 54.9MB/s  gaviota (38.27%) [+28.1%]


The move from 64-bit to 32-bit operations for the copies also affected 32-bit x86.
Decompression got faster, and compression got slightly slower
(unless the latter is noise; I only ran the benchmark once):

Benchmark              Time(ns)    CPU(ns) Iterations
-----------------------------------------------------
BM_UFlat/0                86279      86140       7778 1.1GB/s  html             [ +7.5%]
BM_UFlat/1               839265     822622        778 813.9MB/s  urls           [ +9.4%]
BM_UFlat/2                 9180       9143      87500 12.9GB/s  jpg             [ +1.2%]
BM_UFlat/3                35080      35000      20000 2.5GB/s  pdf              [+10.1%]
BM_UFlat/4               350318     345000       2000 1.1GB/s  html4            [ +7.0%]
BM_UFlat/5                33808      33472      21212 701.0MB/s  cp             [ +9.0%]
BM_UFlat/6                15201      15214      46667 698.9MB/s  c              [+14.9%]
BM_UFlat/7                 4652       4651     159091 762.9MB/s  lsp            [ +7.5%]
BM_UFlat/8              1285551    1282528        538 765.7MB/s  xls            [+10.7%]
BM_UFlat/9               282510     281690       2414 514.9MB/s  txt1           [+13.6%]
BM_UFlat/10              243494     239286       2800 498.9MB/s  txt2           [+14.4%]
BM_UFlat/11              743625     740000       1000 550.0MB/s  txt3           [+14.3%]
BM_UFlat/12              999441     989717        778 464.3MB/s  txt4           [+16.1%]
BM_UFlat/13              412402     410076       1707 1.2GB/s  bin              [ +7.3%]
BM_UFlat/14               54876      54000      10000 675.3MB/s  sum            [+13.0%]
BM_UFlat/15                6146       6100     100000 660.8MB/s  man            [+14.8%]
BM_UFlat/16               90496      90286       8750 1.2GB/s  pb               [ +4.0%]
BM_UFlat/17              292650     292000       2500 602.0MB/s  gaviota        [+18.1%]
BM_UValidate/0            49620      49699      14286 1.9GB/s  html             [ +0.0%]
BM_UValidate/1           501371     500000       1000 1.3GB/s  urls             [ +0.0%]
BM_UValidate/2              232        227    3043478 521.5GB/s  jpg            [ +1.3%]
BM_UValidate/3            17250      17143      43750 5.1GB/s  pdf              [ -1.3%]
BM_UValidate/4           198643     200000       3500 1.9GB/s  html4            [ -0.9%]
BM_ZFlat/0               227128     229415       3182 425.7MB/s  html (23.57 %) [ -1.4%]
BM_ZFlat/1              2970089    2960000        250 226.2MB/s  urls (50.89 %) [ -1.9%]
BM_ZFlat/2                45683      44999      15556 2.6GB/s  jpg (99.88 %)    [ +2.2%]
BM_ZFlat/3               114661     113136       6364 795.1MB/s  pdf (82.13 %)  [ -1.5%]
BM_ZFlat/4               919702     914286        875 427.2MB/s  html4 (23.55%) [ -1.3%]
BM_ZFlat/5               108189     108422       6364 216.4MB/s  cp (48.12 %)   [ -1.2%]
BM_ZFlat/6                44525      44000      15909 241.7MB/s  c (42.40 %)    [ -2.9%]
BM_ZFlat/7                15973      15857      46667 223.8MB/s  lsp (48.37 %)  [ +0.0%]
BM_ZFlat/8              2677888    2639405        269 372.1MB/s  xls (41.34 %)  [ -1.4%]
BM_ZFlat/9               800715     780000       1000 186.0MB/s  txt1 (59.81 %) [ -0.4%]
BM_ZFlat/10              700089     700000       1000 170.5MB/s  txt2 (64.07 %) [ -2.9%]
BM_ZFlat/11             2159356    2138365        318 190.3MB/s  txt3 (57.11 %) [ -0.3%]
BM_ZFlat/12             2796143    2779923        259 165.3MB/s  txt4 (68.35 %) [ -1.4%]
BM_ZFlat/13              856458     835476        778 585.8MB/s  bin (18.21 %)  [ -0.1%]
BM_ZFlat/14              166908     166857       4375 218.6MB/s  sum (51.88 %)  [ -1.4%]
BM_ZFlat/15               21181      20857      35000 193.3MB/s  man (59.36 %)  [ -0.8%]
BM_ZFlat/16              244009     239973       2917 471.3MB/s  pb (23.15 %)   [ -1.4%]
BM_ZFlat/17              596362     590000       1000 297.9MB/s  gaviota (38.27%) [ +0.0%]
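
For context on the 64-bit to 32-bit move mentioned above, an 8-byte copy done
as two 32-bit operations might look like the sketch below. SketchCopy8AsTwo32
is a hypothetical name, and the body is an assumption about the general shape
of such a copy, not the code from this change.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from the patch): copy 8 bytes using two 32-bit
// loads followed by two 32-bit stores, rather than one 64-bit load/store.
// Both pointers may be unaligned; memcpy keeps the accesses well defined.
inline void SketchCopy8AsTwo32(const void* src, void* dst) {
  const unsigned char* s = static_cast<const unsigned char*>(src);
  unsigned char* d = static_cast<unsigned char*>(dst);
  uint32_t lo, hi;
  std::memcpy(&lo, s, 4);      // first 4 bytes
  std::memcpy(&hi, s + 4, 4);  // last 4 bytes
  std::memcpy(d, &lo, 4);
  std::memcpy(d + 4, &hi, 4);
}
```

On a 32-bit target a 64-bit integer does not fit in one general-purpose
register, which is one plausible reason the 32-bit form behaves differently
there.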

R=sanjay


git-svn-id: https://snappy.googlecode.com/svn/trunk@59 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2 files changed