Speed up decompression by removing a fast-path attempt.

Whenever we try to enter a copy fast-path, there is a certain cost in checking
that all the preconditions are in place, but it's normally offset by the fact
that we can usually take the cheaper path. However, in a certain path we've
already established that "avail < literal_length", which usually means that
either the available space is small, or the literal is big. Both will disqualify
us from taking the fast path, and thus we take the hit from the precondition
checking without gaining much from having a fast path. Thus, simply don't try
the fast path in this situation -- we're already on a slow path anyway
(one where we need to refill more data from the reader).

I'm a bit surprised at how much this gained; it could be that this path is
more common than I thought, or that the simpler structure somehow makes the
compiler happier. I haven't looked at the assembler, but it's a win across
the board on both Core 2, Core i7 and Opteron, at least for the cases we
typically care about. The gains seem to be the largest on Core i7, though.
Results from my Core i7 workstation:


  Benchmark            Time(ns)    CPU(ns) Iterations
  ---------------------------------------------------
  BM_UFlat/0              73337      73091     190996 1.3GB/s  html      [ +1.7%]
  BM_UFlat/1             696379     693501      20173 965.5MB/s  urls    [ +2.7%]
  BM_UFlat/2               9765       9734    1472135 12.1GB/s  jpg      [ +0.7%]
  BM_UFlat/3              29720      29621     472973 3.0GB/s  pdf       [ +1.8%]
  BM_UFlat/4             294636     293834      47782 1.3GB/s  html4     [ +2.3%]
  BM_UFlat/5              28399      28320     494700 828.5MB/s  cp      [ +3.5%]
  BM_UFlat/6              12795      12760    1000000 833.3MB/s  c       [ +1.2%]
  BM_UFlat/7               3984       3973    3526448 893.2MB/s  lsp     [ +5.7%]
  BM_UFlat/8             991996     989322      14141 992.6MB/s  xls     [ +3.3%]
  BM_UFlat/9             228620     227835      61404 636.6MB/s  txt1    [ +4.0%]
  BM_UFlat/10            197114     196494      72165 607.5MB/s  txt2    [ +3.5%]
  BM_UFlat/11            605240     603437      23217 674.4MB/s  txt3    [ +3.7%]
  BM_UFlat/12            804157     802016      17456 573.0MB/s  txt4    [ +3.9%]
  BM_UFlat/13            347860     346998      40346 1.4GB/s  bin       [ +1.2%]
  BM_UFlat/14             44684      44559     315315 818.4MB/s  sum     [ +2.3%]
  BM_UFlat/15              5120       5106    2739726 789.4MB/s  man     [ +3.3%]
  BM_UFlat/16             76591      76355     183486 1.4GB/s  pb        [ +2.8%]
  BM_UFlat/17            238564     237828      58824 739.1MB/s  gaviota [ +1.6%]
  BM_UValidate/0          42194      42060     333333 2.3GB/s  html      [ -0.1%]
  BM_UValidate/1         433182     432005      32407 1.5GB/s  urls      [ -0.1%]
  BM_UValidate/2            197        196   71428571 603.3GB/s  jpg     [ +0.5%]
  BM_UValidate/3          14494      14462     972222 6.1GB/s  pdf       [ +0.5%]
  BM_UValidate/4         168444     167836      83832 2.3GB/s  html4     [ +0.1%]
	
R=jeff

Revision created by MOE tool push_codebase.


git-svn-id: https://snappy.googlecode.com/svn/trunk@42 03e5f5b5-db94-4691-08a0-1a8bf15f6143
1 file changed