Speed up decompression by removing a fast-path attempt.
Whenever we try to enter a copy fast-path, there is a certain cost in checking
that all the preconditions are in place, but it's normally offset by the fact
that we can usually take the cheaper path. However, in a certain path we've
already established that "avail < literal_length", which usually means that
either the available space is small, or the literal is big. Both will disqualify
us from taking the fast path, and thus we take the hit from the precondition
checking without gaining much from having a fast path. Thus, simply don't try
the fast path in this situation -- we're already on a slow path anyway
(one where we need to refill more data from the reader).
I'm a bit surprised at how much this gained; it could be that this path is
more common than I thought, or that the simpler structure somehow makes the
compiler happier. I haven't looked at the assembler, but it's a win across
the board on both Core 2, Core i7 and Opteron, at least for the cases we
typically care about. The gains seem to be the largest on Core i7, though.
Results from my Core i7 workstation:
Benchmark Time(ns) CPU(ns) Iterations
---------------------------------------------------
BM_UFlat/0 73337 73091 190996 1.3GB/s html [ +1.7%]
BM_UFlat/1 696379 693501 20173 965.5MB/s urls [ +2.7%]
BM_UFlat/2 9765 9734 1472135 12.1GB/s jpg [ +0.7%]
BM_UFlat/3 29720 29621 472973 3.0GB/s pdf [ +1.8%]
BM_UFlat/4 294636 293834 47782 1.3GB/s html4 [ +2.3%]
BM_UFlat/5 28399 28320 494700 828.5MB/s cp [ +3.5%]
BM_UFlat/6 12795 12760 1000000 833.3MB/s c [ +1.2%]
BM_UFlat/7 3984 3973 3526448 893.2MB/s lsp [ +5.7%]
BM_UFlat/8 991996 989322 14141 992.6MB/s xls [ +3.3%]
BM_UFlat/9 228620 227835 61404 636.6MB/s txt1 [ +4.0%]
BM_UFlat/10 197114 196494 72165 607.5MB/s txt2 [ +3.5%]
BM_UFlat/11 605240 603437 23217 674.4MB/s txt3 [ +3.7%]
BM_UFlat/12 804157 802016 17456 573.0MB/s txt4 [ +3.9%]
BM_UFlat/13 347860 346998 40346 1.4GB/s bin [ +1.2%]
BM_UFlat/14 44684 44559 315315 818.4MB/s sum [ +2.3%]
BM_UFlat/15 5120 5106 2739726 789.4MB/s man [ +3.3%]
BM_UFlat/16 76591 76355 183486 1.4GB/s pb [ +2.8%]
BM_UFlat/17 238564 237828 58824 739.1MB/s gaviota [ +1.6%]
BM_UValidate/0 42194 42060 333333 2.3GB/s html [ -0.1%]
BM_UValidate/1 433182 432005 32407 1.5GB/s urls [ -0.1%]
BM_UValidate/2 197 196 71428571 603.3GB/s jpg [ +0.5%]
BM_UValidate/3 14494 14462 972222 6.1GB/s pdf [ +0.5%]
BM_UValidate/4 168444 167836 83832 2.3GB/s html4 [ +0.1%]
R=jeff
Revision created by MOE tool push_codebase.
git-svn-id: https://snappy.googlecode.com/svn/trunk@42 03e5f5b5-db94-4691-08a0-1a8bf15f6143
1 file changed