blob: 7c7d2ca1ab0de0f3424377a67f173e01c0e8fa13 [file] [log] [blame]
Bzip2 is not research work, in the sense that it doesn't present any
new ideas. Rather, it's an engineering exercise based on existing
ideas.
Four documents describe essentially all the ideas behind bzip2:
Michael Burrows and D. J. Wheeler:
"A block-sorting lossless data compression algorithm"
10th May 1994.
Digital SRC Research Report 124.
ftp://ftp.digital.com/pub/DEC/SRC/research-reports/SRC-124.ps.gz
Daniel S. Hirschberg and Debra A. LeLewer
"Efficient Decoding of Prefix Codes"
Communications of the ACM, April 1990, Vol 33, Number 4.
You might be able to get an electronic copy of this
from the ACM Digital Library.
David J. Wheeler
Program bred3.c and accompanying document bred3.ps.
This contains the idea behind the multi-table Huffman
coding scheme.
ftp://ftp.cl.cam.ac.uk/pub/user/djw3/
Jon L. Bentley and Robert Sedgewick
"Fast Algorithms for Sorting and Searching Strings"
Available from Sedgewick's web page,
www.cs.princeton.edu/~rs
The following paper gives valuable additional insights into the
algorithm, but is not immediately the basis of any code
used in bzip2.
Peter Fenwick:
Block Sorting Text Compression
Proceedings of the 19th Australasian Computer Science Conference,
Melbourne, Australia. Jan 31 - Feb 2, 1996.
ftp://ftp.cs.auckland.ac.nz/pub/peter-f/ACSC96paper.ps
All three are well written, and make fascinating reading. If you want
to modify bzip2 in any non-trivial way, I strongly suggest you obtain,
read and understand these papers.
I am much indebted to the various authors for their help, support and
advice.