| |
| Bzip2 is not research work, in the sense that it doesn't present any |
| new ideas. Rather, it's an engineering exercise based on existing |
| ideas. |
| |
| Four documents describe essentially all the ideas behind bzip2: |
| |
| Michael Burrows and D. J. Wheeler: |
| "A block-sorting lossless data compression algorithm" |
| 10th May 1994. |
| Digital SRC Research Report 124. |
| ftp://ftp.digital.com/pub/DEC/SRC/research-reports/SRC-124.ps.gz |
| |
| Daniel S. Hirschberg and Debra A. LeLewer |
| "Efficient Decoding of Prefix Codes" |
| Communications of the ACM, April 1990, Vol 33, Number 4. |
| You might be able to get an electronic copy of this |
| from the ACM Digital Library. |
| |
| David J. Wheeler |
| Program bred3.c and accompanying document bred3.ps. |
| This contains the idea behind the multi-table Huffman |
| coding scheme. |
| ftp://ftp.cl.cam.ac.uk/pub/user/djw3/ |
| |
| Jon L. Bentley and Robert Sedgewick |
| "Fast Algorithms for Sorting and Searching Strings" |
| Available from Sedgewick's web page, |
| www.cs.princeton.edu/~rs |
| |
| The following paper gives valuable additional insights into the |
| algorithm, but is not immediately the basis of any code |
| used in bzip2. |
| |
| Peter Fenwick: |
| Block Sorting Text Compression |
| Proceedings of the 19th Australasian Computer Science Conference, |
| Melbourne, Australia. Jan 31 - Feb 2, 1996. |
| ftp://ftp.cs.auckland.ac.nz/pub/peter-f/ACSC96paper.ps |
| |
| All three are well written, and make fascinating reading. If you want |
| to modify bzip2 in any non-trivial way, I strongly suggest you obtain, |
| read and understand these papers. |
| |
| I am much indebted to the various authors for their help, support and |
| advice. |
| |