GREETINGS!

   This is the README for bzip2, my block-sorting file compressor,
   version 0.1.

   bzip2 is distributed under the GNU General Public License version 2;
   for details, see the file LICENSE. Pointers to the algorithms used
   are in ALGORITHMS. Instructions for use are in bzip2.1.preformatted.

   Please read all of this file carefully.



HOW TO BUILD

   -- for UNIX:

      Type `make'.  (tough, huh? :-)

      This creates the binaries "bzip2" and "bunzip2"; the latter
      is a symbolic link to "bzip2".

      It also runs four compress-decompress tests to make sure
      things are working properly. If all goes well, you should be up
      and running. Please read the output from `make' to confirm
      that the tests passed.

      To install bzip2 properly:

         -- Copy the binary "bzip2" to a publicly visible place,
            possibly /usr/bin, /usr/common/bin or /usr/local/bin.

         -- In that directory, make "bunzip2" a symbolic link
            to "bzip2".

         -- Copy the manual page, bzip2.1, to the relevant place.
            Probably the right place is /usr/man/man1/.
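
      A symbolic link is enough because a C program receives the name
      it was invoked under in argv[0], so one binary can choose its mode
      from that name. The sketch below illustrates only that general
      technique. The helper invoked_as_decompressor() and its rule (treat
      any name containing "unzip" or "UNZIP" as a request to decompress)
      are assumptions and are not taken from bzip2.c.

```c
#include <string.h>

/* Hypothetical helper (not from bzip2.c): returns nonzero if the program
   name in argv0 asks for decompression.  Any leading directory is skipped,
   so "/usr/local/bin/bunzip2" behaves like "bunzip2".  Only the Unix '/'
   separator is handled here. */
static int invoked_as_decompressor(const char *argv0)
{
    const char *base = strrchr(argv0, '/');

    base = (base != NULL) ? base + 1 : argv0;

    /* Assumed rule: a name containing "unzip", or "UNZIP" as an
       MS-DOS shell might report it, selects decompression. */
    return strstr(base, "unzip") != NULL || strstr(base, "UNZIP") != NULL;
}
```

      In main() one might write
      `int decompress = argc > 0 && invoked_as_decompressor(argv[0]);`.
      The argc test matters because argv[0] is a null pointer when argc
      is 0.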

   -- for Windows 95 and NT:

      For a start, do you *really* want to recompile bzip2?
      The standard distribution includes a pre-compiled version
      for Windows 95 and NT, `bzip2.exe'.

      This executable was created with Jacob Navia's excellent
      port to Win32 of Chris Fraser & David Hanson's excellent
      ANSI C compiler, "lcc". You can find it on the pages
      of the Computer Science department of Princeton University,
      www.cs.princeton.edu.

      I have not tried to compile this version of bzip2 with
      a commercial C compiler such as MS Visual C, as I don't
      have one available.

      Note that lcc is designed primarily to be portable and
      fast. Code quality is a secondary aim, so bzip2.exe
      runs perhaps 40% slower than it could if compiled with
      a good optimising compiler.

      I compiled a previous version of bzip (0.21) with Borland
      C 5.0, which worked fine, and with MS VC++ 2.0, which
      didn't. Here is a comment from the README for bzip-0.21:

         MS VC++ 2.0's optimising compiler has a bug which, at
         maximum optimisation, gives an executable which produces
         garbage compressed files. Proceed with caution.
         I do not know whether or not this happens with later
         versions of VC++.

         Edit the defines starting at line 86 of bzip.c to
         select your platform/compiler combination, and then compile.
         Then check that the resulting executable (assumed to be
         called bzip.exe) works correctly, using the SELFTEST.BAT file.
         Bearing in mind the previous paragraph, the self-test is
         important.

      Note that the defines which bzip-0.21 had, to support
      compilation with VC 2.0 and BC 5.0, are gone. Windows
      is not my preferred operating system, and I am, for the
      moment, content with the modestly fast executable created
      by lcc-win32.

      A manual page is supplied, unformatted (bzip2.1),
      preformatted (bzip2.1.preformatted), and preformatted
      and sanitised for MS-DOS (bzip2.txt).



COMPILATION NOTES

   bzip2 should work on any 32- or 64-bit machine. It is known to work
   [meaning: it has compiled and passed self-tests] on the
   following platform/OS combinations:

      Intel i386/i486 running Linux 2.0.21
      Sun SPARCs (various) running SunOS 4.1.4 and Solaris 2.5
      Intel i386/i486 running Windows 95 and NT
      DEC Alpha running Digital Unix 4.0

   Following the release of bzip-0.21, many people mailed me
   from around the world to say they had made it work on all sorts
   of weird and wonderful machines. Chances are, if you have
   a reasonable ANSI C compiler and a 32-bit machine, you can
   get it to work.

   The #defines starting at around line 82 of bzip2.c supply some
   degree of platform independence. If you configure bzip2 for some
   new far-out platform which is not covered by the existing definitions,
   please send me the relevant definitions.
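
   As an illustration of the kind of configuration meant here, the
   sketch below selects a platform with one switch and supplies the
   fixed-width types and path separator that the rest of the code would
   rely on. It is hypothetical: the names MY_PLATFORM_UNIX,
   MY_PLATFORM_WIN32, Int32, UInt32 and PATH_SEP are assumptions, not the
   definitions actually found in bzip2.c. The size checks matter on
   64-bit Unix systems such as Digital Unix on the Alpha, where long is
   64 bits wide, so a type meant to hold exactly 32 bits has to be chosen
   with care.

```c
/* Hypothetical platform switches: set exactly one of these to 1. */
#define MY_PLATFORM_UNIX   1
#define MY_PLATFORM_WIN32  0

#if MY_PLATFORM_UNIX
   typedef int           Int32;   /* int is 32 bits on common Unix ABIs */
   typedef unsigned int  UInt32;
   #define PATH_SEP      '/'
#elif MY_PLATFORM_WIN32
   typedef int           Int32;
   typedef unsigned int  UInt32;
   #define PATH_SEP      '\\'
#else
   #error "No platform selected: set one MY_PLATFORM_* macro to 1"
#endif

/* Compile-time size checks in pre-C11 style: if a type has the wrong
   size, the array length becomes -1 and compilation fails. */
typedef char check_Int32_size [(sizeof(Int32)  == 4) ? 1 : -1];
typedef char check_UInt32_size[(sizeof(UInt32) == 4) ? 1 : -1];
```

   Porting to a new platform then means adding one more #elif branch
   rather than editing code throughout the program.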

   I recommend GNU C for compilation. The code is standard ANSI C,
   except for the Unix-specific file handling, so any ANSI C compiler
   should work. Note, however, that the many routines marked INLINE
   should be inlined by your compiler; otherwise performance will be
   very poor. Asking your compiler to unroll loops gives a further
   small improvement; for gcc, the relevant flag is -funroll-loops.
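
   One way to make INLINE mean something to each compiler is sketched
   below. The macro name comes from this README, but the definition
   itself is an assumption rather than the one in bzip2.c, and
   rotate_left32() is an arbitrary small helper chosen only to show the
   macro in use. The sketch assumes unsigned int is 32 bits wide.

```c
/* Assumed definition of INLINE: use the best inlining hint available. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
   #define INLINE static inline        /* C99 and later */
#elif defined(__GNUC__)
   #define INLINE static __inline__    /* GNU C in C89 mode */
#else
   #define INLINE static               /* no hint: rely on the optimiser */
#endif

/* Illustrative helper: rotate a 32-bit value left by n bits.
   Small and called often, it is the kind of routine worth inlining. */
INLINE unsigned int rotate_left32(unsigned int x, unsigned int n)
{
    n &= 31u;                           /* keep the shift count in range */
    return (n == 0u) ? x : ((x << n) | (x >> (32u - n)));
}
```

   gcc normally inlines only when optimisation is enabled, so a build
   line such as `gcc -O2 -funroll-loops -c bzip2.c` is needed for INLINE
   to pay off.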

   On 386/486 machines, I'd recommend giving gcc the
   -fomit-frame-pointer flag; this liberates another register for
   allocation, which measurably improves performance.

   I used the above-mentioned lcc compiler to develop bzip2.
   I would highly recommend this compiler for day-to-day development;
   it is fast, reliable and lightweight, has an excellent profiler,
   and is generally excellent. And it's fun to retarget, if you're
   into that kind of thing.

   If you compile bzip2 on a new platform or with a new compiler,
   please be sure to run the four compress-decompress tests, either
   using the Makefile, or with the test.bat (MS-DOS) or test.cmd (OS/2)
   files. Some compilers have been seen to introduce subtle bugs
   when optimising, so this check is important. Ideally you should
   then go on to test bzip2 on a file several megabytes or even
   tens of megabytes long, just to be 110% sure. ``Professional
   programmers are paranoid programmers.'' (anon).



VALIDATION

   Correct operation, in the sense that a compressed file can always be
   decompressed to reproduce the original, is obviously of paramount
   importance. To validate bzip2, I used a modified version of
   Mark Nelson's churn program. Churn is an automated test driver
   which recursively traverses a directory structure, using bzip2 to
   compress and then decompress each file it encounters, and checking
   that the decompressed data is the same as the original. As test
   material, I used several runs over several filesystems of differing
   sizes.
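
   The check churn makes for each file, that the decompressed data is
   the same as the original, comes down to a byte-by-byte comparison of
   two files. A minimal sketch in C follows. It is not code from churn,
   the name files_identical() is hypothetical, and the step that runs
   bzip2 to produce the round-tripped copy is left out.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the two files have identical
   contents, 0 if they differ, and -1 if either cannot be opened or read. */
static int files_identical(const char *path_a, const char *path_b)
{
    FILE *fa = fopen(path_a, "rb");
    FILE *fb = fopen(path_b, "rb");
    int result = -1;

    if (fa != NULL && fb != NULL) {
        unsigned char buf_a[4096], buf_b[4096];

        for (;;) {
            size_t na = fread(buf_a, 1, sizeof buf_a, fa);
            size_t nb = fread(buf_b, 1, sizeof buf_b, fb);

            if (ferror(fa) || ferror(fb)) { result = -1; break; }
            /* Different chunk lengths mean different file lengths. */
            if (na != nb || memcmp(buf_a, buf_b, na) != 0) {
                result = 0;
                break;
            }
            if (na == 0) { result = 1; break; }  /* both at EOF together */
        }
    }
    if (fa != NULL) fclose(fa);
    if (fb != NULL) fclose(fb);
    return result;
}
```

   A driver in the style of churn would call this after each
   compress-decompress cycle and report every file for which it does
   not return 1.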

   One set of tests was done on my base Linux filesystem,
   410 megabytes in 23,000 files. There were several runs over
   this filesystem, in various configurations designed to break bzip2.
   That filesystem also contained some specially constructed test
   files designed to exercise boundary cases in the code.
   These included zero-length files, various long, highly repetitive
   files, and some files which generate blocks in which all values
   are the same.
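
   Files like these are easy to produce. The sketch below is
   hypothetical (write_test_file() is not part of the bzip2
   distribution): it writes a file of a given length by repeating a
   byte pattern, so a one-byte pattern gives a file in which every byte
   has the same value, a short multi-byte pattern gives a long, highly
   repetitive file, and a length of zero gives an empty file.

```c
#include <stdio.h>

/* Hypothetical generator for boundary-case test input.
   Writes `length` bytes to `path`, cycling through the `pattern_len`
   bytes of `pattern`.  Returns 0 on success, -1 on failure. */
static int write_test_file(const char *path, unsigned long length,
                           const unsigned char *pattern, size_t pattern_len)
{
    FILE *f;
    unsigned long i;

    if (pattern_len == 0) return -1;      /* nothing to repeat */
    f = fopen(path, "wb");
    if (f == NULL) return -1;
    for (i = 0; i < length; i++) {
        if (putc(pattern[i % pattern_len], f) == EOF) {
            fclose(f);
            return -1;
        }
    }
    return (fclose(f) == 0) ? 0 : -1;     /* fclose flushes buffered data */
}
```

   For example, calling write_test_file("same.tst", 2000000UL,
   (const unsigned char *) "x", 1) writes a 2,000,000-byte file in which
   every byte is 'x'.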

   The other set of tests was done just with the "normal" configuration,
   but on a much larger quantity of data.

   The tests were:

      Linux FS, 410M, 23000 files

      As above, with --repetitive-fast

      As above, with -1

      Low-level disk image of a disk containing
      Windows NT 4.0; 420M in a single huge file

      Linux distribution, including Slackware, and
      all GNU sources. 1900M in 2300 files.

      Approximately 100M of compiler sources and related
      programming tools, running under Purify.

      About 500M of data in 120 files of around
      4M each. This is raw data from a
      biomagnetometer (SQUID-based thing).

   Overall, the total volume of distinct test data (counting the
   Linux filesystem once) is about 3300 megabytes in 25000 files.

   The distribution runs four tests after building bzip2. These tests
   include test decompressions of pre-supplied compressed files, so
   they test not only that bzip2 works correctly on the machine it was
   built on, but also that it can decompress files compressed on a
   different machine. This guards against unforeseen interoperability
   problems.


Please read and be aware of the following:

WARNING:

   This program (attempts to) compress data by performing several
   non-trivial transformations on it. Unless you are 100% familiar
   with *all* the algorithms contained herein, and with the
   consequences of modifying them, you should NOT meddle with the
   compression or decompression machinery. Incorrect changes can and
   very likely *will* lead to disastrous loss of data.


DISCLAIMER:

   I TAKE NO RESPONSIBILITY FOR ANY LOSS OF DATA ARISING FROM THE
   USE OF THIS PROGRAM, HOWSOEVER CAUSED.

   Every compression of a file implies an assumption that the
   compressed file can be decompressed to reproduce the original.
   Great efforts in design, coding and testing have been made to
   ensure that this program works correctly. However, the complexity
   of the algorithms, and, in particular, the presence of various
   special cases in the code which occur with very low but non-zero
   probability make it impossible to rule out the possibility of bugs
   remaining in the program. DO NOT COMPRESS ANY DATA WITH THIS
   PROGRAM UNLESS YOU ARE PREPARED TO ACCEPT THE POSSIBILITY, HOWEVER
   SMALL, THAT THE DATA WILL NOT BE RECOVERABLE.

   That is not to say this program is inherently unreliable. Indeed,
   I very much hope the opposite is true. bzip2 has been carefully
   constructed and extensively tested.


PATENTS:

   To the best of my knowledge, bzip2 does not use any patented
   algorithms. However, I do not have the resources available to
   carry out a full patent search. Therefore I cannot give any
   guarantee of the above statement.

End of legalities.


I hope you find bzip2 useful. Feel free to contact me at
   jseward@acm.org
if you have any suggestions or queries. Many people mailed me with
comments, suggestions and patches after the releases of 0.15 and 0.21,
and the changes in bzip2 are largely a result of this feedback.
I thank you for your comments.

Julian Seward

Manchester, UK
18 July 1996 (version 0.15)
25 August 1996 (version 0.21)

Guildford, Surrey, UK
7 August 1997 (bzip2, version 0.1)
29 August 1997 (bzip2, version 0.1pl2)