| INSTALLATION |
| 1) Change makefile settings to reflect |
| ATT vs. BSD software |
| termio vs. termcap |
| MGR vs. no MGR (MGR is a BELLCORE produced |
| window manager that is also available |
| free to the public.) |
| 2) Then, just say "make". |
| If you want to "make install", you should first |
| change the definition of INSDIR in the makefile. |
| |
| 3) To test the software, say |
| |
| spiff Sample.1 Sample.2 |
| |
| spiff should find 4 differences, and |
| you should see the words "added", "deleted", "changed", |
| and "altered" as well as four numbers in stand-out mode. |
| |
| spiff Sample.1 Sample.2 | cat |
| |
| should produce the same output, except that the differences |
| should be underlined. However, on many terminals the underlining |
| does not appear. So try the command |
| |
| spiff Sample.1 Sample.2 | cat -v |
| |
| or whatever the equivalent to cat -v is on your system. |
| |
| A more complicated test set is found in Sample.3 and Sample.4. |
| These files show how to use embedded commands to do things |
| like change the commenting convention and tolerances on the |
| fly. Be sure to run the command with the -s option to spiff: |
| |
| spiff -s 'command spiffword' Sample.3 Sample.4 |
| |
| These files by no means provide an exhaustive test of |
| spiff's features. But they should give you some idea if things |
| are working right. |
| |
| This code (or its closely related cousins) has been run on |
| Vaxen running 4.3BSD, a CCI Power 6, some XENIX machines, and some |
| other machines running System V derivatives as well as |
| (thanks to eugene@ames.arpa) Cray, Amdahl and Convex machines. |
| |
| 4) Share and enjoy. |
| |
| AUTHOR'S ADDRESS |
| Please send complaints, comments, praise, bug reports, etc. to |
| Dan Nachbar |
| Bell Communications Research (also known as BELLCORE) |
| 445 South St. Room 2B-389 |
| Morristown, NJ 07960 |
| |
| nachbar@bellcore.com |
| or |
| bellcore!nachbar |
| or |
| (201) 829-4392 (praise only, please) |
| |
| OVERVIEW OF OPERATION |
| |
| Each of the two input files is read and stored in core. |
| Each is then parsed into a series of tokens (literal strings and |
| floating point numbers; white space is ignored). |
| The token sequences are stored in core as well. |
| After both files have been parsed, a differencing algorithm is applied to |
| the token sequences. The differencing algorithm |
| produces an edit script, which is then passed to an output routine. |
| |
| SIZE LIMITS AND OTHER DEFAULTS |
| limit                             file        name          default value |
| |
| maximum number of lines           lines.h     _L_MAXLINES   10000 |
|   per file |
| maximum number of tokens          token.h     K_MAXTOKENS   50000 |
|   per file |
| maximum line length               misc.h      Z_LINELEN     1024 |
| maximum word length               misc.h      Z_WORDLEN     20 |
|   (length of misc buffers for |
|   things like literal |
|   delimiters. |
|   NOT length of tokens which |
|   can be virtually any length) |
| default absolute tolerance        tol.h       _T_ADEF       "1e-10" |
| default relative tolerance        tol.h       _T_RDEF       "1e-10" |
| maximum number of commands        command.h   _C_CMDMAX     100 |
|   in effect at one time |
| maximum number of commenting      comment.h   W_COMMAX      20 |
|   conventions that can be |
|   in effect at one time |
|   (not including commenting |
|   conventions that are |
|   restricted to beginning |
|   of line) |
| maximum number of commenting      comment.h   W_BOLMAX      20 |
|   conventions that are |
|   restricted to beginning of |
|   line that are in effect at |
|   one time |
| maximum number of literal         comment.h   W_LITMAX      20 |
|   string conventions that |
|   can be in effect at one time |
| maximum number of tolerances      tol.h       _T_TOLMAX     10 |
|   that can be in effect at one |
|   time |
| |
| |
| DIFFERENCES BETWEEN THE CURRENT VERSION AND THE ENCLOSED PAPER |
| |
| The files paper.ms and paper.out contain the nroff -ms input and |
| output, respectively, of a paper on spiff that was given at the Summer '88 |
| USENIX conference in San Francisco. Since that time many changes |
| have been made to the code. Many flags have changed and some have |
| had their meanings reversed; see the enclosed man page for the current |
| usage. Also, there is no longer control over the |
| granularity of object used when applying the differencing algorithm. |
| The current version of spiff always applies the differencing |
| in terms of individual tokens. The -t flag controls how the edit script |
| is printed. This arrangement more closely reflects the original intent |
| of having multiple differencing granularities. |
| |
| PERFORMANCE |
| |
| Spiff is big and slow. It is big because all the storage is |
| in core. It is a straightforward but boring task to move the temporary |
| storage into a file. Someone who cares is invited to take on the job. |
| Spiff is slow because whenever a choice had to be made between |
| speed of operation and ease of coding, speed of operation almost always lost. |
| As the program matures it will almost certainly get smaller and faster. |
| Obvious performance enhancements have been avoided in order to make the |
| program available as soon as possible. |
| |
| COPYRIGHT |
| |
| Our lawyers advise the following: |
| |
| Copyright (c) 1988 Bellcore |
| All Rights Reserved |
| Permission is granted to copy or use this program, EXCEPT that it |
| may not be sold for profit, the copyright notice must be reproduced |
| on copies, and credit should be given to Bellcore where it is due. |
| BELLCORE MAKES NO WARRANTY AND ACCEPTS NO LIABILITY FOR THIS PROGRAM. |
| |
| Given that all of the above seems to be very reasonable, there should be no |
| reason for anyone to not play by the rules. |
| |
| |
| NAMING CONVENTIONS USED IN THE CODE |
| |
| All symbols (functions, data declarations, macros) are named as follows: |
| |
| L_foo -- for names exported to other modules |
| and possibly used inside the module as well. |
| _L_foo -- for names used by more than one routine |
| within a module |
| foo -- for names used inside a single routine. |
| |
| Each module uses a different value for "L" -- |
| module files     letter used   implements |
| |
| spiff.c          Y             top level routines |
| misc.[ch]        Z             various routines used throughout |
| strings.[ch]     S             routines for handling strings |
| edit.h           E             list of changes found and printed |
| tol.[ch]         T             tolerances for real numbers |
| token.[ch]       K             storage for objects |
| float.[ch]       F             manipulation of floats |
| floatrep.[ch]    R             representation of floats |
| line.[ch]        L             storage for input lines |
| parse.[ch]       P             parser for input files |
| command.[ch]     C             storage and recognition of commands |
| comment.[ch]     W             comment list maintenance |
| compare.[ch]     X             comparisons of a single token |
| exact.[ch]       Q             exact match differencing algorithm |
| miller.[ch]      G             miller/myers differencing algorithm |
| output.[ch]      O             print listing of differences |
| flagdefs.h       U             define flag bits that are used in |
|                                several of the other modules. |
|                                These #defines could have been |
|                                included in misc.c, but were separated |
|                                out because of their explicit |
|                                communication function. |
| visual.[ch]      V             screen oriented display for MGR |
|                                window manager, also contains |
|                                dummy routines for people who don't |
|                                have MGR |
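
As a rough illustration of the three naming levels described above, here is a minimal sketch written as if it lived in a module whose letter is K. Every identifier in it (K_sketch_add, K_sketch_get, _K_SKETCHMAX, _K_count, _K_store) is hypothetical and does not come from the spiff sources. Note that ISO C reserves identifiers that begin with an underscore followed by an upper-case letter, so the _K_ names follow this README's convention rather than strict standard practice.

```c
#include <assert.h>
#include <stddef.h>

/* _K_ prefix: shared by several routines inside this (hypothetical) module. */
#define _K_SKETCHMAX 4
static int _K_count = 0;
static const char *_K_store[_K_SKETCHMAX];

/* K_ prefix: exported to other modules. Returns the slot used, or -1 if full. */
int K_sketch_add(const char *text)
{
    int slot;                      /* plain name: used inside one routine only */

    assert(text != NULL);
    if (_K_count >= _K_SKETCHMAX)
        return -1;
    slot = _K_count++;
    _K_store[slot] = text;
    return slot;
}

/* K_ prefix: exported. Returns the stored string, or NULL for a bad slot. */
const char *K_sketch_get(int slot)
{
    if (slot < 0 || slot >= _K_count)
        return NULL;
    return _K_store[slot];
}
```

A caller in another module would see only K_sketch_add and K_sketch_get; the _K_ names stay private to the file.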
| |
| I haven't cleaned up visual.c yet. It probably doesn't even compile |
| in this version anyway. But since most people don't have mgr, this |
| isn't urgent. |
| |
| NON-OBVIOUS DATA STRUCTURES |
| |
| The Floating Point Representation |
| |
| Floating point numbers are stored in a struct R_flstr. |
| The fractional part is often called the mantissa. |
| |
| The structure consists of |
| a flag for the sign of the fractional part |
| the exponent in binary |
| a character string containing the fractional part |
| |
| The structure could be converted to a float via |
| atof(strcat(".",mantissa)) * (10^exponent) |
| (pseudocode: "^" means exponentiation here, not C's exclusive-or) |
| |
| To be properly formed, the mantissa string must: |
| start with a digit between 1 and 9 (i.e. no leading zeros) |
| except for the value zero, in which case the mantissa is exactly "0". |
| For the special case of zero, the exponent is always 0, and the |
| sign is always positive (i.e. no negative 0). |
| |
| In other words, (except for the value 0) |
| the mantissa is a fractional number ranging |
| between 0.1 (inclusive) and 1.0 (exclusive). |
| The exponent is interpreted as a power of 10. |
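
The rules above can be sketched in C roughly as follows. This is an illustration only, not the actual definition of R_flstr: the struct name sketch_flrep, its field names, and both function names are assumptions made for this example, and the conversion simply spells out the atof-style pseudocode with the sign flag applied.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for R_flstr; the real field names may differ. */
struct sketch_flrep {
    int negative;          /* sign flag for the fractional part: 1 = negative */
    int exponent;          /* power of 10, held as an ordinary binary int */
    const char *mantissa;  /* digits of the fractional part, e.g. "125" = .125 */
};

/* Returns 1 if the value obeys the well-formedness rules, 0 otherwise. */
int sketch_is_well_formed(const struct sketch_flrep *f)
{
    const char *m;
    size_t i;

    if (f == NULL || f->mantissa == NULL || f->mantissa[0] == '\0')
        return 0;
    m = f->mantissa;
    for (i = 0; m[i] != '\0'; i++)
        if (m[i] < '0' || m[i] > '9')
            return 0;
    if (m[0] == '0')   /* zero: mantissa exactly "0", exponent 0, positive */
        return m[1] == '\0' && f->exponent == 0 && !f->negative;
    return 1;          /* otherwise the leading digit is 1..9 */
}

/* Converts to a double: (0.<mantissa>) * 10^exponent, negated if flagged. */
double sketch_to_double(const struct sketch_flrep *f)
{
    char buf[64];
    double value;
    size_t len;
    int e;

    assert(f != NULL && f->mantissa != NULL);
    len = strlen(f->mantissa);
    if (len > sizeof buf - 2)
        len = sizeof buf - 2;      /* keep only as many digits as fit */
    buf[0] = '.';
    memcpy(buf + 1, f->mantissa, len);
    buf[len + 1] = '\0';
    value = atof(buf);
    for (e = f->exponent; e > 0; e--)
        value *= 10.0;
    for (e = f->exponent; e < 0; e++)
        value /= 10.0;
    return f->negative ? -value : value;
}
```

For instance, a value with mantissa "125" and exponent 3 stands for .125 * 10^3, that is, 125. Keeping the digits as a string means no precision is lost until a caller explicitly asks for a double.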
| |
| Lines |
| There are three sets of lines: |
| implemented in line.c and line.h |
| real_lines -- |
| the lines as they come from the file |
| content_lines -- |
| a subset of real_lines that excludes embedded commands |
| implemented in token.c and token.h |
| token_lines -- |
| a subset of content_lines consisting of those lines that |
| have tokens that begin on them (literals can go on for |
| more than one line) |
| i.e. content_lines excluding comments and blank lines. |
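
To make the nesting of the three sets concrete, here is a small sketch that reports the innermost set a single line belongs to. It is not how line.c or token.c work. It assumes, purely for illustration, that an embedded command is a line starting with a caller-supplied prefix and that a comment runs from one comment character to the end of the line; it also ignores literals that span several lines. The names sketch_classify and the SKETCH_ constants are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

enum sketch_line_set {
    SKETCH_REAL_ONLY,   /* in real_lines only (an embedded command) */
    SKETCH_CONTENT,     /* in content_lines, but no token begins here */
    SKETCH_TOKEN        /* in token_lines as well */
};

/*
 * Classify one line. cmd_prefix marks embedded commands and comment_char
 * starts a comment; both are assumptions of this sketch.
 */
enum sketch_line_set sketch_classify(const char *line,
                                     const char *cmd_prefix,
                                     char comment_char)
{
    const char *p;

    assert(line != NULL && cmd_prefix != NULL);
    if (cmd_prefix[0] != '\0' &&
        strncmp(line, cmd_prefix, strlen(cmd_prefix)) == 0)
        return SKETCH_REAL_ONLY;

    /* A token begins here if anything non-blank precedes the comment. */
    for (p = line; *p != '\0' && *p != comment_char; p++)
        if (!isspace((unsigned char)*p))
            return SKETCH_TOKEN;

    return SKETCH_CONTENT;   /* blank line or comment-only line */
}
```

Under these assumptions, a blank or comment-only line still counts toward content_lines numbering but not toward token_lines.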
| |
| |
| THE STATE OF THE CODE |
| Things that should be added |
| visual mode should handle tabs and wrapped lines |
| handling huge files in chunks when using the ordinal match |
| algorithm. right now you have to parse and then diff the |
| whole thing before you get any output. often, you run out of memory. |
| |
| Things that would be nice to add |
| output should optionally be expressed in real line numbers |
| (i.e. including command lines) |
| at present, all storage is in core. there should |
| be a compile time decision to allow temporary storage |
| in files rather than core. |
| that way the user could decide how to handle the |
| speed/space tradeoff |
| a front end that looked like diff should be added so that |
| one could drop spiff into existing shell scripts |
| the parser converts floats into their internal form even when |
| it isn't necessary. |
| in the miller/myers code, the code should check for matching |
| end sequences. it currently looks only for matching beginning |
| sequences. |
| |
| Minor programming improvements (programming botches) |
| some of the #defines should really be enumerated types |
| all the routines in strings.c that alter the data at the end of |
| a pointer but return void should just return the correct |
| data. the current arrangement is a historical artifact |
| of the days when these routines returned a status code. |
| but the status code was never examined, |
| so i made them void . . . |
| comments should be added to the miller/myers code |
| in visual mode, ask for font by name rather than number |