blob: c4f8632b17aa88cd4c6b98e25d3e8d2072270367 [file] [log] [blame]
Format of Lout hyphenation information files
Jeffrey H. Kingston
22 December 1992
21 September 1994
6 June 1995
3 April 1996
Basser Lout Version 3 incorporates automatic hyphenation using the
method introduced by TeX (see Appendix H of the TeXBook by D. E. Knuth),
with support for multilingual hyphenation. No special action is required
to install hyphenation unless it is desired to change the hyphenation
information that controls it.
There is one hyphenation information file for each language, and it is
named in the langdef of that language. For example:
langdef German Deutsch { german }
(There will usually be other information between the hyphenation file
name and the closing brace, not relevant here.) This example means that
unpacked Lout hyphenation file german.lh or its packed equivalent
german.lp (see below) is to be used when hyphenating German words. These
files are kept in the Lout system hyphenation directory (this directory). If
a language is desired but no hyphenation information file is available, the
file name may be replaced with -, and then the language will be defined but
hyphenation in that language will never be attempted. Another possibility
is to include a placeholder file for the language (see below).
The first time on any run that German hyphenation is required, Lout will
search the directories of the hyphenation path for a binary file called
german.lp, which contains a binary form of the hyphenation patterns in
german.lh, modified so that the file may be shared by big-endian and
little-endian machines. If german.lp cannot be found, Lout then searches
for the text file german.lh instead, and uses it to construct german.lp.
To change the German hyphenation patterns, delete german.lp and modify
german.lh; the rest is automatic.
Alternatively, if lout is invoked with the -x flag and the langdef line
above appears in its input, it will read german.lh and produce german.lp
immediately. This is intended for setting up: it is good to create all
these packed files at setup time, since a subsequent lout run that needs them
will not have write permission in the Lout system hyphenation directory.
An unpacked Lout hyphenation information (.lh) file mainly contains a
long list of TeX hyphenation patterns. It must begin with either
Lout hyphenation information
or
Lout hyphenation placeholder
alone on the first line. In the second case, it is understood that the
file is a placeholder (i.e. a stub file which might be overwritten with
a real file in the future), and Lout does not read any futher; the effect
is that Lout will not hyphenate this language, but not complain about the
absence of the file either.
In the non-placeholder case, following the header line comes the "Classes:"
heading followed by the character classes. For example:
Classes:
@!$%^&*()_-+=~`{[}]:;'|<,.>?/0123456789
aA
bB
cC
...
yY
zZ
The hyphenation process treats the characters in each class as identical
(so the classes above ensure that the distinction between upper and lower
case is ignored). By definition, the characters of the first class are
"non-letters", and the characters of the remaining classes are "letters".
Notice that these are actual characters, not character names: hyphenation
files are encoding-specific.
Next comes the "Exceptions:" heading followed by the exceptions, which
are words (composed of letters and "-" only) whose hyphenation is to be
treated as a special case. For example:
Exceptions:
ta-ble
phil-an-thropic
These words may be hyphenated in the places shown by the "-" characters.
Character classes are in effect here (Table will be hyphenated as Ta-ble).
If there are no exceptions, "Exceptions:" may be omitted.
Next comes an optional LengthLimit section, which tells Lout to ignore
some patterns. For example,
LengthLimit:
4
means that patterns containing more than 4 letters (note that
. counts as a letter) are to be ignored. The purpose is to discard
the least important patterns from files that are too large for Lout
to handle otherwise. None of the files actually use this at present,
but hyphenation files seem to be getting larger and larger, and if
any whoppers come along they might have to be trimmed in this way.
Finally comes the "Patterns:" heading followed by the list of TeX
hyphenation patterns. Apart from the weighting digits, the patterns
should contain only letters. Lout understands some TeX escape sequences
e.g. it will accept \^e anywhere in a hyphenation file as the ecircumflex
character.
The file may contain comments, which begin with % (either at the start
of a line or after a white space character) and go to end of line. The
headings, classes, exceptions and patterns are separated by arbitrary
white space.
Briefly, hyphenation of a word works like this. If the word contains a
character not found in any character class, it will not be hyphenated.
Otherwise the word is analysed into sequences of letters separated by
sequences of non-letters. Each sequence of five or more letters is
then matched, either with an exception or else with the hyphenation
patterns, and hyphenated. The hyphen character "-" is treated specially.
Extreme lengths were resorted to to compress the .lp file as much as
possible. Files significantly larger than german.lh are likely to cause
Lout to abort with an error message. Please contact jeff@cs.usyd.edu.au
if you have problems with this or anything else.