blob: d546824b7073bf63940c3ba837b0a83a2eb4b086 [file] [log] [blame]
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V4.1.2//EN"
"" [
<!ENTITY app-fdl SYSTEM "fdl.xml">
<!-- "file:/usr/share/sgml/docbook/dtd/xml/4.1.2/docbookx.dtd" [ -->
<!-- This is the developers documentation for the term processor Kimwitu++ -->
<book lang="en" xml:lang="en">
<title>Kimwitu++ developers reference</title>
<orgname>Humboldt-University Berlin</orgname>
<orgdiv>Institute for Informatics</orgdiv>
<edition>This document is $Id$.</edition>
<!-- - - - - - - - - - - - - - - - Smart Pointers - - - - - - - - - - - - -->
<chapter><title>Smart Pointers</title>
<para>Smart pointers were implemented by Gerd Kurzbach. Their usage is not
straightforward, because that is made impossible by a few challenges.</para>
What are the pointer types called?
For a phylum called <replaceable>phylum</replaceable> there'll be the following
pointer types:<itemizedlist>
<listitem><para><replaceable>phylum</replaceable>, a simple C
<listitem><para>c_<replaceable>phylum</replaceable>, a simple constant C
<listitem><para><replaceable>phylum</replaceable>_ptr, the normal smart
<listitem><para>weak_<replaceable>phylum</replaceable>_ptr, the weak smart
pointer (does increment reference count),</para></listitem>
<!-- - - - - - - - - - - - - - Kimwitu++ Source Management - - - - - - - - - - - - -->
<chapter><title>Kimwitu++ Source Management</title>
<sect1><title>Formatting the Kimwitu++ source code</title>
<sect2><title>The input (<abbrev>ie.</abbrev> Kimwitu++ text)</title>
<para>Tabs have a width of 8, in words eight, which means they are exactly 8 characters
wide, not two (2), not 4 (four), but 2<superscript>3</superscript> characters.</para>
<para>The indentation however should be four characters, <abbrev>ie.</abbrev>
<para>If you don't want to use tabs, but expand them to spaces, fine. However, recall it's
shorter. In a proper editor, it should be easy to get the desired behaviour,
<abbrev>eg.</abbrev> in Vim you can set <userinput>set tabstop=8 softtabstop=4
<sect2><title>The output (<abbrev>ie.</abbrev> generated text)</title>
<para>In general care should be taken to make the generated source code human readable.
It should always look according to the guidelines above.</para>
<para>Kimwitu++ features a smart printer. This means you don't have to worry about
spaces, tabs and alignment, the printer will do it automatically. It will even ignore
your spaces and tabs, so don't bother except for making the input more readable.
Occasionally the printer will fail (it's simple, and not a complete parser like
entity), so you have to use the following hints in the output strings:
<listitem><para>The printer likes to elide superfluous spaces. Sometimes spaces
are important (most notably to avoid nested template specifications with
<userinput>&gt; &gt;</userinput> being interpreted as the
<userinput>&gt;&gt;</userinput> operator. The space following
<userinput>\b</userinput> will <emphasis>not</emphasis> be ignored.
<listitem><para>This causes the following text to be indented one additional level.
Mnemonic for <wordasword lang="de">vorwärts</wordasword>, which is German for
<listitem><para>This causes the following text to one level of indenting less.
Mnemonic for <wordasword lang="de">rückwärts</wordasword>, which is German for
<para>All changes which have an influence on the user should be marked with a
<literal>USER:</literal> prefix in the CSV change log message. This allows to collect
all user visible changes in the <filename>CHANGELOG</filename> file on release.</para>
<!-- - - - - - - - - - - - - - Uniq Phyla - - - - - - - - - - - - -->
<chapter><title>Uniq Phyla</title>
<sect1><title>Hash Sets</title>
<para>Why are hash sets used?</para>
<para>Performance. Red-Black trees are known to be always good, hash tables can have
poor performance. But usually they are fast. As very many casestrings are created
in common programs, we optimize for this.</para>
<para>Why another level of indirection?</para>
<para>Performance (compute hash only once).</para>
<para>What are the preprocessor decisions for?</para>
<para>Hash sets are not standardized yet.</para>
<para>There are different implementations. The three compilers I looked at (gcc and
Intel's compiler) use three different ways to implement it. (The two gccs only
differ in the location of the header file.)</para>
<para>The GNU compiler expects two functors: One doing a hash, one doing a test for
equality. Predefined tests are not suitable, of course.</para>
<para>The Intel compiler expects a single functor both unary and binary: A unary
function doing a hash, a binary function doing a test for lessness. Predefined
tests are not suitable, of course.</para>
<para>There is no knowing how other compilers might implement this. If a known
compiler is used, hash sets are employed by default, but can be deactivated by
defining DONT_USE_HASHSET. If an unknown compiler is used, hash sets are not
employed by default, but can be activated by defining USE_HASHSET (but then keep
fingers crossed we guess the right way to use them).</para>
<!-- - - - - - - - - - - - - - Patterns - - - - - - - - - - - - -->
<sect1><title>Allowed patterns</title>
<para>Patterns over phyla are not allowed. Patterns over predefined operators (e. g. _Str)
are not allowed.</para>
<sect2><title>Patterns over predefined operators</title>
<para>It is not desirable to define new unparse and rewrite rules for the predefined phyla.
Predefined phyla are primitive, rewriting should not need to change them.
Different unparse behaviour can easily be implemented in the unparse function, should
it really be needed.</para>
<para>A pattern over a predefined operator might be useful in a with-statement.
However, this is not necessary or possible: All operators in a with-statement must
be of the same phylum, as patterns over phyla are (not yet) allowed.</para>
<para>Primitive phyla only have one operator, so a with-statement would have only one
alternative and is superfluous.</para>
<sect2><title>Patterns over phyla</title>
<para>Patterns over phyla are not allowed. This should be changed, considering the
<para>In a pattern over a phylum, there cannot be any references to subphyla.
Therefore, they are a bit limited. One possible exception could be lists and
abstract_list, as they always have two children.</para>
<para>Patterns over phyla are useless in with-statements. They can equally well be
implemented with a switch-statement over prod_sel().</para>
<sect1><title>Pattern sorting</title>
<para>The sorting of patterns is a difficult area. There are some obvious cases, and
some which are more twisted.</para>
<para>If, at a certain point, two patterns may match the current phylum, which one will
be chosen? Let us consider the following cases:</para>
<listitem><para>A pattern that is truly more specific than another, i. e. all
terms that match it would also match the other, but not vice versa, should be
tested for before the other.</para></listitem>
<listitem><para>For two patterns that are disjunct, i. e. there cannot exist any
terms that match both, ordering does not matter.</para></listitem>
<listitem><para>Two patterns which cover the same terms should be considered an
error. Otherwise one would always be preferred over the other, probably not
what the user intended. This is not so when there is a guard condition on the
pattern (<userinput>provided</userinput>); it is difficult to decide for us what
the semantics of these are (except when there are two identical conditions,
<listitem><para>For overlapping patterns, the situation is complex. There must
exist a pattern (or multiple patterns) that covers exactly the intersection.
In general, for each term, there must always be a most specific
<para>This condition is difficult to implement, and it is hoped to be rare. We
might focus on easy special cases (as <emphasis>one</emphasis> pattern
covering the intersection of <emphasis>two</emphasis> others) and give
warnings whenever we are not sure.</para></listitem>
<para>In all other cases, the patterns have to be sorted, and a warning might be issued
to the user if that ordering is not obvious. Currently, pattern sorting in Kimwitu is
through an algorithm that is unintuitive. It frequently leads to an unexpected order.
A pattern is considered more specific if it is more specific further to the left; when
in doubt, the order of appearance in the input is taken.</para>
<para>To this end, Gerd proposed two different algorithms, both intuitive but already
fairly complex, which both resulted in different orderings. I believe this problem
cannot be solved.</para>
<para>Unfortunately, for some projects, Kimwitu++ issues many warnings, when order
doesn't matter or is indeed obvious. A way to silence this warnings selectively is on
the wishlist</para>
<!-- vim:set sw=2 ts=8: -->