What is &kpp;? This chapter sketches out the field in which &kpp; is usefully employed and explains important terms. It develops an example to outline the advantages of this tool compared to conventional techniques. The example will be used throughout the following chapters to introduce one concept after another. The complete code to make it work can be found in the appendix.
What is &kpp; used for?
To illustrate what we can do with the help of &kpp;, we call upon an example. Let us imagine we want to write a computer game. Neat example. Being respectable programmers, we calculate the overall costs of the new project in advance. It will take us, say, 30 days. On each of them we need a pizza at 8 € and 3 beers at 2 € each, but luckily our grandma supports us with 100 € in the meantime. We get the expression 30*(8+3*2)-100 and type it in postfix notation into our RPN calculator. You know, one of those antiquated devices that know nothing about precedence or parentheses and expect their input in Reverse Polish Notation, where the operands precede their operator. So we type 30 8 3 2 * + * 100 -. Er, we cannot. I remember. The calculator gave up the ghost last week. But don't despair. We quickly program a new one.
A Simple Example
What exactly should the new calculator be able to do? At any rate, it had better accept variables as well. Why? Just imagine a drastic shortage of pizza and beer, which would force us to set up a new expression all over again. We had better use a flexible one, as in the following example. Here x is the price of a beer, and a pizza is known always to cost 4 times as much as a beer.
Sample Expression
    30*(4*x+3*x)-100
    (resulting input in RPN: 30 4 x * 3 x * + * 100 -)
We now list the requirements for the calculator in order. It should analyse an input expression (term) made up of digits, arithmetical operators, and variables. It should calculate the expression if possible, or at least simplify it. It should then output a result.
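The stack discipline behind such a calculator fits in a few lines. The following &cpp; sketch is our own illustration and not part of the &kpp; example code. The function name eval_rpn and the variable table vars are assumptions. Numbers and variables are pushed onto a stack, and an operator pops its two operands and pushes the result.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Evaluates a whitespace-separated expression in Reverse Polish Notation.
// A token is an operator (+ - * /), a number, or a variable name that is
// looked up in `vars`.
long eval_rpn(const std::string& input,
              const std::map<std::string, long>& vars = {}) {
    std::vector<long> stack;
    std::istringstream in(input);
    std::string tok;
    while (in >> tok) {
        if (tok == "+" || tok == "-" || tok == "*" || tok == "/") {
            if (stack.size() < 2) throw std::runtime_error("missing operand");
            long b = stack.back(); stack.pop_back();  // the right operand is on top
            long a = stack.back(); stack.pop_back();
            switch (tok[0]) {
                case '+': stack.push_back(a + b); break;
                case '-': stack.push_back(a - b); break;
                case '*': stack.push_back(a * b); break;
                case '/':
                    if (b == 0) throw std::runtime_error("division by zero");
                    stack.push_back(a / b);
                    break;
                default: assert(false && "unreachable");
            }
        } else if (std::isdigit(static_cast<unsigned char>(tok[0]))) {
            stack.push_back(std::stol(tok));  // a number
        } else {
            auto it = vars.find(tok);         // a variable
            if (it == vars.end()) throw std::runtime_error("unknown variable " + tok);
            stack.push_back(it->second);
        }
    }
    if (stack.size() != 1) throw std::runtime_error("malformed expression");
    return stack.back();
}
```

With x = 2, both inputs yield 320 €. Note what this calculator cannot do: without a value for x it reports an error instead of simplifying the term, because simplifying needs the term itself and not just its value.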
We assume the existence of a code component (a scanner) that provides us with syntactical tokens such as operators, numbers and identifiers (variables). We then need a description of correct input, a grammar. Our example demands that a valid arithmetical term consist only of numbers, identifiers, and the four arithmetical operators. In the formal notation used by the parser generator &yacc; (see ), this looks like the following example.
&yacc; Grammar for Sample
    aritherm:   simpleterm
              | aritherm aritherm '+'
              | aritherm aritherm '-'
              | aritherm aritherm '*'
              | aritherm aritherm '/';
    simpleterm: NUMBER
              | IDENT;
Such a grammar is made up of rules, each having two sides separated by a colon. These rules describe how the left-hand side, called a nonterminal, can be composed of syntactical tokens called terminals. Terminals are written in upper case letters or as single-quoted characters. Different alternatives are separated by a bar (|). The right-hand side may also contain nonterminals, each of which stands for an application of its own composition rule. The first rule of this grammar defines an &aritherm; to be either a &simpleterm; or a composition of two &aritherm;s and an operator sign. These &aritherm;s in turn may be composed according to the &aritherm; rule. The second rule describes what an &aritherm; looks like if it is a &simpleterm;: a number or an identifier.
A common way to hold an input term internally is as a syntax tree, that is, a hierarchical structure of nodes (an upside-down tree). A nonterminal can be regarded as a node type, and every alternative of its right-hand side as one &kind; of that type. Every actual node is thus a &kind; of a node type with a specific number of child nodes, which depends on the &kind;. For example, '+' is a &kind; of &aritherm; with 2 child nodes, while NUMBER is a &kind; of &simpleterm; with none. Figure shows a syntax tree for our sample input term.
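To make node types and &kind;s concrete, here is a minimal hand-written &cpp; sketch that builds the syntax tree from the postfix tokens of the sample term. It is our own illustration, not code generated by &yacc; or &kpp;. Each number or identifier becomes a leaf. Each operator takes the two most recently built &aritherm;s as its 2 child nodes, just as the rule aritherm aritherm '+' combines them. The names Node, build_tree and show are assumptions.

```cpp
#include <cassert>
#include <cctype>
#include <memory>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// One node of the syntax tree. `kind` is the operator character for the
// binary kinds of aritherm ('+', '-', '*', '/') and 'N' or 'I' for the
// NUMBER and IDENT kinds of simpleterm.
struct Node {
    char kind = '?';
    std::string text;                         // token text, used by leaves only
    std::vector<std::unique_ptr<Node>> kids;  // 2 children for operators, none for leaves
};

// Builds the tree bottom-up from postfix tokens: a number or identifier
// becomes a leaf, and an operator takes the two most recent subtrees as children.
std::unique_ptr<Node> build_tree(const std::string& postfix) {
    std::vector<std::unique_ptr<Node>> stack;
    std::istringstream in(postfix);
    std::string tok;
    while (in >> tok) {
        auto n = std::make_unique<Node>();
        bool is_op = tok.size() == 1 && std::string("+-*/").find(tok[0]) != std::string::npos;
        if (is_op) {
            if (stack.size() < 2) throw std::runtime_error("missing operand");
            n->kind = tok[0];
            n->kids.resize(2);
            n->kids[1] = std::move(stack.back());  // the right operand is on top
            stack.pop_back();
            n->kids[0] = std::move(stack.back());
            stack.pop_back();
        } else {
            n->kind = std::isdigit(static_cast<unsigned char>(tok[0])) ? 'N' : 'I';
            n->text = tok;
        }
        stack.push_back(std::move(n));
    }
    if (stack.size() != 1) throw std::runtime_error("malformed expression");
    return std::move(stack.back());
}

// Prints the tree fully parenthesised, which makes its shape visible.
std::string show(const Node& n) {
    if (n.kids.empty()) return n.text;
    assert(n.kids.size() == 2);
    return "(" + show(*n.kids[0]) + n.kind + show(*n.kids[1]) + ")";
}
```

For the sample input the output is ((30*((4*x)+(3*x)))-100). The root is the '-' node, and its children are the product and the number 100.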
Since every node is itself a term, such a tree is more generally called a term tree. To summarise the task identified: we want to build a tree from the term that has been typed in, walk over the tree nodes, and perform appropriate actions like calculating, simplifying or printing some results.
Conventional Approach
In a programming language a node type is usually represented as a structured data type. In object-oriented languages this is a class, and in other languages a record type. A &kind;, then, may be a subclass (or class variant) in the first case and a variant record in the second. The code fragment in the following example illustrates a possible implementation in &cpp;, with each &kind; as a subclass. The remaining classes are defined similarly.
Node Types as Classes with &cpp;
    class Aritherm {
      public:
        virtual ~Aritherm() { }  // makes the class polymorphic, as dynamic_cast requires
        /* ... */
    };
    class Simpleterm : public Aritherm { /* ... */ };
    class Plus : public Aritherm {
      public:
        Plus( Aritherm *a, Aritherm *b ) : t1( a ), t2( b ) { /* ... */ }
      private:
        Aritherm *t1, *t2;
    };
    class Number : public Simpleterm {
      public:
        Number( int a ) : n( a ) { /* ... */ }
      private:
        int n;
    };
From these classes we can instantiate &cpp; objects to represent our sample tree, and we can navigate through it by accessing the child nodes of each node. Well, not quite yet, if you look closely: the child nodes are private members and thus cannot be accessed. We have to make them public or add methods for accessing them. But what about the next step, simplifying parts of the tree? The subtree 4*x+3*x could be transformed into, right, (4+3)*x, by factoring out x, as illustrated in figure . For &cpp; this may look like the following example.
Term Substitution with &cpp;
    if ( dynamic_cast<Plus*>( A ) != 0
         && dynamic_cast<Mul*>( dynamic_cast<Plus*>( A ) -> t1 ) != 0
         && dynamic_cast<Mul*>( dynamic_cast<Plus*>( A ) -> t2 ) != 0
         && dynamic_cast<Mul*>( dynamic_cast<Plus*>( A ) -> t1 ) -> t2
            == dynamic_cast<Mul*>( dynamic_cast<Plus*>( A ) -> t2 ) -> t2 )
    {
        A = new Mul( new Plus( dynamic_cast<Mul*>( dynamic_cast<Plus*>( A ) -> t1 ) -> t1,
                               dynamic_cast<Mul*>( dynamic_cast<Plus*>( A ) -> t2 ) -> t1 ),
                     dynamic_cast<Mul*>( dynamic_cast<Plus*>( A ) -> t1 ) -> t2 );
    }
That seems complicated, and it is even more so! Not only do we have to cast for every child node access, but to avoid memory leaks we would also have to free the memory of nodes that are no longer used, such as the old A and its child nodes. Furthermore, the equality check between the second children of t1 and t2 merely compares pointer values instead of checking whether the subtrees are structurally equal. Thus we additionally need to overload the equality operators. What a lot of trouble for such a simple example!
&kpp; Approach
The name &kpp; is derived from Swahili, while the ‘++’ is reminiscent of &cpp;:
    witu : ‘tree’
    m- : plural prefix
    ki- : adjectival prefix: ‘being like’
    kimwitu = ‘tree-s-ish’
Thus the name indicates an affiliation with trees and with &cpp;. You guessed it before, didn't you?
More precisely, &kpp; is a tool that lets us describe easily how to manipulate and evaluate a given term tree. From these descriptions &cpp; code is generated and compiled into a program that processes terms. That is why &kpp; itself is called a term processor. We have to write the code for building a tree ourselves or, preferably, let other tools generate it. We use &yacc; to call term creation routines, which are generated from a &kpp; abstract grammar. We have to specify this abstract grammar first. It is similar to the &yacc; grammar, but its right-hand side alternatives are operators applied to nonterminals. An abstract grammar for our sample can be seen in the following example. The nonterminals integer and casestring are predefined in &kpp;.
Abstract Grammar for Sample
    aritherm:   SimpleTerm ( simpleterm )
              | Plus ( aritherm aritherm )
              | Minus ( aritherm aritherm )
              | Mul ( aritherm aritherm )
              | Div ( aritherm aritherm );
    simpleterm: Number ( integer )
              | Ident ( casestring );
Next we have to complete the &yacc; grammar by adding semantic actions in braces to every alternative. These actions recursively create a term tree and assign its root to the variable root_term. The following example shows this grammar, in which $$ denotes the term under construction and $1 and $2 its first and second subterms (child nodes).
Completed &yacc; Grammar for Sample
    aritherm:   simpleterm            { root_term = $$ = SimpleTerm( $1 ); }
              | aritherm aritherm '+' { root_term = $$ = Plus( $1, $2 ); }
              | aritherm aritherm '-' { root_term = $$ = Minus( $1, $2 ); }
              | aritherm aritherm '*' { root_term = $$ = Mul( $1, $2 ); }
              | aritherm aritherm '/' { root_term = $$ = Div( $1, $2 ); };
    simpleterm: NUMBER                { $$ = Number( $1 ); }
              | IDENT                 { $$ = Ident( $1 ); };
That is all it takes to build a tree. Everything else is left to the automatic code generation. Modifying or evaluating the tree needs its own rules (see chapter ).
Summary
The language &kpp; is an extension of &cpp; for handling term trees. It allows the definition of term types and the creation of terms, and it provides mechanisms for transforming and traversing trees as well as for saving and restoring them. Besides being created in static code, a tree can be obtained dynamically by interfacing with &cpp; code from a compiler generator (such as &yacc; or &bison;). The &kpp; processor generates &cpp; code from the contents of &kpp; input (.k files). Compiling that code yields the term processing program.
How to Define Term Types
This chapter describes how to define term types, which make up the &kpp; input. In &kpp; they are called phyla (singular phylum), and that is what we will call them from now on. A phylum instance is called a term.
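Readers familiar with modern &cpp; may find it helpful to compare an abstract grammar with a sum type. A phylum is a choice between its operators, and each operator carries its subterms. The following sketch (C++17) models the sample abstract grammar with std::variant. It is an analogy only, not the code the &kpp; processor generates. The helpers num, ident, mk and eval are our own.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <variant>

// simpleterm: Number ( integer ) | Ident ( casestring );
struct Number { int value; };
struct Ident { std::string name; };
using Simpleterm = std::variant<Number, Ident>;

// aritherm: SimpleTerm ( simpleterm ) | Plus ( aritherm aritherm ) | ... ;
struct Aritherm;
using Ptr = std::shared_ptr<const Aritherm>;  // terms are immutable and may be shared
struct SimpleTerm { Simpleterm s; };
struct Plus  { Ptr a, b; };
struct Minus { Ptr a, b; };
struct Mul   { Ptr a, b; };
struct Div   { Ptr a, b; };
struct Aritherm { std::variant<SimpleTerm, Plus, Minus, Mul, Div> v; };

// Term creation helpers, in the role of the operators called in the yacc actions.
Ptr num(int n) { return std::make_shared<Aritherm>(Aritherm{SimpleTerm{Number{n}}}); }
Ptr ident(std::string s) {
    return std::make_shared<Aritherm>(Aritherm{SimpleTerm{Ident{std::move(s)}}});
}
template <class Op>
Ptr mk(Ptr a, Ptr b) {
    return std::make_shared<Aritherm>(Aritherm{Op{std::move(a), std::move(b)}});
}

// Evaluation selects the operator of each term; variables are taken from `env`.
int eval(const Aritherm& t, const std::map<std::string, int>& env) {
    if (auto s = std::get_if<SimpleTerm>(&t.v)) {
        if (auto n = std::get_if<Number>(&s->s)) return n->value;
        return env.at(std::get<Ident>(s->s).name);  // throws std::out_of_range if unbound
    }
    if (auto p = std::get_if<Plus>(&t.v))  return eval(*p->a, env) + eval(*p->b, env);
    if (auto p = std::get_if<Minus>(&t.v)) return eval(*p->a, env) - eval(*p->b, env);
    if (auto p = std::get_if<Mul>(&t.v))   return eval(*p->a, env) * eval(*p->b, env);
    const Div& d = std::get<Div>(t.v);
    int divisor = eval(*d.b, env);
    assert(divisor != 0 && "division by zero");
    return eval(*d.a, env) / divisor;
}
```

Because terms are immutable here, the identifier x can be built once and shared by both products of the sample term. With x = 2 the sample evaluates to 320.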
Definition
How are phyla defined? The abstract grammar for our sample shows that each phylum is defined as an enumeration of its &kind;s. Each &kind; is an operator applied to a number of phyla, possibly zero. So the operator SimpleTerm takes one &simpleterm; phylum, and Plus takes two &aritherm; phyla. There are several predefined phyla, of which integer and casestring have already been mentioned. The latter denotes a case-sensitive character string. If a phylum is defined more than once, all occurrences contribute to the first one. For each phylum, a &cpp; class is generated.
Lists
We may want to define a phylum as a list of phyla. Imagine we wanted to type not just one expression into our calculator but several at once, separated in the input by, say, a semicolon. The main phylum, representing the whole input, would then be a list of &aritherm;s. One way to define it is right-recursively, as a list that may be a nil (empty) list. The customary names for the two list operators are formed by prefixing the name of the list phylum with Nil and Cons. The other way to define a list phylum is to use the built-in list operator. This not only looks simpler but also causes additional list functions to be generated. The following example shows both definitions; a specification would use only one of them.
Recursive and Built-in List Definition
    arithermlist: Nilarithermlist ( )
                | Consarithermlist ( aritherm arithermlist );
    arithermlist: list aritherm;
Attributes
Each phylum definition can contain declarations of attributes whose types are phyla or arbitrary &cpp; types. They follow the operator enumeration as a block enclosed in braces. What purpose do they serve? With them, we can attach additional information to nodes that could otherwise be represented in the tree only awkwardly. In our example, we may take advantage of attributes by saving intermediate results to support the calculation.
Therefore we extend the definition of &aritherm; from the abstract grammar as shown in the following example.
Phylum Definition with Attributes
    aritherm:   SimpleTerm ( simpleterm )
              | Plus ( aritherm aritherm )
              | Minus ( aritherm aritherm )
              | Mul ( aritherm aritherm )
              | Div ( aritherm aritherm )
    {   int result = 0;
        bool evaluated = false;
        bool computable = true;
    };
The attribute result should hold the intermediate result of an &aritherm; once the term has been evaluated (evaluated==true) and found computable during the evaluation (computable==true), that is, once it is known that its subterms contain no variables. The attributes can be initialized at the declaration or inside an additional block in braces. This block may follow the declarations and can contain arbitrary &cpp; code. The following example shows an alternative to the initialization above.
Alternative Attributes Initialization
    aritherm:   SimpleTerm ( simpleterm )
              | Plus ( aritherm aritherm )
              | Minus ( aritherm aritherm )
              | Mul ( aritherm aritherm )
              | Div ( aritherm aritherm )
    {   int result;
        bool evaluated;
        bool computable;
        {   $0->result = 0;
            $0->evaluated = false;
            $0->computable = true;
        }
    };
That &cpp; code is executed when a term of that phylum has been created. The new term can be referred to as $0, and its attributes can be accessed via the operator ->.
Aid with Term Handling
This chapter describes techniques that are necessary or useful for traversing a term structure: the application of term patterns and the use of special &kpp; language constructs for term handling.
Patterns
Patterns are a means to select specific terms, which can then be associated with a desired action. The following example shows some patterns and explains which terms they match. Patterns are used in special statements and in rewrite and unparse rules (see and respectively).
Patterns
Special Statements
&kpp; provides two statements as extensions to &cpp; that make dealing with terms more convenient: the &with;-statement and the &foreach;-statement. They can be used in functions and in the &cpp; parts of unparse rules (see ). The &with;-statement can be considered a switch-statement for a phylum.
It contains an enumeration of patterns, which must describe &kind;s of the specified phylum. Of these, the pattern that best matches a given term is chosen, and the &cpp; code assigned to that pattern is executed. The following example takes a term of the phylum &aritherm;. If the term is a sum or a difference, it calculates the attribute result by adding or subtracting the results of the subterms. The keyword default serves as a special pattern in &with;, which matches when none of the others does. In the first case, the two subterms are assigned to the variables a and b, which can be used in the &cpp; part. The term itself is assigned to the variable c, which thus refers to the same term as the variable at. The variable at is visible throughout the entire &with; body and can be accessed in all &cpp; parts. It may also get a new term assigned, as is done in the second case, within which at refers to the first subterm. There, too, c is bound to the term itself so that its result can be set.
&with;-statement
    with( at ) {
        c = Plus( a, b ) : { c -> result = a -> result + b -> result; }
        c = Minus( at, b ) : { c -> result = at -> result - b -> result; }
        default : { }
    }
The second special construct is the &foreach;-statement, which iterates over a term of a list phylum and performs the specified actions for every list element. The following example determines the greatest result among the terms of a list, each of which is bound to a in turn.
&foreach;-statement
    foreach( ... ) {
        if( a -> result > max ) max = a -> result;
    }
Modifying and Evaluating Term Trees
This chapter describes how the tree of terms can be processed once it has been created. On the one hand, we can change its structure and, hopefully, simplify it by applying rewrite rules. On the other hand, we can step through it and create formatted output by applying unparse rules.
Transforming
Rewriting denotes the process of stepping through the tree, seeking the terms that match a pattern and replacing them with the appropriate substitution term.
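It may help to see what rewriting does when written out by hand before looking at &kpp;'s rewrite rules. The following &cpp; sketch is our own illustration, not how &kpp; implements rewriting. It hard-codes a single rule, the factoring of 4*x+3*x into (4+3)*x from the conventional approach. The rule is applied bottom-up until no term matches any more, and the common factor is compared structurally instead of by pointer. The names Term, mk_leaf, mk_node, same, factor, rewrite and show are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Minimal term representation: an operator name and its subterms.
// Leaves use the operators "Num" and "Id" and carry their text.
struct Term {
    std::string op;
    std::string text;
    std::vector<std::shared_ptr<Term>> sub;
};
using T = std::shared_ptr<Term>;

T mk_leaf(std::string op, std::string text) {
    return std::make_shared<Term>(Term{std::move(op), std::move(text), {}});
}
T mk_node(std::string op, T a, T b) {
    return std::make_shared<Term>(Term{std::move(op), "", {std::move(a), std::move(b)}});
}

// Structural equality: same operator, same text and pairwise equal subterms.
// Comparing pointers instead would miss two separately built x leaves.
bool same(const T& x, const T& y) {
    if (x->op != y->op || x->text != y->text || x->sub.size() != y->sub.size()) return false;
    for (std::size_t i = 0; i < x->sub.size(); ++i)
        if (!same(x->sub[i], y->sub[i])) return false;
    return true;
}

// The single rule Plus( Mul( a, b ), Mul( c, b ) ) -> Mul( Plus( a, c ), b ).
// Returns the substitution term, or nullptr if the rule does not match.
T factor(const T& t) {
    if (t->op != "Plus") return nullptr;
    assert(t->sub.size() == 2);
    const T& l = t->sub[0];
    const T& r = t->sub[1];
    if (l->op != "Mul" || r->op != "Mul" || !same(l->sub[1], r->sub[1])) return nullptr;
    return mk_node("Mul", mk_node("Plus", l->sub[0], r->sub[0]), l->sub[1]);
}

// Rewrites the subterms first, then the term itself, until no rule applies.
// This terminates because every application of `factor` yields fewer nodes.
T rewrite(T t) {
    if (!t->sub.empty()) {
        auto copy = std::make_shared<Term>(*t);  // leave the input term untouched
        for (T& s : copy->sub) s = rewrite(s);
        t = copy;
    }
    if (T r = factor(t)) return rewrite(r);
    return t;
}

std::string show(const T& t) {
    if (t->sub.empty()) return t->text;
    return t->op + "(" + show(t->sub[0]) + "," + show(t->sub[1]) + ")";
}
```

Since every application of the rule produces a smaller term, the loop must stop. This is the same requirement that the rewrite rules of &kpp; have to meet.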
Rewrite rules consist of two parts. The left-hand side is a pattern (as described in ), and the right-hand side is a substitution term enclosed in angle brackets. A term must always be replaced by a simpler one, that is, by one that is nearer to the desired form of the result. Otherwise the substitution process may never stop. Let us demonstrate the usage of rewrite rules by simplifying according to figure . What would the rules look like in &kpp; to achieve a simplification of that kind? The following example shows a solution using rewriting. It is quite short in comparison with the &cpp; version, isn't it?
Term Substitution using &kpp;
    Plus( Mul( a, b ), Mul( c, b ) ) -> <: Mul( Plus( a, c ), b ) >;
The variable b occurs twice in the pattern, so the rule matches only if both factors are equal. An equivalent rule would cover the case in which b takes the first position in both subterms. The meaning of the colon will be explained in .
Traversing
Originally, unparsing was meant to be a reverse parse, used to produce a formatted output of the tree built. In general it is better understood as a way to traverse the tree. Unparse rules have a structure similar to that of rewrite rules. The left-hand side consists of a pattern, and the right-hand side of a list of unparse items enclosed in square brackets. Common unparse items are strings, pattern variables, attributes, and blocks of arbitrary &cpp; code enclosed in braces. Unparsing starts by calling the unparse() method of a term, usually the root term. When a rule matches a term, the specified items are ‘printed’. Only string items are really delivered to the current printer. This printer has to be defined by the user, and it usually writes to the standard output or into a file. Variable items and attribute items are unparsed further, and code fragments are executed. If no pattern matches, the default rule is used. It exists for every phylum operator and simply unparses the subterms. We could do quite a lot of different things with the information saved in the tree. It just depends on the rules we use.
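In plain &cpp;, unparsing can be pictured as a recursive walk. A rule for the operator of the current term is looked up and decides what to print. If there is no rule, a default takes over and unparses the subterms, and only strings ever reach the printer. The following sketch is our own illustration with hypothetical names (Printer, Rule, unparse, infix_plus, unparse_to_string). Rules are modelled as functions keyed by operator name, which is much cruder than &kpp;'s patterns.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <vector>

struct Term {
    std::string op;    // operator name; "Num" and "Id" for leaves
    std::string text;  // leaf text
    std::vector<std::shared_ptr<Term>> sub;
};
using T = std::shared_ptr<Term>;

T mk_leaf(const std::string& op, const std::string& text) {
    return std::make_shared<Term>(Term{op, text, {}});
}
T mk_node(const std::string& op, T a, T b) {
    return std::make_shared<Term>(Term{op, "", {a, b}});
}

// The printer only ever receives strings; the user decides where they go.
using Printer = std::function<void(const std::string&)>;
// A rule receives the matched term, the printer and a callback that
// unparses a subterm (what a variable item does in an unparse rule).
using Unparse = std::function<void(const T&)>;
using Rule = std::function<void(const T&, const Printer&, const Unparse&)>;

void unparse(const T& t, const std::map<std::string, Rule>& rules, const Printer& print) {
    Unparse recurse = [&](const T& s) { unparse(s, rules, print); };
    auto it = rules.find(t->op);
    if (it != rules.end())
        it->second(t, print, recurse);         // a rule matches and decides what happens
    else if (t->sub.empty())
        print(t->text);                        // leaves print their value
    else
        for (const T& s : t->sub) recurse(s);  // default rule: unparse the subterms
}

// Corresponds to  Plus( a, b ) -> [ infix: "(" a "+" b ")" ];
const std::map<std::string, Rule> infix_plus = {
    {"Plus", [](const T& n, const Printer& p, const Unparse& u) {
         assert(n->sub.size() == 2);
         p("(");
         u(n->sub[0]);
         p("+");
         u(n->sub[1]);
         p(")");
     }}};

// A printer that collects the output in a string instead of writing to a file.
std::string unparse_to_string(const T& t, const std::map<std::string, Rule>& rules) {
    std::string out;
    unparse(t, rules, [&out](const std::string& s) { out += s; });
    return out;
}
```

With only the Plus rule, the term 4*x+3*x comes out as (4x+3x). The Mul terms fall back to the default rule, so their operator sign never reaches the printer, and the other operators need rules of their own.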
For example, we may choose to print the input term in infix notation, because it is easier for humans to read.
    Plus( a, b ) -> [infix: "(" a "+" b ")" ];
For every Plus term this rule prints an opening parenthesis, unparses the first subterm, prints a plus sign, unparses the second subterm, and then prints the closing parenthesis. The other operators are handled by similar rules. We may also want to compute the result of expressions eventually. This can be achieved with rules like the following one.
    c = Plus( a, b ) -> [: a b { c -> result = a -> result + b -> result; } ];
Here the subterms are unparsed first, and then the sum of the subterm results is assigned to an attribute of the term. This works only if the subterms do not contain Idents. The meaning of the colon will be explained in .
The &cpp; code is enclosed in braces, but it can itself contain braces in matching pairs. If a single brace is needed, as when mixing code and variable items, it has to be escaped with the dollar sign. The following example shows an application. If the function yields true, the first branch is taken, and b is unparsed before a.
Escaped Braces in Unparse Rule
    Plus( a, b ) -> [ : { if( smaller_than( a, b ) ) } ${ b "+" a $} { else } ${ a "+" b $} ];
Views
The whole process of rewriting and unparsing, as well as parts of it, can be executed under different views. Each rule contains a view list between the opening bracket of its right-hand side and the colon, and the rule is used only if the current view appears in that list. This allows different rules to be specified for the same pattern. If a term is visited twice under different views, different rules are applied. Such rules can be merged into one by listing all right-hand sides after a single left-hand side, separated by commas. The following example defines two rewrite views (simplify and canonify) and two rules. Each rule is applied to a matching term only if the current view is among the ones it specifies.
The first rule replaces the quotient of two identical identifiers by the number 1, and the second expresses the associativity of addition. Both change the tree.
Views in Rewrite Rules
    Div( SimpleTerm( Ident( a ) ), SimpleTerm( Ident( a ) ) )
        -> < simplify: SimpleTerm( Number( mkinteger( 1 ) ) ) >;
    Plus( Plus( a, b ), c ) -> < canonify: Plus( a, Plus( b, c ) ) >;
The following example defines two unparse views (infix and postfix) and two rules for the same pattern. The rule that is used is the one that matches both the term and the current view.
Views in Unparse Rules
    Plus( a, b ) -> [ infix : a "+" b ], [ postfix : a b "+" ];
It is also possible to force a variable item to be unparsed under a particular view by specifying that view after the variable, separated by a colon. In the following rule, the subterm b is unparsed further under the view check_zero instead of the view infix.
    %uview check_zero;
    Div( a, b ) -> [ infix : a "/" b : check_zero ];
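A view can be pictured as an extra argument that each rule checks before it applies. The following &cpp; sketch is again our own illustration with hypothetical names (View, unparse, rewrite), and it mimics the rules above. The Plus term is unparsed differently under infix and postfix. The quotient of two equal identifiers becomes 1 only under simplify, and addition is reassociated only under canonify. Identifiers are plain leaves here rather than being wrapped in SimpleTerm.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

enum class View { infix, postfix, simplify, canonify };

struct Term {
    std::string op;    // "Plus", "Div", ...; "Num" and "Id" for leaves
    std::string text;  // leaf text
    std::vector<std::shared_ptr<Term>> sub;
};
using T = std::shared_ptr<Term>;

T mk_leaf(const std::string& op, const std::string& text) {
    return std::make_shared<Term>(Term{op, text, {}});
}
T mk_node(const std::string& op, T a, T b) {
    return std::make_shared<Term>(Term{op, "", {a, b}});
}

// Plus( a, b ) -> [ infix : a "+" b ], [ postfix : a b "+" ];
// Every other binary term just unparses its subterms (the default rule).
std::string unparse(const T& t, View v) {
    if (t->sub.empty()) return t->text;
    assert(t->sub.size() == 2);
    std::string a = unparse(t->sub[0], v);
    std::string b = unparse(t->sub[1], v);
    if (t->op == "Plus" && v == View::infix) return a + "+" + b;
    if (t->op == "Plus" && v == View::postfix) return a + b + "+";
    return a + b;
}

// Div( Ident( a ), Ident( a ) ) -> < simplify: Number( 1 ) >;
// Plus( Plus( a, b ), c )       -> < canonify: Plus( a, Plus( b, c ) ) >;
// Subterms are rewritten first; a rule fires only under its own view.
T rewrite(const T& t, View v) {
    if (t->sub.empty()) return t;
    T a = rewrite(t->sub[0], v);
    T b = rewrite(t->sub[1], v);
    if (v == View::simplify && t->op == "Div" && a->op == "Id" && b->op == "Id" &&
        a->text == b->text)
        return mk_leaf("Num", "1");
    if (v == View::canonify && t->op == "Plus" && a->op == "Plus")
        return rewrite(mk_node("Plus", a->sub[0], mk_node("Plus", a->sub[1], b)), v);
    return mk_node(t->op, a, b);
}
```

The same term yields different results under different views without any rule being changed. For (a+b)+c, postfix unparsing gives ab+c+ before rewriting under canonify and abc++ afterwards.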