Definition of Phyla

Basic Definition

A phylum definition consists of two sides. The left-hand side specifies the phylum name; the right-hand side, after a colon, lists a set of alternative operators separated by bars. An operator is followed by a matching pair of parentheses, which may enclose a list of phyla as arguments. A nullary operator has an empty list. Example presents a basic definition with operators of different arity. An attribute block may appear before the semicolon that closes the definition. The definition of a phylum follows the general form:

For the names of phyla and operators, the same restrictions hold as for &cpp; identifiers. Multiple definitions of a phylum with the same name are interpreted as additional alternative branches of the first definition.

There are some simple predefined phyla: integer, real, casestring, and nocasestring, representing integer values, real values, case-sensitive strings, and case-insensitive strings respectively. Terms of these are not created by calls of operators but by special generated functions: mkinteger(), mkreal(), mkcasestring() and mknocasestring().

The phylum abstract_phylum represents the direct base of all phyla. Adding properties, such as attributes or methods, to it makes them available for all phyla. Other direct bases may be specified for a phylum by using the keyword %base. One of them has to be derived from abstract_phylum, though not necessarily directly. Example makes the phylum math_expression derive from a phylum expression and a user-defined &cpp; class counter_class.

Definition of a Phylum

Changing base of a phylum

List Phyla<indexterm id="idx:list_man" significance="preferred" class="startofrange"><primary>list</primary></indexterm>

A list phylum can be defined by using the basic form of phylum definitions in a (right-) recursive way.
The nullary operator constructs an empty list, and the binary operator constructs a list by prepending a single list element to a list. The other variant uses the predefined operator list, followed by the phylum of the list elements. The latter notation is preferable because it is concise and causes additional list functions to be generated. These include two operators named according to the scheme used in the right-recursive definition (prefixes Nil and Cons). Other names could be chosen as well, but this may cause some confusion. With either variant, a term is created by calling one of the two operators. The two definitions in example yield the same list of elements of a phylum expression.

Alternative Definitions of one List

Attributes of Phyla<indexterm significance="preferred" class="startofrange" id="idx:attributes"><primary>attributes</primary></indexterm>

A phylum definition may have an attribute part attached, which is enclosed in braces. It contains declarations of attributes of phylum types or other &cpp; types and, optionally, their initial value assignments. After that, arbitrary &cpp; code can follow, again enclosed in braces. This is the general form of the attribute part:

In particular, attributes can also be initialized in the &cpp; part, though technically that is then assignment rather than initialization. Attributes may also be defined outside the phylum definition according to the general form below.

Only attributes of phylum types can be defined with %attr. This is because &kpp; considers them part of the enclosing phylum. When a term is written to a file using the CSGIO functions (see ), its attributes are saved and can thus be restored. Using the keyword %member instead also allows &cpp; types to be used. The values of attributes defined in this way are lost when their term is written to a file. Restoring a saved term assigns the initial values to all %member attributes.
The &cpp; code is executed after the term creation and the attribute initializations. Inside that code the newly created term can be referred to as $0. Attributes are accessed via the operator ->. Each predefined phylum has an attribute which holds its value: value for integer and real, and name for casestring and nocasestring. Example shows three alternative ways to define and initialize one attribute (of &cpp; type).

Alternative Definitions of one Attribute

is_valid = true; } };
expression: Div ( expression expression );
%member bool expression::is_valid = true;

Supplemental Definitions

It is possible to supplement the classes generated from phylum definitions with additional methods. These are defined as usual, but are qualified with the name of either a phylum or an operator. Such a method is available for all terms of the phylum or only for those constructed by the specified operator, respectively. Predefined phyla can get additional methods too.

Additional Methods Definitions

It may be desirable to initialize attributes of a phylum with non-default values immediately at term creation. This can be achieved by using the &kpp; keyword %ctor to define custom constructors, which replace the default ones. Additional constructors are possible for phyla as well as for operators, but not for predefined phyla. If these constructors are defined to take arguments, some points should be noted. First, when used with operators, the arguments are added to those in the operator definition. Second, since the default constructors are replaced but some internal mechanisms rely on them, the user has to define new ones or, alternatively, provide default values for all new arguments. The latter may cause a performance loss in the case of many or non-simple arguments. The keyword %dtor allows custom destructors to be defined, for example for freeing memory of attributes.
It is applicable to phyla and operators but not to predefined phyla.

User provided Constructors and Destructors

Phylum Storage Options

For every phylum a &cpp; class is generated, with one create-method for every operator. A term is created by calling the appropriate method, which returns a pointer to the object representing the term. By default, a new cell is allocated in memory for every such object. However, the user may influence the memory management for optimization purposes. At phylum definition time, the phylum can be declared to use a special storage class. There is one predefined storage class: uniq. Specifying !uniq is allowed; it is the same as specifying nothing and results in the default storage being used. Subphyla of a phylum with a storage class cannot use the default storage, but must be defined with a storage class too. The completed general form of a phylum definition is the following one.

Phylum with Storage Class

The first phylum in example explicitly declares that it uses the default storage. All other phyla defined so far got that implicitly. Memory of such terms can be freed individually; terms of the second phylum cannot. They are kept under uniq storage. What does this mean? Each storage class has a hashtable assigned. All terms of phyla with that class are stored there. When a term is to be created, the create routine allocates new storage only conditionally. It first checks whether an object created with the same arguments is already stored. If so, the routine returns a pointer to that object in memory. Thus every term is created only once. All predefined phyla, such as integer, are declared uniq. So the example has the effect that if two &simpleterm;s are created from the same int value, they will both point to the same memory cell.

It is possible to define additional storage classes, each of which gets its own table.
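The lookup that uniq storage performs can be pictured as hash-consing. The following &cpp; sketch is only an illustration under assumptions: the names UniqTable and create, the stand-in class Number, and the use of std::unordered_map keyed by a single int argument are hypothetical and do not reflect the code that &kpp; actually generates.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a term with one int argument (not generated code).
struct Number {
    int value;
    explicit Number(int v) : value(v) {}
};

// One table per storage class: maps creation arguments to the stored term.
class UniqTable {
public:
    // Return the stored term for `value`; allocate only if none exists yet.
    const Number* create(int value) {
        auto found = table_.find(value);
        if (found != table_.end()) {
            return found->second.get();  // same arguments: reuse the cell
        }
        std::unique_ptr<Number> fresh(new Number(value));
        const Number* result = fresh.get();
        table_.emplace(value, std::move(fresh));
        return result;
    }

    // Number of distinct terms currently stored.
    std::size_t size() const { return table_.size(); }

private:
    std::unordered_map<int, std::unique_ptr<Number>> table_;
};
```

In this sketch, two calls of create with the same argument on one table yield the same pointer, and the table owns the terms, which mirrors why terms under such a storage class are not freed individually.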
Tables can also be explicitly created and assigned to storage classes, as well as cleared or freed (see ). Example declares two additional storage classes and defines two phyla using them.

Storage Class Definition and Application

Processing phyla

Pattern Matching<indexterm significance="preferred"><primary>patterns</primary></indexterm>

Patterns make it easier to select terms and subterms and to distinguish cases. They appear in rules for rewriting and unparsing and in &with;- and &foreach;-statements. Their common features are explained here, while the slight differences are mentioned in the appropriate places.

The term ‘pattern’ can be defined inductively. Each of the following is a pattern in the context of &kpp;: the literal of a predefined phylum or the asterisk sign, the phylum operator with zero or more patterns as arguments, the assignment of a pattern to a variable, and the enumeration of patterns delimited by commas.

Additionally, some restrictions hold regarding the use of patterns. The patterns of item are not allowed as the outermost pattern, while those of item are allowed only as the outermost pattern. The assignment of an asterisk to a variable can be abbreviated by stating only the variable. If more than one pattern matches a term, the most specific pattern is chosen. If there is no most specific one, the first of the matches is chosen. The matched term can be accessed as $0, its first subterm as $1, the second as $2 etc. (not in rewrite rules). Table lists pattern examples, which are equally specific within each group. &kpp; is not yet able to decide which of several more complex patterns (possibly partly overlapping each other) is the most specific, but chooses the first one found.

Pattern groups increasingly specific

pattern                     matches
*                           any term; not allowed as the outermost pattern.
SimpleTerm                  term SimpleTerm with an unspecified number of subterms; only allowed as the outermost pattern.
SimpleTerm(*)               term SimpleTerm with a subterm.
a=SimpleTerm(b)             term SimpleTerm with a subterm; the term is assigned to a variable a, the subterm to b.
Number(7)                   term Number with an integer subterm of value 7.
SimpleTerm(*),Mul(b,b)      either term SimpleTerm with a subterm or term Mul with two subterms which are structurally equal (the first of the two is assigned to b).
SimpleTerm(Number(*))       term SimpleTerm with a subterm being a Number.
a=SimpleTerm(Number(b))     term SimpleTerm with a subterm being a Number; the term is assigned to a and the sub-subterm to b.
value!=0)                   term SimpleTerm with a subterm being a Number, which is assigned to b, but matches only if the integer b is not zero.

&kpp; Control Structures

&kpp; provides two control structures which help in dealing with terms: the &with;-statement and the &foreach;-statement. They appear in different variants, fitting slightly different purposes.

Explicit &with;<indexterm><primary>&with;</primary><secondary>explicit</secondary></indexterm>

The &with;-statement is similar to a &cpp; switch and decides for a given term which alternative branch to choose. The alternatives are patterns describing &kind;s of one phylum, each with a &cpp; block assigned, possibly an empty one. The example lists a code piece containing an explicitly stated &with;. It decides whether the term a is an identifier or a number and executes the corresponding code. That term is accessible throughout the whole &with; body unless another term is assigned to the variable a, as is done in the second case. There, variable a is assigned the subterm of the Number. The special pattern default matches when the preceding patterns do not. Here this would never occur, because a &simpleterm; is defined to be one of the two specified. If no default case is specified and none of the patterns matches the term, a runtime exception is raised (program execution is aborted). The pattern default is allowed in all &with;-variants (implicit-, explicit-, and &foreach;-&with;).
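For readers who think in plain &cpp;, the dispatch of an explicit &with; without a default case can be approximated by testing the dynamic type of a term, one alternative after the other. The sketch below is an analogy only and not what &kpp; generates: the classes Term, Ident, Number and Unknown and the function describe are hypothetical, and the sketch throws std::runtime_error where the text above speaks of a runtime exception that aborts the program.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical term classes standing in for the operators of one phylum.
struct Term { virtual ~Term() {} };
struct Ident : Term {
    std::string name;
    explicit Ident(const std::string& n) : name(n) {}
};
struct Number : Term {
    int value;
    explicit Number(int v) : value(v) {}
};
struct Unknown : Term {};  // an operator that no alternative below covers

// Tries the alternatives in order, like the cases of a with-statement
// that has no default case.
std::string describe(const Term& a) {
    if (const Ident* id = dynamic_cast<const Ident*>(&a)) {
        return "identifier " + id->name;
    }
    if (const Number* num = dynamic_cast<const Number*>(&a)) {
        return "number " + std::to_string(num->value);
    }
    // Assumption: the sketch throws so that the failure can be observed.
    throw std::runtime_error("no pattern matches the term");
}
```

Adding a final unconditional return in place of the throw would correspond to supplying a default case.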
Explicit &with;

computable = false; }
c=Number( a ) : { c -> computable = true; c -> result = a -> value; }
default : { // never reached, because simpleterm has only
// the above two &kind;

Implicit &with;<indexterm><primary>&with;</primary><secondary>implicit</secondary></indexterm>

If a function is defined to have a phylum argument whose variable name begins with a dollar sign, the function body is assumed to be the body of an implicitly given &with;-statement. Example presents a function which returns true if the given &aritherm; matches one of the specified patterns, that is, represents an arithmetic term. If the term has the &kind; SimpleTerm, the default case catches it.

Implicit &with;

Simple &foreach;

The &foreach;-statement is a loop which runs over all elements of a term of a list phylum and executes the code specified in the statement body on every iteration. The &foreach; in example counts the occurrences of arithmetic terms by calling the function is_arithmetic_term for a during each iteration. This variable holds the current list element.

Simple foreach

&foreach;-&with;<indexterm><primary>&foreach;-&with;</primary></indexterm>

This &foreach; variant also steps through a list, but performs a &with; for every element. A dollar-prefixed list variable is used instead of a simple variable. The statement body contains patterns with a &cpp; block assigned to each. Example demonstrates the usage. It counts the number of sums and differences within the term list.

&foreach;-&with;

&foreach; with Pattern

This third variant of &foreach; allows a pattern to be specified instead of a list variable. The action is executed only for those list elements which match the pattern. Thus it combines &foreach; and an implicit &with; containing only one pattern of interest. Example is similar to example but it counts only the number of sums in the list.
&foreach; with Pattern

Multiple Patterns<indexterm><primary>multiple patterns</primary></indexterm>

For each of the preceding statements it is possible to specify not just one but multiple variables or patterns. Used with &foreach;, multiple lists can be iterated over at the same time. The variables or patterns are separated by ampersand signs (&), the list specifications by commas. Used with &with;, multiple terms can be checked at the same time against complex patterns. Here, the variables are separated by commas. The complex patterns are built by joining single patterns with ampersand signs (&), where the first pattern has to match the first term, the second pattern the second term, and so on, for the complex pattern to match. A complex pattern can also be a grouping of patterns, but then it must be enclosed in parentheses. Example shows a &with; over two variables. The statement simply prints out whether the two terms have the same &kind;.

Multiple Patterns in &with;

<constant>afterforeach</constant><indexterm><primary>afterforeach</primary></indexterm>

When a &foreach; iterates over more than one list at a time, execution stops when one of the lists reaches its end, while the others may still contain elements. The afterforeach-statement is useful if it is desired to iterate further over the remainders of the lists. The variables of the afterforeach refer to the list remainders. Their phyla are already known from the preceding &foreach;. The code fragment in example uses &foreach;-&with; and afterforeach to decide whether list A is longer than list B, returning true if it is.

<constant>afterforeach</constant> in Length Test

Rewriting<indexterm significance="preferred" id="idx:rewriting" class="startofrange"><primary>rewriting</primary></indexterm>

The process of rewriting transforms a term tree. The left-hand side of each rewrite rule is a pattern specifying which term to substitute.
The right-hand side denotes the term to substitute it with, which has to be of the same phylum as the original one. It is built from calls of operators and term-returning functions. Variables from the left-hand side may be used. Example simplifies terms by replacing every sum of numbers by its evaluated result. A helper function is necessary since it is not possible to use &cpp; operators directly in a rewrite rule (except method calls, see ).

Rewriting

<: SimpleTerm( Number( plus( a, b ) ) ) >;
%{ KC_REWRITE /* code redirection */
integer plus( integer a, integer b ){ return mkinteger( a -> value + b -> value ); }
%}

To allow rewriting to terminate, each rule has to replace the term in question by one which is really simpler, reduced, or closer to a normal form. Since rewriting searches depth first, the subterms are usually already the result of a transformation. Calling the rewrite method of the root term starts the transformation of the tree. Choosing an arbitrary term instead rewrites the subtree beneath it.

Unparsing<indexterm class="startofrange" significance="preferred" id="idx:unparsing"><primary>unparsing</primary></indexterm>

The process of unparsing traverses a term tree and executes the instructions for every term matching a pattern as specified in the unparse rules. The left-hand side of a rule denotes a term pattern, the right-hand side a list of unparse items which are evaluated for matching terms. The various items allowed are listed below and all appear in example , which is completely nonsensical.

string: Text strings in double quotes are delivered to the printer unchanged.
variable: The term denoted by this term variable will be unparsed by calling its unparse-method.
attribute: The attribute of a term will be unparsed by calling its unparse-method. If it is of non-phylum type, the user has to provide such a method.
&cpp; code: Inside a pair of braces arbitrary &cpp; code may be placed.
escaped braces ${$}: If a non-matching brace is needed, it has to be escaped by a dollar sign.
variable with view: A view can be specified if a variable should be unparsed under a view other than the current one (see ).
unparse view variable definition: Variables of user-defined unparse view classes can be defined inside a rule (see ).

Unparse Items

[: "zero" a b->result { if (a->value == 0) } ${ a:infix$} %uviewvar prefix p1; a:p1 ];

For every operator, there is a default pattern which matches if no other pattern does. Its associated rule unparses all subterms. The unparse-method generated for every phylum can also be called explicitly. It has 3 arguments: the term to be unparsed, a printer and an unparse view. The names kc_printer and kc_current_view respectively refer to the printer and the view which are currently in use.

Printer

The user has to provide a printer function which satisfies their needs. Usually it prints to the standard output or into some file, and may take actions dependent on the view.

void printer( const char* the_string, uview the_view ) { ... };

Since several printer instances may be needed, a printer functor can also be specified as the printer. The printer functor class must be publicly derived from the class printer_functor_class.

%{ HEADER /* redirection since no class definitions allowed in .k */
class example_printer : public printer_functor_class {
public:
    void operator( ) ( const char* the_string, uview the_view ) { ... }
};
%}

Language Options

Often a term tree has to be pretty-printed into different but similar destination languages, which sometimes require only slightly different output to be generated. To avoid multiplying whole rule sets and to allow a more flexible choice of destination language, the concept of language options has been introduced.
Every unparse item can be preceded by a language option, which is a language name in angle brackets followed by a colon. That item will be unparsed only if the specified language is active. Languages are declared using the keyword %language and they are set by calling set_language(...). The active language can be checked by calling is_language(...). The language names must satisfy the requirements for &cpp; identifiers. Example demonstrates the application of language options.

Language Options

[: : "public " "class " name : " extends " : " : " base_name " {\n" class_body "}" : ";" "\n" ];

View Classes<indexterm significance="preferred" class="startofrange" id="idx:view_man"><primary>view</primary></indexterm>

Rewriting and unparsing of each term is done under a certain view. The view serves as a means to further differentiate between rules when choosing one to apply. To match, a rule must have the current view in its view list, which is the left part of the right-hand side between the bracket and the colon. If no rule matches, the rules with an empty view list are considered. The general form of rules with views is shown below.

" "<" rview_list ":" {operator|function} ">;"
unparse_rule := pattern "->" "[" uview_list ":" unparse_items "];"

Views are declared by enumerating them after the keyword %rview or %uview, for rewriting and unparsing respectively, separated by a space or a comma. These declarations enable &kpp; to check for view consistency, although it is possible to leave them out entirely. But that should be avoided, because then even simple misspellings in a view list cause the implicit introduction of new views. One view is predefined for rewriting and one for unparsing, base_rview and base_uview respectively; each is implicitly included in the empty view list. The view can be changed for a term by calling its rewrite/unparse-method with a new view argument.
In unparse rules the same can also be achieved by appending a colon and a new view name to a variable unparse item. Thus a whole subtree can be rewritten/unparsed under a different view, or even multiple times under changing views. Changing views allows several tasks to be interlocked on a certain subtree.

<: a -> rewrite( demo1 ) >;
%uview demo2 ;
Plus( a, b ) -> [: a:demo2 { b->unparse( kc_printer, demo2 ); } ];
// both subterms of Plus are further unparsed under view demo2;

Every view introduced by the user actually causes the generation of a view class and one view variable of the same name. Since the user cannot distinguish them, the generalizing term ‘view’ is used. For unparse views the distinction may matter, since the user can define their own unparse view classes. These are declared by enclosing the view name in parentheses. The user has to provide a class view_class derived from a generated class view_baseclass. In particular, that class may contain member variables to hold information that persists over several rules. The base class provides an equality operator (==), which decides whether two view variables are of the same class, and a name-method returning the name of the view. No global view variable of the same name is generated for a user-defined unparse view class. Variables of such a view are instantiated inside unparse rules by %uviewvar and may bear the same name as their class. They can be used like the implicitly created view variables, but additionally provide all features of their class. Example shows the definition of an unparse view class and demonstrates its usage.
User defined Unparse View Class

[: %uviewvar number_count nc; // instantiate view variable
c:nc // and unparse c with it
{ std::cout << "Numbers counted: " << nc.counter << std::endl; } ];
Plus( a, b ), Minus( a, b ), Mul( a, b ), Div( a, b ) -> [ number_count: a b ];
SimpleTerm( a ) -> [ number_count: a ];
Number -> [ number_count: { kc_current_view.counter++; } ];
Ident -> [ number_count: ];

View lists contain names of view classes; all other occurrences of views are actually view variables. The scope of an unparse view variable ends with its defining rule. Since its name is not known inside other rules, it can be accessed there only by means of the name kc_current_view, which always refers to the view variable currently in use.

Restrictions on &cpp; in &kpp;

There are many places in a .k-file where &cpp; code can be used. But for some of them, the &kpp; processor allows only restricted use of &cpp; constructs. These places are listed below along with the restrictions they impose.

.k-file: Only function definitions are allowed. Their arguments must not have types with compound names (for example, no long int). &cpp; comments are allowed everywhere in .k-files.
&cpp; unparse item: Almost arbitrary &cpp; code is allowed, that is, everything which is allowed inside a local &cpp; block.
rewrite rule: Only simple function calls are allowed, that is, calls which have as arguments only term variables and term literals, phylum operators and other simple function calls; in particular no &cpp; operators, except access of member functions.
code redirection: Arbitrary &cpp; code is allowed, but it has to be pure &cpp; since redirection code is not evaluated by &kpp;.

There is a way to get around the restrictions of the &kpp; processor using macros. A macro is defined inside a code redirection, which the processor does not evaluate.
Therefore it can be as complex as necessary, while the macro call inside the rewrite rule looks as simple as &kpp; requires. Example defines a function for addition, which can be avoided by using a macro as in example .

Macro Application in Rewriting

<: SimpleTerm( Number( PLUS( a, b ) ) ) >;
%{ KC_REWRITE /* code redirection */
#define PLUS( a, b ) mkinteger( ( a ) -> value + ( b ) -> value )
%}

Some generated functions return terms of the phylum abstract_phylum which have to be cast to the actual phylum. The &cpp; cast operators may also be used for phylum conversion, but &kpp; provides phylum_cast, a cast operator for phyla, which is preferable.

Generated Code

From the &kpp; definitions, rules and &cpp; code pieces, several classes and functions in pure &cpp; are generated and distributed over multiple files. Once compiled, they perform the desired tree handling. Additional code is needed to create the trees; it is typically produced by scanner and parser generators, for instance Flex and &bison;.

Generated Classes and Types

The definitions of phyla and operators result in generated &cpp; classes. But these should be of no further interest to the user, since the phylum names can be used in &cpp; code as if they were pointer types of these classes, and the operators as if they were &cpp; constructors. Every phylum has a const counterpart of the same name prefixed by c_, which is the only means to get a const phylum variable. For the sake of completeness, every phylum corresponds to a class impl_phylum and every operator to a subclass impl_phylum_operator. All the classes are derived from a common base class which can be referred to as abstract_phylum. Adding constructors, methods or attributes to it changes all phyla accordingly.
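The naming scheme described above can be pictured with a small hand-written &cpp; sketch. It is an assumption-laden illustration, not the output of &kpp;: the nullary operator Leaf, the member names left and right, the choice that a term deletes its subterms, and the name impl_abstract_phylum for the common base class are all hypothetical; only the impl_phylum and impl_phylum_operator scheme, the c_ prefix, and the methods phylum_name and op_name are taken from this manual.

```cpp
#include <cassert>
#include <cstring>

// Common base class; the manual refers to it as abstract_phylum.
class impl_abstract_phylum {
public:
    virtual ~impl_abstract_phylum() {}
    virtual const char* phylum_name() const = 0;
    virtual const char* op_name() const = 0;
};
typedef impl_abstract_phylum* abstract_phylum;

// One class per phylum; the phylum name acts as a pointer type.
class impl_expression : public impl_abstract_phylum {
public:
    const char* phylum_name() const { return "expression"; }
};
typedef impl_expression* expression;
typedef const impl_expression* c_expression;  // const counterpart, prefix c_

// One subclass per operator. Leaf is a hypothetical nullary operator.
class impl_expression_Leaf : public impl_expression {
public:
    const char* op_name() const { return "Leaf"; }
};

class impl_expression_Div : public impl_expression {
public:
    impl_expression_Div(expression l, expression r) : left(l), right(r) {}
    // Assumption of this sketch: a term owns and deletes its subterms.
    ~impl_expression_Div() { delete left; delete right; }
    const char* op_name() const { return "Div"; }
    expression left;
    expression right;
};

// Operators are called like constructors and return a term pointer.
expression Leaf() { return new impl_expression_Leaf; }
expression Div(expression l, expression r) {
    return new impl_expression_Div(l, r);
}
```

With these definitions, a call such as Div( Leaf(), Leaf() ) reads like a constructor call and yields a value of type expression, which converts implicitly to c_expression and to abstract_phylum.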
The interworking with &yacc;/&bison; requires a type YYSTYPE, which is generated by &kpp; when the option yystype is specified (see ).

Smart-pointer<indexterm><primary>smart pointer</primary></indexterm>

Memory often leaks when phylum operators are used in expressions, and that is sometimes hard to detect. The option smart-pointer enables a smart memory management which avoids unnecessary copying of terms and automatically frees memory of unused terms. This is achieved by using so-called smart-pointers, which do reference counting and allow a term to be freed when it is no longer referenced. An additional type with the suffix _ptr is generated for every phylum. Variables of such types never need to be freed explicitly. Avoid mixing them with variables of the usual types, and in particular never assign between them, because that is likely to cause memory access errors.

Weak-pointer<indexterm><primary>weak pointer</primary></indexterm>

The option weak-pointer extends the smart-pointer technique and supports a third type for every phylum. It has the prefix weak_ and the suffix _ptr. Weak-pointer variables of a term do not contribute to the reference counting, so the term is freed even if weak-pointers still reference it. That is why they are useful only in conjunction with smart-pointers. In contrast to usual variables, weak-pointers have their own reference counting, which makes it possible to determine whether such a pointer dangles, that is, points to a term already freed and is thus no longer valid.

Generated Functions

&kpp; generates a number of functions which are available wherever &cpp; code is allowed in .k-files. The table lists all these functions and the sections which contain a more detailed description.
Generated Functions

function (see section)
append, concat, eq, CSGIOread, CSGIOwrite, filter, fprint, fprintdot, fprintdotepilogue, fprintdotprologue, free, freelist, ht_create_simple, ht_assign, ht_assigned, ht_clear, ht_delete, is_nil, last, length, map, merge, mkcasestring, mkinteger, mknocasestring, mkreal, op_name, phylum_name, print, reduce, reverse, rewrite, set_subphylum, subphylum, unparse

Common Functions

Since abstract_phylum has an unparse-method defined and all phyla are derived from abstract_phylum, all phyla have it. The same is true for some other methods.

copy<indexterm><primary>copy()</primary></indexterm>

abstract_phylum copy( bool copy_attributes ) const;

The method copies this term completely, including its subterms. Since the result is always abstract_phylum, it has to be cast to the phylum of this term. If true is specified as argument, the attributes are copied too. But beware: if the attributes are phyla or &cpp; pointers, only the addresses are copied, that is, the new term references the attributes of the old one.

eq<indexterm><primary>eq()</primary></indexterm>

bool eq( c_abstract_phylum c_p ) const;

The method returns true if this term is structurally equal to the argument, that is, both terms have equal subtrees.

fprint<indexterm><primary>fprint()</primary></indexterm>

void fprint( FILE* file );

The method prints a textual representation of this term to the specified file. This simple example produces the output underneath.
simpleterm a_number = SimpleTerm( Number( mkinteger( 23 ) ) );
a_number -> print( );

SimpleTerm( Number( 23 ) )

fprintdot<indexterm><primary>fprintdot()</primary></indexterm>

void fprintdot( FILE *f, const char *root_label_prefix, const char *edge_label_prefix, const char *edge_attributes, bool print_node_labels, bool use_context_when_sharing_leaves, bool print_prologue_and_epilogue ) const;

This function creates a representation of the term in a format understood by the program dot, which is part of the graphics package graphviz and draws directed acyclic graphs in various output formats like PostScript or GIF. The target of the operation is the file f, while the other arguments control the details of the graph's appearance.

root_label_prefix: Adds a label to the graph denoting the root term. The label has the name of the phylum of that term prefixed by this string argument.
edge_label_prefix: Every edge in the graph is labelled with a number. This string argument appears as the prefix of these labels.
edge_attributes: For dot, the edges can have attributes which specify additional features like font name or edge colour (see dot manual for attribute names and values). This string argument is a list of attribute/value pairs (attribute=value), separated by commas.
print_node_labels: If this argument is set to true, the names of the subterms' phyla appear in the graph. Otherwise they are suppressed and only the term names (operator names) are printed.
use_context_when_sharing_leaves: Terms which are shared in the tree usually appear only once in the graph (terms of uniq phyla). In particular, terms of predefined phyla are shared if they are equal. They are always leaves since they have no subphyla. If this argument is set to true, the leaves appear shared in the graph only if they are subterms of shared (uniq) terms.
print_prologue_and_epilogue: This argument is usually set to true since a certain prologue and epilogue are necessary to frame the graph.
It is set to false if multiple graphs are to be grouped into one figure. In that case the prologue function has to be called explicitly, then some fprintdot calls follow, and finally the epilogue call finishes the figure creation.

The following call of fprintdot writes a presentation of the term t to a file exa. From that file dot creates a graph like that in figure .

aterm t = Plus( SimpleTerm( Number( mkinteger( 7 ) ) ), SimpleTerm( Number( mkinteger( 7 ) ) ) );
t -> fprintdot(exa, "root_", "edge", "style=dashed", true, false, true);

fprintdotprologue<indexterm><primary>fprintdotprologue()</primary></indexterm>

void fprintdotprologue ( FILE *f );

This function writes the prologue to f, which is needed to set up graphviz. Usually, when the figure contains only one graph, this function is called implicitly by fprintdot; call it explicitly when you set print_prologue_and_epilogue to false in the fprintdot call.

fprintdotepilogue<indexterm><primary>fprintdotepilogue()</primary></indexterm>

void fprintdotepilogue ( FILE *f );

This function writes the epilogue to f, which is needed to finish the graph for graphviz. Usually, when the figure contains only one graph, this function is called implicitly by fprintdot; call it explicitly when you set print_prologue_and_epilogue to false in the fprintdot call.

op_name<indexterm><primary>op_name()</primary></indexterm>

const char* op_name( ) const;

This function returns the name of the phylum operator which has been used to create this term.

phylum_name<indexterm><primary>phylum_name()</primary></indexterm>

const char* phylum_name( ) const;

This function returns the name of the phylum of this term.

print<indexterm><primary>print()</primary></indexterm>

void print( );

This function prints a textual presentation of this term to the standard output. It is similar to the output of fprint.
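The call order for grouping several graphs into one figure (explicit prologue, several fprintdot calls with print_prologue_and_epilogue set to false, explicit epilogue) can be pictured with a small stand-alone sketch. Everything in it is a stand-in: toy_term is a hypothetical class, not a generated phylum, and the bodies of fprintdotprologue and fprintdotepilogue only write a minimal dot frame, which is an assumption and not the text the generated functions emit. Only the signatures follow the ones quoted above.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Stand-in bodies: a minimal dot frame, NOT the real generated prologue/epilogue.
void fprintdotprologue(FILE *f) { std::fputs("digraph sketch {\n", f); }
void fprintdotepilogue(FILE *f) { std::fputs("}\n", f); }

// Hypothetical toy term; a generated phylum class would take its place.
struct toy_term {
    std::string name;

    void fprintdot(FILE *f, const char *root_label_prefix,
                   const char *edge_label_prefix, const char *edge_attributes,
                   bool print_node_labels, bool use_context_when_sharing_leaves,
                   bool print_prologue_and_epilogue) const {
        // The toy ignores every option except the root label prefix.
        (void)edge_label_prefix;
        (void)edge_attributes;
        (void)print_node_labels;
        (void)use_context_when_sharing_leaves;
        if (print_prologue_and_epilogue) fprintdotprologue(f);
        std::fprintf(f, "  %s%s;\n", root_label_prefix, name.c_str());
        if (print_prologue_and_epilogue) fprintdotepilogue(f);
    }
};

// Groups two graphs into one figure: one prologue, two bodies, one epilogue.
void write_two_graphs(FILE *f, const toy_term &a, const toy_term &b) {
    assert(f != nullptr);
    fprintdotprologue(f);
    a.fprintdot(f, "root_", "edge", "style=dashed", true, false, false);
    b.fprintdot(f, "root_", "edge", "style=dashed", true, false, false);
    fprintdotepilogue(f);
}
```

Passing true as the last argument in both calls would instead produce two complete, separate frames in the same file, which is why the flag is switched off when the frame is written by hand.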
set_subphylum<indexterm><primary>set_subphylum()</primary></indexterm>

void set_subphylum( int n, abstract_phylum p, bool=false );

This function replaces the nth subterm of this term by term p, which must be of a phylum castable to the phylum of the appropriate subterm. Numbering starts with 0.

subphylum<indexterm><primary>subphylum()</primary></indexterm>

abstract_phylum subphylum( int n, bool=false ) const;

This function returns the nth subterm of this term. Numbering starts with 0.

unparse<indexterm significance= "preferred"><primary>unparse()</primary></indexterm>

void unparse( printer_functor pf, uview uv);
void unparse( printer_function opf, uview uv );

This function starts unparsing for this term. It is recursively called for every subterm. Unparsing is processed under the specified unparse view, and the strings to output are delivered to the printer functor or function respectively.

rewrite<indexterm><primary>rewrite()</primary></indexterm>

rewrite( rview rv ); ]]>

This function starts rewriting for this term. It returns a new term of the actual phylum. Usually it is called at the root term, whereupon the entire tree is searched under the specified view.

CSGIO Functions

The generated files csgiok.h and csgiok.cc provide means to write terms to files and to reconstruct terms from such files. Whole term trees can thus be saved and exchanged between different applications. Reading and writing are performed by two functions. The file format was originally designed to be compatible with the structure files of the commercial tool Synthesizer Generator. The format now written by &kpp; is somewhat extended, so the files are no longer compatible, but old structure files are still expected to be understood.

CSGIOwrite<indexterm><primary>CSGIOwrite()</primary></indexterm>

The method writes this term to f, that is, the entire subterm tree. The attributes are ignored unless they are phyla which have been defined using the keyword %member.
CSGIOread<indexterm><primary>CSGIOread()</primary></indexterm>

void CSGIOread( FILE *f, P &p ) ]]>

The function reads from f the presentation of a term. The term is constructed by successively calling the appropriate operators of the subterms. The operators initialize the attributes according to the phylum definition; except the %member-attributes which get their values from the saved term. The created term is assigned to p which has to be a variable of the correct phylum.

Creation Functions

Terms of predefined phyla are created by functions.

mkcasestring<indexterm significance= "preferred"><primary>mkcasestring()</primary></indexterm>

casestring mkcasestring( const char *str );
casestring mkcasestring( const char *str, unsigned int length );

The function creates a term of the phylum casestring from the specified string. Upper and lower case characters are distinguished. The second variant uses only the first length characters of the specified string.

mkinteger<indexterm significance= "preferred"><primary>mkinteger()</primary></indexterm>

integer mkinteger( const INTEGER i );

The function creates a term of the phylum integer from the specified value. INTEGER is a macro which can be defined by the user as needed but defaults to int.

mknocasestring<indexterm significance= "preferred"><primary>mknocasestring()</primary></indexterm>

nocasestring mknocasestring( const char *str );
nocasestring mknocasestring( const char *str, unsigned int length );

The function creates a term of the phylum nocasestring from the specified string. Upper and lower case characters are not distinguished. The second variant uses only the first length characters of the specified string.

mkreal<indexterm significance= "preferred"><primary>mkreal()</primary></indexterm>

real mkreal( const REAL r );

The function creates a term of the phylum real from the specified value. REAL is a macro which can be defined by the user as needed but defaults to double.
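The difference between the two string phyla, and the effect of the length argument, can be pictured with a small stand-alone sketch. The helpers prefix_of, equal_case and equal_nocase are hypothetical names introduced only for this illustration; they work on std::string and are not the generated casestring or nocasestring classes, whose internals are not shown here.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: keeps only the first `length` characters of str,
// as the two-argument creation variants are described to do.
std::string prefix_of(const char *str, unsigned int length) {
    assert(str != nullptr);
    return std::string(str).substr(0, length);
}

// Comparison in the casestring sense: upper and lower case are distinguished.
bool equal_case(const std::string &a, const std::string &b) {
    return a == b;
}

// Comparison in the nocasestring sense: upper and lower case are not distinguished.
bool equal_nocase(const std::string &a, const std::string &b) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const int ca = std::tolower(static_cast<unsigned char>(a[i]));
        const int cb = std::tolower(static_cast<unsigned char>(b[i]));
        if (ca != cb) return false;
    }
    return true;
}
```

With these helpers, "Plus" and "plus" differ under equal_case but match under equal_nocase, and prefix_of("phylum", 3) keeps only "phy".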
Memory Management Functions

When terms, once constructed, are no longer needed, it is usually reasonable to free the memory they occupy, especially when dealing with large numbers of terms. This does not hold when smart pointers are used, because these keep track of allocated memory on their own. Never apply free or freelist to smart pointers. The &cpp; delete should never be applied to any term, since that would bypass some &kpp; mechanisms.

free<indexterm><primary>free()</primary></indexterm>

void free( bool recursive=true );

The method frees the memory allocated by this term, and by default it also frees the subterms recursively. When it is applied to a list term, the whole list and all its elements are freed. The non-recursive form only separates the list into its first element and the remainder of the list. Terms of phyla under non-default storage management cannot be freed individually; calling free on them has no effect.

freelist<indexterm><primary>freelist()</primary></indexterm>

void freelist( );

The method frees the spine of this list term and leaves the list elements untouched.

Hashtable Functions

The memory management of terms of storage class uniq or a user-defined one can only be influenced by hashtable operations.

ht_create_simple<indexterm><primary>ht_create_simple()</primary></indexterm>

hashtable_t ht_create_simple ( int size );

The function creates a new hashtable and returns it. The current implementation ignores the size argument.

ht_assign<indexterm><primary>ht_assign()</primary></indexterm>

hashtable_t ht_assign ( hashtable_t ht, storageclass_t sc, bool still_unique=false );

The function assigns the hashtable ht to the storage class sc and returns the hashtable which has previously been assigned to sc.

ht_assigned<indexterm><primary>ht_assigned()</primary></indexterm>

hashtable_t ht_assigned ( storageclass_t sc );

The function returns the hashtable which is assigned to the storage class sc.
ht_clear<indexterm><primary>ht_clear()</primary></indexterm>

void ht_clear ( hashtable_t ht );

The function removes all entries from the hashtable ht.

ht_delete<indexterm><primary>ht_delete()</primary></indexterm>

void ht_delete ( hashtable_t ht );

The function deletes the hashtable ht entirely.

List Functions<indexterm id="idx:listfunc" class="startofrange"><primary>list</primary></indexterm>

List phyla which have been defined using the list keyword get some methods performing convenient tasks. In the function signatures, the name ]]> denotes the actual list phylum, ]]> denotes the phylum of the list elements.

append<indexterm><primary>append()</primary></indexterm>

append( p );]]>

The method appends the specified term to this list and returns the tail of the new list, i.e. the sublist that has p as its only element. This makes appending several elements in a row more efficient.

concat<indexterm><primary>concat()</primary></indexterm>

concat( c_ l1, c_ l2 );]]>

The function constructs a new list from the terms of l1 followed by the terms of l2 and returns that list.

filter<indexterm><primary>filter()</primary></indexterm>

filter( bool (*fp) () );]]>

The method constructs a new list from the terms of this list for which the function fp yields true.

is_nil<indexterm><primary>is_nil()</primary></indexterm>

The method returns true if this list is empty.

last<indexterm><primary>last()</primary></indexterm>

last( ) const;]]>

The method returns the remainder of this list which contains only one element, the last. If this list is empty, the empty list is returned.

length<indexterm><primary>length()</primary></indexterm>

The method returns the number of elements in this list.

map<indexterm><primary>map()</primary></indexterm>

map( (*fp) () );]]>

The method constructs a new list containing the terms which are returned by the function fp, which is called for every element of this list. The new list is returned.
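Why a tail-returning append makes repeated appending cheap, and how filter and map build new lists, can be pictured with a small stand-alone sketch. IntList is a hypothetical list of plain ints written only for this illustration; it is not a generated list phylum, and the in-place update of the empty end node is an assumption about one possible way to realise the described behaviour.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical cons list of ints: a node is either nil (empty) or head + tail.
struct IntList {
    bool nil = true;
    int head = 0;
    IntList *tail = nullptr;

    IntList() = default;
    IntList(const IntList &) = delete;
    IntList &operator=(const IntList &) = delete;
    ~IntList() { delete tail; }

    // Appends v at the end and returns the sublist whose only element is v.
    // Continuing from the returned sublist avoids re-walking the whole list.
    IntList *append(int v) {
        IntList *l = this;
        while (!l->nil) l = l->tail;   // walk to the empty end
        assert(l->tail == nullptr);
        l->nil = false;
        l->head = v;
        l->tail = new IntList();       // fresh empty end
        return l;
    }

    std::size_t length() const {
        std::size_t n = 0;
        for (const IntList *l = this; !l->nil; l = l->tail) ++n;
        return n;
    }

    // New list of the elements for which fp yields true.
    IntList *filter(bool (*fp)(int)) const {
        IntList *result = new IntList();
        IntList *end = result;
        for (const IntList *l = this; !l->nil; l = l->tail)
            if (fp(l->head)) end = end->append(l->head);
        return result;
    }

    // New list of the values returned by fp for every element.
    IntList *map(int (*fp)(int)) const {
        IntList *result = new IntList();
        IntList *end = result;
        for (const IntList *l = this; !l->nil; l = l->tail)
            end = end->append(fp(l->head));
        return result;
    }
};

bool is_even(int v) { return v % 2 == 0; }
int twice(int v) { return 2 * v; }
```

In filter and map the variable end always holds the sublist returned by the previous append, so each further append walks at most one node instead of the whole result list.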
merge<indexterm><primary>merge()</primary></indexterm>

merge( l, (*fp) (, ) );]]>

The method constructs a new list containing the terms which are returned by the function fp, which is called for every element of this list, taking its second argument from the specified list. The new list is returned.

reduce<indexterm><primary>reduce()</primary></indexterm>

reduce( p, (*fp) (, ) );]]>

The method successively applies the function fp to each element of this list and fp's last result, which initially is the term p. The final result is returned.

reverse<indexterm><primary>reverse()</primary></indexterm>

reverse( ) const;]]>

The method constructs a new list which contains the elements of this list in reverse order. The new list is returned.

Generated Files

The generated code is spread over several files. The table lists these files and a description of their contents. Every file defines a macro symbol which can be used in preprocessor instructions and in code redirections. These symbols are listed as well. From every &kpp; file a &cpp; file and a header file are generated. The name file in the table refers to such files.
Generated files

file        symbol                                 contents
csgiok.cc   KC_CSGIO                               functions for saving and restoring of terms
csgiok.h    KC_CSGIO_HEADER                        some definitions for saving and restoring of terms
k.cc        KC_TYPES                               implementation of all classes generated from phylum definitions
k.h         KC_TYPES_HEADER                        all class declarations generated from phylum definitions; included by all implicitly generated files
rk.cc       KC_REWRITE                             rewrite methods for all phyla
rk.h        KC_REWRITE_HEADER                      rewrite view class definitions
unpk.cc     KC_UNPARSE                             unparse methods for all phyla
unpk.h      KC_UNPARSE_HEADER                      unparse view class definitions
file.cc     KC_FUNCTIONS_file or CODE              function definitions from file.k-file
file.h      KC_FUNCTIONS_file_HEADER or HEADER     declarations of functions from file.k-file

Code Redirection<indexterm significance="preferred"><primary>redirection</primary></indexterm>

A .k-file can contain pieces of arbitrary &cpp; enclosed between a line starting with %{ and one starting with %}. Since such a piece is not parsed by &kpp; but copied directly into the generated code, it cannot contain special &kpp; constructs, but merely pure &cpp;. It goes to the matching .cc-file if no redirection is specified. Giving a list of file symbols after %{ copies the code to each of the specified files instead. The available redirection symbols are listed in table .

// this is a file example.k
%{
// everything between the brace lines will be copied to example.cc
%}
%{ HEADER KC_UNPARSE /* beware of //-comments here */
// everything between the brace lines will be copied to example.h and unpk.cc
%}

Running &kpp;

The &kpp; processor is invoked with the command kc++. It can be invoked on any number of .k-files and will create a number of output files as outlined above. A typical call looks like this:

kc++ abstr.k rpn.k main.k

When used together with other tools (see ) a makefile is helpful. Be aware, though, that every source file may influence every generated file (because of the code redirections).
Thus multiple destination files depend on multiple source files. That means the makefile becomes more complicated in order to handle these dependencies. That is why an example makefile is provided in appendix (see ). It is sufficient for the RPN example and may easily be adapted for many more.

Options<indexterm id="idx:options" class="startofrange"><primary>options</primary></indexterm>

&kpp; recognizes a number of command line options which affect the process of parsing and code generation, some rather drastically. Table presents, in alphabetical order, all available options and their explanation. In most environments, two forms are provided, short and GNU style long options. Suppose you do not need CSGIO input/output, but want to interface with your favourite compiler compiler, you might use:

kc++ --no-csgio --yystype abstr.k rpn.k main.k

Some vital options can be specified directly in &kpp; using the keyword %option. Such specified options take higher priority than command line options and thus override them. Table lists them in alphabetical order. They behave like their command line counterparts.
A line like this could be specified in a &kpp; file:

%option yystype smart-pointer

Command line options

option                        explanation
-c   --no-csgio               do not generate phylum read/write functions (csgiok.{h,cc})
-r   --no-rewrite             do not generate code for rewrite rules (rk.{h,cc})
-u   --no-unparse             do not generate code for unparse rules (unpk.{h,cc})
-d   --no-printdot            no fprintdot functions are generated
-t   --no-hashtables          do not generate code for hashtable operations
-n   --covariant=C            use covariant return types: y|n|p (yes, no, or generate both and decide per preprocessor macro NO_COVARIANT_RETURN)
     --stdafx[=FILE]          generate include for Microsoft precompiled header files (default stdafx.h)
-e   --dllexport=STRING       generates string between keyword class and the class name of all operators and phyla
-m   --smart-pointer          generates code for smart pointers (reference counting)
-w   --weak-pointer           generates code for weak pointers (implies smart pointers)
-s   --suffix=EXT             extension for generated source files (default .cc)
-f   --file-prefix=PREF       prefix all generated files
-o   --overwrite              always write generated files even if not changed
-b   --yystype[=FILE]         generate file (default yystype.h) containing YYSTYPE, for &yacc; and &bison;
-y   --yxxunion               generate file (yxx_union.h) for use with Yacc++
-l   --no-linedirec           omit the line directives (#line) altogether
     --comment-line           change line directives to mere comments
     --dir-line               prepends the current working directory to the file name in line directives
-p   --pipe=CMD               process all files while piping them through CMD
-M   --msg-format=PAT         specifies format of (error) messages, PAT can contain: %p (program name), %s (severity), %f (file name), %d (current working directory), %l (line number), %c (column); the actual message is appended
-q   --quiet                  quiet operation (is default)
-v   --verbose                print additional status information while processing
-h   --help                   display the help and exit
-V   --version                output version information and exit

Built-in options

no-csgio, no-hashtables, no-printdot, no-rewrite, no-unparse, smart-pointer, weak-pointer, yystype

&yacc;/&bison;

Interfacing with a compiler generator is useful when a tree should be built from some kind of input. &kpp; provides the yystype option (see ) which causes the generation of a header file needed by &yacc; to cooperate. For every token found, the desired &kpp; operator is called to create a term. If lex/flex is used too, &yacc; has to be run with the option which causes the generation of another header file needed by lex/flex (-d for &bison;). Appropriate files for the example can be found in appendix . The makefile uses implicit rules for flex and &bison;.