| \input texinfo @c -*-texinfo-*- |
| @c %** start of header |
| @setfilename treecc.info |
| @settitle Tree Compiler-Compiler |
| @setchapternewpage off |
| @c %** end of header |
| |
| @dircategory DotGNU |
| @direntry |
| * TreeCC: (treecc). Generate code for compilers to build |
| abstract syntax trees. |
| @end direntry |
| |
| @ifinfo |
| The treecc program converts descriptions of abstract syntax |
| trees into source code that can be used to support compiler |
| development. |
| |
| @noindent |
| Copyright @copyright{} 2001 Southern Storm Software, Pty Ltd |
| @*Copyright @copyright{} 2003 Free Software Foundation, Inc. |
| @end ifinfo |
| |
| @titlepage |
| @sp 10 |
| @center @titlefont{Tree Compiler-Compiler} |
| |
| @vskip 50pt |
| |
| @center{Rhys Weatherley} |
| |
| @vskip 50pt |
| @center{Copyright @copyright{} 2001, 2002 Southern Storm Software, Pty Ltd} |
| @center{Copyright @copyright{} 2003 Free Software Foundation, Inc.} |
| @end titlepage |
| |
| @c ----------------------------------------------------------------------- |
| |
| @node Top, Overview, , (dir) |
| @menu |
| * Overview:: Treecc in brief |
| * Expression Example:: A simple example of using treecc |
| * Invoking Treecc:: Invoking treecc from the command-line |
| * Syntax:: Syntax of input files |
| * Line Tracking:: Tracking line numbers in source files |
| * Output APIs:: APIs that are available in the generated output |
| * Full Expression Example:: Full code for the expression example |
| * EBNF Syntax:: Full EBNF syntax for treecc input files |
| * Index:: Index of concepts and facilities |
| @end menu |
| |
| @c ----------------------------------------------------------------------- |
| |
| @node Overview, Expression Example, Top, Top |
| @chapter Overview |
| @cindex Overview |
| |
| @section Introduction |
| |
| Traditional compiler construction tools such as lex and yacc focus on |
| the lexical analysis and parsing phases of compilation. But they |
| provide very little to support semantic analysis and code generation. |
| |
| Yacc allows grammar rules to be tagged with semantic actions and values, |
| but it doesn't provide any routines that assist in the process of tree |
| building, semantic analysis, or code generation. Because those processes |
| are language-specific, yacc leaves the details to the programmer. |
| |
| Support for semantic analysis was also a lot simpler in the languages |
| that were prevalent when lex and yacc were devised. C and Pascal |
| require declaration before use, which allows the semantic information |
| about a statement to be determined within the parser at the point of |
| use.@footnote{K&R C did allow functions that weren't declared to be called, |
| but only if they returned an "int". This allowed the compiler to |
| guess the declaration if it wasn't available, and to proceed as |
| though all symbols were declared before use.} If extensive optimization |
| is not required, then code generation can also be performed within |
| the grammar, leading to a simple one-pass compiler structure. |
| |
| Modern languages allow deferred declaration of methods, fields, and |
| types. For example, Java allows a method to refer to a field that |
| is declared further down the .java source file. A field can be |
| declared with a type whose class definition has not yet been parsed. |
| |
| Hence, most of the semantic analysis that used to be performed inline |
| within a yacc grammar must now be performed after the entire program |
| has been parsed. Tree building and walking are now more important |
| than they were in older declare-before-use languages. |
| |
| @section Tree walking: the need for something better |
| |
| Building parse tree data structures and walking them is not terribly |
| difficult, but it is extremely time-consuming and error-prone. A |
| modern programming language may have hundreds of node types, divided |
| into categories for statements, expressions, types, declarations, etc. |
| When a new programming language is being devised, new node types may |
| be added quite frequently, which makes the complexity of the code |
| hard to manage.@footnote{Implementing an existing programming |
| language has the same problems as a new language. Most programming |
| languages are too large to be implemented all at once, and so the problem |
| must be tackled in stages. These stages are very similar to those the |
| programmer goes through implementing a new language.} |
| |
| For example, consider nodes that correspond to programming language |
| types in a C-like language. There will be node types for integer |
| types, floating-point types, pointers, structures, functions, etc. |
| There will be semantic analysis routines for testing types for |
| equality, comparing types for coercions and casts, evaluating the |
| size of a type for memory layout purposes, determining if the type |
| falls into a general category such as "integer" or "pointer", etc. |
| |
| Let's say we wanted to add a new "128-bit integer" type to this |
| language. Adding a new node type is fairly straightforward. |
| But we also need to track down every place in the code where the |
| compiler walks a type or deals with integers and add an appropriate |
| case for the new type. This is very error-prone. Such code is |
| likely to be split over many files, and good coding practices only |
| help to a certain extent. |
| |
| This problem gets worse when new kinds of expressions and statements |
| are added to the language. The change not only affects semantic |
| analysis, but also optimization and code generation. Some compilers |
| use multiple passes over the tree to perform optimization, with |
| different algorithms used in each pass. Code generation may use a |
| number of different strategies, depending upon how an expression or |
| statement is used. If even one of these places is missed when the |
| new node type is added, then there is the potential for a very nasty |
| bug that may go unnoticed for months or years. |
| |
| Object-oriented languages such as C++ can help a bit in constructing |
| robust tree structures. The base class can declare abstract methods |
| for any semantic analysis, optimization, or code generation routine |
| that needs to be implemented for all members of the node category. |
| But another code maintenance problem arises. What happens when |
| we want to add a new optimization pass in the future? We must go |
| into hundreds of classes and implement the methods. |
| |
| To avoid changing hundreds of classes, texts on Design Patterns |
| suggest using a Visitor pattern. Then the new optimization pass |
| can be encapsulated in a visitor. This would work, except for |
| the following drawback of visitor patterns, as described in Gamma, |
| et al: |
| |
| @quotation |
| @emph{The Visitor pattern makes it hard to add new subclasses of |
| Element. Each new ConcreteElement gives rise to a new abstract |
| operation on Visitor and a corresponding implementation in |
| every ConcreteVisitor class.} |
| |
| @emph{... The Visitor class hierarchy can be difficult to maintain |
| when new ConcreteElement classes are added frequently. In such |
| cases, it's probably easier just to define operations on the |
| classes that make up the structure.} |
| @end quotation |
| |
| That is, if we add a new node type in the future, we have a large |
| maintenance problem on our hands. The suggested solution is to scatter |
| the implementation throughout every class, which is the situation we |
| were trying to avoid by using the Visitor pattern. |
| |
| Because compiler construction deals with a large set of rapidly |
| changing node types and operations, neither of the usual approaches |
| works very well. |
| |
| The ideal programming language for designing compilers needs to have |
| some way to detect when the programmer forgets to implement an operation |
| for a new node type, and to ensure that a new operation covers all |
| existing node types adequately. Existing OO languages do not perform |
| this kind of global error checking. What few checking procedures they |
| have change the maintenance problem into a different problem of |
| similar complexity. |
| |
| @section Aspect-oriented programming |
| |
| A new field in language design has emerged in recent years called |
| "Aspect-Oriented Programming" (AOP). A good review of the field |
| can be found in the October 2001 issue of the @emph{Communications of |
| the ACM}, and on the AspectJ Web site, @url{http://www.aspectj.org/}. |
| |
| The following excerpt from the introduction to the AOP section in the |
| CACM issue describes the essential aspects of AOP, and the difference |
| between OOP and AOP: |
| |
| @quotation |
| @emph{AOP is based on the idea that computer systems are better programmed |
| by separately specifying the various concerns (properties or areas |
| of interest) of a system and some description of their relationships, |
| and then relying on mechanisms in the underlying AOP environment to |
| weave or compose them together into a coherent program. ... |
| While the tendency in OOP's is to find commonality among classes |
| and push it up the inheritance tree, AOP attempts to realize |
| scattered concerns as first-class elements, and eject them |
| horizontally from the object structure.} |
| @end quotation |
| |
| Aspect-orientation gives us some hope of solving our compiler |
| complexity problems. We can view each operation on node types |
| (semantic analysis, optimization, code generation, etc.) as an |
| "aspect" of the compiler's construction. The AOP language weaves |
| these aspects with the node types to create the final compiler. |
| |
| @section The treecc approach |
| |
| We don't really want to implement a new programming language |
| just for compiler construction, especially since the new language's |
| implementation would have all of the problems described above and would |
| therefore also be difficult to debug and maintain. |
| |
| The approach that we take with "treecc" is similar to that used by |
| "yacc". A simple rule-based language is devised that is used to describe |
| the intended behaviour declaratively. Embedded code is used to provide |
| the specific implementation details. A translator then converts the input |
| into source code that can be compiled in the usual fashion. |
| |
| The translator is responsible for generating the tree building and |
| walking code, and for checking that all relevant operations have been |
| implemented on the node types. Functions are provided that make |
| it easier to build and walk the tree data structures from within |
| a "yacc" grammar and other parts of the compiler. |
| |
| @c ----------------------------------------------------------------------- |
| |
| @node Expression Example, Invoking Treecc, Overview, Top |
| @chapter A simple example for expressions |
| @cindex Expression example |
| |
| Consider the following yacc grammar for a simple expression language: |
| |
| @example |
| %token INT FLOAT |
| |
| %% |
| |
| expr: INT |
| | FLOAT |
| | '(' expr ')' |
| | expr '+' expr |
| | expr '-' expr |
| | expr '*' expr |
| | expr '/' expr |
| | '-' expr |
| ; |
| @end example |
| |
| (We will ignore the problems of precedence and associativity and |
| assume that the reader is familiar with how to resolve such issues |
| in yacc grammars). |
| |
| There are seven types of nodes for this grammar: @samp{intnum}, @samp{floatnum}, |
| @samp{plus}, @samp{minus}, @samp{multiply}, @samp{divide}, and @samp{negate}. |
| They are defined in treecc as follows: |
| |
| @example |
| %node expression %abstract %typedef |
| |
| %node binary expression %abstract = |
| @{ |
| expression *expr1; |
| expression *expr2; |
| @} |
| |
| %node unary expression %abstract = |
| @{ |
| expression *expr; |
| @} |
| |
| %node intnum expression = |
| @{ |
| int num; |
| @} |
| |
| %node floatnum expression = |
| @{ |
| float num; |
| @} |
| |
| %node plus binary |
| %node minus binary |
| %node multiply binary |
| %node divide binary |
| %node negate unary |
| @end example |
| |
| We have introduced three extra node types that refer |
| to any expression, binary expressions, and unary expressions. These |
| can be seen as superclasses in an OO-style framework. We have |
| declared these node types as @samp{%abstract} because the yacc grammar |
| will not be permitted to create instances of these classes directly. |
| |
| The @samp{binary}, @samp{unary}, @samp{intnum}, and @samp{floatnum} |
| node types have field definitions associated with them. These have |
| a similar syntax to C @code{struct} declarations. |
| |
| The yacc grammar is augmented as follows to build the parse tree: |
| |
| @example |
| %union @{ |
| expression *node; |
| int inum; |
| float fnum; |
| @} |
| |
| %token INT FLOAT |
| |
| %type <node> expr |
| %type <inum> INT |
| %type <fnum> FLOAT |
| |
| %% |
| |
| expr: INT @{ $$ = intnum_create($1); @} |
| | FLOAT @{ $$ = floatnum_create($1); @} |
| | '(' expr ')' @{ $$ = $2; @} |
| | expr '+' expr @{ $$ = plus_create($1, $3); @} |
| | expr '-' expr @{ $$ = minus_create($1, $3); @} |
| | expr '*' expr @{ $$ = multiply_create($1, $3); @} |
| | expr '/' expr @{ $$ = divide_create($1, $3); @} |
| | '-' expr @{ $$ = negate_create($2); @} |
| ; |
| @end example |
| |
| The treecc translator generates the @samp{*_create} functions so that |
| the rest of the compiler can build the necessary data structures |
| on demand. The parameters to the @samp{*_create} functions |
| are identical in type and order to the members of the structure for |
| that node type. |
| |
| Because @samp{expression}, @samp{binary}, and @samp{unary} are abstract, |
| there will be no @samp{*_create} functions associated with them. This will |
| help the programmer catch certain kinds of errors. |
| |
| The type that is returned from a @samp{*_create} function is the first |
| superclass of the node that has a @samp{%typedef} keyword associated with it; |
| @samp{expression *} in this case. |
| |
| @section Storing extra information |
| |
| Normally we will want to store extra information with a node beyond |
| that which is extracted by the yacc grammar. In our expression |
| example, we probably want to store type information in the nodes |
| so that we can determine if the whole expression is integer or |
| floating point during semantic analysis. We can add type information |
| to the @samp{expression} node type as follows: |
| |
| @example |
| %node expression %abstract %typedef = |
| @{ |
| %nocreate type_code type; |
| @} |
| @end example |
| |
| The @samp{%nocreate} flag indicates that the field should not be passed |
| to the @samp{*_create} functions as a parameter; that is, it holds semantic |
| information that isn't present in the grammar. When nodes are created, |
| any fields that are declared as @samp{%nocreate} will be undefined in value. |
| A default value can be specified as follows: |
| |
| @example |
| %node expression %abstract %typedef = |
| @{ |
| %nocreate type_code type = @{int_type@}; |
| @} |
| @end example |
| |
| Default values must be enclosed in @samp{@{} and @samp{@}} because they are |
| pieces of code in the underlying source language (C, C++, etc), instead |
| of tokens in the treecc syntax. Any legitimate expression in the |
| underlying source language may be used. |
| |
| We also need to arrange for @samp{type_code} to be declared. One way to |
| do this is by adding a @samp{%decls} section to the front of the treecc |
| input file: |
| |
| @example |
| %decls %@{ |
| |
| typedef enum |
| @{ |
| int_type, |
| float_type |
| |
| @} type_code; |
| |
| %@} |
| @end example |
| |
| We could have introduced the definition by placing a @samp{#include} |
| directive into the @samp{%decls} section instead, or by defining a |
| treecc enumerated type: |
| |
| @example |
| %enum type_code = |
| @{ |
| int_type, |
| float_type |
| @} |
| @end example |
| |
| Now that we have these definitions, type-inferencing can be implemented |
| as follows: |
| |
| @example |
| %operation void infer_type(expression *e) |
| |
| infer_type(binary) |
| @{ |
| infer_type(e->expr1); |
| infer_type(e->expr2); |
| |
| if(e->expr1->type == float_type || e->expr2->type == float_type) |
| @{ |
| e->type = float_type; |
| @} |
| else |
| @{ |
| e->type = int_type; |
| @} |
| @} |
| |
| infer_type(unary) |
| @{ |
| infer_type(e->expr); |
| e->type = e->expr->type; |
| @} |
| |
| infer_type(intnum) |
| @{ |
| e->type = int_type; |
| @} |
| @end example |
| |
| This example demonstrates using the abstract node types @samp{binary} and |
| @samp{unary} to define operations on all subclasses. The treecc translator |
| will generate code for a full C function called @samp{infer_type} that |
| incorporates all of the cases. |
| |
| But hang on a second! What happened to @samp{floatnum}? Where did it |
| go? It turns out that treecc will catch this. It will report |
| an error to the effect that @samp{node type `floatnum' is not handled in |
| operation `infer_type'}. The missing case can be added as follows: |
| |
| @example |
| infer_type(floatnum) |
| @{ |
| e->type = float_type; |
| @} |
| @end example |
| |
| As we can see, treecc has just caught a bug in the language |
| implementation and reported it to us as soon as we introduced it. |
| |
| Let's now extend the language with a @samp{power} operator: |
| |
| @example |
| yacc: |
| |
| expr: expr '^' expr @{ $$ = power_create($1, $3); @} |
| ; |
| |
| treecc: |
| |
| %node power binary |
| @end example |
| |
| That's all there is to it! When treecc re-translates the input |
| file, it will modify the definition of @samp{infer_type} to include the |
| extra case for @samp{power} nodes. Because @samp{power} is a subclass of |
| @samp{binary}, treecc already knows how to perform type inferencing for the |
| new node and it doesn't warn us about a missing declaration. |
| |
| What if we wanted to restrict the second argument of @samp{power} to be |
| an integer value? We can add the following case to @samp{infer_type}: |
| |
| @example |
| infer_type(power) |
| @{ |
| infer_type(e->expr1); |
| infer_type(e->expr2); |
| |
| if(e->expr2->type != int_type) |
| @{ |
| error("second argument to `^' is not an integer"); |
| @} |
| |
| e->type = e->expr1->type; |
| @} |
| @end example |
| |
| The translator now notices that there is a more specific implementation |
| of @samp{infer_type} for @samp{power}, and won't use the @samp{binary} |
| case for it. |
| |
| The most important thing to realise here is that the translator always |
| checks that there are sufficient declarations for @samp{infer_type} to cover |
| all relevant node types. If it detects a lack, it will immediately |
| raise an error to the user. This allows tree coverage problems to |
| be found a lot sooner than with the traditional approach. |
| |
| @xref{Full Expression Example}, for a complete listing of the above |
| example files. |
| |
| @c ----------------------------------------------------------------------- |
| |
| @node Invoking Treecc, Syntax, Expression Example, Top |
| @chapter Invoking treecc from the command-line |
| @cindex Invoking treecc |
| @cindex Command-line options |
| |
| The general form of treecc's command-line syntax is as follows: |
| |
| @example |
| treecc [OPTIONS] INPUT ... |
| @end example |
| |
| Treecc accepts the following command-line options: |
| |
| @table @code |
| @item -o FILE |
| @itemx --output FILE |
| Set the name of the output file to @samp{FILE}. If this option is not |
| supplied, then the name of the first input file will be used, with its |
| extension changed to @samp{.c}. If the input is standard input, |
| the default output file is @samp{yy_tree.c}. |
| |
| This option may be overridden using the @samp{%output} keyword in |
| the input files. |
| |
| @item -h FILE |
| @itemx --header FILE |
| Set the name of the header output file to @samp{FILE}. This is only |
| used for the C and C++ output languages. If this option is not supplied, |
| then the name of the output file will be used, with its extension |
| changed to @samp{.h}. If the input is standard input, the default header |
| output file is @samp{yy_tree.h}. |
| |
| This option may be overridden using the @samp{%header} keyword in the |
| input files. If this option is used with a language that does not require |
| headers, it will be ignored. |
| |
| @item -d DIR |
| @itemx --output-dir DIR |
| Set the name of the Java output directory to @samp{DIR}. This is only |
| used for the Java language. If this option is not supplied, then the |
| directory corresponding to the first input file is used. If the input |
| is standard input, the default is the current directory. |
| |
| This option may be overridden using the @samp{%outdir} keyword in the |
| input files. If this option is used with a language other than Java, |
| it will be ignored. |
| |
| @item -e EXT |
| @itemx --extension EXT |
| Change the default output file extension to @samp{EXT}, instead of |
| @samp{.c}. The value @samp{EXT} can have a leading dot, but this is |
| not required. |
| |
| @item -f |
| @itemx --force-create |
| Treecc normally attempts to optimise the creation of output files |
| so that they are only modified if a non-trivial change has |
| occurred in the input. This can reduce the number of source |
| code recompiles when treecc is used in combination with make. |
| |
| This option forces the output files to be created, even if they |
| are the same as existing files with the same name. |
| |
| The declaration @samp{%option force} can be used in the input files |
| to achieve the same effect as this option. |
| |
| @item -O OPT |
| @itemx --option OPT |
| Set a treecc option value. This is a command-line version of |
| the @samp{%option} keyword in the input files. |
| |
| @item -n |
| @itemx --no-output |
| Suppress the generation of output files. Treecc parses the |
| input files, checks for errors, and then stops. |
| |
| @item --help |
| Print a usage message for the treecc program. |
| |
| @item -v |
| @itemx --version |
| Print the version of the treecc program. |
| |
| @item -- |
| Marks the end of the command-line options, and the beginning of |
| the input filenames. You may need to use this if your filename |
| begins with @samp{-}; for example, @samp{treecc -- -input.tc}. This is |
| not needed if the input is standard input: @samp{treecc -} |
| is perfectly valid. |
| @end table |
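| |
| As a sketch of typical usage, the following invocations combine the |
| options described above. The file names @samp{expr.tc}, |
| @samp{expr_tree.c}, and @samp{expr_tree.h} are hypothetical and are |
| used only for illustration: |
| |
| @example |
| treecc -o expr_tree.c -h expr_tree.h expr.tc |
| treecc -n expr.tc |
| treecc -f -o expr_tree.c expr.tc |
| @end example |
| |
| The first command writes the source and header output to explicitly |
| named files, the second only checks the input for errors, and the |
| third rewrites the output files even if they would be unchanged. |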
| |
| @c ----------------------------------------------------------------------- |
| |
| @node Syntax, Line Tracking, Invoking Treecc, Top |
| @chapter Syntax of input files |
| @cindex Syntax |
| |
| Treecc input files consist of zero or more declarations that define |
| nodes, operations, options, etc. The following sections describe each |
| of these elements. |
| |
| @menu |
| * Nodes:: Node declarations |
| * Types:: Types used in fields and parameters |
| * Enumerations:: Enumerated type declarations |
| * Operations:: Operation declarations |
| * Options:: Options that modify treecc's behaviour |
| * Literal Code:: Literal code declarations |
| * Changing Files:: Changing input and output files |
| @end menu |
| |
| @c ----------------------------------------------------------------------- |
| |
| @node Nodes, Types, Syntax, Syntax |
| @section Node declarations |
| @cindex Nodes |
| @cindex %node keyword |
| @cindex Fields |
| |
| Node types are defined using the @samp{%node} keyword in input files. |
| The general form of the declaration is: |
| |
| @example |
| %node NAME [ PNAME ] [ FLAGS ] [ = FIELDS ] |
| @end example |
| |
| @table @samp |
| @item NAME |
| An identifier that is used to refer to the node type elsewhere |
| in the treecc definition. It is also the name of the type that will be |
| visible to the programmer in literal code blocks. |
| |
| @item PNAME |
| An identifier that refers to the parent node type that @samp{NAME} inherits |
| from. If @samp{PNAME} is not supplied, then @samp{NAME} is a top-level |
| declaration. It is legal to supply a @samp{PNAME} that has not yet |
| been defined in the input. |
| |
| @item FLAGS |
| Any combination of @samp{%abstract} and @samp{%typedef}: |
| |
| @table @samp |
| @item %abstract |
| @cindex %abstract keyword |
| The node type cannot be constructed by the programmer. In addition, |
| the programmer does not need to define operation cases for this node |
| type if all subtypes have cases associated with them. |
| |
| @item %typedef |
| @cindex %typedef keyword |
| The node type is used as the common return type for node creation |
| functions. Top-level declarations must have a @samp{%typedef} keyword. |
| @end table |
| @end table |
| |
| The @samp{FIELDS} part of a node declaration defines the fields that |
| make up the node type. Each field has the following general form: |
| |
| @example |
| [ %nocreate ] TYPE FNAME [ = VALUE ] ';' |
| @end example |
| |
| @table @samp |
| @item %nocreate |
| @cindex %nocreate keyword |
| The field is not used in the node's constructor. When the node is |
| constructed, the value of this field will be undefined unless |
| @samp{VALUE} is specified. |
| |
| @item TYPE |
| The type that is associated with the field. Types can be declared |
| using a subset of the C declaration syntax, augmented with some C++ |
| and Java features. @xref{Types}, for more information. |
| |
| @item FNAME |
| The name to associate with the field. Treecc verifies that the field |
| does not currently exist in this node type, or in any of its ancestor |
| node types. |
| |
| @item VALUE |
| The default value to assign to the field in the node's constructor. |
| This can only be used on fields that are declared with @samp{%nocreate}. |
| The value must be enclosed in braces. For example, @samp{@{NULL@}} would |
| be used to initialize a field with @samp{NULL}. |
| |
| The braces are required because the default value is expressed in |
| the underlying source language, and can use any of the usual constant |
| declaration features present in that language. |
| @end table |
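| |
| The following sketch combines these elements. The node and field |
| names (@samp{statement}, @samp{return_stmt}, @samp{reachable}, and |
| @samp{value}) are hypothetical; @samp{expression} is the node type |
| from the earlier expression example: |
| |
| @example |
| %node return_stmt statement = |
| @{ |
|     expression *value; |
| @} |
| |
| %node statement %abstract %typedef = |
| @{ |
|     %nocreate int reachable = @{1@}; |
| @} |
| @end example |
| |
| Here @samp{return_stmt} names its parent before the parent has been |
| defined, which is legal. Because @samp{reachable} is declared with |
| @samp{%nocreate}, it is not a parameter of the node's constructor and |
| is initialized to @samp{1} instead. |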
| |
| When the output language is C, treecc creates a struct-based type |
| called @samp{NAME} that contains the fields for @samp{NAME} and |
| all of its ancestor classes. The type also contains some house-keeping |
| fields that are used internally by the generated code. The following |
| is an example: |
| |
| @example |
| typedef struct binary__ binary; |
| struct binary__ @{ |
| const struct binary_vtable__ *vtable__; |
| int kind__; |
| char *filename__; |
| long linenum__; |
| type_code type; |
| expression * expr1; |
| expression * expr2; |
| @}; |
| @end example |
| |
| The programmer should avoid using any identifier that |
| ends with @samp{__}, because it may clash with house-keeping |
| identifiers that are generated by treecc. |
| |
| When the output language is C++, Java, or C#, treecc creates a class |
| called @samp{NAME}, that inherits from the class @samp{PNAME}. |
| The field definitions for @samp{NAME} are converted into public members |
| in the output. |
| |
| @c ----------------------------------------------------------------------- |
| |
| @node Types, Enumerations, Nodes, Syntax |
| @section Types used in fields and parameters |
| @cindex Types |
| |
| Types that are used in field and parameter declarations have a |
| syntax that is a subset of the features found in C, C++, and Java: |
| |
| @example |
| TypeAndName ::= Type [ IDENTIFIER ] |
| |
| Type ::= TypeName |
| | Type '*' |
| | Type '&' |
| | Type '[' ']' |
| |
| TypeName ::= IDENTIFIER @{ IDENTIFIER @} |
| @end example |
| |
| Types are usually followed by an identifier that names the field or |
| parameter. The name is required for fields and is optional for parameters. |
| For example, @samp{int} is usually equivalent to @samp{int x} in parameter |
| declarations. |
| |
| The following are some examples of using types: |
| |
| @example |
| int |
| int x |
| const char *str |
| expression *expr |
| Element[][] array |
| Item& |
| unsigned int y |
| const Element |
| @end example |
| |
| The grammar used by treecc is slightly ambiguous. The last example above |
| declares a parameter called @samp{Element}, that has type @samp{const}. |
| The programmer probably intended to declare an anonymous parameter with type |
| @samp{const Element} instead. |
| |
| This ambiguity is unavoidable given that treecc is not fully |
| aware of the underlying language's type system. When treecc |
| sees a type that ends in a sequence of identifiers, it will |
| always interpret the last identifier as the field or parameter |
| name. Thus, the programmer must write the following instead: |
| |
| @example |
| const Element e |
| @end example |
| |
| Treecc cannot declare types using the full power of C's type system. |
| The most common forms of declarations are supported, and the rest |
| can usually be obtained by defining a @samp{typedef} within a |
| literal code block. @xref{Literal Code}, for more information |
| on literal code blocks. |
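| |
| As a sketch of this workaround, a function pointer type, which the |
| grammar above cannot express directly, can be given a name in a |
| @samp{%decls} block and then used as an ordinary type name. The names |
| @samp{compare_func}, @samp{sort_call}, @samp{list}, and @samp{compare} |
| are hypothetical, and C output is assumed: |
| |
| @example |
| %decls %@{ |
| typedef int (*compare_func)(const void *, const void *); |
| %@} |
| |
| %node sort_call expression = |
| @{ |
|     expression *list; |
|     compare_func compare; |
| @} |
| @end example |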
| |
| It is the responsibility of the programmer to use type constructs |
| that are supported by the underlying programming language. Types such |
| as @samp{const char *} will give an error when the output is compiled |
| with a Java compiler, for example. |
| |
| @c ----------------------------------------------------------------------- |
| |
| @node Enumerations, Operations, Types, Syntax |
| @section Enumerated type declarations |
| @cindex Enumerations |
| @cindex enum declaration |
| @cindex %enum keyword |
| |
| Enumerated types are a special kind of node type that can be used |
| by the programmer for simple values that don't require a full abstract |
| syntax tree node. The following is an example of defining a list |
| of the primitive machine types used in a Java virtual machine: |
| |
| @example |
| %enum JavaType = |
| @{ |
| JT_BYTE, |
| JT_SHORT, |
| JT_CHAR, |
| JT_INT, |
| JT_LONG, |
| JT_FLOAT, |
| JT_DOUBLE, |
| JT_OBJECT_REF |
| @} |
| @end example |
| |
| Enumerations are useful when writing code generators and type |
| inferencing routines. The general form is: |
| |
| @example |
| %enum NAME = @{ VALUES @} |
| @end example |
| |
| @table @samp |
| @item NAME |
| An identifier to be used to name the enumerated type. The name must |
| not have been previously used as a node type, an enumerated type, or |
| an enumerated value. |
| |
| @item VALUES |
| A comma-separated list of identifiers that name the values within |
| the enumeration. Each of the names must be unique, and must not have |
| been used previously as a node type, an enumerated type, or an |
| enumerated value. |
| @end table |
| |
| Logically, each enumerated value is a special node type that inherits from |
| a parent node type corresponding to the enumerated type @samp{NAME}. |
| |
| When the output language is C or C++, treecc generates an enumerated |
| typedef for @samp{NAME} that contains the enumerated values in the |
| same order as was used in the input file. The typedef name can be |
| used elsewhere in the code as the type of the enumeration. |
| |
| When the output language is Java, treecc generates a class called |
| @samp{NAME} that contains the enumerated values as integer constants. |
| Elsewhere in the code, the type @samp{int} must be used to declare |
| variables of the enumerated type. Enumerated values are referred |
| to as @samp{NAME.VALUE}. If the enumerated type is used as a trigger |
| parameter, then @samp{NAME} must be used instead of @samp{int}: |
| treecc will convert the type when the Java code is output. |
| |
| When the output language is C#, treecc generates an enumerated value |
| type called @samp{NAME} that contains the enumerated values as |
| members. The C# type @samp{NAME} can be used elsewhere in the code |
| as the type of the enumeration. Enumerated values are referred to |
| as @samp{NAME.VALUE}. |
| |
| @c ----------------------------------------------------------------------- |
| |
| @node Operations, Options, Enumerations, Syntax |
| @section Operation declarations |
| @cindex Operations |
| @cindex operation declarations |
| @cindex operation cases |
| @cindex %operation keyword |
| @cindex trigger parameters |
| |
| Operations are declared in two parts: the declaration, and the |
| cases. The declaration part defines the prototype for the |
| operation and the cases define how to handle specific kinds of |
| nodes for the operation. |
| |
| Operations are defined over one or more trigger parameters. Each |
| trigger parameter specifies a node type or an enumerated type that |
| is selected upon to determine what course of action to take. The |
| following are some examples of operation declarations: |
| |
| @example |
| %operation void infer_type(expression *e) |
| %operation type_code common_type([type_code t1], [type_code t2]) |
| @end example |
| |
| Trigger parameters are specified by enclosing them in square |
| brackets. If none of the parameters are enclosed in square |
| brackets, then treecc assumes that the first parameter is the |
| trigger. |
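| |
| As a small illustration of this default rule, and assuming that a |
| single bracketed parameter behaves exactly like the implicit default, |
| the following are two alternative spellings of the same declaration |
| (only one of them would appear in an input file): |
| |
| @example |
| %operation void infer_type(expression *e) |
| %operation void infer_type([expression *e]) |
| @end example |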
| |
| The general form of an operation declaration is as follows: |
| |
| @example |
| %operation @{ %virtual | %inline | %split @} RTYPE [CLASS::]NAME(PARAMS) |
| @end example |
| |
| @table @samp |
| @item %virtual |
| @cindex %virtual keyword |
| Specifies that the operation is associated with a node type as |
| a virtual method. There must be only one trigger parameter, |
| and it must be the first parameter. |
| |
| Non-virtual operations are written to the output source files |
| as global functions. |
| |
| @item %inline |
| @cindex %inline keyword |
| Optimise the generation of the operation code so that all cases |
| are inline within the code for the function itself. This can |
| only be used with non-virtual operations, and may improve |
| code efficiency if there are lots of operation cases with a |
| small amount of code in each. |
| |
| @item %split |
| @cindex %split keyword |
| Split the generation of the multi-trigger operation code across |
| multiple functions, to reduce the size of each individual function. |
| It is sometimes necessary to split large @code{%inline} operations |
| to avoid compiler limits on function size. |
| |
| @item RTYPE |
| The type of the return value for the operation. This should be |
| @samp{void} if the operation does not have a return value. |
| |
| @item CLASS |
| The name of the class to place the operation's definition within. |
| This can only be used with non-virtual operations, and is |
| intended for languages such as Java and C# that cannot declare |
| methods outside of classes. The class name will be ignored if |
| the output language is C. |
| |
| If a class name is required, but the programmer did not supply it, |
| then @samp{NAME} will be used as the default. The exception to |
| this is the C# language: @samp{CLASS} must always be supplied and |
| it must be different from @samp{NAME}. This is due to a "feature" |
| in some C# compilers that forbid a method with the same name as |
| its enclosing class. |
| |
| @item NAME |
| The name of the operation. |
| |
| @item PARAMS |
| The parameters to the operation. Trigger parameters may be |
| enclosed in square brackets. Trigger parameters must be |
| either node types or enumerated types. |
| @end table |
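| |
| The following sketch shows how the flags and the optional class name |
| might be written in practice. The names @samp{TypeOps} and |
| @samp{eval_expr} are hypothetical and are used only for illustration: |
| |
| @example |
| %operation %virtual void infer_type(expression *e) |
| %operation %inline type_code common_type([type_code t1], [type_code t2]) |
| %operation int TypeOps::eval_expr(expression *e) |
| @end example |
| |
| The first form has a single trigger as its first parameter, as |
| @samp{%virtual} requires. The third form would place |
| @samp{eval_expr} within the class @samp{TypeOps} for Java or C#, |
| and the class name would be ignored for C. |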
| |
| Once an operation has been declared, the programmer can specify |
| its cases anywhere in the input files. It is not necessary that |
| the cases appear after the operation, or that they be contiguous |
| within the input files. This permits the programmer to place |
| operation cases where they are logically required for maintenance |
| reasons. |
| |
| There must be sufficient operation cases defined to cover every |
| possible combination of node types and enumerated values that |
| inherit from the specified trigger types. An operation case |
| has the following general form: |
| |
| @example |
| NAME(TRIGGERS) [, NAME(TRIGGERS2) ...] |
| @{ |
| CODE |
| @} |
| @end example |
| |
| @table @samp |
| @item NAME |
| The name of the operation for which this case applies. |
| |
| @item TRIGGERS |
| A comma-separated list of node types or enumerated values that |
| define the specific case that is handled by the following code. |
| |
| @item CODE |
| Source code in the output source language that implements the |
| operation case. |
| @end table |
| |
| Multiple trigger combinations can be associated with a single |
| block of code, by listing them all, separated by commas. For |
| example: |
| |
| @example |
| common_type(int_type, int_type) |
| @{ |
| return int_type; |
| @} |
| |
| common_type(int_type, float_type), |
| common_type(float_type, int_type), |
| common_type(float_type, float_type) |
| @{ |
| return float_type; |
| @} |
| @end example |
| |
| @c ----------------------------------------------------------------------- |
| |
| @node Options, Literal Code, Operations, Syntax |
| @section Options that modify treecc's behaviour |
| @cindex Options |
| @cindex option declaration |
| @cindex %option keyword |
| |
| "(*)" is used below to indicate an option that is enabled by default. |
| |
| @table @samp |
| @item %option track_lines |
| @cindex track_lines option |
| Enable the generation of code that can track the current filename and |
| line number when nodes are created. @xref{Line Tracking}, for more |
| information. (*) |
| |
| @item %option no_track_lines |
| @cindex no_track_lines option |
| Disable the generation of code that performs line number tracking. |
| |
| @item %option singletons |
| @cindex singletons option |
| Optimise the creation of singleton node types. These are |
| node types without any fields. Treecc can optimise the code |
| so that only one instance of a singleton node type exists in |
| the system. This can speed up the creation of nodes for |
| constants within compilers. (*) |
| |
| Singleton optimisations will have no effect if @samp{track_lines} |
| is enabled, because line tracking uses special hidden fields in |
| every node. |
| |
| @item %option no_singletons |
| @cindex no_singletons option |
| Disable the optimisation of singleton node types. |
| |
| @item %option reentrant |
| @cindex reentrant option |
| Enable the generation of reentrant code that does not rely |
| upon any global variables. Separate copies of the compiler |
| state can be used safely in separate threads. However, the |
| same copy of the compiler state cannot be used safely in two or |
| more threads. |
| |
| @item %option no_reentrant |
| @cindex no_reentrant option |
| Disable the generation of reentrant code. The interface to |
| node management functions is simpler, but cannot be used |
| in a threaded environment. (*) |
| |
| @item %option force |
| @cindex force option |
| Force output source files to be written, even if they are |
| unchanged. This option can also be set using the @samp{-f} |
| command-line option. |
| |
| @item %option no_force |
| @cindex no_force option |
| Don't force output source files to be written if they are the |
| same as before. (*) |
| |
| This option can help smooth integration of treecc with make. |
| Only those output files that have changed will be modified. |
| This reduces the number of files that the underlying source |
| language compiler must process after treecc is executed. |
| |
| @item %option virtual_factory |
| @cindex virtual_factory option |
| Use virtual methods in the node type factories, so that the |
| programmer can subclass the factory and provide new |
| implementations of node creation functions. This option is |
| ignored for C, which does not use factories. |
| |
| @item %option no_virtual_factory |
| @cindex no_virtual_factory option |
| Don't use virtual methods in the node type factories. (*) |
| |
| @item %option abstract_factory |
| @cindex abstract_factory option |
| Use abstract virtual methods in the node type factories. |
| The programmer is responsible for subclassing the factory |
| to provide node creation functionality. |
| |
| @item %option no_abstract_factory |
| @cindex no_abstract_factory option |
| Don't use abstract virtual methods in the node type factories. (*) |
| |
| @item %option kind_in_node |
| @cindex kind_in_node option |
| Put the kind field in the node, for more efficient access at runtime. (*) |
| |
| @item %option kind_in_vtable |
| @cindex kind_in_vtable option |
| Put the kind field in the vtable, and not the node. This saves some |
| memory, at the cost of slower access to the kind value at runtime. |
| This option only applies when the language is C. The kind field is |
| always placed in the node in other languages, because it isn't possible |
| to modify the vtable. |
| |
| @item %option prefix = PREFIX |
| @cindex prefix option |
| Specify the prefix to be used in output files in place of "yy". |
| |
| @item %option state_type = NAME |
| @cindex state_type option |
| Specify the name of the state type. The state type is generated |
| by treecc to perform centralised memory management and reentrancy |
| support. The default value is @samp{YYNODESTATE}. If the output language |
| uses factories, then this will also be the name of the factory |
| base class. |
| |
| @item %option namespace = NAME |
| @cindex namespace option |
| Specify the namespace to write definitions to in the output |
| source files. This option is ignored when the output language |
| is C. |
| |
| @item %option package = NAME |
| @cindex package option |
| Same as @samp{%option namespace = NAME}. Provided because @samp{package} |
| is more natural for Java programmers. |
| |
| @item %option base = NUM |
| @cindex base option |
| Specify the numeric base to use for allocating numeric values to |
| node types. By default, node type allocation begins at 1. |
| |
| @item %option lang = LANGUAGE |
| @cindex lang option |
| Specify the output language. Must be one of @code{"C"}, @code{"C++"}, |
| @code{"Java"}, or @code{"C#"}. The default is @code{"C"}. |
| |
| @item %option block_size = NUM |
| @cindex block_size option |
| Specify the size of the memory blocks to use in C and C++ node allocators. |
| |
| @item %option strip_filenames |
| @cindex strip_filenames option |
| Strip filenames down to their base name in @code{#line} directives, |
| i.e. strip off the directory component. This can be helpful in |
| combination with the @code{%include %readonly} command when |
| treecc input files may be processed from different directories, |
| causing common output files to change unexpectedly. |
| |
| @item %option no_strip_filenames |
| @cindex no_strip_filenames option |
| Don't strip filenames in @code{#line} directives. (*) |
| |
| @item %option internal_access |
| @cindex internal_access option |
| Use @code{internal} as the access mode for classes in C#, rather than |
| @code{public}. |
| |
| @item %option public_access |
| @cindex public_access option |
| Use @code{public} as the access mode for classes in C#, rather than |
| @code{internal}. (*) |
| |
| @item %option print_lines |
| @cindex print_lines option |
| Print @code{#line} markers in languages that use them. (*) |
| |
| @item %option no_print_lines |
| @cindex no_print_lines option |
| Do not print @code{#line} markers, even in languages that normally |
| use them. |
| |
| @item %option allocator |
| @cindex allocator option |
| Use treecc's standard node allocator for C and C++. This option has |
| no effect for other output languages. (*) |
| |
| @item %option no_allocator |
| @cindex no_allocator option |
| Do not use treecc's standard node allocator for C and C++. This can be |
| useful when the programmer wants to redirect node allocation to their |
| own routines. |
| |
| @item %option gc_allocator |
| @cindex gc_allocator option |
| Use libgc as a garbage-collecting node allocator for C and C++. This |
| option has no effect for other output languages. |
| |
| @item %option no_gc_allocator |
| @cindex no_gc_allocator option |
| Do not use libgc as a garbage-collecting node allocator for C and C++. (*) |
| |
| @item %option base_type |
| @cindex base_type option |
| Specify the base type for the root node of the treecc node hierarchy. |
| The default is no base type. |
| |
| @end table |
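| |
| As a sketch of how several of these options might appear together in |
| an input file (the particular combination is only an illustration, |
| and the quoting of the language name follows the values listed under |
| @samp{%option lang}): |
| |
| @example |
| %option lang = "C++" |
| %option reentrant |
| %option no_track_lines |
| %option singletons |
| @end example |
| |
| Line tracking is disabled here so that the singleton optimisation |
| can take effect. |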
| |
| @c ----------------------------------------------------------------------- |
| |
| @node Literal Code, Changing Files, Options, Syntax |
| @section Literal code declarations |
| @cindex Literal code |
| |
| Sometimes it is necessary to embed literal code within output @samp{.h} |
| and source files. Usually this is to @samp{#include} definitions |
| from other files, or to define functions that cannot be easily expressed |
| as operations. |
| |
| A literal code block is specified by enclosing it in @samp{%@{} and |
| @samp{%@}}. The block can also be prefixed with the following flags: |
| |
| @table @samp |
| @item %decls |
| @cindex %decls keyword |
| Write the literal code to the currently active declaration header file, |
| instead of the source file. |
| |
| @item %both |
| @cindex %both keyword |
| Write the literal code to both the currently active declaration header file |
| and the currently active source file. |
| |
| @item %end |
| @cindex %end keyword |
| Write the literal code to the end of the file, instead of the beginning. |
| @end table |
| |
| Another form of literal code block is one which begins with @samp{%%} and |
| extends to the end of the current input file. This form implicitly has |
| the @samp{%end} flag. |
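| |
| A brief sketch of these forms follows. The header name |
| @samp{expr_defs.h} is hypothetical, and the comments describe the |
| placement that the rules above imply: |
| |
| @example |
| %decls %@{ |
| /* written to the declaration header file */ |
| #include <stdio.h> |
| %@} |
| |
| %@{ |
| /* written to the beginning of the source file */ |
| #include "expr_defs.h" |
| %@} |
| |
| %end %@{ |
| /* written to the end of the source file */ |
| %@} |
| @end example |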
| |
| @c ----------------------------------------------------------------------- |
| |
| @node Changing Files, Line Tracking, Literal Code, Syntax |
| @section Changing input and output files |
| @cindex Changing files |
| |
| Most treecc compiler definitions will be too large to be manageable |
| in a single input file. They also will be too large to write to a |
| single output file, because that may overload the source language |
| compiler. |
| |
| Multiple input files can be specified on the command-line, or |
| they can be explicitly included by other input files with |
| the following declarations: |
| |
| @table @samp |
| @item %include [ %readonly ] FILENAME |
| @cindex include declaration |
| @cindex %include keyword |
| @cindex %readonly keyword |
| Include the contents of the specified file at the current point |
| within the current input file. @samp{FILENAME} is interpreted |
| relative to the name of the current input file. |
| |
| If the @samp{%readonly} keyword is supplied, then any output |
| files that are generated by the included file must be read-only. |
| That is, no changes are expected by performing the inclusion. |
| |
| The @samp{%readonly} keyword is useful for building compilers |
| in layers. The programmer may group a large number of useful |
| node types and operations together that are independent of the |
| particulars of a given language. The programmer then defines |
| language-specific compilers that "inherit" the common definitions. |
| |
| Read-only inclusions ensure that any extensions that are added |
| by the language-specific parts do not "leak" into the common code. |
| @end table |
| |
| Output files can be changed using the following declarations: |
| |
| @table @samp |
| @item %header FILENAME |
| @cindex header declaration |
| @cindex %header keyword |
| Change the currently active declaration header file to @samp{FILENAME}, |
| which is interpreted relative to the current input file. This option |
| has no effect for languages without header files (Java and C#). |
| |
| Any node types and operations that are defined after a @samp{%header} |
| declaration will be declared in @samp{FILENAME}. |
| |
| @item %output FILENAME |
| @cindex output declaration |
| @cindex %output keyword |
| Change the currently active source file to @samp{FILENAME}, |
| which is interpreted relative to the current input file. This option |
| has no effect for languages that require a single class per file (Java). |
| |
| Any node types and operations that are defined after a @samp{%output} |
| declaration will have their implementations placed in @samp{FILENAME}. |
| |
| @item %outdir DIRNAME |
| @cindex outdir declaration |
| @cindex %outdir keyword |
| Change the output source directory to @samp{DIRNAME}. This is only |
| used for Java, which requires that a single file be used for each class. |
| All classes are written to the specified directory. By default, |
| @samp{DIRNAME} is the current directory where treecc was invoked. |
| @end table |
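| |
| For example, a language-specific input file might be laid out along |
| the following lines. All of the file and node names are hypothetical, |
| and it is assumed that file names are written as quoted strings and |
| that @samp{expression} is declared by the included file: |
| |
| @example |
| %include %readonly "common_nodes.tc" |
| |
| %header "c_nodes.h" |
| %output "c_nodes.c" |
| |
| %node c_cast expression = |
| @{ |
|     expression *expr; |
| @} |
| @end example |
| |
| Under these assumptions, @samp{c_cast} would be declared in |
| @samp{c_nodes.h} and implemented in @samp{c_nodes.c}, while the |
| output files generated by the common definitions would be left |
| unchanged. |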
| |
| When treecc generates the output source code, it must insert several |
| common house-keeping functions and classes into the code. By default, |
| these are written to the first header and source files. This can |
| be changed with the @samp{%common} declaration: |
| |
| @table @samp |
| @item %common |
| @cindex common declaration |
| @cindex %common keyword |
| Output the common house-keeping code to the currently active |
| declaration header file and the currently active source file. |
| This is typically used as follows: |
| |
| @example |
| %header "common.h" |
| %output "common.c" |
| %common |
| @end example |
| @end table |
| |
| @c ----------------------------------------------------------------------- |
| |
| @node Line Tracking, Output APIs, Changing Files, Top |
| @chapter Tracking line numbers in source files |
| @cindex Line tracking |
| |
| When compilers emit error messages to the programmer, it is generally |
| a good idea to indicate which file and which line gave rise to the |
| error. Syntax errors can be emitted fairly easily because the parser |
| usually has access to the current line number. However, semantic |
| errors are harder to report because the parser may no longer be |
| active when the error is detected. |
| |
| Treecc can generate code that automatically keeps track of what line |
| in the source file was active when a node is created. Every node |
| has two extra private fields that specify the name of the file and the |
| line number. Semantic analysis routines can query this information |
| when reporting errors. |
| |
| Because treecc is not aware of how to obtain this information, the |
| programmer must supply some additional functions. @xref{Output APIs}, |
| for more information. |
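| |
| As a minimal sketch for the non-reentrant C case, the two functions |
| that the parser must supply could be written as follows. The |
| variables @samp{current_file} and @samp{current_line} are |
| hypothetical and stand for whatever state the lexer maintains: |
| |
| @example |
| /* Hypothetical state maintained by the lexer */ |
| extern char *current_file; |
| extern long current_line; |
| |
| char *yycurrfilename(void) |
| @{ |
|     /* The pointer is stored as-is, so it must outlive the nodes */ |
|     return current_file; |
| @} |
| |
| long yycurrlinenum(void) |
| @{ |
|     return current_line; |
| @} |
| @end example |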
| |
| @c ----------------------------------------------------------------------- |
| |
| @node Output APIs, C Language, Line Tracking, Top |
| @chapter API's available in the generated output |
| @cindex Output APIs |
| |
| The source code that is generated by treecc exports a number of |
| application programmer interfaces (API's) to the programmer. These |
| can be used elsewhere in the compiler implementation to manipulate |
| abstract syntax trees. The following sections describe the API's |
| for each of the output languages. |
| |
| @menu |
| * C Language:: C Language API's |
| * C++ Language:: C++ Language API's |
| * Java Language:: Java Language API's |
| * C# Language:: C# Language API's |
| * Ruby Language:: Ruby Language API's |
| @end menu |
| |
| @c ----------------------------------------------------------------------- |
| |
| @node C Language, C++ Language, Output APIs, Output APIs |
| @section C Language APIs |
| @cindex C APIs |
| |
| In the C output language, each node type is converted into a @samp{typedef} |
| that contains the node's fields, and the fields of its ancestor node |
| types. The following example demonstrates how treecc node declarations |
| are converted into C source code: |
| |
| @example |
| %node expression %abstract %typedef = |
| @{ |
| %nocreate type_code type; |
| @} |
| %node binary expression %abstract = |
| @{ |
| expression *expr1; |
| expression *expr2; |
| @} |
| %node plus binary |
| @end example |
| |
| becomes: |
| |
| @example |
| typedef struct expression__ expression; |
| typedef struct binary__ binary; |
| typedef struct plus__ plus; |
| |
| struct expression__ @{ |
| const struct expression_vtable__ *vtable__; |
| int kind__; |
| char *filename__; |
| long linenum__; |
| type_code type; |
| @}; |
| struct binary__ @{ |
| const struct binary_vtable__ *vtable__; |
| int kind__; |
| char *filename__; |
| long linenum__; |
| type_code type; |
| expression * expr1; |
| expression * expr2; |
| @}; |
| struct plus__ @{ |
| const struct plus_vtable__ *vtable__; |
| int kind__; |
| char *filename__; |
| long linenum__; |
| type_code type; |
| expression * expr1; |
| expression * expr2; |
| @}; |
| @end example |
| |
| Programmers should avoid using any identifiers that end in |
| @samp{__}. Such identifiers are reserved for internal use by treecc |
| and its support routines. |
| |
| For each non-abstract node type called @samp{NAME}, treecc generates a |
| function called @samp{NAME_create} that creates nodes of that type. |
| The general form of the function's prototype is as follows: |
| |
| @example |
| TYPE *NAME_create([YYNODESTATE *state,] PARAMS) |
| @end example |
| |
| @table @samp |
| @item TYPE |
| The return node type, which is the nearest ancestor that has the |
| @samp{%typedef} flag. |
| |
| @item NAME |
| The name of the node type that is being created. |
| |
| @item state |
| The system state, if reentrant code is being generated. |
| |
| @item PARAMS |
| The create parameters, consisting of every field that does not |
| have the @samp{%nocreate} flag. The parameters appear in the |
| same order as the fields in the node types, from the top-most |
| ancestor down to the node type itself. For example: |
| |
| @example |
| expression *plus_create(expression * expr1, expression * expr2); |
| @end example |
| @end table |
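| |
| As a short usage sketch for non-reentrant code, a parser action might |
| wrap the create function as shown below. The helper name |
| @samp{make_sum} is hypothetical: |
| |
| @example |
| /* Hypothetical helper called from a parser action */ |
| expression *make_sum(expression *left, expression *right) |
| @{ |
|     /* `type' is a %nocreate field, so it is not a parameter here; */ |
|     /* a reentrant build would pass the state as the first argument */ |
|     return plus_create(left, right); |
| @} |
| @end example |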
| |
| Enumerated types are converted into a C @samp{typedef} with the |
| same name and values: |
| |
| @example |
| %enum JavaType = |
| @{ |
| JT_BYTE, |
| JT_SHORT, |
| JT_CHAR, |
| JT_INT, |
| JT_LONG, |
| JT_FLOAT, |
| JT_DOUBLE, |
| JT_OBJECT_REF |
| @} |
| @end example |
| |
| becomes: |
| |
| @example |
| typedef enum |
| @{ |
| JT_BYTE, |
| JT_SHORT, |
| JT_CHAR, |
| JT_INT, |
| JT_LONG, |
| JT_FLOAT, |
| JT_DOUBLE, |
| JT_OBJECT_REF |
| |
| @} JavaType; |
| @end example |
| |
| Virtual operations are converted into C macros that invoke the |
| correct vtable entry on a node type: |
| |
| @example |
| %operation %virtual void infer_type(expression *e) |
| @end example |
| |
| becomes: |
| |
| @example |
| #define infer_type(this__) \ |
| ((*(((struct expression_vtable__ *) \ |
| ((this__)->vtable__))->infer_type_v__)) \ |
| ((expression *)(this__))) |
| @end example |
| |
| Calls to @samp{infer_type} can then be made with @samp{infer_type(node)}. |
| |
| Non-virtual operations are converted into C functions: |
| |
| @example |
| %operation void infer_type(expression *e) |
| @end example |
| |
| becomes: |
| |
| @example |
| extern void infer_type(expression *e); |
| @end example |
| |
| Because virtual and non-virtual operations use a similar call syntax, |
| it is very easy to convert a virtual operation into a non-virtual |
| operation when the output language is C. This isn't possible with |
| the other output languages. |
| |
| Other house-keeping tasks are performed by the following functions |
| and macros. Some of these must be supplied by the programmer. |
| The @samp{state} parameter is required only if a reentrant compiler is |
| being built. |
| |
| @table @code |
| @item int yykind(ANY *node) |
| @cindex yykind macro |
| Gets the numeric kind value associated with a particular node. |
| The kind value for node type @samp{NAME} is called @samp{NAME_kind}. |
| |
| @item const char *yykindname(ANY *node) |
| @cindex yykindname macro |
| Gets the name of the node kind associated with a particular node. |
| This may be helpful for debugging and logging code. |
| |
| @item int yyisa(ANY *node, type) |
| @cindex yyisa macro |
| Determines if @samp{node} is an instance of the node type @samp{type}. |
| |
| @item char *yygetfilename(ANY *node) |
| @cindex yygetfilename macro |
| Gets the filename corresponding to where @samp{node} was created |
| during parsing. This macro is only generated if @samp{%option track_lines} |
| was specified. |
| |
| @item long yygetlinenum(ANY *node) |
| @cindex yygetlinenum macro |
| Gets the line number corresponding to where @samp{node} was created |
| during parsing. This macro is only generated if @samp{%option track_lines} |
| was specified. |
| |
| @item void yysetfilename(ANY *node, char *value) |
| @cindex yysetfilename macro |
| Sets the filename associated with @samp{node} to @samp{value}. The |
| string is not copied, so @samp{value} must persist for the lifetime |
| of the node. This macro will rarely be required, unless a node |
| corresponds to a different line than the current parse line. This |
| macro is only generated if @samp{%option track_lines} was specified. |
| |
| @item void yysetlinenum(ANY *node, long value) |
| @cindex yysetlinenum macro |
| Sets the line number associated with @samp{node} to @samp{value}. |
| This macro will rarely be required, unless a node corresponds to a |
| different line than the current parse line. This macro is only |
| generated if @samp{%option track_lines} was specified. |
| |
| @item char *yycurrfilename([YYNODESTATE *state]) |
| @cindex yycurrfilename function |
| Get the name of the current input file from the parser. The pointer |
| that is returned from this function is stored as-is: the string is |
| not copied. Therefore, the value must persist for at least as long |
| as the node will persist. This function must be supplied by the programmer |
| if @samp{%option track_lines} was specified. |
| |
| @item long yycurrlinenum([YYNODESTATE *state]) |
| @cindex yycurrlinenum function |
| Get the number of the current input line from the parser. This |
| function must be supplied by the programmer if @samp{%option track_lines} |
| was specified. |
| |
| @item void yynodeinit([YYNODESTATE *state]) |
| @cindex yynodeinit function |
| Initializes the node memory manager. If the system is reentrant, then |
| the node memory manager is @samp{state}. Otherwise a global node |
| memory manager is used. |
| |
| @item void *yynodealloc([YYNODESTATE *state,] unsigned int size) |
| @cindex yynodealloc function |
| Allocates a block of @samp{size} bytes from the node memory |
| manager. This function is called automatically from |
| the node-specific @samp{*_create} functions. The programmer will |
| not normally need to call this function. |
| |
| This function will return @code{NULL} if the system is out of |
| memory, or if @samp{size} is too large to be allocated within |
| the node memory manager. If the system is out of memory, then |
| @samp{yynodealloc} will call @samp{yynodefailed} prior to |
| returning @code{NULL}. |
| |
| @item int yynodepush([YYNODESTATE *state]) |
| @cindex yynodepush function |
| Pushes the current node memory manager position. The next time |
| @code{yynodepop} is called, the node memory manager will reset to |
| the pushed position. This function returns zero if the system |
| is out of memory. |
| |
| @item void yynodepop([YYNODESTATE *state]) |
| @cindex yynodepop function |
| Pops the current node memory manager position. This function has |
| no effect if @code{yynodepush} was not called previously. |
| |
| The @code{yynodepush} and @code{yynodepop} functions can be used |
| to perform a simple kind of garbage collection on nodes. When |
| the parser enters a scope, it pushes the node memory manager |
| position. After all definitions in the scope have been dealt |
| with, the parser pops the node memory manager to reclaim all |
| of the memory used. |
| |
| @item void yynodeclear([YYNODESTATE *state]) |
| @cindex yynodeclear function |
| Clears the entire node memory manager and returns it to the |
| state it had after calling @code{yynodeinit}. This is typically |
| used upon program shutdown to free all remaining node memory. |
| |
| @item void yynodefailed([YYNODESTATE *state]) |
| @cindex yynodefailed function |
| Called when @code{yynodealloc} or @code{yynodepush} detects that |
| the system is out of memory. This function must be supplied by |
| the programmer. The programmer may choose to exit the program |
| when the system is out of memory, in which case @code{yynodealloc} |
| will never return @code{NULL}. |
| @end table |
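| |
| The following sketch shows how some of these functions and macros |
| might be used together in non-reentrant code with |
| @samp{%option track_lines} enabled. The helpers @samp{report_error} |
| and @samp{process_scope} are hypothetical, and the declarations from |
| the generated header file are assumed to be in scope: |
| |
| @example |
| #include <stdio.h> |
| #include <stdlib.h> |
| |
| /* Supplied by the programmer: exits rather than returning, so */ |
| /* that yynodealloc never returns NULL to its callers */ |
| void yynodefailed(void) |
| @{ |
|     fputs("out of memory\n", stderr); |
|     exit(1); |
| @} |
| |
| /* Hypothetical helper: report an error at the node's position */ |
| void report_error(expression *e, const char *msg) |
| @{ |
|     fprintf(stderr, "%s:%ld: %s (%s node)\n", |
|             yygetfilename(e), yygetlinenum(e), msg, yykindname(e)); |
| @} |
| |
| /* Hypothetical helper: reclaim all nodes created within a scope */ |
| void process_scope(void) |
| @{ |
|     if(!yynodepush()) |
|         return;   /* zero indicates that memory was exhausted */ |
|     /* ... create and analyse the nodes for this scope ... */ |
|     yynodepop();  /* resets to the position that was pushed */ |
| @} |
| @end example |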
| |
| @c ----------------------------------------------------------------------- |
| |
| @node C++ Language, Java Language, C Language, Output APIs |
| @section C++ Language APIs |
| @cindex C++ APIs |
| |
| In the C++ output language, each node type is converted into a @samp{class} |
| that contains the node's fields, virtual operations, and other house-keeping |
| definitions. The following example demonstrates how treecc node declarations |
| are converted into C++ source code: |
| |
| @example |
| %node expression %abstract %typedef = |
| @{ |
| %nocreate type_code type; |
| @} |
| %node binary expression %abstract = |
| @{ |
| expression *expr1; |
| expression *expr2; |
| @} |
| %node plus binary |
| @end example |
| |
| becomes: |
| |
| @example |
| class expression |
| @{ |
| protected: |
| |
| int kind__; |
| char *filename__; |
| long linenum__; |
| |
| public: |
| |
| int getKind() const @{ return kind__; @} |
| const char *getFilename() const @{ return filename__; @} |
| long getLinenum() const @{ return linenum__; @} |
| void setFilename(char *filename) @{ filename__ = filename; @} |
| void setLinenum(long linenum) @{ linenum__ = linenum; @} |
| |
| void *operator new(size_t); |
| void operator delete(void *, size_t); |
| |
| protected: |
| |
| expression(); |
| |
| public: |
| |
| type_code type; |
| |
| virtual int isA(int kind) const; |
| virtual const char *getKindName() const; |
| |
| protected: |
| |
| virtual ~expression(); |
| |
| @}; |
| |
| class binary : public expression |
| @{ |
| protected: |
| |
| binary(expression * expr1, expression * expr2); |
| |
| public: |
| |
| expression * expr1; |
| expression * expr2; |
| |
| virtual int isA(int kind) const; |
| virtual const char *getKindName() const; |
| |
| protected: |
| |
| virtual ~binary(); |
| |
| @}; |
| |
| class plus : public binary |
| @{ |
| public: |
| |
| plus(expression * expr1, expression * expr2); |
| |
| public: |
| |
| virtual int isA(int kind) const; |
| virtual const char *getKindName() const; |
| |
| protected: |
| |
| virtual ~plus(); |
| |
| @}; |
| @end example |
| |
| The following standard methods are available on every node type: |
| |
| @table @code |
| @item int getKind() |
| @cindex getKind method (C++) |
| Gets the numeric kind value associated with a particular node. |
| The kind value for node type @samp{NAME} is called @samp{NAME_kind}. |
| |
| @item virtual const char *getKindName() |
| @cindex getKindName method (C++) |
| Gets the name of the node kind associated with a particular node. |
| This may be helpful for debugging and logging code. |
| |
| @item virtual int isA(int kind) |
| @cindex isA method (C++) |
| Determines if the node is a member of the node type that corresponds |
| to the numeric kind value @samp{kind}. |
| |
| @item const char *getFilename() |
| @cindex getFilename method (C++) |
| Gets the filename corresponding to where the node was created |
| during parsing. This method is only generated if @samp{%option track_lines} |
| was specified. |
| |
| @item long getLinenum() |
| @cindex getLinenum method (C++) |
| Gets the line number corresponding to where the node was created |
| during parsing. This method is only generated if @samp{%option track_lines} |
| was specified. |
| |
| @item void setFilename(char *value) |
| @cindex setFilename method (C++) |
| Sets the filename associated with the node to @samp{value}. The |
| string is not copied, so @samp{value} must persist for the lifetime |
| of the node. This method will rarely be required, unless a node |
| corresponds to a different line than the current parse line. This |
| method is only generated if @samp{%option track_lines} was specified. |
| |
| @item void setLinenum(long value) |
| @cindex setLinenum method (C++) |
| Sets the line number associated with the node to @samp{value}. |
| This method will rarely be required, unless a node corresponds to a |
| different line than the current parse line. This method is only |
| generated if @samp{%option track_lines} was specified. |
| @end table |
| |
| If the generated code is non-reentrant, then the constructor for the |
| class can be used to construct nodes of the specified node type. The |
| constructor parameters are the same as the fields within the node type's |
| definition, except for @samp{%nocreate} fields. |
| |
| If the generated code is reentrant, then nodes cannot be constructed |
| using the C++ @samp{new} operator. The @samp{*Create} methods |
| on the @samp{YYNODESTATE} factory class must be used instead. |
| |
| The @samp{YYNODESTATE} class contains a number of house-keeping methods |
| that are used to manage nodes: |
| |
| @table @code |
| @item static YYNODESTATE *getState() |
| @cindex getState method (C++) |
| Gets the global @samp{YYNODESTATE} instance that is being used by |
| non-reentrant code. If an instance has not yet been created, |
| this method will create one. |
| |
| When using non-reentrant code, the programmer will normally subclass |
| @samp{YYNODESTATE}, override some of the methods below, and then |
| construct an instance of the subclass. This constructed instance |
| will then be returned by future calls to @samp{getState}. |
| |
| @item void *alloc(size_t size) |
| @cindex alloc method (C++) |
| Allocates a block of memory of @samp{size} bytes in size from the |
| node memory manager. This function is called automatically from |
| the node-specific constructors and @samp{*Create} methods. The programmer |
| will not normally need to call this function. |
| |
| This function will return @code{NULL} if the system is out of |
| memory, or if @samp{size} is too large to be allocated within |
| the node memory manager. If the system is out of memory, then |
| @samp{alloc} will call @samp{failed} prior to returning @code{NULL}. |
| |
| @item int push() |
| @cindex push method (C++) |
| Pushes the current node memory manager position. The next time |
| @code{pop} is called, the node memory manager will reset to |
| the pushed position. This function returns zero if the system |
| is out of memory. |
| |
| @item void pop() |
| @cindex pop method (C++) |
| Pops the current node memory manager position. This function has |
| no effect if @code{push} was not called previously. |
| |
| The @code{push} and @code{pop} methods can be used |
| to perform a simple kind of garbage collection on nodes. When |
| the parser enters a scope, it pushes the node memory manager |
| position. After all definitions in the scope have been dealt |
| with, the parser pops the node memory manager to reclaim all |
| of the memory used. |
| |
| @item void clear() |
| @cindex clear method (C++) |
| Clears the entire node memory manager and returns it to the |
| state it had after construction. |
| |
| @item virtual void failed() |
| @cindex failed method (C++) |
| Called when @code{alloc} or @code{push} detects that |
| the system is out of memory. This method is typically |
| overridden by the programmer in subclasses. The programmer may |
| choose to exit the program when the system is out of memory, in |
| which case @code{alloc} will never return @code{NULL}. |
| |
| @item virtual char *currFilename() |
| @cindex currFilename method (C++) |
| Get the name of the current input file from the parser. The pointer |
| that is returned from this function is stored as-is: the string is |
| not copied. Therefore, the value must persist for at least as long |
| as the node will persist. This method is usually overridden by |
| the programmer in subclasses if @samp{%option track_lines} was specified. |
| |
| @item virtual long currLinenum() |
| @cindex currLinenum method (C++) |
| Get the number of the current input line from the parser. This |
| method is usually overridden by the programmer in subclasses |
| if @samp{%option track_lines} was specified. |
| @end table |
| |
| The programmer will typically subclass @samp{YYNODESTATE} to provide |
| additional functionality, and then create an instance of this class |
| to act as the node memory manager and node creation factory. |
| |
| @c ----------------------------------------------------------------------- |
| |
| @node Java Language, C# Language, C++ Language, Output APIs |
| @section Java Language APIs |
| @cindex Java APIs |
| |
| In the Java output language, each node type is converted into a @samp{class} |
| that contains the node's fields, virtual operations, and other house-keeping |
| definitions. The following example demonstrates how treecc node declarations |
| are converted into Java source code: |
| |
| @example |
| %node expression %abstract %typedef = |
| @{ |
| %nocreate type_code type; |
| @} |
| %node binary expression %abstract = |
| @{ |
| expression expr1; |
| expression expr2; |
| @} |
| %node plus binary |
| @end example |
| |
| becomes: |
| |
| @example |
| public class expression |
| @{ |
| protected int kind__; |
| protected String filename__; |
| protected long linenum__; |
| |
| public int getKind() @{ return kind__; @} |
| public String getFilename() @{ return filename__; @} |
| public long getLinenum() @{ return linenum__; @} |
| public void setFilename(String filename) @{ filename__ = filename; @} |
| public void setLinenum(long linenum) @{ linenum__ = linenum; @} |
| |
| public static final int KIND = 1; |
| |
| public type_code type; |
| |
| protected expression() |
| @{ |
| this.kind__ = KIND; |
| this.filename__ = YYNODESTATE.getState().currFilename(); |
| this.linenum__ = YYNODESTATE.getState().currLinenum(); |
| @} |
| |
| public int isA(int kind) |
| @{ |
| if(kind == KIND) |
| return 1; |
| else |
| return 0; |
| @} |
| |
| public String getKindName() |
| @{ |
| return "expression"; |
| @} |
| @} |
| |
| public class binary extends expression |
| @{ |
| public static final int KIND = 2; |
| |
| public expression expr1; |
| public expression expr2; |
| |
| protected binary(expression expr1, expression expr2) |
| @{ |
| super(); |
| this.kind__ = KIND; |
| this.expr1 = expr1; |
| this.expr2 = expr2; |
| @} |
| |
| public int isA(int kind) |
| @{ |
| if(kind == KIND) |
| return 1; |
| else |
| return super.isA(kind); |
| @} |
| |
| public String getKindName() |
| @{ |
| return "binary"; |
| @} |
| @} |
| |
| public class plus extends binary |
| @{ |
| public static final int KIND = 3; |
| |
| public plus(expression expr1, expression expr2) |
| @{ |
| super(expr1, expr2); |
| this.kind__ = KIND; |
| @} |
| |
| public int isA(int kind) |
| @{ |
| if(kind == KIND) |
| return 1; |
| else |
| return super.isA(kind); |
| @} |
| |
| public String getKindName() |
| @{ |
| return "plus"; |
| @} |
| @} |
| @end example |
| |
| The following standard members are available on every node type: |
| |
| @table @code |
| @item int KIND |
| @cindex KIND field (Java) |
| The kind value for the node type corresponding to this class. |
| |
| @item int getKind() |
| @cindex getKind method (Java) |
| Gets the numeric kind value associated with a particular node. |
| The kind value for node type @samp{NAME} is called @samp{NAME.KIND}. |
| |
| @item String getKindName() |
| @cindex getKindName method (Java) |
| Gets the name of the node kind associated with a particular node. |
| This may be helpful for debugging and logging code. |
| |
| @item int isA(int kind) |
| @cindex isA method (Java) |
| Determines if the node is a member of the node type that corresponds |
| to the numeric kind value @samp{kind}. |
| |
| @item String getFilename() |
| @cindex getFilename method (Java) |
| Gets the filename corresponding to where the node was created |
| during parsing. This method is only generated if @samp{%option track_lines} |
| was specified. |
| |
| @item long getLinenum() |
| @cindex getLinenum method (Java) |
| Gets the line number corresponding to where the node was created |
| during parsing. This method is only generated if @samp{%option track_lines} |
| was specified. |
| |
| @item void setFilename(String value) |
| @cindex setFilename method (Java) |
| Sets the filename associated with the node to @samp{value}. |
| This method will rarely be required, unless a node corresponds to |
| a different line than the current parse line. This method is only |
| generated if @samp{%option track_lines} was specified. |
| |
| @item void setLinenum(long value) |
| @cindex setLinenum method (Java) |
| Sets the line number associated with the node to @samp{value}. |
| This method will rarely be required, unless a node corresponds to a |
| different line than the current parse line. This method is only |
| generated if @samp{%option track_lines} was specified. |
| @end table |
| |
| If the generated code is non-reentrant, then the constructor for the |
| class can be used to construct nodes of the specified node type. The |
| constructor parameters are the same as the fields within the node type's |
| definition, except for @samp{%nocreate} fields. |
| |
| If the generated code is reentrant, then nodes cannot be constructed |
| using the Java @samp{new} operator. The @samp{*Create} methods |
| on the @samp{YYNODESTATE} factory class must be used instead. |
| |
| Enumerated types are converted into a Java @samp{class}: |
| |
| @example |
| %enum JavaType = |
| @{ |
| JT_BYTE, |
| JT_SHORT, |
| JT_CHAR, |
| JT_INT, |
| JT_LONG, |
| JT_FLOAT, |
| JT_DOUBLE, |
| JT_OBJECT_REF |
| @} |
| @end example |
| |
| becomes: |
| |
| @example |
| public class JavaType |
| @{ |
| public static final int JT_BYTE = 0; |
| public static final int JT_SHORT = 1; |
| public static final int JT_CHAR = 2; |
| public static final int JT_INT = 3; |
| public static final int JT_LONG = 4; |
| public static final int JT_FLOAT = 5; |
| public static final int JT_DOUBLE = 6; |
| public static final int JT_OBJECT_REF = 7; |
| @} |
| @end example |
| |
| References to enumerated types in fields and operation parameters |
| are replaced with the type @samp{int}. |
| |
| Virtual operations are converted into public methods on the Java |
| node classes. |
| |
| Non-virtual operations are converted into a static method within |
| a class named for the operation. For example, |
| |
| @example |
| %operation void InferType::infer_type(expression e) |
| @end example |
| |
| becomes: |
| |
| @example |
| public class InferType |
| @{ |
| public static void infer_type(expression e) |
| @{ |
| ... |
| @} |
| @} |
| @end example |
| |
| If the class name (@samp{InferType} in the above example) is omitted, |
| then the name of the operation is used as both the class name and the |
| method name. |
| |
| The @samp{YYNODESTATE} class contains a number of house-keeping methods |
| that are used to manage nodes: |
| |
| @table @code |
| @item static YYNODESTATE getState() |
| @cindex getState method (Java) |
| Gets the global @samp{YYNODESTATE} instance that is being used by |
| non-reentrant code. If an instance has not yet been created, |
| this method will create one. |
| |
| When using non-reentrant code, the programmer will normally subclass |
| @samp{YYNODESTATE}, override some of the methods below, and then |
| construct an instance of the subclass. This constructed instance |
| will then be returned by future calls to @samp{getState}. |
| |
| This method will not be present if a reentrant system is being |
| generated. |
| |
| @item String currFilename() |
| @cindex currFilename method (Java) |
| Get the name of the current input file from the parser. This method |
| is usually overridden by the programmer in subclasses if |
| @samp{%option track_lines} was specified. |
| |
| @item long currLinenum() |
| @cindex currLinenum method (Java) |
| Get the number of the current input line from the parser. This |
| method is usually overridden by the programmer in subclasses |
| if @samp{%option track_lines} was specified. |
| @end table |
| |
| The programmer will typically subclass @samp{YYNODESTATE} to provide |
| additional functionality, and then create an instance of this class |
| to act as the global state and node creation factory. |
| |
| @c ----------------------------------------------------------------------- |
| |
| @node C# Language, Ruby Language, Java Language, Output APIs |
| @section C# Language APIs |
| @cindex C# APIs |
| |
| In the C# output language, each node type is converted into a @samp{class} |
| that contains the node's fields, virtual operations, and other house-keeping |
| definitions. The following example demonstrates how treecc node declarations |
| are converted into C# source code: |
| |
| @example |
| %node expression %abstract %typedef = |
| @{ |
| %nocreate type_code type; |
| @} |
| %node binary expression %abstract = |
| @{ |
| expression expr1; |
| expression expr2; |
| @} |
| %node plus binary |
| @end example |
| |
| becomes: |
| |
| @example |
| public class expression |
| @{ |
| protected int kind__; |
| protected String filename__; |
| protected long linenum__; |
| |
| public int getKind() @{ return kind__; @} |
| public String getFilename() @{ return filename__; @} |
| public long getLinenum() @{ return linenum__; @} |
| public void setFilename(String filename) @{ filename__ = filename; @} |
| public void setLinenum(long linenum) @{ linenum__ = linenum; @} |
| |
| public const int KIND = 1; |
| |
| public type_code type; |
| |
| protected expression() |
| @{ |
| this.kind__ = KIND; |
| this.filename__ = YYNODESTATE.getState().currFilename(); |
| this.linenum__ = YYNODESTATE.getState().currLinenum(); |
| @} |
| |
| public virtual int isA(int kind) |
| @{ |
| if(kind == KIND) |
| return 1; |
| else |
| return 0; |
| @} |
| |
| public virtual String getKindName() |
| @{ |
| return "expression"; |
| @} |
| @} |
| |
| public class binary : expression |
| @{ |
| public const int KIND = 2; |
| |
| public expression expr1; |
| public expression expr2; |
| |
| protected binary(expression expr1, expression expr2) |
| : base() |
| @{ |
| this.kind__ = KIND; |
| this.expr1 = expr1; |
| this.expr2 = expr2; |
| @} |
| |
| public override int isA(int kind) |
| @{ |
| if(kind == KIND) |
| return 1; |
| else |
| return base.isA(kind); |
| @} |
| |
| public override String getKindName() |
| @{ |
| return "binary"; |
| @} |
| @} |
| |
| public class plus : binary |
| @{ |
| public const int KIND = 3; |
| |
| public plus(expression expr1, expression expr2) |
| : base(expr1, expr2) |
| @{ |
| this.kind__ = KIND; |
| @} |
| |
| public override int isA(int kind) |
| @{ |
| if(kind == KIND) |
| return 1; |
| else |
| return base.isA(kind); |
| @} |
| |
| public override String getKindName() |
| @{ |
| return "plus"; |
| @} |
| @} |
| @end example |
| |
| The following standard members are available on every node type: |
| |
| @table @code |
| @item const int KIND |
| @cindex KIND field (C#) |
| The kind value for the node type corresponding to this class. |
| |
| @item int getKind() |
| @cindex getKind method (C#) |
| Gets the numeric kind value associated with a particular node. |
| The kind value for node type @samp{NAME} is called @samp{NAME.KIND}. |
| |
| @item virtual String getKindName() |
| @cindex getKindName method (C#) |
| Gets the name of the node kind associated with a particular node. |
| This may be helpful for debugging and logging code. |
| |
| @item virtual int isA(int kind) |
| @cindex isA method (C#) |
| Determines if the node is a member of the node type that corresponds |
| to the numeric kind value @samp{kind}. |
| |
| @item String getFilename() |
| @cindex getFilename method (C#) |
| Gets the filename corresponding to where the node was created |
| during parsing. This method is only generated if @samp{%option track_lines} |
| was specified. |
| |
| @item long getLinenum() |
| @cindex getLinenum method (C#) |
| Gets the line number corresponding to where the node was created |
| during parsing. This method is only generated if @samp{%option track_lines} |
| was specified. |
| |
| @item void setFilename(String value) |
| @cindex setFilename method (C#) |
| Sets the filename associated with the node to @samp{value}. |
| This method will rarely be required, unless a node corresponds to |
| a different line than the current parse line. This method is only |
| generated if @samp{%option track_lines} was specified. |
| |
| @item void setLinenum(long value) |
| @cindex setLinenum method (C#) |
| Sets the line number associated with the node to @samp{value}. |
| This method will rarely be required, unless a node corresponds to a |
| different line than the current parse line. This method is only |
| generated if @samp{%option track_lines} was specified. |
| @end table |
| |
| If the generated code is non-reentrant, then the constructor for the |
| class can be used to construct nodes of the specified node type. The |
| constructor parameters are the same as the fields within the node type's |
| definition, except for @samp{%nocreate} fields. |
| |
| If the generated code is reentrant, then nodes cannot be constructed |
| using the C# @samp{new} operator. The @samp{*Create} methods |
| on the @samp{YYNODESTATE} factory class must be used instead. |
| |
| Enumerated types are converted into a C# @samp{enum} definition: |
| |
| @example |
| %enum JavaType = |
| @{ |
| JT_BYTE, |
| JT_SHORT, |
| JT_CHAR, |
| JT_INT, |
| JT_LONG, |
| JT_FLOAT, |
| JT_DOUBLE, |
| JT_OBJECT_REF |
| @} |
| @end example |
| |
| becomes: |
| |
| @example |
| public enum JavaType |
| @{ |
| JT_BYTE, |
| JT_SHORT, |
| JT_CHAR, |
| JT_INT, |
| JT_LONG, |
| JT_FLOAT, |
| JT_DOUBLE, |
| JT_OBJECT_REF, |
| @} |
| @end example |
| |
| Virtual operations are converted into public virtual methods on the C# |
| node classes. |
| |
| Non-virtual operations are converted into a static method within |
| a class named for the operation. For example, |
| |
| @example |
| %operation void InferType::infer_type(expression e) |
| @end example |
| |
| becomes: |
| |
| @example |
| public class InferType |
| @{ |
| public static void infer_type(expression e) |
| @{ |
| ... |
| @} |
| @} |
| @end example |
| |
| If the class name (@samp{InferType} in the above example) is omitted, |
| then the name of the operation is used as both the class name and the |
| method name. |
| |
| The @samp{YYNODESTATE} class contains a number of house-keeping methods |
| that are used to manage nodes: |
| |
| @table @code |
| @item static YYNODESTATE getState() |
| @cindex getState method (C#) |
| Gets the global @samp{YYNODESTATE} instance that is being used by |
| non-reentrant code. If an instance has not yet been created, |
| this method will create one. |
| |
| When using non-reentrant code, the programmer will normally subclass |
| @samp{YYNODESTATE}, override some of the methods below, and then |
| construct an instance of the subclass. This constructed instance |
| will then be returned by future calls to @samp{getState}. |
| |
| This method will not be present if a reentrant system is being |
| generated. |
| |
| @item virtual String currFilename() |
| @cindex currFilename method (C#) |
| Get the name of the current input file from the parser. This method |
| is usually overridden by the programmer in subclasses if |
| @samp{%option track_lines} was specified. |
| |
| @item virtual long currLinenum() |
| @cindex currLinenum method (C#) |
| Get the number of the current input line from the parser. This |
| method is usually overridden by the programmer in subclasses |
| if @samp{%option track_lines} was specified. |
| @end table |
| |
| The programmer will typically subclass @samp{YYNODESTATE} to provide |
| additional functionality, and then create an instance of this class |
| to act as the global state and node creation factory. |
| |
| @c ----------------------------------------------------------------------- |
| |
| @node Ruby Language, Full Expression Example, C# Language, Output APIs |
| @section Ruby Language APIs |
| @cindex Ruby APIs |
| |
| In the Ruby output language, each node type is converted into a |
| @samp{class} that contains the node's fields, operations, and other |
| house-keeping definitions. The following example demonstrates how |
| treecc node declarations are converted into Ruby source code: |
| |
| @example |
| %node expression %abstract %typedef = |
| @{ |
| %nocreate type_code type; |
| @} |
| %node binary expression %abstract = |
| @{ |
| expression expr1; |
| expression expr2; |
| @} |
| %node plus binary |
| @end example |
| |
| becomes: |
| |
| @example |
| class YYNODESTATE |
| |
| @@@@state = nil |
| |
| def YYNODESTATE.state |
| return @@@@state unless @@@@state.nil? |
| @@@@state = YYNODESTATE.new() |
| return @@@@state |
| end |
| |
| def initialize |
| @@@@state = self |
| end |
| |
| def currFilename |
| return nil |
| end |
| |
| def currLinenum |
| return 0 |
| end |
| end |
| |
| class Expression |
| protected |
| attr_reader :kind |
| public |
| |
| attr_accessor :Linenum, :Filename |
| |
| attr_accessor :type |
| |
| KIND = 1 |
| |
| def initialize() |
| @@kind = KIND |
| @@Filename = YYNODESTATE.state.currFilename() |
| @@Linenum = YYNODESTATE.state.currLinenum() |
| end |
| |
| def isA(kind) |
| if(kind == KIND) then |
| return true |
| else |
| return false |
| end |
| end |
| |
| def KindName |
| return "Expression" |
| end |
| |
| end |
| |
| class Binary < Expression |
| |
| attr_accessor :expr1 |
| attr_accessor :expr2 |
| |
| KIND = 2 |
| |
| def initialize(expr1, expr2) |
| super() |
| @@kind = KIND |
| self.expr1 = expr1 |
| self.expr2 = expr2 |
| end |
| |
| def isA(kind) |
| if(kind == KIND) then |
| return true |
| else |
| return super(kind) |
| end |
| end |
| |
| def KindName |
| return "Binary" |
| end |
| |
| end |
| |
| class Plus < Binary |
| |
| KIND = 3 |
| |
| def initialize(expr1, expr2) |
| super(expr1, expr2) |
| @@kind = KIND |
| end |
| |
| def isA(kind) |
| if(kind == KIND) then |
| return true |
| else |
| return super(kind) |
| end |
| end |
| |
| def KindName |
| return "Plus" |
| end |
| |
| end |
| @end example |
| |
| The following standard members are available on every node type: |
| |
| @table @code |
| @item KIND |
| @cindex KIND field (Ruby) |
| The kind value for the node type corresponding to this class. |
| The kind value for node type @samp{NAME} is called @samp{NAME::KIND}. |
| |
| @item KindName |
| @cindex KindName field (Ruby) |
| The name of the node kind associated with a particular node. This may |
| be helpful for debugging and logging code. |
| |
| @item isA(kind) |
| @cindex isA method (Ruby) |
| Determines if the node is a member of the node type that corresponds |
| to the numeric kind value @samp{kind}. |
| |
| @item Filename |
| @cindex Filename field (Ruby) |
| The filename corresponding to where the node was created during parsing. |
| This accessor is only generated if @samp{%option track_lines} was |
| specified. |
| |
| @item Linenum |
| @cindex Linenum field (Ruby) |
| The line number corresponding to where the node was created during |
| parsing. This accessor is only generated if @samp{%option track_lines} |
| was specified. |
| |
| @end table |
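
As a minimal sketch of how these members are used, the following Ruby
fragment defines simplified, hand-written stand-ins for the generated
classes and then queries a node's kind. Line tracking is omitted and the
@samp{Leaf} class is hypothetical; the code that treecc actually emits
may differ in detail.

```ruby
# Hand-written stand-ins for generated node classes (assumption:
# simplified, with no line tracking), so the sketch runs on its own.
class Expression
  KIND = 1

  def isA(kind)
    kind == KIND
  end

  def KindName
    "Expression"
  end
end

class Binary < Expression
  KIND = 2
  attr_accessor :expr1, :expr2

  def initialize(expr1, expr2)
    super()
    self.expr1 = expr1
    self.expr2 = expr2
  end

  # True for this class's kind or for any ancestor's kind.
  def isA(kind)
    kind == KIND || super(kind)
  end

  def KindName
    "Binary"
  end
end

class Plus < Binary
  KIND = 3

  def isA(kind)
    kind == KIND || super(kind)
  end

  def KindName
    "Plus"
  end
end

# Hypothetical leaf node type, used only to build a small tree.
class Leaf < Expression
  KIND = 4

  def isA(kind)
    kind == KIND || super(kind)
  end

  def KindName
    "Leaf"
  end
end

node = Plus.new(Leaf.new, Leaf.new)
puts node.KindName            # name of the most derived node type
puts node.isA(Binary::KIND)   # a Plus node is also a Binary node
puts node.isA(Leaf::KIND)     # but it is not a Leaf node
```

Because each class refers to its own @samp{KIND} constant and then
defers to its superclass, @samp{isA} answers membership questions for
the whole inheritance chain, not just the most derived type.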
| |
| @c Don't know if this is true for ruby |
| @ignore |
| If the generated code is non-reentrant, then the constructor for the |
| class can be used to construct nodes of the specified node type. The |
| constructor parameters are the same as the fields within the node type's |
| definition, except for @samp{%nocreate} fields. |
| |
| If the generated code is reentrant, then nodes cannot be constructed |
| using the C# @samp{new} operator. The @samp{*Create} methods |
| on the @samp{YYNODESTATE} factory class must be used instead. |
| @end ignore |
| |
| Enumerated types are converted into a Ruby @samp{class} definition: |
| |
| @example |
| %enum JavaType = |
| @{ |
| JT_BYTE, |
| JT_SHORT, |
| JT_CHAR, |
| JT_INT, |
| JT_LONG, |
| JT_FLOAT, |
| JT_DOUBLE, |
| JT_OBJECT_REF |
| @} |
| @end example |
| |
| becomes: |
| |
| @example |
| |
| class JavaType |
| JT_BYTE = 0 |
| JT_SHORT = 1 |
| JT_CHAR = 2 |
| JT_INT = 3 |
| JT_LONG = 4 |
| JT_FLOAT = 5 |
| JT_DOUBLE = 6 |
| JT_OBJECT_REF = 7 |
| end |
| |
| @end example |
| |
| @c |
| Virtual operations are converted into public methods on the Ruby |
| node classes. |
| |
| Non-virtual operations are converted into a class method within |
| a class named for the operation. For example, |
| |
| @example |
| %operation void InferType::infer_type(expression e) |
| @end example |
| |
| becomes: |
| |
| @example |
| class InferType |
| def InferType.infer_type(e) |
| ... |
| end |
| end |
| @end example |
| |
| If the class name (@samp{InferType} in the above example) is omitted, |
| then the name of the operation is used as both the class name and the |
| method name. Ruby class names must start with an uppercase letter, |
| whereas Ruby method names conventionally start with a lowercase letter, |
| so a single name cannot suit both. Always specify the class name when |
| generating Ruby output. |
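
The calling convention for a non-virtual operation can be sketched as
follows. Everything in this fragment is hand-written for illustration:
the node classes, the @samp{TypeCode} constants, and the body of
@samp{infer_type} are assumptions that only approximate what treecc
would generate from the corresponding declarations.

```ruby
# Stand-in for an enumerated type converted to a class of constants.
class TypeCode
  INT_TYPE = 0
  FLOAT_TYPE = 1
end

# Hypothetical, simplified node classes.
class Expression
  attr_accessor :type
end

class IntNum < Expression
end

class FloatNum < Expression
end

class Plus < Expression
  attr_accessor :expr1, :expr2

  def initialize(expr1, expr2)
    self.expr1 = expr1
    self.expr2 = expr2
  end
end

class InferType
  # Class method: callers write InferType.infer_type(node).
  # This hand-written dispatch selects a case by node class.
  def InferType.infer_type(e)
    case e
    when IntNum
      e.type = TypeCode::INT_TYPE
    when FloatNum
      e.type = TypeCode::FLOAT_TYPE
    when Plus
      infer_type(e.expr1)
      infer_type(e.expr2)
      operand_types = [e.expr1.type, e.expr2.type]
      if operand_types.include?(TypeCode::FLOAT_TYPE)
        e.type = TypeCode::FLOAT_TYPE
      else
        e.type = TypeCode::INT_TYPE
      end
    end
  end
end

sum = Plus.new(IntNum.new, FloatNum.new)
InferType.infer_type(sum)
puts sum.type == TypeCode::FLOAT_TYPE
```

The operation is invoked through the class, never through a node, which
is why the class name given in the @samp{%operation} declaration matters
for Ruby output.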
| |
| The @samp{YYNODESTATE} class contains a number of house-keeping methods |
| that are used to manage nodes: |
| |
| @table @code |
| @item YYNODESTATE::state() |
| @cindex state field (Ruby) |
| Gets the global @samp{YYNODESTATE} instance that is being used by |
| non-reentrant code. If an instance has not yet been created, |
| this method will create one. |
| |
| @c Don't know if the following is correct |
| @ignore |
| |
| When using non-reentrant code, the programmer will normally subclass |
| @samp{YYNODESTATE}, override some of the methods below, and then |
| construct an instance of the subclass. This constructed instance |
| will then be returned by future calls to @samp{getState}. |
| |
| This method will not be present if a reentrant system is being |
| generated. |
| |
| @end ignore |
| |
| @item currFilename |
| @cindex currFilename field (Ruby) |
| The name of the current input file from the parser. This field's |
| accessor method is usually overridden by the programmer in subclasses if |
| @samp{%option track_lines} was specified. |
| |
| @item currLinenum |
| @cindex currLinenum field (Ruby) |
| The number of the current input line from the parser. This field's |
| accessor method is usually overridden by the programmer in subclasses if |
| @samp{%option track_lines} was specified. |
| @end table |
| |
| The programmer will typically subclass @samp{YYNODESTATE} to provide |
| additional functionality, and then create an instance of this class |
| to act as the global state and node creation factory. |
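
The following fragment sketches that subclassing pattern under stated
assumptions. The @samp{YYNODESTATE} and @samp{Node} classes here are
simplified stand-ins written by hand (the stand-in keeps the current
instance in a class-level accessor named @samp{current}, which is not
part of the generated API), and @samp{ParserState} is a hypothetical
subclass name.

```ruby
# Simplified stand-in for the generated state class.
class YYNODESTATE
  class << self
    attr_accessor :current   # assumption: replaces the generated class variable
  end

  # Returns the registered instance, creating a default one if needed.
  def YYNODESTATE.state
    YYNODESTATE.new if YYNODESTATE.current.nil?
    YYNODESTATE.current
  end

  # Constructing any instance registers it as the global state.
  def initialize
    YYNODESTATE.current = self
  end

  def currFilename
    nil
  end

  def currLinenum
    0
  end
end

# Hypothetical subclass that reports the parser's real position.
class ParserState < YYNODESTATE
  attr_accessor :filename, :line

  def initialize(filename)
    super()
    self.filename = filename
    self.line = 1
  end

  def currFilename
    filename
  end

  def currLinenum
    line
  end
end

# Stand-in node that records its position on creation.
class Node
  attr_accessor :Filename, :Linenum

  def initialize
    self.Filename = YYNODESTATE.state.currFilename
    self.Linenum = YYNODESTATE.state.currLinenum
  end
end

state = ParserState.new("input.txt")
state.line = 42
node = Node.new
puts node.Filename
puts node.Linenum
```

The subclass instance must be constructed before the first node is
created; otherwise @samp{YYNODESTATE.state} falls back to a default
instance that reports no filename and line zero.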
| |
| @c ----------------------------------------------------------------------- |
| |
| @node Full Expression Example, EBNF Syntax, Ruby Language, Top |
| @appendix Full expression example code |
| @cindex Full expression example |
| |
| The full treecc input file for the expression example is as follows: |
| |
| @example |
| %enum type_code = |
| @{ |
| int_type, |
| float_type |
| @} |
| |
| %node expression %abstract %typedef = |
| @{ |
| %nocreate type_code type = @{int_type@}; |
| @} |
| |
| %node binary expression %abstract = |
| @{ |
| expression *expr1; |
| expression *expr2; |
| @} |
| |
| %node unary expression %abstract = |
| @{ |
| expression *expr; |
| @} |
| |
| %node intnum expression = |
| @{ |
| int num; |
| @} |
| |
| %node floatnum expression = |
| @{ |
| float num; |
| @} |
| |
| %node plus binary |
| %node minus binary |
| %node multiply binary |
| %node divide binary |
| %node power binary |
| %node negate unary |
| |
| %operation void infer_type(expression *e) |
| |
| infer_type(binary) |
| @{ |
| infer_type(e->expr1); |
| infer_type(e->expr2); |
| |
| if(e->expr1->type == float_type || e->expr2->type == float_type) |
| @{ |
| e->type = float_type; |
| @} |
| else |
| @{ |
| e->type = int_type; |
| @} |
| @} |
| |
| infer_type(unary) |
| @{ |
| infer_type(e->expr); |
| e->type = e->expr->type; |
| @} |
| |
| infer_type(intnum) |
| @{ |
| e->type = int_type; |
| @} |
| |
| infer_type(floatnum) |
| @{ |
| e->type = float_type; |
| @} |
| |
| infer_type(power) |
| @{ |
| infer_type(e->expr1); |
| infer_type(e->expr2); |
| |
| if(e->expr2->type != int_type) |
| @{ |
| error("second argument to `^' is not an integer"); |
| @} |
| |
| e->type = e->expr1->type; |
| @} |
| @end example |
| |
| The full yacc grammar is as follows: |
| |
| @example |
| %union @{ |
| expression *node; |
| int inum; |
| float fnum; |
| @} |
| |
| %token INT FLOAT |
| |
| %type <node> expr |
| %type <inum> INT |
| %type <fnum> FLOAT |
| |
| %% |
| |
| expr: INT @{ $$ = intnum_create($1); @} |
| | FLOAT @{ $$ = floatnum_create($1); @} |
| | '(' expr ')' @{ $$ = $2; @} |
| | expr '+' expr @{ $$ = plus_create($1, $3); @} |
| | expr '-' expr @{ $$ = minus_create($1, $3); @} |
| | expr '*' expr @{ $$ = multiply_create($1, $3); @} |
| | expr '/' expr @{ $$ = divide_create($1, $3); @} |
| | expr '^' expr @{ $$ = power_create($1, $3); @} |
| | '-' expr @{ $$ = negate_create($2); @} |
| ; |
| @end example |
| |
| @c ----------------------------------------------------------------------- |
| |
| @node EBNF Syntax, Index, Full Expression Example, Top |
| @appendix EBNF syntax for treecc input files |
| @cindex EBNF syntax |
| |
| The EBNF syntax for treecc input files uses the following |
| lexical tokens: |
| |
| @example |
| IDENTIFIER ::= <A-Za-z_> @{ <A-Za-z0-9_> @} |
| |
| STRING ::= '"' <anything that does not include '"'> '"' |
| | "'" <anything that does not include "'"> "'" |
| |
| LITERAL_DEFNS ::= "%@{" <anything except "%@}"> "%@}" |
| |
| LITERAL_END ::= "%%" <any character sequence until EOF> |
| |
| LITERAL_CODE ::= '@{' <anything with matched '@{' and '@}'> '@}' |
| @end example |
| |
| In addition, anything that begins with "%" in the following syntax |
| is a lexical keyword. |
| |
| The EBNF syntax is as follows: |
| |
| @example |
| File ::= @{ Declaration @} |
| |
| Declaration ::= Node |
| | Operation |
| | OperationCase |
| | Option |
| | Enum |
| | Literal |
| | Header |
| | Output |
| | Common |
| | Include |
| |
| Node ::= %node IDENTIFIER [ IDENTIFIER ] @{ NodeFlag @} [ '=' Fields ] |
| |
| NodeFlag ::= %abstract | %typedef |
| |
| Fields ::= '@{' @{ Field @} '@}' |
| |
| Field ::= [ %nocreate ] TypeAndName [ '=' LITERAL_CODE ] ';' |
| |
| TypeAndName ::= Type [ IDENTIFIER ] |
| |
| Type ::= TypeName |
| | Type '*' |
| | Type '&' |
| | Type '[' ']' |
| |
| TypeName ::= IDENTIFIER @{ IDENTIFIER @} |
| |
| Operation ::= %operation @{ OperFlag @} Type |
| [ ClassName ] IDENTIFIER '(' [ Params ] ')' |
| [ '=' LITERAL_CODE ] [ ';' ] |
| |
| OperFlag ::= %virtual | %inline | %split |
| |
| ClassName ::= IDENTIFIER "::" |
| |
| Params ::= Param @{ ',' Param @} |
| |
| Param ::= TypeAndName | '[' TypeAndName ']' |
| |
| OperationCase ::= OperationHead @{ ',' OperationHead @} LITERAL_CODE |
| |
| OperationHead ::= IDENTIFIER '(' [ TypeList ] ')' |
| |
| TypeList ::= IDENTIFIER @{ ',' IDENTIFIER @} |
| |
| Option ::= %option IDENTIFIER [ '=' Value ] |
| |
| Value ::= IDENTIFIER | STRING |
| |
| Enum ::= %enum IDENTIFIER '=' '@{' EnumBody [ ',' ] '@}' |
| |
| EnumBody ::= IDENTIFIER @{ ',' IDENTIFIER @} |
| |
| Literal ::= @{ LiteralFlag @} (LITERAL_DEFNS | LITERAL_END) |
| |
| LiteralFlag ::= %both | %decls | %end |
| |
| Header ::= %header STRING |
| |
| Output ::= %output STRING |
| |
| Common ::= %common |
| |
| Include ::= %include [ %readonly ] STRING |
| |
| @end example |
| |
| @c ----------------------------------------------------------------------- |
| |
| @page |
| |
| @node Index, , EBNF Syntax, Top |
| @unnumbered Index |
| |
| @printindex cp |
| |
| @contents |
| @bye |