| \input texinfo |
| @setfilename internals.info |
| @node Top |
| @top Assembler Internals |
| @raisesections |
| @cindex internals |
| |
| This chapter describes the internals of the assembler. It is incomplete, but |
| it may help a bit. |
| |
| This chapter was last modified on $Date$. It is not updated regularly, and it |
| may be out of date. |
| |
| @menu |
| * GAS versions:: GAS versions |
| * Data types:: Data types |
| * GAS processing:: What GAS does when it runs |
| * Porting GAS:: Porting GAS |
| * Relaxation:: Relaxation |
| * Broken words:: Broken words |
| * Internal functions:: Internal functions |
| * Test suite:: Test suite |
| @end menu |
| |
| @node GAS versions |
| @section GAS versions |
| |
| GAS has acquired layers of code over time. The original GAS only supported the |
| a.out object file format, with three sections. Support for multiple sections |
| has been added in two different ways. |
| |
| The preferred approach is to use the version of GAS created when the symbol |
| @code{BFD_ASSEMBLER} is defined. The other versions of GAS are documented for |
| historical purposes, and to help anybody who has to debug code written for |
| them. |
| |
| The type @code{segT} is used to represent a section in code which must work |
| with all versions of GAS. |
| |
| @menu |
| * Original GAS:: Original GAS version |
| * MANY_SEGMENTS:: MANY_SEGMENTS gas version |
| * BFD_ASSEMBLER:: BFD_ASSEMBLER gas version |
| @end menu |
| |
| @node Original GAS |
| @subsection Original GAS |
| |
| The original GAS only supported the a.out object file format with three |
| sections: @samp{.text}, @samp{.data}, and @samp{.bss}. This is the version of |
| GAS that is compiled if neither @code{BFD_ASSEMBLER} nor @code{MANY_SEGMENTS} |
| is defined. This version of GAS is still used for the m68k-aout target, and |
| perhaps others. |
| |
| This version of GAS should not be used for any new development. |
| |
| There is still code that is specific to this version of GAS, notably in |
| @file{write.c}. There is no way for this code to loop through all the |
| sections; it simply looks at global variables like @code{text_frag_root} and |
| @code{data_frag_root}. |
| |
| The type @code{segT} is an enum. |
| |
| @node MANY_SEGMENTS |
| @subsection MANY_SEGMENTS gas version |
| @cindex MANY_SEGMENTS |
| |
| The @code{MANY_SEGMENTS} version of GAS is only used for COFF. It uses the BFD |
| library, but it writes out all the data itself using @code{bfd_write}. This |
| version of GAS supports up to 40 normal sections. The section names are stored |
| in the @code{seg_name} array. Other information is stored in the |
| @code{segment_info} array. |
| |
| The type @code{segT} is an enum. Code that wants to examine all the sections |
| can use a @code{segT} variable as a loop index from @code{SEG_E0} up to but not |
| including @code{SEG_UNKNOWN}. |
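The loop described above can be sketched as follows. This is only an
illustration: the enum is a shortened, hypothetical stand-in for the real
@code{segT}, and @code{seg_info_sketch} is an invented record that plays the
role of an entry in the @code{segment_info} array.

```c
#include <assert.h>

/* Hypothetical, shortened stand-in for the MANY_SEGMENTS segT enum.
   The real enum has many more members between SEG_E0 and SEG_UNKNOWN.  */
typedef enum { SEG_E0, SEG_E1, SEG_E2, SEG_UNKNOWN } segT;

/* Invented per-section record; it stands in for a segment_info entry.  */
struct seg_info_sketch
{
  unsigned long size;
};

/* Visit every normal section, from SEG_E0 up to but not including
   SEG_UNKNOWN, and add up the section sizes.  */
static unsigned long
total_section_size (const struct seg_info_sketch *info)
{
  unsigned long total = 0;
  segT seg;

  assert (info != 0);
  for (seg = SEG_E0; seg < SEG_UNKNOWN; seg++)
    total += info[seg].size;
  return total;
}
```

Using the enumerators as loop bounds means the loop does not need to change
when sections are added between @code{SEG_E0} and @code{SEG_UNKNOWN}.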
| |
| Most of the code specific to this version of GAS is in the file |
| @file{config/obj-coff.c}, in the portion of that file that is compiled when |
| @code{BFD_ASSEMBLER} is not defined. |
| |
| This version of GAS is still used for several COFF targets. |
| |
| @node BFD_ASSEMBLER |
| @subsection BFD_ASSEMBLER gas version |
| @cindex BFD_ASSEMBLER |
| |
| The preferred version of GAS is the @code{BFD_ASSEMBLER} version. In this |
| version of GAS, the output file is a normal BFD, and the BFD routines are used |
| to generate the output. |
| |
| @code{BFD_ASSEMBLER} will automatically be used for certain targets, including |
| those that use the ELF, ECOFF, and SOM object file formats, and also all Alpha, |
| MIPS, PowerPC, and SPARC targets. You can force the use of |
| @code{BFD_ASSEMBLER} for other targets with the configure option |
| @samp{--enable-bfd-assembler}; however, it has not been tested for many |
| targets, and can not be assumed to work. |
| |
| @node Data types |
| @section Data types |
| @cindex internals, data types |
| |
| This section describes some fundamental GAS data types. |
| |
| @menu |
| * Symbols:: The symbolS structure |
| * Expressions:: The expressionS structure |
| * Fixups:: The fixS structure |
| * Frags:: The fragS structure |
| @end menu |
| |
| @node Symbols |
| @subsection Symbols |
| @cindex internals, symbols |
| @cindex symbols, internal |
| @cindex symbolS structure |
| |
| The definition for @code{struct symbol}, also known as @code{symbolS}, is |
| located in @file{struc-symbol.h}. Symbol structures contain the following |
| fields: |
| |
| @table @code |
| @item sy_value |
| This is an @code{expressionS} that describes the value of the symbol. It might |
| refer to one or more other symbols; if so, its true value may not be known |
| until @code{resolve_symbol_value} is called in @code{write_object_file}. |
| |
| The expression is often simply a constant. Before @code{resolve_symbol_value} |
| is called, the value is the offset from the frag (@pxref{Frags}). Afterward, |
| the frag address has been added in. |
| |
| @item sy_resolved |
| This field is non-zero if the symbol's value has been completely resolved. It |
| is used during the final pass over the symbol table. |
| |
| @item sy_resolving |
| This field is used to detect loops while resolving the symbol's value. |
| |
| @item sy_used_in_reloc |
| This field is non-zero if the symbol is used by a relocation entry. If a local |
| symbol is used in a relocation entry, it must be possible to redirect those |
| relocations to other symbols, or this symbol cannot be removed from the final |
| symbol list. |
| |
| @item sy_next |
| @itemx sy_previous |
| These pointers to other @code{symbolS} structures describe a singly or doubly |
| linked list. (If @code{SYMBOLS_NEED_BACKPOINTERS} is not defined, the |
| @code{sy_previous} field will be omitted; @code{SYMBOLS_NEED_BACKPOINTERS} is |
| always defined if @code{BFD_ASSEMBLER}.) These fields should be accessed with |
| the @code{symbol_next} and @code{symbol_previous} macros. |
| |
| @item sy_frag |
| This points to the frag (@pxref{Frags}) that this symbol is attached to. |
| |
| @item sy_used |
| Whether the symbol is used as an operand or in an expression. Note: Not all of |
| the backends keep this information accurate; backends which use this bit are |
| responsible for setting it when a symbol is used in backend routines. |
| |
| @item sy_mri_common |
| Whether the symbol is an MRI common symbol created by the @code{COMMON} |
| pseudo-op when assembling in MRI mode. |
| |
| @item bsym |
| If @code{BFD_ASSEMBLER} is defined, this points to the BFD @code{asymbol} that |
| will be used in writing the object file. |
| |
| @item sy_name_offset |
| (Only used if @code{BFD_ASSEMBLER} is not defined.) This is the position of |
| the symbol's name in the string table of the object file. On some formats, |
| this will start at position 4, with position 0 reserved for unnamed symbols. |
| This field is not used until @code{write_object_file} is called. |
| |
| @item sy_symbol |
| (Only used if @code{BFD_ASSEMBLER} is not defined.) This is the |
| format-specific symbol structure, as it would be written into the object file. |
| |
| @item sy_number |
| (Only used if @code{BFD_ASSEMBLER} is not defined.) This is a 24-bit symbol |
| number, for use in constructing relocation table entries. |
| |
| @item sy_obj |
| This format-specific data is of type @code{OBJ_SYMFIELD_TYPE}. If no macro by |
| that name is defined in @file{obj-format.h}, this field is not defined. |
| |
| @item sy_tc |
| This processor-specific data is of type @code{TC_SYMFIELD_TYPE}. If no macro |
| by that name is defined in @file{targ-cpu.h}, this field is not defined. |
| |
| @item TARGET_SYMBOL_FIELDS |
| If this macro is defined, it defines additional fields in the symbol structure. |
| This macro is obsolete, and should be replaced when possible by uses of |
| @code{OBJ_SYMFIELD_TYPE} and @code{TC_SYMFIELD_TYPE}. |
| @end table |
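The interplay of @code{sy_resolved} and @code{sy_resolving} can be illustrated
with a small model. This is a sketch under simplifying assumptions, not the
code of @code{resolve_symbol_value}: the structure @code{sym_sketch} and the
function @code{resolve_sketch} are invented for the example, and a symbol's
value is reduced to a constant plus at most one other symbol.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical model of a symbol; not the real symbolS.  */
struct sym_sketch
{
  long offset;              /* Constant part of the value.  */
  struct sym_sketch *add;   /* Symbol whose value is added in, or NULL.  */
  unsigned resolved : 1;    /* Plays the role of sy_resolved.  */
  unsigned resolving : 1;   /* Plays the role of sy_resolving.  */
};

/* Resolve SYM in place.  Return 0 on success, or -1 if the definition
   of SYM refers back to itself.  */
static int
resolve_sketch (struct sym_sketch *sym)
{
  assert (sym != NULL);
  if (sym->resolved)
    return 0;
  if (sym->resolving)
    return -1;              /* We reached a symbol still being resolved.  */

  sym->resolving = 1;
  if (sym->add != NULL)
    {
      if (resolve_sketch (sym->add) != 0)
        {
          sym->resolving = 0;
          return -1;
        }
      sym->offset += sym->add->offset;
      sym->add = NULL;
    }
  sym->resolving = 0;
  sym->resolved = 1;
  return 0;
}
```

The @code{resolving} flag is set only while a symbol's own value is being
computed, so meeting it again during the recursion means the definitions form
a loop.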
| |
| There are a number of access routines used to extract the fields of a |
| @code{symbolS} structure. When possible, these routines should be used rather |
| than referring to the fields directly. These routines will work for any GAS |
| version. |
| |
| @table @code |
| @item S_SET_VALUE |
| @cindex S_SET_VALUE |
| Set the symbol's value. |
| |
| @item S_GET_VALUE |
| @cindex S_GET_VALUE |
| Get the symbol's value. This will cause @code{resolve_symbol_value} to be |
| called if necessary, so @code{S_GET_VALUE} should only be called when it is |
| safe to resolve symbols (i.e., after the entire input file has been read and |
| all symbols have been defined). |
| |
| @item S_SET_SEGMENT |
| @cindex S_SET_SEGMENT |
| Set the section of the symbol. |
| |
| @item S_GET_SEGMENT |
| @cindex S_GET_SEGMENT |
| Get the symbol's section. |
| |
| @item S_GET_NAME |
| @cindex S_GET_NAME |
| Get the name of the symbol. |
| |
| @item S_SET_NAME |
| @cindex S_SET_NAME |
| Set the name of the symbol. |
| |
| @item S_IS_EXTERNAL |
| @cindex S_IS_EXTERNAL |
| Return non-zero if the symbol is externally visible. |
| |
| @item S_IS_EXTERN |
| @cindex S_IS_EXTERN |
| A synonym for @code{S_IS_EXTERNAL}. Don't use it. |
| |
| @item S_IS_WEAK |
| @cindex S_IS_WEAK |
| Return non-zero if the symbol is weak. |
| |
| @item S_IS_COMMON |
| @cindex S_IS_COMMON |
| Return non-zero if this is a common symbol. Common symbols are sometimes |
| represented as undefined symbols with a value, in which case this function will |
| not be reliable. |
| |
| @item S_IS_DEFINED |
| @cindex S_IS_DEFINED |
| Return non-zero if this symbol is defined. This function is not reliable when |
| called on a common symbol. |
| |
| @item S_IS_DEBUG |
| @cindex S_IS_DEBUG |
| Return non-zero if this is a debugging symbol. |
| |
| @item S_IS_LOCAL |
| @cindex S_IS_LOCAL |
| Return non-zero if this is a local assembler symbol which should not be |
| included in the final symbol table. Note that this is not the opposite of |
| @code{S_IS_EXTERNAL}. The @samp{-L} assembler option affects the return value |
| of this function. |
| |
| @item S_SET_EXTERNAL |
| @cindex S_SET_EXTERNAL |
| Mark the symbol as externally visible. |
| |
| @item S_CLEAR_EXTERNAL |
| @cindex S_CLEAR_EXTERNAL |
| Mark the symbol as not externally visible. |
| |
| @item S_SET_WEAK |
| @cindex S_SET_WEAK |
| Mark the symbol as weak. |
| |
| @item S_GET_TYPE |
| @itemx S_GET_DESC |
| @itemx S_GET_OTHER |
| @cindex S_GET_TYPE |
| @cindex S_GET_DESC |
| @cindex S_GET_OTHER |
| Get the @code{type}, @code{desc}, and @code{other} fields of the symbol. These |
| are only defined for object file formats for which they make sense (primarily |
| a.out). |
| |
| @item S_SET_TYPE |
| @itemx S_SET_DESC |
| @itemx S_SET_OTHER |
| @cindex S_SET_TYPE |
| @cindex S_SET_DESC |
| @cindex S_SET_OTHER |
| Set the @code{type}, @code{desc}, and @code{other} fields of the symbol. These |
| are only defined for object file formats for which they make sense (primarily |
| a.out). |
| |
| @item S_GET_SIZE |
| @cindex S_GET_SIZE |
| Get the size of a symbol. This is only defined for object file formats for |
| which it makes sense (primarily ELF). |
| |
| @item S_SET_SIZE |
| @cindex S_SET_SIZE |
| Set the size of a symbol. This is only defined for object file formats for |
| which it makes sense (primarily ELF). |
| @end table |
| |
| @node Expressions |
| @subsection Expressions |
| @cindex internals, expressions |
| @cindex expressions, internal |
| @cindex expressionS structure |
| |
| Expressions are stored in an @code{expressionS} structure. The structure is |
| defined in @file{expr.h}. |
| |
| @cindex expression |
| The macro @code{expression} will create an @code{expressionS} structure based |
| on the text found at the global variable @code{input_line_pointer}. |
| |
| @cindex make_expr_symbol |
| @cindex expr_symbol_where |
| A single @code{expressionS} structure can represent a single operation. |
| Complex expressions are formed by creating @dfn{expression symbols} and |
| combining them in @code{expressionS} structures. An expression symbol is |
| created by calling @code{make_expr_symbol}. An expression symbol should |
| naturally never appear in a symbol table, and the implementation of |
| @code{S_IS_LOCAL} (@pxref{Symbols}) reflects that. The function |
| @code{expr_symbol_where} returns non-zero if a symbol is an expression symbol, |
| and also returns the file and line for the expression which caused it to be |
| created. |
| |
| The @code{expressionS} structure has two symbol fields, a number field, an |
| operator field, and a field indicating whether the number is unsigned. |
| |
| The operator field is of type @code{operatorT}, and describes how to interpret |
| the other fields; see the definition in @file{expr.h} for the possibilities. |
| |
| An @code{operatorT} value of @code{O_big} indicates either a floating point |
| number, stored in the global variable @code{generic_floating_point_number}, or |
| an integer too large to store in an @code{offsetT} type, stored in the global |
| array @code{generic_bignum}. This rather inflexible approach makes it |
| impossible to use floating point numbers or large integers in complex |
| expressions. |
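The way expression symbols let single-operation structures describe a complex
expression can be sketched as follows. All names here (@code{op_sketch},
@code{expr_sketch}, @code{sym_sketch}, @code{eval_sketch}) are hypothetical,
and only three operators are modelled; see @file{expr.h} for the real
@code{operatorT} values and their exact meaning. The sketch assumes that the
number field is added to the result of a binary operation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operator codes; the real list is operatorT in expr.h.  */
enum op_sketch { OP_CONSTANT, OP_ADD, OP_MULTIPLY };

struct sym_sketch;

/* One operation: at most two symbol operands plus a number.  */
struct expr_sketch
{
  enum op_sketch op;
  const struct sym_sketch *left;
  const struct sym_sketch *right;
  long number;
};

/* A symbol whose value is an expression, as for an expression symbol.  */
struct sym_sketch
{
  struct expr_sketch value;
};

/* Evaluate E, descending into the values of its symbol operands.  */
static long
eval_sketch (const struct expr_sketch *e)
{
  long left, right;

  assert (e != NULL);
  if (e->op == OP_CONSTANT)
    return e->number;

  left = eval_sketch (&e->left->value);
  right = eval_sketch (&e->right->value);
  if (e->op == OP_ADD)
    return left + right + e->number;
  return left * right + e->number;      /* OP_MULTIPLY */
}
```

To represent @code{(a + b) * c}, the sum is stored as the value of an
expression symbol, and that symbol then appears as the left operand of the
multiplication.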
| |
| @node Fixups |
| @subsection Fixups |
| @cindex internals, fixups |
| @cindex fixups |
| @cindex fixS structure |
| |
| A @dfn{fixup} is basically anything which can not be resolved in the first |
| pass. Sometimes a fixup can be resolved by the end of the assembly; if not, |
| the fixup becomes a relocation entry in the object file. |
| |
| @cindex fix_new |
| @cindex fix_new_exp |
| A fixup is created by a call to @code{fix_new} or @code{fix_new_exp}. Both |
| take a frag (@pxref{Frags}), a position within the frag, a size, an indication |
| of whether the fixup is PC relative, and a type. In a @code{BFD_ASSEMBLER} |
| GAS, the type is nominally a @code{bfd_reloc_code_real_type}, but several |
| targets use other type codes to represent fixups that can not be described as |
| relocations. |
| |
| The @code{fixS} structure has a number of fields, several of which are obsolete |
| or are only used by a particular target. The important fields are: |
| |
| @table @code |
| @item fx_frag |
| The frag (@pxref{Frags}) this fixup is in. |
| |
| @item fx_where |
| The location within the frag where the fixup occurs. |
| |
| @item fx_addsy |
| The symbol this fixup is against. Typically, the value of this symbol is added |
| into the object contents. This may be NULL. |
| |
| @item fx_subsy |
| The value of this symbol is subtracted from the object contents. This is |
| normally NULL. |
| |
| @item fx_offset |
| A number which is added into the fixup. |
| |
| @item fx_addnumber |
| Some CPU backends use this field to convey information between |
| @code{md_apply_fix} and @code{tc_gen_reloc}. The machine independent code does |
| not use it. |
| |
| @item fx_next |
| The next fixup in the section. |
| |
| @item fx_r_type |
| The type of the fixup. This field is only defined if @code{BFD_ASSEMBLER}, or |
| if the target defines @code{NEED_FX_R_TYPE}. |
| |
| @item fx_size |
| The size of the fixup. This is mostly used for error checking. |
| |
| @item fx_pcrel |
| Whether the fixup is PC relative. |
| |
| @item fx_done |
| Non-zero if the fixup has been applied, and no relocation entry needs to be |
| generated. |
| |
| @item fx_file |
| @itemx fx_line |
| The file and line where the fixup was created. |
| |
| @item tc_fix_data |
| This has the type @code{TC_FIX_TYPE}, and is only defined if the target defines |
| that macro. |
| @end table |
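As a rough illustration of how these fields combine, the following sketch
computes the value a fixup would store once its symbols are known. It is a
simplified model rather than the code in @code{fixup_segment}: the structure
and function names are invented, symbols are reduced to a plain value, and the
PC relative adjustment is assumed to be the address of the fixup itself, which
real targets may compute differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a resolved symbol.  */
struct sym_sketch
{
  long value;
};

/* Simplified model of the fixS fields described above.  */
struct fix_sketch
{
  long frag_address;                /* Address of the frag (via fx_frag).  */
  long where;                       /* fx_where */
  const struct sym_sketch *addsy;   /* fx_addsy, may be NULL */
  const struct sym_sketch *subsy;   /* fx_subsy, normally NULL */
  long offset;                      /* fx_offset */
  int pcrel;                        /* fx_pcrel */
};

/* Compute the value to store at the fixup location.  */
static long
fix_value_sketch (const struct fix_sketch *fix)
{
  long value;

  assert (fix != NULL);
  value = fix->offset;
  if (fix->addsy != NULL)
    value += fix->addsy->value;
  if (fix->subsy != NULL)
    value -= fix->subsy->value;
  if (fix->pcrel)
    value -= fix->frag_address + fix->where;   /* Assumed PC base.  */
  return value;
}
```

When a symbol's value is not known at the end of the assembly, no such
computation is possible, and the fixup becomes a relocation entry instead.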
| |
| @node Frags |
| @subsection Frags |
| @cindex internals, frags |
| @cindex frags |
| @cindex fragS structure |
| |
| The @code{fragS} structure is defined in @file{as.h}. Each frag represents a |
| portion of the final object file. As GAS reads the source file, it creates |
| frags to hold the data that it reads. At the end of the assembly the frags and |
| fixups are processed to produce the final contents. |
| |
| @table @code |
| @item fr_address |
| The address of the frag. This is not set until the assembler rescans the list |
| of all frags after the entire input file is parsed. The function |
| @code{relax_segment} fills in this field. |
| |
| @item fr_next |
| Pointer to the next frag in this (sub)section. |
| |
| @item fr_fix |
| Fixed number of characters we know we're going to emit to the output file. May |
| be zero. |
| |
| @item fr_var |
| Variable number of characters we may output, after the initial @code{fr_fix} |
| characters. May be zero. |
| |
| @item fr_offset |
| The interpretation of this field is controlled by @code{fr_type}. Generally, |
| if @code{fr_var} is non-zero, this is a repeat count: the @code{fr_var} |
| characters are output @code{fr_offset} times. |
| |
| @item line |
| Holds line number info when an assembler listing was requested. |
| |
| @item fr_type |
| Relaxation state. This field indicates the interpretation of @code{fr_offset}, |
| @code{fr_symbol} and the variable-length tail of the frag, as well as the |
| treatment it gets in various phases of processing. It does not affect the |
| initial @code{fr_fix} characters; they are always supposed to be output |
| verbatim (fixups aside). See below for specific values this field can have. |
| |
| @item fr_subtype |
| Relaxation substate. If the macro @code{md_relax_frag} isn't defined, this is |
| assumed to be an index into @code{TC_GENERIC_RELAX_TABLE} for the generic |
| relaxation code to process (@pxref{Relaxation}). If @code{md_relax_frag} is |
| defined, this field is available for any use by the CPU-specific code. |
| |
| @item fr_symbol |
| This normally indicates the symbol to use when relaxing the frag according to |
| @code{fr_type}. |
| |
| @item fr_opcode |
| Points to the lowest-addressed byte of the opcode, for use in relaxation. |
| |
| @item tc_frag_data |
| Target specific fragment data of type @code{TC_FRAG_TYPE}. |
| Only present if @code{TC_FRAG_TYPE} is defined. |
| |
| @item fr_file |
| @itemx fr_line |
| The file and line where this frag was last modified. |
| |
| @item fr_literal |
| Declared as a one-character array, this last field grows arbitrarily large to |
| hold the actual contents of the frag. |
| @end table |
| |
| These are the possible relaxation states, provided in the enumeration type |
| @code{relax_stateT}, and the interpretations they represent for the other |
| fields: |
| |
| @table @code |
| @item rs_align |
| @itemx rs_align_code |
| The start of the following frag should be aligned on some boundary. In this |
| frag, @code{fr_offset} is the logarithm (base 2) of the alignment in bytes. |
| (For example, if alignment on an 8-byte boundary were desired, @code{fr_offset} |
| would have a value of 3.) The variable characters indicate the fill pattern to |
| be used. The @code{fr_subtype} field holds the maximum number of bytes to skip |
| when doing this alignment. If more bytes are needed, the alignment is not |
| done. An @code{fr_subtype} value of 0 means no maximum, which is the normal |
| case. Target backends can use @code{rs_align_code} to handle certain types of |
| alignment differently. |
| |
| @item rs_broken_word |
| This indicates that ``broken word'' processing should be done (@pxref{Broken |
| words}). If broken word processing is not necessary on the target machine, |
| this enumerator value will not be defined. |
| |
| @item rs_cfa |
| This state is used to implement exception frame optimizations. The |
| @code{fr_symbol} is an expression symbol for the subtraction which may be |
| relaxed. The @code{fr_opcode} field holds the frag for the preceding command |
| byte. The @code{fr_offset} field holds the offset within that frag. The |
| @code{fr_subtype} field is used during relaxation to hold the current size of |
| the frag. |
| |
| @item rs_fill |
| The variable characters are to be repeated @code{fr_offset} times. If |
| @code{fr_offset} is 0, this frag has a length of @code{fr_fix}. Most frags |
| have this type. |
| |
| @item rs_leb128 |
| This state is used to implement the DWARF ``little endian base 128'' |
| variable length number format. The @code{fr_symbol} is always an expression |
| symbol, as constant expressions are emitted directly. The @code{fr_offset} |
| field is used during relaxation to hold the previous size of the number so |
| that we can determine if the fragment changed size. |
| |
| @item rs_machine_dependent |
| Displacement relaxation is to be done on this frag. The target is indicated by |
| @code{fr_symbol} and @code{fr_offset}, and @code{fr_subtype} indicates the |
| particular machine-specific addressing mode desired. @xref{Relaxation}. |
| |
| @item rs_org |
| The start of the following frag should be pushed back to some specific offset |
| within the section. (Some assemblers use the value as an absolute address; GAS |
| does not handle final absolute addresses, but rather requires that the linker |
| set them.) The offset is given by @code{fr_symbol} and @code{fr_offset}; one |
| character from the variable-length tail is used as the fill character. |
| @end table |
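The size arithmetic implied by @code{rs_fill} and @code{rs_align} can be
sketched as follows. This is an illustration only: @code{frag_sketch} and
@code{frag_size_sketch} are invented names, only two states are modelled, and
the real work is done by @code{relax_segment}.

```c
#include <assert.h>
#include <stddef.h>

/* Two of the relaxation states, under hypothetical names.  */
enum relax_sketch { RS_FILL_SKETCH, RS_ALIGN_SKETCH };

/* Simplified model of the fragS fields used below.  */
struct frag_sketch
{
  unsigned long address;    /* fr_address */
  unsigned long fix;        /* fr_fix */
  unsigned long var;        /* fr_var */
  unsigned long offset;     /* fr_offset */
  unsigned long subtype;    /* fr_subtype */
  enum relax_sketch type;   /* fr_type */
};

/* Return the number of bytes the frag occupies in the output.  */
static unsigned long
frag_size_sketch (const struct frag_sketch *f)
{
  unsigned long end, mask, pad;

  assert (f != NULL);
  if (f->type == RS_FILL_SKETCH)
    /* The variable part is repeated fr_offset times.  */
    return f->fix + f->offset * f->var;

  /* RS_ALIGN_SKETCH: fr_offset is log2 of the alignment in bytes.  */
  end = f->address + f->fix;
  mask = (1UL << f->offset) - 1;
  pad = ((end + mask) & ~mask) - end;
  /* A non-zero fr_subtype is the maximum number of bytes to skip.  */
  if (f->subtype != 0 && pad > f->subtype)
    pad = 0;
  return f->fix + pad;
}
```

For example, a frag at address 16 with three fixed bytes and an @code{fr_offset}
of 3 ends at address 19, so five bytes of padding bring the following frag to
the 8-byte boundary at address 24.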
| |
| @cindex frchainS structure |
| A chain of frags is built up for each subsection. The data structure |
| describing a chain is called a @code{frchainS}, and contains the following |
| fields: |
| |
| @table @code |
| @item frch_root |
| Points to the first frag in the chain. May be NULL if there are no frags in |
| this chain. |
| @item frch_last |
| Points to the last frag in the chain, or NULL if there are none. |
| @item frch_next |
| Next in the list of @code{frchainS} structures. |
| @item frch_seg |
| Indicates the section this frag chain belongs to. |
| @item frch_subseg |
| Subsection (subsegment) number of this frag chain. |
| @item fix_root |
| @itemx fix_tail |
| (Defined only if @code{BFD_ASSEMBLER} is defined.) These point to the first and |
| last @code{fixS} structures associated with this subsection. |
| @item frch_obstack |
| Not currently used. Intended to be used for frag allocation for this |
| subsection. This should reduce frag generation caused by switching sections. |
| @item frch_frag_now |
| The current frag for this subsegment. |
| @end table |
| |
| A @code{frchainS} corresponds to a subsection; each section has a list of |
| @code{frchainS} records associated with it. In most cases, only one subsection |
| of each section is used, so the list will only be one element long, but any |
| processing of frag chains should be prepared to deal with multiple chains per |
| section. |
| |
| After the input files have been completely processed, and no more frags are to |
| be generated, the frag chains are joined into one per section for further |
| processing. After this point, it is safe to operate on one chain per section. |
| |
| The assembler always has a current frag, named @code{frag_now}. More space is |
| allocated for the current frag using the @code{frag_more} function; this |
| returns a pointer to the start of the requested space. Relaxing is done using |
| variant frags allocated by @code{frag_var} or @code{frag_variant} |
| (@pxref{Relaxation}). |
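The idea of a current frag that grows through @code{frag_more} can be modelled
in a few lines. This sketch is not how GAS allocates frags: it assumes a fixed
capacity per frag and uses @code{calloc}, and the names @code{frag_sketch},
@code{chain_sketch}, and @code{frag_more_sketch} are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define FRAG_CAPACITY_SKETCH 16

/* Simplified frag: a link, a count of bytes used, and the contents.  */
struct frag_sketch
{
  struct frag_sketch *next;             /* Like fr_next.  */
  size_t fix;                           /* Like fr_fix.  */
  char literal[FRAG_CAPACITY_SKETCH];   /* Like fr_literal.  */
};

/* Simplified chain: the last frag doubles as the current frag.  */
struct chain_sketch
{
  struct frag_sketch *root;
  struct frag_sketch *last;
};

/* Reserve NCHARS bytes in the current frag, starting a new frag when
   the current one is full, and return a pointer to the reserved bytes.  */
static char *
frag_more_sketch (struct chain_sketch *chain, size_t nchars)
{
  struct frag_sketch *f = chain->last;
  char *where;

  assert (nchars <= FRAG_CAPACITY_SKETCH);
  if (f == NULL || f->fix + nchars > FRAG_CAPACITY_SKETCH)
    {
      f = calloc (1, sizeof *f);
      if (f == NULL)
        abort ();
      if (chain->last != NULL)
        chain->last->next = f;
      else
        chain->root = f;
      chain->last = f;
    }
  where = f->literal + f->fix;
  f->fix += nchars;
  return where;
}

/* Release every frag in the chain.  */
static void
chain_free_sketch (struct chain_sketch *chain)
{
  struct frag_sketch *f = chain->root;

  while (f != NULL)
    {
      struct frag_sketch *next = f->next;
      free (f);
      f = next;
    }
  chain->root = chain->last = NULL;
}
```

A caller writes its bytes through the returned pointer straight away; the
sketch makes no promise about the pointer once a later call has started a new
frag.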
| |
| @node GAS processing |
| @section What GAS does when it runs |
| @cindex internals, overview |
| |
| This is a quick look at what an assembler run looks like. |
| |
| @itemize @bullet |
| @item |
| The assembler initializes itself by calling various init routines. |
| |
| @item |
| For each source file, the @code{read_a_source_file} function reads in the file |
| and parses it. The global variable @code{input_line_pointer} points to the |
| current text; it is guaranteed to be correct up to the end of the line, but not |
| farther. |
| |
| @item |
| For each line, the assembler passes labels to the @code{colon} function, and |
| isolates the first word. If it looks like a pseudo-op, the word is looked up |
| in the pseudo-op hash table @code{po_hash} and dispatched to a pseudo-op |
| routine. Otherwise, the target dependent @code{md_assemble} routine is called |
| to parse the instruction. |
| |
| @item |
| When pseudo-ops or instructions output data, they add it to a frag, calling |
| @code{frag_more} to get space to store it in. |
| |
| @item |
| Pseudo-ops and instructions can also output fixups created by @code{fix_new} or |
| @code{fix_new_exp}. |
| |
| @item |
| For certain targets, instructions can create variant frags which are used to |
| store relaxation information (@pxref{Relaxation}). |
| |
| @item |
| When the input file is finished, the @code{write_object_file} routine is |
| called. It assigns addresses to all the frags (@code{relax_segment}), resolves |
| all the fixups (@code{fixup_segment}), resolves all the symbol values (using |
| @code{resolve_symbol_value}), and finally writes out the file (in the |
| @code{BFD_ASSEMBLER} case, this is done by simply calling @code{bfd_close}). |
| @end itemize |
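The per-line dispatch step can be sketched as follows. This is a toy model:
it assumes that a pseudo-op is any word starting with @samp{.}, it replaces the
@code{po_hash} hash table with a linear search over a small invented table, and
@code{md_assemble_sketch} merely counts the lines it is given.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Invented state, so that the effect of each routine can be observed.  */
static int bytes_emitted_sketch;
static int instructions_seen_sketch;

/* Stand-in for a pseudo-op routine such as a handler for .byte.  */
static void
emit_sketch (int nbytes)
{
  bytes_emitted_sketch += nbytes;
}

/* Stand-in for the target dependent md_assemble routine.  */
static void
md_assemble_sketch (const char *line)
{
  (void) line;
  instructions_seen_sketch++;
}

/* Hypothetical pseudo-op table entry: name, routine, and argument.  */
struct pseudo_sketch
{
  const char *name;
  void (*handler) (int);
  int arg;
};

static const struct pseudo_sketch pseudo_table_sketch[] =
{
  { "byte", emit_sketch, 1 },
  { "word", emit_sketch, 2 },
};

/* Dispatch one line.  Return 0 if it was handled, or -1 for an
   unknown pseudo-op.  */
static int
dispatch_line_sketch (const char *line)
{
  size_t i, len;

  assert (line != NULL);
  if (line[0] != '.')
    {
      md_assemble_sketch (line);
      return 0;
    }

  len = strcspn (line + 1, " \t");
  for (i = 0; i < sizeof pseudo_table_sketch / sizeof pseudo_table_sketch[0]; i++)
    if (strlen (pseudo_table_sketch[i].name) == len
        && strncmp (pseudo_table_sketch[i].name, line + 1, len) == 0)
      {
        pseudo_table_sketch[i].handler (pseudo_table_sketch[i].arg);
        return 0;
      }
  return -1;
}
```

Label handling, listings, and error reporting are left out; the point is only
the split between the pseudo-op table and the target dependent routine.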
| |
| @node Porting GAS |
| @section Porting GAS |
| @cindex porting |
| |
| Each GAS target specifies two main things: the CPU file and the object format |
| file. Two main switches in the @file{configure.in} file handle this. The |
| first switches on CPU type to set the shell variable @code{cpu_type}. The |
| second switches on the entire target to set the shell variable @code{fmt}. |
| |
| The configure script uses the value of @code{cpu_type} to select two files in |
| the @file{config} directory: @file{tc-@var{CPU}.c} and @file{tc-@var{CPU}.h}. |
| The configuration process will create a file named @file{targ-cpu.h} in the |
| build directory which includes @file{tc-@var{CPU}.h}. |
| |
| The configure script also uses the value of @code{fmt} to select two files: |
| @file{obj-@var{fmt}.c} and @file{obj-@var{fmt}.h}. The configuration process |
| will create a file named @file{obj-format.h} in the build directory which |
| includes @file{obj-@var{fmt}.h}. |
| |
| You can also set the emulation in the configure script by setting the @code{em} |
| variable. Normally the default value of @samp{generic} is fine. The |
| configuration process will create a file named @file{targ-env.h} in the build |
| directory which includes @file{te-@var{em}.h}. |
| |
| Porting GAS to a new CPU requires writing the @file{tc-@var{CPU}} files. |
| Porting GAS to a new object file format requires writing the |
| @file{obj-@var{fmt}} files. There is sometimes some interaction between these |
| two files, but it is normally minimal. |
| |
| The best approach is, of course, to copy existing files. The documentation |
| below assumes that you are looking at existing files to see usage details. |
| |
| These interfaces have grown over time, and have never been carefully thought |
| out or designed. Nothing about the interfaces described here is cast in stone. |
| It is possible that they will change from one version of the assembler to the |
| next. Also, new macros are added all the time as they are needed. |
| |
| @menu |
| * CPU backend:: Writing a CPU backend |
| * Object format backend:: Writing an object format backend |
| * Emulations:: Writing emulation files |
| @end menu |
| |
| @node CPU backend |
| @subsection Writing a CPU backend |
| @cindex CPU backend |
| @cindex @file{tc-@var{CPU}} |
| |
| The CPU backend files are the heart of the assembler. They are the only parts |
| of the assembler which actually know anything about the instruction set of the |
| processor. |
| |
| You must define a reasonably small list of macros and functions in the CPU |
| backend files. You may define a large number of additional macros in the CPU |
| backend files, not all of which are documented here. You must, of course, |
| define macros in the @file{.h} file, which is included by every assembler |
| source file. You may define the functions as macros in the @file{.h} file, or |
| as functions in the @file{.c} file. |
| |
| @table @code |
| @item TC_@var{CPU} |
| @cindex TC_@var{CPU} |
| By convention, you should define this macro in the @file{.h} file. For |
| example, @file{tc-m68k.h} defines @code{TC_M68K}. You might have to use this |
| if it is necessary to add CPU specific code to the object format file. |
| |
| @item TARGET_FORMAT |
| This macro is the BFD target name to use when creating the output file. This |
| will normally depend upon the @code{OBJ_@var{FMT}} macro. |
| |
| @item TARGET_ARCH |
| This macro is the BFD architecture to pass to @code{bfd_set_arch_mach}. |
| |
| @item TARGET_MACH |
| This macro is the BFD machine number to pass to @code{bfd_set_arch_mach}. If |
| it is not defined, GAS will use 0. |
| |
| @item TARGET_BYTES_BIG_ENDIAN |
| You should define this macro to be non-zero if the target is big endian, and |
| zero if the target is little endian. |
| |
| @item md_shortopts |
| @itemx md_longopts |
| @itemx md_longopts_size |
| @itemx md_parse_option |
| @itemx md_show_usage |
| @cindex md_shortopts |
| @cindex md_longopts |
| @cindex md_longopts_size |
| @cindex md_parse_option |
| @cindex md_show_usage |
| GAS uses these variables and functions during option processing. |
| @code{md_shortopts} is a @code{const char *} which GAS adds to the machine |
| independent string passed to @code{getopt}. @code{md_longopts} is a |
| @code{struct option []} which GAS adds to the machine independent long options |
| passed to @code{getopt}; you may use @code{OPTION_MD_BASE}, defined in |
| @file{as.h}, as the start of a set of long option indices, if necessary. |
| @code{md_longopts_size} is a @code{size_t} holding the size of @code{md_longopts}. |
| GAS will call @code{md_parse_option} whenever @code{getopt} returns an |
| unrecognized code, presumably indicating a special code value which appears in |
| @code{md_longopts}. GAS will call @code{md_show_usage} when a usage message is |
| printed; it should print a description of the machine specific options. |
| |
| @item md_begin |
| @cindex md_begin |
| GAS will call this function at the start of the assembly, after the command |
| line arguments have been parsed and all the machine independent initializations |
| have been completed. |
| |
| @item md_cleanup |
| @cindex md_cleanup |
| If you define this macro, GAS will call it at the end of each input file. |
| |
| @item md_assemble |
| @cindex md_assemble |
| GAS will call this function for each input line which does not contain a |
| pseudo-op. The argument is a null terminated string. The function should |
| assemble the string as an instruction with operands. Normally |
| @code{md_assemble} will do this by calling @code{frag_more} and writing out |
| some bytes (@pxref{Frags}). @code{md_assemble} will call @code{fix_new} to |
| create fixups as needed (@pxref{Fixups}). Targets which need to do special |
| purpose relaxation will call @code{frag_var}. |
| |
| @item md_pseudo_table |
| @cindex md_pseudo_table |
| This is a const array of type @code{pseudo_typeS}. It is a mapping from |
| pseudo-op names to functions. You should use this table to implement |
| pseudo-ops which are specific to the CPU. |
| |
| @item tc_conditional_pseudoop |
| @cindex tc_conditional_pseudoop |
| If this macro is defined, GAS will call it with a @code{pseudo_typeS} argument. |
| It should return non-zero if the pseudo-op is a conditional which controls |
| whether code is assembled, such as @samp{.if}. GAS knows about the normal |
| conditional pseudo-ops, and you should normally not have to define this macro. |
| |
| @item comment_chars |
| @cindex comment_chars |
| This is a null terminated @code{const char} array of characters which start a |
| comment. |
| |
| @item tc_comment_chars |
| @cindex tc_comment_chars |
| If this macro is defined, GAS will use it instead of @code{comment_chars}. |
| |
| @item tc_symbol_chars |
| @cindex tc_symbol_chars |
| If this macro is defined, it is a pointer to a null terminated list of |
| characters which may appear in an operand. GAS already assumes that all |
| alphanumeric characters, and @samp{$}, @samp{.}, and @samp{_} may appear in an |
| operand (see @samp{symbol_chars} in @file{app.c}). This macro may be defined |
| to treat additional characters as appearing in an operand. This affects the |
| way in which GAS removes whitespace before passing the string to |
| @samp{md_assemble}. |
| |
| @item line_comment_chars |
| @cindex line_comment_chars |
| This is a null terminated @code{const char} array of characters which start a |
| comment when they appear at the start of a line. |
| |
| @item line_separator_chars |
| @cindex line_separator_chars |
| This is a null terminated @code{const char} array of characters which separate |
| lines (semicolon and newline are such characters by default, and need not be |
| listed in this array). |
| |
| @item EXP_CHARS |
| @cindex EXP_CHARS |
| This is a null terminated @code{const char} array of characters which may be |
| used as the exponent character in a floating point number. This is normally |
| @code{"eE"}. |
| |
| @item FLT_CHARS |
| @cindex FLT_CHARS |
| This is a null terminated @code{const char} array of characters which may be |
| used to indicate a floating point constant. A zero followed by one of these |
| characters is assumed to be followed by a floating point number; thus they |
| operate the way that @code{0x} is used to indicate a hexadecimal constant. |
| Usually this includes @samp{r} and @samp{f}. |
| |
| @item LEX_AT |
| @cindex LEX_AT |
| You may define this macro to the lexical type of the @kbd{@@} character. The |
| default is zero. |
| |
| Lexical types are a combination of @code{LEX_NAME} and @code{LEX_BEGIN_NAME}, |
| both defined in @file{read.h}. @code{LEX_NAME} indicates that the character |
| may appear in a name. @code{LEX_BEGIN_NAME} indicates that the character may |
| appear at the beginning of a name. |
| |
| @item LEX_BR |
| @cindex LEX_BR |
| You may define this macro to the lexical type of the brace characters @kbd{@{}, |
| @kbd{@}}, @kbd{[}, and @kbd{]}. The default value is zero. |
| |
| @item LEX_PCT |
| @cindex LEX_PCT |
| You may define this macro to the lexical type of the @kbd{%} character. The |
| default value is zero. |
| |
| @item LEX_QM |
| @cindex LEX_QM |
| You may define this macro to the lexical type of the @kbd{?} character. The |
| default value is zero. |
| |
| @item LEX_DOLLAR |
| @cindex LEX_DOLLAR |
| You may define this macro to the lexical type of the @kbd{$} character. The |
| default value is @code{LEX_NAME | LEX_BEGIN_NAME}. |
| |
| @item SINGLE_QUOTE_STRINGS |
| @cindex SINGLE_QUOTE_STRINGS |
| If you define this macro, GAS will treat single quotes as string delimiters. |
| Normally only double quotes are accepted as string delimiters. |
| |
| @item NO_STRING_ESCAPES |
| @cindex NO_STRING_ESCAPES |
| If you define this macro, GAS will not permit escape sequences in a string. |
| |
| @item ONLY_STANDARD_ESCAPES |
| @cindex ONLY_STANDARD_ESCAPES |
| If you define this macro, GAS will warn about the use of nonstandard escape |
| sequences in a string. |
| |
| @item md_start_line_hook |
| @cindex md_start_line_hook |
| If you define this macro, GAS will call it at the start of each line. |
| |
| @item LABELS_WITHOUT_COLONS |
| @cindex LABELS_WITHOUT_COLONS |
| If you define this macro, GAS will assume that any text at the start of a line |
| is a label, even if it does not have a colon. |
| |
| @item TC_START_LABEL |
| @cindex TC_START_LABEL |
| You may define this macro to control what GAS considers to be a label. The |
| default definition is to accept any name followed by a colon character. |
| |
| @item NO_PSEUDO_DOT |
| @cindex NO_PSEUDO_DOT |
| If you define this macro, GAS will not require pseudo-ops to start with a |
| @kbd{.} character. |
| |
| @item TC_EQUAL_IN_INSN |
| @cindex TC_EQUAL_IN_INSN |
| If you define this macro, it should return nonzero if the instruction is |
| permitted to contain an @kbd{=} character. GAS will use this to decide if a |
| @kbd{=} is an assignment or an instruction. |
| |
| @item TC_EOL_IN_INSN |
| @cindex TC_EOL_IN_INSN |
| If you define this macro, it should return nonzero if the current input line |
| pointer should be treated as the end of a line. |
| |
| @item md_parse_name |
| @cindex md_parse_name |
| If this macro is defined, GAS will call it for any symbol found in an |
| expression. You can define this to handle special symbols in a special way. |
| If a symbol always has a certain value, you should normally enter it in the |
| symbol table, perhaps using @code{reg_section}. |
| |
| @item md_undefined_symbol |
| @cindex md_undefined_symbol |
| GAS will call this function when a symbol table lookup fails, before it |
| creates a new symbol. Typically this would be used to supply symbols whose |
| name or value changes dynamically, possibly in a context sensitive way. |
| Predefined symbols with fixed values, such as register names or condition |
| codes, are typically entered directly into the symbol table when @code{md_begin} |
| is called. |
| |
| @item md_operand |
| @cindex md_operand |
| GAS will call this function for any expression that can not be recognized. |
| When the function is called, @code{input_line_pointer} will point to the start |
| of the expression. |
| |
| @item tc_unrecognized_line |
| @cindex tc_unrecognized_line |
| If you define this macro, GAS will call it when it finds a line that it can not |
| parse. |
| |
| @item md_do_align |
| @cindex md_do_align |
| You may define this macro to handle an alignment directive. GAS will call it |
| when the directive is seen in the input file. For example, the i386 backend |
| uses this to generate efficient nop instructions of varying lengths, depending |
| upon the number of bytes that the alignment will skip. |
| |
| @item HANDLE_ALIGN |
| @cindex HANDLE_ALIGN |
| You may define this macro to do special handling for an alignment directive. |
| GAS will call it at the end of the assembly. |
| |
| @item md_flush_pending_output |
| @cindex md_flush_pending_output |
| If you define this macro, GAS will call it each time it skips any space because of a |
| space filling or alignment or data allocation pseudo-op. |
| |
| @item TC_PARSE_CONS_EXPRESSION |
| @cindex TC_PARSE_CONS_EXPRESSION |
| You may define this macro to parse an expression used in a data allocation |
| pseudo-op such as @code{.word}. You can use this to recognize relocation |
| directives that may appear in such directives. |
| |
| @item BITFIELD_CONS_EXPRESSION |
| @cindex BITFIELD_CONS_EXPRESSION |
| If you define this macro, GAS will recognize bitfield instructions in data |
| allocation pseudo-ops, as used on the i960. |
| |
| @item REPEAT_CONS_EXPRESSION |
| @cindex REPEAT_CONS_EXPRESSION |
| If you define this macro, GAS will recognize repeat counts in data allocation |
| pseudo-ops, as used on the MIPS. |
| |
| @item md_cons_align |
| @cindex md_cons_align |
| You may define this macro to do any special alignment before a data allocation |
| pseudo-op. |
| |
| @item TC_CONS_FIX_NEW |
| @cindex TC_CONS_FIX_NEW |
| You may define this macro to generate a fixup for a data allocation pseudo-op. |
| |
| @item TC_INIT_FIX_DATA (@var{fixp}) |
| @cindex TC_INIT_FIX_DATA |
| A C statement to initialize the target specific fields of fixup @var{fixp}. |
| These fields are defined with the @code{TC_FIX_TYPE} macro. |
| |
| @item TC_FIX_DATA_PRINT (@var{stream}, @var{fixp}) |
| @cindex TC_FIX_DATA_PRINT |
| A C statement to output target specific debugging information for |
| fixup @var{fixp} to @var{stream}. This macro is called by @code{print_fixup}. |
| |
| @item TC_FRAG_INIT (@var{fragp}) |
| @cindex TC_FRAG_INIT |
| A C statement to initialize the target specific fields of frag @var{fragp}. |
| These fields are defined with the @code{TC_FRAG_TYPE} macro. |
| |
| @item md_number_to_chars |
| @cindex md_number_to_chars |
| This should just call either @code{number_to_chars_bigendian} or |
| @code{number_to_chars_littleendian}, whichever is appropriate. On targets like |
| the MIPS which support options to change the endianness, which function to call |
| is a runtime decision. On other targets, @code{md_number_to_chars} can be a |
| simple macro. |
| |
| @item md_reloc_size |
| @cindex md_reloc_size |
| This variable is only used in the original version of gas (not |
| @code{BFD_ASSEMBLER} and not @code{MANY_SEGMENTS}). It holds the size of a |
| relocation entry. |
| |
| @item WORKING_DOT_WORD |
| @itemx md_short_jump_size |
| @itemx md_long_jump_size |
| @itemx md_create_short_jump |
| @itemx md_create_long_jump |
| @cindex WORKING_DOT_WORD |
| @cindex md_short_jump_size |
| @cindex md_long_jump_size |
| @cindex md_create_short_jump |
| @cindex md_create_long_jump |
| If @code{WORKING_DOT_WORD} is defined, GAS will not do broken word processing |
| (@pxref{Broken words}). Otherwise, you should set @code{md_short_jump_size} to |
| the size of a short jump (a jump that is just long enough to jump around a long |
| jmp) and @code{md_long_jump_size} to the size of a long jump (a jump that can |
| go anywhere in the function). You should define @code{md_create_short_jump} to |
| create a short jump around a long jump, and define @code{md_create_long_jump} |
| to create a long jump. |
| |
| @item md_estimate_size_before_relax |
| @cindex md_estimate_size_before_relax |
| This function returns an estimate of the size of a @code{rs_machine_dependent} |
| frag before any relaxing is done. It may also create any necessary |
| relocations. |
| |
| @item md_relax_frag |
| @cindex md_relax_frag |
| This macro may be defined to relax a frag. GAS will call this with the frag |
| and the change in size of all previous frags; @code{md_relax_frag} should |
| return the change in size of the frag. @xref{Relaxation}. |
| |
| @item TC_GENERIC_RELAX_TABLE |
| @cindex TC_GENERIC_RELAX_TABLE |
| If you do not define @code{md_relax_frag}, you may define |
| @code{TC_GENERIC_RELAX_TABLE} as a table of @code{relax_typeS} structures. The |
| machine independent code knows how to use such a table to relax PC relative |
| references. See @file{tc-m68k.c} for an example. @xref{Relaxation}. |
| |
| @item md_prepare_relax_scan |
| @cindex md_prepare_relax_scan |
| If defined, it is a C statement that is invoked prior to scanning |
| the relax table. |
| |
| @item LINKER_RELAXING_SHRINKS_ONLY |
| @cindex LINKER_RELAXING_SHRINKS_ONLY |
| If you define this macro, and the global variable @samp{linkrelax} is set |
| (because of a command line option, or unconditionally in @code{md_begin}), a |
| @samp{.align} directive will cause extra space to be allocated. The linker can |
| then discard this space when relaxing the section. |
| |
| @item md_convert_frag |
| @cindex md_convert_frag |
| GAS will call this for each @code{rs_machine_dependent} fragment. |
| The instruction is completed using the data from the relaxation pass. |
| It may also create any necessary relocations. |
| @xref{Relaxation}. |
| |
| @item md_apply_fix |
| @cindex md_apply_fix |
| GAS will call this for each fixup. It should store the correct value in the |
| object file. @code{fixup_segment} performs a generic overflow check on the |
| @code{valueT *val} argument after @code{md_apply_fix} returns. If the overflow |
| check is relevant for the target machine, then @code{md_apply_fix} should |
| modify @code{valueT *val}, typically to the value stored in the object file. |
| |
| @item TC_HANDLES_FX_DONE |
| @cindex TC_HANDLES_FX_DONE |
| If this macro is defined, it means that @code{md_apply_fix} correctly sets the |
| @code{fx_done} field in the fixup. |
| |
| @item tc_gen_reloc |
| @cindex tc_gen_reloc |
| A @code{BFD_ASSEMBLER} GAS will call this to generate a reloc. GAS will pass |
| the resulting reloc to @code{bfd_install_relocation}. This currently works |
| poorly, as @code{bfd_install_relocation} often does the wrong thing, and |
| instances of @code{tc_gen_reloc} have been written to work around the problems, |
| which in turn makes it difficult to fix @code{bfd_install_relocation}. |
| |
| @item RELOC_EXPANSION_POSSIBLE |
| @cindex RELOC_EXPANSION_POSSIBLE |
| If you define this macro, it means that @code{tc_gen_reloc} may return multiple |
| relocation entries for a single fixup. In this case, the return value of |
| @code{tc_gen_reloc} is a pointer to a null terminated array. |
| |
| @item MAX_RELOC_EXPANSION |
| @cindex MAX_RELOC_EXPANSION |
| You must define this if @code{RELOC_EXPANSION_POSSIBLE} is defined; it |
| indicates the largest number of relocs which @code{tc_gen_reloc} may return for |
| a single fixup. |
| |
| @item tc_fix_adjustable |
| @cindex tc_fix_adjustable |
| You may define this macro to indicate whether a fixup against a locally defined |
| symbol should be adjusted to be against the section symbol. It should return a |
| non-zero value if the adjustment is acceptable. |
| |
| @item MD_PCREL_FROM_SECTION |
| @cindex MD_PCREL_FROM_SECTION |
| If you define this macro, it should return the offset between the address of a |
| PC relative fixup and the position from which the PC relative adjustment should |
| be made. On many processors, the base of a PC relative instruction is the next |
| instruction, so this macro would return the length of an instruction. |
| |
| @item md_pcrel_from |
| @cindex md_pcrel_from |
| This is the default value of @code{MD_PCREL_FROM_SECTION}. The difference is |
| that @code{md_pcrel_from} does not take a section argument. |
| |
| @item tc_frob_label |
| @cindex tc_frob_label |
| If you define this macro, GAS will call it each time a label is defined. |
| |
| @item md_section_align |
| @cindex md_section_align |
| GAS will call this function for each section at the end of the assembly, to |
| permit the CPU backend to adjust the alignment of a section. |
| |
| @item tc_frob_section |
| @cindex tc_frob_section |
| If you define this macro, a @code{BFD_ASSEMBLER} GAS will call it for each |
| section at the end of the assembly. |
| |
| @item tc_frob_file_before_adjust |
| @cindex tc_frob_file_before_adjust |
| If you define this macro, GAS will call it after the symbol values are |
| resolved, but before the fixups have been changed from local symbols to section |
| symbols. |
| |
| @item tc_frob_symbol |
| @cindex tc_frob_symbol |
| If you define this macro, GAS will call it for each symbol. You can indicate |
| that the symbol should not be included in the object file by defining this |
| macro to set its second argument to a non-zero value. |
| |
| @item tc_frob_file |
| @cindex tc_frob_file |
| If you define this macro, GAS will call it after the symbol table has been |
| completed, but before the relocations have been generated. |
| |
| @item tc_frob_file_after_relocs |
| If you define this macro, GAS will call it after the relocs have been |
| generated. |
| |
| @item LISTING_HEADER |
| A string to use on the header line of a listing. The default value is simply |
| @code{"GAS LISTING"}. |
| |
| @item LISTING_WORD_SIZE |
| The number of bytes to put into a word in a listing. This affects the way the |
| bytes are clumped together in the listing. For example, a value of 2 might |
| print @samp{1234 5678} where a value of 1 would print @samp{12 34 56 78}. The |
| default value is 4. |
| |
| @item LISTING_LHS_WIDTH |
| The number of words of data to print on the first line of a listing for a |
| particular source line, where each word is @code{LISTING_WORD_SIZE} bytes. The |
| default value is 1. |
| |
| @item LISTING_LHS_WIDTH_SECOND |
| Like @code{LISTING_LHS_WIDTH}, but applying to the second and subsequent lines |
| of the data printed for a particular source line. The default value is 1. |
| |
| @item LISTING_LHS_CONT_LINES |
| The maximum number of continuation lines to print in a listing for a particular |
| source line. The default value is 4. |
| |
| @item LISTING_RHS_WIDTH |
| The maximum number of characters to print from one line of the input file. The |
| default value is 100. |
| @end table |
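| |
| As a rough illustration of how @code{LISTING_WORD_SIZE} changes the clumping |
| of bytes in a listing, the following C sketch reproduces the two outputs quoted |
| in the table above. It is not the code GAS uses; the function name |
| @code{sketch_format_listing} is hypothetical. |
| |
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of GAS: format LEN bytes of DATA as hex,
   inserting a space after every WORD_SIZE bytes.  WORD_SIZE must be
   non-zero.  OUT must hold at least 2 * LEN + LEN / WORD_SIZE + 1
   characters.  */
static void
sketch_format_listing (const unsigned char *data, size_t len,
                       size_t word_size, char *out)
{
  size_t i;
  char *p = out;

  assert (word_size != 0);
  for (i = 0; i < len; i++)
    {
      /* Separate clumps, but never emit a leading space.  */
      if (i != 0 && i % word_size == 0)
        *p++ = ' ';
      p += sprintf (p, "%02x", data[i]);
    }
  *p = '\0';
}
```
| |
| Called on the bytes 0x12, 0x34, 0x56, 0x78, a word size of 2 yields |
| @samp{1234 5678} and a word size of 1 yields @samp{12 34 56 78}. |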
| |
| @node Object format backend |
| @subsection Writing an object format backend |
| @cindex object format backend |
| @cindex @file{obj-@var{fmt}} |
| |
| As with the CPU backend, the object format backend must define a few things, |
| and may define some other things. The interface to the object format backend |
| is generally simpler; most of the support for an object file format consists of |
| defining a number of pseudo-ops. |
| |
| The object format @file{.h} file must include @file{targ-cpu.h}. |
| |
| This section will only define the @code{BFD_ASSEMBLER} version of GAS. It is |
| impossible to support a new object file format using any other version anyhow, |
| as the original GAS version only supports a.out, and the @code{MANY_SEGMENTS} |
| GAS version only supports COFF. |
| |
| @table @code |
| @item OBJ_@var{format} |
| @cindex OBJ_@var{format} |
| By convention, you should define this macro in the @file{.h} file. For |
| example, @file{obj-elf.h} defines @code{OBJ_ELF}. You might have to use this |
| if it is necessary to add object file format specific code to the CPU file. |
| |
| @item obj_begin |
| If you define this macro, GAS will call it at the start of the assembly, after |
| the command line arguments have been parsed and all the machine independent |
| initializations have been completed. |
| |
| @item obj_app_file |
| @cindex obj_app_file |
| If you define this macro, GAS will invoke it when it sees a @code{.file} |
| pseudo-op or a @samp{#} line as used by the C preprocessor. |
| |
| @item OBJ_COPY_SYMBOL_ATTRIBUTES |
| @cindex OBJ_COPY_SYMBOL_ATTRIBUTES |
| You should define this macro to copy object format specific information from |
| one symbol to another. GAS will call it when one symbol is equated to |
| another. |
| |
| @item obj_fix_adjustable |
| @cindex obj_fix_adjustable |
| You may define this macro to indicate whether a fixup against a locally defined |
| symbol should be adjusted to be against the section symbol. It should return a |
| non-zero value if the adjustment is acceptable. |
| |
| @item obj_sec_sym_ok_for_reloc |
| @cindex obj_sec_sym_ok_for_reloc |
| You may define this macro to indicate that it is OK to use a section symbol in |
| a relocation entry. If it is not, GAS will define a new symbol at the start |
| of a section. |
| |
| @item EMIT_SECTION_SYMBOLS |
| @cindex EMIT_SECTION_SYMBOLS |
| You should define this macro with a zero value if you do not want to include |
| section symbols in the output symbol table. The default value for this macro |
| is one. |
| |
| @item obj_adjust_symtab |
| @cindex obj_adjust_symtab |
| If you define this macro, GAS will invoke it just before setting the symbol |
| table of the output BFD. For example, the COFF support uses this macro to |
| generate a @code{.file} symbol if none was generated previously. |
| |
| @item SEPARATE_STAB_SECTIONS |
| @cindex SEPARATE_STAB_SECTIONS |
| You may define this macro to indicate that stabs should be placed in separate |
| sections, as in ELF. |
| |
| @item INIT_STAB_SECTION |
| @cindex INIT_STAB_SECTION |
| You may define this macro to initialize the stabs section in the output file. |
| |
| @item OBJ_PROCESS_STAB |
| @cindex OBJ_PROCESS_STAB |
| You may define this macro to do specific processing on a stabs entry. |
| |
| @item obj_frob_section |
| @cindex obj_frob_section |
| If you define this macro, GAS will call it for each section at the end of the |
| assembly. |
| |
| @item obj_frob_file_before_adjust |
| @cindex obj_frob_file_before_adjust |
| If you define this macro, GAS will call it after the symbol values are |
| resolved, but before the fixups have been changed from local symbols to section |
| symbols. |
| |
| @item obj_frob_symbol |
| @cindex obj_frob_symbol |
| If you define this macro, GAS will call it for each symbol. You can indicate |
| that the symbol should not be included in the object file by defining this |
| macro to set its second argument to a non-zero value. |
| |
| @item obj_frob_file |
| @cindex obj_frob_file |
| If you define this macro, GAS will call it after the symbol table has been |
| completed, but before the relocations have been generated. |
| |
| @item obj_frob_file_after_relocs |
| If you define this macro, GAS will call it after the relocs have been |
| generated. |
| @end table |
| |
| @node Emulations |
| @subsection Writing emulation files |
| |
| Normally you do not have to write an emulation file. You can just use |
| @file{te-generic.h}. |
| |
| If you do write your own emulation file, it must include @file{obj-format.h}. |
| |
| An emulation file will often define @code{TE_@var{EM}}; this may then be used |
| in other files to change the output. |
| |
| @node Relaxation |
| @section Relaxation |
| @cindex relaxation |
| |
| @dfn{Relaxation} is a generic term used when the size of some instruction or |
| data depends upon the value of some symbol or other data. |
| |
| GAS knows how to relax a particular type of PC relative relocation using a table. |
| You can also define arbitrarily complex forms of relaxation yourself. |
| |
| @menu |
| * Relaxing with a table:: Relaxing with a table |
| * General relaxing:: General relaxing |
| @end menu |
| |
| @node Relaxing with a table |
| @subsection Relaxing with a table |
| |
| If you do not define @code{md_relax_frag}, and you do define |
| @code{TC_GENERIC_RELAX_TABLE}, GAS will relax @code{rs_machine_dependent} frags |
| based on the frag subtype and the displacement to some specified target |
| address. The basic idea is that several machines have different addressing |
| modes for instructions that can specify different ranges of values, with |
| successive modes able to access wider ranges, including the entirety of the |
| previous range. Smaller ranges are assumed to be more desirable (perhaps the |
| instruction requires one word instead of two or three); if this is not the |
| case, don't describe the smaller-range, inferior mode. |
| |
| The @code{fr_subtype} field of a frag is an index into a CPU-specific |
| relaxation table. That table entry indicates the range of values that can be |
| stored, the number of bytes that will have to be added to the frag to |
| accommodate the addressing mode, and the index of the next entry to examine if |
| the value to be stored is outside the range accessible by the current |
| addressing mode. The @code{fr_symbol} field of the frag indicates what symbol |
| is to be accessed; the @code{fr_offset} field is added in. |
| |
| If the @code{TC_PCREL_ADJUST} macro is defined, which currently should only happen |
| for the NS32k family, the @code{TC_PCREL_ADJUST} macro is called on the frag to |
| compute an adjustment to be made to the displacement. |
| |
| The value fitted by the relaxation code is always assumed to be a displacement |
| from the current frag. (More specifically, from @code{fr_fix} bytes into the |
| frag.) |
| @ignore |
| This seems kinda silly. What about fitting small absolute values? I suppose |
| @code{md_assemble} is supposed to take care of that, but if the operand is a |
| difference between symbols, it might not be able to, if the difference was not |
| computable yet. |
| @end ignore |
| |
| The end of the relaxation sequence is indicated by a ``next'' value of 0. This |
| means that the first entry in the table can't be used. |
| |
| For some configurations, the linker can do relaxing within a section of an |
| object file. If call instructions of various sizes exist, the linker can |
| determine which should be used in each instance, when a symbol's value is |
| resolved. In order for the linker to avoid wasting space and having to insert |
| no-op instructions, it must be able to expand or shrink the section contents |
| while still preserving intra-section references and meeting alignment |
| requirements. |
| |
| For the i960 using b.out format, no expansion is done; instead, each |
| @samp{.align} directive causes extra space to be allocated, enough that when |
| the linker is relaxing a section and removing unneeded space, it can discard |
| some or all of this extra padding and cause the following data to be correctly |
| aligned. |
| |
| For the H8/300, I think the linker expands calls that can't reach, and doesn't |
| worry about alignment issues; the cpu probably never needs any significant |
| alignment beyond the instruction size. |
| |
| The relaxation table type contains these fields: |
| |
| @table @code |
| @item long rlx_forward |
| Forward reach, must be non-negative. |
| @item long rlx_backward |
| Backward reach, must be zero or negative. |
| @item rlx_length |
| Length in bytes of this addressing mode. |
| @item rlx_more |
| Index of the next-longer relax state, or zero if there is no next relax state. |
| @end table |
| |
| The relaxation is done in @code{relax_segment} in @file{write.c}. The |
| difference in the length fields between the original mode and the one finally |
| chosen by the relaxing code is taken as the size by which the current frag will |
| be increased in size. For example, if the initial relaxing mode has a length |
| of 2 bytes, and because of the size of the displacement, it gets upgraded to a |
| mode with a size of 6 bytes, it is assumed that the frag will grow by 4 bytes. |
| (The initial two bytes should have been part of the fixed portion of the frag, |
| since it is already known that they will be output.) This growth must be |
| effected by @code{md_convert_frag}; it should increase the @code{fr_fix} field |
| by the appropriate size, and fill in the appropriate bytes of the frag. |
| (Enough space for the maximum growth should have been allocated in the call to |
| @code{frag_var} as the second argument.) |
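| |
| The table walk described above can be sketched in C as follows. This is a |
| simplified model, not the code in @code{relax_segment}: the struct name, the |
| function names, and the table values are hypothetical, and only the field |
| names are taken from the list above. |
| |
```c
#include <assert.h>

/* Field names follow the relaxation table description; everything else
   here is a hypothetical stand-in for the GAS definitions.  */
struct sketch_relax
{
  long rlx_forward;   /* Forward reach, non-negative.  */
  long rlx_backward;  /* Backward reach, zero or negative.  */
  int rlx_length;     /* Length in bytes of this addressing mode.  */
  int rlx_more;       /* Index of the next-longer state, 0 if none.  */
};

/* Starting from STATE, follow rlx_more links until DISP is within reach
   or no longer state exists.  Returns the chosen state.  */
static int
sketch_choose_state (const struct sketch_relax *table, int state, long disp)
{
  assert (state != 0);  /* Entry 0 is never a valid state.  */
  while ((disp > table[state].rlx_forward
          || disp < table[state].rlx_backward)
         && table[state].rlx_more != 0)
    state = table[state].rlx_more;
  return state;
}

/* Growth of the frag: length of the final mode minus the initial one.  */
static int
sketch_growth (const struct sketch_relax *table, int from, int to)
{
  return table[to].rlx_length - table[from].rlx_length;
}

/* Made-up table with 2-byte, 4-byte and 6-byte modes.  */
static const struct sketch_relax sketch_table[] =
{
  { 0, 0, 0, 0 },                          /* Unused: 0 ends the chain.  */
  { 127, -128, 2, 2 },
  { 32767, -32768, 4, 3 },
  { 2147483647L, -2147483647L - 1, 6, 0 }
};
```
| |
| With this table, a frag that starts in state 1 and has a displacement of |
| 100000 ends in state 3 and grows by 6 - 2 = 4 bytes, matching the example in |
| the text. |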
| |
| If relocation records are needed, they should be emitted by |
| @code{md_estimate_size_before_relax}. This function should examine the target |
| symbol of the supplied frag and correct the @code{fr_subtype} of the frag if |
| needed. When this function is called, if the symbol has not yet been defined, |
| it will not become defined later; however, its value may still change if the |
| section it is in gets relaxed. |
| |
| Usually, if the symbol is in the same section as the frag (given by the |
| @var{sec} argument), the narrowest likely relaxation mode is stored in |
| @code{fr_subtype}, and that's that. |
| |
| If the symbol is undefined, or in a different section (and therefore moveable |
| to an arbitrarily large distance), the largest available relaxation mode is |
| specified, @code{fix_new} is called to produce the relocation record, |
| @code{fr_fix} is increased to include the relocated field (remember, this |
| storage was allocated when @code{frag_var} was called), and @code{frag_wane} is |
| called to convert the frag to an @code{rs_fill} frag with no variant part. |
| Sometimes changing addressing modes may also require rewriting the instruction. |
| It can be accessed via @code{fr_opcode} or @code{fr_fix}. |
| |
| Sometimes @code{fr_var} is increased instead, and @code{frag_wane} is not |
| called. I'm not sure, but I think this is to keep @code{fr_fix} referring to |
| an earlier byte, and @code{fr_subtype} set to @code{rs_machine_dependent} so |
| that @code{md_convert_frag} will get called. |
| |
| @node General relaxing |
| @subsection General relaxing |
| |
| If using a simple table is not suitable, you may implement arbitrarily complex |
| relaxation semantics yourself. For example, the MIPS backend uses this to emit |
| different instruction sequences depending upon the size of the symbol being |
| accessed. |
| |
| When you assemble an instruction that may need relaxation, you should allocate |
| a frag using @code{frag_var} or @code{frag_variant} with a type of |
| @code{rs_machine_dependent}. You should store some sort of information in the |
| @code{fr_subtype} field so that you can figure out what to do with the frag |
| later. |
| |
| When GAS reaches the end of the input file, it will look through the frags and |
| work out their final sizes. |
| |
| GAS will first call @code{md_estimate_size_before_relax} on each |
| @code{rs_machine_dependent} frag. This function must return an estimated size |
| for the frag. |
| |
| GAS will then loop over the frags, calling @code{md_relax_frag} on each |
| @code{rs_machine_dependent} frag. This function should return the change in |
| size of the frag. GAS will keep looping over the frags until none of the frags |
| changes size. |
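| |
| The loop described above is a fixed-point iteration, which the following C |
| sketch models. It is an illustration under simplifying assumptions: the |
| @code{sketch_frag} type, the function names, and the 2-byte and 6-byte branch |
| sizes are all hypothetical, and addresses are recomputed from scratch rather |
| than passed in as the accumulated change in size of the previous frags. |
| |
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a frag; real frags carry much more state.  */
struct sketch_frag
{
  long size;   /* Current size estimate in bytes.  */
  int target;  /* Index of the frag a branch jumps to, or -1 for data.  */
};

/* Byte address at which frag INDEX starts, given the current sizes.  */
static long
sketch_address (const struct sketch_frag *frags, size_t index)
{
  long addr = 0;
  size_t i;

  for (i = 0; i < index; i++)
    addr += frags[i].size;
  return addr;
}

/* Plays the role of md_relax_frag: return the change in size of frag
   INDEX.  A 2-byte branch grows by 4 bytes when its target is outside
   the range -128..127 measured from the end of the branch.  */
static long
sketch_relax_frag (const struct sketch_frag *frags, size_t index)
{
  const struct sketch_frag *f = &frags[index];
  long disp;

  if (f->target < 0 || f->size != 2)
    return 0;
  disp = sketch_address (frags, (size_t) f->target)
         - (sketch_address (frags, index) + f->size);
  return (disp > 127 || disp < -128) ? 4 : 0;
}

/* Loop until a full pass changes nothing.  Returns the number of passes.  */
static int
sketch_relax_all (struct sketch_frag *frags, size_t count)
{
  int passes = 0;
  int changed;

  assert (frags != NULL);
  do
    {
      size_t i;

      changed = 0;
      for (i = 0; i < count; i++)
        {
          long growth = sketch_relax_frag (frags, i);

          if (growth != 0)
            {
              frags[i].size += growth;
              changed = 1;
            }
        }
      passes++;
    }
  while (changed);
  return passes;
}
```
| |
| Because growing one frag can push another frag's target out of range, more |
| than one pass may be needed. In this model frags never shrink, so the loop |
| is guaranteed to terminate. |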
| |
| @node Broken words |
| @section Broken words |
| @cindex internals, broken words |
| @cindex broken words |
| |
| Some compilers, including GCC, will sometimes emit switch tables specifying |
| 16-bit @code{.word} displacements to branch targets, and branch instructions |
| that load entries from that table to compute the target address. If this is |
| done on a 32-bit machine, there is a chance (at least with really large |
| functions) that the displacement will not fit in 16 bits. The assembler |
| handles this using a concept called @dfn{broken words}. This idea is well |
| named, since there is an implied promise that the 16-bit field will in fact |
| hold the specified displacement. |
| |
| If broken word processing is enabled, and a situation like this is encountered, |
| the assembler will insert a jump instruction into the instruction stream, close |
| enough to be reached with the 16-bit displacement. This jump instruction will |
| transfer to the real desired target address. Thus, as long as the @code{.word} |
| value really is used as a displacement to compute an address to jump to, the |
| net effect will be correct (minus a very small efficiency cost). If |
| @code{.word} directives with label differences for values are used for other |
| purposes, however, things may not work properly. For targets which use broken |
| words, the @samp{-K} option will warn when a broken word is discovered. |
| |
| The broken word code is turned off by the @code{WORKING_DOT_WORD} macro. It |
| isn't needed if @code{.word} emits a value large enough to contain an address |
| (or, more correctly, any possible difference between two addresses). |
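| |
| The condition that triggers broken word processing can be sketched in C as |
| below. This is only an illustration: the function names are hypothetical, and |
| the sketch assumes the 16-bit field is treated as signed. |
| |
```c
#include <assert.h>

/* Hypothetical helper: nonzero if DISP can be stored in a signed 16-bit
   .word without truncation.  */
static int
sketch_fits_in_word16 (long disp)
{
  return disp >= -32768L && disp <= 32767L;
}

/* Nonzero if the difference between two addresses does not fit, so that
   a nearby jump would have to stand in for the real target.  */
static int
sketch_word_is_broken (long target_addr, long base_addr)
{
  return !sketch_fits_in_word16 (target_addr - base_addr);
}
```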
| |
| @node Internal functions |
| @section Internal functions |
| |
| This section describes basic internal functions used by GAS. |
| |
| @menu |
| * Warning and error messages:: Warning and error messages |
| * Hash tables:: Hash tables |
| @end menu |
| |
| @node Warning and error messages |
| @subsection Warning and error messages |
| |
| @deftypefun @{@} int had_warnings (void) |
| @deftypefunx @{@} int had_errors (void) |
| Returns non-zero if any warnings or errors, respectively, have been printed |
| during this invocation. |
| @end deftypefun |
| |
| @deftypefun @{@} void as_perror (const char *@var{gripe}, const char *@var{filename}) |
| Displays a BFD or system error, then clears the error status. |
| @end deftypefun |
| |
| @deftypefun @{@} void as_tsktsk (const char *@var{format}, ...) |
| @deftypefunx @{@} void as_warn (const char *@var{format}, ...) |
| @deftypefunx @{@} void as_bad (const char *@var{format}, ...) |
| @deftypefunx @{@} void as_fatal (const char *@var{format}, ...) |
| These functions display messages about something amiss with the input file, or |
| internal problems in the assembler itself. The current file name and line |
| number are printed, followed by the supplied message, formatted using |
| @code{vfprintf}, and a final newline. |
| |
| An error indicated by @code{as_bad} will result in a non-zero exit status when |
| the assembler has finished. Calling @code{as_fatal} will result in immediate |
| termination of the assembler process. |
| @end deftypefun |
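| |
| The relationship between @code{as_bad}, @code{had_errors}, and the exit status |
| can be modelled with the following C sketch. It does not reproduce the GAS |
| implementation; every name beginning with @code{sketch_} is hypothetical, and |
| the exact message layout is an assumption. |
| |
```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-ins for state that GAS tracks itself.  */
static const char *sketch_file = "input.s";
static unsigned int sketch_line = 1;
static int sketch_error_count = 0;

/* Modelled on the description of as_bad: print the file name and line
   number, then the formatted message and a newline, and remember that
   an error occurred.  */
static void
sketch_as_bad (const char *format, ...)
{
  va_list args;

  sketch_error_count++;
  fprintf (stderr, "%s:%u: ", sketch_file, sketch_line);
  va_start (args, format);
  vfprintf (stderr, format, args);
  va_end (args);
  fputc ('\n', stderr);
}

/* Modelled on had_errors: non-zero once any error has been reported.  */
static int
sketch_had_errors (void)
{
  return sketch_error_count;
}

/* Exit status to use once assembly has finished.  */
static int
sketch_exit_status (void)
{
  return sketch_had_errors () ? 1 : 0;
}
```
| |
| Unlike a fatal error, a call such as @code{sketch_as_bad ("bad register %s", |
| name)} returns to its caller, so processing can continue and report further |
| problems before the non-zero status is returned. |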
| |
| @deftypefun @{@} void as_warn_where (char *@var{file}, unsigned int @var{line}, const char *@var{format}, ...) |
| @deftypefunx @{@} void as_bad_where (char *@var{file}, unsigned int @var{line}, const char *@var{format}, ...) |
| These variants permit specification of the file name and line number. They |
| are used when a problem is detected while reprocessing information that was |
| saved during an earlier part of the file. For example, fixups are processed |
| after all input has been read, but messages about fixups should refer to the |
| original file name and line number to which they apply. |
| @end deftypefun |
| |
| @deftypefun @{@} void fprint_value (FILE *@var{file}, valueT @var{val}) |
| @deftypefunx @{@} void sprint_value (char *@var{buf}, valueT @var{val}) |
| These functions convert a @code{valueT} value into printable form, in case |
| it is wider than the types that @code{*printf} can handle. If the type is |
| narrow enough, a decimal number is produced; otherwise, the output is in |
| hexadecimal. The choice depends only on the width of the type; the value |
| itself is not examined. |
| @end deftypefun |
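| |
| The width-based choice between decimal and hexadecimal might look roughly |
| like the following. This is a minimal sketch under stated assumptions, not |
| the GAS implementation: @code{demo_valueT} is a hypothetical stand-in for |
| @code{valueT}, and the extra buffer-size parameter and the @samp{0x} prefix |
| are choices made for the example. |

```c
#include <stdio.h>

/* Hypothetical stand-in; in GAS the real valueT depends on the
   configuration.  */
typedef unsigned long long demo_valueT;

/* Format VAL into BUF, which holds BUFSIZE bytes.  The decision is
   made from the width of the type alone, never from VAL itself.  */
void
demo_sprint_value (char *buf, size_t bufsize, demo_valueT val)
{
  if (sizeof (val) <= sizeof (long))
    /* Narrow enough for the ordinary printf conversions: decimal.  */
    snprintf (buf, bufsize, "%ld", (long) val);
  else
    /* Wider than long: fall back to hexadecimal.  */
    snprintf (buf, bufsize, "0x%llx", (unsigned long long) val);
}
```

| Because the test is on @code{sizeof}, the same value prints in decimal on a |
| host where the type fits in a @code{long} and in hexadecimal on a host where |
| it does not. |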
| |
| @node Hash tables |
| @subsection Hash tables |
| @cindex hash tables |
| |
| @deftypefun @{@} @{struct hash_control *@} hash_new (void) |
| Creates the hash table control structure. |
| @end deftypefun |
| |
| @deftypefun @{@} void hash_die (struct hash_control *) |
| Destroys a hash table. |
| @end deftypefun |
| |
| @deftypefun @{@} PTR hash_delete (struct hash_control *, const char *) |
| Deletes an entry from the hash table and returns the value it had. |
| @end deftypefun |
| |
| @deftypefun @{@} PTR hash_replace (struct hash_control *, const char *, PTR) |
| Updates the value for an entry already in the table, returning the old value. |
| If no entry was found, just returns NULL. |
| @end deftypefun |
| |
| @deftypefun @{@} @{const char *@} hash_insert (struct hash_control *, const char *, PTR) |
| Inserts a new entry. Inserting a key that is already in the table is an |
| error. Returns an error message, or NULL on success. |
| @end deftypefun |
| |
| @deftypefun @{@} @{const char *@} hash_jam (struct hash_control *, const char *, PTR) |
| Inserts the entry if the key is not already present, and updates its value if |
| it is. |
| @end deftypefun |
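| |
| The way these calls differ when a key is or is not already present can be |
| illustrated with a toy table. The sketch below is not the GAS hash table: it |
| is a hypothetical linked list with @code{demo_} names, it uses @code{void *} |
| in place of @code{PTR}, and returning NULL when deleting a missing key is an |
| assumption of the example. |

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy table: a linked list of key/value pairs.  Keys
   are not copied; the caller must keep them alive.  */
struct demo_entry
{
  struct demo_entry *next;
  const char *key;
  void *value;
};

struct demo_table
{
  struct demo_entry *head;
};

struct demo_table *
demo_new (void)
{
  return calloc (1, sizeof (struct demo_table));
}

/* Return the link that points at KEY's entry, or the terminating
   NULL link if KEY is absent.  */
static struct demo_entry **
demo_lookup (struct demo_table *t, const char *key)
{
  struct demo_entry **p;

  for (p = &t->head; *p != NULL; p = &(*p)->next)
    if (strcmp ((*p)->key, key) == 0)
      break;
  return p;
}

/* Like hash_insert: an existing key is an error.  */
const char *
demo_insert (struct demo_table *t, const char *key, void *value)
{
  struct demo_entry **p = demo_lookup (t, key);
  struct demo_entry *e;

  if (*p != NULL)
    return "exists";
  e = malloc (sizeof *e);
  if (e == NULL)
    return "no memory";
  e->next = NULL;
  e->key = key;
  e->value = value;
  *p = e;
  return NULL;
}

/* Like hash_replace: update only an existing key, return old value.  */
void *
demo_replace (struct demo_table *t, const char *key, void *value)
{
  struct demo_entry *e = *demo_lookup (t, key);
  void *old;

  if (e == NULL)
    return NULL;
  old = e->value;
  e->value = value;
  return old;
}

/* Like hash_jam: insert if absent, update if present.  */
const char *
demo_jam (struct demo_table *t, const char *key, void *value)
{
  struct demo_entry *e = *demo_lookup (t, key);

  if (e == NULL)
    return demo_insert (t, key, value);
  e->value = value;
  return NULL;
}

/* Like hash_delete: unlink the entry and return the value it had.  */
void *
demo_delete (struct demo_table *t, const char *key)
{
  struct demo_entry **p = demo_lookup (t, key);
  struct demo_entry *e = *p;
  void *old;

  if (e == NULL)
    return NULL;
  old = e->value;
  *p = e->next;
  free (e);
  return old;
}

/* Like hash_die: free every entry, then the table itself.  */
void
demo_die (struct demo_table *t)
{
  while (t->head != NULL)
    demo_delete (t, t->head->key);
  free (t);
}
```

| With this sketch, inserting the same key twice returns an error string the |
| second time, replacing a missing key returns NULL and changes nothing, and |
| jamming a key succeeds whether or not the key was present. |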
| |
| @node Test suite |
| @section Test suite |
| @cindex test suite |
| |
| The test suite is fairly weak for most processors. Often it only checks to |
| see if a couple of files can be assembled without the assembler reporting any |
| errors. For more complete testing, write a test which either examines the |
| assembler listing, or runs @code{objdump} and examines its output. For the |
| latter, the Tcl procedure @code{run_dump_test} may come in handy. It takes the |
| base name of a file, and looks for @file{@var{file}.d}. This file should |
| contain as its initial lines a set of variable settings in @samp{#} comments, |
| in the form: |
| |
| @example |
| #@var{varname}: @var{value} |
| @end example |
| |
| The @var{varname} may be @code{objdump}, @code{nm}, or @code{as}, in which case |
| it specifies the options to be passed to the specified programs. Exactly one |
| of @code{objdump} or @code{nm} must be specified, as that also specifies which |
| program to run after the assembler has finished. If @var{varname} is |
| @code{source}, it specifies the name of the source file; otherwise, |
| @file{@var{file}.s} is used. If @var{varname} is @code{name}, it specifies the |
| name of the test to be used in the @code{pass} or @code{fail} messages. |
| |
| The non-commented parts of the file are interpreted as regular expressions, one |
| per line. Blank lines in the @code{objdump} or @code{nm} output are skipped, |
| as are blank lines in the @file{.d} file; the other lines are tested to see if |
| the regular expression matches the program output. If it does not, the test |
| fails. |
| |
| Note that this means the tests must be modified if the @code{objdump} output |
| style is changed. |
| |
| @bye |
| @c Local Variables: |
| @c fill-column: 79 |
| @c End: |