A DESIGN OF ASSEMBLER DATA STRUCTURES

[The following (till the line "Assignment 1 ends") is a suitable response to
the Assignment 1 given during Sept 2000.]

/* The following data structures may be used in a 2-pass assembler - */

/* Mnemonic Table : */

#define MAX_MNE_LEN 6   /* max no of char in a mnemonic opcode */
#define MAX_OPRND   2   /* max no of operands in an instr */

typedef enum mne_class {
    IS,     /* Imperative statement */
    DS,     /* Storage declaration statement */
    AD      /* Assembler directive statement */
} mne_class_t;

typedef enum oprnd_typ {    /* Operand types - depends on the language */
    NONE = 0,
    REG,    /* Register operand */
    NUM,    /* Numeric operand */
    ADDR    /* Address operand */
} oprnd_typ_t;

typedef struct opcod {
    char        mne[MAX_MNE_LEN+1];
    mne_class_t type;
    oprnd_typ_t oprnd[MAX_OPRND];
    short       len;        /* no of bytes */
    int         (*fn)();    /* function corresponding to an assembler directive */
} opcod_t;

extern opcod_t optab[];         /* To be initialised during definition */
extern const int max_opcodes;   /* To be initialised during definition as - */
                                /* (sizeof( optab ) / sizeof( opcod_t )) */
#define MAX_OPCODES (max_opcodes)

/* Symbol Table */

#define MAX_SYM_L 15    /* max len of a symbol */

typedef long addr_t;

typedef enum {
    DATA,   /* Symbol represents a data addr */
    CODE,   /* Symbol represents a code addr */
    OTHER
} obj_typ_t;

typedef struct symbol {
    char      str[MAX_SYM_L+1];
    addr_t    address;
    obj_typ_t type;
    short     size;     /* valid only for type==DATA */
} symbol_t;

#define MAX_SYMBOLS 50

extern symbol_t symtab[MAX_SYMBOLS];    /* Symbol table ! */
extern short n_sym;     /* no of syms in symtab : idx for next */

/* Literals Table */

typedef enum {
    NUM_1B,     /* Single byte number */
    NUM_2B,     /* 2-byte number */
    NUM_4B,     /* 4-byte number */
    STRING      /* String */
} lit_type_t;

#define MAX_LIT_L 20    /* a literal string can be up to 20 bytes */

typedef struct literal {
    lit_type_t type;
    char       value[MAX_LIT_L+1];
    addr_t     address;
} literal_t;

#define MAX_LITERALS  50
#define MAX_LIT_POOLS 20

extern literal_t littab[MAX_LITERALS];  /* Literals table ! */
extern int pooltab[MAX_LIT_POOLS];      /* Literal pools table ! */
extern short n_lit,     /* no of literals : index for next */
             n_pool;    /* no of lit pools : index for next */

/* Intermediate code record */

typedef struct oprnd {
    oprnd_typ_t type;
    int         value;  /* Reg #, symtab/littab/pooltab idx, value */
    short       size;
} oprnd_t;

typedef struct icr {
    short   opcod_idx;  /* index to optab */
    oprnd_t oprnds[MAX_OPRND];
} icr_t;

/*
 * Intermediate code records may be prepared for each instruction in a buffer
 * and then written to an intermediate-code-file.
 */

/* Location Counter */

extern short loc_cntr;


Usage of the Above Data Structures -
====================================

1. Mnemonics Table optab[] :

Initialisation -
The optab is initialised statically, i.e., during definition. It is filled
with the opcode mnemonics of the assembly language. It is not changed while
the assembler works.

Reference -
During pass-I of the assembler, optab[] is accessed to determine the
attributes of the mnemonic opcode found in an assembly statement. If the
lexical analyser is constructed using a tool such as lex, it can be made to
report the position of an opcode in the table directly. Otherwise, the
position may be determined by searching on the mne field of the records. In
that case, it is useful to have the entries of optab sorted beforehand.

In pass-II, optab is accessed to determine the machine opcodes corresponding
to the imperative statements in the intermediate code.

2. Symbol Table symtab[] :

Pass I -
The symbol table is built up during pass-I of the assembler. The number of
entries in symtab at any time is indicated by the variable n_sym, which is
initially set to 0 and incremented upon making each new entry in symtab.

Whenever a symbol is encountered in the program text, it is looked up in
symtab.

If the current occurrence of the symbol is its definition and it is not
already in the symbol table, a new entry is made in symtab with all fields
filled in, including the location counter value as the address of the symbol.
If the symbol denotes data, the size is known from the statement. If the
symbol denotes a code address, the size field may be left uninitialised.

If the current occurrence of the symbol is its definition and the symbol
already has an entry in symtab, the address field of the entry is updated
using the current value of the location counter. The size field may also be
updated according to the statement defining the symbol.

On the other hand, if the current occurrence of the symbol is only a
reference and it is not already present in symtab, a new entry is made in
symtab without specifying the address field (since the address will be known
only when the symbol is defined later). If the current occurrence is only a
reference and the symbol is already present in symtab, the entry need not be
updated. In both cases of symbol reference, the index of the symbol's entry
in symtab is used in the intermediate code being produced.

Pass II -
The intermediate code produced in pass-I contains indices of various entries
of symtab. In pass-II, while producing the target code, the symtab references
in the intermediate code are replaced by the address fields of the
corresponding entries in symtab.

3. Literals Table littab[] and pooltab[] :

Pass I -
The literals table littab[] is built up during pass-I of the assembler.
The number of entries in littab at any time is indicated by the variable
n_lit, which is initially set to 0 and incremented upon making each new entry
in littab. Whenever a literal is encountered in the program text, it is
entered in littab, and the position (index) of the entry in littab is
recorded for the operand in the intermediate code.

The number of literal pools is indicated by the variable n_pool, which is
initially set to 0 (its value is one less than the number of pools). Upon
encountering an LTORG or the END statement, addresses are assigned to the
literals in littab that have not yet been assigned addresses, i.e., those of
the current pool. The value of the location counter (loc_cntr) is used as the
address of each such literal, and it is then incremented by the size of that
literal (the number of bytes it occupies). The variable n_pool is incremented
to indicate that one pool is over. When n_pool is set to 0, and subsequently
whenever it is incremented to indicate the end of a pool, the entry
pooltab[n_pool] is set to the current value of n_lit. Thus the entries of
pooltab contain the starting positions of the literal pools.

Pass II -
The intermediate code produced in pass-I contains indices of various entries
of littab. In pass-II, while producing the target code, the littab references
in the intermediate code are replaced by the address fields of the
corresponding entries in littab.

4. Intermediate Code :

Pass I -
Intermediate code is produced during pass-I. In the intermediate code, for
each source statement, the mnemonic opcode is replaced by the opcode's entry
number in optab[]. For imperative statements, the symbols or literals used as
operands are replaced by their entry numbers in symtab[] and littab[]
respectively. No label field is present corresponding to the source text.
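The encoding described so far can be sketched in C as below. This is only an
illustration under stated assumptions, not part of the assignment response:
the helpers sym_ref() and make_icr_reg_sym() and the marker ADDR_UNKNOWN are
hypothetical names introduced here, the opcode index is assumed to have been
found in optab[] already, and a statement of the form "<mnemonic> <register>,
<symbol>" is assumed. The type declarations are repeated so that the fragment
compiles on its own.

```c
#include <assert.h>
#include <string.h>

#define MAX_OPRND   2
#define MAX_SYM_L   15
#define MAX_SYMBOLS 50

typedef long addr_t;
typedef enum { NONE = 0, REG, NUM, ADDR } oprnd_typ_t;
typedef enum { DATA, CODE, OTHER } obj_typ_t;

typedef struct symbol {
    char      str[MAX_SYM_L + 1];
    addr_t    address;
    obj_typ_t type;
    short     size;
} symbol_t;

typedef struct oprnd { oprnd_typ_t type; int value; short size; } oprnd_t;
typedef struct icr   { short opcod_idx; oprnd_t oprnds[MAX_OPRND]; } icr_t;

symbol_t symtab[MAX_SYMBOLS];
short    n_sym = 0;

#define ADDR_UNKNOWN (-1L)  /* assumption: marks a symbol not yet defined */

/* Hypothetical helper for a symbol *reference*: return the symtab index of
 * name, adding an entry with an unknown address if it is not present yet. */
int sym_ref(const char *name)
{
    int i;
    for (i = 0; i < n_sym; i++)
        if (strcmp(symtab[i].str, name) == 0)
            return i;                   /* already present: no update */
    assert(n_sym < MAX_SYMBOLS);
    assert(strlen(name) <= MAX_SYM_L);
    strcpy(symtab[n_sym].str, name);
    symtab[n_sym].address = ADDR_UNKNOWN;
    symtab[n_sym].type    = OTHER;
    symtab[n_sym].size    = 0;
    return n_sym++;
}

/* Hypothetical helper: build the record for "<mnemonic> <reg>, <symbol>". */
icr_t make_icr_reg_sym(short opcod_idx, int reg_no, const char *sym)
{
    icr_t r;
    memset(&r, 0, sizeof r);            /* every operand starts as NONE (0) */
    r.opcod_idx       = opcod_idx;      /* index into optab, not the text */
    r.oprnds[0].type  = REG;
    r.oprnds[0].value = reg_no;
    r.oprnds[1].type  = ADDR;
    r.oprnds[1].value = sym_ref(sym);   /* symtab index, not an address */
    return r;
}
```

Two statements that refer to the same symbol receive the same symtab index in
their records, so pass-II can resolve both once the symbol's address is known.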
The intermediate code records are of fixed size and contain room for the
details of MAX_OPRND (two) operands. If a statement has fewer operands, the
"type" field of each unused operand is set to NONE. The intermediate code
record prepared for each source statement is written to an intermediate code
file.

Pass II -
Intermediate code records are read from the file one at a time in pass-II and
target code is produced. The variable loc_cntr is initialised to 0 in this
pass and incremented according to the size of each target statement produced.
This variable is also updated according to the operands of START and ORIGIN
statements.

A variable pool_idx (literal pool table index) is initialised to 0 and
incremented for each LTORG statement encountered in the intermediate code.
The value of pool_idx denotes the current literal pool, whose starting and
ending littab indices are pooltab[pool_idx] and (pooltab[pool_idx+1]-1).
Against an LTORG or END statement in the intermediate code, the actual
representation of the literals in the current pool is produced as the
corresponding target code.

Apart from the literals, there is no direct representation of the assembler
directive statements in the target code. The target code produced against
each intermediate code statement is written to a target code file.

----------------- Assignment 1 ends -----------------
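
The literal pool bookkeeping described under items 3 and 4 can be sketched in
C as below. This is a minimal illustration under stated assumptions rather
than a definitive implementation: the helpers lit_size(), add_literal(),
close_pool() and pool_bounds() are hypothetical names introduced here,
literal values are assumed to be kept as text, strings are assumed to be
stored without a terminating NUL, and -1 is assumed to mark an unassigned
address. The declarations are repeated so that the fragment compiles on its
own.

```c
#include <assert.h>
#include <string.h>

#define MAX_LIT_L     20
#define MAX_LITERALS  50
#define MAX_LIT_POOLS 20

typedef long addr_t;
typedef enum { NUM_1B, NUM_2B, NUM_4B, STRING } lit_type_t;

typedef struct literal {
    lit_type_t type;
    char       value[MAX_LIT_L + 1];
    addr_t     address;
} literal_t;

literal_t littab[MAX_LITERALS];
int       pooltab[MAX_LIT_POOLS];   /* pooltab[0] == 0: first pool start */
short     n_lit = 0, n_pool = 0;
short     loc_cntr = 0;

/* Hypothetical helper: bytes occupied by a literal in the target code. */
static short lit_size(const literal_t *l)
{
    switch (l->type) {
    case NUM_1B: return 1;
    case NUM_2B: return 2;
    case NUM_4B: return 4;
    default:     return (short)strlen(l->value);    /* STRING */
    }
}

/* Pass I: enter a literal in littab and return its index. */
int add_literal(lit_type_t type, const char *text)
{
    assert(n_lit < MAX_LITERALS);
    assert(strlen(text) <= MAX_LIT_L);
    littab[n_lit].type = type;
    strcpy(littab[n_lit].value, text);
    littab[n_lit].address = -1;         /* assigned at LTORG or END */
    return n_lit++;
}

/* Pass I: called on LTORG and END; assigns addresses to the current pool. */
void close_pool(void)
{
    int i;
    for (i = pooltab[n_pool]; i < n_lit; i++) {
        littab[i].address = loc_cntr;
        loc_cntr += lit_size(&littab[i]);
    }
    assert(n_pool + 1 < MAX_LIT_POOLS);
    n_pool++;                           /* one pool is over */
    pooltab[n_pool] = n_lit;            /* start of the next pool */
}

/* Pass II: littab index range of a closed pool (pool_idx < n_pool). */
void pool_bounds(int pool_idx, int *first, int *last)
{
    assert(pool_idx >= 0 && pool_idx < n_pool);
    *first = pooltab[pool_idx];
    *last  = pooltab[pool_idx + 1] - 1;
}
```

Because close_pool() records the start of the next pool as soon as a pool
ends, the end of any closed pool is always available as
pooltab[pool_idx+1]-1, which is the range pass-II walks for each LTORG or END.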