Assembler(1)

Assembler(1)_1

아무 말하는 감자 2022. 9. 22. 21:35

Fig. 2.2: Program from Fig. 2.1 with Object Code

• The same program as in Fig. 2.1, with the generated object code for
each statement
– ‘Loc’ column gives the machine address (in hexadecimal) for each part of the assembled program
 The program is assumed to start at address 1000
– Several functions required to translate source program -> object codes:
1) Convert mnemonic operation codes to their machine language equivalents
 [e.g., translate STL to 14 (line 10)]
2) Convert symbolic operands to their equivalent machine addresses

 [e.g.,translate RETADDR to 1033 (line 10)]
3) Build the machine instructions in the proper format (w/ addressing mode)
4) Convert data constants specified in the source program into their internal machine representations

 [e.g., translate EOF to 454F46 (line 80)]
5) Write the object program and the assembly listing file (: 어떻게 프로그램이 번역되는지 쉽게 알 수 있음)

 [e.g., Fig. 2.3 shows the object program and the assembly listing is similar to Fig.2.2]

left: assembly listing file / right: object program

» An assembly listing file is used for checking how the program is translated.

Fig. 2.2 (Continued)
• All of the functions above can easily be accomplished by sequential processing of the source program, one line at a time.

However, the second function for translating addresses presents a problem!
– Consider the statement <line 10> -> This instruction contains a forward reference

• Because of this, most assemblers make two passes over the source program:
– The 1st pass just scans the source program for label definitions while
assigning addresses to the labels
– The 2nd pass performs most of the actual translation previously described 16

->called two pass algorithm!

Fig. 2.3: Object Program corresponding to Fig. 2.2
• The assembler must write the generated object code into some output
devices (i.e., storage), and then this object program will later be loaded
into memory for execution
• The simple object program format used contains 3 types of records:

– Header record contains the program name, starting address, and length
– Text records contain the translated instructions and data of the program,
together with an indication of the addresses where these are to be loaded
– End record marks the end of the object program and specifies the address
in the program where execution is to begin

Three Types of Records

-? 4바이트?

• Header record
– Col. 1 H
– Col. 2-7 Program name
– Col. 8-13 Starting address of object program (hex)
– Col. 14-19 Length of object program in bytes (hex) (전체 오브젝트 코드의 len)

• Text record (several object code chunk)
– Col. 1 T
– Col. 2-7 Starting address for object code in this record (hex)
– Col. 8-9 Length of object code in this record in bytes (hex)
– Col. 10-69 Object code, represented in hex (2 columns per byte of object
code)

• End record
– Col.1 E
– Col.2-7 Address of first executable instruction in object program (hex)

• The symbol ^ is used to separate fields visually
– Of course, such symbols are not present in the actual object program

• Note that there is no object code corresponding to addresses 1033 - 2038. This storage is simply reserved by the loader for use by the
program during execution.

(이 부분은 RESW와 RESB가 있는 부분, 이 Directive들은 memory allocation을 위한 것이므로, memory allocation은 Assembler가 아닌 Loader가 하는 일이다. 따라서, Assembler에서 딱히 objecct code를 생성하지 않는다...)

The Object File for UNIX (1/2)
• The object file for UNIX systems typically contains 6 distinct pieces:
– Header, describing the size & position of the other 5 pieces of the object file (다른 파트들의 정보)
– Text Segment, containing the machine language code (번역된 머신 코드들)
– Static Data Segment, containing data allocated for the life of the program (프로그램 데이터, 변수들이 저장됨)
 UNIX allows programs to use both static data (allocated throughout the program)
and dynamic data (which can grow or shrink as needed by the program)

? Stack과 Dynamic data는 Heap구조, Stack에는 Local variable등이 저장된다. Static data는 memory 할당 관련, Text는 Text Segment가 저장됨 ?

The Object File for UNIX (2/2)
• The object file for UNIX systems typically contains 6 distinct pieces:
– Relocation Information, identifying instructions and data words that depend
on absolute addresses when the program is loaded into memory

(Loader가 allocation 할때 필요한 정보)
– Symbol Table, containing <lable – address> information about function and
global variables

(mapping table, obj 파일에 포함 X, )
– Debugging Information, containing symbolic information so that a
debugger can associate machine instructions with C source files
 A concise description of how the program modules were compiled

2-pass Assembler (1/2)
• Pass 1: Define symbols

: Symbol 정의 후 Symbol Table을 만드는 것이 목표
– Read a statement in a line of assembly code
– Assign an address to this statement: increasing the address by N (word
addressing or byte addressing) using LOCCTR (location counter, 다음 주소를 저장하는 전역 변수 같은 것,N은 기본적으로 byte단위)
– Save address values assigned to all labels (in symbol tables) for use in pass 2

(Assembler construct entire symbol table)
– Perform some processing of assembler directives, such as constant declaration or space reservation

(like RESW,RESB)
 This includes processing that affects address assignment, such as determining
the length of data areas defined by BYTE, RESW, etc. (address assingnment에 영향을 주는 directive들)

=> Pass1 usually writes an intermediate file that contains each source (중간 파일이 만들어짐, p2에서 사용)
statement together with its assigned address, error indicators, etc.

2-pass Assembler (2/2)
• Pass 2: Assemble instructions & generate object program
– Read in a line of code: Intermediate file created in the 1st pass is used as
the input to the 2nd pass
– Translate operation code, using OP Code Table (mapping information)
– Change labels to addresses, using Symbol Table
– Perform processing of assembler directives not done during pass 1 (pass1에서 처리안된 directive 처리)
_ Produce object program

Two Data Structures of SIC Assembler (1/2)
• Operational Code Table (OPTAB)– Contain mnemonic op code and its machine language equivalent.
 Also, it may contain variable-sized instruction format and length (for more complex
assemblers)

-> OP CODE와 그에 상응하는 ML을 가지고 있음.

– In Pass 1, it is used to look up and validate mnemonic codes.
– In Pass 2, used to translate the op codes to machine language.

– Usually organized as a hash table : read-only라 빠른 성능을 위해
 Key: mnemonic code
 Fast retrieval with a minimum of searching
 Once prepared, the OPTAB is not changed.

-> 어셈블리어 (ex. LDA #5)를 어셈블러를 통해 ML로 변환.

Two Data Structures of SIC Assembler (2/2)
• Symbol Table (SYMTAB)
– Include name and value (address) for each label in the
source program, together with flags (error condition).

-> LABEL과 LABEL의 주소를 mapping한 Table

+ Also, it may contain other information such as a type or a
length.

– In Pass 1, labels are entered into SYMTAB with their
assigned addresses (from LOCCTR : location counter).
– In Pass 2, symbols used as operands are looked up in SYMBOL to obtain the addresses to be inserted in the

assembled instructions.

– Usually hash table : 효율적인 삽입을 위함
 efficient insertion and retrieval are needed.

Let's study with cdoe!

- pass1 Algorithm

- pass2 Algorithm

저작자표시 비영리 동일조건 (새창열림)