The TM Machine Specification
Introduction
TM (Tiny Machine) is a simple target machine. Kenneth Louden created it for his textbook, Compiler Construction: Principles and Practice. TM has an architecture and instruction set complex enough to illustrate the important issues faced when writing a compiler, yet simple enough not to distract us with unnecessary details.
Architecture
TM provides two kinds of memory:
- instruction memory, which is read-only
- data memory
Memory addresses are non-negative integers. When the machine is started, all data memory is set to 0, except for the first memory location. That location contains the value of the highest legal address.
We use an extended version of the TM interpreter that accepts command-line arguments to the TM program and stores them in memory locations 1 through n, where n is the number of command-line arguments.
TM provides eight registers, numbered 0 through 7. Register 7 is the program counter. The other seven registers are available for program use. When the machine is started, all registers are set to 0.
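As a rough illustration of the start-up state described above, here is a minimal Python sketch. It is a model for reasoning about the machine, not the simulator's code. The name initial_state and the memory size of 1024 are assumptions; this specification does not state the size of data memory.

```python
NUM_REGS = 8       # registers 0..7, per the specification
PC_REG = 7         # register 7 is the program counter
DMEM_SIZE = 1024   # assumed size; the specification does not give one


def initial_state(dmem_size=DMEM_SIZE):
    """Return (registers, data_memory) as they look when the machine starts."""
    regs = [0] * NUM_REGS            # all registers start at 0
    dmem = [0] * dmem_size           # all data memory starts at 0 ...
    dmem[0] = dmem_size - 1          # ... except location 0: highest legal address
    return regs, dmem
```

A compiler's run-time system can read location 0 to find the top of memory, for example to initialize a stack pointer.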
When the machine is started, after memory and registers have been initialized, TM begins execution of the program beginning in the first location of instruction memory. The machine follows a standard fetch-execute cycle:
- fetch the current instruction from the address indicated by the program counter
- increment the program counter
- execute the instruction
The loop terminates when it reaches a HALT instruction or when an error occurs. TM has three native error conditions:
- IMEM_ERR, which occurs in the fetch step whenever the address of the next instruction to be executed is out of bounds
- DMEM_ERR, which occurs in the execute step whenever the address of a memory access is out of bounds
- ZERO_DIV, which occurs in the execute step whenever the divisor to a DIV is zero
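The cycle and its three error conditions can be sketched in Python as follows. This is a simplified model under stated assumptions, not the simulator's implementation: instructions are represented as hypothetical (opcode, a, b, c) tuples, only four opcodes are modelled, and DIV is assumed to truncate toward zero as C integer division does (the specification does not state the rounding rule).

```python
PC = 7  # register 7 is the program counter


def trunc_div(a, b):
    """Integer division that truncates toward zero (assumed, as in C)."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def run(imem, dmem, regs):
    """Run until HALT or an error; return the final status as a string."""
    while True:
        # Fetch: the next instruction's address must be in bounds.
        addr = regs[PC]
        if not 0 <= addr < len(imem):
            return "IMEM_ERR"
        op, a, b, c = imem[addr]
        # Increment the program counter before executing.
        regs[PC] = addr + 1
        # Execute (only four opcodes are modelled in this sketch).
        if op == "HALT":
            return "HALT"
        if op == "LDC":                 # LDC a,b(c): a = constant b
            regs[a] = b
        elif op == "LD":                # LD a,b(c): a = dmem[b + regs[c]]
            address = regs[c] + b
            if not 0 <= address < len(dmem):
                return "DMEM_ERR"
            regs[a] = dmem[address]
        elif op == "DIV":               # DIV a,b,c: a = regs[b] / regs[c]
            if regs[c] == 0:
                return "ZERO_DIV"
            regs[a] = trunc_div(regs[b], regs[c])
        else:
            raise ValueError(f"opcode not modelled in this sketch: {op}")
```

Note that a program that runs off the end of instruction memory without a HALT ends with IMEM_ERR, because the fetch step checks the address before anything is executed.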
Instruction Set
TM provides two kinds of instructions: register-only and register-memory.
Register-Only (RO) Instructions
Register-only (RO) instructions are of the form:
opcode r1,r2,r3
where the ri are legal registers. These are the RO opcodes:
- IN: read an integer from stdin and place result in r1; ignore operands r2 and r3
- OUT: write contents of r1 to stdout; ignore operands r2 and r3
- ADD: add contents of r2 to contents of r3 and place result in r1
- SUB: subtract contents of r3 from contents of r2 and place result in r1
- MUL: multiply contents of r2 and contents of r3 and place result in r1
- DIV: divide contents of r2 by contents of r3 and place result in r1
- HALT: ignore operands and terminate the machine
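To make the operand order concrete, here is a small Python sketch of the RO instructions. The function name exec_ro and its read_int/write_int parameters are hypothetical, introduced only for this illustration. Two assumptions are worth flagging: DIV is taken to truncate toward zero as in C, and Python integers do not overflow the way the simulator's C integers can.

```python
def trunc_div(a, b):
    """Integer division that truncates toward zero (assumed, as in C)."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def exec_ro(op, r1, r2, r3, regs,
            read_int=lambda: int(input()), write_int=print):
    """Execute one RO instruction on regs; return False on HALT, else True."""
    if op == "IN":
        regs[r1] = read_int()               # r2 and r3 are ignored
    elif op == "OUT":
        write_int(regs[r1])                 # r2 and r3 are ignored
    elif op == "ADD":
        regs[r1] = regs[r2] + regs[r3]
    elif op == "SUB":
        regs[r1] = regs[r2] - regs[r3]      # r2 minus r3, in that order
    elif op == "MUL":
        regs[r1] = regs[r2] * regs[r3]
    elif op == "DIV":
        if regs[r3] == 0:
            raise ZeroDivisionError("ZERO_DIV")
        regs[r1] = trunc_div(regs[r2], regs[r3])
    elif op == "HALT":
        return False                        # all operands are ignored
    else:
        raise ValueError(f"not an RO opcode: {op}")
    return True
```

For instance, with 7 in register 2 and 3 in register 3, SUB 1,2,3 leaves 4 in register 1.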
Register-Memory (RM) Instructions
Register-memory (RM) instructions are of the form:
opcode r1,offset(r2)
where the ri are legal registers and offset is an integer offset, which may be negative.
With the exception of the LDC instruction, the expression offset(r2) is used to compute the address of a memory location:
address = (contents of r2) + offset
There are four RM opcodes for memory manipulation:
- LDC: place the constant offset in r1; ignore r2
- LDA: place the address address in r1
- LD: place the contents of data memory location address in r1
- ST: place the contents of r1 in data memory location address
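The difference between these four opcodes is easiest to see side by side. The following Python sketch models them; the name exec_rm_memory is hypothetical, and the sketch signals an out-of-bounds access with a Python IndexError carrying the text DMEM_ERR rather than reproducing the simulator's own reporting.

```python
def exec_rm_memory(op, r1, offset, r2, regs, dmem):
    """Execute LDC, LDA, LD, or ST on the given registers and data memory."""
    if op == "LDC":
        regs[r1] = offset               # constant load; r2 is ignored
        return
    address = regs[r2] + offset         # address = (contents of r2) + offset
    if op == "LDA":
        regs[r1] = address              # the address itself; no memory access
        return
    if op not in ("LD", "ST"):
        raise ValueError(f"not a memory opcode: {op}")
    if not 0 <= address < len(dmem):
        raise IndexError("DMEM_ERR")    # only LD and ST touch data memory
    if op == "LD":
        regs[r1] = dmem[address]
    else:
        dmem[address] = regs[r1]
```

Because LDA performs arithmetic without touching memory, it also serves as an add-immediate instruction: LDA 1,-3(2) places the contents of register 2, minus 3, in register 1.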
There are six RM opcodes for branching. If the value of r1 satisfies the opcode's condition, then branch to the instruction at instruction memory location address.
- JEQ: equal to 0
- JNE: not equal to 0
- JLT: less than 0
- JLE: less than or equal to 0
- JGT: greater than 0
- JGE: greater than or equal to 0
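The six conditions map directly onto comparisons against zero. Here is a brief Python sketch; the names CONDITIONS and exec_jump are hypothetical. It assumes the program counter was already incremented during the fetch step, as the fetch-execute cycle specifies.

```python
import operator

PC = 7  # register 7 is the program counter

# Each branch opcode compares the contents of r1 with 0.
CONDITIONS = {
    "JEQ": operator.eq,
    "JNE": operator.ne,
    "JLT": operator.lt,
    "JLE": operator.le,
    "JGT": operator.gt,
    "JGE": operator.ge,
}


def exec_jump(op, r1, offset, r2, regs):
    """Execute a branch; return True if the branch was taken."""
    address = regs[r2] + offset          # same address rule as LD and ST
    if CONDITIONS[op](regs[r1], 0):
        regs[PC] = address               # taken: overwrite the program counter
        return True
    return False                         # not taken: fall through
```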
Notes
All arithmetic is done with registers (not memory locations) and on integers. Floating-point numbers must be simulated in the run-time system.
There are no restrictions on the usage of registers. For example, the source and target registers for an operation can be the same.
This is also true of the program counter, Register 7. For example:
- To branch unconditionally to an instruction, a program can load the target address into the PC using an LDA instruction.
- To branch unconditionally to an instruction whose address is stored in data memory, a program can load the target address into the PC using an LD instruction.
- To branch conditionally to an instruction whose address is relative to the current position in the program, a program can use the PC as r2 in any of the Jxx instructions.
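When a code generator emits PC-relative branches, it has to account for the fact that the PC is incremented before the instruction executes, so the offset is measured from the instruction after the branch. The Python sketch below shows that calculation. The helper names are hypothetical, and the emitted text follows the line-number, colon, instruction layout used for TM programs.

```python
def relative_offset(here, target):
    """Offset to use with the PC as r2 for a branch located at `here`.

    When the branch executes, the PC already holds here + 1.
    """
    return target - (here + 1)


def unconditional_jump(here, target):
    """Emit an LDA that loads the target address into the PC (register 7)."""
    return f"{here}: LDA 7,{relative_offset(here, target)}(7)"


def conditional_jump(here, opcode, reg, target):
    """Emit a Jxx on register `reg` that uses the PC as r2."""
    return f"{here}: {opcode} {reg},{relative_offset(here, target)}(7)"
```

A branch at location 5 to location 9 therefore uses offset 3, and a branch back to itself uses offset -1.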
The TM Simulator
We do not have a hardware realization of the TM architecture. We do have a TM virtual machine, implemented as a C program. This program accepts assembly language programs written for TM and executes them according to the machine's specification.
We have corrected an unsafe operation in the original TM simulator from Louden and extended it to:
- accept command-line arguments,
- run TM files without human interaction, and
- report execution times.
You can download the corrected and extended TM simulators, along with sample programs, as a zip file or as individual files.
Input to the VM
The VM accepts a text file as a program to execute.
Each statement in a program consists of a line number, a colon, an assembly language instruction, and an optional comment.
[line number]: [instruction] [comment]
For example:
5: SUB 0,0,2 r0 = r0 - r2
The program command may not contain a tab character.
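For readers writing tools around TM programs, here is a Python sketch that splits one statement into its parts. It is stricter than the real simulator is likely to be: it accepts only single-digit registers 0 through 7, expects the operands to contain no spaces, and rejects any line containing a tab, which is a conservative reading of the rule above. The name parse_line and the regular expressions are assumptions of this sketch.

```python
import re

RO_OPS = {"IN", "OUT", "ADD", "SUB", "MUL", "DIV", "HALT"}
RM_OPS = {"LDC", "LDA", "LD", "ST", "JEQ", "JNE", "JLT", "JLE", "JGT", "JGE"}

# line number, colon, opcode, operands, then an optional comment
LINE_RE = re.compile(r" *(\d+) *: *([A-Z]+) +(\S+) *(.*)")
RO_RE = re.compile(r"([0-7]),([0-7]),([0-7])")        # r1,r2,r3
RM_RE = re.compile(r"([0-7]),(-?\d+)\(([0-7])\)")     # r1,offset(r2)


def parse_line(text):
    """Return (location, opcode, operand_tuple, comment) for one statement."""
    if "\t" in text:
        raise ValueError("tab characters are not accepted")
    match = LINE_RE.fullmatch(text.rstrip("\n"))
    if match is None:
        raise ValueError(f"not a TM statement: {text!r}")
    location, opcode, operands, comment = match.groups()
    if opcode in RO_OPS:
        fields = RO_RE.fullmatch(operands)
    elif opcode in RM_OPS:
        fields = RM_RE.fullmatch(operands)
    else:
        raise ValueError(f"unknown opcode: {opcode}")
    if fields is None:
        raise ValueError(f"bad operands for {opcode}: {operands}")
    numbers = tuple(int(f) for f in fields.groups())
    return int(location), opcode, numbers, comment.strip()
```

For an RM instruction the operand tuple is (r1, offset, r2), in the order the fields appear in the source text.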
As noted in the README file, the VM limits the names of TM
source files to 20 characters. You can change this
limit by changing the FILE[20] declaration on
line 123 of the tm.c source file.
Interaction with the VM
Invoke the virtual machine with the name of a TM assembly
language program as an argument. If the filename does
not have an extension, the simulator assumes .tm.
The simulator then requests a command. The basic commands for running the program are:
- g: runs the assembly language program. This will execute the program until it reaches a HALT instruction.
- s n: steps through the execution of the next n instructions. n defaults to 1.
- c: clears the simulator, so that the program can be run fresh.
Several other commands accepted by the simulator provide rudimentary debugging capabilities:
- p: toggles the printing of the number of instructions executed for each g run.
- t: toggles the printing of an instruction trace for each g run.
- r: prints the current contents of the registers.
- i loc: prints the contents of the instruction memory loc, which defaults to 0. You may give a second argument n to print n instructions at one time.
- d loc: prints the contents of the data memory loc, which defaults to 0. Again, you may give a second argument n to print n data locations at one time.
Finally, there are two general commands:
- h: prints a list of all the commands accepted by the simulator.
- q: quits the simulator.
Command-Line Arguments
We use a version of the TM VM that is identical to the machine described in Louden's textbook, with one exception. Our simulator has been extended to accept command-line arguments to assembly-language programs. These arguments are placed by the VM at the base of the data memory.
For example, we can invoke the TM VM as follows:
office > tm factorial-cli.tm 10
TM simulation (enter h for help)...
Enter command: g
OUT instruction prints: 3628800
HALT: 0,0,0
Halted
This instruction loads the command-line argument 10 into register 0:
2: LD 0,1(0) ; loads arg from DMEM location 1
If the user provides multiple command-line arguments, they will be placed in consecutive data memory locations beginning at location 1.
Note: If a TM program expects n command-line arguments, then the program should not place any static data objects in the first n spots of data memory.
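The placement rule and the note about static data can be summarized in a short Python sketch. The function load_arguments is hypothetical and only models the behavior described here: it stores the arguments in locations 1 through n and returns n + 1, the first location a program can safely use for its own static data.

```python
def load_arguments(dmem, argv):
    """Store integer arguments in dmem[1..n]; return the first free location."""
    values = [int(arg) for arg in argv]
    if len(values) > len(dmem) - 1:
        raise ValueError("too many arguments for data memory")
    for location, value in enumerate(values, start=1):
        dmem[location] = value           # location 0 is left untouched
    return len(values) + 1               # static data should start here
```

With the single argument 10, location 1 holds 10 and static data can begin at location 2. The instruction LD 0,1(0) then loads the argument, provided register 0 still holds its initial value of 0.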