CS 4550 TM Spec

The TM Machine Specification

Introduction

TM (Tiny Machine) is a simple target machine that Kenneth Louden created for his textbook, Compiler Construction: Principles and Practice. Its architecture and instruction set are complex enough to illustrate the important issues faced when writing a compiler, yet simple enough not to distract us with unnecessary details.

Architecture

TM provides two kinds of memory:

instruction memory, which is read-only
data memory

Memory addresses are non-negative integers. When the machine is started, all data memory is set to 0 except location 0, which contains the value of the highest legal data address.
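As a rough illustration, the reset rule for data memory could be written in C, the language of the simulator itself. This is a minimal sketch, not code from tm.c: the function name init_data_memory and the size of 1024 words are assumptions made for the example.

```c
#include <string.h>

/* Assumed size for illustration only; the real simulator fixes its own limit. */
#define DMEM_SIZE 1024

/* Hypothetical reset routine: zero every data word, then store the highest
   legal address (size - 1) in location 0. */
void init_data_memory(int dmem[], int size) {
    memset(dmem, 0, (size_t)size * sizeof dmem[0]);
    dmem[0] = size - 1;
}
```

A TM program can read location 0 at startup to learn how much data memory it has.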

We use an extended version of the TM interpreter that accepts command-line arguments to the TM program and stores them in data memory locations 1 through n, where n is the number of command-line arguments.
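The argument layout can be sketched as a small C helper. This is an assumption-laden illustration, not the extended simulator's code: store_program_args is a hypothetical name, it assumes the program's arguments have already been separated from the simulator's own arguments, and it assumes each argument is converted with atoi (the spec does not say how conversion or bad input is handled).

```c
#include <stdlib.h>

/* Hypothetical helper: store the TM program's arguments in data memory
   locations 1..nargs. Location 0 (the highest legal address) is untouched.
   Returns nargs, or -1 if location nargs would be out of bounds. */
int store_program_args(int dmem[], int size, int nargs, const char *args[]) {
    if (nargs >= size)                 /* need locations 1..nargs < size */
        return -1;
    for (int i = 0; i < nargs; i++)
        dmem[i + 1] = atoi(args[i]);   /* assumption: atoi, no error checking */
    return nargs;
}
```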

TM provides eight registers, numbered 0 through 7. Register 7 is the program counter. The other seven registers are available for program use. When the machine is started, all registers are set to 0.

When the machine is started, after memory and registers have been initialized, TM begins executing the program at the first location of instruction memory (address 0, since the program counter starts at 0). The machine follows a standard fetch-execute cycle:

fetch the current instruction from the address indicated by the program counter
increment the program counter
execute the instruction

The loop terminates when it reaches a HALT instruction or when an error occurs. TM has three native error conditions:

IMEM_ERR, which occurs in the fetch step whenever the address of the next instruction to be executed is out of bounds
DMEM_ERR, which occurs in the execute step whenever the address of a memory access is out of bounds
ZERO_DIV, which occurs in the execute step whenever the divisor to a DIV is zero
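The cycle and its fetch-time error check can be sketched in C. This is a hypothetical illustration, not the simulator's code: the Instr layout, the three-opcode subset, and the names step and run are assumptions, and only IMEM_ERR is modeled (DMEM_ERR and ZERO_DIV arise in the execute step of instructions left out here). Note that the PC is incremented before the instruction executes.

```c
#define NUM_REGS 8
#define PC_REG   7

/* A tiny illustrative subset of opcodes. Fields a1..a3 follow the operand
   order as written, so for LDC r1,offset(r2): a1 = r1, a2 = offset, a3 = r2. */
typedef enum { OP_HALT, OP_ADD, OP_LDC } Opcode;
typedef struct { Opcode op; int a1, a2, a3; } Instr;
typedef enum { STEP_OKAY, STEP_HALT, STEP_IMEM_ERR } StepResult;

/* One pass of the fetch-execute cycle. */
StepResult step(const Instr imem[], int imem_size, int reg[]) {
    int pc = reg[PC_REG];
    if (pc < 0 || pc >= imem_size)      /* fetch: address out of bounds */
        return STEP_IMEM_ERR;
    Instr cur = imem[pc];               /* fetch */
    reg[PC_REG] = pc + 1;               /* increment the PC */
    switch (cur.op) {                   /* execute */
    case OP_HALT: return STEP_HALT;
    case OP_ADD:  reg[cur.a1] = reg[cur.a2] + reg[cur.a3]; break;
    case OP_LDC:  reg[cur.a1] = cur.a2; break;
    }
    return STEP_OKAY;
}

/* Repeat the cycle until HALT or an error stops the machine. */
StepResult run(const Instr imem[], int imem_size, int reg[]) {
    StepResult r;
    do {
        r = step(imem, imem_size, reg);
    } while (r == STEP_OKAY);
    return r;
}
```

Running the four instructions LDC 1,5(0); LDC 2,7(0); ADD 0,1,2; HALT leaves 12 in register 0 and 4 in the PC, because the PC has already moved past the HALT when it executes.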

Instruction Set

TM provides two kinds of instructions: register-only and register-memory.

Register-Only (RO) Instructions

opcode r1,r2,r3

where the ri are legal registers. These are the RO opcodes:

IN — read an integer from stdin and place result in r1; ignore operands r2 and r3
OUT — write contents of r1 to stdout; ignore operands r2 and r3
ADD — add contents of r2 to contents of r3 and place result in r1
SUB — subtract contents of r3 from contents of r2 and place result in r1
MUL — multiply contents of r2 and contents of r3 and place result in r1
DIV — divide contents of r2 by contents of r3 and place result in r1
HALT — ignore operands and terminate the machine
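The RO semantics can be summarized as a C sketch. The enum and function names are hypothetical, IN is left out to avoid inventing input-error behavior, and the OUT message mirrors the simulator output shown in the Command-Line Arguments section. Division uses C's `/`, which truncates toward zero; the spec does not say how TM rounds.

```c
#include <stdio.h>

typedef enum { RO_OUT, RO_ADD, RO_SUB, RO_MUL, RO_DIV, RO_HALT } RoOp;
typedef enum { RO_OKAY, RO_HALTED, RO_ZERO_DIV } RoResult;

/* Execute "op r1,r2,r3" on the register file reg[]. */
RoResult exec_ro(RoOp op, int r1, int r2, int r3, int reg[]) {
    switch (op) {
    case RO_OUT:                                   /* r2, r3 ignored */
        printf("OUT instruction prints: %d\n", reg[r1]);
        break;
    case RO_ADD: reg[r1] = reg[r2] + reg[r3]; break;
    case RO_SUB: reg[r1] = reg[r2] - reg[r3]; break;   /* r2 - r3 */
    case RO_MUL: reg[r1] = reg[r2] * reg[r3]; break;
    case RO_DIV:
        if (reg[r3] == 0)
            return RO_ZERO_DIV;                    /* r1 left unchanged */
        reg[r1] = reg[r2] / reg[r3];               /* truncates toward zero */
        break;
    case RO_HALT:
        return RO_HALTED;                          /* all operands ignored */
    }
    return RO_OKAY;
}
```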

Register-Memory (RM) Instructions

opcode r1,offset(r2)

where the ri are legal registers and offset is an integer offset. offset may be negative.

With the exception of the LDC instruction, the expression offset(r2) is used to compute the address of a memory location:

address = (contents of r2) + offset

There are four RM opcodes for memory manipulation:

LDC — place the constant offset in r1; ignore r2
LDA — place the computed address itself in r1; no memory is accessed
LD — place the contents of data memory location address in r1
ST — place the contents of r1 in data memory location address
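A C sketch makes the difference between the four memory opcodes concrete, including where DMEM_ERR can arise. The names exec_rm_mem, RmMemOp, and RmResult are hypothetical; this illustrates the rules above rather than reproducing the simulator.

```c
typedef enum { RM_LDC, RM_LDA, RM_LD, RM_ST } RmMemOp;
typedef enum { RM_OKAY, RM_DMEM_ERR } RmResult;

/* Execute "op r1,offset(r2)" against registers reg[] and data memory dmem[]. */
RmResult exec_rm_mem(RmMemOp op, int r1, int offset, int r2,
                     int reg[], int dmem[], int dmem_size) {
    int address = reg[r2] + offset;        /* computed, but unused by LDC */
    switch (op) {
    case RM_LDC:
        reg[r1] = offset;                  /* the constant itself; r2 ignored */
        break;
    case RM_LDA:
        reg[r1] = address;                 /* the address itself; no memory access */
        break;
    case RM_LD:
        if (address < 0 || address >= dmem_size)
            return RM_DMEM_ERR;
        reg[r1] = dmem[address];
        break;
    case RM_ST:
        if (address < 0 || address >= dmem_size)
            return RM_DMEM_ERR;
        dmem[address] = reg[r1];
        break;
    }
    return RM_OKAY;
}
```

For instance, with 0 in register 0, `LD 0,1(0)` reads data location 1, while `LDA 0,1(0)` simply puts the number 1 in register 0.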

There are six RM opcodes for branching. If the value of r1 satisfies the opcode's condition, then branch to the instruction at instruction memory location address.

JEQ — equal to 0
JNE — not equal to 0
JLT — less than 0
JLE — less than or equal to 0
JGT — greater than 0
JGE — greater than or equal to 0
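The six conditions all compare the contents of r1 against 0, so they fit in one small C function. This is a hypothetical sketch (branch_taken and exec_jump are illustrative names): when the condition holds, the target address reg[r2] + offset is written into register 7, the PC.

```c
typedef enum { JEQ, JNE, JLT, JLE, JGT, JGE } JumpOp;

/* Return nonzero if value (the contents of r1) satisfies the condition. */
int branch_taken(JumpOp op, int value) {
    switch (op) {
    case JEQ: return value == 0;
    case JNE: return value != 0;
    case JLT: return value <  0;
    case JLE: return value <= 0;
    case JGT: return value >  0;
    case JGE: return value >= 0;
    }
    return 0;
}

/* Execute "Jxx r1,offset(r2)": if the condition holds, set the PC. */
void exec_jump(JumpOp op, int r1, int offset, int r2, int reg[]) {
    if (branch_taken(op, reg[r1]))
        reg[7] = reg[r2] + offset;         /* register 7 is the PC */
}
```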

Notes

All arithmetic is done with registers (not memory locations) and on integers. Floating-point numbers must be simulated in the run-time system.

There are no restrictions on the usage of registers. For example, the source and target registers for an operation can be the same.

This is also true of the program counter, register 7. For example:

To branch unconditionally to an instruction, a program can load the target address into the PC using an LDA instruction.
To branch unconditionally to an instruction whose address is stored in data memory, a program can load the target address into the PC using an LD instruction.
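To branch conditionally to an instruction whose address is relative to the current position in the program, a program can use the PC as r2 in any of the Jxx instructions. Because the PC has already been incremented during the fetch step, the offset is relative to the instruction that follows the current one.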
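The three idioms can be modeled in C to show which address each one lands on. These helpers are hypothetical illustrations, not simulator code; each assumes the fetch step has already set reg[7] to the address after the instruction being executed, and ld_to_pc omits the DMEM_ERR bounds check for brevity.

```c
#define PC 7

/* LDA 7,offset(r2): unconditional jump to reg[r2] + offset. */
void lda_to_pc(int reg[], int offset, int r2) {
    reg[PC] = reg[r2] + offset;
}

/* LD 7,offset(r2): unconditional jump to an address stored in data memory.
   No bounds check here; a real machine would report DMEM_ERR. */
void ld_to_pc(int reg[], const int dmem[], int offset, int r2) {
    reg[PC] = dmem[reg[r2] + offset];
}

/* JEQ r1,offset(7): conditional jump relative to the next instruction. */
void jeq_relative(int reg[], int r1, int offset) {
    if (reg[r1] == 0)
        reg[PC] = reg[PC] + offset;
}
```

For example, while the instruction at address 5 executes, the PC already holds 6, so `JEQ 0,2(7)` transfers control to address 8, skipping the two instructions at 6 and 7.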

The TM Simulator

We do not have a hardware realization of the TM architecture. We do have a TM virtual machine, implemented as a C program. This program accepts assembly language programs written for TM and executes them according to the machine's specification.

We have corrected an unsafe operation in the original TM simulator from Louden and extended it to:

accept command-line arguments,
run TM files without human interaction, and
report execution times.

You can download the corrected and extended TM simulators, along with sample programs, as a zip file or as individual files.

Input to the VM

The VM accepts a text file as a program to execute.

Each statement in a program consists of a line number, a colon, an assembly language instruction, and an optional comment.

[line number]: [instruction] [comment]

For example:

5: SUB 0,0,2     r0 = r0 - r2

A program statement may not contain a tab character.
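The statement format can be recognized with sscanf. This is only a sketch that assumes well-formed input and is not taken from tm.c: the Stmt type, parse_stmt, and is_ro_opcode are hypothetical names. The opcode decides whether the operands follow the RO form r1,r2,r3 or the RM form r1,offset(r2), and anything after the operands is treated as the comment.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical parsed statement. For RO instructions a, b, c are r1, r2, r3;
   for RM instructions they are r1, offset, r2. */
typedef struct { int loc; char op[8]; int a, b, c; int is_rm; } Stmt;

static int is_ro_opcode(const char *op) {
    static const char *ro[] = { "HALT", "IN", "OUT", "ADD", "SUB", "MUL", "DIV" };
    for (size_t i = 0; i < sizeof ro / sizeof ro[0]; i++)
        if (strcmp(op, ro[i]) == 0)
            return 1;
    return 0;
}

/* Parse "[line number]: [instruction] [comment]". Returns 1 on success.
   Any opcode not in the RO list is assumed to be RM (no validation). */
int parse_stmt(const char *line, Stmt *s) {
    int n = 0;
    if (sscanf(line, " %d : %7s%n", &s->loc, s->op, &n) != 2)
        return 0;
    const char *rest = line + n;           /* operands, then the comment */
    s->is_rm = !is_ro_opcode(s->op);
    if (s->is_rm)                          /* r1,offset(r2) */
        return sscanf(rest, " %d , %d ( %d", &s->a, &s->b, &s->c) == 3;
    return sscanf(rest, " %d , %d , %d", &s->a, &s->b, &s->c) == 3;  /* r1,r2,r3 */
}
```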

As noted in the README file, the VM limits the names of TM source files to 20 characters. You can change this limit by changing the FILE[20] declaration on line 123 of the tm.c source file.

Interaction with the VM

Invoke the virtual machine with the name of a TM assembly language program as an argument. If the filename does not have an extension, the simulator assumes .tm.

The simulator then requests a command. The basic commands for running the program are:

g — runs the assembly language program, executing it until it reaches a HALT instruction or an error occurs.
s n — steps through the execution of the next n instructions. n defaults to 1.
c — clears the simulator, so that the program can be run fresh.

Several other commands accepted by the simulator provide rudimentary debugging capabilities:

p — toggles the printing of the number of instructions executed for each g run.
t — toggles the printing of an instruction trace for each g run.
r — prints the current contents of the registers.
i loc — prints the contents of instruction memory location loc, which defaults to 0. You may give a second argument n to print n instructions at one time.
d loc — prints the contents of data memory location loc, which defaults to 0. Again, you may give a second argument n to print n data locations at one time.

Finally, there are these commands:

h — prints a list of all the commands accepted by the simulator.
q — quits the simulator.

Command-Line Arguments

We use a version of the TM VM that matches the machine described in Louden's textbook except for the extensions listed above. The extension that affects TM programs directly is support for command-line arguments to assembly-language programs: the VM places these arguments at the base of data memory, starting at location 1.

For example, we can invoke the TM VM as follows:

office > tm factorial-cli.tm 10
TM simulation (enter h for help)...
Enter command: g
OUT instruction prints: 3628800
HALT: 0,0,0
Halted

This instruction loads the command-line argument 10 into register 0:

2:     LD  0,1(0)    ; loads arg from DMEM location 1

If the user provides multiple command-line arguments, they are placed in consecutive data memory locations beginning at location 1.

Note: If a TM program expects n command-line arguments, then the program should not place any static data objects in data memory locations 1 through n, which hold those arguments.