Session 25
Techniques for Generating Target Code
Opening Exercise
Last time, we began our study of how to generate target code and saw the idea of a code template for a generic 3AC statement. We can build an entire code generator using a set of templates like that one.
Expressions that cause branches in execution create more
challenges for a code generator. Let's write TM code for an
if expression and consider some of those
challenges.
if (n < 10)
1
else
n * n
Let's assume that:
- Argument n is stored in a stack frame at memory location 105.
- Register 5 is a status pointer that points at the first slot after the arguments, memory location 106.
- Temporary variables t1, t2, t3, ... have been allocated for use in the code and are stored in the same stack frame at memory locations 113, 114, 115, ..., respectively.
- Following TM idiom, Register 0 holds a 0.
A Candidate Solution
First, here is a three-address code version of the program:
t1 := n
t2 := 10
t3 := t1 < t2
IF_NOT t3 GOTO label1    ;; inverted so that 'then' comes next
t4 := 1
GOTO label2
LABEL label1
t5 := n
t6 := n
t4 := t5 * t6
LABEL label2
Notice two things here:
- As we saw in our earlier discussions of three-address code, we store the value of the if expression in a temporary variable, for use within whatever expression contains it. t4 was next in line, so I selected it and assign a value to it in both clauses.
- I use a straight IF operator, which requires either (1) inverting the 'then' and 'else' blocks or (2) inverting the test condition. This time, I chose the latter.
From here, we can generate the TM assembly code:
 3:  LD   1,-1(5)    ;; n is at LOC 105
 4:  LDC  2,10(0)    ;; we need 10 in a register
 5:  SUB  3,1,2      ;; now we can compare r3 to 0 and branch
 6:  JGE  3,3(7)     ;; select inverted Jxx instruction
*                    ;; and branch over the then clause
* then clause
*
 7:  LDC  4,1(0)     ;; load value of the then clause into r4
 8:  ST   4,10(5)    ;; store r4 into memory slot for t4
 9:  LDA  7,4(7)     ;; and branch over else clause
*
* else clause
*
10:  LD   1,-1(5)    ;; reload n from LOC 105
11:  LD   2,-1(5)    ;; reload n from LOC 105
12:  MUL  4,1,2      ;; compute value of else clause into r4
13:  ST   4,10(5)    ;; store r4 into memory slot for t4
I embedded this in code that set up our assumptions and printed the result. It's usually a good idea to make sure that code works as designed!
Because we use Register 5 as a status pointer into the stack frame, the address computations in Instructions 3, 8, 10, 11, and 13 are relative to that pointer. Without such a pointer, you could code them with absolute addresses using Register 0: 105(0), ....
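To make the address arithmetic concrete, here is a minimal Python sketch that turns an absolute memory location into a TM operand. The helper name `operand` and the constant `FRAME_POINTER` are made up for this illustration; the addresses come from the assumptions listed for the opening exercise.

```python
FRAME_POINTER = 106  # address held in Register 5, per the exercise's assumptions


def operand(address, base=FRAME_POINTER, base_reg=5):
    """Format a TM memory operand, offset(base_reg), for an absolute address."""
    return f"{address - base}({base_reg})"


print(operand(105))        # n, relative to Register 5
print(operand(116))        # t4, relative to Register 5
print(operand(105, 0, 0))  # n, as an absolute address through Register 0
```

The first two calls produce the `-1(5)` and `10(5)` operands that appear in the listing, and the third produces the absolute form `105(0)`.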
The choice not to store the temp vars t1 and
t2 to memory works fine in a bit of code this
small, but it will not always work well in general, when we
might run out of registers. The simplest code templates store
the value of every temporary into its stack slot as
soon as the program computes it, and then reload it from memory
on its next use. This approach creates longer — and
slower — TM code. But the code generator is simpler.
Creating templates and applying them verbatim makes this
process more straightforward — and more verbose —
than writing assembly language by hand. Even so, there are
simple techniques we can use when writing a code generator that
produce more efficient code. For example, if Register 0 always
holds a 0, why ever load 0 into a register? Why load n
into two separate registers, in Instructions 10-11?
Generating Target Code with Branches
Computing t3 creates our first problem:
TM assembly language
does not have a generic < operation. All
of its branching op codes compare a value in a register to
zero. How can we compare t1 to t2
then? We can use a simple transformation:
t1 < t2
→
t1 - t2 < 0
... that converts a comparison of two numbers into a comparison of one number to zero. Now we can write:
| 3AC STATEMENT | CODE GENERATED |
|---|---|
| t3 := t1 < t2 | SUB 3, 1, 2 |
| IF_NOT t3 GOTO label1 | JGE 3, ?(7) |
But that still leaves us with a problem: how can the code
generator know to replace the ? with a 3?
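The subtract-then-compare-to-zero transformation can be sketched as a small template function in Python. The table of inverted jumps is my own illustration: only JGE appears in these notes, so check the other opcode names against your TM reference before relying on them. The function name `compare_and_branch` is hypothetical.

```python
# Jump to take when the comparison is FALSE, matching the IF_NOT form.
# Only the "<" -> JGE entry comes from the notes; the rest are assumptions.
INVERTED_JUMP = {
    "<": "JGE", "<=": "JGT",
    ">": "JLE", ">=": "JLT",
    "=": "JNE", "!=": "JEQ",
}


def compare_and_branch(op, dest, left, right, offset="?"):
    """Emit 'left op right' as a subtraction plus a jump on the result vs. zero."""
    return [
        f"SUB {dest},{left},{right}",
        f"{INVERTED_JUMP[op]} {dest},{offset}(7)",
    ]


for instruction in compare_and_branch("<", 3, 1, 2):
    print(instruction)
```

The offset defaults to `?` because, at this point in the generation process, the size of the code to skip is still unknown.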
Generating Jump Targets
How can the code generator know how many instructions to skip? This is a general problem that occurs any time the compiler generates code for a boolean expression, a conditional expression, a function call, or any other expression for which it does not yet know the jump target. The problem is further complicated for us, because TM assembly language does not have labels — so we cannot generate a "jump to location labeled label1" instruction!
(This problem can also occur when the compiler generates code "out of order", say, by generating code for local chunks of the AST independently and then stitching the pieces together at the end.)
One solution is to generate code in two passes. The first pass generates the framework of the target program. The second pass computes the jumps and labels. This approach simplifies the compiler's task but makes it less efficient.
Another approach makes only a single pass over the AST. When the compiler generates a branch whose jump target is undefined, it adds the statement to a list of "jumps to be completed". As soon as the correct label is known, the compiler fills in the slot and removes the statement from the to-do list. This technique is known as backpatching.
A trivial sort of backpatching is to generate target code statements out of order. The TM virtual machine supports this approach, because statements in a TM program are labeled by their integer position in the program. Even if a TM source file is "out of order", the statements are put into the correct order at load time.
We can take advantage of this feature of TM assembly to
backpatch the target of our if expression's jump.
The code generator can wait to write the JGE instruction until
it has generated all of the statements in the then
clause, counting the number of statements generated as it goes.
Then, when it generates the jump statement, it knows exactly
the number of instructions to skip.
So, the generator will produce three assembly statements for the then clause (two for its body and one to branch over the else clause), and then generate the jump statement to skip over them:
| 3AC STATEMENT | CODE GENERATED |
|---|---|
| IF_NOT t3 GOTO label1 | JGE 3, 3(7) |
Recall that the TM virtual machine increments the PC upon loading an instruction, so we add 3 to the PC to jump from Statement 6 to Statement 10.
The code produced by the generator would actually look like this:
 7:  LDC  4,1(0)     ;; load value of the then clause into r4
 8:  ST   4,10(5)    ;; store r4 into memory slot for t4
 9:  LDA  7,4(7)     ;; and branch over else clause
 6:  JGE  3,3(7)     ;; select inverted Jxx instruction
Simple, but nice. This approach works for boolean computations,
such as (a or b), as well as for control structures.
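Here is a minimal Python sketch of that out-of-order trick, assuming the then clause is already available as a list of instruction strings. The function name `emit_if_not` is hypothetical, and the `LDA 7,4(7)` in the example is taken as given even though, in a real generator, its offset poses the same problem.

```python
def emit_if_not(test_reg, then_code, start):
    """Number the 'then' clause first, then emit the jump that skips it.

    then_code: instruction strings without line numbers.
    start: the instruction number reserved for the jump itself.
    """
    lines = [f"{start + 1 + i}: {instr}" for i, instr in enumerate(then_code)]
    # The PC already points one past the jump, so the offset is just the count.
    lines.append(f"{start}: JGE {test_reg},{len(then_code)}(7)")
    return lines


then_clause = ["LDC 4,1(0)", "ST 4,10(5)", "LDA 7,4(7)"]
for line in emit_if_not(3, then_clause, 6):
    print(line)
```

The jump is written last but carries the lower number, so the TM loader places it ahead of the then clause.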
There is one more complication with boolean expressions. In
most programming languages, boolean operators use short-circuit
evaluation. This means that if a evaluates to
true, then the code for the boolean expression must skip the
evaluation of b and the subsequent disjunction.
Klein works this way, so your code templates for the
and and or operators will have to
take that into account.
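One possible shape for a short-circuit `or` template is sketched below in Python. It rests on assumptions that are not stated in these notes: booleans are represented as 0 and 1, both operands are evaluated into the same register, and TM provides a JNE opcode that jumps when a register is nonzero. The function name `gen_or` and the memory offsets in the example are made up.

```python
def gen_or(code_a, code_b, result_reg, start):
    """Emit numbered code for (a or b) with short-circuit evaluation.

    code_a, code_b: instruction lists that each leave 0 or 1 in result_reg.
    """
    lines, n = [], start
    for instr in code_a:
        lines.append(f"{n}: {instr}")
        n += 1
    # If a is true, result_reg already holds the answer: skip all of b.
    lines.append(f"{n}: JNE {result_reg},{len(code_b)}(7)")
    n += 1
    for instr in code_b:
        lines.append(f"{n}: {instr}")
        n += 1
    return lines


# Hypothetical operands: a at -1(5) and b at -2(5).
for line in gen_or(["LD 1,-1(5)"], ["LD 1,-2(5)"], 1, 20):
    print(line)
```

Because both operands land in the same register, no separate instruction is needed to compute the disjunction: whichever operand was evaluated last is the result.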
Backpatching Jump Targets
That's all great, but how can our code generator implement this idea?
Let's take a closer look at the jumps in Instructions 6 and 9 in our opening exercise. Here are the 3AC and assembly code side by side:
t1 := n                       3: LD  1,-1(5)
t2 := 10                      4: LDC 2,10(0)
t3 := t1 < t2                 5: SUB 3,1,2
IF_NOT t3 GOTO label1         6: JGE 3,3(7)
t4 := 1                       7: LDC 4,1(0)
                              8: ST  4,10(5)
GOTO label2                   9: LDA 7,4(7)
LABEL label1
t5 := n                      10: LD  1,-1(5)
t6 := n                      11: LD  2,-1(5)
t4 := t5 * t6                12: MUL 4,1,2
                             13: ST  4,10(5)
LABEL label2                 14: [next instruction]
It's easy to see that the jumps in Instructions 6 and 9 are correct. But how can our code generator produce this code? It doesn't know the size of the THEN or ELSE blocks at the time we come to those lines in the generation process.
As we've just seen, a common solution to the jump target issue is "backpatching": maintaining a list of "jumps to be completed" and filling the target into the generated code as soon as you know the actual target. The TM virtual machine supports a simple form of backpatching: numbered assembly language statements. Our compiler can generate target code statements out of order!
Let's see how that might look in our exercise code. After we
compute n - 10, we are ready to generate a jump:
IF_NOT t3 GOTO label1 6: JGE 3,???(7)
We know that the jump occurs in Instruction 6 of the program
and that it goes to label1, but we don't know what
line of code label1 corresponds to yet. So we save
what we know in a data structure:
jumps_to_complete = [ (6, label1, (JGE 3)) ]
and proceed.
Next, we generate code for the assignment statement and are ready to generate a jump:
t4 := 1 7: LDC 4,1(0)
8: ST 4,10(5)
GOTO label2 9: LDA 7,???(7)
Again, we know the number of the jump instruction, 9, and the
target, label2, but we don't know the line number
for the label yet. So we save what we know:
jumps_to_complete = [ (6, label1, (JGE 3)), (9, label2, unconditional) ]
and proceed.
Next we see a label:
LABEL label1
We now know that label1 corresponds to
Instruction 10 in our program. Let's record the new
information and proceed.
label_data = { label1 : 10 }
We generate several lines of code for the else
clause:
t5 := n 10: LD 1,-1(5)
t6 := n 11: LD 2,-1(5)
t4 := t5 * t6 12: MUL 4,1,2
13: ST 4,10(5)
before running into our next label:
LABEL label2 14: [next instruction]
We now know that label2 corresponds to
Instruction 14 in the program, so record that detail in our
label table:
label_data = { label1 : 10, label2 : 14 }
That is the end of our expression. Now we generate the jumps.
The process is straightforward: iterate through
jumps_to_complete, using information there to
build the instruction and looking up the line number of the
label in label_data. The differences between the
jumps and their targets are 4 and 5, respectively, but remember:
the TM VM advances the PC at the moment it loads an instruction.
So our jump gaps are always one smaller than the difference:
IF_NOT t3 GOTO label1         6: JGE 3,3(7)
GOTO label2                   9: LDA 7,4(7)
And there you have it!
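The final pass over jumps_to_complete can be sketched in a few lines of Python. The tuple layout mirrors the data structures shown in the walkthrough; the function name `backpatch` and the choice to represent a conditional jump as an (opcode, register) pair are assumptions for this illustration.

```python
def backpatch(jumps_to_complete, label_data):
    """Build the deferred jump instructions once every label's line is known."""
    completed = []
    for line, label, kind in jumps_to_complete:
        # The PC already points at line + 1 when the jump executes.
        offset = label_data[label] - (line + 1)
        if kind == "unconditional":
            completed.append(f"{line}: LDA 7,{offset}(7)")
        else:
            opcode, reg = kind
            completed.append(f"{line}: {opcode} {reg},{offset}(7)")
    return completed


jumps_to_complete = [(6, "label1", ("JGE", 3)), (9, "label2", "unconditional")]
label_data = {"label1": 10, "label2": 14}
for instruction in backpatch(jumps_to_complete, label_data):
    print(instruction)
```

Because TM orders statements by number at load time, these two lines can simply be appended to the end of the output file.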
There are many different ways to implement this pattern. For instance, we could have generated jumps as soon as we knew each label's line number. I like the clean style of separating body generation from jump generation in TM, but that's a stylistic preference.
Like so many things in compiler construction, and computer science more generally, this doesn't seem so mysterious after we see it work. At one level, it's all data structures and algorithms.
Now, on to algorithms for generating code and selecting registers.
The Basic Algorithm for Code Generation
At its heart, generating target code comes down to producing one or more target instructions for each node in the AST or for each statement in the intermediate representation. The basic algorithm for code generation will be the same for most operations. Control flow operators are a bit different.
Your code generator for Klein must take into account the features of your target machine, TM. We saw in our opening exercise one of the features of TM that most affects your code generator: All arithmetic in TM is done using register-only instructions.
Here is an algorithm that meets our needs:
x := y op z:
1. Decide where to store the value of y op z.
   a. If x is already in a register, then use that register.
   b. Else if y is already in a register, then use that register.
   c. Otherwise, invoke the function getRegister() to get a free register, and move y there.
2. Look to see where y is currently stored. If it is not already in a register, then invoke getRegister() to get a free register, and move y there.
3. Look to see where z is currently stored. If it is not already in a register, then invoke getRegister() to get a free register, and move z there.
4. Do the operation on the two registers.
5. Copy the value from that register to the memory location for x.
Step 1b takes advantage of the fact that we can do the
operation and store its result in the same register. That
register will now hold the value of x,
not y. If the code generator maintains a list of
which registers hold which objects, it must update the list
whenever it assigns an object to a register and moves it there.
After Step 5, x resides both in its memory
location and in the register we computed it into. The code
generator could free the register that holds x
for another use. This leads to a radical approach that
simplifies code generation at the expense of longer, slower
programs:
Use the same three registers for x,
y, and z every time,
re-loading them even if they are already sitting in a register!
One simple way to generate better code is to save Step 5 until
later, when we are done with the code block or when we need the
register. This saves both program space and execution time.
The cost in the code generator is keeping track of information
in the process of selecting registers, which brings us to the
getRegister() function.
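The five-step algorithm for x := y op z can be sketched in Python as follows. This is one possible reading, not a reference implementation: the class name `CodeGen`, the `where` map from object names to registers, the `offsets` table of stack slots relative to Register 5, and a `get_register` that simply assumes a register is free are all assumptions for the sketch.

```python
class CodeGen:
    def __init__(self, free_regs, offsets):
        self.free = list(free_regs)  # registers not yet handed out
        self.where = {}              # object name -> register holding it
        self.offsets = offsets       # object name -> offset from Register 5
        self.code = []

    def get_register(self):
        return self.free.pop(0)      # simplest case: assume one is free

    def load(self, name):
        """Return the register holding name, loading it from memory if needed."""
        if name not in self.where:
            reg = self.get_register()
            self.code.append(f"LD {reg},{self.offsets[name]}(5)")
            self.where[name] = reg
        return self.where[name]

    def binop(self, opcode, x, y, z):
        # Step 1: reuse x's register (1a), else y's (1b), loading y if needed (1c).
        rx = self.where[x] if x in self.where else self.load(y)
        ry = self.load(y)                             # Step 2
        rz = self.load(z)                             # Step 3
        self.code.append(f"{opcode} {rx},{ry},{rz}")  # Step 4
        # rx now holds x and nothing else, so update the register map.
        for name in [n for n, r in self.where.items() if r == rx]:
            del self.where[name]
        self.where[x] = rx
        self.code.append(f"ST {rx},{self.offsets[x]}(5)")  # Step 5


gen = CodeGen(free_regs=[1, 2, 3, 4], offsets={"t1": 7, "t2": 8, "t3": 9})
gen.binop("SUB", "t3", "t1", "t2")
print("\n".join(gen.code))
```

Delaying or dropping the final store is the optimization the text describes; in this sketch it would mean removing the last line of `binop` and storing x only when its register is reclaimed.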
Quick Exercise
Hint: consider Line 5 and Lines 10-11.
At Line 5, the algorithm would note that t1 was
already in a register, and use it to store the result:
5: SUB 1,1,2
It would then also use Register 1 in Line 6.
At Line 11, by noting that n was already stored
in t5 and Register 1, the algorithm would
not need to generate Line 11. Line 12 would get both of its
operands from Register 1:
11: MUL 4,1,1
This shows how even a simple getRegister()
function can help us generate better code. Knowing only that
a value is already in a register can prevent the generator
from producing an unnecessary line of code.
What can a more sophisticated getRegister()
function do? For instance, what if it could see at Line 5 that
it would need n again (twice) later in the
program and choose a different register for the result?
Implementing a getRegister() Utility
At first, all registers are free, except for the program counter (register 7) and any registers that your code generator reserves for particular uses (such as register 0 as a zero and registers 5-6 for stack pointers). So the code generator can simply grab an empty register when it needs one.
- Return the next available register.
Eventually, though, the generator may have used all of the other registers, and each will hold a value that was used in the program. How do registers become available again?
If the code generator always stores temporary results
to memory, into the slots allocated to the temporary and
user-defined objects, then the program will never have many
registers in use for very long. In such a case,
getRegister() won't have to do much.
However, such a compiler will generate inefficient code! Every time it needs a value, it will have to load the value into a register. This is especially wasteful given that most programs exhibit locality of reference: when a value is used, it likely will be used again soon in the instructions that follow.
There are many ways for getRegister() to generate
more efficient code, by doing more work at compile time. The
simplest is this:
- Return the next available register.
- If all registers are in use, then free one by writing its value back to memory.
This technique requires the code generator to maintain the
minimal amount of information that we have considered thus far:
a register map of (register, object) pairs.
Keep in mind that a register can hold two objects at one time.
For example, after x := y (a copy instruction),
the same register holds both x and
y.
To implement this step, getRegister() could simply
circle back around to the beginning of its list and free the
registers in the same order it allocated them. But this might
violate the locality-of-reference principle and thus be
wasteful.
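The two-step version of getRegister(), with registers freed in the order they were allocated, might look like the following Python sketch. The class name `RegisterPool`, the `offsets` table of stack slots relative to Register 5, and the choice to pass the object's name to `get_register` are assumptions made for the illustration.

```python
from collections import OrderedDict


class RegisterPool:
    def __init__(self, registers, offsets):
        self.free = list(registers)
        self.in_use = OrderedDict()  # register -> object names, oldest first
        self.offsets = offsets       # object name -> offset from Register 5
        self.code = []               # spill instructions generated so far

    def get_register(self, name):
        """Return a register for name, spilling the oldest one if none is free."""
        if self.free:
            reg = self.free.pop(0)
        else:
            reg, names = self.in_use.popitem(last=False)  # oldest allocation
            for old in names:  # a register can hold more than one object
                self.code.append(f"ST {reg},{self.offsets[old]}(5)")
        self.in_use[reg] = [name]
        return reg


pool = RegisterPool(registers=[1, 2], offsets={"a": 7, "b": 8, "c": 9})
pool.get_register("a")
pool.get_register("b")
print(pool.get_register("c"))  # no free register left, so one is spilled
print(pool.code)
```

The third request reclaims the register given to `a`, the oldest allocation, which is exactly the behavior that can work against locality of reference.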
With a little more information, getRegister() can
do better:
- Return the next available register.
- If all registers are in use, then free one by writing its value back to memory.
- Free a register immediately if the object it holds has no next use in the block.
If an object won't be used again, there is no need to waste a register holding it. But how can the code generator know this? Before generating code, the compiler can make a forward scan of the three-address code program to see where each value is used next after each instruction and which objects are "live" at the end of the code block.
In Klein, the only code blocks we have are function bodies and
the 'then' and 'else' clauses of if expressions,
so scanning for next use to the end of a block is
straightforward, and usually quite fast.
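As a rough preview, next-use information for one block can be gathered with a short Python function. This sketch walks the block from its last statement to its first, which is one common way to collect the information; that direction, the (dest, operands) tuple format, and the function name `next_uses` are all choices made for this illustration.

```python
def next_uses(block, live_at_end=()):
    """For each statement, map each name to the index of its next use afterward.

    block: list of (dest, operands) tuples, one per 3AC statement.
    live_at_end: names still needed after the block; their next use is
    recorded as len(block).
    """
    upcoming = {name: len(block) for name in live_at_end}
    result = [None] * len(block)
    for i in range(len(block) - 1, -1, -1):
        dest, operands = block[i]
        result[i] = dict(upcoming)  # next uses as seen just after statement i
        upcoming.pop(dest, None)    # dest is overwritten here
        for name in operands:
            upcoming[name] = i      # statement i uses these names
    return result


# The else clause of the opening exercise; t4 is needed after the block.
else_clause = [("t5", ["n"]), ("t6", ["n"]), ("t4", ["t5", "t6"])]
for index, uses in enumerate(next_uses(else_clause, live_at_end=["t4"])):
    print(index, uses)
```

A name that is missing from a statement's dictionary has no next use after that statement, so a register holding it can be freed immediately.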
Next time, we will look at how the code generator can track and use 'next use' information about objects.
Some Pragmatic Project Advice: Getting Things Done
Your goal now is to produce a working code generator that writes legal, executable TM code. You do not need a 100% correct type checker to generate code. Focus on the code generator first.
It is better to have a working code generator that handles only a subset of Klein than to have an "almost working" code generator that handles all of Klein. If your compiler only "almost works", then it doesn't really "handle all of Klein"!
So, if you are short on time, act as if the Klein language consists only of a subset of its features, and implement that. Create a set of test programs that use only this subset of Klein to demonstrate that your compiler works.
For example, you might assume that there are no if
expressions, just a main function. Generate code
for integer expressions. Then generate code for boolean
expressions. Then generate code for the if
expression. Then generate code for general function calls at
the end.
Or, after generating code for integer expressions, you might
do full-on function calls next. That's a more impressive step,
and it enables a different sort of limited Klein program. Then
generate code for if expressions.
Of course, if your
Module 5
works properly, you already have a run-time system that handles
much of the work needed for function calls. You might add them
first, and then focus on integer, boolean, and if
expressions (in that order, if you like).
Even if you aren't short on time, working in small steps like this can be a handy strategy. At every step of the way, you will have a working code generator for a growing subset of Klein.
Work in whatever order makes sense to you. The key is to make conscious choices that let you grow a working code generator, rather than produce a lot of code that doesn't quite work yet. As software guru, inventor of wiki, and my programming hero Ward Cunningham is known to say:
It's all talk until the tests run.
And talk ain't a compiler!