Session 25
Techniques for Generating Target Code

Opening Exercise

Last time, we began our study of how to generate target code and saw the idea of a code template for a generic 3AC statement. We can build an entire code generator using a set of templates like that one.

Expressions that cause branches in execution create more challenges for a code generator. Let's write TM code for an if expression and consider some of those challenges.

Generate TM assembly code for this Klein expression:
if (n < 10)
    1
else
    n * n
Let's assume that:
  • Argument n is stored in a stack frame at memory location 105.
  • Register 5 is a status pointer that points at the first slot after the arguments, memory location 106.
  • Temporary variables t1, t2, t3, ... have been allocated for use in the code and are stored in the same stack frame at memory locations 113, 114, 115, ..., respectively.
  • Following TM idiom, Register 0 holds a 0.

A Candidate Solution

First, here is a three-address code version of the program:

t1 := n
t2 := 10
t3 := t1 < t2
IF_NOT t3 GOTO label1       ;; inverted so that 'then' comes next
t4 := 1
GOTO label2
LABEL label1
t5 := n
t6 := n
t4 := t5 * t6
LABEL label2

Notice two things here:

From here, we can generate the TM assembly code:

  3: LD   1,-1(5)      ;; n is at LOC 105
  4: LDC  2,10(0)      ;; we need 10 in a register
  5: SUB  3,1,2        ;; now we can compare r3 to 0 and branch
  6: JGE  3,3(7)       ;; select inverted Jxx instruction
*                      ;;      and branch over the then clause
* then clause
*
  7: LDC  4,1(0)       ;; load value of the then clause into r4
  8: ST   4,10(5)      ;; store r4 into memory slot for t4
  9: LDA  7,4(7)       ;;      and branch over else clause
*
* else clause
*
 10: LD   1,-1(5)      ;; reload n from LOC 105
 11: LD   2,-1(5)      ;; reload n from LOC 105
 12: MUL  4,1,2        ;; compute value of else clause into r4
 13: ST   4,10(5)      ;; store r4 into memory slot for t4

I embedded this in code that set up our assumptions and printed the result. It's usually a good idea to make sure that code works as designed!

Because Register 5 serves as a status pointer into the stack frame, the address computations in lines 3, 8, 10, 11, and 13 are relative to that pointer. Without it, you could code them with absolute addresses using Register 0: 105(0), ....

The choice not to store the temp vars t1 and t2 to memory works fine in a bit of code this small, but it will not work well in general, when we might run out of registers. The simplest code templates store the value of every temporary into its stack slot as soon as the program computes it, and then reload it from memory on its next use. This approach creates longer, slower TM code, but the code generator is simpler.
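Here is a minimal sketch, in Python, of the simplest kind of template described above: load both operands from their stack slots, operate, and store the result right away. The function name emit_binary, the SLOTS table of offsets from Register 5, and the fixed use of Registers 1-3 are assumptions made for illustration, not part of any required design.

```python
# Hypothetical offsets from Register 5 (which points at location 106):
# n lives at 105, and t1, t2, t3, ... live at 113, 114, 115, ...
SLOTS = {"n": -1, "t1": 7, "t2": 8, "t3": 9, "t4": 10, "t5": 11, "t6": 12}

# TM register-only arithmetic opcodes.
TM_OPS = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}

def emit_binary(x, y, op, z, line):
    """Return TM statements for x := y op z, starting at statement number `line`.

    Always reloads both operands and always stores the result, so the
    generator never has to remember what is sitting in a register.
    """
    return [
        f"{line}: LD   1,{SLOTS[y]}(5)",      # load y from its slot
        f"{line + 1}: LD   2,{SLOTS[z]}(5)",  # load z from its slot
        f"{line + 2}: {TM_OPS[op]}  3,1,2",   # compute into r3
        f"{line + 3}: ST   3,{SLOTS[x]}(5)",  # store the result into x's slot
    ]

for statement in emit_binary("t4", "t5", "*", "t6", 10):
    print(statement)
```

Every 3AC statement costs four TM statements this way, which is the price of the simpler generator.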

Creating templates and applying them verbatim makes this process more straightforward, and more verbose, than writing assembly language by hand. Even so, there are simple techniques we can use when writing a code generator that produce more efficient code. For example, if Register 0 always holds a 0, why ever load 0 into a register? Why load n into two separate registers, in Instructions 10-11?

Generating Target Code with Branches

Computing t3 creates our first problem: TM assembly language does not have a generic < operation. All of its branching op codes compare a value in a register to zero. How can we compare t1 to t2 then? We can use a simple transformation:

t1 < t2   =>   t1 - t2 < 0

... that converts a comparison of two numbers into a comparison of one number to zero. Now we can write:

3AC STATEMENT                   CODE GENERATED
t3 := t1 < t2                   SUB  3,1,2
IF_NOT t3 GOTO label1           JGE  3,?(7)
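A small sketch of this step in Python: subtract, then pick the jump that tests the opposite condition against zero, leaving the offset open. The table INVERTED_JUMP and the function emit_if_not are hypothetical names, and the table assumes the usual six TM jump opcodes; relations other than < are shown only for completeness.

```python
# For IF_NOT (a REL b) GOTO ..., we compute a - b and branch when the
# relation is false, so each relation maps to its opposite TM jump.
INVERTED_JUMP = {
    "<": "JGE", "<=": "JGT",
    ">": "JLE", ">=": "JLT",
    "=": "JNE", "!=": "JEQ",
}

def emit_if_not(rel, ra, rb, rt):
    """Return the SUB and the inverted jump; the jump offset is still unknown."""
    return [
        f"SUB  {rt},{ra},{rb}",               # rt := ra - rb
        f"{INVERTED_JUMP[rel]}  {rt},?(7)",   # offset '?' is filled in later
    ]

print(emit_if_not("<", 1, 2, 3))
# ['SUB  3,1,2', 'JGE  3,?(7)']
```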

But that still leaves us with a problem: how can the code generator know to replace the ? with a 3?

Generating Jump Targets

How can the code generator know how many instructions to skip? This is a general problem that occurs any time the compiler generates code for a boolean expression, a conditional expression, a function call, or any other expression for which it does not yet know the jump target. The problem is further complicated for us, because TM assembly language does not have labels — so we cannot generate a "jump to location labeled label1" instruction!

(This problem can also occur when the compiler generates code "out of order", say, by generating code for local chunks of the AST independently and then stitching the pieces together at the end.)

One solution is to generate code in two passes. The first pass generates the framework of the target program. The second pass computes the jumps and labels. This approach simplifies the compiler's task but makes it less efficient.

Another approach makes only a single pass over the AST. When the compiler generates a branch whose jump target is undefined, it adds the statement to a list of "jumps to be completed". As soon as the correct label is known, the compiler fills in the slot and removes the statement from the to-do list. This technique is known as backpatching.

A trivial sort of backpatching is to generate target code statements out of order. The TM virtual machine supports this approach, because statements in a TM program are labeled by their integer position in the program. Even if a TM source file is "out of order", the statements are put into the correct order at load time.

We can take advantage of this feature of TM assembly to backpatch the target of our if expression's jump. The code generator can wait to write the JGE instruction until it has generated all of the statements in the then clause, counting the number of statements generated as it goes. Then, when it generates the jump statement, it knows exactly the number of instructions to skip.

So, the generator will produce three assembly statements for the then clause (two for its body and one to branch over the else clause), and then generate the jump statement to skip over them:

3AC STATEMENT                   CODE GENERATED
IF_NOT t3 GOTO label1           JGE  3,3(7)

Recall that the TM virtual machine increments the PC upon loading an instruction, so we add 3 to the PC to jump from Statement 6 to Statement 10.

The code produced by the generator would actually look like this:

 7: LDC  4,1(0)       ;; load value of the then clause into r4
 8: ST   4,10(5)      ;; store r4 into memory slot for t4
 9: LDA  7,4(7)       ;;      and branch over else clause
 6: JGE  3,3(7)       ;; select inverted Jxx instruction
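One way to sketch this "write the jump last" idea in Python: reserve a statement number for the jump, number the then clause after it, and emit the jump once the clause's length is known. The function name and its parameters are hypothetical, and the then clause is passed in as ready-made TM statement bodies.

```python
def emit_deferred_jump(jump_line, reg, then_code):
    """Emit the then clause first, then the JGE that skips over it.

    then_code holds the TM statement bodies for the then clause,
    including its final branch over the else clause.
    """
    lines = []
    next_line = jump_line + 1              # jump_line itself is reserved
    for body in then_code:
        lines.append(f"{next_line}: {body}")
        next_line += 1
    skip = len(then_code)                  # PC already points past the jump
    lines.append(f"{jump_line}: JGE  {reg},{skip}(7)")
    return lines

then_clause = ["LDC  4,1(0)", "ST   4,10(5)", "LDA  7,4(7)"]
out_of_order = emit_deferred_jump(6, 3, then_clause)
print(out_of_order)

# Sorting by statement number shows the order the loader will see.
in_order = sorted(out_of_order, key=lambda s: int(s.split(":")[0]))
print(in_order[0])   # 6: JGE  3,3(7)
```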

Simple, but nice. This approach works for boolean computations, such as (a or b), as well as for control structures.

There is one more complication with boolean expressions. In most programming languages, boolean operators use short-circuit evaluation. This means that if a evaluates to true, then the code for the boolean expression must skip the evaluation of b and the subsequent disjunction. Klein works this way, so your code templates for the and and or operators will have to take that into account.
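To make the short-circuit requirement concrete, here is a hedged sketch in Python that builds three-address code for result := a or b. The function name, its parameters, and the use of an IF ... GOTO statement are assumptions for illustration; your 3AC may spell these differently.

```python
def short_circuit_or(a_code, a_temp, b_code, b_temp, result, n):
    """Return 3AC for result := a or b that skips b when a is true.

    a_code and b_code are the 3AC lists that leave the operand values
    in a_temp and b_temp; n makes the label name unique.
    """
    done = f"label{n}"
    return (
        a_code
        + [f"{result} := {a_temp}",
           f"IF {result} GOTO {done}"]   # a is true, so the result is true
        + b_code                         # reached only when a is false
        + [f"{result} := {b_temp}",      # so the result is whatever b is
           f"LABEL {done}"]
    )

for statement in short_circuit_or(["t1 := a"], "t1", ["t2 := b"], "t2", "t3", 1):
    print(statement)
```

The template for and has the same shape, with the jump taken when a is false.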

Backpatching Jump Targets

That's all great, but how can our code generator implement this idea?

Let's take a closer look at the jumps in Instructions 6 and 9 in our opening exercise. Here are the 3AC and assembly code side by side:

t1 := n                         3: LD   1,-1(5)
t2 := 10                        4: LDC  2,10(0)
t3 := t1 < t2                   5: SUB  3,1,2
IF_NOT t3 GOTO label1           6: JGE  3,3(7)
t4 := 1                         7: LDC  4,1(0)
                                8: ST   4,10(5)
GOTO label2                     9: LDA  7,4(7)
LABEL label1
t5 := n                        10: LD   1,-1(5)
t6 := n                        11: LD   2,-1(5)
t4 := t5 * t6                  12: MUL  4,1,2
                               13: ST   4,10(5)
LABEL label2                   14: [next instruction]

It's easy to see that the jumps in Instructions 6 and 9 are correct. But how can our code generator produce this code? It doesn't know the size of the THEN or ELSE blocks at the time we come to those lines in the generation process.

As we've just seen, a common solution to the jump target issue is "backpatching": maintaining a list of "jumps to be completed" and filling the target into the generated code as soon as we know the actual target. The TM virtual machine supports a simple form of backpatching: numbered assembly language statements. Our compiler can generate target code statements out of order!

Let's see how that might look in our exercise code. After we compute n - 10, we are ready to generate a jump:

IF_NOT t3 GOTO label1           6: JGE  3,???(7)

We know that the jump occurs in Instruction 6 of the program and that it goes to label1, but we don't know what line of code label1 corresponds to yet. So we save what we know in a data structure:

jumps_to_complete = [ (6, label1, (JGE 3)) ]

and proceed.

Next, we generate code for the assignment statement and are ready to generate a jump:

t4 := 1                         7: LDC  4,1(0)
                                8: ST   4,10(5)
GOTO label2                     9: LDA  7,???(7)

Again, we know the number of the jump instruction, 9, and the target, label2, but we don't know the line number for the label yet. So we save what we know:

jumps_to_complete = [
  (6, label1, (JGE 3)),
  (9, label2, unconditional)
]

and proceed.

Next we see a label:

LABEL label1

We now know that label1 corresponds to Instruction 10 in our program. Let's record the new information and proceed.

label_data = { label1 : 10 }

We generate several lines of code for the else clause:

t5 := n                        10: LD   1,-1(5)
t6 := n                        11: LD   2,-1(5)
t4 := t5 * t6                  12: MUL  4,1,2
                               13: ST   4,10(5)

before running into our next label:

LABEL label2                   14: [next instruction]

We now know that label2 corresponds to Instruction 14 in the program, so record that detail in our label table:

label_data = { label1 : 10, label2 : 14 }

That is the end of our expression. Now we generate the jumps.

The process is straightforward: iterate through jumps_to_complete, using information there to build the instruction and looking up the line number of the label in label_data. The differences between the jumps and their targets are 4 and 5, respectively, but remember: the TM VM advances the PC at the moment it loads an instruction. So our jump gaps are always one smaller than the difference:

IF_NOT t3 GOTO label1           6: JGE  3,3(7)
GOTO label2                     9: LDA  7,4(7)

And there you have it!
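The walkthrough above can be sketched in a few lines of Python. The function name resolve_jumps and the exact tuple layout are assumptions; the two data structures mirror jumps_to_complete and label_data from the example.

```python
def resolve_jumps(jumps_to_complete, label_data):
    """Build the TM jump statements once every label's line number is known."""
    lines = []
    for line, label, kind in jumps_to_complete:
        # The PC already holds line + 1 when the jump executes.
        offset = label_data[label] - (line + 1)
        if kind == "unconditional":
            lines.append(f"{line}: LDA  7,{offset}(7)")
        else:
            opcode, reg = kind             # for example, ("JGE", 3)
            lines.append(f"{line}: {opcode}  {reg},{offset}(7)")
    return lines

jumps_to_complete = [
    (6, "label1", ("JGE", 3)),
    (9, "label2", "unconditional"),
]
label_data = {"label1": 10, "label2": 14}

print("\n".join(resolve_jumps(jumps_to_complete, label_data)))
# 6: JGE  3,3(7)
# 9: LDA  7,4(7)
```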

There are many different ways to implement this pattern. For instance, we could have generated jumps as soon as we knew each label's line number. I like the clean style of separating body generation from jump generation in TM, but that's a stylistic preference.

Like so many things in compiler construction, and computer science more generally, this doesn't seem so mysterious after we see it work. At one level, it's all data structures and algorithms.

Now, on to algorithms for generating code and selecting registers.

The Basic Algorithm for Code Generation

At its heart, generating target code comes down to producing one or more target instructions for each node in the AST or for each statement in the intermediate representation. The basic algorithm for code generation will be the same for most operations. Control flow operators are a bit different.

Your code generator for Klein must take into account the features of your target machine, TM. We saw in our opening exercise one of the features of TM that most affects your code generator: All arithmetic in TM is done using register-only instructions.

Here is an algorithm that meets our needs:

For each three-address code instruction x := y op z:
  1. Decide where to store the value of y op z.
    • If x is already in a register, then use that register.
    • Else if y is already in a register, then use that register.
    • Otherwise, invoke the function getRegister() to get a free register, and move y there.
  2. Look to see where y is currently stored. If it is not already in a register, then invoke getRegister() to get a free register, and move y there.
  3. Look to see where z is currently stored. If it is not already in a register, then invoke getRegister() to get a free register, and move z there.
  4. Do the operation on the two registers.
  5. Copy the value from that register to the memory location for x.

Step 1b takes advantage of the fact that we can do the operation and store its result in the same register. That register will now hold the value of x, not y. If the code generator maintains a list of which registers hold which objects, it must update the list whenever it assigns an object to a register or moves a value there.
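A minimal Python sketch of the five steps, with the register bookkeeping made explicit. The class name CodeGen, the SLOTS offsets from Register 5, and the choice of Registers 1-4 as the free pool are all assumptions for illustration. Statement numbers are omitted, and get_register() here simply fails when no register is free.

```python
SLOTS = {"n": -1, "t1": 7, "t2": 8, "t3": 9, "t4": 10}   # hypothetical offsets
TM_OPS = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}

class CodeGen:
    def __init__(self, free_registers=(1, 2, 3, 4)):
        self.free = list(free_registers)   # registers not yet handed out
        self.reg_of = {}                   # name -> register holding its value
        self.code = []

    def get_register(self):
        # Simplest policy: next free register (IndexError if none remain).
        return self.free.pop(0)

    def load(self, name):
        """Make sure `name` is in a register and return that register."""
        if name not in self.reg_of:
            r = self.get_register()
            self.code.append(f"LD   {r},{SLOTS[name]}(5)")
            self.reg_of[name] = r
        return self.reg_of[name]

    def gen(self, x, y, op, z):
        # Step 1: use x's register, else y's, else a fresh one holding y.
        target = self.reg_of[x] if x in self.reg_of else self.load(y)
        # Steps 2 and 3: make sure both operands are in registers.
        ry, rz = self.load(y), self.load(z)
        # Step 4: operate on the registers.
        self.code.append(f"{TM_OPS[op]}  {target},{ry},{rz}")
        # The target register now holds x and nothing else.
        self.reg_of = {n: r for n, r in self.reg_of.items() if r != target}
        self.reg_of[x] = target
        # Step 5: copy the value to x's memory slot.
        self.code.append(f"ST   {target},{SLOTS[x]}(5)")

g = CodeGen()
g.gen("t4", "n", "*", "n")
print(g.code)    # ['LD   1,-1(5)', 'MUL  1,1,1', 'ST   1,10(5)']
print(g.reg_of)  # {'t4': 1}
```

Because n is loaded once and reused for both operands, the sketch emits one LD where the verbatim template would emit two.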

After Step 5, x resides both in its memory location and in the register we computed it into. The code generator could free the register that holds x for another use. This leads to a radical approach that simplifies code generation at the expense of longer, slower programs:

Use the same three registers for x, y, and z every time, re-loading them even if they are already sitting in a register!

One simple way to generate better code is to save Step 5 until later, when we are done with the code block or when we need the register. This saves both program space and execution time. The cost in the code generator is keeping track of information in the process of selecting registers, which brings us to the getRegister() function.

Quick Exercise

In what way would this algorithm generate more efficient code for our opening exercise?

Hint: consider Line 5 and Lines 10-11.

At Line 5, the algorithm would note that t1 was already in a register, and use it to store the result:

 5: SUB  1,1,2

It would then also use Register 1 in Line 6.

At Line 11, by noting that n was already stored in t5 and Register 1, the algorithm would not need to generate Line 11. Line 12 would get both of its operands from Register 1:

11: MUL  4,1,1

This shows how even a simple getRegister() function can help us generate better code. Knowing only that a value is already in a register can prevent the generator from producing an unnecessary line of code.

What can a more sophisticated getRegister() function do? For instance, what if it could see at Line 5 that it would need n again (twice) later in the program and choose a different register for the result?

Implementing a getRegister() Utility

At first, all registers are free, except for the program counter (register 7) and any registers that your code generator reserves for particular uses (such as register 0 as a zero and registers 5-6 for stack pointers). So the code generator can simply grab an empty register when it needs one.

  1. Return the next available register.

Eventually, though, the generator may have used all of the other registers, and each will hold a value that was used in the program. How do registers become available again?

If the code generator always stores temporary results to memory, into the slots allocated to the temporary and user-defined objects, then the program will never have many registers in use for very long. In such a case, getRegister() won't have to do much.

However, such a compiler will generate inefficient code! Every time it needs a value, it will have to load the value into a register. This is especially wasteful given that most programs exhibit locality of reference: when a value is used, it likely will be used again soon in the instructions that follow.

There are many ways for getRegister() to generate more efficient code, by doing more work at compile time. The simplest is this:

  1. Return the next available register.
  2. If all registers are in use, then free one by writing its value back to memory.

This technique requires the code generator to maintain the minimal amount of information that we have considered thus far: a register map of (register, object) pairs. Keep in mind that a register can hold two objects at one time. For example, after x := y (a copy instruction), the same register holds both x and y.

To implement this step, getRegister() could simply circle back around to the beginning of its list and free the registers in the same order it allocated them. But this might violate the locality-of-reference principle and thus be wasteful.
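As a rough sketch of this two-step version, here is a Python register pool that hands out free registers and, when none are left, frees them in the order it allocated them. The class name RegisterPool and its fields are hypothetical, and the sketch tracks only one object per register, which is simpler than the register map described above.

```python
from collections import deque

class RegisterPool:
    def __init__(self, registers=(1, 2, 3, 4)):
        self.free = deque(registers)   # registers never handed out yet
        self.in_use = deque()          # (register, name), oldest first
        self.code = []                 # spill code generated so far

    def get_register(self, name, slots):
        """Return a register for `name`, spilling the oldest one if needed."""
        if self.free:
            r = self.free.popleft()                 # Step 1: next available
        else:
            r, old = self.in_use.popleft()          # Step 2: free the oldest
            self.code.append(f"ST   {r},{slots[old]}(5)")
        self.in_use.append((r, name))
        return r

slots = {"a": 7, "b": 8, "c": 9}       # hypothetical offsets from Register 5
pool = RegisterPool(registers=(1, 2))
print(pool.get_register("a", slots))   # 1
print(pool.get_register("b", slots))   # 2
print(pool.get_register("c", slots))   # 1, after writing a back to memory
print(pool.code)                       # ['ST   1,7(5)']
```

Note that a is spilled even if the very next statement uses it, which is exactly the locality problem.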

With a little more information, getRegister() can do better:

  1. Return the next available register.
  2. If all registers are in use, then free one by writing its value back to memory.
    Free a register immediately if the object it holds has no next use in the block.

If an object won't be used again, there is no need to waste a register holding it. But how can the code generator know this? Before generating code, the compiler can make a forward scan of the three-address code program to see where each value is used next after each instruction and which objects are "live" at the end of the code block.

In Klein, the only code blocks we have are function bodies and the 'then' and 'else' clauses of if expressions, so scanning for next use to the end of a block is straightforward, and usually quite fast.
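Here is one possible sketch of such a scan in Python. It walks the block from the last statement to the first, which is a common way to collect the same next-use information in a single pass; that direction is my choice for the sketch. The function name and the (x, y, z) tuple form for x := y op z are assumptions, and the sketch assumes that no object is live at the end of the block.

```python
def next_use(block):
    """For each statement x := y op z, record where x, y, and z are read next.

    block is a list of (x, y, z) name triples. The result has one dict per
    statement, mapping each of its names to the index of the next statement
    that reads it, or None if it is not read again in the block.
    """
    upcoming = {}                      # name -> index of its next read
    table = [None] * len(block)
    for i in range(len(block) - 1, -1, -1):
        x, y, z = block[i]
        table[i] = {name: upcoming.get(name) for name in (x, y, z)}
        upcoming[x] = None             # x is overwritten here
        upcoming[y] = i                # y and z are read here
        upcoming[z] = i
    return table

block = [("t1", "a", "b"), ("t2", "t1", "a"), ("t3", "t2", "t2")]
for i, info in enumerate(next_use(block)):
    print(i, info)
# 0 {'t1': 1, 'a': 1, 'b': None}
# 1 {'t2': 2, 't1': None, 'a': None}
# 2 {'t3': None, 't2': None}
```

After statement 1, neither t1 nor a is read again, so any register holding one of them could be freed immediately.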

Next time, we will look at how the code generator can track and use 'next use' information about objects.

Some Pragmatic Project Advice: Getting Things Done

Your goal now is to produce a working code generator that writes legal, executable TM code. You do not need a 100% correct type checker to generate code. Focus on the code generator first.

It is better to have a working code generator that handles only a subset of Klein than to have an "almost working" code generator that handles all of Klein. If your compiler only "almost works", then it doesn't really "handle all of Klein"!

So, if you are short on time, act as if the Klein language consists only of a subset of its features, and implement that. Create a set of test programs that use only this subset of Klein to demonstrate that your compiler works.

For example, you might assume that there are no if expressions, just a main function. Generate code for integer expressions. Then generate code for boolean expressions. Then generate code for if expressions. Then generate code for general function calls at the end.

Or, after generating code for integer expressions, you might do full-on function calls next. That's a more impressive step, and it enables a different sort of limited Klein program. Then generate code for if expressions.

Of course, if your Module 5 works properly, you already have a run-time system that handles much of the work needed for function calls. You might add them first, and then focus on integer, boolean, and if expressions (in that order, if you like).

[Photo: Ward Cunningham. Photo courtesy of Wikipedia.]

Even if you aren't short on time, working in small steps like this can be a handy strategy. At every step of the way, you will have a working code generator for a growing subset of Klein.

Work in whatever order makes sense to you. The key is to make conscious choices that let you grow a working code generator, rather than produce a lot of code that doesn't quite work yet. As software guru, inventor of wiki, and my programming hero Ward Cunningham is known to say:

It's all talk until the tests run.

And talk ain't a compiler!