Session 13
Semantic Actions and Abstract Syntax

Project Update

Project 2

I have just begun reviewing the submissions. So far, so good! The parsers I've tried all found the bug in the Egyptian fractions program and accepted the corrected program.

I made a copy of the broken program and included it in a subdirectory of klein-programs/ named tests-pr02/, which also contains a truckload of "one-off" test programs, each of which is missing a token from a legal program, has an extra token, or changes a token in a way that creates a syntax error.

Project 3

Good news: we continue to work with the parser, so you get to stay in the groove you found for Project 2. More good news: the features we are adding in this project are more code than theory, so you get to begin working on code sooner.

A New Klein Program

Oops, I did it again. Last week, I ran across a Numberphile video on YouTube about Harshad numbers, integers that are divisible by the sum of their digits. The person in the video wrote some code in Python, but I used Klein. It was a perfect fit!

Watch for the new program in the Klein collection.

A Quick Opening Exercise: Klein Golf

Here is a quickie:

Write the shortest legal Klein program.

Fewest tokens wins! (Par is 8.)
Fewest characters wins! (Par is 25.)

The Shortest Klein Program

Klein Golf

Here is my effort:

function main():integer 1

One line, seven tokens, twenty-five characters. Did anyone beat me? I think my program is minimal in all three counts. I need both spaces, or at least some other whitespace character, to separate function from the function's name and to separate integer from the 1 (or whatever digit I use there).

This program may look familiar. I used print-one.kln as my starting point. Two quick thoughts...

Test Programs

A short program like that can serve as the starting point for a good set of test cases. It does not have any extraneous parts, which makes it useful for focusing on single points of failure in a program.

It is also missing several language features, though, so it can't be our only basis for testing!

Using only this smallest of Klein programs, how many useful tests of your parser can you create?
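If you want to mass-produce such tests, here is a sketch of one way to do it in Python, in the spirit of the "one-off" programs described above. The token list and the replacement token are my own choices for illustration; for this particular program, each variant should be rejected by a correct parser, while the original program is accepted.

# Sketch: generate "one-off" variants of the minimal Klein program by
# deleting, duplicating, or replacing one token at a time.
TOKENS = ["function", "main", "(", ")", ":", "integer", "1"]

def one_off_variants(tokens):
    variants = []
    for i in range(len(tokens)):
        variants.append(tokens[:i] + tokens[i+1:])                  # missing token
        variants.append(tokens[:i+1] + [tokens[i]] + tokens[i+1:])  # extra token
        variants.append(tokens[:i] + ["+"] + tokens[i+1:])          # changed token
    return variants

for variant in one_off_variants(TOKENS):
    print(" ".join(variant))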

One-Liners

My solution squeezes an entire program on a single line. That doesn't make much sense for real programs. However, the same idea is attractive for writing small utility functions:

function average(...)... (a+b)/2

... except that parameter and function types take up a lot of space. It's hard to find terse forms in languages with a lot of syntax. But if the function has few arguments and a small body, a one-liner can work. I occasionally use a one-line format for simple utility functions.

There is one way in which a Klein one-liner can really shine. Klein does not have local variables or any other names for data except parameter names. But we have one other kind of name: the function name. We can use this idea to create named constants in our program — by making them functions:

function CLASS_SIZE():integer 10

For the static cost of two characters (the parens each time we call the function), we get the software engineering benefit of naming and isolating literals. The run-time cost is a little time and a little space on each call.

A named constant is most attractive if the Klein optimizer or code generator knows how to inline simple functions. Then we would not face any run-time penalty for using a function call for its name. This is a cool optimization for us to implement later in the project, time permitting.

Recap: How a Parser Builds the Abstract Syntax Tree

Last time, we extended our table-driven parsing algorithm with the ability to create the AST of a program, using the idea of semantic actions. This involved a number of steps.

We applied these ideas to our simple arithmetic language, showing how we could add semantic actions to some rules and then use the extended algorithm to parse the expression x + y * z. It worked.

I left the expression x * y + z as an exercise for you to trace. Notice the change: the addition and multiplication operators are swapped. Will the parser produce the correct tree for this expression?

(We trace...). It does!

We still have to figure out how to implement this in our parser. In theory, though, things look good.
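Here is a minimal sketch of what that implementation might look like, written in Python. The SemanticAction class, the dictionary-based parse table keyed by (non-terminal, lookahead), the (kind, value) token pairs, and the tuples standing in for AST nodes are all assumptions I am making for illustration, not the design your parser must use.

class SemanticAction:
    """A callable pushed onto the parse stack alongside grammar symbols."""
    def __init__(self, fn):
        self.fn = fn

def make_plus(semantic_stack):
    # Runs when {make-+} reaches the top of the parse stack: combine the
    # two most recently built nodes into a + node.
    right = semantic_stack.pop()
    left = semantic_stack.pop()
    semantic_stack.append(("+", left, right))

def parse(tokens, table, terminals, start_symbol):
    # tokens is a list of (kind, value) pairs, e.g. ("id", "x"), ("+", "+").
    parse_stack = ["$", start_symbol]
    semantic_stack = []                      # partially built AST nodes
    tokens = list(tokens) + [("$", "$")]
    i = 0
    while parse_stack:
        top = parse_stack.pop()
        kind, value = tokens[i]
        if isinstance(top, SemanticAction):
            top.fn(semantic_stack)           # build a node right now
        elif top in terminals or top == "$":
            if top != kind:
                raise SyntaxError(f"expected {top}, saw {kind}")
            if kind == "id":                 # leaves go straight onto the semantic stack
                semantic_stack.append(value)
            i += 1
        else:                                # non-terminal: consult the parse table
            for symbol in reversed(table[(top, kind)]):
                parse_stack.append(symbol)
    return semantic_stack.pop()              # the finished AST

Pushing the semantic actions onto the same stack as the grammar symbols is what lets each action fire at exactly the point in the rule where we placed it.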

Today, we'll consider these ideas in the context of Klein, a language with a larger grammar, as you begin work on Module 3 and Klein's abstract syntax. We'll also look ahead to some of the practical issues in implementing semantic actions and abstract syntax, which you will read about for next time. But first, I have another nagging doubt...

Another Exercise: Bigger Doubts

Our augmented table-driven algorithm did the job for two programs in the expression language. It gave precedence to the * operator over the + in both cases, based on how the grammar expanded rules for terms and factors. Are there other cases in which it is difficult to produce the correct AST?

Recall that a parser can produce a right derivation or a left derivation. Either is sufficient for determining whether a program is a legal sentence in the language. But when a parser produces an abstract syntax tree for the program, it must be sure to perform operations in the correct order.

In Klein, as in most languages, operations at the same level of precedence are performed left-to-right. Our new parser with semantic actions works correctly for operators at different levels of precedence, but will it work for operators at the same level?
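To be concrete about what the correct result is: for x + y + z, left-to-right order means the grouping (x + y) + z, so in the nested-tuple form of the earlier sketch the parser should produce:

("+", ("+", "x", "y"), "z")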

There is, of course, one good way to find out: trace the algorithm!

Here are the augmented algorithm and the augmented grammar.

Trace the new algorithm as it recognizes the token stream: x + y + z

After working through this on your own, check out this solution.

There are still other cases that might cause a problem... If you are not yet convinced, trace the algorithm as it operates on programs such as (x + y) * z. Does it produce the expected abstract syntax tree? If not, how might we modify the position of the semantic actions in the grammar to fix the problem?

All this practice is not wasted. Expressions like these are a part of Klein, so being sure you understand the algorithm and the placement of semantic actions in grammar rules will pay off on your project.

Compiler Moments: Steve Wozniak and Apple BASIC

All this talk of operator precedence and grammars reminded me of a passage in How Steve Wozniak Wrote BASIC for the Original Apple From Scratch:

I didn't know about compiler writing other than perchance. But I did know about stacks and things like converting expressions into RPN using stacks. ...

I also had a list for every operator of 2 priorities. One was its tendency to go ahead of other operators. For example, the + operator would cause a * operator to occur first. But I needed a second table of the resistance to being pushed into action to handle things like parentheses. I had no idea if I was on a correct track but it worked correctly and did what I needed. It didn't have to come from a book.

Woz wrote his BASIC compiler in assembly language, because BASIC was the first high-level language he made for his new computer. That computer became the Apple I.

These days, it's tempting to think that someone has to tell us how to solve a hard problem. Or that they can, so we should let them. It's good to be reminded that we can figure out how to do things, if we want.

It's also good, though, to realize that a little theory can save us from unnecessary work!

Adding Semantic Actions to a Grammar

The first pragmatic challenge of implementing our approach is deciding where to add the semantic actions to your grammar.

Unfortunately, this is a topic that compiler textbooks often cover poorly. It is also the first big hole we find in Thain's text. Chapter 6 gives a lot of detail on the construction of the AST nodes themselves, but it covers semantic actions only in terms of what compiler generators do.

Step 1: Define Abstract Syntax

The first step is to identify the abstract syntax of your language. When I do this, I use the original version of the grammar, which defines the intended structure of the language before we mark it up for the purposes of top-down parsing.

For each production, look at the right hand side of the rule. Is this the kind of construct that programmers talk about in their code? If so, decide what the code generator will need to know in order to produce the meaning of the expression. Name the expression, and create a record that holds values for the variables in the expression.
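For an addition expression, for example, the record might hold just the two operand expressions. Here is one possible shape for such records, sketched in Python with dataclasses; the class and field names are hypothetical, not a prescribed design.

from dataclasses import dataclass

# Hypothetical AST records; your own names and fields may differ.
@dataclass
class Identifier:
    name: str

@dataclass
class IntegerLiteral:
    value: int

@dataclass
class PlusExpression:
    left: object      # any expression node
    right: object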

Step 2: Place Semantic Actions

The second step is to determine at what point in each rule your parser will have recognized all the information for the tree node. That's where you place the semantic action in the rule.

For this step, we use the refactored grammar, because it is the basis for the production rules in our parse table.

Often, we place semantic actions at the end of the rule. However, some cases require extra care.

After we have factored the grammar and introduced marker non-terminals that make it predictive, we often want to place the semantic action before the marker non-terminal. This was essential in our little expression grammar. If we had placed the make-+ semantic action at the end of the rule, after the E', then our algorithm would have grouped the additions from right to left, building a right-associative tree — which would be wrong!

Note that this extra care is consistent with the advice in the preceding paragraph: put the semantic action at the point in the rule at which the parser has recognized all the information it needs to construct the node.
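As a concrete reminder, here is a sketch of the relevant rules of our little expression grammar after factoring, with the semantic action written in braces and placed just before the marker non-terminal. (This is my reconstruction of the placement described above, using the notation we have been using in class.)

E  ::= T E'
E' ::= "+" T {make-+} E'
E' ::= ε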

What about a rule that has several parts? Often, such rules involve nodes in the AST that don't have rules of their own. Consider a Klein function definition:

DEFINITION ::= "function" IDENTIFIER "(" PARAMETER-LIST ")" ":" TYPE
                  BODY

IDENTIFIER doesn't have a grammar rule, because we treat identifiers as terminals produced by the scanner. So we need to construct an identifier node while processing this rule. PARAMETER-LIST, TYPE, and BODY do have their own rules, so we can place the semantic actions for those nodes on the corresponding rules. Finally, at the end of this rule, we can place the semantic action that creates a node for the function definition itself.
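Here is a sketch of what that final semantic action might do, again in Python and with hypothetical names. It assumes the parser keeps a semantic stack on which the nodes for the identifier, parameter list, return type, and body have already been pushed, in that order.

from dataclasses import dataclass

@dataclass
class FunctionDefinition:
    name: object           # an identifier node
    parameters: list       # from PARAMETER-LIST
    return_type: object    # from TYPE
    body: object           # from BODY

def make_definition(semantic_stack):
    # Pop in reverse order of recognition, then push the finished node.
    body = semantic_stack.pop()
    return_type = semantic_stack.pop()
    parameters = semantic_stack.pop()
    name = semantic_stack.pop()
    semantic_stack.append(FunctionDefinition(name, parameters, return_type, body))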

Could we place the semantic actions for PARAMETER-LIST, TYPE, and BODY in the grammar rule for DEFINITION? Certainly. That results in a single grammar rule that knows about a definition's entire AST. The trade-off is that this rule becomes large, and the grammar no longer represents AST knowledge in the rules that give each node its content.

Practical Matters

In the end, the best way to know how your parser will work is to trace the algorithm by hand on a candidate grammar and see whether it constructs tree nodes in the order you need them. A hand trace can teach us a lot about how the parser will behave before we invest a lot of time and effort implementing the grammar in code.

(This is one reason I traced one run in class, asked you to trace two others in class, and suggested you do more outside of class. You can try the same with any parts of the Klein grammar where you need some guidance.)

Once you have your code running, you can examine the ASTs produced by your parser and make adjustments to the grammar rules in your table where necessary.

LL(1) Parsing and Semantic Actions

We have now seen that a relatively simple extension of an LL(1) parser can execute semantic actions and build abstract syntax. The particular extension we made follows nicely from the predictive nature of top-down parsing. A top-down algorithm looks ahead one token without consuming it and thus expands a non-terminal before processing the tokens that match the production's right hand side.

In this extension, the primary job of the compiler writer is to add semantic actions to the grammar. This task isn't too difficult, as the grammar tells us most of what we need to know. But you do have to watch for occasional subtleties in precedence. When writing your compiler, be sure to test the output of your parser on many different kinds of programs, and inspect the results carefully.

Reading Assignment

Before next session, please read this short assignment about some of the choices you will have to make as you implement semantic actions and abstract syntax trees in your parser.

Next session, we will work through an example that implements these ideas in a program, taking them down to the level of code. We will use our old friend, the Fizzbuzz compiler. We will also continue with our look at the abstract syntax of Klein.