Session 13
Semantic Actions and Abstract Syntax
Project Update
Project 2
I have just begun reviewing the submissions. So far, so good! The parsers I've tried all found the bug in the Egyptian fractions program and accepted the corrected program.
I made a copy of the broken program and included it in a subdirectory of klein-programs/ named tests-pr02/, which also contains a truckload of "one-off" test programs, each of which is missing a token from a legal program, has an extra token, or changes a token in a way that creates a syntax error.
Project 3
Good news: we continue to work with the parser, so you get to stay in the groove you found for Project 2. More good news: the features we are adding in this project are more code than theory, so you get to begin working on code sooner.
A New Klein Program
Oops, I did it again. (This cultural reference is now older than most students in the course. Sigh.) Last week, I ran across a Numberphile video on YouTube about Harshad numbers, integers that are divisible by the sum of their digits. The person in the video wrote some code in Python, but I used Klein. It was a perfect fit!
Watch for the new program in the Klein collection.
A Quick Opening Exercise: Klein Golf
Here is a quickie: write the shortest legal Klein program you can.
Fewest tokens wins! (Par is 8.)
Fewest characters wins! (Par is 25.)
The Shortest Klein Program
Klein Golf
Here is my effort:
function main():integer 1
One line, seven tokens, twenty-five characters. Did anyone beat me? I think my program is minimal in all three counts. I need both spaces, or at least some other whitespace character, to separate function from the function's name and to separate integer from the 1 (or whatever digit I use there).
This program may look familiar. I used print-one.kln as my starting point. Two quick thoughts...
Test Programs
A short program like that can serve as the starting point for a good set of test cases. It does not have any extraneous parts, which makes it useful for focusing on single points of failure in a program.
It is also missing several language features, though, so it can't be our only basis for testing!
Using only this smallest of Klein programs, how many useful tests of your parser can you create?
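If you would like to automate part of the answer, here is one way to generate single-token variants of the program, in the spirit of the tests-pr02/ files mentioned above. This is only a sketch in Python; how you write the variants to files and feed them to your parser is up to you.

# Generate "one-off" variants of the minimal Klein program: each variant
# drops a single token, so nearly every variant should be a syntax error.
tokens = ["function", "main", "(", ")", ":", "integer", "1"]

for i, dropped in enumerate(tokens):
    variant = tokens[:i] + tokens[i+1:]
    print(f"missing '{dropped}':", " ".join(variant))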
One-Liners
My solution squeezes an entire program on a single line. That doesn't make much sense for real programs. However, the same idea is attractive for writing small utility functions:
function average(...)... (a+b)/2
... except that parameter and function types take up a lot of space. It's hard to find terse forms in languages with a lot of syntax. But if the function has few arguments and a small body, a one-liner can work. I occasionally use a one-line format for simple utility functions.
There is one way in which a Klein one-liner can really shine. Klein does not have local variables or any other names for data except parameter names. But we have one other kind of name: the function name. We can use this idea to create named constants in our program — by making them functions:
function CLASS_SIZE():integer 10
For the static cost of two extra characters at each call site (the parentheses), we get the software engineering benefit of naming and isolating literals. The run-time cost is a little time and a little space on each call.
A named constant is most attractive if the Klein optimizer or code generator knows how to inline simple functions. Then we would not face any run-time penalty for using a function call for its name. This is a cool optimization for us to implement later in the project, time permitting.
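To make that concrete, inlining would mean that the code generator replaces a call to the constant function with the function's body. As an invented example, with the syntax modeled on the programs above, a function written as

function area():integer CLASS_SIZE() * CLASS_SIZE()

would behave, after inlining, as if we had written

function area():integer 10 * 10

so the named constant would cost nothing at run time.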
Recap: How a Parser Builds the Abstract Syntax Tree
Last time, we extended our table-driven parsing algorithm with the ability to create the AST of a program, using the idea of semantic actions. This involved a number of steps:
- adding a semantic action to each rule that must create a node in the AST
- adding an arm to the algorithm to handle the semantic actions that it finds on the parse stack
- creating a semantic stack to hold the nodes created by the semantic actions during parsing
- writing a factory method for each type of AST node, to be called by the algorithm or the semantic actions themselves
We applied these ideas to our simple arithmetic language, showing how we could add semantic actions to some rules and then use the extended algorithm to parse the expression x + y * z. It worked.
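The implementation details belong to a later session, but here is a rough sketch, in Python, of how those pieces might fit together for the little expression grammar. All of the names are invented for illustration; your own parser will differ.

# A rough sketch, not project code: a table-driven LL(1) parser extended
# with a semantic stack and semantic actions, for the small expression
# grammar from last session.
from dataclasses import dataclass

@dataclass
class Leaf:                       # an identifier in the expression
    name: str

@dataclass
class BinOp:                      # a + or * node in the AST
    op: str
    left: object
    right: object

def make_op(op):
    """Factory for a semantic action that builds a BinOp node."""
    def action(semantic_stack):
        right = semantic_stack.pop()     # operands were pushed left-to-right,
        left = semantic_stack.pop()      # so pop them in reverse order
        semantic_stack.append(BinOp(op, left, right))
    return action

MAKE_PLUS, MAKE_TIMES = make_op("+"), make_op("*")

# Parse table for:  E -> T E'    E' -> + T #make-+ E' | empty
#                   T -> F T'    T' -> * F #make-* T' | empty    F -> id
TABLE = {
    ("E", "id"): ["T", "E'"],
    ("E'", "+"): ["+", "T", MAKE_PLUS, "E'"],
    ("E'", "$"): [],
    ("T", "id"): ["F", "T'"],
    ("T'", "*"): ["*", "F", MAKE_TIMES, "T'"],
    ("T'", "+"): [],
    ("T'", "$"): [],
    ("F", "id"): ["id"],
}

def kind(token):
    return "id" if token.isalpha() else token

def parse(tokens):
    tokens = tokens + ["$"]
    parse_stack, semantic_stack = ["$", "E"], []
    pos = 0
    while parse_stack:
        top = parse_stack.pop()
        if callable(top):                        # the new arm: a semantic action
            top(semantic_stack)
        elif top in ("id", "+", "*", "$"):       # a terminal: match and consume it
            assert top == kind(tokens[pos]), f"expected {top}, saw {tokens[pos]}"
            if top == "id":
                semantic_stack.append(Leaf(tokens[pos]))
            pos += 1
        else:                                    # a non-terminal: expand via the table
            for symbol in reversed(TABLE[(top, kind(tokens[pos]))]):
                parse_stack.append(symbol)
    return semantic_stack.pop()                  # the root of the AST

print(parse(["x", "+", "y", "*", "z"]))
# BinOp(op='+', left=Leaf(name='x'),
#       right=BinOp(op='*', left=Leaf(name='y'), right=Leaf(name='z')))

Notice how pushing a rule's symbols in reverse order means a semantic action surfaces only after the symbols to its left have been processed, which is exactly when its operands are sitting on the semantic stack.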
I left the expression x * y + z as an exercise for you to trace. Notice the change: the addition and multiplication operators are swapped. Will the parser produce the correct tree for this expression?
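As a quick check on your traces, writing each node as operator(operands): x + y * z should produce +(x, *(y, z)), and x * y + z should produce +(*(x, y), z), because multiplication binds more tightly than addition in both cases.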
We still have to figure out how to implement this in our parser. In theory, though, things look good.
Today, we'll consider these ideas in the context of Klein, a language with a larger grammar, as you begin work on Module 3 and Klein's abstract syntax. We'll also look ahead to some of the practical issues in implementing semantic actions and abstract syntax, which you will read about for next time. But first, I have another nagging doubt...
Another Exercise: Bigger Doubts
Our augmented table-driven algorithm did the job for two programs in the expression language. It gave precedence to the * operator over the + in both cases, based on how the grammar expanded rules for terms and factors. Are there other cases in which it is difficult to produce the correct AST?
Recall that a parser can produce a rightmost derivation or a leftmost derivation. Either is sufficient for determining whether a program is a legal sentence in the language. But when a parser produces an abstract syntax tree for the program, it must be sure to perform operations in the correct order.
In Klein, as in most languages, operations at the same level of precedence are performed left-to-right. Our new parser with semantic actions works correctly when operators are at different levels of precedence, but will it work when they are at the same level?
There is, of course, one good way to find out: trace the algorithm!
Trace the new algorithm as it recognizes the token stream:
x + y + z
After working through this on your own, check out this solution.
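For reference, left-to-right evaluation at the same precedence level means the expected tree is +(+(x, y), z): the first addition becomes the left operand of the second.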
There are still other cases that might cause a problem... If you are not yet convinced, trace the algorithm as it operates on programs such as (x + y) * z. Does it produce the expected abstract syntax tree? If not, how might we modify the position of the semantic actions in the grammar to fix the problem?
All this practice is not wasted. Expressions like these are a part of Klein, so being sure you understand the algorithm and the placement of semantic actions in grammar rules will pay off on your project.
Compiler Moments: Steve Wozniak and Apple BASIC
All this talk of operator precedence and grammars reminded me of a passage in "How Steve Wozniak Wrote BASIC for the Original Apple From Scratch":
I didn't know about compiler writing other than perchance. But I did know about stacks and things like converting expressions into RPN using stacks. ...
I also had a list for every operator of 2 priorities. One was its tendency to go ahead of other operators. For example, the + operator would cause a * operator to occur first. But I needed a second table of the resistance to being pushed into action to handle things like parentheses. I had no idea if I was on a correct track but it worked correctly and did what I needed. It didn't have to come from a book.
Woz wrote his BASIC compiler in assembly language, because BASIC was the first high-level language he made for his new computer. That computer became the Apple I.
These days, it's tempting to think that someone has to tell us how to solve a hard problem. Or that they can, so we should let them. It's good to be reminded that we can figure out how to do things, if we want.
It's also good, though, to realize that a little theory can save us from unnecessary work!
Adding Semantic Actions to a Grammar
The first pragmatic challenge of implementing our approach is deciding where to add the semantic actions to your grammar.
Unfortunately, this is a topic that compiler textbooks often cover poorly. It is also the first big hole we find in Thain's text. Chapter 6 gives a lot of detail on the construction of the AST nodes themselves, but it covers semantic actions only in terms of what compiler generators do.
Step 1: Define Abstract Syntax
The first step is to identify the abstract syntax of your language. When I do this, I use the original version of the grammar, which defines the intended structure of the language before we mark it up for the purposes of top-down parsing.
For each production, look at the right hand side of the rule. Is this the kind of construct that programmers talk about in their code? If so, decide what the code generator will need to know in order to produce the meaning of the expression. Name the expression, and create a record that holds values for the variables in the expression.
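For example, in Python (a sketch only; the node names and fields are yours to choose), the records for a binary operation and a function definition might look like this:

from dataclasses import dataclass
from typing import List

@dataclass
class BinaryOperation:        # for expressions such as  a + b
    operator: str             # which operation the code generator must emit
    left: object              # AST node for the left operand
    right: object             # AST node for the right operand

@dataclass
class FunctionDefinition:     # for  function NAME ( PARAMS ) : TYPE  BODY
    name: str
    parameters: List[object]  # the formal parameter nodes
    return_type: object
    body: object              # the expression that forms the function's body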
Step 2: Place Semantic Actions
The second step is to determine at what point in each rule your parser will have recognized all the information for the tree node. That's where you place the semantic action in the rule.
For this step, we use the refactored grammar, because it is the basis for the production rules in our parse table.
Often, we place semantic actions at the end of the rule. However, some cases require extra care.
After we have factored the grammar and introduced marker non-terminals that make it predictive, we often want to place the semantic action before the marker non-terminal. This was essential in our little expression grammar. If we had placed the make-+ semantic action at the end of the rule, after the E', then our algorithm would have built trees that group the operations to the right, which is the wrong order for a left-associative operator.
Note that this extra care is consistent with the advice in the preceding paragraph: put the semantic action at the point in the rule at which the parser has recognized all the information it needs to construct the node.
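In our little expression grammar, that placement looks like this, with #make-+ standing for the semantic action (the marker is my notation, not part of any official grammar file):

E' ::= "+" T #make-+ E'
E' ::= ε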
What about a rule that has several parts? Often, they involve nodes in the AST that don't have their own rules. Consider a Klein function definition:
DEFINITION ::= "function" IDENTIFIER "(" PARAMETER-LIST ")" ":" TYPE
BODY
IDENTIFIER doesn't have a grammar rule, because we treat identifiers as terminals produced by the scanner. So we need to construct an identifier node while processing this rule. PARAMETER-LIST, TYPE, and BODY do have their own rules, so we can place the semantic actions for those nodes on the corresponding rules. Finally, at the end of this rule, we can place the semantic action that creates a node for the function definition itself.
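Putting those decisions together, an annotated version of the rule might look roughly like this, where the #make-... markers stand for semantic actions (the action names are illustrative, not prescribed):

DEFINITION ::= "function" IDENTIFIER #make-identifier "(" PARAMETER-LIST ")" ":" TYPE
               BODY #make-definition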
Could we place the semantic actions for PARAMETER-LIST, TYPE, and BODY in the grammar rule for DEFINITION? Certainly. That results in a single grammar rule that knows everything about a definition's AST. The trade-off is that this rule becomes large, and the grammar no longer keeps the knowledge of how to build each node in the rule that actually recognizes that construct.
Practical Matters
In the end, the best way to know how your parser will work is to trace the algorithm by hand on a candidate grammar and see whether it constructs tree nodes in the order you need them. Such a trace can teach us a lot about how the parser will behave before we invest time and effort implementing the grammar in code.
(This is one reason I traced one run in class, asked you to trace two others in class, and suggested you do more outside of class. You can try the same with any parts of the Klein grammar where you need some guidance.)
Once you have your code running, you can examine the ASTs produced by your parser and make adjustments to the grammar rules in your table where necessary.
LL(1) Parsing and Semantic Actions
We have now seen that a relatively simple extension of an LL(1) parser can execute semantic actions and build abstract syntax. The particular extension we made follows nicely from the predictive nature of top-down parsing. A top-down algorithm looks ahead one token without consuming it and thus expands a non-terminal before processing the tokens that match the production's right hand side.
In this extension, the primary job of the compiler writer is to add semantic actions to the grammar. This task isn't too difficult, as the grammar tells us most of what we need to know. But you do have to watch for occasional subtleties in precedence. When writing your compiler, be sure to test the output of your parser on many different kinds of programs, and inspect the results carefully.
Reading Assignment
Before next session, please read this short assignment about some of the choices you will have to make as you implement semantic actions and abstract syntax trees in your parser.
Next session, we will work through an example that implements these ideas in a program, taking them down to the level of code. We will use our old friend, the Fizzbuzz compiler. We will also continue with our look at the abstract syntax of Klein.