Session 14
Adding Semantic Actions to a Real Parser
Setting Up the Day
Today we tie together several threads.
- First, we'll take a few minutes to consider the abstract syntax of Klein.
- Next, we'll discuss the ideas in your reading assignment for today, on representing semantic actions in a program and implementing abstract syntax.
- With all this knowledge in hand, we will look at how to apply the techniques we have been learning to the Fizzbuzz compiler we have seen a couple of times this semester. We'll add semantic actions and abstract syntax to the table-driven parser we explored last week.
- Finally, we will close with a quick discussion of upcoming activities.
Implementing Semantic Actions and ASTs
We have been discussing the syntax analysis phase of a compiler. In particular, we are learning how to implement our first intermediate representation of a source program, the abstract syntax tree (AST) that serves as a primary input to all later phases of the compiler. Later phases of a compiler must read and annotate the abstract syntax of a program in several different ways.
In Session 12, we learned how to augment our table-driven parsing technique to create ASTs as a side effect, using the idea of a semantic action. We applied those ideas to our small expression grammar and parser. We saw that the process works almost like magic, even though we understand every step.
In your reading for today, we considered some pragmatic issues in implementing the augmented technique in code. These include representing semantic actions and the abstract syntax tree in a program. We see again a fundamental trade-off between active data (objects) and passive data (records), which you first encountered in Intermediate Computing. We also encountered the same choice earlier in the course when implementing state machines.
Compilers tend to lean toward the passive data side of the continuum. When using an OO language, though, objects are an attractive choice that offsets some of the weaknesses of algorithms that process passive data.
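To make the trade-off concrete, here is a small sketch of the same node represented both ways. The example and its field names are mine, not taken from any of our compilers.

# Passive data: a plain record. All behavior lives in functions
# that know the record's layout.
assignment_record = {'word': 'fizz', 'number': 3}

def assignment_to_string(node):
    return '{} = {}'.format(node['word'], node['number'])

# Active data: an object that carries its own behavior.
class AssignmentNode:
    def __init__(self, word, number):
        self.word = word
        self.number = number

    def __str__(self):
        return '{} = {}'.format(self.word, self.number)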
Choosing how we want to implement abstract syntax matters, because this is a key point in the design of a compiler. We know of several ways in which later phases of the compiler will use the AST, in particular type checking and code generation, but we may not be able to anticipate all potential uses. The way we represent an AST should support the compiler's behavior downstream from the parser and, if possible, permit the addition of processing steps we haven't thought of yet.
You will see a common pattern in OO programs that makes it possible to add new behavior to an object without changing the object's code. It is called the visitor pattern, and it is an ingenious solution.
If you are interested in learning more about the visitor pattern, check out this mini-session that describes visitors and demonstrates an implementation. Even if you are not using Java or an OO style to build your compiler, you may want to study the code on your own. One thing I've learned as a programmer is that I learn a lot by reading code.
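To give you the flavor before you read the mini-session, here is a minimal sketch of the pattern in Python. The node and visitor classes are hypothetical and much simpler than anything in our compilers.

# A minimal sketch of the visitor pattern. The classes here are
# hypothetical, not taken from the mini-session's code.

class Number:
    def __init__(self, value):
        self.value = value

    def accept(self, visitor):
        # double dispatch: the node hands itself to the visitor
        return visitor.visit_number(self)

class Add:
    def __init__(self, left, right):
        self.left = left
        self.right = right

    def accept(self, visitor):
        return visitor.visit_add(self)

class Evaluator:
    # one behavior over the tree, added without touching the node classes
    def visit_number(self, node):
        return node.value

    def visit_add(self, node):
        return node.left.accept(self) + node.right.accept(self)

print(Add(Number(2), Number(3)).accept(Evaluator()))    # 5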
Making a Better Parser for Fizzbuzz
In Session 11, we looked at a table-driven parser for Fizzbuzz, the unorthodox programming language we first saw in Session 2. Now it is time to take the next step: augment the Fizzbuzz parser with semantic actions.
We will proceed in two big steps. First, we will define the abstract syntax for the language; then we will augment the grammar with semantic actions that generate the abstract syntax tree required by the later phases of the compiler.
This series of exercises gives you some practice building a table-driven top-down parser and exposes you to some of the pragmatics of writing a parsing program. If you are working through this session on your own, do the exercises before peeking at the solutions.
Exercise 1: Define the Abstract Syntax
program ::= range assignments
range ::= number ELLIPSIS number
assignments ::= assignment assignments
| ε
assignment ::= word EQUALS number
word ::= WORD
number ::= NUMBER
Solution
We need six kinds of node, one for each non-terminal in the language. Because Fizzbuzz is so simple (and I was a bit short of time), I implemented only four kinds of node: a program, a range, an assignment, and an assignment list. This works because:
- Python is dynamically typed and allows me to store numbers and words on the semantic stack, and
- code generation treats both of them as strings to output.
For your compiler, you will definitely want nodes for identifiers and numbers!
I implemented each as a Python class in the file ast.py. The __str__ and pretty_print methods are for formatting output, including that of the code generator. That behavior will ordinarily reside elsewhere in a compiler.
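If you would like a preview before opening ast.py, here is a rough sketch of what the four classes might look like. The attribute names are my guesses; the file itself may differ in detail.

# A rough sketch of the four node classes. Attribute names are guesses;
# see ast.py for the real thing.

class Program:
    def __init__(self, range_node, assignments_node):
        self.range = range_node
        self.assignments = assignments_node

class Range:
    def __init__(self, low, high):
        self.low = low            # the value of a NUMBER token
        self.high = high

class Assignments:
    def __init__(self, assignment_list):
        self.assignments = assignment_list

class Assignment:
    def __init__(self, word, number):
        self.word = word          # the value of a WORD token
        self.number = number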
Exercise 2: Add Semantic Actions to the Grammar
program ::= range assignments
range ::= number ELLIPSIS number
assignments ::= assignment assignments
| ε
assignment ::= word EQUALS number
word ::= WORD
number ::= NUMBER
Solution
Because Fizzbuzz is so simple, I implemented only four semantic actions, one for each type of AST node. I push numbers and words onto the semantic stack by hand when I encounter those tokens.
When it comes to placing the semantic actions in the grammar, most of the cases seem straightforward. We have a single rule for a kind of thing, so we can put the semantic action at the end of the rule.
What about assignments? The one place where I know I have recognized all of the assignments is the moment I pop the NonTerminal.Assignments symbol off of the parse stack. That corresponds to the ε rule, so I put my semantic action there. In a grammatical sense, this is still an ε rule, but in an AST-generating sense, it is now a rule with one symbol: make-assignments!
The result looks like this:
program ::= range assignments make-program
range ::= number ELLIPSIS number make-range
assignments ::= assignment assignments
| make-assignments
assignment ::= word EQUALS number make-assignment
word ::= WORD
number ::= NUMBER
What if we put semantic actions in different places? For instance, what if we put make-assignments after assignments on the program rule, instead of on the ε rule? What if we put make-assignments at the end of the assignments rule itself? Try them out and see!
Step 3: Implement Semantic Actions in Code
Now let's implement our semantic actions in code. This requires us to think about two things:
- What will semantic actions look like on the parse stack?
- How will the algorithm execute the semantic actions?
For the first, I used an enum, similar to how I implemented tokens and non-terminals:
from enum import Enum

class AstAction(Enum):
    MakeProgram = 0
    MakeRange = 1
    MakeAssignment = 2
    MakeAssignments = 3
For the second, I looked back to the three ways of representing semantic actions discussed last session. To be consistent with my other Pythonic choices, I decided to use Python's equivalent of a table of function pointers: a dictionary mapping semantic actions to functions. This corresponds to the functional style glossed over in your reading.
First, I wrote functions for each kind of semantic action:
def make_program_node(ast_stack): ...
def make_range_node(ast_stack): ...
def make_assignment_node(ast_stack): ...
def make_assignments_node(ast_stack): ...
Each function pops the arguments it needs off of the semantic stack, builds a new tree node, and pushes it onto the semantic stack. The function for assignments is a little different from what we have seen in the past, because it has to pop an unknown number of assignments off of the stack and put them in a list.
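As an illustration, here is roughly how two of those functions might look, using plain list operations in place of the compiler's push and pop helpers, and the node classes sketched earlier. The details in the actual code may differ.

# A sketch of two of the semantic action functions, assuming node
# classes named Range, Assignment, and Assignments. Details may differ
# from the compiler's actual code.

def make_range_node(ast_stack):
    high = ast_stack.pop()          # pushed second, so popped first
    low = ast_stack.pop()
    ast_stack.append(Range(low, high))

def make_assignments_node(ast_stack):
    assignments = []
    while ast_stack and isinstance(ast_stack[-1], Assignment):
        assignments.append(ast_stack.pop())
    assignments.reverse()           # restore source order
    ast_stack.append(Assignments(assignments))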
Next, I built my table of semantic actions, as a Python dictionary:
action_table = {
    AstAction.MakeProgram : make_program_node,
    AstAction.MakeRange : make_range_node,
    AstAction.MakeAssignment : make_assignment_node,
    AstAction.MakeAssignments : make_assignments_node
}
Now, when our parsing algorithm encounters a semantic action on the parse stack, it can look up the function corresponding to the action, pass it the semantic stack as an argument, and let it do its thing.
Step 4: Add Semantic Actions to the Parsing Table
We are now able to make the final change to the data in our program: adding instances of our semantic actions to the rules in our parsing table. This step follows the changes we made to the refactored grammar in Exercise 2 above. It requires us to change four rules by adding a semantic action at the end of the row.
parse_table = {
    (NonTerminal.Program, TokenType.NUMBER):
        [NonTerminal.Range, NonTerminal.Assignments, AstAction.MakeProgram],
    ...
}
Those are all of the Fizzbuzz-specific changes we have to make. All that is left is to modify the parsing algorithm to handle semantic actions.
Step 5: Modify the Parser
Finally, we get to extend our parser to handle the semantic actions that will be on its parse stack. We can refer to the updated algorithm to guide us.
The first change is a simple renaming: convert all references to the stack into references to the parse_stack. This helps us to keep the two kinds of stack separate in our minds as we work with the code.
Then we create the semantic stack, initializing it to empty:
semantic_stack = []
Then we modify the terminal arm to remember the value of the matched numbers and words:
if isinstance( A, TokenType ):
    t = self.scanner.next_token()
    if A == t.token_type:
        pop(parse_stack)
        if t.is_number() or t.is_word():
            push(t.value(), semantic_stack)
Notice that I use the semantic stack to hold the integers and strings. Again, this works for Fizzbuzz only because it is so simple. For your compiler, you will need nodes to hold identifiers and literal values, so you will want to record matched tokens in a separate variable and define semantic actions to create nodes for them.
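Here is one way that might look, sketched with hypothetical names (last_token, Identifier, make_identifier_node) that are not part of the Fizzbuzz parser.

# A sketch only; the names Identifier, last_token, and
# make_identifier_node are hypothetical.

class Identifier:
    def __init__(self, name):
        self.name = name

# In the terminal arm, after a successful match, remember the token
# instead of pushing its raw value:
#
#     self.last_token = t
#
# Then a semantic action placed right after the identifier token in the
# grammar wraps the remembered token in a leaf node:

def make_identifier_node(parser, semantic_stack):
    semantic_stack.append(Identifier(parser.last_token.value()))

This does mean the parsing loop has to hand each action a reference to the parser, or that the actions become methods of the parser, which is a small change from the dispatch we use for Fizzbuzz.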
Finally, we add a new arm to handle semantic actions:
elif isinstance( A, AstAction ):
    action = action_table.get(A)
    action(semantic_stack)
    pop(parse_stack)
Recall that I implemented semantic actions as Python functions. As a result, the parser has only to look up the function for the given semantic action and call it with the semantic stack.
I made one other addition to the algorithm. When the loop terminates, we may have recognized a legal program and built its AST. We currently watch for one possible error: extraneous tokens in the input stream. There is now a second possibility: the semantic stack is empty or holds more than one item. Either case indicates an error in our new parse table. I added a test to catch this error:
if len(semantic_stack) != 1:
    msg = 'unexpected number of AST nodes: {}'
    raise ParseError(msg.format(semantic_stack))
If neither error check fires, we have achieved our goal of producing the AST for a legal program. So we return it!
return top(semantic_stack)
We now have a working parser.
See Spot Run
I modified the compiler to use my new parser:
from td_parser import Parser
so we can run the compiler and see if our new parser does the job. For example:
$ python3 compiler.py programs/extended.fb
Voilà!
The full version of the Fizzbuzz compiler is in today's zip file. Examine it as much as you like and ask any questions that you have.
Bonus demonstration. If you would like to see another example, check out this bonus reading that applies the same techniques to the ASCII graphics language for which we wrote a scanner in Session 6.
Upcoming Events
Homework 10 is scheduled for this week. We are in the middle of a two-week sprint on the parser, so I hope you can carve out the time you need to complete it from your project work.
Recall that I am counting in binary. Humor me.
Next session, we may have a little extra time. If so, we will use it so that I can meet with each group individually for a few minutes. That will also leave a few extra minutes for the homework.