Implementing a Table-Driven Parser for AGL
Making a Better Parser for AGL
In Session 6, we wrote a scanner for AGL, a little language for creating ASCII graphics. Now it is time to take the next step: implement a table-driven parser with semantic actions.
We will proceed in two big steps. First, we will create the abstract syntax for the language. Then we will augment the grammar with semantic actions that generate the abstract syntax tree required by the later phases of the compiler.
This series of exercises gives you some practice building a table-driven top-down parser and exposes you to some of the pragmatics of writing a parsing program.
Exercise 1: Define the Abstract Syntax
drawing → row*
row → repeat chunk+ ";"
repeat → INTEGER
chunk → INTEGER CHAR
Solution
I opted for four types of node, one for each non-terminal in the language. This works in AGL only because it is so simple; we won't need nodes to wrap identifiers and literal values in the tree later. For your compiler, you will definitely want nodes for those values!
I implemented each as a Python class in the file ast.py. The value() method returns the node's instance variable or, if there is more than one, a tuple of instance variables. I added an optional main function that demonstrates the nodes for the "Big I" program.
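As a rough illustration of what such node classes might look like, here is a minimal sketch. The class names, attribute names, and the one-row example are my assumptions for illustration; they are not the contents of the course's ast.py, and the example is not the "Big I" program.

```python
# Sketch only: class and attribute names are assumptions, not the real ast.py.

class Repeat:
    """repeat -> INTEGER: how many times a row is repeated."""
    def __init__(self, count):
        self.count = count

    def value(self):
        # One instance variable, so return it directly.
        return self.count


class Chunk:
    """chunk -> INTEGER CHAR: a run of one character."""
    def __init__(self, count, char):
        self.count = count
        self.char = char

    def value(self):
        # More than one instance variable, so return a tuple.
        return (self.count, self.char)


class Row:
    """row -> repeat chunk+ ";"."""
    def __init__(self, repeat, chunks):
        self.repeat = repeat
        self.chunks = chunks

    def value(self):
        return (self.repeat, self.chunks)


class Drawing:
    """drawing -> row*."""
    def __init__(self, rows):
        self.rows = rows

    def value(self):
        return self.rows


# A made-up drawing with one row: repeat 2, one chunk of five 'X' characters.
row = Row(Repeat(2), [Chunk(5, "X")])
drawing = Drawing([row])
```

Note that the tree holds plain integers and characters directly, which matches the remark above that AGL does not need nodes to wrap literal values.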
Exercise 2: Add Semantic Actions to the Grammar
drawing → row drawing
| ε
row → repeat chunks ";"
chunks → chunk chunks
| ε
repeat → INTEGER
chunk → INTEGER CHAR
Solution
The row, repeat, and chunk cases seem straightforward... We have a single rule for each, so we can put the semantic action at the end of the rule.
What about drawing? The one place I know that I have recognized an entire drawing is the moment I pop the last NonTerminal.Drawing symbol off of the parse stack. That corresponds to the ε rule, so I put my semantic action there. In a grammatical sense, this is still an ε rule, but in an AST-generating sense, it is now a rule with one symbol: make-drawing.
The result looks like this:
drawing → row drawing
| make-drawing
row → repeat chunks ";" make-row
chunks → chunk chunks
| ε
repeat → INTEGER make-repeat
chunk → INTEGER CHAR make-chunk
Step 3: Implement Semantic Actions in Code
Now let's implement our semantic actions in code. This requires answering two questions:
- What will semantic actions look like on the parse stack?
- How will the algorithm execute the semantic actions?
For the first, I used an enum, similar to how I implemented tokens and non-terminals:
class AstAction(Enum):
    MakeDrawing = 0
    MakeRow = 1
    MakeRepeat = 2
    MakeChunk = 3
For the second, I looked back to the three ways of representing semantic actions discussed last session. To be consistent with my other Pythonic choices, I decided to use Python's equivalent of a table of function pointers: a dictionary mapping semantic actions to functions.
First, I wrote functions for each kind of semantic action:
def make_repeat_node(ast_stack): ...
def make_chunk_node(ast_stack): ...
def make_row_node(ast_stack): ...
def make_drawing_node(ast_stack): ...
Each function pops the arguments it needs off of the semantic stack, builds a new tree node, and pushes it onto the semantic stack. The functions for row and drawing are a little different from what we have seen in the past, because they have to pop an unknown number of items off the stack (chunks and rows, respectively) and put them in a list.
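One way these functions could be written is sketched below. It rests on two assumptions of mine: the parser pushes the value of each matched INTEGER and CHAR token onto the semantic stack, and the row and drawing functions know when to stop popping by checking the type of the node on top of the stack. The namedtuple node types are stand-ins for the real classes in ast.py.

```python
from collections import namedtuple

# Stand-in node types for this sketch; the real classes live in ast.py.
Repeat = namedtuple("Repeat", "count")
Chunk = namedtuple("Chunk", "count char")
Row = namedtuple("Row", "repeat chunks")
Drawing = namedtuple("Drawing", "rows")


def make_repeat_node(ast_stack):
    # Assumes the matched INTEGER's value is on top of the stack.
    count = ast_stack.pop()
    ast_stack.append(Repeat(count))


def make_chunk_node(ast_stack):
    # Values come off in reverse order: CHAR first, then INTEGER.
    char = ast_stack.pop()
    count = ast_stack.pop()
    ast_stack.append(Chunk(count, char))


def make_row_node(ast_stack):
    # Pop chunks until we reach the Repeat node that started this row.
    chunks = []
    while isinstance(ast_stack[-1], Chunk):
        chunks.append(ast_stack.pop())
    chunks.reverse()  # restore source order
    repeat = ast_stack.pop()
    ast_stack.append(Row(repeat, chunks))


def make_drawing_node(ast_stack):
    # Pop every finished row; an empty drawing yields an empty list.
    rows = []
    while ast_stack and isinstance(ast_stack[-1], Row):
        rows.append(ast_stack.pop())
    rows.reverse()  # restore source order
    ast_stack.append(Drawing(rows))


# Simulate what a parser would do for one row: repeat 2, chunks 3 'X' and 1 'O'.
stack = []
stack.append(2)
make_repeat_node(stack)
stack.extend([3, "X"])
make_chunk_node(stack)
stack.extend([1, "O"])
make_chunk_node(stack)
make_row_node(stack)
make_drawing_node(stack)
tree = stack.pop()
```

Because items come off the stack in reverse order, both list-building functions reverse what they collect before creating the node.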
Next, I built my table of semantic actions, as a Python dictionary:
action_table = {
    AstAction.MakeDrawing : make_drawing_node,
    AstAction.MakeRow     : make_row_node,
    AstAction.MakeRepeat  : make_repeat_node,
    AstAction.MakeChunk   : make_chunk_node
}
Step 4: Add Semantic Actions to the Parsing Table
We are now able to make the final change to the data in our program: adding instances of our semantic actions to the rules in our parsing table. This step follows the changes we made to the refactored grammar in Exercise 2 above. It requires us to change four rules:
parse_table = {
    ...
    (NonTerminal.Row, TokenType.int_token) :
        [ NonTerminal.Repeat,
          NonTerminal.Chunks,
          TokenType.terminator,
          AstAction.MakeRow ],
    ...
    (NonTerminal.Repeat, TokenType.int_token) :
        [ TokenType.int_token,
          AstAction.MakeRepeat ],
    (NonTerminal.Chunk, TokenType.int_token) :
        [ TokenType.int_token,
          TokenType.str_token,
          AstAction.MakeChunk ],
    (NonTerminal.Drawing, TokenType.EndOfStream) :
        [ AstAction.MakeDrawing ],
    ...
}
Those are all of the language-specific changes we have to make. All that is left is to modify the parsing algorithm to handle semantic actions.
Step 5: Modify the Parser
Here, we make the same changes to the parsing algorithm that we made to the Fizzbuzz compiler in Session 14. Refer back to those notes if you would like a refresher.
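For readers without those notes at hand, here is a compact sketch of how a table-driven loop can handle semantic actions. It is not the course's parser.py. The token representation as (type, value) pairs, the helper pop_while, the rule that matched token values are pushed onto the semantic stack, and the table entries (which I derived by hand from the Exercise 2 grammar) are all assumptions.

```python
from collections import namedtuple
from enum import Enum

# All names below are illustrative stand-ins, not the course's parser.py.
TokenType = Enum("TokenType", "int_token str_token terminator EndOfStream")
NonTerminal = Enum("NonTerminal", "Drawing Row Chunks Repeat Chunk")
AstAction = Enum("AstAction", "MakeDrawing MakeRow MakeRepeat MakeChunk")

Repeat = namedtuple("Repeat", "count")
Chunk = namedtuple("Chunk", "count char")
Row = namedtuple("Row", "repeat chunks")
Drawing = namedtuple("Drawing", "rows")

def pop_while(ast_stack, node_type):
    """Pop consecutive nodes of one type and return them in source order."""
    nodes = []
    while ast_stack and isinstance(ast_stack[-1], node_type):
        nodes.append(ast_stack.pop())
    return nodes[::-1]

def make_repeat_node(ast_stack):
    ast_stack.append(Repeat(ast_stack.pop()))

def make_chunk_node(ast_stack):
    char = ast_stack.pop()
    count = ast_stack.pop()
    ast_stack.append(Chunk(count, char))

def make_row_node(ast_stack):
    chunks = pop_while(ast_stack, Chunk)
    ast_stack.append(Row(ast_stack.pop(), chunks))

def make_drawing_node(ast_stack):
    ast_stack.append(Drawing(pop_while(ast_stack, Row)))

action_table = {
    AstAction.MakeDrawing: make_drawing_node,
    AstAction.MakeRow: make_row_node,
    AstAction.MakeRepeat: make_repeat_node,
    AstAction.MakeChunk: make_chunk_node,
}

T, N, A = TokenType, NonTerminal, AstAction
parse_table = {  # derived by hand from the Exercise 2 grammar (assumption)
    (N.Drawing, T.int_token): [N.Row, N.Drawing],
    (N.Drawing, T.EndOfStream): [A.MakeDrawing],
    (N.Row, T.int_token): [N.Repeat, N.Chunks, T.terminator, A.MakeRow],
    (N.Chunks, T.int_token): [N.Chunk, N.Chunks],
    (N.Chunks, T.terminator): [],
    (N.Repeat, T.int_token): [T.int_token, A.MakeRepeat],
    (N.Chunk, T.int_token): [T.int_token, T.str_token, A.MakeChunk],
}

def parse(tokens):
    """Parse a list of (TokenType, value) pairs ending in EndOfStream."""
    tokens = iter(tokens)
    token_type, token_value = next(tokens)
    parse_stack = [T.EndOfStream, N.Drawing]
    ast_stack = []
    while parse_stack:
        top = parse_stack.pop()
        if isinstance(top, AstAction):
            # New case: a semantic action runs against the semantic stack.
            action_table[top](ast_stack)
        elif isinstance(top, TokenType):
            if top != token_type:
                raise SyntaxError(f"expected {top.name}, saw {token_type.name}")
            if token_value is not None:
                ast_stack.append(token_value)  # save the value for a later action
            if top != T.EndOfStream:
                token_type, token_value = next(tokens)
        else:
            rule = parse_table.get((top, token_type))
            if rule is None:
                raise SyntaxError(f"no rule for {top.name} on {token_type.name}")
            parse_stack.extend(reversed(rule))  # leftmost symbol ends up on top
    return ast_stack.pop()

# One row: repeat 2, one chunk of three 'X' characters.
tokens = [(T.int_token, 2), (T.int_token, 3), (T.str_token, "X"),
          (T.terminator, None), (T.EndOfStream, None)]
tree = parse(tokens)
```

The only change to the usual algorithm is the first branch of the loop: when the symbol popped from the parse stack is a semantic action, the parser looks up its function in the action table and calls it instead of matching a token or expanding a rule.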
See Spot Run
You can run the parser in two ways.
- If you load parser.py into your REPL, it will run the default main program at the bottom, which parses a hard-coded string and prints the AST as a string. This uses the __str__ methods in each class, which return AGL's concrete syntax.
- You can use the script aglp, which parses the program stored in the file whose name is given as a command-line argument. This prints a formatted listing of the program's AST, as requested by Module 3. Feel free to use this as inspiration for your kleinp script.
The full version of the updated AGL compiler is in today's zip file. Examine it as much as you like and ask any questions that you have.