Session 14 Bonus Reading

Implementing a Table-Driven Parser for AGL

Making a Better Parser for AGL

In Session 6, we wrote a scanner for AGL, a little language for creating ASCII graphics. Now it is time to take the next step: implement a table-driven parser with semantic actions.

We will proceed in two big steps. First, we will create the abstract syntax for the language. Then we will augment the grammar with semantic actions that generate the abstract syntax tree required by the later phases of the compiler.

This series of exercises gives you some practice building a table-driven top-down parser and exposes you to some of the pragmatics of writing a parsing program.

Exercise 1: Define the Abstract Syntax

Define the abstract syntax for AGL from its original grammar:

drawing  → row*
row      → repeat chunk+ ";"
repeat   → INTEGER
chunk    → INTEGER CHAR

Solution

I opted for four types of node, one for each non-terminal in the language. This works in AGL only because it is so simple; we won't need nodes to wrap identifiers and literal values in the tree later. For your compiler, you will definitely want nodes for those values!

I implemented each as a Python class in the file ast.py. The value() method returns the node's instance variable or, if there is more than one, a tuple of instance variables. I added an optional main function that demonstrates the nodes for the "Big I" program.
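
As a minimal sketch of what such node classes might look like: the class names, constructor arguments, and the one-row sample program below are assumptions made for illustration, not the contents of the course's ast.py or the "Big I" program.

```python
# Hypothetical AST node classes, one per non-terminal in the AGL grammar.
# value() returns the single instance variable, or a tuple when there
# is more than one, as described in the text above.

class Repeat:
    def __init__(self, count):
        self.count = count

    def value(self):
        return self.count


class Chunk:
    def __init__(self, count, char):
        self.count = count
        self.char = char

    def value(self):
        return (self.count, self.char)


class Row:
    def __init__(self, repeat, chunks):
        self.repeat = repeat      # a Repeat node
        self.chunks = chunks      # a list of Chunk nodes

    def value(self):
        return (self.repeat, self.chunks)


class Drawing:
    def __init__(self, rows):
        self.rows = rows          # a list of Row nodes

    def value(self):
        return self.rows


# A made-up one-row program: repeat 2 times, a chunk of 3 "X" characters.
sample = Drawing([Row(Repeat(2), [Chunk(3, "X")])])
first_row = sample.value()[0]
print(first_row.value()[0].value())      # the repeat count
print(first_row.value()[1][0].value())   # the first chunk's (count, char)
```

Because AGL has no identifiers or expressions, the leaf nodes here hold plain Python values directly.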

Exercise 2: Add Semantic Actions to the Grammar

Add semantic actions to the refactored grammar:

drawing  → row drawing
         | ε
row      → repeat chunks ";"
chunks   → chunk chunks
         | ε
repeat   → INTEGER
chunk    → INTEGER CHAR

Solution

The row, repeat, and chunk cases seem straightforward... We have a single rule for each, so we can put the semantic action at the end of the rule.

What about drawing? The one place I know that I have recognized an entire drawing is the moment I pop the last NonTerminal.Drawing symbol off of the parse stack. That corresponds to the ε rule, so I put my semantic action there. In a grammatical sense, this is still an ε rule, but in an AST-generating sense, it is now a rule with one symbol: make-drawing.

The result looks like this:

drawing  → row drawing
         | make-drawing
row      → repeat chunks ";" make-row
chunks   → chunk chunks
         | ε
repeat   → INTEGER make-repeat
chunk    → INTEGER CHAR make-chunk

Step 3: Implement Semantic Actions in Code

Now let's implement our semantic actions in code. This requires answering two questions:

What will semantic actions look like on the parse stack?
How will the algorithm execute the semantic actions?

For the first, I used an enum, similar to how I implemented tokens and non-terminals:

from enum import Enum

class AstAction(Enum):
    MakeDrawing = 0
    MakeRow     = 1
    MakeRepeat  = 2
    MakeChunk   = 3

For the second, I looked back to the three ways of representing semantic actions discussed last session. To be consistent with my other Pythonic choices, I decided to use Python's equivalent of a table of function pointers: a dictionary mapping semantic actions to functions.

First, I wrote functions for each kind of semantic action:

def make_repeat_node(ast_stack):  ...
def make_chunk_node(ast_stack):   ...
def make_row_node(ast_stack):     ...
def make_drawing_node(ast_stack): ...

Each function pops the arguments it needs off of the semantic stack, builds a new tree node, and pushes it onto the semantic stack. The functions for row and drawing are a little different from what we have seen in the past, because they have to pop an unknown number of items off the stack (chunks and rows, respectively) and put them in a list.
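
One way these functions could be written is sketched below. It rests on two assumptions that the notes do not confirm: that the parser pushes the value of each matched INTEGER and CHAR token onto the semantic stack, and that node types can be told apart with isinstance. The dataclasses are stand-ins for the real node classes.

```python
from dataclasses import dataclass

# Stand-in node classes (hypothetical; the real ones live in ast.py).
@dataclass
class Repeat:
    count: int

@dataclass
class Chunk:
    count: int
    char: str

@dataclass
class Row:
    repeat: Repeat
    chunks: list

@dataclass
class Drawing:
    rows: list


def make_repeat_node(ast_stack):
    # Assumes the matched INTEGER value is on top of the stack.
    count = ast_stack.pop()
    ast_stack.append(Repeat(count))


def make_chunk_node(ast_stack):
    # Values come off in reverse order: CHAR first, then INTEGER.
    char = ast_stack.pop()
    count = ast_stack.pop()
    ast_stack.append(Chunk(count, char))


def make_row_node(ast_stack):
    # Pop chunks until something that is not a Chunk appears.
    chunks = []
    while ast_stack and isinstance(ast_stack[-1], Chunk):
        chunks.append(ast_stack.pop())
    chunks.reverse()              # restore source order
    repeat = ast_stack.pop()      # the Repeat node sits below the chunks
    ast_stack.append(Row(repeat, chunks))


def make_drawing_node(ast_stack):
    # Pop every Row currently on the stack.
    rows = []
    while ast_stack and isinstance(ast_stack[-1], Row):
        rows.append(ast_stack.pop())
    rows.reverse()
    ast_stack.append(Drawing(rows))


# Simulate the pushes a parser would make for a made-up row: 2 3 X 1 Y ;
stack = [2]
make_repeat_node(stack)
stack += [3, "X"]
make_chunk_node(stack)
stack += [1, "Y"]
make_chunk_node(stack)
make_row_node(stack)
make_drawing_node(stack)
print(stack[0])
```

Checking the type of the item on top of the stack is only one way to find where a list ends. Pushing a marker value when a row or drawing begins would work too.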

Next, I built my table of semantic actions, as a Python dictionary:

action_table = {
    AstAction.MakeDrawing : make_drawing_node,
    AstAction.MakeRow     : make_row_node,
    AstAction.MakeRepeat  : make_repeat_node,
    AstAction.MakeChunk   : make_chunk_node
}

Step 4: Add Semantic Actions to the Parsing Table

We are now able to make the final change to the data in our program: adding instances of our semantic actions to the rules in our parsing table. This step follows the changes we made to the refactored grammar in Exercise 2 above. It requires us to change four rules:

parse_table = {
    ...
    (NonTerminal.Row, TokenType.int_token) :
        [ NonTerminal.Repeat,
          NonTerminal.Chunks,
          TokenType.terminator,
          AstAction.MakeRow ] ,
    ...
    (NonTerminal.Repeat, TokenType.int_token) :
        [ TokenType.int_token,
          AstAction.MakeRepeat ] ,
    (NonTerminal.Chunk, TokenType.int_token) :
        [ TokenType.int_token,
          TokenType.str_token,
          AstAction.MakeChunk ],
    (NonTerminal.Drawing, TokenType.EndOfStream) :
        [ AstAction.MakeDrawing ],
    ...
}

Those are all of the language-specific changes we have to make. All that is left is to modify the parsing algorithm to handle semantic actions.

Step 5: Modify the Parser

Here, we make the same changes to the parsing algorithm that we made to the Fizzbuzz compiler in Session 14. Refer back to those notes if you would like a refresher.
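
For readers without those notes at hand, here is a self-contained sketch of how a table-driven loop might handle the three kinds of stack symbol. It is not the code in today's zip file. The full parse table is my reading of the refactored grammar, tagged tuples stand in for AST nodes, and the token format, a list of (TokenType, value) pairs, is an assumption.

```python
from enum import Enum

TokenType = Enum("TokenType", "int_token str_token terminator EndOfStream")
NonTerminal = Enum("NonTerminal", "Drawing Row Chunks Repeat Chunk")
AstAction = Enum("AstAction", "MakeDrawing MakeRow MakeRepeat MakeChunk")
T, N, A = TokenType, NonTerminal, AstAction

# Parse table derived from the grammar with semantic actions (assumed complete).
parse_table = {
    (N.Drawing, T.int_token):   [N.Row, N.Drawing],
    (N.Drawing, T.EndOfStream): [A.MakeDrawing],
    (N.Row, T.int_token):       [N.Repeat, N.Chunks, T.terminator, A.MakeRow],
    (N.Chunks, T.int_token):    [N.Chunk, N.Chunks],
    (N.Chunks, T.terminator):   [],
    (N.Repeat, T.int_token):    [T.int_token, A.MakeRepeat],
    (N.Chunk, T.int_token):     [T.int_token, T.str_token, A.MakeChunk],
}

def pop_while(stack, tag):
    """Pop consecutive nodes tagged `tag`; return them in source order."""
    items = []
    while stack and isinstance(stack[-1], tuple) and stack[-1][0] == tag:
        items.append(stack.pop())
    return items[::-1]

def make_repeat(stack):
    stack.append(("repeat", stack.pop()))

def make_chunk(stack):
    char = stack.pop()
    count = stack.pop()
    stack.append(("chunk", count, char))

def make_row(stack):
    chunks = pop_while(stack, "chunk")
    stack.append(("row", stack.pop(), chunks))

def make_drawing(stack):
    stack.append(("drawing", pop_while(stack, "row")))

action_table = {
    A.MakeDrawing: make_drawing, A.MakeRow: make_row,
    A.MakeRepeat: make_repeat, A.MakeChunk: make_chunk,
}

def parse(tokens):
    parse_stack = [T.EndOfStream, N.Drawing]
    ast_stack = []
    pos = 0
    while parse_stack:
        top = parse_stack.pop()
        kind, value = tokens[pos]
        if isinstance(top, TokenType):          # terminal: match and advance
            if top != kind:
                raise SyntaxError(f"expected {top.name}, saw {kind.name}")
            if kind in (T.int_token, T.str_token):
                ast_stack.append(value)         # save values for later actions
            pos += 1
        elif isinstance(top, NonTerminal):      # non-terminal: expand a rule
            rule = parse_table.get((top, kind))
            if rule is None:
                raise SyntaxError(f"no rule for {top.name} on {kind.name}")
            parse_stack.extend(reversed(rule))
        else:                                   # semantic action: run it
            action_table[top](ast_stack)
    return ast_stack.pop()

tokens = [(T.int_token, 2), (T.int_token, 3), (T.str_token, "X"),
          (T.terminator, ";"), (T.EndOfStream, None)]
tree = parse(tokens)
print(tree)
```

The only new branch compared to a plain table-driven parser is the final else: when an AstAction reaches the top of the parse stack, the loop looks it up in action_table and runs it on the semantic stack without consuming a token.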

See Spot Run

You can run the parser in two ways.

If you load parser.py into your REPL, it will run the default main program at the bottom, which parses a hard-coded string and prints the AST as a string. This uses the __str__ methods in each class, which return AGL's concrete syntax.
You can use the script aglp, which parses the program stored in the file whose name is given as a command-line argument. This prints a formatted listing of the program's AST, as requested by Module 3. Feel free to use this as inspiration for your kleinp script.

The full version of the updated AGL compiler is in today's zip file. Examine it as much as you like and ask any questions that you have.