CS 4550 Session 17

Session 17
Type Expressions and Type Checking

Where Are We?

We are now past the middle of the term on the calendar, and we have crossed over into the second half of the compiler project as well. After Project 3, your team has a parser that accepts syntactically legal Klein programs and produces abstract syntax trees for them. However, we know that your parser does not accept only legal programs, because the parser could not enforce requirements outside of the context-free grammar.

[Image: a road through a wooded area, trees on either side, suffused in a beautiful green color, with sunlight filtering in from the left. Caption: *Beautiful trees of green, not trees of abstract syntax.*]

In Session 15 and Session 16, we began our discussion of semantic analysis, the phase of the compiler that determines whether a program "makes sense" according to the meanings of the language's constructs. It checks for elements of syntactic correctness that are beyond the capability of the parser and checks for elements of semantic correctness defined outside the grammar. Like syntactic analysis, semantic analysis also performs a pragmatic second task: annotating the AST with information that will assist in code generation.

Our primary focus for semantic analysis in this course is on type correctness, a semantic check that ensures variables, expressions, function calls, and statements behave in the ways specified by their types and the operations on them. That is our primary topic for the day.

Today, we look in more detail at the kind of type checking that a Klein compiler must do, in particular for if expressions and functions. Then we will take a brief look back at the semantic checker for Fizzbuzz, to see what type checking looks like in code. This will finish our discussion of semantic analysis.

First, though, let's spend a few minutes thinking about the other parts of semantic analysis. That might be helpful, as your Klein semantic checker has to do more than type checking!

Quick Notes

Next session, we move on to the synthesis of target code in the TM assembly language. A short reading and lab exercise will prepare you to use the TM virtual machine. Be sure to work through the assignment before our next class!

The slides for this session are available as a PDF file, if you'd like to see a brief version of the algorithms you will want to implement in your semantic checker.

Language Requirements Outside the Grammar

Part of verifying a program's semantic correctness is to show that the program satisfies all of the non-grammatical requirements of the language. Your assignment for Module 4 identifies at least five such requirements that can be found in the Klein language spec.

Three of the five deal with function names at the global level. We can implement each of these checks with a loop over the list of function definitions that make up the program. For example:

for f in list of function defs
    if f.name = "print"
       then addError("user-defined function named 'print'")

A similar loop can check to see if there is a function named 'main'. A third loop can check to see if all function names are unique.

All three of these checks can be done with a single loop, if we do a little bookkeeping:

f_named_main = false
list_of_names = []

for f in list of function defs
    if f.name = "print"
       then addError("user-defined function named 'print'")
    if f.name = "main"
       then f_named_main = true
    if f.name in list_of_names
       then addError("duplicate function named", f.name)
       else add f.name to list_of_names

if !f_named_main
   then addError("no user-defined function named 'main'")

This approach walks the list of function definitions only once.
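For those who would like to see the single-loop version in a real language, here is a minimal Python sketch. It assumes a hypothetical `FunctionDef` node with a `name` field and collects error messages in a list rather than calling `addError`; your AST classes and error reporting will likely look different.

```python
from dataclasses import dataclass


@dataclass
class FunctionDef:
    """Hypothetical stand-in for a function-definition AST node."""
    name: str


def check_function_names(function_defs):
    """Run the three global name checks in one pass; return error messages."""
    errors = []
    seen_names = set()      # names encountered so far, for the uniqueness check
    has_main = False

    for f in function_defs:
        if f.name == "print":
            errors.append("user-defined function named 'print'")
        if f.name == "main":
            has_main = True
        if f.name in seen_names:
            errors.append(f"duplicate function named '{f.name}'")
        else:
            seen_names.add(f.name)

    # This check can only be made after the whole list has been seen.
    if not has_main:
        errors.append("no user-defined function named 'main'")
    return errors
```

Using a set instead of a list for the names keeps the membership test fast, but the logic is the same as in the pseudocode.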

The other two requirements work at the level of individual functions. They, too, can be checked in a single pass over the list of function definitions, by examining each function's parameter list and then walking its body to find variable references.

This kind of semantic analysis does not require any new ideas from the course, just a little programming.

The Basic Idea of Type Checking

Now for type checking. The basic idea here is relatively simple, too.

Consider a typical node for, say, a Klein division expression:

According to the language grammar, the left and right operands to the / operator can be any expression. They might be an integer literal, an identifier that holds an integer, a call to a function that returns an integer, or another arithmetic expression. They could also be boolean values or expressions that evaluate to booleans. If both operands are integers, then this node is legal. If either operand is not an integer, then this node is invalid.

To type-check this or any other node, a semantic checker needs to:

- Type-check its parts.
- Apply a type rule to the results.

The type rule for the divide operator, and for all the other binary arithmetic operators, is something like this:

if type(left operand) == integer
   and
   type(right operand) == integer
then type of node = integer
else type of node = error

This process performs a post-order walk of the abstract syntax tree using structural recursion.
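Here is one way the post-order walk might look in Python. This is only a sketch: the node classes `IntLiteral`, `BoolLiteral`, and `BinaryOp` are hypothetical, types are represented as plain strings, and the function treats every `BinaryOp` as an arithmetic operator.

```python
from dataclasses import dataclass
from typing import Union

INTEGER, BOOLEAN, ERROR = "integer", "boolean", "error"


@dataclass
class IntLiteral:
    value: int


@dataclass
class BoolLiteral:
    value: bool


@dataclass
class BinaryOp:
    """Hypothetical node for a binary arithmetic expression such as a / b."""
    operator: str
    left: "Expr"
    right: "Expr"


Expr = Union[IntLiteral, BoolLiteral, BinaryOp]


def type_of(node):
    """Type-check a node: check its parts first, then apply the type rule."""
    if isinstance(node, IntLiteral):
        return INTEGER
    if isinstance(node, BoolLiteral):
        return BOOLEAN
    if isinstance(node, BinaryOp):
        left_type = type_of(node.left)      # post-order: children first
        right_type = type_of(node.right)
        if left_type == INTEGER and right_type == INTEGER:
            return INTEGER
        return ERROR
    raise TypeError(f"unknown node: {node!r}")
```

Notice that an error in a subexpression propagates upward on its own: an operand of type error is not an integer, so the enclosing node is an error, too.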

Type Checking `if` Expressions

In the usual case, assigning a type expression to an if is pretty straightforward. Consider:

if (a < b)
   then b / a
   else 1

The AST for this expression looks something like this:

To type-check a compound expression, we have to type-check the expression's parts and use those results to type-check the expression. An if expression's test must be a boolean; if it isn't, then there is a semantic error. The "then" and "else" clauses ordinarily must return the same type. In this case, that type is integer.

The type rule might be written as:

if type(condition) == boolean
   and
   type(then) == type(else)
then
   type(if-exp) = type(then)
else
   type(if-exp) = error
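In Python, the rule might be sketched as a function over the types already computed for the three parts. The string constants and the name `type_of_if` are assumptions made for illustration, not part of the Klein spec.

```python
INTEGER, BOOLEAN, ERROR = "integer", "boolean", "error"


def type_of_if(condition_type, then_type, else_type):
    """Apply the if-expression type rule to the types of its three parts."""
    if condition_type == BOOLEAN and then_type == else_type:
        return then_type
    return ERROR
```

For the example above, `a < b` is a boolean and both clauses are integers, so `type_of_if(BOOLEAN, INTEGER, INTEGER)` gives integer.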

A Familiar Question

I alluded to this last time, so now let's take it on...

What types might a semantic checker assign to this Klein expression?

if (a < b)
   then b / a
   else false

What if we change a < b to true?

I can think of at least two possibilities. Ask yourself: Is there any context in which a Klein programmer could reasonably use this expression?

This Type or That: A Union Type

Applying the type rule for if expressions given above, the type of:

if (a < b)
   then b / a
   else false

is error. We can't use it as a value in a typed program.

If it is the body of a function, then it has to produce a value of a specific type. If it is part of a larger arithmetic or boolean expression, then its value will be used by an operator or function call that expects a value of a specific type. So, in general, the then and else clauses must have the same type.

This line of reasoning led one team in a previous semester to ask the question (a very short optional reading) that inspired the exercise. It's a great question, because it helps us determine how to implement a Klein semantic checker and helps us understand type checking at a deeper level.

Is there a way that we can use that if expression in a valid Klein program? There is one context in which it is valid: when the value is passed as an argument to the print function:

print(
  if (a < b)
     then b / a
     else false
)

The language spec says that print() returns no value, but it doesn't say anything about the type of the value passed to it. print is omnivorous! Er, that should say polymorphic. It can take this if expression as an argument and print a value of either type.

Can we support this behavior in our Klein type checker? We can, if we create a new kind of type expression:

OR(type₁, type₂)

An OR type says that a value will be of either type₁ or type₂. The OR type may be valid in some contexts, such as passing the value to a polymorphic print function, and invalid in others — just like any other type. Consider:

1 + if (a < b)
       then b / a
       else false

The type checker will determine its validity when it checks the expression that contains the if expression.
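Here is one possible sketch of this idea in Python. Everything in it is an assumption made for illustration: the `OrType` class, the string constants for the basic types, and the three helper functions. Treat it as one design among several, not as the way your checker must work.

```python
from dataclasses import dataclass

INTEGER, BOOLEAN, ERROR = "integer", "boolean", "error"


@dataclass(frozen=True)
class OrType:
    """Hypothetical type expression: a value of type `first` or type `second`."""
    first: object
    second: object


def type_of_if(condition_type, then_type, else_type):
    """If rule extended to produce an OR type when the clauses disagree."""
    if condition_type != BOOLEAN or ERROR in (then_type, else_type):
        return ERROR
    if then_type == else_type:
        return then_type
    return OrType(then_type, else_type)


def type_of_arithmetic(left_type, right_type):
    """Arithmetic operators still demand integers, so an OR type is rejected."""
    if left_type == INTEGER and right_type == INTEGER:
        return INTEGER
    return ERROR


def print_accepts(argument_type):
    """print is polymorphic: it accepts any valid type, including an OR type."""
    return argument_type != ERROR
```

With these definitions, the mixed if expression gets the type `OrType(INTEGER, BOOLEAN)`. Passing that type to `print_accepts` succeeds, while using it as the right operand of `1 + ...` makes `type_of_arithmetic` return error. The context decides.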

The Klein language specification is fairly typical. It does not define the type system as an entity in its own right, only implicitly as a result of combining and using expressions. print's looseness creates a potential problem — or a potential opportunity.

For us, it is an opportunity. You get to create a structured type expression in your Klein compiler.

Types Have Structure

Last session, we encountered the idea that types have structure. That's because, in addition to the base types of atomic data values, there are constructed types, such as a pointer to an int or an array of booleans. A semantic checker thus works with type expressions, which can be defined inductively as follows:

- A basic type is a type expression.
- A type created by applying a type constructor to one or more types is a type expression.

As we saw last time, though, the universe of type expressions is bigger than this. In addition to type constructors, some languages enable programmers to create type names or to use type variables when specifying types.

Trees are a useful representation for type expressions. Here are a few:

[Figure: pictures of type expressions constructed out of other types]

Type Checking Functions

The one structured type expression required by the Klein language spec is the function type. This type plays a role in checking and recording the types associated with function definitions and function calls. Because functions can take different numbers of arguments, we can define the function type more simply if we define a tuple type, too.

The type rule we need to type-check a function definition looks like this:

if type of body == declared return type
then
  type of function def = function(parameters type, return type)
else
  type of function def = error

Assigning a type to a function call requires comparing the types of the arguments to the types expected by the function:

if type of function def == function(s, t)
   and
   type of argument list == s
then
   type of function call = t
else
   type of function call = error
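The two rules above might be sketched in Python like this. The `FunctionType` class and the function names are hypothetical, and basic types are again plain strings. A Python tuple plays the role of the tuple type for the parameters.

```python
from dataclasses import dataclass
from typing import Tuple

INTEGER, BOOLEAN, ERROR = "integer", "boolean", "error"


@dataclass(frozen=True)
class FunctionType:
    """Hypothetical type expression function(s, t), with s a tuple of types."""
    parameter_types: Tuple[str, ...]
    return_type: str


def type_of_function_def(parameter_types, declared_return_type, body_type):
    """A definition is well typed if its body has the declared return type."""
    if body_type == declared_return_type:
        return FunctionType(tuple(parameter_types), declared_return_type)
    return ERROR


def type_of_function_call(function_type, argument_types):
    """A call is well typed if the argument types match the parameter types."""
    if (isinstance(function_type, FunctionType)
            and tuple(argument_types) == function_type.parameter_types):
        return function_type.return_type
    return ERROR
```

Comparing the two tuples checks both the number of arguments and the type of each one in a single step.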

Even with this simple and incomplete idea of functions, we can explore many of the interesting issues involved in type-checking functions.

- To allow functions of more than one argument, we generalize the s on the left-hand side of the function type to be a tuple. You need this for Klein.
- To allow higher-order functions, we allow one or both of:
  - one of the types in the tuple to be a function type
  - the type on the right-hand side of the function type to be a function type

For example, we could specify the type of a function like Racket's map procedure as
```
(s→t, list(s)) → list(t)
```
Algol, a statically-typed language, had this feature — way back in 1960! C has this feature in a more roundabout fashion, through its use of function pointers.

Klein does not allow higher-order functions. (Can you find evidence for this in the language spec?)

Type expressions such as these mean that determining whether two types are equal to one another is more complex than comparing two atomic values. We have to compare constructed type expressions by recursively comparing the structures of their parts.
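A recursive comparison of this kind might look like the following Python sketch. The classes `BasicType` and `ConstructedType` are hypothetical tree nodes for type expressions, invented here for illustration.

```python
class BasicType:
    """Leaf of a type-expression tree, such as integer or boolean."""
    def __init__(self, name):
        self.name = name


class ConstructedType:
    """Interior node: a type constructor applied to one or more types."""
    def __init__(self, constructor, parts):
        self.constructor = constructor   # e.g. "function" or "tuple"
        self.parts = list(parts)


def types_equal(a, b):
    """Structural equality: same shape, same constructors, same leaves."""
    if isinstance(a, BasicType) and isinstance(b, BasicType):
        return a.name == b.name
    if isinstance(a, ConstructedType) and isinstance(b, ConstructedType):
        return (a.constructor == b.constructor
                and len(a.parts) == len(b.parts)
                and all(types_equal(x, y) for x, y in zip(a.parts, b.parts)))
    return False   # a basic type never equals a constructed type
```

Two separately built trees for `function(tuple(integer, integer), boolean)` compare as equal under `types_equal`, even though they are different objects in memory.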

A Look at a Type Checker in Action

Even a simple language can benefit from this sort of semantic analysis. Let's consider the semantic checker in the Fizzbuzz compiler we first saw in Week 1. It doesn't propagate types up through a tree of expressions, but it will give you a sense of how straightforward the code to check a program's semantics can be.

Note a few things:

- This type checker does not annotate the AST nodes. It needs only to ensure that each node meets the semantic requirements of the language.
- Each node is a compound expression that is checked based on the values of its parts. The value returned for the list of assignments is the conjunction of the values of each individual assignment.
- The type checker does build a symbol table, for use by the code generator. The checker itself uses the symbol table to check on the uniqueness of the program's words.

Quick exercise: Add a check to determine whether the same number is used in two different assignments. We should probably at least warn the programmer about such a case.

Wrap-Up

Can you believe that there is still a whole lot more to learn just about type checking, let alone semantic analysis more generally? There is. However, let's move on. We have a lot to learn about run-time systems and code generation!

Required Reading for Next Session

To prepare for our transition to code generation, work through this introduction to the TM simulator before Session 18.

If you do not have Linux or Unix running on your own computer, you can do this work on the student.cs.uni.edu server.

If you have any questions or need any help, please let me know.

Optional Readings

If you would like to read about type checking in more detail, check out this bonus reading on type checking a small language. The language in this reading has more features than Klein, so you get to see what type rules for other kinds of statement and expression might look like. The reading defines a complete type system for the language — and even has an exercise for you to do for practice!
