Session 17 Bonus Reading
Type Checking a Little Language

Introduction

Languages more complex than Klein have many more constructs. Working through how to type-check other kinds of expressions may help you think about type checking more generally, and about type checking a simple language like Klein. This bonus reading lets you do that on your own.

If you have any questions, be sure to ask.

Type Checking a Small Language

Consider this simple language, whose grammar generates programs that consist of one or more declarations followed by a single expression:

P → D ; E
D → D { ; D }
  | id : T
T → char
  | integer
  | array [num] of T
  | ↑T
E → literal
  | num
  | id
  | E mod E
  | E [ E ]
  | E ↑

This language has two basic types, char and integer, and two constructed types, arrays and pointers.

Each array has the index set [0..num-1], where num is the declared size of the array. Expressions include the integer operation mod, array dereferencing, and pointer dereferencing.

In this language, all identifiers are declared prior to being used. Here are two programs generated by the grammar:

year: integer;          a: integer;
year mod 1970           b: char;
                        c: array [10] of integer;
                        d: ↑integer;
                        c[d↑] mod c[a]

A type checker for this language can first build type expressions for each declared identifier and then compute the type of the program's expression. We can implement the semantic actions needed to type-check a program in separate arms of the type checker code.

Let's consider declarations first. These actions require the type checker to record basic types and assemble constructed types from their components:

P → D ; E
D → D ; D
  | id : T               addType(id.value, T.type)
T → char                 T.type ← char
  | integer              T.type ← integer
  | array [num] of T1    T.type ← array([0..num.value-1], T1.type)
  | ↑T1                  T.type ← pointer(T1.type)
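
As a rough illustration, the declaration actions might be sketched in Python as follows. The encoding of type expressions as strings and tuples, and the names `array`, `pointer`, `add_type`, and `symbol_table`, are assumptions made for this sketch; only `addType` comes from the actions above.

```python
# Type expressions as plain Python values (an assumed encoding):
#   "char", "integer", ("array", size, elem_type), ("pointer", target_type)
CHAR = "char"
INTEGER = "integer"

def array(size, elem_type):
    """Stands for array([0..size-1], elem_type); only the size is stored."""
    return ("array", size, elem_type)

def pointer(target_type):
    """Stands for pointer(target_type)."""
    return ("pointer", target_type)

# Maps each declared identifier to its type expression.
symbol_table = {}

def add_type(name, type_expr):
    """Plays the role of addType(id.value, T.type)."""
    symbol_table[name] = type_expr

# The declarations of the second example program above.
add_type("a", INTEGER)
add_type("b", CHAR)
add_type("c", array(10, INTEGER))
add_type("d", pointer(INTEGER))
```

Because constructed types are nested tuples, two type expressions can be compared for structural equality with `==`, which is all the expression rules need.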

Now let's consider the types of expressions. For literal values, the types are basic types:

E → literal              E.type ← char
E → num                  E.type ← integer

Identifiers have types associated with them in the symbol table, recorded when their declarations were processed:

E → id                   E.type ← lookupType(id.value)

Types for the three remaining kinds of expressions must be computed. Because they must be computed from parts that have specific types, there is the possibility of a type error. For example, the integer operation mod requires integer arguments.

E → E1 mod E2            E.type ←
                          if E1.type = integer and
                              E2.type = integer
                          then
                              integer
                          else
                              type error

Similarly, array indices must be integers. At this point, we don't care about the value of the index (is it in the index set?), only that the types match up.

E → E1 [ E2 ]            E.type ←
                          if E1.type = array(s, t) and
                              E2.type = integer
                          then
                              t
                          else
                              type error

Finally, pointer dereferencing works only for pointer types, and returns the pointed-to type:

E → E1 ↑                 E.type ←
                          if E1.type = pointer(t)
                          then
                              t
                          else
                              type error
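
The six expression rules above can be collected into one recursive function. The following is a minimal sketch, not a definitive implementation: the tuple encoding of expressions and types, the name `type_of`, and the `TypeCheckError` exception are all assumptions made for illustration, as is treating an undeclared identifier as a type error.

```python
class TypeCheckError(Exception):
    """Raised wherever a rule above says 'type error'."""

def type_of(expr, symbols):
    """Compute E.type for an expression encoded as a nested tuple."""
    kind = expr[0]
    if kind == "literal":                     # E -> literal
        return "char"
    if kind == "num":                         # E -> num
        return "integer"
    if kind == "id":                          # E -> id
        if expr[1] not in symbols:            # assumed handling of undeclared ids
            raise TypeCheckError(f"undeclared identifier {expr[1]!r}")
        return symbols[expr[1]]
    if kind == "mod":                         # E -> E1 mod E2
        t1 = type_of(expr[1], symbols)
        t2 = type_of(expr[2], symbols)
        if t1 == "integer" and t2 == "integer":
            return "integer"
        raise TypeCheckError("mod requires integer operands")
    if kind == "index":                       # E -> E1 [ E2 ]
        t1 = type_of(expr[1], symbols)
        t2 = type_of(expr[2], symbols)
        if isinstance(t1, tuple) and t1[0] == "array" and t2 == "integer":
            return t1[2]                      # the element type t
        raise TypeCheckError("indexing requires an array and an integer index")
    if kind == "deref":                       # E -> E1 ↑
        t1 = type_of(expr[1], symbols)
        if isinstance(t1, tuple) and t1[0] == "pointer":
            return t1[1]                      # the pointed-to type t
        raise TypeCheckError("dereferencing requires a pointer")
    raise ValueError(f"unknown expression kind: {kind!r}")

# Symbol table for the second example program above.
symbols = {
    "a": "integer",
    "b": "char",
    "c": ("array", 10, "integer"),
    "d": ("pointer", "integer"),
}

# c[d↑] mod c[a]
program = ("mod",
           ("index", ("id", "c"), ("deref", ("id", "d"))),
           ("index", ("id", "c"), ("id", "a")))
result = type_of(program, symbols)            # "integer"
```

Each branch first computes the types of its subexpressions and then makes a single decision, mirroring the if/then/else form of the rules.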

That's pretty much how it works. Not too bad!

A Quick Exercise

Add to our simple language:
  • a boolean data type
  • a comparison operation "E < E"
  • logical operations "! E" and "E && E"

Try it yourself first, then compare your answer with the Exercise Solution section at the end of this reading.

Type checking expressions really is quite simple, a matter of verifying argument types and setting result types. At the top level, it is straightforward structural recursion. At the bottom level, it is straightforward selection.

Type Checking Statements

Consider this change and addition to our simple language, which introduces statements:

P → D ; S

S → id := E
  |  if E then S
  |  while E do S
  |  S { ; S }

Not much new is required. The same techniques that type-check expressions also work for type-checking statements. In most languages, statements do not really have types, or need them. We can assign types to statements if we wish, but the more common approach in procedural languages is to give every well-typed statement a single surrogate type: void.
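
Here is one way statement checking might look, as a hedged sketch. The reading does not spell out the rules for statements, so this sketch assumes that an assignment requires the identifier and the expression to have the same type, and that the conditions of if and while must have the boolean type from the exercise. The names `check_stmt`, `type_of`, and `TypeCheckError`, and the tuple encoding, are hypothetical.

```python
VOID = "void"

class TypeCheckError(Exception):
    pass

def type_of(expr, symbols):
    """Tiny stand-in expression checker: numbers, identifiers, and E < E."""
    kind = expr[0]
    if kind == "num":
        return "integer"
    if kind == "id":
        return symbols[expr[1]]
    if kind == "lt":
        if (type_of(expr[1], symbols) == "integer"
                and type_of(expr[2], symbols) == "integer"):
            return "boolean"
        raise TypeCheckError("< requires integer operands")
    raise ValueError(f"unknown expression kind: {kind!r}")

def check_stmt(stmt, symbols):
    """Return VOID if the statement is well typed, else raise TypeCheckError."""
    kind = stmt[0]
    if kind == "assign":                      # S -> id := E
        _, name, expr = stmt
        if symbols[name] != type_of(expr, symbols):   # assumed rule
            raise TypeCheckError(f"cannot assign to {name!r}: types differ")
        return VOID
    if kind in ("if", "while"):               # S -> if E then S | while E do S
        _, cond, body = stmt
        if type_of(cond, symbols) != "boolean":       # assumed rule
            raise TypeCheckError(f"{kind} condition must be boolean")
        check_stmt(body, symbols)
        return VOID
    if kind == "seq":                         # S -> S { ; S }
        for s in stmt[1:]:
            check_stmt(s, symbols)
        return VOID
    raise ValueError(f"unknown statement kind: {kind!r}")

symbols = {"i": "integer", "n": "integer"}

# i := 0 ; while i < n do i := n
program = ("seq",
           ("assign", "i", ("num", 0)),
           ("while", ("lt", ("id", "i"), ("id", "n")),
                     ("assign", "i", ("id", "n"))))
result = check_stmt(program, symbols)         # "void"
```

Returning void from every branch keeps the statement checker shaped like the expression checker: structural recursion over the parts, then a single check at each node.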

Exercise Solution

Here are the changes we should make, along with the corresponding type-checking actions:

T → ...
  | boolean            T.type ← boolean
E → ...
  | E1 < E2            E.type ←
                        if E1.type = integer and
                           E2.type = integer
                        then
                            boolean
                        else
                            type error
  | ! E1               E.type ←
                        if E1.type = boolean
                        then
                            boolean
                        else
                            type error
  | E1 && E2           E.type ←
                        if E1.type = boolean and
                           E2.type = boolean
                        then
                            boolean
                        else
                            type error
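
Since every operator rule has the same shape (required operand types, then a result type), the rules can also be stored as data. The sketch below shows that table-driven alternative; the `RULES` table, the tuple encoding, and the names `type_of` and `TypeCheckError` are assumptions for illustration. The sample expression is built directly as a tree, grouped as (! done) && (x < 10), which is an assumed precedence that the grammar itself does not fix.

```python
# operator -> (required operand types, result type)
RULES = {
    "mod": (("integer", "integer"), "integer"),
    "lt":  (("integer", "integer"), "boolean"),
    "not": (("boolean",),           "boolean"),
    "and": (("boolean", "boolean"), "boolean"),
}

class TypeCheckError(Exception):
    pass

def type_of(expr, symbols):
    """Look up the rule for an operator and compare operand types against it."""
    kind = expr[0]
    if kind == "num":
        return "integer"
    if kind == "id":
        return symbols[expr[1]]
    if kind in RULES:
        expected, result_type = RULES[kind]
        actual = tuple(type_of(arg, symbols) for arg in expr[1:])
        if actual == expected:
            return result_type
        raise TypeCheckError(f"{kind} expects {expected}, got {actual}")
    raise ValueError(f"unknown expression kind: {kind!r}")

symbols = {"x": "integer", "done": "boolean"}

# (! done) && (x < 10)
expr = ("and",
        ("not", ("id", "done")),
        ("lt", ("id", "x"), ("num", 10)))
result = type_of(expr, symbols)               # "boolean"
```

With this layout, adding an operator to the language means adding one line to the table rather than another if/then/else arm.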