Session 13
An Application of Structural Recursion: Variable Binding
Quick Review of Homework 5
Problems 2, 3, and 4 create operations for X-lists that we could implement
over flat lists using map and apply. The mutual recursion pattern
makes them almost as easy to implement over the nested lists.
The solution to prefix->infix is so very simple. Follow the data structure...
Congratulations! You have written your first translator. This function converts a legal program in one form into a legal program in another form. At its heart, this is what a compiler does.
The input to prefix->infix is a legal Racket
program. The output from prefix->infix is a
legal Python program. We can use the same idea to write
another prefix expression translator
that translates Racket expressions into legal programs in
languages like Forth and Joy. (We will see Joy again later
in the semester.)
Where Are We?
For the last few sessions, we have been discussing different techniques for writing recursive programs, all based on the fundamental technique of structural recursion. Last time, we introduced a new topic in the study of programming languages: the static properties of variables. That included the definition of a little programming language that will serve as our testbed for studying the topic.
Some Racket functions that you may think of as having only bound variables also contain free variables.
Consider this Racket function. It is not a combinator.
(define (sum-of-squares x y)
  (+ (* x x) (* y y)))
;; or
(define sum-of-squares
  (lambda (x y)
    (+ (* x x) (* y y))))
Why? Because + is a free variable! You can
verify this for yourself by capturing the variable reference
in a lambda:
(define capture+
  (lambda (+)
    (lambda (x y)
      (+ (* x x) (* y y)))))
What is the value of ((capture+ /) 3 4)?
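One way to explore this question is to hand capture+ a few different operators and watch the result change. The sketch below uses only the definition above; the name difference-of-squares is hypothetical, chosen for illustration. The division case is left for you to work out.

```racket
#lang racket

;; capture+ as defined above: inside the inner lambda, + now refers
;; to the parameter, not to Racket's built-in addition.
(define capture+
  (lambda (+)
    (lambda (x y)
      (+ (* x x) (* y y)))))

;; Passing the real + recovers sum-of-squares.
(define sum-of-squares (capture+ +))
(sum-of-squares 3 4)            ; 9 + 16 = 25

;; Passing - gives a different function built from the same body.
(define difference-of-squares (capture+ -))
(difference-of-squares 3 4)     ; 9 - 16 = -7
```

Note that * is still free in the inner lambda. Only + has been captured.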
Understanding the idea of bound and free variables — a static feature of Racket programs — helps us understand why Racket operators work as they do.
A Static Analysis Problem: Bound Variables
As we learned last time, if a program feature is static, then its value can be determined by looking at the text of a program. A person can look at the code, of course, but what about another program? The text of a program is data, so we ought to be able to give the text as input to another program that determines the value of a static feature. This is just what compilers, type checkers, IDEs, and all sorts of other programming tools do: examine a program to extract its static features.
Today, we use our techniques for writing recursive programs to write a program that processes programs in our little language. Our task is straightforward:
Does a variable occur bound in a given piece of code?
Our function will take a program as input. The input program will be written in the little language we saw last time:
<exp> ::= <varref>
| (lambda (<var>) <exp>)
| (<exp> <exp>)
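Because the text of a program is data, we can write programs in the little language as quoted Racket values. Here is a small sketch with one example of each grammar arm. The names ex-varref, ex-lambda, and ex-app are hypothetical, and representing programs as symbols and lists is an assumption about the representation.

```racket
#lang racket

;; One example for each arm of the grammar, written as quoted data.
(define ex-varref 'x)                      ; <varref>
(define ex-lambda '(lambda (x) (x y)))     ; (lambda (<var>) <exp>)
(define ex-app    '((lambda (x) x) y))     ; (<exp> <exp>)

;; Ordinary list operations can take these programs apart.
(symbol? ex-varref)    ; a variable reference is just a symbol
(second ex-lambda)     ; the parameter list: '(x)
(third ex-lambda)      ; the body: '(x y), itself an application
(first ex-app)         ; the function position: '(lambda (x) x)
(second ex-app)        ; the argument position: 'y
```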
Let's formalize our task so that we know our goal more clearly. We will write a function
(occurs-bound? v exp)
that answers this question:
Does a given variable reference v
occur bound in expression exp
from the little language?
Writing this function will help us in at least two ways:
- It will help us to understand the definitions of free and bound variables more deeply, by seeing the definitions come alive in a piece of code that implements them. The working code will also be a testbed for experimenting with the little language.
- It will help us to see how the same recursive programming techniques we've been learning can be used to process a program written in some language.
When we write programs to process other programs, we see quickly why knowing how to write recursive programs is so important: Programming language specifications are almost always highly inductive!
Before writing our function, let's give ourselves a couple of useful tools.
Tool 1: Formal Definitions of Free and Bound Variables
Last time, we learned the terms occurs bound and occurs free. A variable "is bound" or "occurs bound" in an expression if it refers to the formal parameter in an expression that contains it. A variable reference "is free" or "occurs free" in an expression if it occurs but is not bound.
To write code that implements these definitions, though, it would be helpful if we had more formal definitions of occurs free and occurs bound. Because our language definition is inductive, we can give these terms inductive definitions, too.
First, occurs bound:
v occurs bound in an expression exp if and only if:
- exp is of the form (lambda (var) body) and either
  - v occurs bound in body, or
  - v occurs free in body and v is the same as var.
- exp is of the form (exp1 exp2) and v occurs bound in either exp1 or exp2.
Now, occurs free:
v occurs free in an expression exp if and only if:
- exp is a variable reference and is the same as v,
- exp is of the form (lambda (var) body), v is different from var, and v occurs free in body, or
- exp is of the form (exp1 exp2) and v occurs free in either exp1 or exp2.
With these definitions, we are better prepared to write Racket code that implements our function.
Tool 2: Syntax Procedures for the Little Language
But wait... How will we write code to determine if, say:
exp is of the form (lambda (var) body)
is true?
We could include in our function code to verify that exp...
- is a list of size three,
- whose first item is the symbol lambda,
- whose second item is a list of one symbol, and
- whose third item is a legal expression in the language.
That is a four-way and expression, with the
first and third parts requiring compound expressions
themselves.
It should be easy to see how such code would obscure our definition of what it means to "occur bound" and make it much harder to read!
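To make the point concrete, here is a rough sketch of that four-way test written directly against Racket lists. The names raw-lambda? and legal-exp? are hypothetical, and the sketch assumes that programs are represented as symbols and lists.

```racket
#lang racket

;; legal-exp? : is datum any legal expression in the little language?
;; (Hypothetical helper, needed for the fourth check below.)
(define (legal-exp? datum)
  (or (symbol? datum)
      (raw-lambda? datum)
      (and (list? datum)
           (= (length datum) 2)
           (legal-exp? (first datum))
           (legal-exp? (second datum)))))

;; raw-lambda? : the four-way test, with no syntax procedures to help
(define (raw-lambda? exp)
  (and (and (list? exp) (= (length exp) 3))     ; a list of size three
       (eq? (first exp) 'lambda)                ; first item is lambda
       (and (list? (second exp))                ; second item is a list
            (= (length (second exp)) 1)         ;   ... of one item
            (symbol? (first (second exp))))     ;   ... that is a symbol
       (legal-exp? (third exp))))               ; third item is legal

(raw-lambda? '(lambda (x) (x y)))   ; #t
(raw-lambda? '(lambda (x y) x))     ; #f, two parameters
(raw-lambda? '(f x))                ; #f, an application
```

Imagine this much list manipulation repeated inside every arm of occurs-bound?.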
Indeed, we will be using Racket lists to represent two
different kinds of expression. Some lists denote
lambda expressions. Other lists denote
applications of functions to arguments.
This will require us to use many cars and
cdrs, or firsts and
rests, or seconds and
thirds to access parts of the data.
What's worse, they will mean different things in the
different parts of the same function!
If lambda expressions in the little language
were a Racket data type, though, we would expect to find
a function named lambda? that let us check to
see if some expression were a lambda expression or not.
The grammar of our little language is a data type, and it begs us to use the Syntax Procedures design pattern. I ask you to read about this pattern for next time. For now, we will see it in action on our problem.
Before we begin to implement our solution, I have created these syntax procedures for our little language. There are three kinds of syntax procedure in the file:
- type predicates, which test whether an expression is a variable reference, a function, or an application,
- access procedures, or "accessors", which extract the parts of compound expressions (functions and applications), and
- constructor functions, or "constructors", which create an object out of its parts.
These functions are not as unusual as they might at first seem.
- We are used to Racket data types having type predicates: for example, symbol?, number?, and list?.
- We have also seen that Racket provides access procedures for its data structures: for example, car and cdr, first and rest, and vector-ref.
- Finally, we have also seen that Racket provides constructors for its data structures, such as cons for pairs and list for lists.
I have simply defined analogous functions for our data type, the syntax of the little language.
These procedures allow us to write occurs-bound?
in terms of the little language, rather than in terms
of Racket's cars and cdrs,
firsts and rests. They let us think
only about the problem spec and the language, not the
underlying implementation. The difference in the code we
write will be noticeable.
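As a rough idea of what such procedures can look like, here is a minimal sketch over a list representation. The predicate and accessor names match the ones used in the code for today. The constructor names (make-varref, make-lambda, make-app) are hypothetical, and the bodies are an assumption: the versions in the course file may differ, for example by checking their inputs more carefully.

```racket
#lang racket

;; Type predicates (shallow checks only, in this sketch)
(define (varref? exp) (symbol? exp))
(define (lambda? exp)
  (and (list? exp) (= (length exp) 3) (eq? (first exp) 'lambda)))
(define (app? exp)
  (and (list? exp) (= (length exp) 2)))

;; Accessors
(define (lambda->param exp) (first (second exp)))
(define (lambda->body exp)  (third exp))
(define (app->proc exp)     (first exp))
(define (app->arg exp)      (second exp))

;; Constructors (hypothetical names)
(define (make-varref sym)        sym)
(define (make-lambda param body) (list 'lambda (list param) body))
(define (make-app proc arg)      (list proc arg))

;; Build ((lambda (x) x) y) from its parts, then take it apart again.
(define id-applied
  (make-app (make-lambda 'x (make-varref 'x))
            (make-varref 'y)))

(app? id-applied)                        ; #t
(lambda->param (app->proc id-applied))   ; 'x
(app->arg id-applied)                    ; 'y
```

Only these few definitions know that programs are lists. Code written in terms of them never needs to.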
Implementing occurs-bound?
Finally, we are ready to begin writing
occurs-bound?.
Growing the Code
As always, we base our function on the inductive definition of the data type it manipulates. An expression in the language can be one of three alternatives. Following the Structural Recursion pattern, our function will make a three-way choice, with one arm in the function for each arm in the definition.
Let's add a fourth case, error, to handle
invalid expressions...
We can use a cond expression here, instead
of an if, to simplify the layout of our code:
(define (occurs-bound? s exp)
  (cond ((varref? exp)  ;; handle a variable reference
         )
        ((app? exp)     ;; handle an application
         )
        ((lambda? exp)  ;; handle a lambda expression
         )
        (else (error 'occurs-bound? "invalid expression ~a" exp))))
I swapped the order for handling applications and
lambdas because the definition of
"occurs bound?" is simpler in the application case than in
the lambda case. Putting base cases and other
simple cases at the top of a function makes it easier to
read. I also like doing this because it encourages me to solve
the easier cases first.
Handling variable references is easy. Our definition has no case for a variable reference, which means that no variable occurs bound in an expression consisting of a single variable reference, so:
(define (occurs-bound? s exp)
  (cond ((varref? exp)
         #f)
        ((app? exp)     ;; handle an application
         )
        ((lambda? exp)  ;; handle a lambda expression
         )
        (else (error 'occurs-bound? "invalid expression ~a" exp))))
How can a variable occur bound in a function application?
The application itself doesn't bind a variable; it is
simply a list of two expressions. The definition says
s can occur bound in an application only if
it occurs bound either in the function expression or in
the argument expression:
(define (occurs-bound? s exp)
  (cond ((varref? exp)
         #f)
        ((app? exp)
         (or (occurs-bound? s (app->proc exp))
             (occurs-bound? s (app->arg exp))))
        ((lambda? exp)  ;; handle a lambda expression
         )
        (else (error 'occurs-bound? "invalid expression ~a" exp))))
The toughest case is the lambda expression.
s can occur bound in a lambda in
two different ways. s can occur bound within
the body of the lambda OR it can occur
free in the body and be the same as the formal parameter of
the lambda expression.
(define (occurs-bound? s exp)
  (cond ((varref? exp)
         #f)
        ((app? exp)
         (or (occurs-bound? s (app->proc exp))
             (occurs-bound? s (app->arg exp))))
        ((lambda? exp)
         (or (occurs-bound? s (lambda->body exp))
             (and (eq? s (lambda->param exp))
                  (occurs-free? s (lambda->body exp)))))
        (else (error 'occurs-bound? "invalid expression ~a" exp))))
The occurs-free? Function
Notice that the definition of occurs-bound?
calls occurs-free?. This is another example
of mutually recursive functions. Here, though,
the mutual recursion results not from two data definitions
that are mutually inductive, but from the definitions
of the two terms, which are themselves mutually
inductive!
In order to test this solution, we need to define
occurs-free?, too. I've done that for you
and included the function in the code download for today.
However, try to write occurs-free? on your own
before you read it. Write a function
(occurs-free? v exp)
that answers this question:
Does a given variable reference v occur free
in expression exp from the little language?
Doing so will give you some practice doing what we have just done. Then look at my solution, compare them, and make sure you understand any differences.
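Once you have tried it yourself, the sketch below lets you run both functions end to end in a single file. Its occurs-free? is one possible version that follows the definition in Tool 1 arm by arm; it is not necessarily the same as the solution in the code download. The syntax procedures are compact stand-ins that assume a list representation, and occurs-bound? is the function we built above.

```racket
#lang racket

;; Compact stand-ins for the syntax procedures (assumed representation).
(define (varref? exp) (symbol? exp))
(define (lambda? exp)
  (and (list? exp) (= (length exp) 3) (eq? (first exp) 'lambda)))
(define (app? exp) (and (list? exp) (= (length exp) 2)))
(define (lambda->param exp) (first (second exp)))
(define (lambda->body exp)  (third exp))
(define (app->proc exp)     (first exp))
(define (app->arg exp)      (second exp))

;; One possible occurs-free?, one arm per arm of the definition.
(define (occurs-free? s exp)
  (cond ((varref? exp)
         (eq? s exp))
        ((app? exp)
         (or (occurs-free? s (app->proc exp))
             (occurs-free? s (app->arg exp))))
        ((lambda? exp)
         (and (not (eq? s (lambda->param exp)))
              (occurs-free? s (lambda->body exp))))
        (else (error 'occurs-free? "invalid expression ~a" exp))))

;; occurs-bound? as developed above.
(define (occurs-bound? s exp)
  (cond ((varref? exp)
         #f)
        ((app? exp)
         (or (occurs-bound? s (app->proc exp))
             (occurs-bound? s (app->arg exp))))
        ((lambda? exp)
         (or (occurs-bound? s (lambda->body exp))
             (and (eq? s (lambda->param exp))
                  (occurs-free? s (lambda->body exp)))))
        (else (error 'occurs-bound? "invalid expression ~a" exp))))

(occurs-bound? 'x '(lambda (x) x))       ; #t
(occurs-bound? 'x '(lambda (x) y))       ; #f, x never occurs in the body
(occurs-free?  'y '(lambda (x) (x y)))   ; #t
(occurs-free?  'x '(lambda (x) (x y)))   ; #f, x is the parameter
```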
Things to Notice
There are several things to notice about
occurs-bound?:
- Notice how the use of structural recursion made this code relatively easy to write. It told us which cases to consider and, when we are considering each, we don't have to think about the other two cases at all.
- Notice how the use of syntax procedures made this code relatively easy to write. They enabled us to program using the same terms that are used in the definitions. While writing the function, we had to think only of the definition of bound variables; we didn't have to worry about which sequence of firsts and rests to use in order to manipulate the underlying list implementation.
  Furthermore, if we decide to change the underlying representation of programs to some other data structure, we won't need to modify this code at all. We will need only to write syntax procedures for the new representation.
- Notice, too, how the use of syntax procedures makes this code relatively easy to read. We can read it in much the same way as we read the prose definition of occurs bound. Understanding the code requires as little reliance on the syntax of Racket lists as possible, because it follows our language grammar and the definition of our terms to a tee.
This is an example of how using a program to describe a concept can be just as clear as a prose definition, if not clearer. And, because it is executable, we can verify that it is unambiguous by running the tests!
Today's zip file includes
source code
for occurs-bound? and
occurs-free?.
Play with these functions, both to be sure you understand
how to write such code and also to be sure you understand
the ideas of free and bound variables. For example...
A Study Question for Quiz 2
Are occurs-free? and
occurs-bound? inverses of one another?
If they are inverses then,
for a single variable v and a single expression exp:
- If (occurs-bound? v exp) is true, then (occurs-free? v exp) is always false.
- If (occurs-bound? v exp) is false, then (occurs-free? v exp) is always true.
Unbound Variables
I said last time that we cannot evaluate an expression
containing a completely free variable, because
at run-time, the variable needs to have a binding.
Such a free variable needs to be bound within an enclosing
expression or at the "top-level". Racket primitives are
like that. Symbols such as first and
+ are free in our expressions, but they are
bound to their primitive values at the top level of the
REPL.
By the way... How do you think that works?
Technically, my statement is not quite true. We can evaluate an expression that contains a free variable — as long as the variable is never evaluated. How could that happen?
Here are two trivial examples:
> (if (zero? 0) 1 foo)
1
> (and #f foo)
#f
foo is unbound, but it will never be evaluated.
The value of this if expression is always
1,
and the evaluation rule for the special form if
never evaluates foo. This works even when
foo is not bound at the top level.
Let's try a bit of Racket mental gymnastics. Can you create an expression that:
- doesn't use a conditional,
- contains an unbound variable, and
- whose value is not affected by the value of the unbound variable?
Hint: how can lambda help?
If you are curious, check out this optional short reading that works through one idea.
Wrap Up
- Reading
  - Study these notes, especially the section in which we build the occurs-bound? function.
  - Read a mini-lecture on syntax procedures. This idea may seem natural to you, especially if you have experience with object-oriented programming. It is an important element of data abstraction in many styles. Syntax procedures are also especially helpful in making Racket programs more readable!
- Homework
  - Homework 5 was due yesterday.
  - Homework 6 will be available soon and due on Monday. It gives you more practice writing recursive programs and working with expressions in the little language. Ask questions early, so you can finish in time to begin studying for the next quiz.