Program Derivation
Increasing Efficiency Through Program Derivation
Our original definition of subst in Session 10 was somewhat confusing, both to read and to write. We then saw that following the BNF and using mutual recursion made the code easier to write and easier to understand. This ease comes, however, at the cost of extra function calls.
How so?
Notice: we now make two function calls each time the first element of the s-list is itself an s-list: one to subst-symbol-expr, and then an immediate return call to subst. Such "double dispatch" can be expensive on a large dataset.
Sometimes, the run-time costs introduced by mutual recursion outweigh the program-time and read-time benefits of the separate functions. Can we modify our definition without losing too many of its benefits?
We can use Racket's substitution model to get back to a single function. Our solution currently looks like this:
    (define (subst new old slist)
      (if (null? slist)
          '()
          (cons (subst-symbol-expr new old (first slist))
                (subst new old (rest slist)))))

    (define (subst-symbol-expr new old symexp)
      (if (symbol? symexp)
          (if (eq? symexp old) new symexp)
          (subst new old symexp)))
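We can substitute the definition of subst-symbol-expr into subst, using the standard rules from the substitution model. This is exactly what the Racket interpreter will do at run-time. Recall that (define (f x) ...) is shorthand for (define f (lambda (x) ...)), so the value of the name subst-symbol-expr is a lambda. First, we substitute that lambda in place of the name: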
    (define subst
      (lambda (new old slist)
        (if (null? slist)
            '()
            (cons ((lambda (new old symexp)            ;;
                     (if (symbol? symexp)              ;; Here
                         (if (eq? symexp old)          ;; is
                             new                       ;; the
                             symexp)                   ;; first
                         (subst new old symexp)))      ;; substitution.
                   new old (first slist))
                  (subst new old (rest slist))))))
Next, we replace the application of the lambda with the body of the lambda, substituting the arguments for the corresponding formal parameters: new for new, old for old, and (first slist) for symexp:
    (define subst
      (lambda (new old slist)
        (if (null? slist)
            '()
            (cons (if (symbol? (first slist))              ;;
                      (if (eq? (first slist) old)          ;; Here is
                          new                              ;; the second
                          (first slist))                   ;; substitution.
                      (subst new old (first slist)))       ;;
                  (subst new old (rest slist))))))
The result is a single function that behaves exactly like the two original functions. After all, all we did was to derive by hand the same result that the Racket evaluator will produce. So, provided that we made no errors in our derivation, the resulting function has the same functionality. Our unit tests can help us ensure that we haven't broken the code.
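As a rough sketch of how unit tests might check the derivation, we can keep both versions side by side and confirm that they agree on a handful of s-lists. The names subst-mutual, subst-derived, and test-inputs are assumptions made only so that both definitions can live in one file; they are not part of the Session 10 code.

```racket
#lang racket
(require rackunit)

;; Two-function version, renamed (hypothetically) to subst-mutual.
(define (subst-mutual new old slist)
  (if (null? slist)
      '()
      (cons (subst-symbol-expr new old (first slist))
            (subst-mutual new old (rest slist)))))

(define (subst-symbol-expr new old symexp)
  (if (symbol? symexp)
      (if (eq? symexp old) new symexp)
      (subst-mutual new old symexp)))

;; Derived single-function version, renamed (hypothetically) to subst-derived.
(define (subst-derived new old slist)
  (if (null? slist)
      '()
      (cons (if (symbol? (first slist))
                (if (eq? (first slist) old) new (first slist))
                (subst-derived new old (first slist)))
            (subst-derived new old (rest slist)))))

;; A few sample s-lists: empty, flat, and nested.
(define test-inputs
  '(() (a) (b) (a b a) ((a) b (c (a d))) (((a)))))

;; The two versions should produce equal results on every input.
(for ([slist test-inputs])
  (check-equal? (subst-derived 'x 'a slist)
                (subst-mutual 'x 'a slist)))

;; One concrete expected value, as a sanity check.
(check-equal? (subst-derived 'x 'a '((a) b (c (a d))))
              '((x) b (c (x d))))
```

If every check passes silently, the derived function agrees with the original on these inputs. That is evidence, not proof, that the derivation was done correctly.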
However, the new version is more efficient, because it eliminates the extra function calls. We hope that it is nearly as readable as the two-function version.
Take a closer look.
The derived function is not like the single-function solution we wrote earlier. That function repeated the expression (subst new old (cdr slist)) several times, because we worked through the details of every possible case. Using mutual recursion followed by program derivation, letting Racket's substitution model do some of the work for us, results in a program with a single (subst new old (rest slist)).
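For comparison, here is one plausible shape for a case-by-case, single-function solution. This is a reconstruction for illustration only, not the actual Session 10 code, and the name subst-by-cases is hypothetical. Notice that the recursive call on (cdr slist) appears three times, once in each branch that builds a result.

```racket
#lang racket

;; A hypothetical case-by-case version (NOT the Session 10 code).
;; Each case builds its own result, so the recursive call on the
;; rest of the list is written out three separate times.
(define (subst-by-cases new old slist)
  (cond [(null? slist) '()]
        [(symbol? (car slist))
         (if (eq? (car slist) old)
             (cons new (subst-by-cases new old (cdr slist)))
             (cons (car slist) (subst-by-cases new old (cdr slist))))]
        [else
         (cons (subst-by-cases new old (car slist))
               (subst-by-cases new old (cdr slist)))]))
```

The derived version instead does the cons once and makes the recursive call on the rest of the list once, with the case analysis tucked inside the first argument to cons.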
We can do this in Racket because the if construct is an expression that returns a value, not a statement. In many languages, if is a statement and returns no value. Several, including Java and C++, have a "computed if" expression (the conditional, or ternary, operator) that may let us do something like this. In Java, a "computed if" is written as:
<test> ? <then-value> : <else-value>
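To see what "an expression that returns a value" buys us in Racket, here is a minimal sketch in which an if appears directly as an argument to another function, just as it does inside the cons in our derived subst. The function label-sign is a hypothetical example, not part of the course code.

```racket
#lang racket

;; `if` produces a value, so it can appear anywhere an expression can,
;; including as an argument in a function call.
(define (label-sign n)
  (string-append "n is "
                 (if (negative? n) "negative" "non-negative")))

(label-sign -3)  ; => "n is negative"
(label-sign 7)   ; => "n is non-negative"
```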
A Related Concept: Function Inlining
C++ has a concept that is similar in spirit to program derivation: the inlining of member functions. The difference, though, is that it is implemented by the compiler. When we declare a function inline, the compiler tries to replace all calls to the function with equivalent code from the body of the function.
For example, we may well use an accessor method x() frequently when interacting with an object that has an x-coordinate. If we declare the x() method as inline, the compiler can replace each call to the method with the equivalent code from the body of the function.
This lets us eliminate the overhead of extra function calls at run time without sacrificing the readability and design of our class.
Most other languages do not have an inline keyword, but their compilers often inline code aggressively as a way to make programs more efficient. This is especially valuable in languages that depend heavily on function calls, including Java and functional programming languages.
Program derivation works like inlining, but it is a technique used by programmers to modify their code. (I can certainly imagine having a Racket compiler implementing program derivation automatically, thus saving the programmer the effort and risk of error!)
Final Note
We will use the program derivation technique occasionally to simplify the result of mutual recursion, or of any other technique that introduces unwanted function calls, but only when the run-time cost of the extra calls outweighs the benefits of separate functions.
A Closing Exercise: count-occurrences
In Session 10, we implemented a function named count-occurrences using mutual recursion. Our solution is in this Racket file from Session 10's zip file.
Our implementation of count-occurrences has the same "double dispatch" behavior as subst. In this case, it seems even more of a problem, given how simple the helper function is.
Use program derivation to eliminate the count-occurrences-symbol-expr function from our solution.
Do you like the result?
You can see my solution in this Racket file.
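The linked file is not reproduced here, so the following is only a sketch of how the derivation might go. It assumes that the Session 10 solution takes a symbol s and an s-list, and that it has the same shape as subst. The parameter names and the name count-occurrences-mutual (used so both versions can coexist in one file) are assumptions, not taken from the course files.

```racket
#lang racket

;; Assumed starting point: a mutually recursive solution shaped like subst.
(define (count-occurrences-mutual s slist)
  (if (null? slist)
      0
      (+ (count-occurrences-symbol-expr s (first slist))
         (count-occurrences-mutual s (rest slist)))))

(define (count-occurrences-symbol-expr s symexp)
  (if (symbol? symexp)
      (if (eq? s symexp) 1 0)
      (count-occurrences-mutual s symexp)))

;; After derivation: the helper's body replaces the call to the helper,
;; with (first slist) substituted for symexp.
(define (count-occurrences s slist)
  (if (null? slist)
      0
      (+ (if (symbol? (first slist))
             (if (eq? s (first slist)) 1 0)
             (count-occurrences s (first slist)))
         (count-occurrences s (rest slist)))))

(count-occurrences 'a '((a) b (c (a d))))  ; => 2
```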
You can practice program derivation on any code we implement using mutual recursion. Give it a try!