Session 21
Creating New Syntax
or... Racket: A Language for Making Languages
Introduction
When teaching you a new language, especially one that is very different from the languages you already know, professors often try to convince you that you can do all the things you are used to doing in the languages you know — Python, Java, ...
But then you may wonder, why do I need a new language at all?
What makes Racket different? Compelling? Why learn it?
We have seen higher-order functions and the idea that code==data, but there is more.
This is the story... of how Lisp-like languages really do come from a different place, and how they are inspiring the designers of other languages.
This session is about ideas. The code I show you is to illustrate those ideas. I certainly won't ask you to write code like this on Quiz 3. But please try to understand the ideas.
The Set-Up
In Unit 3, we have written functions that translate a syntactic abstraction into a core form. We extended this into a preprocessor for the little language we've been studying.
In Unit 4, you will write a preprocessor and an evaluator for a larger language. Racket has a preprocessor and an evaluator, too.
Throughout the course, we have also seen that Racket exposes its machinery to us in ways that other languages usually do not. We can add new functions and operators to the language, thus affecting how the evaluator works. What would it be like to add syntax to the language, thus affecting how the preprocessor works?
Example 1: A Python-Like for Loop
Instead of writing many for loops, Python programmers write list comprehensions. For example:
roots = [sqrt(i) for i in range(0, 10)]
is equivalent to this Python for loop:
roots = []
for i in range(0, 10):
    roots.append(sqrt(i))
More generally, we can think of the for loop as:
for var in lst:
    expression-using-var
It might be nice to add a Python-like for loop to Racket, such as:
(for i in (range 0 10): (sqrt i))
or, more generally:
(for var in lst: exp-using-var)
Side note: Racket already has a fine set of for-loops. They have many different features, depending on our needs. This is just a simple example for us to explore, and to see how those loops work.
This functional loop is equivalent to a Racket map expression:
(map (lambda (i) (sqrt i)) (range 0 10))
This is an example of a syntactic abstraction. We can write code to translate the abstraction into a core form:
(for var in lst :          =====>   (map (lambda (var)
  exp-using-var)                           exp-using-var)
                                         lst)
Opening Exercise: Make It So

Write a function named for-to-map. for-to-map takes as input an expression of the form:
(for <var> in <lst> : <exp>)
and returns an expression of the form:
(map (lambda (<var>) <exp>) <lst>)
for, in, and : are all symbols.
For example:
> (for-to-map '(for i in lst : exp))
'(map (lambda (i) exp) lst)
> (for-to-map '(for n in (range 0 10): (sqrt n)))
'(map (lambda (n) (sqrt n)) (range 0 10))
Note: The input is always a list of size 6, and the output is always a list of size 3. All you need are the list function and a few list accessors.
Implementing for-to-map
We can write a simple list-to-list translator that converts the for loop to an equivalent map expression:
(define (for-to-map for-exp)
  (let ((var (second for-exp))
        (lst (fourth for-exp))
        (exp (sixth for-exp)))
    (list 'map
          (list 'lambda (list var) exp)
          lst)))
This code handles only the surface syntax of the new form. To add it to the language, we'd have to recursively translate the two sub-expressions. But this simple function alone demonstrates the idea of translational semantics and reminds us how easy it can be to convert a simple syntactic abstraction into an equivalent core form.
This enables us to pass in code with Python's for syntax and produce executable Racket code, as a Racket list.
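To see that the generated list really is executable code, here is a minimal sketch that hands the output of for-to-map to eval. The namespace ns, the name translated, and the sample input are assumptions made for this illustration; they are not part of the session's code file.

```racket
#lang racket

;; for-to-map, as defined in these notes
(define (for-to-map for-exp)
  (let ((var (second for-exp))
        (lst (fourth for-exp))
        (exp (sixth for-exp)))
    (list 'map
          (list 'lambda (list var) exp)
          lst)))

;; Assumption for this sketch: a fresh racket/base namespace in which
;; to evaluate the generated list.
(define ns (make-base-namespace))

;; The translated code is ordinary data: a list of three items.
(define translated
  (for-to-map '(for n in (list 1 4 9) : (sqrt n))))

translated            ; the generated map expression, as a list
(eval translated ns)  ; the value of that expression: '(1 2 3)
```

The translator only builds a list. Nothing runs until we explicitly ask eval to run it, which is the friction the next section removes.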
We can do this! We have the technology — and you have the knowledge to write the preprocessor. Racket's simple, parenthesized syntax helps us here.
But modifying Racket's preprocessor might be a bigger challenge than modifying the preprocessor for our little language. This seems risky (what if we break it?) and potentially quite difficult (how big is the Racket preprocessor?).
If only we could build this process into the language somehow: remove the friction, and let Racket do most of the work.
We can.
Implementing for-to-map as Racket Syntax
Racket gives us a better option. The syntax-rules operator enables us to define patterns of the form:
pattern → expansion
and add them to Racket's preprocessor.
Here is the for-to-map "transformer" that we wrote as a Racket function, rewritten using a new operator, syntax-rules:
(define-syntax for
  (syntax-rules (in :)
    ( (for var in lst : exp)
      (map (lambda (var) exp) lst) )
  ))
So easy. So powerful. And relatively clear, even if you have never seen the syntax-rules operator before. Look at the two patterns...
This does more than translate surface syntax in the form of a Racket list; it enables the Racket language processor to expand the expression in place and execute the result:
> (for i in (range 0 10): (sqrt i))
'(0 1 1.4142135623730951 ... 2.8284271247461903 3)
syntax-rules lets us write a syntax transformer that translates (or expands) a syntactic abstraction into a core expression. Historically, and in many other languages, such transformers are called macros.
Notice, though: this happens before run-time.

We can see the result of preprocessing the for expression away using the expand-once operator:
> (expand-once #'(for i in (range 0 10): (sqrt i)))
(map (lambda (i) (sqrt i)) (range 0 10))
The map expression is the code that is passed on to the evaluator.
Other languages have preprocessors, too. For example, C's preprocessor provides directives such as #include, #ifndef, and #define. The preprocessor does a simple text replacement of the macro pattern with its expansion.
Lisp — Racket's grandparent — offered that and more, though also at a lower level than Racket.
This is what I mean when I say that Racket is a language for making languages. It gives us operators that define syntax at the level of the code we want to be able to write.
You can find both the for-to-map function and the for macro in this file.
Example 2: A Wordy if Expression
Now, let's try something more practical for us to use.
Back in Session 4, we wrote an if expression to solve the opening exercise:
(if (>= student-grade 0.90)
    'A
    (if (>= student-grade 0.80)
        'B
        (if (>= student-grade 0.70)
            'C
            (if (>= student-grade 0.60)
                'D
                'F))))
We were just learning to write Racket expressions, so this was good practice. With a cond expression, we can write something a bit shorter:
(cond ((>= student-grade 0.90) 'A)
      ((>= student-grade 0.80) 'B)
      ((>= student-grade 0.70) 'C)
      ((>= student-grade 0.60) 'D)
      (else 'F))
That's better, but... still wordy. Many languages include a case statement that switches on a single variable. Racket does, too. Each clause begins with a list of unquoted values to match:
(case transaction
  ((withdraw) withdraw)
  ((deposit)  deposit)
  ((balance)  balance)
  (else       error))
Unfortunately for us, Racket's case looks for an exact match, so it can't help us with our grade evaluator.
What we'd like to write is something like this:
(range-case student-grade
  ((>= 0.90) 'A)
  ((>= 0.80) 'B)
  ((>= 0.70) 'C)
  ((>= 0.60) 'D)
  (else 'F))
and have it generate the if expression for us.
What can we do? After the last few weeks, we know how to write code that translates a range-case expression into an equivalent cond or if expression. But we now know that we do not have to add an arm to the Racket preprocessor.
Racket adopts a different approach: it lets the programmer instruct the preprocessor by defining a new special form. We have seen several of Racket's primitive special forms:
- Some (define, quote, if) have syntax that looks just like calling a function, each with its own evaluation rule.
- Others (lambda, let, letrec) have what appears to be a new syntax.
Racket gives us operators to define new syntax. syntax-rules is one. Let's use it.
Implementing a range-case Expression
Again, we can use the function expand-once to see how Racket's preprocessor translates the abstraction into its core form:
> (expand-once #'(range-case taxable-income
                   ((<=  12000) '(     0 0.044     0.00))
                   ((<=  60000) '( 12000 0.0482  528.00))
                   ((<= 150000) '( 60000 0.057  2841.60))
                   (else        '(150000 0.06   7971.60))))
(if (<= taxable-income 12000)
    '(0 0.044 0.0)
    (range-case taxable-income
      ((<= 60000) '(12000 0.0482 528.0))
      ((<= 150000) '(60000 0.057 2841.6))
      (else '(150000 0.06 7971.6))))
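The expansion above suggests the shape of the two rules involved. The following is a minimal sketch of a recursive range-case that is consistent with that expansion. It is a reconstruction from the examples in these notes, not necessarily the definition in the session's source file; the pattern variable names (id, op, bound, result, clause) and the letter-grade function are chosen just for this illustration.

```racket
#lang racket

(define-syntax range-case
  (syntax-rules (else)
    ;; Base case: only the else clause remains, so use its value.
    ((_ id (else result))
     result)
    ;; Recursive case: test the first clause, then let range-case
    ;; handle the remaining clauses in the alternative branch.
    ((_ id ((op bound) result) clause ...)
     (if (op id bound)
         result
         (range-case id clause ...)))))

;; The grade example from earlier in the session.
(define (letter-grade student-grade)
  (range-case student-grade
    ((>= 0.90) 'A)
    ((>= 0.80) 'B)
    ((>= 0.70) 'C)
    ((>= 0.60) 'D)
    (else 'F)))

(letter-grade 0.85)   ; 'B
(letter-grade 0.42)   ; 'F
```

Each expansion step peels off one clause, which is why expand-once shows an if whose alternative is a smaller range-case expression.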
A Wrinkle
This approach works great if we are choosing a value based on a single value, such as an identifier. But if id is a compound expression, it will be repeated throughout the generated code, and thus evaluated multiple times. Can we do better?
Yes! We can evaluate the key expression once and bind its value to a new local variable, to save recomputation. See the new version of range-case at the bottom of the source file linked above. This special form uses the original range-case to do the recursive work. Most important, Racket guarantees to use a local variable name that does not collide with any name in the range-case expression. This is good hygiene.
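Here is a minimal sketch of that idea, assuming a recursive range-case like the one implied by the expansion shown earlier. The wrapper name range-case-once, the helper next-grade, and the counter calls are hypothetical names invented for this illustration; the version in the source file may differ.

```racket
#lang racket

;; A recursive range-case consistent with the expansion shown in the
;; notes (a reconstruction, not necessarily the file's definition).
(define-syntax range-case
  (syntax-rules (else)
    ((_ id (else result)) result)
    ((_ id ((op bound) result) clause ...)
     (if (op id bound) result (range-case id clause ...)))))

;; Hypothetical wrapper: evaluate the key expression once, bind its
;; value to a local variable, and let range-case do the recursion.
(define-syntax range-case-once
  (syntax-rules ()
    ((_ key-exp clause ...)
     (let ((key key-exp))
       (range-case key clause ...)))))

;; A key expression with a visible side effect, to count evaluations.
(define calls 0)
(define (next-grade)
  (set! calls (+ calls 1))
  0.72)

;; The user's own variable named key is not captured by the key that
;; the macro introduces. That is the hygiene guarantee.
(define (demo)
  (let ((key 'user-value))
    (range-case-once (next-grade)
      ((>= 0.90) 'A)
      ((>= 0.70) key)
      (else 'F))))
```

Calling (demo) returns 'user-value and increases calls by exactly one, even though the macro tests its key against two bounds.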
Racket Macros
Racket enables us to define pattern → expansion templates as new special forms. To support complex forms:
- It allows the use of an ellipsis to describe compound patterns.
- It allows one special form to expand to another special form.
- It even allows a syntax rule to be recursive.
And keep in mind: this is all happening before run-time.
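The three features listed above can be seen together in one small macro. This is a sketch of a short-circuiting or, named my-or here to avoid clashing with Racket's built-in or. The name and the example are assumptions for illustration and are not part of the session's code.

```racket
#lang racket

(define-syntax my-or
  (syntax-rules ()
    ;; No arguments: nothing is true.
    ((_) #f)
    ;; One argument: its value is the answer.
    ((_ e) e)
    ;; Two or more arguments: the ellipsis matches the rest. The
    ;; expansion uses two other forms, let and if, and refers back
    ;; to my-or itself, so the rule is recursive.
    ((_ e rest ...)
     (let ((t e))
       (if t t (my-or rest ...))))))

(my-or #f 3)        ; 3
(my-or)             ; #f

;; The t introduced by the macro does not capture the user's t.
(let ((t 5))
  (my-or #f t))     ; 5
```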
Implementing a Different range-case Expression
What if we decide we want a more verbose syntax, such as:
((0.90 1.00) 'A) ((0.80 0.90) 'B) ...
This would allow for non-sequential and overlapping ranges.
We can do that.
This solution defines range-case to use a different pattern and a different expansion template.
Change the pattern, change the translation, BOOM! A new special form.
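As a rough sketch of what such a definition could look like, here is one possible range-case for the verbose (low high) syntax. It is not the linked solution. It assumes that both ends of each range are inclusive, that the first matching range wins, and that a final else clause is required; the pattern variable names and the letter-grade function are invented for this illustration.

```racket
#lang racket

(define-syntax range-case
  (syntax-rules (else)
    ;; Any number of ((lo hi) result) clauses, then an else clause.
    ((_ key-exp ((lo hi) result) ... (else default))
     ;; Evaluate the key once, then test each range in order.
     (let ((key key-exp))
       (cond ((<= lo key hi) result)
             ...
             (else default))))))

(define (letter-grade student-grade)
  (range-case student-grade
    ((0.90 1.00) 'A)
    ((0.80 0.90) 'B)
    ((0.70 0.80) 'C)
    ((0.60 0.70) 'D)
    (else 'F)))

(letter-grade 0.95)   ; 'A
(letter-grade 0.90)   ; 'A, because the first matching range wins
(letter-grade 0.55)   ; 'F
```

This version expands into a single cond rather than nested if expressions, which shows that the same surface idea can be given a different translation.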
Don't worry about the details of the code. We won't be defining our own syntax this semester. But please note: This is just Racket code. We are using the language we are writing in to extend the language we are writing in — on the fly.
Macros in Other Languages
Other languages have macros. What languages with macros are you likely to encounter?
Old-Style Macros
C and assembly language have rudimentary macro systems, implemented as text-based preprocessors. The C preprocessor works by simple textual search-and-replace at the token level, rather than the character level. This allows some powerful forms of conditional processing, but working at the token level creates problems. If you are interested in learning more, check out the bonus reading for today.
If you publish research papers in CS, you might use a tool named LaTeX. TeX is a computer typesetting system written by Donald Knuth in the 1970s and 1980s. LaTeX is a derivative of TeX, with most of its functionality implemented as macros in TeX.
Macros at this low level are hard to work with, are error prone, and are not always as powerful as we'd like.
Embeddable Languages
PHP is one of the most common languages used on the web. PHP programs are embedded into HTML files. The PHP processor recognizes code fragments using the markers <?php and ?> and executes the code to modify or extend the HTML.
More generally, programs in embeddable languages can be embedded in free-format text, or in the source code of other languages. This is similar to a textual macro language, but embeddable languages are usually much more powerful, as they are full-featured programming languages. (Racket comes with Scribble, a tool for writing documents that allow Racket expressions as embedded programs.)
Our final example today is an embeddable language.
Modern Macro Systems
Among programming languages created in the last decade or so, Rust and Elixir stand out for their hygienic syntax-level macro systems. Rust excels at systems programming, while Elixir has found a niche in web development. Both languages have borrowed ideas from Racket and adapted them to the specific syntax and capabilities of those languages.
Even among these new languages with modern macro systems, though, Racket stands out.
Creating a New Language in Racket
We have just seen that Racket lets us modify its expander, the preprocessor that translates syntactic sugar into core expressions.
If we could also modify Racket's reader, we could define an entirely different language using Racket.
We can.
Racket is a language-making language. It treats languages as libraries to be loaded, mixed, and matched.
This is one example.
Matthew Butterick is a lawyer, a programmer, and a typographer. He decided to write a book called Practical Typography. He could have used Word or LaTeX, but neither gave him the flexibility or even the power he wanted. As a programmer, he knew he didn't have to settle for other people's tools. So he went looking for programming languages to use. Nothing seemed quite right.
Then he discovered Racket. Racket is a language-making language, so he decided to create his own publishing system, which became an entire language within Racket: Pollen.
Note: You will need to install Pollen to run this code. Use the menu command File | Install Package.... Type pollen into the Package Source box and click Install. When it's done, relaunch DrRacket.
Remember: All the reading and expanding happens before run-time.
Now we can write Pollen files, er, programs, and run them in Racket. Butterick has written two books using Pollen, at the same time creating wonderful web sites from the same source code:
If you want to learn more about how to make languages such as Pollen, check out Beautiful Racket. It's a very good book.
This is one example of a document language written in Racket. In Session 28, we will see a programming language written in Racket — one that doesn't use Racket's parenthesized prefix notation!
Wrap Up
- Reading
  - Review these lecture notes. Peek at the code for the session. Pay more attention to the ideas than the details, unless you really want to go deeper. (In which case, let me know!)
  - If you like the ideas in this session and would like to see more, check out this short reading with associated code examples.
- Homework
  - Homework 8 was due yesterday. Our next assignment will come next week, after...
- Quiz
  - Quiz 3 is next session. This unit is shorter than the previous two. That is good, because the exam can focus on a smaller number of ideas:
    - the idea of syntactic abstraction
    - several syntactic abstractions and their translations
    - the implication of syntactic abstraction for the implementation of compilers
    - the idea of a lexical address
    - the implementation of lexical addressing in code [ Part 1 | Part 2 ]