CS 3540 Session 21

Session 21
Creating New Syntax

or... Racket: A Language for Making Languages

Introduction

When teaching you a new language, especially one that is very different from the languages you already know, professors often try to convince you that you can do all the things you are used to doing in the languages you know — Python, Java, ...

But then you may wonder, why do I need a new language at all?

What makes Racket different? Compelling? Why learn it?

We have seen higher-order functions and the idea that code==data, but there is more.

This is the story... of how Lisp-like languages really do come from a different place, and how they are inspiring the designers of other languages.

This session is about ideas. The code I show you is to illustrate those ideas. I certainly won't ask you to write code like this on Quiz 3. But please try to understand the ideas.

The Set-Up

In Unit 3, we have written functions that translate a syntactic abstraction into a core form. We extended this into a preprocessor for the little language we've been studying.

In Unit 4, you will write a preprocessor and an evaluator for a larger language. Racket has a preprocessor and an evaluator, too.

Throughout the course, we have also seen that Racket exposes its machinery to us in ways that other languages usually do not. We can add new functions and operators to the language, thus affecting how the evaluator works. What would it be like to add syntax to the language, thus affecting how the preprocessor works?

Example 1: A Python-Like `for` Loop

Instead of writing many for loops, Python programmers write list comprehensions. For example:

roots = [sqrt(i) for i in range(0, 10)]

is equivalent to this Python for loop:

roots = []
for i in range(0, 10):
    roots.append(sqrt(i))

More generally, we can think of the for loop as:

for var in lst:
    expression-using-var

It might be nice to add a Python-like for loop to Racket, such as:

(for i in (range 0 10):
  (sqrt i))

or, more generally:

(for var in lst:
  exp-using-var)

Side note: Racket already has a fine set of for-loops. They have many different features, depending on our needs. This is just a simple example for us to explore, and to see how those loops work.

This functional loop is equivalent to a Racket map expression:

(map (lambda (i)
        (sqrt i))
     (range 0 10))

This is an example of a syntactic abstraction. We can write code to translate the abstraction into a core form:

(for var in lst :    =====>     (map (lambda (var)
     exp-using-var)                    exp-using-var)
                                     lst)

Opening Exercise: Make It So

[Image: Captain Picard from Star Trek: The Next Generation, sitting in his chair on the bridge, saying "Make it so."]

Write a function for-to-map.

for-to-map takes as input an expression of the form:

(for <var> in <lst> : <exp>)

and returns an expression of the form:

(map (lambda (<var>) <exp>)
     <lst>)

for, in, and : are all symbols.

For example:

> (for-to-map '(for i in lst : exp))
'(map (lambda (i) exp) lst)

> (for-to-map '(for n in (range 0 10):
                    (sqrt n)))
'(map (lambda (n) (sqrt n))
      (range 0 10))

Note: The input is always a list of size 6, and the output is always a list of size 3. All you need are the list function and a few list accessors.

Implementing `for-to-map`

We can write a simple list-to-list translator that converts the for loop to an equivalent map expression:

(define (for-to-map for-exp)
  (let ((var (second for-exp))
        (lst (fourth for-exp))
        (exp (sixth for-exp)))
    (list 'map
          (list 'lambda (list var) exp)
          lst)))

This code handles only the surface syntax of the new form. To add it to the language, we'd have to recursively translate the two sub-expressions. But this simple function alone demonstrates the idea of translational semantics and reminds us how easy it can be to convert a simple syntactic abstraction into an equivalent core form.
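The recursive step mentioned above might be sketched as follows. This is only an illustration, not the preprocessor from the course: the names preprocess and for-exp? are hypothetical, and the walker naively descends into every list, including quoted data, which a real preprocessor would have to treat specially.

```racket
#lang racket

;; Hypothetical helper: recognize the surface form
;; (for <var> in <lst> : <exp>)
(define (for-exp? exp)
  (and (list? exp)
       (= (length exp) 6)
       (eq? (first exp) 'for)
       (eq? (third exp) 'in)
       (eq? (fifth exp) ':)))

;; Hypothetical recursive translator: rewrites every for form,
;; including ones nested inside the list and body sub-expressions.
(define (preprocess exp)
  (cond [(for-exp? exp)
         (list 'map
               (list 'lambda
                     (list (second exp))
                     (preprocess (sixth exp)))   ; translate the body
               (preprocess (fourth exp)))]       ; translate the list
        [(list? exp) (map preprocess exp)]       ; any other compound form
        [else exp]))                             ; symbols, numbers, ...
```

With these definitions, a for nested inside another for is translated at both levels, which the single-level for-to-map cannot do.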

This enables us to pass in code with Python's for syntax and produce executable Racket code, as a Racket list.

... run for-to-map on a simple expression
... run the result expression as code
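The two demo steps above might look something like this when run from a module. This is a sketch, not the in-class demo: the input uses (list 1 4 9) instead of range because the namespace created by make-base-namespace is assumed here to contain only racket/base, which does not provide range.

```racket
#lang racket

;; for-to-map, as defined in the notes
(define (for-to-map for-exp)
  (let ((var (second for-exp))
        (lst (fourth for-exp))
        (exp (sixth for-exp)))
    (list 'map
          (list 'lambda (list var) exp)
          lst)))

;; Step 1: translate. The result is an ordinary list (data).
(define translated
  (for-to-map '(for n in (list 1 4 9) : (sqrt n))))

;; Step 2: evaluate that list as code.
;; Inside a module, eval needs an explicit namespace.
(define ns (make-base-namespace))
(define result (eval translated ns))
```

Here translated is the list '(map (lambda (n) (sqrt n)) (list 1 4 9)), and result is '(1 2 3).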

We can do this! We have the technology — and you have the knowledge to write the preprocessor. Racket's simple, parenthesized syntax helps us here.

But modifying Racket's preprocessor might be a bigger challenge than modifying the preprocessor for our little language. This seems risky (what if we break it?) and potentially quite difficult (how big is the Racket preprocessor?).

If only we could build this process into the language somehow: remove the friction, and let Racket do most of the work.

We can.

Implementing `for-to-map` as Racket Syntax

Racket gives us a better option. The syntax-rules operator enables us to define patterns of the form:

pattern → expansion

and add them to Racket's preprocessor.

Here is the for-to-map "transformer" that we wrote as a Racket function, rewritten using a new operator, syntax-rules:

(define-syntax for
  (syntax-rules (in :)
    ( (for var in lst : exp)
        (map (lambda (var) exp) lst) )  ))

So easy. So powerful. And relatively clear, even if you have never seen the syntax-rules operator before. Look at the two parts of the rule: the pattern and its expansion.

This does more than translate surface syntax in the form of a Racket list; it enables the Racket language processor to expand the expression in place and execute the result:

> (for i in (range 0 10):
    (sqrt i))
'(0
  1
  1.4142135623730951
  ...
  2.8284271247461903
  3)

syntax-rules lets us write a syntax transformer that translates (or expands) a syntactic abstraction into a core expression. Historically, and in many other languages, such transformers are called macros.

Notice, though: This happens before run-time:

[Figure: a graph showing the read-preprocess-evaluate pipeline]

We can see the result of preprocessing the for expression away using the expand-once operator:

> (expand-once #'(for i in (range 0 10):
                    (sqrt i)))
(map
  (lambda (i) (sqrt i))
  (range 0 10))

The map expression is the code that is passed on to the evaluator.

Other languages have preprocessors, too. For example, C's preprocessor provides directives such as #include, #ifndef, and #define. The preprocessor does a simple text replacement of the macro pattern with its expansion.

Lisp — Racket's grandparent — offered that and more, though also at a lower level than Racket.

This is what I mean when I say that Racket is a language for making languages. It gives us operators that define syntax at the level of the code we want to be able to write.

You can find both the for-to-map function and the for macro in this file.

Example 2: A Wordy `if` Expression

Now, let's try something more practical for us to use.

Back in Session 4, we wrote an if expression to solve the opening exercise:

(if (>= student-grade 0.90)
    'A
    (if (>= student-grade 0.80)
        'B
        (if (>= student-grade 0.70)
            'C
            (if (>= student-grade 0.60)
                'D
                'F))))

We were just learning to write Racket expressions, so this was good practice. With a cond expression, we can write something a bit shorter:

(cond ((>= student-grade 0.90) 'A)
      ((>= student-grade 0.80) 'B)
      ((>= student-grade 0.70) 'C)
      ((>= student-grade 0.60) 'D)
      (else 'F))

That's better, but... still wordy. Many languages include a case statement that switches on a single variable. Racket does, too:

(case transaction
  ((withdraw) withdraw)
  ((deposit)  deposit)
  ((balance)  balance)
  (else       error))

Unfortunately for us, Racket's case looks for an exact match, so it can't help us with our grade evaluator. What we'd like to write is something like this:

(range-case student-grade
  ((>= 0.90) 'A)
  ((>= 0.80) 'B)
  ((>= 0.70) 'C)
  ((>= 0.60) 'D)
  (else 'F))

and have it generate the if expression for us.

What can we do? After the last few weeks, we know how to write code that translates a range-case expression into an equivalent cond or if expression. But we now know that we do not have to add an arm to the Racket preprocessor.

Racket adopts a different approach: it lets the programmer instruct the preprocessor by defining a new special form. We have seen several of Racket's primitive special forms:

Some (define, quote, if) have syntax that looks just like calling a function, each with its own evaluation rule.
Others (lambda, let, letrec) have what appears to be a new syntax.

Racket gives us operators to define new syntax. syntax-rules is one. Let's use it.

Implementing a `range-case` Expression

Take a look at a solution for range-case.
Study the parts of the macro.
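For readers without the linked file at hand, here is one way such a macro might be written. It is a sketch that is consistent with the expansion shown later in this section, not necessarily the linked solution itself, and the helper letter-grade exists only for illustration.

```racket
#lang racket

;; A sketch of range-case with two rules:
;;   - an else clause expands to its result
;;   - a clause ((op bound) result) expands to an if whose
;;     alternative is a smaller range-case expression
(define-syntax range-case
  (syntax-rules (else)
    [(_ id (else result))
     result]
    [(_ id ((op bound) result) clause ...)
     (if (op id bound)
         result
         (range-case id clause ...))]))

;; Hypothetical usage: the grade evaluator from the notes
(define (letter-grade student-grade)
  (range-case student-grade
    ((>= 0.90) 'A)
    ((>= 0.80) 'B)
    ((>= 0.70) 'C)
    ((>= 0.60) 'D)
    (else 'F)))
```

Each expansion step peels off one clause, so the nested if expression from Session 4 is generated one level at a time.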

Again, we can use the function expand-once to see how Racket's preprocessor translates the abstraction into its core form:

> (expand-once #'(range-case taxable-income
                   ((<=  12000) '(     0 0.044     0.00))
                   ((<=  60000) '( 12000 0.0482  528.00))
                   ((<= 150000) '( 60000 0.057  2841.60))
                   (else        '(150000 0.06   7971.60))))

(if (<= taxable-income 12000)
    '(0 0.044 0.0)
    (range-case taxable-income
      ((<= 60000)  '(12000 0.0482 528.0))
      ((<= 150000) '(60000 0.057 2841.6))
      (else        '(150000 0.06 7971.6))))

A Wrinkle

This approach works great if we are choosing a value based on a simple key, such as an identifier. But if id is a compound expression, it will be repeated throughout the generated code, and thus evaluated multiple times. Can we do better?

Yes! We can evaluate the key expression once and bind its value to a new local variable, to save recomputation. See the new version of range-case at the bottom of the source file linked above. This special form uses the original range-case to do the recursive work. Most important, Racket guarantees that the name of the local variable does not collide with any name in the range-case expression. This is good hygiene.
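The idea can be sketched as a thin wrapper around a recursive range-case. The name range-case-once and the helper read-grade are hypothetical, chosen only so that both macros can sit side by side in one file; the range-case shown here is itself a sketch, not the linked source.

```racket
#lang racket

;; A sketch of the recursive macro that does the real work
(define-syntax range-case
  (syntax-rules (else)
    [(_ id (else result)) result]
    [(_ id ((op bound) result) clause ...)
     (if (op id bound)
         result
         (range-case id clause ...))]))

;; Hypothetical wrapper: evaluate the key expression once, bind it
;; to a local variable, and hand that variable to range-case.
;; Hygiene keeps the macro's key separate from any user variable
;; that happens to be named key.
(define-syntax range-case-once
  (syntax-rules ()
    [(_ key-exp clause ...)
     (let ([key key-exp])
       (range-case key clause ...))]))

;; A key expression with a visible side effect, to count evaluations
(define calls 0)
(define (read-grade)
  (set! calls (add1 calls))
  0.65)

(define grade
  (range-case-once (read-grade)
    ((>= 0.90) 'A)
    ((>= 0.80) 'B)
    ((>= 0.70) 'C)
    ((>= 0.60) 'D)
    (else 'F)))
```

After this runs, grade is 'D and calls is 1. Passing (read-grade) directly to range-case would call it once for each test that is reached.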

Racket Macros

Racket enables us to define pattern → expansion templates as new special forms. To support complex forms:

It allows the use of an ellipsis to describe compound patterns.
It allows one special form to expand to another special form.
It even allows a syntax rule to be recursive.
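A small macro can show all three features at once. The following is an illustrative sketch of a short-circuiting or, named my-or here so that it does not clash with Racket's built-in or.

```racket
#lang racket

;; my-or: a hypothetical re-creation of a short-circuiting or
(define-syntax my-or
  (syntax-rules ()
    [(_) #f]                        ; no arguments
    [(_ e) e]                       ; one argument
    [(_ e rest ...)                 ; ellipsis: zero or more further arguments
     (let ([t e])                   ; expands to other special forms (let, if)
       (if t
           t
           (my-or rest ...)))]))    ; the rule uses my-or recursively

(define first-true (my-or #f #f 3))
(define none (my-or))
```

Here first-true is 3 and none is #f. The temporary variable t is introduced by the macro, and hygiene keeps it from colliding with a t in the caller's code.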

And keep in mind: this is all happening before run-time.

Implementing a Different `range-case` Expression

Note: We did not cover this in class.

What if we decide we want a more verbose syntax, such as:

((0.90 1.00) 'A)
((0.80 0.90) 'B)
...

This would allow for non-sequential and overlapping ranges.

We can do that. This solution defines range-case to use a different pattern and a different expansion template.

Change the pattern, change the translation, BOOM! A new special form.
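As a rough sketch of what such a variation might look like (this is not the linked solution): the name interval-case is hypothetical, and the code assumes that each range includes both of its endpoints and that the first matching clause wins when ranges overlap.

```racket
#lang racket

;; Hypothetical interval-based variant. Each clause names a range
;; (lo hi); the whole form expands to a single cond.
(define-syntax interval-case
  (syntax-rules (else)
    [(_ key-exp ((lo hi) result) ... (else default))
     (let ([key key-exp])                  ; evaluate the key once
       (cond [(<= lo key hi) result] ...   ; one cond clause per range
             [else default]))]))

;; Hypothetical usage
(define (letter-grade g)
  (interval-case g
    ((0.90 1.00) 'A)
    ((0.80 0.90) 'B)
    ((0.70 0.80) 'C)
    ((0.60 0.70) 'D)
    (else 'F)))
```

Because the ellipsis follows the whole clause pattern, the template can repeat a complete cond clause for each range, so no recursion is needed in this version.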

Don't worry about the details of the code. We won't be defining our own syntax this semester. But please note: This is just Racket code. We are using the language we are writing in to extend the language we are writing in — on the fly.

Macros in Other Languages

Other languages have macros. What languages with macros are you likely to encounter?

Old-Style Macros

C and assembly language have rudimentary macro systems, implemented as text-based preprocessors. The C preprocessor works by simple textual search-and-replace at the token level, rather than the character level. This allows some powerful forms of conditional processing, but working at the token level creates problems. For example, given the definition #define SQUARE(x) x * x, the expression SQUARE(a + 1) expands to a + 1 * a + 1, which does not compute the square of a + 1. If you are interested in learning more, check out the bonus reading for today.

If you publish research papers in CS, you might use a tool named LaTeX. TeX is a computer typesetting system written by Donald Knuth in the 1970s and 1980s. LaTeX is a derivative of TeX, with most of its functionality implemented as macros in TeX.

Macros at this low level are hard to work with, are error prone, and are not always as powerful as we'd like.

Embeddable Languages

PHP is one of the most common languages used on the web. PHP programs are embedded into HTML files. The PHP processor recognizes code fragments using the markers <?php and ?> and executes the code to modify or extend the HTML.

More generally, programs in embeddable languages can be embedded in free-format text, or in the source code of other languages. This is similar to a textual macro language, but embeddable languages are usually much more powerful, as they are full-featured programming languages. (Racket comes with Scribble, a tool for writing documents that allow Racket expressions as embedded programs.)

Our final example today is an embeddable language.

Modern Macro Systems

Among programming languages created in the last decade or so, Rust and Elixir stand out for their hygienic syntax-level macro systems. Rust excels at systems programming, while Elixir has found a niche in web development. Both languages have borrowed ideas from Racket and adapted them to the specific syntax and capabilities of those languages.

Even among these new languages with modern macro systems, though, Racket stands out.

Creating a New Language in Racket

My notes for the rest of this section are incomplete. I will complete and improve them soon.

We have just seen that Racket lets us modify its expander, the preprocessor that translates syntactic sugar into core expressions.

If we could also modify Racket's reader, we could define an entirely different language using Racket.

We can.

Racket is a language-making language. It treats languages as libraries to be loaded, mixed, and matched.

This is one example.

Matthew Butterick is a lawyer, a programmer, and a typographer. He decided to write a book called Practical Typography. He could have used Word or LaTeX, but neither gave him the flexibility or the power he wanted. As a programmer, he knew he didn't have to settle for other people's tools. So he went looking for programming languages to use. Nothing seemed quite right.

Then he discovered Racket. Racket is a language-making language, so he decided to create his own publishing system, which became an entire language within Racket: Pollen.

Note: You will need to install Pollen to run this code. Use the menu command File | Install Package.... Type pollen into the Package Source box and click Install. When it's done, relaunch DrRacket.

Demonstrate Pollen.

Show a Pollen file.
Run in DrRacket. Look at the output.
Run at the command line: racket poem.html.pp.
Run and re-direct the output: racket poem.html.pp > poem.html.
Pollen can do that for us: raco pollen render poem.html.pp
Open the output.

Remember: All the reading and expanding happens before run-time.

Now we can write Pollen files, er, programs, and run them in Racket. Butterick has written two books using Pollen, at the same time creating wonderful web sites from the same source code.

If you want to learn more about how to make languages such as Pollen, check out Beautiful Racket. It's a very good book.

This is one example of a document language written in Racket. In Session 28, we will see a programming language written in Racket — one that doesn't use Racket's parenthesized prefix notation!

Wrap Up

Reading
- Review these lecture notes. Peek at the code for the session. Pay more attention to the ideas than the details, unless you really want to go deeper. (In which case, let me know!)
- If you like the ideas in this session and would like to see more, check out this short reading with associated code examples.
Homework
- Homework 8 was due yesterday. Our next assignment will come next week, after...
Quiz
- Quiz 3 is next session. This unit is shorter than the previous two. That is good, because the exam can focus on a smaller number of ideas:
  - the idea of syntactic abstraction
  - several syntactic abstractions and their translations
  - the implication of syntactic abstraction for the implementation of compilers
  - the idea of a lexical address
  - the implementation of lexical addressing in code [ Part 1 | Part 2 ]
