PL: Lecture #22  Tuesday, March 30th

Scheme (and Racket) Macros (contd.)

The way that Scheme implementations achieve hygiene in a macro system is by making it deal with more than just raw S-expressions. Roughly speaking, it deals with syntax objects, which are a kind of wrapper structure around S-expressions that carries additional information. For hygiene, the important part of this information is the “lexical scope”, which can roughly be described as representing identifiers as symbols plus a “color” that stands for the scope. This way such systems can avoid confusing identifiers that have the same name but come from different scopes.

There was also the problem of making debugging difficult, because a macro can introduce errors that are “coming out of nowhere”. In the implementation that we work with, this is solved by adding yet more information to these syntax objects — in addition to the underlying S-expression and the lexical scope, they also contain source location information. This allows Racket (and DrRacket) to locate the source of a specific syntax error, so locating the offending code is easy. DrRacket’s macro debugger heavily relies on this information to provide a very useful tool — since writing macros can easily become a hard job.

Finally, there was the problem of writing bad macros. For example, it is easy to forget that you’re dealing with a macro definition and write:

(define-syntax-rule (twice x) (+ x x))

just because you want to inline the addition — but in this case you end up duplicating the input expression which can have a disastrous effect. For example:

(twice (twice (twice (twice (twice (twice (twice (twice 1))))))))

expands to a lot of code to compile: each level doubles the expression, so the eight nested uses produce 256 copies of the literal 1.
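A small sketch of the usual fix, assuming that all we want is for the argument to be evaluated once: bind it to a local name and use that name twice. The names twice/once, calls, and tick! are made up for this illustration.

```racket
#lang racket
;; The problematic version from above: x is copied into the template twice.
(define-syntax-rule (twice x) (+ x x))

;; Hypothetical fix: evaluate x once, then reuse the value.
(define-syntax-rule (twice/once x)
  (let ([v x]) (+ v v)))

;; A side-effecting argument makes the duplication visible.
(define calls 0)
(define (tick!) (set! calls (add1 calls)) 1)

(define r1 (twice (tick!)))        ; expands to (+ (tick!) (tick!))
(define calls-after-twice calls)   ; 2: the argument ran twice
(define r2 (twice/once (tick!)))   ; the argument runs once
(define calls-after-once (- calls calls-after-twice)) ; 1
```

Hygiene is what makes the local name v safe here: it cannot clash with a v that appears in the argument expression.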

Another example is:

(define-syntax-rule (with-increment var expr)
  (let ([var (add1 var)]) expr))
...
(with-increment (* foo 2)
  ...code...)

the problem here is that (* foo 2) will be used as an identifier to be bound by the let expression — which can lead to a confusing syntax error.

Racket provides many tools to help macro programmers. In addition to a user-interface tool like the macro debugger, there are also programmer-level tools that let you, for example, reject an input if it doesn’t contain an identifier at a certain place. Still, writing macros is much harder than writing functions. Some of these problems are inherent to the problem that macros solve; for example, you may really want a twice macro that replicates an expression. By specifying a transformation to the core language, a macro writer has full control over which expressions get evaluated and how, which identifiers are binding instances, and how the scope of the given expression is shaped.
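As one possible illustration of such a programmer-level tool, here is a sketch of with-increment written with syntax-parse from the syntax/parse library instead of define-syntax-rule. This is a different mechanism from the syntax-rules macros used in these notes; the var:id annotation is what restricts the first argument to an identifier.

```racket
#lang racket
(require (for-syntax racket/base syntax/parse))

;; Sketch: var:id only matches an identifier, so a use such as
;; (with-increment (* foo 2) ...) is rejected at expansion time with an
;; error about the macro use itself, rather than a confusing error
;; coming from the let that the macro expands to.
(define-syntax (with-increment stx)
  (syntax-parse stx
    [(_ var:id expr)
     #'(let ([var (add1 var)]) expr)]))

(define foo 1)
(define result (with-increment foo (* foo 10)))  ; foo is 2 inside, so 20
```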

Meta Macros

One of the nice results of syntax-rules dealing with the subtle points of identifiers and scope is that things work fine even when we “go up a level”. For example, the short define-syntax-rule form that we’ve seen is itself defined as a simple macro:

(define-syntax define-syntax-rule
  (syntax-rules ()
    [(define-syntax-rule (name P ...) B)
     (define-syntax name
       (syntax-rules ()
         [(name P ...) B]))]))

In fact, this is very similar to something that we have already seen: the rewrite form that we have used in Schlac is implemented in just this way. The only difference is that rewrite requires an actual => token to separate the input pattern from the output template. If we just use it in a syntax rule:

(define-syntax rewrite
  (syntax-rules ()
    [(rewrite (name P ...) => B)
    (define-syntax name
      (syntax-rules ()
        [(name P ...) B]))]))

it won’t work. Racket treats the above => just like any identifier, which in this case acts as a pattern variable which matches anything. The solution to this is to list the => as a keyword which is expected to appear in the macro use as-is — and that’s what the mysterious () of syntax-rules is used for: any identifier listed there is taken to be such a keyword. This makes the following version

(define-syntax rewrite
  (syntax-rules (=>)
    [(rewrite (name P ...) => B)
    (define-syntax name
      (syntax-rules ()
        [(name P ...) B]))]))

do what we want and throw a syntax error unless rewrite is used with an actual => in the proper place.
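A short usage sketch of this version; the flip-args macro is a made-up example, not something from Schlac.

```racket
#lang racket
(define-syntax rewrite
  (syntax-rules (=>)
    [(rewrite (name P ...) => B)
     (define-syntax name
       (syntax-rules ()
         [(name P ...) B]))]))

;; Hypothetical use: a macro that flips the two arguments of a call.
(rewrite (flip-args f x y) => (f y x))

(define flipped (flip-args - 1 10))  ; expands to (- 10 1)

;; Using any other token in place of the keyword, as in
;;   (rewrite (flip-args f x y) -> (f y x))
;; no longer matches the pattern and is rejected as bad syntax.
```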

Lazy Constructions in an Eager Language

PLAI §37 (has some examples)

This is not really lazy evaluation, but it gets close, and provides the core useful property of easy-to-use infinite lists.

(define-syntax-rule (cons-stream x y)
  (cons x (lambda () y)))
(define stream? pair?)
(define null-stream null)
(define null-stream? null?)
;; note that these are not proper lists in racket,
;; so we use car and cdr here
(define stream-first car)
(define (stream-rest s) ((cdr s)))

Using it:

(define ones (cons-stream 1 ones))
(define (stream-map f s)
  (if (null-stream? s)
      null-stream
      (cons-stream (f (stream-first s))
                   (stream-map f (stream-rest s)))))
(define (stream-map2 f s1 s2)
  (if (null-stream? s1)
      null-stream
      (cons-stream (f (stream-first s1) (stream-first s2))
                   (stream-map2 f (stream-rest s1)
                                  (stream-rest s2)))))
(define ints (cons-stream 0 (stream-map2 + ones ints)))
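To look at a prefix of such an infinite stream we need some way to stop. The following sketch repeats the definitions above and adds a helper, take-stream, which is a hypothetical name that is not part of the original code.

```racket
#lang racket/base
(define-syntax-rule (cons-stream x y)
  (cons x (lambda () y)))
(define null-stream null)
(define null-stream? null?)
(define stream-first car)
(define (stream-rest s) ((cdr s)))

(define ones (cons-stream 1 ones))
(define (stream-map2 f s1 s2)
  (if (null-stream? s1)
      null-stream
      (cons-stream (f (stream-first s1) (stream-first s2))
                   (stream-map2 f (stream-rest s1) (stream-rest s2)))))
(define ints (cons-stream 0 (stream-map2 + ones ints)))

;; Hypothetical helper: collect the first n elements into an ordinary list.
;; The n = 0 test comes first so that no extra tail is forced.
(define (take-stream n s)
  (if (or (zero? n) (null-stream? s))
      '()
      (cons (stream-first s)
            (take-stream (sub1 n) (stream-rest s)))))

(define first-five (take-stream 5 ints))  ; '(0 1 2 3 4)
```

Note that each call to stream-rest re-runs the tail thunk, since nothing is cached in this representation.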

Actually, all Scheme implementations come with a generalized tool for (local) laziness: a delay form that delays computation of its body expression, and a force function that forces such promises. Here is a naive implementation of this:

(define-type promise
  [make-promise (-> Any)])

(define-syntax-rule (delay expr)
  (make-promise (lambda () expr)))

(define (force p)
  (cases p [(make-promise thunk) (thunk)]))

Proper definitions of delay/force cache the result — and practical ones can get pretty complex, for example, in order to allow tail calls via promises.
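A minimal sketch of the caching idea, written with a plain Racket struct rather than the define-type and cases forms used above. The names my-promise, my-delay, and my-force are made up so that they do not clash with Racket's own delay and force, and this ignores the harder issues such as tail calls and exceptions.

```racket
#lang racket/base
;; A promise holds a thunk until it is forced, and the value afterwards.
(struct my-promise (thunk value forced?) #:mutable)

(define-syntax-rule (my-delay expr)
  (my-promise (lambda () expr) #f #f))

(define (my-force p)
  (unless (my-promise-forced? p)
    (set-my-promise-value! p ((my-promise-thunk p)))
    (set-my-promise-forced?! p #t)
    (set-my-promise-thunk! p #f))   ; the thunk is no longer needed
  (my-promise-value p))

;; The body runs on the first force only.
(define runs 0)
(define p (my-delay (begin (set! runs (add1 runs)) (* 6 7))))
(define v1 (my-force p))  ; runs the body
(define v2 (my-force p))  ; returns the cached value
```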

Recursive Macros

Syntax transformations can be recursive. For example, we have seen how let* can be implemented by a transformation that uses two rules, one of which expands to another use of let*:

(define-syntax let*
  (syntax-rules ()
    [(let* () body ...)
    (let () body ...)]
    [(let* ((x v) (xs vs) ...) body ...)
    (let ((x v)) (let* ((xs vs) ...) body ...))]))

When Racket expands a let* expression, the result may contain a new let* which needs expanding as well. An important implication of this is that recursive macros are fine, as long as the recursive case is using a smaller expression. This is just like any form of recursion (or loop), where you need to be looping over a well-founded set of values, where each iteration uses a new value that is closer to some base case.
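To make the shrinking input concrete, here is the same macro under the hypothetical name my-let* (so that it does not shadow Racket's own let*), with the expansion steps of one use written out as comments.

```racket
#lang racket
(define-syntax my-let*
  (syntax-rules ()
    [(my-let* () body ...)
     (let () body ...)]
    [(my-let* ((x v) (xs vs) ...) body ...)
     (let ((x v)) (my-let* ((xs vs) ...) body ...))]))

;; Expansion steps; the binding list shrinks by one each time:
;;   (my-let* ([a 1] [b (+ a 1)]) (* a b))
;;   (let ([a 1]) (my-let* ([b (+ a 1)]) (* a b)))
;;   (let ([a 1]) (let ([b (+ a 1)]) (my-let* () (* a b))))
;;   (let ([a 1]) (let ([b (+ a 1)]) (let () (* a b))))
(define result (my-let* ([a 1] [b (+ a 1)]) (* a b)))  ; 2
```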

For example, consider the following macro:

(define-syntax-rule (while condition body ...)
  (when condition
    body ...
    (while condition body ...)))

It seems like this is a good implementation of a while loop — after all, if you were to implement it as a function using thunks, you’d write very similar code:

(define (while condition-thunk body-thunk)
  (when (condition-thunk)
    (body-thunk)
    (while condition-thunk body-thunk)))

But if you look at the nested while form in the transformation rule, you’ll see that it is exactly the same as the input form. This means that this macro can never be completely expanded — it specifies infinite code! In practice, this makes the (Racket) compiler loop forever, consuming more and more memory. This is unlike, for example, the recursive let* rule which uses one less binding-value pair than specified as its input.

The reason that the function version of while is fine is that it iterates using the same code, and the condition thunk will depend on some state that converges to a base case (usually the body thunk will perform some side-effects that make the loop converge). But in the macro case there is no evaluation happening: if the transformed syntax contains the same input pattern, we end up with a macro that expands infinitely.

The correct solution for a while macro is therefore to use plain recursion using a local recursive function:

(define-syntax-rule (while condition body ...)
  (letrec ([loop (lambda ()
                  (when condition
                    body ...
                    (loop)))])
    (loop)))

A popular way to deal with macros like this that revolve around a specific control flow is to separate them into a function that uses thunks, and a macro that does nothing except wrap input expressions as thunks. In this case, we get this solution:

(define (while/proc condition-thunk body-thunk)
  (when (condition-thunk)
    (body-thunk)
    (while/proc condition-thunk body-thunk)))

(define-syntax-rule (while condition body ...)
  (while/proc (lambda () condition)
              (lambda () body ...)))
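A short usage sketch of this version; the variables i and trace are made up for the example.

```racket
#lang racket
(define (while/proc condition-thunk body-thunk)
  (when (condition-thunk)
    (body-thunk)
    (while/proc condition-thunk body-thunk)))

(define-syntax-rule (while condition body ...)
  (while/proc (lambda () condition)
              (lambda () body ...)))

;; The side effect on i is what eventually makes the condition false.
(define i 0)
(define trace '())
(while (< i 3)
  (set! trace (cons i trace))
  (set! i (add1 i)))
;; now i is 3 and trace is '(2 1 0)
```

The macro expands exactly once per use, into a call to while/proc, so expansion always terminates; the iteration itself happens at run time.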

Another example: a simple loop

Here is an implementation of a macro that does a simple arithmetic loop:

(define-syntax for
  (syntax-rules (= to do)
    [(for x = m to n do body ...)
    (letrec ([loop (lambda (x)
                      (when (<= x n)
                        body ...
                        (loop (+ x 1))))])
      (loop m))]))

(Note that this is not complete code: it suffers from the usual problem of multiple evaluations of the n expression. We’ll deal with it soon.)

This macro combines both control flow and lexical scope. Control flow is specified by the loop (done, as usual in Racket, as a tail-recursive function) — for example, it determines how code is iterated, and it also determines what the for form will evaluate to (it evaluates to whatever when evaluates to, the void value in this case). Scope is also specified here, by translating the code to a function — this code makes x have a scope that covers the body so this is valid:

(for i = 1 to 3 do (printf "i = ~s\n" i))

but it also makes the boundary expression n be in this scope, making this:

(for i = 1 to (if (even? i) 10 20) do (printf "i = ~s\n" i))

valid. In addition, while re-evaluating the boundary expression on each iteration might be desirable, in most cases it’s not. Consider this example:

(for i = 1 to (read) do (printf "i = ~s\n" i))

This is easily solved by using a let to make the expression evaluate just once:

(define-syntax for
  (syntax-rules (= to do)
    [(for x = m to n do body ...)
    (let ([m* m]  ; execution order
          [n* n])
      (letrec ([loop (lambda (x)
                        (when (<= x n*)
                          body ...
                          (loop (+ x 1))))])
        (loop m*)))]))

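which makes the earlier use whose bound is (if (even? i) 10 20) fail with a “reference to undefined identifier: i” error (the exact wording depends on the Racket version), since the bound is now evaluated outside the scope of x.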
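A small check that this version evaluates the boundary expression just once; the bound function and the counters are made up for the illustration, standing in for something like (read).

```racket
#lang racket
(define-syntax for
  (syntax-rules (= to do)
    [(for x = m to n do body ...)
     (let ([m* m]
           [n* n])
       (letrec ([loop (lambda (x)
                        (when (<= x n*)
                          body ...
                          (loop (+ x 1))))])
         (loop m*)))]))

;; Hypothetical bound expression with a visible side effect.
(define bound-evals 0)
(define (bound) (set! bound-evals (add1 bound-evals)) 3)

(define seen '())
(for i = 1 to (bound) do (set! seen (cons i seen)))
;; bound-evals is 1 and seen is '(3 2 1)
```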

Furthermore, the fact that we have a hygienic macro system means that it is perfectly fine to use nested for expressions:

(for a = 1 to 9 do
  (for b = 1 to 9 do (printf "~s,~s " a b))
  (newline))

The transformation is, therefore, completely specifying the semantics of this new form.
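Hygiene also protects against clashes with the names that the macro itself introduces. In the following sketch the user code has its own variable called loop (a made-up example); it is not captured by the loop function inside the expansion.

```racket
#lang racket
(define-syntax for
  (syntax-rules (= to do)
    [(for x = m to n do body ...)
     (let ([m* m]
           [n* n])
       (letrec ([loop (lambda (x)
                        (when (<= x n*)
                          body ...
                          (loop (+ x 1))))])
         (loop m*)))]))

(define total 0)
;; The loop in the body refers to the user's binding (100), not to the
;; macro's internal function of the same name.
(let ([loop 100])
  (for i = 1 to 3 do (set! total (+ total loop i))))
;; total is (+ 101 102 103) = 306
```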

Extending this syntax is easy using multiple transformation rules. For example, say that we want to extend it to have an optional step keyword. The standard idiom is to have the step-less pattern translated into one that uses step 1:

(for x = m to n do body ...)
--> (for x = m to n step 1 do body ...)

Usually, you should remember that syntax-rules tries the patterns one by one until a match is found, but in this case there is no problem because the keywords make the choice unambiguous:

(define-syntax for
  (syntax-rules (= to do step)
    [(for x = m to n do body ...)
    (for x = m to n step 1 do body ...)]
    [(for x = m to n step d do body ...)
    (let ([m* m]
          [n* n]
          [d* d])
      (letrec ([loop (lambda (x)
                        (when (<= x n*)
                          body ...
                          (loop (+ x d*))))])
        (loop m*)))]))

(for i = 1 to 10 do (printf "i = ~s\n" i))
(for i = 1 to 10 step 2 do (printf "i = ~s\n" i))

We can even extend it to do a different kind of iteration, for example, to iterate over a list:

(define-syntax for
  (syntax-rules (= to do step in)
    [(for x = m to n do body ...)
    (for x = m to n step 1 do body ...)]
    [(for x = m to n step d do body ...)
    (let ([m* m]
          [n* n]
          [d* d])
      (letrec ([loop (lambda (x)
                        (when (<= x n*)
                          body ...
                          (loop (+ x d*))))])
        (loop m*)))]
    ;; list
    [(for x in l do body ...)
    (for-each (lambda (x) body ...) l)]))

(for i in (list 1 2 3 4) do (printf "i = ~s\n" i))

(for i in (list 1 2 3 4) do
  (for i = 0 to i do (printf "i = ~s  " i))
  (newline))

Yet Another: List Comprehension

At this point it’s clear that macros are a powerful language feature that makes it relatively easy to implement new features, which makes Racket a language that is easy to use as a tool for quick experimentation with new language features. As an example of a practical feature rather than a toy, let’s see how we can implement Python’s list comprehensions. These are expressions that conveniently combine map, filter, and nested uses of both.

First, a simple implementation that uses only the map feature:

(define-syntax list-of
  (syntax-rules (for in)
    [(list-of EXPR for ID in LIST)
    (map (lambda (ID) EXPR)
          LIST)]))

(list-of (* x x) for x in (range 10))

It is a good exercise to see how everything that we’ve seen above plays a role here. For example, how we get the ID to be bound in EXPR.

Next, add a condition expression with an if keyword, implemented using filter:

(define-syntax list-of
  (syntax-rules (for in if)
    [(list-of EXPR for ID in LIST if COND)
    (map (lambda (ID) EXPR)
          (filter (lambda (ID) COND) LIST))]
    [(list-of EXPR for ID in LIST)
    (list-of EXPR for ID in LIST if #t)]))

(list-of (* x x) for x in (range 10) if (odd? x))

Again, go over it and see how the binding structure makes the identifier available in both expressions. Note that since we’re just playing around, we’re not paying too much attention to performance etc. (For example, if we cared, we could have implemented the if-less case without filter at all, or we could implement a filter that accepts #t as a predicate and in that case just returns the list, or we could even implement it as a macro that identifies a (lambda (_) #t) pattern and expands to just the list (a bad idea in general).)
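A sketch of the first of these alternatives, where the if-less rule maps directly over the list with no filter pass. The variable names in the uses are made up.

```racket
#lang racket
(define-syntax list-of
  (syntax-rules (for in if)
    [(list-of EXPR for ID in LIST if COND)
     (map (lambda (ID) EXPR)
          (filter (lambda (ID) COND) LIST))]
    ;; no filter here: the if-less form is just a map
    [(list-of EXPR for ID in LIST)
     (map (lambda (ID) EXPR) LIST)]))

(define squares (list-of (* x x) for x in (range 5)))
;; '(0 1 4 9 16)
(define odd-squares (list-of (* x x) for x in (range 10) if (odd? x)))
;; '(1 9 25 49 81)
```

The price is that the two rules no longer share a single implementation, which is the usual trade-off against the translate-to-the-general-case idiom.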

The last step: Python’s comprehension accepts multiple for-ins for nested loops, possibly with if filters at each level:

(define-syntax list-of
  (syntax-rules (for in if)
    [(list-of EXPR for ID in LIST if COND)
    (map (lambda (ID) EXPR)
          (filter (lambda (ID) COND) LIST))]
    [(list-of EXPR for ID in LIST)
    (list-of EXPR for ID in LIST if #t)]
    [(list-of EXPR for ID in LIST for MORE ...)
    (list-of EXPR for ID in LIST if #t for MORE ...)]
    [(list-of EXPR for ID in LIST if COND for MORE ...)
    (apply append (map (lambda (ID) (list-of EXPR for MORE ...))
                        (filter (lambda (ID) COND) LIST)))]))

A collection of examples that I found in the Python docs and elsewhere, demonstrating all of these:

;; [x**2 for x in range(10)]
(list-of (* x x) for x in (range 10))
;; [(x, y) for x in [1,2,3] for y in [3,1,4] if x != y]
(list-of (list x y) for x in '(1 2 3) for y in '(3 1 4)
                    if (not (= x y)))

(define (round-n x n) ; python-like round to n digits
  (define 10^n (expt 10 n))
  (/ (round (* x 10^n)) 10^n))
;; [str(round(pi, i)) for i in range(1, 6)]
(list-of (number->string (round-n pi i)) for i in (range 1 6))

(define matrix
  '((1 2 3 4)
    (5 6 7 8)
    (9 10 11 12)))
;; [[row[i] for row in matrix] for i in range(4)]
(list-of (list-of (list-ref row i) for row in matrix)
        for i in (range 4))

(define text '(("bar" "foo" "fooba")
              ("Rome" "Madrid" "Houston")
              ("aa" "bb" "cc" "dd")))
;; [y for x in text if len(x)>3 for y in x]
(list-of y for x in text if (> (length x) 3) for y in x)
;; [y for x in text for y in x if len(y)>4]
(list-of y for x in text for y in x if (> (string-length y) 4))
;; [y.upper() for x in text if len(x) == 3
;;            for y in x if y.startswith('f')]
(list-of (string-upcase y) for x in text if (= (length x) 3)
                          for y in x if (regexp-match? #rx"^f" y))
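For reference, here is a self-contained sketch that runs three of the examples above with the final macro, with the resulting values worked out by hand in the comments. The names pairs, transposed, and long-words are made up.

```racket
#lang racket
(define-syntax list-of
  (syntax-rules (for in if)
    [(list-of EXPR for ID in LIST if COND)
     (map (lambda (ID) EXPR)
          (filter (lambda (ID) COND) LIST))]
    [(list-of EXPR for ID in LIST)
     (list-of EXPR for ID in LIST if #t)]
    [(list-of EXPR for ID in LIST for MORE ...)
     (list-of EXPR for ID in LIST if #t for MORE ...)]
    [(list-of EXPR for ID in LIST if COND for MORE ...)
     (apply append (map (lambda (ID) (list-of EXPR for MORE ...))
                        (filter (lambda (ID) COND) LIST)))]))

(define pairs
  (list-of (list x y) for x in '(1 2 3) for y in '(3 1 4)
                      if (not (= x y))))
;; '((1 3) (1 4) (2 3) (2 1) (2 4) (3 1) (3 4))

(define matrix '((1 2 3 4) (5 6 7 8) (9 10 11 12)))
(define transposed
  (list-of (list-of (list-ref row i) for row in matrix)
           for i in (range 4)))
;; '((1 5 9) (2 6 10) (3 7 11) (4 8 12))

(define text '(("bar" "foo" "fooba")
               ("Rome" "Madrid" "Houston")
               ("aa" "bb" "cc" "dd")))
(define long-words
  (list-of y for x in text for y in x if (> (string-length y) 4)))
;; '("fooba" "Madrid" "Houston")
```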