Homework #17: Using Racket Macros (active)Out:   Monday, November 18th, Due:   Wednesday, December 4th, 11:23pm

Administrative

The language for this homework is:

#lang pl 17

In this homework you will get to implement a Racket macro, essentially extending Racket with a small DSL.

Introduction

The Swine Before Perl” is a known talk by Shriram Krishnamurthi (who also wrote the textbook we’re kind of using). In this talk, he demonstrates how a Racket macro can be used to implement a small domain specific language, using the problem of implementing a finite automaton. Not only does this result in extending Racket with a DSL for finite automata, it is also a simpler solution than the usual one, and a more efficient one. The efficiency benefit comes from the fact that the automatons are essentially compiled to Racket code rather than being represented as data that is interpreted — the speed improvement that is mentioned in the talk would be more substantial now, since Racket has improved a lot since then, and together with a JIT, we get automaton definitions that compile all the way down to machine-code. (Similar methods are common in implementations of regular expression matchers.)

In this homework, you will fix a bug in the automaton implementation, and extend it to implement pushdown-automatons. You should begin by going through the presentation slides. This shouldn’t take much time — you can begin reading at the slide titled “A Pearl”.

Debugging note: when you start to debug macro code, it will be useful to look at the [missing Debugging Macros] handout.

The Automaton, The Bug, and The Fix

The automaton code is not actually present in one piece in the slides — it is a combination of the transformation that is specified in one slide, and the wrapper that uses define-syntax and syntax-rules on the next one. We’re going to use a slightly revised version of the macro: first of all, we work in a typed language, and secondly, we can use the match form which is more powerful and convenient then the case form that appears in the slides.

The following is the version of the code with these changes, including a convenient type definition to avoid writing a union over and over:

(define-type Token = (U Symbol Integer))

;; A macro that defines a DFA language
(define-syntax automaton
  (syntax-rules (: ->)
    [(automaton init-state
      [state : (input-sym -> new-state) ...]
      ...)
    (lambda (string)
      (: state : (Listof Token) -> Boolean)
      ...
      (define (state stream)
        (match stream
          ['() #t]
          [(cons 'input-sym more) (new-state more)]
          ...
          [_ #f]))
      ...
      (init-state (explode-string string)))]))

As in the slides, an automaton form evaluates to a function, but to make it more convenient, we make this function consume a string and turn it into a list of symbols and digits using the explode-string utility function which is part of the course language.

Since the macro evaluates to a function, we can use it as usual, and bind some name to it. Note that the function form that automaton is translated to does not have a type, so we need to declare a type for a function that we define this way. Here is the automaton that is used in the slides — it is intended to identify strings that begin with a “c”, end with an “r”, and have any number of “a” and/or “d” in the middle:

(: cXr : String -> Boolean)
;; Identifies strings that match "c[ad]*r+"
(define cXr (automaton init
              [init : (c -> more)]
              [more : (a -> more)
                      (d -> more)
                      (r -> end)]
              [end  : (r -> end)]))

Note the small detail here that is not mentioned in the slides: the actual strings that are identified can end with one or morer” characters. This is not the main bug that we discuss here: it’s a minor technicality that is easy to fix, except that the fixed macro gets an odd-looking [end :] case, so we document it in the purpose statement.

Finally, here are some test cases to verify that everything works fine:

;; tests:
(test (cXr "cadr"))
(test (cXr "cadadadadadadddddaaarrr"))
(test (not (cXr "ccadr")))
(test (not (cXr "c"))) ; BAD TEST!

As noted, the last test shows that there is a bug in the implementation. The problem is that the new form does not have any place to specify an “accepting” state, so if the automaton manages to reach the end of the input (that is, there was no case where no transition was specified for the input token), then it will inevitably return #t.

Your first task is to fix this bug. The fix will have two parts: (a) extend the automaton syntax so there is a place to specify an accepting state; (b) use the new input to fix the broken place in the code (hint: there is a single place in the output pattern where a #t can be the result). The following is how the resulting macro should be used for a correct implementation of the cXr function:

(define cXr (automaton init end ; `end' is the accepting state
              [init : (c -> more)]
              [more : (a -> more)
                      (d -> more)
                      (r -> end)]
              [end  : (r -> end)]))

This should now pass all of the above tests. You can, of course, add new tests and new automata.

Note that there is a subtle point about the new code that you’re going to add: you need to do a comparison that involves states — but states are functions, and comparing functions is generally a bad idea. This is especially true with a compiler: what you think is a single function pointer might actually be implemented in several different ways depending on how your code gets used. This is why comparison of functions in Racket (and more generally in Scheme) is unspecified and should be avoided. There is an easy way out though: simply use the function names — using a quote.

Finally, make sure that you have complete coverage! You might see some part of the code that is not covered by the above tests. That is not some weird result of using macros. In fact, if you do see some part that is shown as uncovered, then this is most likely correct, and you need to add a test to get the needed coverage for exactly that part. This missing bit would be a missing test of your cXr, it’s not a test of the macro transformation. (In fact, we don’t get to test the transformation at all, we only do an indirect test of code that gets transformed by the macro.)

Side-comment: you might see complete coverage without adding a test. This is fine too — it is probably a result of better code in your expansion…

Important Note

Often, students get confused in this homework and mix up the generic code templates in the macro with the specific code in definitions that use the macro. Usually this leads to code that looks like it works, only it’s pretty much useless if you end up making a macro that generates only code specific to the cXr example. (Here “useless” implies a huge point loss!)

To help you avoid this kind of a mistake, make sure that your code is generic and can be used to implement different automatons too. Actually, since people who are confused in this way may just as well write confused tests, here is an addition test for you to use:

(: div5 : String -> Boolean)
;; Determine whether a binary number is divisible by 5
(define div5
  (automaton mod0 mod0
    [mod0 : (0 -> mod0) (1 -> mod1)]
    [mod1 : (0 -> mod2) (1 -> mod3)]
    [mod2 : (0 -> mod4) (1 -> mod0)]
    [mod3 : (0 -> mod1) (1 -> mod2)]
    [mod4 : (0 -> mod3) (1 -> mod4)]))
(test (div5 ""))
(test (div5 "0"))
(test (div5 "000"))
(test (div5 (number->string 12345 2)))
(test (not (div5 (number->string 123453 2))))

Implementing Pushdown Automata

Now that you know how to implement a simple automaton, you can proceed to implement an extended version: a pushdown version. The syntax for the new pushdown syntax is similar to the automaton macro, and the implementation is basically an extension of the code that keeps track a stack of tokens. The specification still has an initial state, an accepting state, and a sequence of transition rules. The rules are similar to the automaton rules, except that each transition has:

Each of these sequences is specified in parens, and each can be empty (but not all, since this will get the automaton stuck an infinite loop). When both of the first two sequences match the input and the stack, the matched portions of the two are discarded. Finally, when the automaton starts, the end of the input and the bottom of the stack are both initialized with a * token, and both should be consumed for the input to be accepted.

To clarify the syntax, here is the definition of a simple one-state pushdown automaton that accepts strings with balanced parentheses, with tests. Note that explode-string translates round parentheses to open and close symbols respectively.

(: balanced : String -> Boolean)
;; Identifies strings that contain only balanced parentheses
(define balanced (pushdown init init
                  [init : ((open)  ()    -> init (open))
                          ((close) (open) -> init ())
                          ((*)    (*)    -> init ())]))
;; tests:
(test (balanced ""))
(test (balanced "()"))
(test (balanced "(((())))"))
(test (balanced "((()())(()))"))
(test (not (balanced "(")))
(test (not (balanced ")")))
(test (not (balanced ")(")))

An explanation of the three transition rules in this code:

[These examples have either a single token (e.g., (open)) or none (i.e., ()) in places where tokens are specified. These could have more than one token too, it’s just not used in this example.]

Note that the assumption that * marks the bottom of the stack depends on no transition pushing this token while the automaton is doing its work. For simplicity, the code will not check this assumption.

To make this question easy, here is what the above code should expand to. Important: this is just an example of how the above code expands, the actual solution should work with any automatons! Avoid the temptation to blindly copy-paste this into your macro. It’s best if you write a different one to make sure that your code works as it should.

(lambda (string)
  (: init : (Listof Token) (Listof Token) -> Boolean)
  (define (init stream stack)
    (match (list stream stack)
      [(list '() '()) ...same as in `automaton`...]
      [(list (list-rest 'open more-input)
            (list-rest      more-stack))
      (init more-input (append '(open) more-stack))]
      [(list (list-rest 'close more-input)
            (list-rest 'open  more-stack))
      (init more-input (append '() more-stack))]
      [(list (list-rest '* more-input)
            (list-rest '* more-stack))
      (init more-input (append '() more-stack))]
      [_ #f]))
  (init (append (explode-string string) '(*)) '(*)))

Note how this code initializes the input to a *-terminated list, and the stack to have it also at the bottom. Note also that this has several expressions that look redundant, for example, the two append expressions that have an empty list as their first argument are not needed. These are there in the expanded expression — due to the uniform transformation that produces the final code.

This code uses a new match pattern: list-rest. This pattern is similar to a cons pattern, but instead of matching just the head and the tail of the list, it can match any prefix of the list, and the rest of it. For example,

Since we used only zero or one tokens in the macro, the resulting code has only (list-rest xs) or (list-rest 'foo xs) patterns. If there was a (foo bar) in the macro, it would translate to a (list-rest 'foo 'bar xs) expression.

Again, make sure that your macro is a proper abstraction by implementing a new kind of automaton, with a different “shape”. This is even more important here since you’ll have revised the code. Here’s one such example of a test:

(: zeros=ones : String -> Boolean)
;; Identifies strings of n 0s followed by n 1s
(define zeros=ones
  (pushdown 0s end
    [0s  : ((0) ()  -> 0s  (A))
          (()  ()  -> 1s  ())]
    [1s  : ((1) (A) -> 1s  ())
          ((*) (*) -> end (*))]
    [end : (()  (*) -> end ())]))
;; tests:
(test (zeros=ones ""))
(test (zeros=ones "01"))
(test (zeros=ones "000111"))
(test (not (zeros=ones "0")))
(test (not (zeros=ones "11")))
(test (not (zeros=ones "10")))
(test (not (zeros=ones "00011")))
(test (not (zeros=ones "00101111")))