Administrative
The language for this homework is:
In this homework you will get to implement a Racket macro, essentially extending Racket with a small DSL.
Introduction
“The Swine Before Perl” is a known talk by Shriram Krishnamurthi (who also wrote the textbook we’re kind of using). In this talk, he demonstrates how a Racket macro can be used to implement a small domain specific language, using the problem of implementing a finite automaton. Not only does this result in extending Racket with a DSL for finite automata, it is also a simpler solution than the usual one, and a more efficient one. The efficiency benefit comes from the fact that the automatons are essentially compiled to Racket code rather than being represented as data that is interpreted — the speed improvement that is mentioned in the talk would be more substantial now, since Racket has improved a lot since then, and together with a JIT, we get automaton definitions that compile all the way down to machine-code. (Similar methods are common in implementations of regular expression matchers.)
In this homework, you will fix a bug in the automaton implementation, and extend it to implement pushdown-automatons. You should begin by going through the presentation slides. This shouldn’t take much time — you can begin reading at the slide titled “A Pearl”.
Debugging note: when you start to debug macro code, it will be useful to look at the [missing Debugging Macros] handout.
The Automaton, The Bug, and The Fix
The automaton code is not actually present in one piece in the slides
— it is a combination of the transformation that is specified in one
slide, and the wrapper that uses define-syntax
and syntax-rules
on
the next one. We’re going to use a slightly revised version of the
macro: first of all, we work in a typed language, and secondly, we can
use the match
form which is more powerful and convenient then the
case
form that appears in the slides.
The following is the version of the code with these changes, including a convenient type definition to avoid writing a union over and over:
;; A macro that defines a DFA language
(define-syntax automaton
(syntax-rules (: ->)
[(automaton init-state
[state : (input-sym -> new-state) ...]
...)
(lambda (string)
(: state : (Listof Token) -> Boolean)
...
(define (state stream)
(match stream
['() #t]
[(cons 'input-sym more) (new-state more)]
...
[_ #f]))
...
(init-state (explode-string string)))]))
As in the slides, an automaton
form evaluates to a function, but to
make it more convenient, we make this function consume a string and turn
it into a list of symbols and digits using the explode-string
utility
function which is part of the course language.
Since the macro evaluates to a function, we can use it as usual, and
bind some name to it. Note that the function form that automaton
is
translated to does not have a type, so we need to declare a type for a
function that we define this way. Here is the automaton that is used in
the slides — it is intended to identify strings that begin with a
“c
”, end with an “r
”, and have any number of “a
” and/or “d
” in
the middle:
;; Identifies strings that match "c[ad]*r+"
(define cXr (automaton init
[init : (c -> more)]
[more : (a -> more)
(d -> more)
(r -> end)]
[end : (r -> end)]))
Note the small detail here that is not mentioned in the slides: the
actual strings that are identified can end with one or more “r
”
characters. This is not the main bug that we discuss here: it’s a minor
technicality that is easy to fix, except that the fixed macro gets an
odd-looking [end :]
case, so we document it in the purpose statement.
Finally, here are some test cases to verify that everything works fine:
(test (cXr "cadr"))
(test (cXr "cadadadadadadddddaaarrr"))
(test (not (cXr "ccadr")))
(test (not (cXr "c"))) ; BAD TEST!
As noted, the last test shows that there is a bug in the implementation.
The problem is that the new form does not have any place to specify an
“accepting” state, so if the automaton manages to reach the end of the
input (that is, there was no case where no transition was specified for
the input token), then it will inevitably return #t
.
Your first task is to fix this bug. The fix will have two parts: (a)
extend the automaton
syntax so there is a place to specify an
accepting state; (b) use the new input to fix the broken place in the
code (hint: there is a single place in the output pattern where a #t
can be the result). The following is how the resulting macro should be
used for a correct implementation of the cXr
function:
[init : (c -> more)]
[more : (a -> more)
(d -> more)
(r -> end)]
[end : (r -> end)]))
This should now pass all of the above tests. You can, of course, add new tests and new automata.
Note that there is a subtle point about the new code that you’re going to add: you need to do a comparison that involves states — but states are functions, and comparing functions is generally a bad idea. This is especially true with a compiler: what you think is a single function pointer might actually be implemented in several different ways depending on how your code gets used. This is why comparison of functions in Racket (and more generally in Scheme) is unspecified and should be avoided. There is an easy way out though: simply use the function names — using a quote.
Finally, make sure that you have complete coverage! You might see some
part of the code that is not covered by the above tests. That is not
some weird result of using macros. In fact, if you do see some part
that is shown as uncovered, then this is most likely correct, and you
need to add a test to get the needed coverage for exactly that part.
This missing bit would be a missing test of your cXr
, it’s not a
test of the macro transformation. (In fact, we don’t get to test the
transformation at all, we only do an indirect test of code that gets
transformed by the macro.)
Side-comment: you might see complete coverage without adding a test. This is fine too — it is probably a result of better code in your expansion…
Important Note
Often, students get confused in this homework and mix up the generic
code templates in the macro with the specific code in definitions that
use the macro. Usually this leads to code that looks like it works,
only it’s pretty much useless if you end up making a macro that
generates only code specific to the cXr
example. (Here “useless”
implies a huge point loss!)
To help you avoid this kind of a mistake, make sure that your code is generic and can be used to implement different automatons too. Actually, since people who are confused in this way may just as well write confused tests, here is an addition test for you to use:
;; Determine whether a binary number is divisible by 5
(define div5
(automaton mod0 mod0
[mod0 : (0 -> mod0) (1 -> mod1)]
[mod1 : (0 -> mod2) (1 -> mod3)]
[mod2 : (0 -> mod4) (1 -> mod0)]
[mod3 : (0 -> mod1) (1 -> mod2)]
[mod4 : (0 -> mod3) (1 -> mod4)]))
(test (div5 ""))
(test (div5 "0"))
(test (div5 "000"))
(test (div5 (number->string 12345 2)))
(test (not (div5 (number->string 123453 2))))
Implementing Pushdown Automata
Now that you know how to implement a simple automaton, you can proceed
to implement an extended version: a pushdown version. The syntax for
the new pushdown
syntax is similar to the automaton
macro, and the
implementation is basically an extension of the code that keeps track a
stack of tokens. The specification still has an initial state, an
accepting state, and a sequence of transition rules. The rules are
similar to the automaton
rules, except that each transition has:
- a sequence of tokens to match against the current input,
- a sequence of tokens to match against the top of the stack,
- the new state for this transition,
- a sequence of tokens to push back on the stack.
Each of these sequences is specified in parens, and each can be empty
(but not all, since this will get the automaton stuck an infinite loop).
When both of the first two sequences match the input and the stack, the
matched portions of the two are discarded. Finally, when the automaton
starts, the end of the input and the bottom of the stack are both
initialized with a *
token, and both should be consumed for the input
to be accepted.
To clarify the syntax, here is the definition of a simple one-state
pushdown automaton that accepts strings with balanced parentheses, with
tests. Note that explode-string
translates round parentheses to
open
and close
symbols respectively.
;; Identifies strings that contain only balanced parentheses
(define balanced (pushdown init init
[init : ((open) () -> init (open))
((close) (open) -> init ())
((*) (*) -> init ())]))
;; tests:
(test (balanced ""))
(test (balanced "()"))
(test (balanced "(((())))"))
(test (balanced "((()())(()))"))
(test (not (balanced "(")))
(test (not (balanced ")")))
(test (not (balanced ")(")))
An explanation of the three transition rules in this code:
-
The first rule matches (and consumes) an
open
token in the input, and any stack; it proceeds to push anopen
token on the stack. -
The second rule matches a
close
token in the input, and anopen
token on the stack; it consumes both tokens and does not add anything to the stack (that is, theopen
token is just consumed). -
Finally, the third rule matches when we see the
*
token in the input (meaning that we see the end of the input), and when the stack has a*
(meaning that we have the same stack as we had at the beginning). In this case, both the input and the stack are left empty, which will lead to a#t
result.
[These examples have either a single token (e.g., (open)
) or none
(i.e., ()
) in places where tokens are specified. These could have
more than one token too, it’s just not used in this example.]
Note that the assumption that *
marks the bottom of the stack depends
on no transition pushing this token while the automaton is doing its
work. For simplicity, the code will not check this assumption.
To make this question easy, here is what the above code should expand to. Important: this is just an example of how the above code expands, the actual solution should work with any automatons! Avoid the temptation to blindly copy-paste this into your macro. It’s best if you write a different one to make sure that your code works as it should.
(: init : (Listof Token) (Listof Token) -> Boolean)
(define (init stream stack)
(match (list stream stack)
[(list '() '()) ...same as in `automaton`...]
[(list (list-rest 'open more-input)
(list-rest more-stack))
(init more-input (append '(open) more-stack))]
[(list (list-rest 'close more-input)
(list-rest 'open more-stack))
(init more-input (append '() more-stack))]
[(list (list-rest '* more-input)
(list-rest '* more-stack))
(init more-input (append '() more-stack))]
[_ #f]))
(init (append (explode-string string) '(*)) '(*)))
Note how this code initializes the input to a *
-terminated list, and
the stack to have it also at the bottom. Note also that this has
several expressions that look redundant, for example, the two append
expressions that have an empty list as their first argument are not
needed. These are there in the expanded expression — due to the
uniform transformation that produces the final code.
This code uses a new match
pattern: list-rest
. This pattern is
similar to a cons
pattern, but instead of matching just the head and
the tail of the list, it can match any prefix of the list, and the rest
of it. For example,
-
A
(list-rest x xs)
pattern is the same as(cons x xs)
, -
(list-rest xs)
simply matches any input list, -
(list-rest x y more)
matches a list with at least two elements, wherex
matches the first one,y
the second, andmore
the rest.
Since we used only zero or one tokens in the macro, the resulting code
has only (list-rest xs)
or (list-rest 'foo xs)
patterns. If there
was a (foo bar)
in the macro, it would translate to a (list-rest 'foo 'bar xs)
expression.
Again, make sure that your macro is a proper abstraction by implementing a new kind of automaton, with a different “shape”. This is even more important here since you’ll have revised the code. Here’s one such example of a test:
;; Identifies strings of n 0s followed by n 1s
(define zeros=ones
(pushdown 0s end
[0s : ((0) () -> 0s (A))
(() () -> 1s ())]
[1s : ((1) (A) -> 1s ())
((*) (*) -> end (*))]
[end : (() (*) -> end ())]))
;; tests:
(test (zeros=ones ""))
(test (zeros=ones "01"))
(test (zeros=ones "000111"))
(test (not (zeros=ones "0")))
(test (not (zeros=ones "11")))
(test (not (zeros=ones "10")))
(test (not (zeros=ones "00011")))
(test (not (zeros=ones "00101111")))