The language for this homework is:
In this homework you will get to implement a Racket macro, essentially extending Racket with a small DSL.
“The Swine Before Perl” is a known talk by Shriram Krishnamurthi (who also wrote the textbook we’re kind of using). In this talk, he demonstrates how a Racket macro can be used to implement a small domain specific language, using the problem of implementing a finite automaton. Not only does this result in extending Racket with a DSL for finite automata, it is also a simpler solution than the usual one, and a more efficient one. The efficiency benefit comes from the fact that the automatons are essentially compiled to Racket code rather than being represented as data that is interpreted — the speed improvement that is mentioned in the talk would be more substantial now, since Racket has improved a lot since then, and together with a JIT, we get automaton definitions that compile all the way down to machine-code. (Similar methods are common in implementations of regular expression matchers.)
In this homework, you will fix a bug in the automaton implementation, and extend it to implement pushdown-automatons. You should begin by going through the presentation slides. This shouldn’t take much time — you can begin reading at the slide titled “A Pearl”.
Debugging note: when you start to debug macro code, it will be useful to look at the Debugging Macros handout.
The automaton code is not actually present in one piece in the slides
— it is a combination of the transformation that is specified in one
slide, and the wrapper that uses
the next one. We’re going to use a slightly revised version of the
macro: first of all, we work in a typed language, and secondly, we can
match form which is more powerful and convenient then the
case form that appears in the slides.
The following is the version of the code with these changes, including a convenient type definition to avoid writing a union over and over:
As in the slides, an
automaton form evaluates to a function, but to
make it more convenient, we make this function consume a string and turn
it into a list of symbols and digits using the
function which is part of the course language.
Since the macro evaluates to a function, we can use it as usual, and
bind some name to it. Note that the function form that
translated to does not have a type, so we need to declare a type for a
function that we define this way. Here is the automaton that is used in
the slides — it is intended to identify strings that begin with a
c”, end with an “
r”, and have any number of “
a” and/or “
Note the small detail here that is not mentioned in the slides: the
actual strings that are identified can end with one or more “
characters. This is not the main bug that we discuss here: it’s a minor
technicality that is easy to fix, except that the fixed macro gets an
[end :] case, so we document it in the purpose statement.
Finally, here are some test cases to verify that everything works fine:
As noted, the last test shows that there is a bug in the implementation.
The problem is that the new form does not have any place to specify an
“accepting” state, so if the automaton manages to reach the end of the
input (that is, there was no case where no transition was specified for
the input token), then it will inevitably return
Your first task is to fix this bug. The fix will have two parts: (a)
automaton syntax so there is a place to specify an
accepting state; (b) use the new input to fix the broken place in the
code (hint: there is a single place in the output pattern where a
can be the result). The following is how the resulting macro should be
used for a correct implementation of the
This should now pass all of the above tests. You can, of course, add new tests and new automata.
Note that there is a subtle point about the new code that you’re going to add: you need to do a comparison that involves states — but states are functions, and comparing functions is generally a bad idea. This is especially true with a compiler: what you think is a single function pointer might actually be implemented in several different ways depending on how your code gets used. This is why comparison of functions in Racket (and more generally in Scheme) is unspecified and should be avoided. There is an easy way out though: simply use the function names — using a quote.
Finally, make sure that you have complete coverage! You might see some
part of the code that is not covered by the above tests. That is not
some weird result of using macros. In fact, if you do see some part
that is shown as uncovered, then this is most likely correct, and you
need to add a test to get the needed coverage for exactly that part.
This missing bit would be a missing test of your
cXr, it’s not a
test of the macro transformation. (In fact, we don’t get to test the
transformation at all, we only do an indirect test of code that gets
transformed by the macro.)
Side-comment: you might see complete coverage without adding a test. This is fine too — it is probably a result of better code in your expansion…
Often, students get confused in this homework and mix up the generic
code templates in the macro with the specific code in definitions that
use the macro. Usually this leads to code that looks like it works,
only it’s pretty much useless if you end up making a macro that
generates only code specific to the
cXr example. (Here “useless”
implies a huge point loss!)
To help you avoid this kind of a mistake, make sure that your code is generic and can be used to implement different automatons too. Actually, since people who are confused in this way may just as well write confused tests, here is an addition test for you to use:
Now that you know how to implement a simple automaton, you can proceed
to implement an extended version: a pushdown version. The syntax for
pushdown syntax is similar to the
automaton macro, and the
implementation is basically an extension of the code that keeps track a
stack of tokens. The specification still has an initial state, an
accepting state, and a sequence of transition rules. The rules are
similar to the
automaton rules, except that each transition has:
Each of these sequences is specified in parens, and each can be empty
(but not all, since this will get the automaton stuck an infinite loop).
When both of the first two sequences match the input and the stack, the
matched portions of the two are discarded. Finally, when the automaton
starts, the end of the input and the bottom of the stack are both
initialized with a
* token, and both should be consumed for the input
to be accepted.
To clarify the syntax, here is the definition of a simple one-state
pushdown automaton that accepts strings with balanced parentheses, with
tests. Note that
explode-string translates round parentheses to
close symbols respectively.
An explanation of the three transition rules in this code:
The first rule matches (and consumes) an
open token in the input,
and any stack; it proceeds to push an
open token on the stack.
The second rule matches a
close token in the input, and an
token on the stack; it consumes both tokens and does not add anything
to the stack (that is, the
open token is just consumed).
Finally, the third rule matches when we see the
* token in the input
(meaning that we see the end of the input), and when the stack has a
* (meaning that we have the same stack as we had at the beginning).
In this case, both the input and the stack are left empty, which will
lead to a
[These examples have either a single token (e.g.,
(open)) or none
()) in places where tokens are specified. These could have
more than one token too, it’s just not used in this example.]
Note that the assumption that
* marks the bottom of the stack depends
on no transition pushing this token while the automaton is doing its
work. For simplicity, the code will not check this assumption.
To make this question easy, here is what the above code should expand to. Important: this is just an example of how the above code expands, the actual solution should work with any automatons! Avoid the temptation to blindly copy-paste this into your macro. It’s best if you write a different one to make sure that your code works as it should.
Note how this code initializes the input to a
*-terminated list, and
the stack to have it also at the bottom. Note also that this has
several expressions that look redundant, for example, the two
expressions that have an empty list as their first argument are not
needed. These are there in the expanded expression — due to the
uniform transformation that produces the final code.
This code uses a new
list-rest. This pattern is
similar to a
cons pattern, but instead of matching just the head and
the tail of the list, it can match any prefix of the list, and the rest
of it. For example,
(list-rest x xs) pattern is the same as
(cons x xs),
(list-rest xs) simply matches any input list,
(list-rest x y more) matches a list with at least two elements,
x matches the first one,
y the second, and
Since we used only zero or one tokens in the macro, the resulting code
(list-rest xs) or
(list-rest 'foo xs) patterns. If there
(foo bar) in the macro, it would translate to a
(list-rest 'foo 'bar xs) expression.
Again, make sure that your macro is a proper abstraction by implementing a new kind of automaton, with a different “shape”. This is even more important here since you’ll have revised the code. Here’s one such example of a test: