This homework is only required for master students. Undergrads can do it too, and you will be graded as usual — but you will not be able to withdraw a graded homework grade, so submit only if you think that your work is good enough to improve your grade. (Note: the homework grades are all weighted to compute your overall grade, so adding another grade means that you get smaller portions for more grades, which can serve as protection against local disasters.)
Note that while undergrads are not required to submit a solution, it will be a very good idea to read the homework and go through the solution when it is later posted — since you are expected to know the material.
The language for this homework is:
In the last homework the Toy interpreter was turned into a compiler. In this one, this work is further extended: the compiler will inline global bindings, and use de-Bruijn indexes to avoid name lookup in the compiled code. This would still be far from real compilers, but it should give you a better idea of moving work from the run-time to the compile-time.
As usual, begin your work with the posted solution to the previous homework.
The main change for the compiler is passing around a description of the bindings in the current lexical environment — which will happen completely at compile-time. This will later motivate all the other changes, so it’s a good first step:
Start with defining a type for this (in the “Compilation” part of the code):
Now add a
bindings argument to all three compilation functions:
Fix all calls to these functions to pass the correct value. This means that in most cases the new argument is passed around as-is, except for compiling forms that introduce new bindings which should add the new names.
Note that there should be a difference between the way new bindings
are made in
Doing this raises the question of how to deal with the global
environment. The obvious way to deal with it is to start things (the
compile call in
run) with its
BINDINGS description — a
list holding a list with all names in the global environment. However,
we will eventually inline uses of the global environment, which means
that it will turn to be something that gets used only by the compiler, a
simple mapping from names to values rather than an extensible value. So
use an empty list as the initial description.
Note that there are a few uses of
compile as a value that is passed to
map. For these, it will be convenient for you to define an internal
compile* helper that passes along the same
When you are done with this step, the code should run fine as before. (But obviously the compiler is only passing the description around, so nothing else changed.)
We pass around the
bindings environment description for one reason: to
be able to get the index that corresponds to some given name. This
should be your next step then. Add a new function —
that does that. The function receives a symbol and a
description, and returns either a
#f if the symbol is not in the
description or a list of two natural numbers
(List Natural Natural)
that correspond to the location. Here are some tests to clarify it:
(Note that usually we don’t use tests other than api-level tests, but in this case they can stay to make it easier to debug the code.)
The next step is to use the found index, but before you do that, it will be a good idea to switch to a little more convenient representation for environments. Instead of the type definition, use this:
and change the rest of the code accordingly. This should be a simple change, and Typed Racket will tell you about the places that need to be updated.
Once the above is done, you can start using the indexes. This should
happen in all places where
lookup is used — there are four such
places. The first is in
extend-rec, and then there are uses in the
code for looking up the value of an identifier, for mutating one, and in
compile-get-boxes. In each of these cases you should use
to find the index at compile-time, then use the index at runtime with
list-refs to pull out the right box from the nested list structure
that an environment is now. Each of these should have a slightly
different way of dealing with this:
This case is a bit of an exception since the use of
lookup there is
only serving to simplify the code. Instead of this, you could be change
it to avoid the
lookup, for example, you can construct the new frame
using a known list of boxes instead of delegating to
extend, and then
set these boxes directly. Another way to look at this is that the code
doesn’t actually need to use the names, since you know that the indexes
(0 0) to
(0 N-1) when we have N bindings.
This is the simplest case: you get the index for the identifier at compile-time, and throw a compilation error if it is not found. (The exact error message shouldn’t matter for now, since it will be revised in the next item.) At runtime, use this index to get the right box.
This case starts the same, but turns out to be slightly more
complicated. Start by doing the simple thing you did in the
— it would work now, except that our description of the lexical
environment doesn’t include the global environment names, which will
obviously lead to lots of test failures when almost all tests fail to
As said above, one way to solve this is to include this description, but
we’ll do more instead: if
#f, then use
(at compile time!) to find the global value, and then return a compiled
function that returns that value directly.
Note that this effectively inlines the globals — referring to one immediately returns the required value, with no lookups at all at runtime.
Set case should be changed again now. The problem is that it
throws the same error whether the code tries to
set! an unknown
identifier, or whether it tries to
set! a global binding. These are
two very different errors — and the second one is only forbidden
because we’re inlining the globals. So revise it to throw different
errors in both cases. This is actually very easy: you just use
global-environment, letting it fail as usual if the name is
not found there — and if it didn’t fail, then you can now ignore the
result and throw an error about mutating a global.
Finally, this case should be similar to the revised
Set case, with one
important difference: the error message about using a global should
not be a compile-time error, since this function is used for all
function calls. The error should therefore be deferred to the run-time,
so it happens only if a global identifier is used in an
(But unbound identifier errors should still happen at compile-time, of
Once all of this was done, we no longer need environments in their current form.
First, note that the
global-environment is not used at run-time at
all. Global values are inlined, and de-Bruijn indexes never go there
since it’s not included in the
bindings description. So change
run to start with an empty list instead.
Now change the
global-environment to be a simple association list,
(Listof (List Symbol VAL)), and remove the
lookup is now used only for
global lookups, so change its type accordingly, drop its
and make it always look in
global-environment, and change its name
global-lookup (and change all uses, of course).
Finally, the definition of
FRAME should be
(Listof (Boxof VAL))
(and you can just drop it and write the full type directly in
Once you do this, you’ll need to drop the names from all places that
extend the environment (and Typed Racket will help you with that).
All three variants of the environment-extending functions currently
receive a list of names. This is no longer needed, since the names are
no longer kept in our runtime environment representation. Therefore, it
makes sense to get rid of these as well. The only thing that they are
still used for is when their length is compared when making an extended
environment, so dropping the
names argument means that we don’t do
this test too.
In the two binders cases (
BindRec) this is just a matter of
removing the names, and nothing else is needed. (For
BindRec, it’s a
little easier to use the code version that builds and keeps the boxes
explicitly, as described above.) The lack of length test is fine, since
in these cases the length of the new values was always correct, and
since there are no names to be matched to the values now.
The other case is when applying a user-defined function (only
PrimVs) — and in this case we should still check the length,
so we can’t just blindly remove the list of names. So instead of the
list of names, change
FunV to hold just their length — the
function’s arity (use
Natural here too). Make sure that the length is
calculated at compile-time. In the code for calling a function, compare
that number against the length of
arg-exprs, which you should also
compute at compile-time.
Once you’re done with that, you’ll find that
raw-extend is now reduced
to just a trivial
cons, so you can get rid of it. Further, you’ll see
extend is also not needed — it’s only job is to box its input
values, but if you fold that into
caller (rename it to
and make it box its result) then that will not be needed too.
Throughout all of this, you can use some code, like
to measure speed improvements. (You can switch off debugging information in DrRacket’s Language dialog to make things run faster.) In my experiments, the original TOY extension ran for 3.06 seconds, the basic compiler from the last homework took 2.31 seconds, and this version took 1.45.