Administrative
This homework is only required for master students. Undergrads can do it too, and you will be graded as usual — but you will not be able to withdraw a graded homework grade, so submit only if you think that your work is good enough to improve your grade. (Note: the homework grades are all weighted to compute your overall grade, so adding another grade means that you get smaller portions for more grades, which can serve as protection against local disasters.)
Note that while undergrads are not required to submit a solution, it will be a very good idea to read the homework and go through the solution when it is later posted — since you are expected to know the material.
The language for this homework is:
In the last homework the Toy interpreter was turned into a compiler. In this one, this work is further extended: the compiler will inline global bindings, and use de-Bruijn indexes to avoid name lookup in the compiled code. This would still be far from real compilers, but it should give you a better idea of moving work from the run-time to the compile-time.
As usual, begin your work with the posted solution to the previous homework.
Preliminaries
The main change for the compiler is passing around a description of the bindings in the current lexical environment — which will happen completely at compile-time. This will later motivate all the other changes, so it’s a good first step:
-
Start with defining a type for this (in the “Compilation” part of the code):
(define-type BINDINGS = (Listof (Listof Symbol))) -
Now add a
bindings
argument to all three compilation functions:compile
,compile-body
, andcompile-get-boxes
. -
Fix all calls to these functions to pass the correct value. This means that in most cases the new argument is passed around as-is, except for compiling forms that introduce new bindings which should add the new names.
-
Note that there should be a difference between the way new bindings are made in
Bind
andBindRec
.
Doing this raises the question of how to deal with the global
environment. The obvious way to deal with it is to start things (the
first compile
call in run
) with its BINDINGS
description — a
list holding a list with all names in the global environment. However,
we will eventually inline uses of the global environment, which means
that it will turn to be something that gets used only by the compiler, a
simple mapping from names to values rather than an extensible value. So
use an empty list as the initial description.
Note that there are a few uses of compile
as a value that is passed to
map
. For these, it will be convenient for you to define an internal
compile*
helper that passes along the same bindings
to compile
.
When you are done with this step, the code should run fine as before. (But obviously the compiler is only passing the description around, so nothing else changed.)
Finding a Name Index(es)
We pass around the bindings
environment description for one reason: to
be able to get the index that corresponds to some given name. This
should be your next step then. Add a new function — find-index
—
that does that. The function receives a symbol and a bindings
description, and returns either a #f
if the symbol is not in the
description or a list of two natural numbers (List Natural Natural)
that correspond to the location. (It’s a good idea to use Racket’s
index-of
for the inner loop, but be sure to avoid redundant work.)
Here are some tests to clarify it:
(test (find-index 'e '((a b c) () (c d e))) => '(2 2))
(test (find-index 'c '((a b c) () (c d e))) => '(0 2))
(test (find-index 'x '((a b c) () (c d e))) => #f)
(Note that usually we don’t use tests other than api-level tests, but in this case they can stay to make it easier to debug the code.)
The next step is to use the found index, but before you do that, it will be a good idea to switch to a little more convenient representation for environments. Instead of the type definition, use this:
and change the rest of the code accordingly. This should be a simple change, and Typed Racket will tell you about the places that need to be updated.
Using the Indexes, and Inlining Globals
Once the above is done, you can start using the indexes. This should
happen in all places where lookup
is used — there are four such
places. The first is in extend-rec
, and then there are uses in the
code for looking up the value of an identifier, for mutating one, and in
compile-get-boxes
. In each of these cases you should use find-index
to find the index at compile-time, then use the index at runtime with
two list-ref
s to pull out the right box from the nested list structure
that an environment is now. Each of these should have a slightly
different way of dealing with this:
extend-rec
This case is a bit of an exception since the use of lookup
there is
only serving to simplify the code. Instead of this, you could be change
it to avoid the lookup
, for example, you can construct the new frame
using a known list of boxes instead of delegating to extend
, and then
set these boxes directly. Another way to look at this is that the code
doesn’t actually need to use the names, since you know that the indexes
are (0 0)
to (0 N-1)
when we have N bindings.
Set
This is the simplest case: you get the index for the identifier at compile-time, and throw a compilation error if it is not found. (The exact error message shouldn’t matter for now, since it will be revised in the next item.) At runtime, use this index to get the right box.
Id
This case starts the same, but turns out to be slightly more
complicated. Start by doing the simple thing you did in the Set
case
— it would work now, except that our description of the lexical
environment doesn’t include the global environment names, which will
obviously lead to lots of test failures when almost all tests fail to
compile.
As said above, one way to solve this is to include this description, but
we’ll do more instead: if find-index
returned #f
, then use lookup
(at compile time!) to find the global value, and then return a compiled
function that returns that value directly.
Note that this effectively inlines the globals — referring to one immediately returns the required value, with no lookups at all at runtime.
Back to Set
The Set
case should be changed again now. The problem is that it
throws the same error whether the code tries to set!
an unknown
identifier, or whether it tries to set!
a global binding. These are
two very different errors — and the second one is only forbidden
because we’re inlining the globals. So revise it to throw different
errors in both cases. This is actually very easy: you just use lookup
with the global-environment
, letting it fail as usual if the name is
not found there — and if it didn’t fail, then you can now ignore the
result and throw an error about mutating a global.
compile-get-boxes
Finally, this case should be similar to the revised Set
case, with one
important difference: the error message about using a global should
not be a compile-time error, since this function is used for all
function calls. The error should therefore be deferred to the run-time,
so it happens only if a global identifier is used in an rfun
call.
(But unbound identifier errors should still happen at compile-time, of
course.)
Cleanup: Getting Rid of Environments
Once all of this was done, we no longer need environments in their current form.
-
First, note that the
global-environment
is not used at run-time at all. Global values are inlined, and de-Bruijn indexes never go there since it’s not included in thebindings
description. So changerun
to start with an empty list instead. -
Now change the
global-environment
to be a simple association list,(Listof (List Symbol VAL))
, and remove thebox
fromracket-func->prim-val
. Furthermore,lookup
is now used only for global lookups, so change its type accordingly, drop itsenv
input and make it always look inglobal-environment
, and change its name toglobal-lookup
(and change all uses, of course). -
Finally, the definition of
FRAME
should be(Listof (Boxof VAL))
(and you can just drop it and write the full type directly inENV
). Once you do this, you’ll need to drop the names from all places that extend the environment (and Typed Racket will help you with that).
More Cleanup: Getting Rid of Names
All three variants of the environment-extending functions currently
receive a list of names. This is no longer needed, since the names are
no longer kept in our runtime environment representation. Therefore, it
makes sense to get rid of these as well. The only thing that they are
still used for is when their length is compared when making an extended
environment, so dropping the names
argument means that we don’t do
this test too.
In the two binders cases (Bind
and BindRec
) this is just a matter of
removing the names from the runtime, and nothing else is needed. (For
BindRec
, it’s a little easier to use the code version that builds and
keeps the boxes explicitly, as described above.) The lack of length test
is fine, since in these cases the length of the new values was always
correct, and since there are no names to be matched to the values now.
The other case is when applying a user-defined function (only FunV
s,
not not PrimV
s) — and in this case we should still check the length,
so we can’t just blindly remove the list of names. So instead of the
list of names, change FunV
to hold just their length — the
function’s arity (use Natural
here too). Make sure that the length is
calculated at compile-time. In the code for calling a function, compare
that number against the length of arg-exprs
, which you should also
compute at compile-time.
Once you’re done with that, you’ll find that raw-extend
is now reduced
to just a trivial cons
, so you can get rid of it. Further, you’ll see
that extend
is also not needed — it’s only job is to box its input
values, but if you fold that into caller
(rename it to boxed-caller
,
and make it box its result) then that will not be needed too.
Done
Throughout all of this, you can use some code, like
{if {< n 2}
n
{+ {fib {- n 1}}
{fib {- n 2}}}}}}}
{fib 27}}"))
to measure speed improvements. (You can switch off debugging information in DrRacket’s Language dialog to make things run faster.) In my experiments, the original TOY extension ran for 3.06 seconds, the basic compiler from the last homework took 2.31 seconds, and this version took 1.45.