Homework #14: Better Toy Compiler (active)Out:   Monday, November 11th, Due:   Wednesday, November 20th, 11:23pm (master homework)

Administrative

This homework is only required for master students. Undergrads can do it too, and you will be graded as usual — but you will not be able to withdraw a graded homework grade, so submit only if you think that your work is good enough to improve your grade. (Note: the homework grades are all weighted to compute your overall grade, so adding another grade means that you get smaller portions for more grades, which can serve as protection against local disasters.)

Note that while undergrads are not required to submit a solution, it will be a very good idea to read the homework and go through the solution when it is later posted — since you are expected to know the material.

The language for this homework is:

#lang pl 14

In the last homework the Toy interpreter was turned into a compiler. In this one, this work is further extended: the compiler will inline global bindings, and use de-Bruijn indexes to avoid name lookup in the compiled code. This would still be far from real compilers, but it should give you a better idea of moving work from the run-time to the compile-time.

As usual, begin your work with the posted solution to the previous homework.

Preliminaries

The main change for the compiler is passing around a description of the bindings in the current lexical environment — which will happen completely at compile-time. This will later motivate all the other changes, so it’s a good first step:

Doing this raises the question of how to deal with the global environment. The obvious way to deal with it is to start things (the first compile call in run) with its BINDINGS description — a list holding a list with all names in the global environment. However, we will eventually inline uses of the global environment, which means that it will turn to be something that gets used only by the compiler, a simple mapping from names to values rather than an extensible value. So use an empty list as the initial description.

Note that there are a few uses of compile as a value that is passed to map. For these, it will be convenient for you to define an internal compile* helper that passes along the same bindings to compile.

When you are done with this step, the code should run fine as before. (But obviously the compiler is only passing the description around, so nothing else changed.)

Finding a Name Index(es)

We pass around the bindings environment description for one reason: to be able to get the index that corresponds to some given name. This should be your next step then. Add a new function — find-index — that does that. The function receives a symbol and a bindings description, and returns either a #f if the symbol is not in the description or a list of two natural numbers (List Natural Natural) that correspond to the location. (It’s a good idea to use Racket’s index-of for the inner loop, but be sure to avoid redundant work.)

Here are some tests to clarify it:

(test (find-index 'a '((a b c) () (c d e))) => '(0 0))
(test (find-index 'e '((a b c) () (c d e))) => '(2 2))
(test (find-index 'c '((a b c) () (c d e))) => '(0 2))
(test (find-index 'x '((a b c) () (c d e))) => #f)

(Note that usually we don’t use tests other than api-level tests, but in this case they can stay to make it easier to debug the code.)

The next step is to use the found index, but before you do that, it will be a good idea to switch to a little more convenient representation for environments. Instead of the type definition, use this:

(define-type ENV = (Listof FRAME))

and change the rest of the code accordingly. This should be a simple change, and Typed Racket will tell you about the places that need to be updated.

Using the Indexes, and Inlining Globals

Once the above is done, you can start using the indexes. This should happen in all places where lookup is used — there are four such places. The first is in extend-rec, and then there are uses in the code for looking up the value of an identifier, for mutating one, and in compile-get-boxes. In each of these cases you should use find-index to find the index at compile-time, then use the index at runtime with two list-refs to pull out the right box from the nested list structure that an environment is now. Each of these should have a slightly different way of dealing with this:

extend-rec

This case is a bit of an exception since the use of lookup there is only serving to simplify the code. Instead of this, you could be change it to avoid the lookup, for example, you can construct the new frame using a known list of boxes instead of delegating to extend, and then set these boxes directly. Another way to look at this is that the code doesn’t actually need to use the names, since you know that the indexes are (0 0) to (0 N-1) when we have N bindings.

Set

This is the simplest case: you get the index for the identifier at compile-time, and throw a compilation error if it is not found. (The exact error message shouldn’t matter for now, since it will be revised in the next item.) At runtime, use this index to get the right box.

Id

This case starts the same, but turns out to be slightly more complicated. Start by doing the simple thing you did in the Set case — it would work now, except that our description of the lexical environment doesn’t include the global environment names, which will obviously lead to lots of test failures when almost all tests fail to compile.

As said above, one way to solve this is to include this description, but we’ll do more instead: if find-index returned #f, then use lookup (at compile time!) to find the global value, and then return a compiled function that returns that value directly.

Note that this effectively inlines the globals — referring to one immediately returns the required value, with no lookups at all at runtime.

Back to Set

The Set case should be changed again now. The problem is that it throws the same error whether the code tries to set! an unknown identifier, or whether it tries to set! a global binding. These are two very different errors — and the second one is only forbidden because we’re inlining the globals. So revise it to throw different errors in both cases. This is actually very easy: you just use lookup with the global-environment, letting it fail as usual if the name is not found there — and if it didn’t fail, then you can now ignore the result and throw an error about mutating a global.

compile-get-boxes

Finally, this case should be similar to the revised Set case, with one important difference: the error message about using a global should not be a compile-time error, since this function is used for all function calls. The error should therefore be deferred to the run-time, so it happens only if a global identifier is used in an rfun call. (But unbound identifier errors should still happen at compile-time, of course.)

Cleanup: Getting Rid of Environments

Once all of this was done, we no longer need environments in their current form.

More Cleanup: Getting Rid of Names

All three variants of the environment-extending functions currently receive a list of names. This is no longer needed, since the names are no longer kept in our runtime environment representation. Therefore, it makes sense to get rid of these as well. The only thing that they are still used for is when their length is compared when making an extended environment, so dropping the names argument means that we don’t do this test too.

In the two binders cases (Bind and BindRec) this is just a matter of removing the names from the runtime, and nothing else is needed. (For BindRec, it’s a little easier to use the code version that builds and keeps the boxes explicitly, as described above.) The lack of length test is fine, since in these cases the length of the new values was always correct, and since there are no names to be matched to the values now.

The other case is when applying a user-defined function (only FunVs, not not PrimVs) — and in this case we should still check the length, so we can’t just blindly remove the list of names. So instead of the list of names, change FunV to hold just their length — the function’s arity (use Natural here too). Make sure that the length is calculated at compile-time. In the code for calling a function, compare that number against the length of arg-exprs, which you should also compute at compile-time.

Once you’re done with that, you’ll find that raw-extend is now reduced to just a trivial cons, so you can get rid of it. Further, you’ll see that extend is also not needed — it’s only job is to box its input values, but if you fold that into caller (rename it to boxed-caller, and make it box its result) then that will not be needed too.

Done

Throughout all of this, you can use some code, like

(time (run "{bindrec {{fib {fun {n}
                            {if {< n 2}
                              n
                              {+ {fib {- n 1}}
                                  {fib {- n 2}}}}}}}
              {fib 27}}"))

to measure speed improvements. (You can switch off debugging information in DrRacket’s Language dialog to make things run faster.) In my experiments, the original TOY extension ran for 3.06 seconds, the basic compiler from the last homework took 2.31 seconds, and this version took 1.45.