Homework #6: BrangOut: ? Friday, February 9th, Due: ? Thursday, February 15th, 11:00pm

Administrative

The homework is for the same pairs by default. Email me if you have any problems or issues.

The language for this homework is:

#lang pl 06

Introduction

In this homework you will implement a Flang-like language by introducing a new preprocessor step into the interpreter. This step will do two things: it will be similar to a compiler pass in that it will translate one language to another (a lower-level one); and it will be used to add a new feature to the language, where the feature is also implemented by translating it to the lower-level language. More specifically, the preprocessor will translate bindings in the source program to a different representation that uses de-Bruijn indexes (see the class notes for reference), and then you will extend it to translate functions and function calls with more than one argument.

As is commonly done by compiler passes, you will need to keep track of the lexical scope of the code that is preprocessed. This will be done by adding an environment argument to the preprocessor which will keep track of the bindings. Note that this is not the same as the runtime environment argument in eval, it only keeps track of the names that are currently in scope — there are no runtime values at this point.

There is also a small twist here: the program processor will use a functional environment for this — implemented using a real Racket mapping function instead of an association list.

Begin your work with the Flang interpreter that uses environments.

Preliminaries

First of all, rename the language to avoid possible confusion. Since this is going to be a Flang variation that uses de-Bruijn indexes, we name this interpreter “Brang”. Edit the file accordingly.

As mentioned above, the transformation from source to de-Bruijn indexes that we will implement is essentially a new compilation step: it translates one language (Brang) to a different language, which in this case is quite similar except for identifier references. This language will be our “core” language in that it represents what we execute at the low level. We need a new type definition for this language — use CORE for its name. Create this language as a copy of BRANG except that all of its variant names are prefixed by a “C” (CNum, CAdd, etc).

The two languages are identical at this point, but they will drift apart in several important ways, and will end up being quite different. Using the fact that one language represents user input code, and the other represents what our core evaluator knows, we will get the flexibility of being able to develop each part independently of the other. Generally speaking, the BNF will be updated for the actual language, and on the other hand, the formal evaluation rules will be updated to follow the core language (so they’d end up incomplete since they don’t show how the user language runs). Ideally, both would get updated, but we skip it to make things easier.

To complete the new type definition, you need to modify it to fit its first purpose of using de-Bruijn indexes instead of identifier names. There are several things to change: first, instead of CId, it should have a CRef for identifier references — and it will have a non-negative integer index, and for that you will need to use the Natural type. Continue by eliminating all other mentions of identifier names in the AST (for example, CFun is going to be a one-argument constructor). Note that “AST” in this context is technically incorrect, since it’s no longer an abstract representation of the input syntax, but an internal language.

Run-Time Environments

The preprocessed programs that we will be running will have no identifiers, as they will all be translated into de-Bruijn references. You can therefore remove the code that deals with run-time environments: the ENV type definition, and the lookup function.

You still need some definition for an ENV type because closures still hold environments, but the only thing we need in an environment is a list of values — when we encounter a reference, it will tell us what position in this list is needed. Define ENV as the appropriate type.

(define-type ENV = fill-in)

Before we change eval, it is worthwhile to go over our evaluation rules, which can guide us through changing eval. There are two changes that are required:

You can now reflect these changes in the eval code. First, change all BRANG variant names to the CORE variants (in types too). Note that there is no CId — you have to use CRef instead.

Next, adjust the rest of the eval code as you did with the formal rules: specifically, you will need to replace Extend according to the new ENV type. The new eval code should be a little simpler than the original. There will be a few other parts of the code that will need to be updated.

Static Environments

The basic idea of the preprocessor is to scan the input program, and convert identifiers into CRef values. This would be a mostly-routine function that translates BRANG values to CORE values. But there is one important thing: when we encounter an identifier, we need to know the “shape” of the current scope — a mapping from identifiers to their de-Bruijn indexes. In other words, a kind of a compile-time environment.

You need to implement this new environment type — and the twist is that you should use functional values for this. These environments map identifiers to non-negative integers (Natural), so we get this type definition (call it DE-ENV):

(define-type DE-ENV = ? -> ?)

To complete the implementation, you need to implement de-empty-env as the empty environment, and de-extend that consumes a DE-ENV and a symbol, and returns the extended environment. Here are a few observations about the behavior of these

For example, the following tests should all work:

(define e1 (de-extend de-empty-env 'b))
(define e2 (de-extend e1 'a))
(test (e1 'a) =error> "...") ; e1 has no mapping for 'a
(test (e1 'b) => 0)          ; and 'b is mapped to 0
(test (e2 'a) => 0)          ; e2 maps 'a to 0
(test (e2 'b) => 1)          ; and now 'b is mapped to 1

The Preprocessor

Using the above, implementing the preprocessor should be easy. As said above, it is a simple recursive scan of its input that translates a given BRANG value (and a DE-ENV mapping) to the corresponding CORE value. The only interesting cases are ones that deal with scope: the two cases where a new scope is introduced, and dealing with identifiers.

Finally, you need to connect the preprocessor and the evaluator in run: it should now use preprocess over the parsed input (and an empty static environment), and the result is sent to eval with an empty run-time environment (which is now just an empty list).

You should have a working evaluator at this point. This would be a good time to add enough tests to cover your code.

CORE Simplification

Now that everything is working, we can work on the two parts independently. Begin with a change to the CORE type and the evaluator: there is no need for a CWith at the core — in class, we’ve seen that we can get local bindings with a simple function application. So for this part, begin by removing CWith variant from the CORE definition, and remove the corresponding eval case. To get back to a working evaluator, you need to extend the preprocessor code and translate Withs to the appropriate combination of CCall and CFun.

Notice how having a preprocessor allows changing the core evaluator implementation, with no changes to the language that users see.

Language Extension

We can also do the other side of this: extend the language, but deal with the extension through the preprocessor, so the core language doesn’t need to change.

To demonstrate this, extend the language with fun forms of more than one argument and a corresponding extension of call forms with more than one argument. Do this by translating them to curried functions and curried function calls. You will need to begin by changing the BNF, the BRANG type definition, and the parser.

Remember: <foo> <foo> ... means one or more <foo>s. Following this points to one convenient representation: have one field hold a “thing”, and another field hold a list of “more things”. It avoids the need for a list that can never be empty. However, for the actual type definition you may prefer to have a single field with a “non-empty list of things” — it works fine even without some “non-empty-list-of” type because assuming that the list is non-empty is risking only runtime errors (and you will know that the list is never empty).

Finally, you need to extend the preprocess function so it deals with translating away functions and applications of more than a single argument. When you do this, avoid using match over the lists (it would be convenient, but there are some technical problems with it). Also, note that the translation can be done in two ways: either translate the input to a different BRANG and continue, or translate to the final CORE. One of these will usually be more convenient.

Remember to add tests for the new feature.