Administrative
The homework is for the same pairs by default. Email me if you have any problems or issues.
The language for this homework is:
Introduction
In this homework you will implement a Flang-like language by introducing a new preprocessor step into the interpreter. This step will do two things: it will be similar to a compiler pass in that it will translate one language to another (a lower-level one); and it will be used to add a new feature to the language, where the feature is also implemented by translating it to the lower-level language. More specifically, the preprocessor will translate bindings in the source program to a different representation that uses de-Bruijn indexes (see the class notes for reference), and then you will extend it to translate functions and function calls with more than one argument.
As is commonly done by compiler passes, you will need to keep track of
the lexical scope of the code that is preprocessed. This will be done
by adding an environment argument to the preprocessor which will keep
track of the bindings. Note that this is not the same as the
runtime environment argument in eval
, it only keeps track of the
names that are currently in scope — there are no runtime values at
this point.
There is also a small twist here: the program processor will use a functional environment for this — implemented using real Racket mapping functions instead of association lists.
Begin your work with the Flang interpreter that uses environments.
Preliminaries
First of all, rename the language to avoid possible confusion. Since this is going to be a Flang variation that uses de-Bruijn indexes, we name this interpreter “Brang”. Edit the file accordingly.
As mentioned above, the transformation from source to de-Bruijn indexes
that we will implement is essentially a new compilation step: it
translates one language (Brang) to a different language, which in this
case is quite similar except for identifier references. This language
will be our “core” language in that it represents what we execute at the
low level. We need a new type definition for this language — use
CORE
for its name. Create this language as a copy of BRANG
except
that all of its variant names are prefixed by a “C
” (CNum
, CAdd
,
etc).
The two languages are identical at this point, but they will drift apart in several important ways, and will end up being quite different. Using the fact that one language represents user input code, and the other represents what our core evaluator knows, we will get the flexibility of being able to develop each part independently of the other. Generally speaking, the BNF will be updated for the actual language, and on the other hand, the formal evaluation rules will be updated to follow the core language (so they’d end up incomplete since they don’t show how the user language runs). Ideally, the specs would be complete, but to make things easier, all you need to do is update the BNF to reflect the actual language, and the evaluation rules for the core language (using the BNF syntax for Brang: this is technically wrong, but it simplifies what you need to do).
To complete the new type definition, you need to modify it to fit its
first purpose of using de-Bruijn indexes instead of identifier names.
There are several things to change: first, instead of CId
, it should
have a CRef
for identifier references — and it will have a
non-negative integer index (and for that you will need to use the
Natural
type). Continue by eliminating all other mentions of
identifier names in the AST (for example, CFun
is going to be a
one-argument constructor). Note that “AST” in this context is
technically incorrect, since it’s no longer an abstract representation
of the input syntax, but an internal language.
Run-Time Environments
The preprocessed programs that we will be running will have no
identifiers, as they will all be translated into de-Bruijn references.
You can therefore remove the code that deals with run-time environments:
the ENV
type definition, and the lookup
function.
You still need some definition for an ENV
type because closures still
hold environments, but the only thing we need in an environment is a
list of values — when we encounter a reference, it will tell us what
position in this list is needed. Define ENV
as the appropriate type.
Before we change eval
, it is worthwhile to go over our evaluation
rules, which can guide us through changing eval
. There are two
changes that are required:
-
First, we do not have identifiers, and we do not use
lookup
. Instead, when we encounter aCRef
, we use it as an index to pull out a value from the environment, which is now a simple list of values. In other words, we now have this rule:eval(CRef(0),cons(v,rest)) = v
eval(CRef(N),cons(v,rest)) = eval(CRef(N-1),rest) if N>0or write this more conveniently as:
eval(CRef(N),env) = list-ref(env,N)(Note that the evaluation rules are slightly bogus, since they still use concrete syntax for an internal language that has no concrete syntax. But we still keep this for convenience.)
-
The second change is in the way environments are extended. Previously, we used
extend
in the formal rules, which was a function of three arguments. Replace this with something appropriate.
You can now reflect these changes in the eval
code. First, change all
BRANG
variant names to the CORE
variants (in types too). Note that
there is no CId
— you have to use CRef
instead.
Next, adjust the rest of the eval
code as you did with the formal
rules: specifically, you will need to replace Extend
according to the
new ENV
type. The new eval
code should be a little simpler than the
original. There will be a few other parts of the code that will need to
be updated.
Static Environments
The basic idea of the preprocessor is to scan the input program, and
convert identifiers into CRef
values. This would be a mostly-routine
function that translates BRANG
values to CORE
values. But there is
one important thing: when we encounter an identifier, we need to know
the “shape” of the current scope — a mapping from identifiers to their
de-Bruijn indexes. In other words, a kind of a compile-time
environment.
You need to implement this new environment type — and the twist is
that you should use functional values for this. These environments map
identifiers to non-negative integers (Natural
), so we get this type
definition (call it DE-ENV
; fill in the "?"s):
To complete the implementation, you need to implement de-empty-env
as
the empty environment, and de-extend
that consumes a DE-ENV and a
symbol, and returns the extended environment. Here are a few
observations about the behavior of these
de-empty-env
is a vacuous mapping — it always throws an error.- For any symbol x and DE-ENV e,
(de-extend e x)
should map x to0
. - For any symbol x and DE-ENV e, if e maps x to n,
then (for a different symbol y)
(de-extend e y)
should map x to n+1.
For example, the following tests should all work:
(define e2 (de-extend e1 'a))
(test (e1 'a) =error> "...") ; e1 has no mapping for 'a
(test (e1 'b) => 0) ; and 'b is mapped to 0
(test (e2 'a) => 0) ; e2 maps 'a to 0
(test (e2 'b) => 1) ; and now 'b is mapped to 1
(These tests can be useful while you implement this functionality, but
your submission should be tested only via run
, as usual.)
The Preprocessor
Using the above, implementing the preprocessor should be easy. As said
above, it is a simple recursive scan of its input that translates a
given BRANG
value (and a DE-ENV
mapping) to the corresponding CORE
value. The only interesting cases are ones that deal with scope: the
two cases where a new scope is introduced, and dealing with identifiers.
Finally, you need to connect the preprocessor and the evaluator in
run
: it should now use preprocess
over the parsed input (and an
empty static environment), and the result is sent to eval
with an
empty run-time environment (which is now just an empty list).
You should have a working evaluator at this point. This would be a good time to add enough tests to cover your code.
CORE
Simplification
Now that everything is working, we can work on the two parts
independently. Begin with a change to the CORE
type and the
evaluator: there is no need for a CWith
at the core — in class,
we’ve seen that we can get local bindings with a simple function
application. So for this part, begin by removing CWith
variant from
the CORE
definition, and remove the corresponding eval
case. To get
back to a working evaluator, you need to extend the preprocessor code
and translate With
s to the appropriate combination of CCall
and
CFun
.
Notice how having a preprocessor allows changing the core evaluator implementation, with no changes to the language that users see.
Language Extension
We can also do the other side of this: extend the language, but deal with the extension through the preprocessor, so the core language doesn’t need to change.
To demonstrate this, extend the language with fun
forms of more than
one argument and a corresponding extension of call
forms with more
than one argument, e.g., {call {fun {x y} {+ x y}} 1 2}
. Do this by
translating them (= preprocessing them) to curried CORE functions and
curried CORE function calls. You will need to begin by changing the BNF,
the BRANG
type definition, and the parser.
Remember: <foo> <foo> ...
means one or more <foo>
s. Following
this points to one convenient representation: have one field hold a
“thing”, and another field hold a list of “more things”. It avoids the
need for a list that can never be empty. However, for the actual type
definition you may prefer to have a single field with a “non-empty list
of things” — it works fine even without some “non-empty-list-of” type
because assuming that the list is non-empty is risking only runtime
errors (and you will know that the list is never empty).
Finally, you need to extend the preprocess
function so it deals with
translating away functions and applications of more than a single
argument. When you do this, avoid using match
over the lists (it
would be convenient, but there are some technical problems with it).
Also, note that the translation can be done in two ways: either
translate the input to a different BRANG
and continue, or translate to
the final CORE
. One of these will usually be more convenient.
Remember to add tests for the new feature.