Lecture #6, Tuesday, February 1st
=================================
- Evaluation of `with` (contd.)
- Formal Specs
- Lazy vs Eager Evaluation
- de Bruijn Indexes

# Evaluation of `with` (contd.)
Oops, this program still has problems that were caught by the tests:
we encounter unexpected free identifier errors. What's the problem now?
In expressions like:

    {with {x 5}
      {with {y x}
        y}}

we forgot to substitute `x` in the expression that `y` is bound to. We
need to substitute recursively in both the `with`'s body expression and
its named expression:

    (: subst : WAE Symbol WAE -> WAE)
    ;; substitutes the second argument with the third argument in the
    ;; first argument, as per the rules of substitution; the resulting
    ;; expression contains no free instances of the second argument
    (define (subst expr from to)
      (cases expr
        [(Num n) expr]
        [(Add l r) (Add (subst l from to) (subst r from to))]
        [(Sub l r) (Sub (subst l from to) (subst r from to))]
        [(Mul l r) (Mul (subst l from to) (subst r from to))]
        [(Div l r) (Div (subst l from to) (subst r from to))]
        [(Id name) (if (eq? name from) to expr)]
        [(With bound-id named-expr bound-body)
         (if (eq? bound-id from)
           expr
           (With bound-id
                 (subst named-expr from to) ;*** new
                 (subst bound-body from to)))]))

And *still* we have a problem... Now it's

    {with {x 5}
      {with {x x}
        x}}

that halts with an error, but we want it to evaluate to `5`! Carefully
trying out our substitution code reveals the problem: when we substitute
`5` for the outer `x`, we don't go inside the inner `with` because it
has the same name, but we *do* need to go into its named expression.
We need to substitute in the named expression even if the identifier is
the *same* one we're substituting:

    (: subst : WAE Symbol WAE -> WAE)
    ;; substitutes the second argument with the third argument in the
    ;; first argument, as per the rules of substitution; the resulting
    ;; expression contains no free instances of the second argument
    (define (subst expr from to)
      (cases expr
        [(Num n) expr]
        [(Add l r) (Add (subst l from to) (subst r from to))]
        [(Sub l r) (Sub (subst l from to) (subst r from to))]
        [(Mul l r) (Mul (subst l from to) (subst r from to))]
        [(Div l r) (Div (subst l from to) (subst r from to))]
        [(Id name) (if (eq? name from) to expr)]
        [(With bound-id named-expr bound-body)
         (With bound-id
               (subst named-expr from to)
               (if (eq? bound-id from)
                 bound-body
                 (subst bound-body from to)))]))

The complete (and, finally, correct) version of the code is now:

    ;;; ---<<<WAE>>>---
    #lang pl

    #| BNF for the WAE language:
         <WAE> ::= <num>
                 | { + <WAE> <WAE> }
                 | { - <WAE> <WAE> }
                 | { * <WAE> <WAE> }
                 | { / <WAE> <WAE> }
                 | { with { <id> <WAE> } <WAE> }
                 | <id>
    |#

    ;; WAE abstract syntax trees
    (define-type WAE
      [Num  Number]
      [Add  WAE WAE]
      [Sub  WAE WAE]
      [Mul  WAE WAE]
      [Div  WAE WAE]
      [Id   Symbol]
      [With Symbol WAE WAE])

    (: parse-sexpr : Sexpr -> WAE)
    ;; parses s-expressions into WAEs
    (define (parse-sexpr sexpr)
      (match sexpr
        [(number: n)    (Num n)]
        [(symbol: name) (Id name)]
        [(cons 'with more)
         (match sexpr
           [(list 'with (list (symbol: name) named) body)
            (With name (parse-sexpr named) (parse-sexpr body))]
           [else (error 'parse-sexpr "bad `with' syntax in ~s" sexpr)])]
        [(list '+ lhs rhs) (Add (parse-sexpr lhs) (parse-sexpr rhs))]
        [(list '- lhs rhs) (Sub (parse-sexpr lhs) (parse-sexpr rhs))]
        [(list '* lhs rhs) (Mul (parse-sexpr lhs) (parse-sexpr rhs))]
        [(list '/ lhs rhs) (Div (parse-sexpr lhs) (parse-sexpr rhs))]
        [else (error 'parse-sexpr "bad syntax in ~s" sexpr)]))

    (: parse : String -> WAE)
    ;; parses a string containing a WAE expression to a WAE AST
    (define (parse str)
      (parse-sexpr (string->sexpr str)))

    #| Formal specs for `subst':
       (`N' is a <num>, `E1', `E2' are <WAE>s, `x' is some <id>,
       `y' is a *different* <id>)
          N[v/x]                = N
          {+ E1 E2}[v/x]        = {+ E1[v/x] E2[v/x]}
          {- E1 E2}[v/x]        = {- E1[v/x] E2[v/x]}
          {* E1 E2}[v/x]        = {* E1[v/x] E2[v/x]}
          {/ E1 E2}[v/x]        = {/ E1[v/x] E2[v/x]}
          y[v/x]                = y
          x[v/x]                = v
          {with {y E1} E2}[v/x] = {with {y E1[v/x]} E2[v/x]}
          {with {x E1} E2}[v/x] = {with {x E1[v/x]} E2}
    |#

    (: subst : WAE Symbol WAE -> WAE)
    ;; substitutes the second argument with the third argument in the
    ;; first argument, as per the rules of substitution; the resulting
    ;; expression contains no free instances of the second argument
    (define (subst expr from to)
      (cases expr
        [(Num n) expr]
        [(Add l r) (Add (subst l from to) (subst r from to))]
        [(Sub l r) (Sub (subst l from to) (subst r from to))]
        [(Mul l r) (Mul (subst l from to) (subst r from to))]
        [(Div l r) (Div (subst l from to) (subst r from to))]
        [(Id name) (if (eq? name from) to expr)]
        [(With bound-id named-expr bound-body)
         (With bound-id
               (subst named-expr from to)
               (if (eq? bound-id from)
                 bound-body
                 (subst bound-body from to)))]))

    #| Formal specs for `eval':
         eval(N)         = N
         eval({+ E1 E2}) = eval(E1) + eval(E2)
         eval({- E1 E2}) = eval(E1) - eval(E2)
         eval({* E1 E2}) = eval(E1) * eval(E2)
         eval({/ E1 E2}) = eval(E1) / eval(E2)
         eval(id)        = error!
         eval({with {x E1} E2}) = eval(E2[eval(E1)/x])
    |#

    (: eval : WAE -> Number)
    ;; evaluates WAE expressions by reducing them to numbers
    (define (eval expr)
      (cases expr
        [(Num n) n]
        [(Add l r) (+ (eval l) (eval r))]
        [(Sub l r) (- (eval l) (eval r))]
        [(Mul l r) (* (eval l) (eval r))]
        [(Div l r) (/ (eval l) (eval r))]
        [(With bound-id named-expr bound-body)
         (eval (subst bound-body
                      bound-id
                      (Num (eval named-expr))))]
        [(Id name) (error 'eval "free identifier: ~s" name)]))

    (: run : String -> Number)
    ;; evaluate a WAE program contained in a string
    (define (run str)
      (eval (parse str)))

    ;; tests
    (test (run "5") => 5)
    (test (run "{+ 5 5}") => 10)
    (test (run "{with {x 5} {+ x x}}") => 10)
    (test (run "{with {x {+ 5 5}} {+ x x}}") => 20)
    (test (run "{with {x 5} {with {y {- x 3}} {+ y y}}}") => 4)
    (test (run "{with {x {+ 5 5}} {with {y {- x 3}} {+ y y}}}") => 14)
    (test (run "{with {x 5} {+ x {with {x 3} 10}}}") => 15)
    (test (run "{with {x 5} {+ x {with {x 3} x}}}") => 8)
    (test (run "{with {x 5} {+ x {with {y 3} x}}}") => 10)
    (test (run "{with {x 5} {with {y x} y}}") => 5)
    (test (run "{with {x 5} {with {x x} x}}") => 5)
    (test (run "{with {x 1} y}") =error> "free identifier")

Reminder:

* We started doing substitution, with a `let`-like form: `with`.

* Reasons for using bindings:
  - Avoid writing expressions twice.
    * More expressive language (can express identity).
    * Duplicating is bad! ("DRY": *Don't Repeat Yourself*.)
    * Avoids *static* redundancy.
  - Avoid redundant computations.
    * More than *just* an optimization when it avoids exponential
      resources.
    * Avoids *dynamic* redundancy.

* BNF:

      <WAE> ::= <num>
              | { + <WAE> <WAE> }
              | { - <WAE> <WAE> }
              | { * <WAE> <WAE> }
              | { / <WAE> <WAE> }
              | { with { <id> <WAE> } <WAE> }
              | <id>

  Note that we had to introduce two new rules: one for introducing an
  identifier, and one for using it.

* Type definition:

      (define-type WAE
        [Num  Number]
        [Add  WAE WAE]
        [Sub  WAE WAE]
        [Mul  WAE WAE]
        [Div  WAE WAE]
        [Id   Symbol]
        [With Symbol WAE WAE])

* Parser:

      (: parse-sexpr : Sexpr -> WAE)
      ;; parses s-expressions into WAEs
      (define (parse-sexpr sexpr)
        (match sexpr
          [(number: n)    (Num n)]
          [(symbol: name) (Id name)]
          [(cons 'with more)
           (match sexpr
             [(list 'with (list (symbol: name) named) body)
              (With name (parse-sexpr named) (parse-sexpr body))]
             [else (error 'parse-sexpr "bad `with' syntax in ~s"
                          sexpr)])]
          [(list '+ lhs rhs) (Add (parse-sexpr lhs) (parse-sexpr rhs))]
          [(list '- lhs rhs) (Sub (parse-sexpr lhs) (parse-sexpr rhs))]
          [(list '* lhs rhs) (Mul (parse-sexpr lhs) (parse-sexpr rhs))]
          [(list '/ lhs rhs) (Div (parse-sexpr lhs) (parse-sexpr rhs))]
          [else (error 'parse-sexpr "bad syntax in ~s" sexpr)]))

* We need to define substitution. Terms:
  1. Binding Instance.
  2. Scope.
  3. Bound Instance.
  4. Free Instance.

* After lots of attempts:

  > e[v/i]: To substitute an identifier `i` in an expression `e`
  > with an expression `v`, replace all instances of `i` that are free
  > in `e` with the expression `v`.

* Implemented the code, and again, needed to fix a few bugs:

      (: subst : WAE Symbol WAE -> WAE)
      ;; substitutes the second argument with the third argument in the
      ;; first argument, as per the rules of substitution; the resulting
      ;; expression contains no free instances of the second argument
      (define (subst expr from to)
        (cases expr
          [(Num n) expr]
          [(Add l r) (Add (subst l from to) (subst r from to))]
          [(Sub l r) (Sub (subst l from to) (subst r from to))]
          [(Mul l r) (Mul (subst l from to) (subst r from to))]
          [(Div l r) (Div (subst l from to) (subst r from to))]
          [(Id name) (if (eq? name from) to expr)]
          [(With bound-id named-expr bound-body)
           (With bound-id
                 (subst named-expr from to)
                 (if (eq? bound-id from)
                   bound-body
                   (subst bound-body from to)))]))

  (Note that the bugs that we fixed clarify the exact way that our
  scopes work: in `{with {x 2} {with {x {+ x 2}} x}}`, the scope of the
  first `x` is the `{+ x 2}` expression.)

* We then extended the AE evaluation rules:

      eval(...) = ... same as the AE rules ...
      eval({with {x E1} E2}) = eval(E2[eval(E1)/x])
      eval(id) = error!

  and noted the possible type problem.

* The above translated into a Racket definition for an `eval` function
  (with a hack to avoid the type issue):

      (: eval : WAE -> Number)
      ;; evaluates WAE expressions by reducing them to numbers
      (define (eval expr)
        (cases expr
          [(Num n) n]
          [(Add l r) (+ (eval l) (eval r))]
          [(Sub l r) (- (eval l) (eval r))]
          [(Mul l r) (* (eval l) (eval r))]
          [(Div l r) (/ (eval l) (eval r))]
          [(With bound-id named-expr bound-body)
           (eval (subst bound-body
                        bound-id
                        (Num (eval named-expr))))]
          [(Id name) (error 'eval "free identifier: ~s" name)]))

# Formal Specs
Note the formal definitions that were included in the WAE code. They
are ways of describing pieces of our language that are more formal than
plain English, but still not as formal (and as verbose) as the actual
code.

A formal definition of `subst`:

(`N` is a `<num>`, `E1`, `E2` are `<WAE>`s, `x` is some `<id>`, `y` is a
*different* `<id>`)

    N[v/x]                = N
    {+ E1 E2}[v/x]        = {+ E1[v/x] E2[v/x]}
    {- E1 E2}[v/x]        = {- E1[v/x] E2[v/x]}
    {* E1 E2}[v/x]        = {* E1[v/x] E2[v/x]}
    {/ E1 E2}[v/x]        = {/ E1[v/x] E2[v/x]}
    y[v/x]                = y
    x[v/x]                = v
    {with {y E1} E2}[v/x] = {with {y E1[v/x]} E2[v/x]}
    {with {x E1} E2}[v/x] = {with {x E1[v/x]} E2}

And a formal definition of `eval`:

    eval(N)         = N
    eval({+ E1 E2}) = eval(E1) + eval(E2)
    eval({- E1 E2}) = eval(E1) - eval(E2)
    eval({* E1 E2}) = eval(E1) * eval(E2)
    eval({/ E1 E2}) = eval(E1) / eval(E2)
    eval(id)        = error!
    eval({with {x E1} E2}) = eval(E2[eval(E1)/x])

# Lazy vs Eager Evaluation
As we have previously seen, there are two basic approaches for
evaluation: either eager or lazy. In lazy evaluation, bindings are used
as a sort of textual reference: they only avoid writing an expression
twice, but the associated computation is still done twice. In eager
evaluation, we eliminate not only the textual redundancy, but also the
redundant computation.

Which evaluation method did our evaluator use? The relevant piece of
formalism is the treatment of `with`:

    eval({with {x E1} E2}) = eval(E2[eval(E1)/x])

And the matching piece of code is:

    [(With bound-id named-expr bound-body)
     (eval (subst bound-body
                  bound-id
                  (Num (eval named-expr))))]

How do we make this lazy? In the formal equation:

    eval({with {x E1} E2}) = eval(E2[E1/x])

and in the code:

    (: eval : WAE -> Number)
    ;; evaluates WAE expressions by reducing them to numbers
    (define (eval expr)
      (cases expr
        [(Num n) n]
        [(Add l r) (+ (eval l) (eval r))]
        [(Sub l r) (- (eval l) (eval r))]
        [(Mul l r) (* (eval l) (eval r))]
        [(Div l r) (/ (eval l) (eval r))]
        [(With bound-id named-expr bound-body)
         (eval (subst bound-body
                      bound-id
                      named-expr))] ;*** no eval and no Num wrapping
        [(Id name) (error 'eval "free identifier: ~s" name)]))

We can verify the way this works by tracing `eval` (compare the trace
you get for the two versions):

    > (trace eval) ; (put this in the definitions window)
    > (run "{with {x {+ 1 2}} {* x x}}")

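
The difference that the two traces show can also be observed by counting
work. The following is a minimal sketch, not the course code: it assumes
plain `#lang racket` (structs and `match` instead of `define-type` and
`cases`), a reduced language with only `Add` and `Mul`, and hypothetical
names (`additions`, `lazy?`, `count-additions`) that are introduced only
for this illustration.

```racket
#lang racket
;; Plain-Racket stand-ins for the WAE variants (an assumption of this sketch).
(struct Num (n) #:transparent)
(struct Add (l r) #:transparent)
(struct Mul (l r) #:transparent)
(struct Id (name) #:transparent)
(struct With (name named body) #:transparent)

;; Same substitution rules as `subst` in the notes.
(define (subst expr from to)
  (match expr
    [(Num _) expr]
    [(Add l r) (Add (subst l from to) (subst r from to))]
    [(Mul l r) (Mul (subst l from to) (subst r from to))]
    [(Id name) (if (eq? name from) to expr)]
    [(With name named body)
     (With name
           (subst named from to)
           (if (eq? name from) body (subst body from to)))]))

;; Hypothetical counter: number of additions performed so far.
(define additions 0)

;; `lazy?` chooses what is substituted for the bound identifier:
;; the unevaluated named expression, or its value wrapped in `Num`.
(define (eval expr lazy?)
  (match expr
    [(Num n) n]
    [(Add l r)
     (set! additions (add1 additions))
     (+ (eval l lazy?) (eval r lazy?))]
    [(Mul l r) (* (eval l lazy?) (eval r lazy?))]
    [(With name named body)
     (eval (subst body name (if lazy? named (Num (eval named lazy?))))
           lazy?)]
    [(Id name) (error 'eval "free identifier: ~s" name)]))

;; {with {x {+ 1 2}} {* x x}}
(define prog (With 'x (Add (Num 1) (Num 2)) (Mul (Id 'x) (Id 'x))))

;; Returns (list result number-of-additions) for the chosen strategy.
(define (count-additions lazy?)
  (set! additions 0)
  (define result (eval prog lazy?))
  (list result additions))

(count-additions #f) ; eager => '(9 1)
(count-additions #t) ; lazy  => '(9 2)
```

Both strategies produce 9, but the lazy one performs the addition once
for every use of `x`.
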
Ignoring the traces for now, the modified WAE interpreter works as
before; specifically, all tests pass. So the question is whether the
language we get is actually different from the one we had before. One
difference is in execution speed, but we can't really notice a
difference, and we care more about meaning. Is there any program that
will run differently in the two languages?

The main feature of the lazy evaluator is that it does not evaluate the
named expression until it is actually needed. As we have seen, this
leads to duplicating computations if the bound identifier is used more
than once, meaning that it does not eliminate the dynamic redundancy.
But what if the bound identifier is not used at all? In that case the
named expression simply evaporates. This is a good hint at an
expression that behaves differently in the two languages: with division
in the language, we get a different result when we try running:

    {with {x {/ 8 0}} 7}

The eager evaluator stops with an error when it tries evaluating the
division, and the lazy evaluator simply ignores it.

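
This behavior can be checked with a small sketch. It is not the course
code: it assumes plain `#lang racket`, a language reduced to numbers,
division, identifiers and `with`, and the hypothetical names `lazy?` and
`try`.

```racket
#lang racket
;; Plain-Racket stand-ins for the WAE variants (an assumption of this sketch).
(struct Num (n) #:transparent)
(struct Div (l r) #:transparent)
(struct Id (name) #:transparent)
(struct With (name named body) #:transparent)

;; Same substitution rules as `subst` in the notes.
(define (subst expr from to)
  (match expr
    [(Num _) expr]
    [(Div l r) (Div (subst l from to) (subst r from to))]
    [(Id name) (if (eq? name from) to expr)]
    [(With name named body)
     (With name
           (subst named from to)
           (if (eq? name from) body (subst body from to)))]))

;; `lazy?` chooses what is substituted for the bound identifier:
;; the unevaluated named expression, or its value wrapped in `Num`.
(define (eval expr lazy?)
  (match expr
    [(Num n) n]
    [(Div l r) (/ (eval l lazy?) (eval r lazy?))]
    [(With name named body)
     (eval (subst body name (if lazy? named (Num (eval named lazy?))))
           lazy?)]
    [(Id name) (error 'eval "free identifier: ~s" name)]))

;; {with {x {/ 8 0}} 7}
(define prog (With 'x (Div (Num 8) (Num 0)) (Num 7)))

;; Returns the result, or the symbol 'error if evaluation raised one.
(define (try lazy?)
  (with-handlers ([exn:fail? (lambda (e) 'error)])
    (eval prog lazy?)))

(try #t) ; lazy:  => 7
(try #f) ; eager: => 'error
```
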
Even without division, we get a similar behavior for

    {with {x y} 7}

but it is questionable whether the fact that this evaluates to 7 is
correct behavior: we really want to forbid programs that use free
variables.

Furthermore, there is an issue with name capturing: we don't want to
substitute an expression into a context that captures some of its free
variables. But our substitution allows just that, which is usually not
a problem because by the time we do the substitution, the named
expression should not have free variables that need to be replaced.
However, consider evaluating this program:

    {with {y x}
      {with {x 2}
        {+ x y}}}

under the two evaluation regimes: the eager version stops with an
error, and the lazy version succeeds (the free `x` is captured by the
inner `with`, so the result is 4). This points at a bug in our
substitution, or rather at an issue that it does not deal with because
the eager evaluator never encounters it.

So the summary is: as long as the initial program is correct, both
evaluation regimes produce the same results. If a program contains
free variables, they might get captured in a naive lazy evaluator
implementation (but this is a bug that should be fixed). Also, there
are some cases where eager evaluation runs into a run-time problem which
does not happen in a lazy evaluator because the expression is not used.

It is possible to prove that when you evaluate an expression, if there
is an error that can be avoided, lazy evaluation will always avoid it,
whereas an eager evaluator will always run into it. On the other hand,
lazy evaluators are usually slower than eager evaluators, so it's a
speed vs. robustness trade-off.

Note that with lazy evaluation we say that an identifier is bound to an
expression rather than a value. (Again, this is why the eager version
needed to wrap `eval`'s result in a `Num` and this one doesn't.)

(It is possible to change things and get a more well-behaved
substitution: we basically need to find out whether a capture might
happen, and rename things to avoid it. For example,

    {with {y E1} E2}[v/x]
      if `x' and `y' are equal
        = {with {y E1[v/x]} E2} = {with {x E1[v/x]} E2}
      if `y' has a free occurrence in `v'
        = {with {y1 E1[v/x]} E2[y1/y][v/x]} ; `y1' is "fresh"
      otherwise
        = {with {y E1[v/x]} E2[v/x]}

With this, we might have gone through this path in evaluating the above:

    {with {y x} {with {x 2} {+ x y}}}
    {with {x₁ 2} {+ x₁ x}} ; note that x₁ is a fresh name, not x
    {+ 2 x}
    error: free `x`

But you can see that this is much more complicated (more code: requires
a `free-in` predicate, being able to invent new *fresh* names, etc).
And it's not even the end of that story...)
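
The renaming rule above can be sketched as follows. This is only an
illustration under stated assumptions, not the course implementation: it
uses plain `#lang racket`, a language reduced to numbers, addition,
identifiers and `with`, a hypothetical `free-in?` predicate, and Racket's
`gensym` as one possible way of inventing fresh names.

```racket
#lang racket
;; Plain-Racket stand-ins for the WAE variants (an assumption of this sketch).
(struct Num (n) #:transparent)
(struct Add (l r) #:transparent)
(struct Id (name) #:transparent)
(struct With (name named body) #:transparent)

;; Hypothetical predicate: does `name` occur free in `expr`?
(define (free-in? name expr)
  (match expr
    [(Num _) #f]
    [(Add l r) (or (free-in? name l) (free-in? name r))]
    [(Id n) (eq? n name)]
    [(With n named body)
     (or (free-in? name named)
         (and (not (eq? n name)) (free-in? name body)))]))

;; Capture-avoiding substitution, following the three cases of the rule.
(define (subst expr from to)
  (match expr
    [(Num _) expr]
    [(Add l r) (Add (subst l from to) (subst r from to))]
    [(Id name) (if (eq? name from) to expr)]
    [(With name named body)
     (cond
       ;; same identifier: only the named expression is in scope
       [(eq? name from) (With name (subst named from to) body)]
       ;; the binder would capture a free identifier of `to`: rename it
       [(free-in? name to)
        (define fresh (gensym name)) ; uninterned, so it cannot clash
        (With fresh
              (subst named from to)
              (subst (subst body name (Id fresh)) from to))]
       [else (With name (subst named from to) (subst body from to))])]))

;; The lazy evaluator: substitute the unevaluated named expression.
(define (eval expr)
  (match expr
    [(Num n) n]
    [(Add l r) (+ (eval l) (eval r))]
    [(With name named body) (eval (subst body name named))]
    [(Id name) (error 'eval "free identifier: ~s" name)]))

;; {with {y x} {with {x 2} {+ x y}}}
(define prog (With 'y (Id 'x) (With 'x (Num 2) (Add (Id 'x) (Id 'y)))))

;; The free `x` is no longer captured, so evaluation reports it.
(define outcome
  (with-handlers ([exn:fail? (lambda (e) 'free-identifier-error)])
    (eval prog)))
outcome ; => 'free-identifier-error
```

Programs without free identifiers are unaffected by the renaming; for
example, `{with {x 5} {with {y x} y}}` still evaluates to 5.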

# de Bruijn Indexes
This whole story revolves around names; specifically, name capture is a
problem that should always be avoided (it is one major source of PL
headaches).

But are names the only way we can use bindings?

There is at least one alternative way: note that the only thing we used
names for is references. We don't really care what the name is,
which is pretty obvious when we consider the two WAE expressions:

    {with {x 5} {+ x x}}
    {with {y 5} {+ y y}}

or the two Racket function definitions:

    (define (foo x) (list x x))
    (define (foo y) (list y y))

Both of these show a pair of expressions that we should consider as
equal in some sense (this is called "alpha-equality"). The only thing
we care about is what variable points where: the binding structure is
the only thing that matters. In other words, as long as DrRacket
produces the same arrows when we use Check Syntax, we consider the
program to be the same, regardless of name choices (for argument names
and local names, not for global names like `foo` in the above).

The alternative idea uses this principle: if all we care about is where
the arrows go, then simply get rid of the names... Instead of
referencing a binding through its name, just specify which of the
surrounding scopes we want to refer to. For example, instead of:

    {with {x 5} {with {y 6} {+ x y}}}

we can use a new "reference" syntax, `[N]`, and use this instead
of the above:

    {with 5 {with 6 {+ [1] [0]}}}

So the rules for `[N]` are: `[0]` is the value bound in the current
scope, `[1]` is the value from the next one up, etc.

Of course, to do this translation, we have to know the precise scope
rules. Two more complicated examples:

    {with {x 5} {+ x {with {y 6} {+ x y}}}}

is translated to:

    {with 5 {+ [0] {with 6 {+ [1] [0]}}}}

(note how `x` appears as a different reference based on where it
appeared in the original code.) Even more subtle:

    {with {x 5} {with {y {+ x 1}} {+ x y}}}

is translated to:

    {with 5 {with {+ [0] 1} {+ [1] [0]}}}

because the inner `with` does not have its own named expression in its
scope, so the named expression is immediately in the scope of the outer
`with`.

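
The translation can be sketched in a few lines. This is an illustration
under stated assumptions rather than part of the course code: it uses
plain `#lang racket`, a language reduced to numbers, addition,
identifiers and `with`, and hypothetical names (`Ref` for `[N]`, `NWith`
for the nameless `with`, and `to-nameless` for the translation).

```racket
#lang racket
;; Named syntax: plain-Racket stand-ins for the WAE variants.
(struct Num (n) #:transparent)
(struct Add (l r) #:transparent)
(struct Id (name) #:transparent)
(struct With (name named body) #:transparent)
;; Nameless syntax (hypothetical names for this sketch).
(struct Ref (index) #:transparent)        ; [N]
(struct NWith (named body) #:transparent) ; {with named body}

;; `scopes` lists the names bound by the enclosing `with`s, innermost
;; first, so the position of a name in it is its reference number.
(define (to-nameless expr [scopes '()])
  (match expr
    [(Num _) expr]
    [(Add l r) (Add (to-nameless l scopes) (to-nameless r scopes))]
    [(Id name)
     (define index (index-of scopes name)) ; first match = innermost binder
     (if index
       (Ref index)
       (error 'to-nameless "free identifier: ~s" name))]
    [(With name named body)
     ;; the named expression is translated in the *outer* scopes;
     ;; only the body sees the new binding
     (NWith (to-nameless named scopes)
            (to-nameless body (cons name scopes)))]))

;; {with {x 5} {with {y {+ x 1}} {+ x y}}}
(define example
  (With 'x (Num 5)
        (With 'y (Add (Id 'x) (Num 1))
              (Add (Id 'x) (Id 'y)))))

;; {with 5 {with {+ [0] 1} {+ [1] [0]}}}
(equal? (to-nameless example)
        (NWith (Num 5)
               (NWith (Add (Ref 0) (Num 1))
                      (Add (Ref 1) (Ref 0))))) ; => #t
```

Searching `scopes` from the innermost binder outward is what makes
shadowing work: a reference always resolves to the closest enclosing
binding of its name.
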
This is called "de Bruijn Indexes": instead of referencing identifiers
by their name, we use an index into the surrounding binding context.
The major disadvantage, as can be seen in the above examples, is that it
is not convenient for humans to work with. Specifically, the same
identifier is referenced using different numbers, which makes it hard to
understand what some code is doing. After all, *abstractions* are the
main thing we deal with when we write programs, and having labels makes
the binding structure much easier to understand than scope counts.

However, practically all compilers use this for compiled code (think
about stack pointers). For example, GCC compiles this code:

    {
      int x = 5;
      {
        int y = x + 1;
        return x + y;
      }
    }

to:

    subl $8, %esp
    movl $5, -4(%ebp)   ; int x = 5
    movl -4(%ebp), %eax
    incl %eax
    movl %eax, -8(%ebp) ; int y = %eax
    movl -8(%ebp), %eax
    addl -4(%ebp), %eax