2013-01-18

- Bindings & Substitution
- Adding Bindings to AE: The WAE Language
- Implementing `with' Evaluation
- Formal Specs
- Lazy vs Eager Evaluation
- de Bruijn Indexes

========================================================================
>>> Bindings & Substitution

We now get to an important concept: substitution.

Even in our simple language, we encounter repeated expressions.  For
example, if we want to compute the square of some expression:

  {* {+ 4 2} {+ 4 2}}

Why would we want to get rid of the repeated sub-expression?

* It introduces a redundant computation.  In this example, we want to
  avoid computing the same sub-expression a second time.

* It makes the computation more complicated than it could be without
  the repetition.  Compare the above with:

    with x = {+ 4 2},
      {* x x}

* This is related to a basic fact in programming that we have already
  discussed: duplicating information is always a bad thing.  Among
  other bad consequences, it can even lead to bugs that could not
  happen if we didn't duplicate code.  A toy example is "fixing" one
  of the numbers in one expression and forgetting to fix the
  corresponding one:

    {* {+ 4 2} {+ 4 1}}

  Real world examples involve much more code, which makes such bugs
  very difficult to find, but they still follow the same principle.

* This gives us more expressive power -- we don't just say that we
  want to multiply two expressions that both happen to be {+ 4 2}, we
  say that we multiply the {+ 4 2} expression by *itself*.  It allows
  us to express identity of two values as well as using two values
  that happen to be the same.

So, the normal way to avoid redundancy is to introduce an identifier.
Even when we speak, we might say: "let x be 4 plus 2, multiply x by x".

(These are often called "variables", but we will try to avoid this
name: what if the identifier does not change (vary)?)
To get this, we introduce a new form into our language:

  {with {x {+ 4 2}}
    {* x x}}

We expect to be able to reduce this to:

  {* 6 6}

by substituting 6 for `x' in the body sub-expression of `with'.

A little more complicated example:

  {with {x {+ 4 2}}
    {with {y {* x x}}
      {+ y y}}}
  [add]  = {with {x 6} {with {y {* x x}} {+ y y}}}
  [subst]= {with {y {* 6 6}} {+ y y}}
  [mul]  = {with {y 36} {+ y y}}
  [subst]= {+ 36 36}
  [add]  = 72

========================================================================
>>> Adding Bindings to AE: The WAE Language [[[ PLAI Chapter 3 ]]]

To add this to our language, we start with the BNF.  We now call our
language `WAE' (With+AE):

  <WAE> ::= <num>
          | { + <WAE> <WAE> }
          | { - <WAE> <WAE> }
          | { * <WAE> <WAE> }
          | { / <WAE> <WAE> }
          | { with { <id> <WAE> } <WAE> }
          | <id>

Note that we had to introduce two new rules: one for introducing an
identifier, and one for using it.  This is common in many language
specifications, for example `define-type' introduces a new type, and it
comes with `cases' that allows us to destruct its instances.

For <id> we need to use some form of identifiers, the natural choice in
Racket is to use symbols.
We can therefore write the corresponding type definition:

  (define-type WAE
    [Num  Number]
    [Add  WAE WAE]
    [Sub  WAE WAE]
    [Mul  WAE WAE]
    [Div  WAE WAE]
    [Id   Symbol]
    [With Symbol WAE WAE])

The parser is easily extended to produce these syntax objects:

  (: parse-sexpr : Sexpr -> WAE)
  ;; to convert s-expressions into WAEs
  (define (parse-sexpr sexpr)
    (match sexpr
      [(number: n)    (Num n)]
      [(symbol: name) (Id name)]
      [(list 'with (list (symbol: name) named) body)
       (With name (parse-sexpr named) (parse-sexpr body))]
      [(list '+ lhs rhs) (Add (parse-sexpr lhs) (parse-sexpr rhs))]
      [(list '- lhs rhs) (Sub (parse-sexpr lhs) (parse-sexpr rhs))]
      [(list '* lhs rhs) (Mul (parse-sexpr lhs) (parse-sexpr rhs))]
      [(list '/ lhs rhs) (Div (parse-sexpr lhs) (parse-sexpr rhs))]
      [else (error 'parse-sexpr "bad syntax in ~s" sexpr)]))

But note that this parser is inconvenient -- any of these expressions:

  {* 1 2 3}
  {foo 5 6}
  {with x 5 {* x 8}}
  {with {5 x} {* x 8}}

would result in the same generic "bad syntax" error, which is not very
helpful.
To make things better, we can add another case for `with' expressions
that are malformed, and give a more specific message in that case:

  (: parse-sexpr : Sexpr -> WAE)
  ;; to convert s-expressions into WAEs
  (define (parse-sexpr sexpr)
    (match sexpr
      [(number: n)    (Num n)]
      [(symbol: name) (Id name)]
      [(list 'with (list (symbol: name) named) body)
       (With name (parse-sexpr named) (parse-sexpr body))]
      [(cons 'with more)
       (error 'parse-sexpr "bad `with' syntax in ~s" sexpr)]
      [(list '+ lhs rhs) (Add (parse-sexpr lhs) (parse-sexpr rhs))]
      [(list '- lhs rhs) (Sub (parse-sexpr lhs) (parse-sexpr rhs))]
      [(list '* lhs rhs) (Mul (parse-sexpr lhs) (parse-sexpr rhs))]
      [(list '/ lhs rhs) (Div (parse-sexpr lhs) (parse-sexpr rhs))]
      [else (error 'parse-sexpr "bad syntax in ~s" sexpr)]))

and finally, to group all of the parsing code that deals with `with'
expressions (both valid and invalid ones), we can use a single case for
both of them:

  (: parse-sexpr : Sexpr -> WAE)
  ;; to convert s-expressions into WAEs
  (define (parse-sexpr sexpr)
    (match sexpr
      [(number: n)    (Num n)]
      [(symbol: name) (Id name)]
      [(cons 'with more)
       ;; go in here for all sexpr that begin with a 'with
       (match sexpr
         [(list 'with (list (symbol: name) named) body)
          (With name (parse-sexpr named) (parse-sexpr body))]
         [else (error 'parse-sexpr "bad `with' syntax in ~s" sexpr)])]
      [(list '+ lhs rhs) (Add (parse-sexpr lhs) (parse-sexpr rhs))]
      [(list '- lhs rhs) (Sub (parse-sexpr lhs) (parse-sexpr rhs))]
      [(list '* lhs rhs) (Mul (parse-sexpr lhs) (parse-sexpr rhs))]
      [(list '/ lhs rhs) (Div (parse-sexpr lhs) (parse-sexpr rhs))]
      [else (error 'parse-sexpr "bad syntax in ~s" sexpr)]))

And now we're done with the syntactic part of the `with' extension.

(Quick note -- why would we indent `With' like a normal function in
code like this

  (With 'x
        (Num 2)
        (Add (Id 'x) (Num 4)))

instead of an indentation that looks like a `let'

  (With 'x (Num 2)
    (Add (Id 'x) (Num 4)))

?
The reason for this is that the second indentation looks more like a
binding construct (e.g., how a `let' is indented), but `With' is *not*
a binding form -- it's a plain function because it's at the Racket
level.  You should therefore keep in mind the huge difference between
that `With' and the `with' that appears in WAE programs:

  {with {x 2}
    {+ x 4}}

Another way to look at it: imagine that we intend for the language to
be used by Spanish speakers.  In this case we would translate "with":

  {con {x 2}
    {+ x 4}}

but we will not do the same for `With'.)

========================================================================
>>> Implementing `with' Evaluation

Now, to make this work, we will need to do some substitutions.

We basically want to say that to evaluate:

  {with {id WAE1} WAE2}

we need to evaluate WAE2 with id substituted by WAE1.  Formally:

  eval( {with {id WAE1} WAE2} )
    = eval( subst(WAE2,id,WAE1) )

There is a more common syntax for substitution (quick: what do I mean
by this "syntax"?):

  eval( {with {id WAE1} WAE2} )
    = eval( WAE2[WAE1/id] )

(Sidenote: this syntax originates with logicians who used `[x/v]e',
and later there was a convention that mimicked the more natural order
of arguments to a function with `e[x->v]', and eventually both of these
got combined into `e[v/x]' which is a little confusing in that the
left-to-right order of the arguments is not the same as for the `subst'
function.)

Now all we need is an exact definition of substitution.

(Note that substitution is not the same as evaluation, only part of the
evaluation process.  In the previous examples, when we evaluated the
expression we did substitutions as well as the usual arithmetic
operations that were already part of the AE evaluator.  In this last
definition there is still a missing evaluation step, see if you can
find it.)
So let us try to define substitution now:

  [substitution, take 1]  e[v/i]
  To substitute an identifier `i' in an expression `e' with an
  expression `v', replace all identifiers in `e' that have the same
  name `i' by the expression `v'.

This seems to work with simple expressions, for example:

  {with {x 5} {+ x x}}  --> {+ 5 5}
  {with {x 5} {+ 10 4}} --> {+ 10 4}

however, we crash with an invalid syntax if we try:

  {with {x 5} {+ x {with {x 3} 10}}}
    --> {+ 5 {with {5 3} 10}} ???

-- we got to an invalid expression.

To fix this, we need to distinguish "normal" occurrences of
identifiers, and ones that are used as new bindings.  We need a few new
terms for this:

1. Binding Instance: a binding instance of an identifier is one that is
   used to name it in a new binding.  In our <WAE> syntax, binding
   instances are only the <id> position of the `with' form.

2. Scope: the scope of a binding instance is the region of program text
   in which instances of the identifier refer to the value bound in the
   binding instance.  (Note that this definition actually relies on a
   definition of substitution, because that is what is used to specify
   how identifiers refer to values.)

3. Bound Instance (or Bound Occurrence): an instance of an identifier
   is bound if it is contained within the scope of a binding instance
   of its name.

4. Free Instance (or Free Occurrence): an instance of an identifier
   that is not contained in the scope of any binding instance of its
   name is said to be free.

Using this we can say that the problem with the previous definition of
substitution is that it failed to distinguish between bound instances
(which should be substituted) and binding instances (which should
not).  So we try to fix this:

  [substitution, take 2]  e[v/i]
  To substitute an identifier `i' in an expression `e' with an
  expression `v', replace all instances of `i' that are not themselves
  binding instances with the expression `v'.
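The four terms defined above can be made concrete with a small function
that collects the names that have free instances in an expression.  This
is only a sketch under stated assumptions: it is written in plain
`#lang racket' with hypothetical structs standing in for the WAE type
(the course code uses `define-type'), `free-ids' is a name made up for
this illustration, and it assumes, as with Racket's `let', that the
scope of a `with' binding is its body and not its named expression.

```racket
#lang racket
;; Stand-ins for the WAE syntax (an assumption of this sketch)
(struct Num  (n)               #:transparent)
(struct Add  (l r)             #:transparent)
(struct Sub  (l r)             #:transparent)
(struct Mul  (l r)             #:transparent)
(struct Div  (l r)             #:transparent)
(struct Id   (name)            #:transparent)
(struct With (name named body) #:transparent)

;; free-ids : WAE -> (Listof Symbol)
;; the names that have at least one free instance in `expr', no repeats
(define (free-ids expr)
  (match expr
    [(Num _) '()]
    [(or (Add l r) (Sub l r) (Mul l r) (Div l r))
     (remove-duplicates (append (free-ids l) (free-ids r)))]
    ;; an identifier on its own is not inside any scope, so it is free
    [(Id name) (list name)]
    ;; the binding instance `bound-id' is never reported; instances of
    ;; it in the body are bound, so they are dropped from the result
    [(With bound-id named-expr bound-body)
     (remove-duplicates
      (append (free-ids named-expr)
              (remove bound-id (free-ids bound-body))))]))

;; {with {x 5} {+ x y}}: `x' is bound, `y' is free
(free-ids (With 'x (Num 5) (Add (Id 'x) (Id 'y)))) ; => '(y)
;; {with {x x} x}: the `x' in the named expression is free
(free-ids (With 'x (Id 'x) (Id 'x)))               ; => '(x)
```

An expression for which `free-ids' returns the empty list has no free
instances at all, which is the kind of program the evaluator below
expects to receive.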
First of all, check the previous examples:

  {with {x 5} {+ x x}}  --> {+ 5 5}
  {with {x 5} {+ 10 4}} --> {+ 10 4}

still work, and

  {with {x 5} {+ x {with {x 3} 10}}}
    --> {+ 5 {with {x 3} 10}}
    --> {+ 5 10}

also works.  However, if we try this:

  {with {x 5}
    {+ x {with {x 3}
           x}}}

we get:

  --> {+ 5 {with {x 3} 5}}
  --> {+ 5 5}
  --> 10

but we want that to be 8: the inner `x' should be bound by the closest
`with' that binds it.

The problem is that the new definition of substitution that we have
respects binding instances, but it fails to deal with their scope.  In
the above example, we want the inner `with' to *shadow* the outer
`with's binding for `x'.

  [substitution, take 3]  e[v/i]
  To substitute an identifier `i' in an expression `e' with an
  expression `v', replace all instances of `i' that are not themselves
  binding instances, and that are not in any nested scope, with the
  expression `v'.

This avoids the bad substitution above, but it is now doing things too
carefully:

  {with {x 5} {+ x {with {y 3} x}}}

becomes

  --> {+ 5 {with {y 3} x}}
  --> {+ 5 x}

which is an error because `x' is unbound (and there is no reasonable
rule that we can specify to evaluate it).

The problem is that our substitution halts at every new scope, in this
case, it stopped at the new `y' scope, but it shouldn't have because it
uses a different name.  In fact, that last definition of substitution
cannot handle any nested scope.

Revise again:

  [substitution, take 4]  e[v/i]
  To substitute an identifier `i' in an expression `e' with an
  expression `v', replace all instances of `i' that are not themselves
  binding instances, and that are not in any nested scope of `i', with
  the expression `v'.

which, finally, is a good definition.  This is just a little too
mechanical.  Notice that we actually refer to all instances of `i' that
are not in a scope of a binding instance of `i', which simply means all
*free occurrences* of `i' -- free in `e' (why? -- remember the
definition of "free"?):

  [substitution, take 4b]  e[v/i]
  To substitute an identifier `i' in an expression `e' with an
  expression `v', replace all instances of `i' that are free in `e'
  with the expression `v'.

Based on this we can finally write the code for it:

  (: subst : WAE Symbol WAE -> WAE)
  ;; substitutes the second argument with the third argument in the
  ;; first argument, as per the rules of substitution; the resulting
  ;; expression contains no free instances of the second argument
  (define (subst expr from to) ; returns expr[to/from]
    (cases expr
      [(Num n) expr]
      [(Add l r) (Add (subst l from to) (subst r from to))]
      [(Sub l r) (Sub (subst l from to) (subst r from to))]
      [(Mul l r) (Mul (subst l from to) (subst r from to))]
      [(Div l r) (Div (subst l from to) (subst r from to))]
      [(Id name) (if (eq? name from) to expr)]
      [(With bound-id named-expr bound-body)
       (if (eq? bound-id from)
         expr           ; <-- don't go in!
         (With bound-id
               named-expr
               (subst bound-body from to)))]))

... and this is just the same as writing a formal "paper version" of
the substitution rule.

... but we still have bugs!

========================================================================

Before we find the bugs, we need to see when and how substitution is
used in the evaluation process.

To modify our evaluator, we will need rules to deal with the new syntax
pieces -- `with' expressions and identifiers.

When we see an expression that looks like:

  {with {x E1} E2}

we continue by *evaluating* `E1' to get a value `V1', we then
substitute the identifier `x' with the expression `V1' in `E2', and
continue by evaluating this new expression.  In other words, we have
the following evaluation rule:

  eval( {with {x E1} E2} )
    = eval( E2[eval(E1)/x] )

So we know what to do with `with' expressions.  How about identifiers?
The main feature of `subst', as said in the purpose statement, is that
it leaves no free instances of the substituted variable around.
This means that if the initial expression is valid (did not contain any
free variables), then when we go from

  {with {x E1} E2}

to

  E2[E1/x]

the result is an expression that has *no* free instances of `x'.  So we
don't need to handle identifiers in the evaluator -- substitutions make
them all go away.

We can now extend the formal definition of AE to that of WAE:

  eval(...) = ... same as the AE rules ...
  eval({with {x E1} E2}) = eval(E2[eval(E1)/x])
  eval(id) = error!

If you're paying close attention, you might catch a potential problem
in this definition: we're substituting `eval(E1)' for `x' in `E2' -- an
operation that requires a WAE expression, but `eval(E1)' is a number.
(Look at the type of the `eval' definition we had for AE, then look at
the above definition of `subst'.)  This seems like being overly
pedantic, but it will require some resolution when we get to the code.

The above rules are easily coded as follows:

  (: eval : WAE -> Number)
  ;; evaluates WAE expressions by reducing them to numbers
  (define (eval expr)
    (cases expr
      [(Num n) n]
      [(Add l r) (+ (eval l) (eval r))]
      [(Sub l r) (- (eval l) (eval r))]
      [(Mul l r) (* (eval l) (eval r))]
      [(Div l r) (/ (eval l) (eval r))]
      [(With bound-id named-expr bound-body)
       (eval (subst bound-body
                    bound-id
                    (Num (eval named-expr))))] ; <-***
      [(Id name) (error 'eval "free identifier: ~s" name)]))

Note the `Num' expression in the marked line: evaluating the named
expression gives us back a number -- we need to convert this number
into a syntax to be able to use it with `subst'.  The solution is to
use `Num' to convert the resulting number into a numeral (the syntax of
a number).  It's not an elegant solution, but it will do for now.

Finally, here are a few test cases.  We use a new `test' special form
which is part of the course plugin.  The way to use `test' is with two
expressions and an `=>' arrow -- DrRacket evaluates both, and nothing
will happen if the results are equal.
If the results are different, you will get a warning line, but
evaluation will continue so you can try additional tests.  You can also
use an `=error>' arrow to test an error message -- use it with some
text from the expected error, `?' stands for any single character, and
`*' is a sequence of zero or more characters.  (When you use `test' in
your homework, the handin server will abort when tests fail.)  We
expect these tests to succeed (make sure that you understand *why* they
should succeed).

  ;; tests
  (test (run "5") => 5)
  (test (run "{+ 5 5}") => 10)
  (test (run "{with {x {+ 5 5}} {+ x x}}") => 20)
  (test (run "{with {x 5} {+ x x}}") => 10)
  (test (run "{with {x {+ 5 5}} {with {y {- x 3}} {+ y y}}}") => 14)
  (test (run "{with {x 5} {with {y {- x 3}} {+ y y}}}") => 4)
  (test (run "{with {x 5} {+ x {with {x 3} 10}}}") => 15)
  (test (run "{with {x 5} {+ x {with {x 3} x}}}") => 8)
  (test (run "{with {x 5} {+ x {with {y 3} x}}}") => 10)
  (test (run "{with {x 5} {with {y x} y}}") => 5)
  (test (run "{with {x 5} {with {x x} x}}") => 5)
  (test (run "{with {x 1} y}") =error> "free identifier")

Putting this all together, we get the following code; trying to run
this code will raise an unexpected error...
----------------------------------------------------------------------
#lang pl

#| BNF for the WAE language:
     <WAE> ::= <num>
             | { + <WAE> <WAE> }
             | { - <WAE> <WAE> }
             | { * <WAE> <WAE> }
             | { / <WAE> <WAE> }
             | { with { <id> <WAE> } <WAE> }
             | <id>
|#

;; WAE abstract syntax trees
(define-type WAE
  [Num  Number]
  [Add  WAE WAE]
  [Sub  WAE WAE]
  [Mul  WAE WAE]
  [Div  WAE WAE]
  [Id   Symbol]
  [With Symbol WAE WAE])

(: parse-sexpr : Sexpr -> WAE)
;; to convert s-expressions into WAEs
(define (parse-sexpr sexpr)
  (match sexpr
    [(number: n)    (Num n)]
    [(symbol: name) (Id name)]
    [(cons 'with more)
     (match sexpr
       [(list 'with (list (symbol: name) named) body)
        (With name (parse-sexpr named) (parse-sexpr body))]
       [else (error 'parse-sexpr "bad `with' syntax in ~s" sexpr)])]
    [(list '+ lhs rhs) (Add (parse-sexpr lhs) (parse-sexpr rhs))]
    [(list '- lhs rhs) (Sub (parse-sexpr lhs) (parse-sexpr rhs))]
    [(list '* lhs rhs) (Mul (parse-sexpr lhs) (parse-sexpr rhs))]
    [(list '/ lhs rhs) (Div (parse-sexpr lhs) (parse-sexpr rhs))]
    [else (error 'parse-sexpr "bad syntax in ~s" sexpr)]))

(: parse : String -> WAE)
;; parses a string containing a WAE expression to a WAE AST
(define (parse str)
  (parse-sexpr (string->sexpr str)))

(: subst : WAE Symbol WAE -> WAE)
;; substitutes the second argument with the third argument in the
;; first argument, as per the rules of substitution; the resulting
;; expression contains no free instances of the second argument
(define (subst expr from to)
  (cases expr
    [(Num n) expr]
    [(Add l r) (Add (subst l from to) (subst r from to))]
    [(Sub l r) (Sub (subst l from to) (subst r from to))]
    [(Mul l r) (Mul (subst l from to) (subst r from to))]
    [(Div l r) (Div (subst l from to) (subst r from to))]
    [(Id name) (if (eq? name from) to expr)]
    [(With bound-id named-expr bound-body)
     (if (eq? bound-id from)
       expr
       (With bound-id
             named-expr
             (subst bound-body from to)))]))

(: eval : WAE -> Number)
;; evaluates WAE expressions by reducing them to numbers
(define (eval expr)
  (cases expr
    [(Num n) n]
    [(Add l r) (+ (eval l) (eval r))]
    [(Sub l r) (- (eval l) (eval r))]
    [(Mul l r) (* (eval l) (eval r))]
    [(Div l r) (/ (eval l) (eval r))]
    [(With bound-id named-expr bound-body)
     (eval (subst bound-body
                  bound-id
                  (Num (eval named-expr))))]
    [(Id name) (error 'eval "free identifier: ~s" name)]))

(: run : String -> Number)
;; evaluate a WAE program contained in a string
(define (run str)
  (eval (parse str)))

;; tests
(test (run "5") => 5)
(test (run "{+ 5 5}") => 10)
(test (run "{with {x {+ 5 5}} {+ x x}}") => 20)
(test (run "{with {x 5} {+ x x}}") => 10)
(test (run "{with {x {+ 5 5}} {with {y {- x 3}} {+ y y}}}") => 14)
(test (run "{with {x 5} {with {y {- x 3}} {+ y y}}}") => 4)
(test (run "{with {x 5} {+ x {with {x 3} 10}}}") => 15)
(test (run "{with {x 5} {+ x {with {x 3} x}}}") => 8)
(test (run "{with {x 5} {+ x {with {y 3} x}}}") => 10)
(test (run "{with {x 5} {with {y x} y}}") => 5)
(test (run "{with {x 5} {with {x x} x}}") => 5)
(test (run "{with {x 1} y}") =error> "free identifier")
----------------------------------------------------------------------

========================================================================

Oops, this program still has problems that were caught by the tests --
we encounter unexpected free identifier errors.  What's the problem
now?  In expressions like:

  {with {x 5}
    {with {y x}
      y}}

we forgot to substitute `x' in the expression that `y' is bound to.
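To see the failure in isolation, here is a minimal sketch that mirrors
the buggy `With' case in plain `#lang racket'.  The structs are
hypothetical stand-ins for the `define-type' declaration (only the
variants needed here), and `subst/buggy' is a name made up for this
illustration.

```racket
#lang racket
;; Stand-ins for the WAE variants used in this example (an assumption
;; of this sketch; the course code uses `define-type')
(struct Num  (n)               #:transparent)
(struct Add  (l r)             #:transparent)
(struct Id   (name)            #:transparent)
(struct With (name named body) #:transparent)

;; subst/buggy : WAE Symbol WAE -> WAE
;; same logic as the first version of `subst': the named expression of
;; a `With' is copied over without being substituted into
(define (subst/buggy expr from to)
  (match expr
    [(Num _) expr]
    [(Add l r) (Add (subst/buggy l from to) (subst/buggy r from to))]
    [(Id name) (if (eq? name from) to expr)]
    [(With bound-id named-expr bound-body)
     (if (eq? bound-id from)
         expr
         (With bound-id
               named-expr                         ; <-- never touched
               (subst/buggy bound-body from to)))]))

;; the body of {with {x 5} {with {y x} y}} is {with {y x} y}
(define inner (With 'y (Id 'x) (Id 'y)))

;; substituting 5 for `x' gives back an expression that is `equal?' to
;; the input: the `x' in the named expression is still free
(equal? (subst/buggy inner 'x (Num 5)) inner) ; => #t
```

The evaluator then reaches the leftover `x' when it evaluates the named
expression of the inner `with', which is where the unexpected free
identifier error comes from.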
We need to recursively substitute in both the with's body expression as
well as its named expression:

  (: subst : WAE Symbol WAE -> WAE)
  ;; substitutes the second argument with the third argument in the
  ;; first argument, as per the rules of substitution; the resulting
  ;; expression contains no free instances of the second argument
  (define (subst expr from to)
    (cases expr
      [(Num n) expr]
      [(Add l r) (Add (subst l from to) (subst r from to))]
      [(Sub l r) (Sub (subst l from to) (subst r from to))]
      [(Mul l r) (Mul (subst l from to) (subst r from to))]
      [(Div l r) (Div (subst l from to) (subst r from to))]
      [(Id name) (if (eq? name from) to expr)]
      [(With bound-id named-expr bound-body)
       (if (eq? bound-id from)
         expr
         (With bound-id
               (subst named-expr from to)      ; <-- new
               (subst bound-body from to)))]))

And *still* we have a problem...  Now it's

  {with {x 5}
    {with {x x}
      x}}

that halts with an error, but we want it to evaluate to 5!  Carefully
trying out our substitution code reveals the problem: when we
substitute `5' for the outer `x', we don't go inside the inner `with'
because it has the same name -- but we *do* need to go into its named
expression.  We need to substitute in the named expression even if the
identifier is the *same* one we're substituting:

  (: subst : WAE Symbol WAE -> WAE)
  ;; substitutes the second argument with the third argument in the
  ;; first argument, as per the rules of substitution; the resulting
  ;; expression contains no free instances of the second argument
  (define (subst expr from to)
    (cases expr
      [(Num n) expr]
      [(Add l r) (Add (subst l from to) (subst r from to))]
      [(Sub l r) (Sub (subst l from to) (subst r from to))]
      [(Mul l r) (Mul (subst l from to) (subst r from to))]
      [(Div l r) (Div (subst l from to) (subst r from to))]
      [(Id name) (if (eq? name from) to expr)]
      [(With bound-id named-expr bound-body)
       (With bound-id
             (subst named-expr from to)
             (if (eq? bound-id from)
               bound-body
               (subst bound-body from to)))]))

The complete (and, finally, correct) version of the code is now:

----------------------------------------------------------------------
#lang pl

#| BNF for the WAE language:
     <WAE> ::= <num>
             | { + <WAE> <WAE> }
             | { - <WAE> <WAE> }
             | { * <WAE> <WAE> }
             | { / <WAE> <WAE> }
             | { with { <id> <WAE> } <WAE> }
             | <id>
|#

;; WAE abstract syntax trees
(define-type WAE
  [Num  Number]
  [Add  WAE WAE]
  [Sub  WAE WAE]
  [Mul  WAE WAE]
  [Div  WAE WAE]
  [Id   Symbol]
  [With Symbol WAE WAE])

(: parse-sexpr : Sexpr -> WAE)
;; to convert s-expressions into WAEs
(define (parse-sexpr sexpr)
  (match sexpr
    [(number: n)    (Num n)]
    [(symbol: name) (Id name)]
    [(cons 'with more)
     (match sexpr
       [(list 'with (list (symbol: name) named) body)
        (With name (parse-sexpr named) (parse-sexpr body))]
       [else (error 'parse-sexpr "bad `with' syntax in ~s" sexpr)])]
    [(list '+ lhs rhs) (Add (parse-sexpr lhs) (parse-sexpr rhs))]
    [(list '- lhs rhs) (Sub (parse-sexpr lhs) (parse-sexpr rhs))]
    [(list '* lhs rhs) (Mul (parse-sexpr lhs) (parse-sexpr rhs))]
    [(list '/ lhs rhs) (Div (parse-sexpr lhs) (parse-sexpr rhs))]
    [else (error 'parse-sexpr "bad syntax in ~s" sexpr)]))

(: parse : String -> WAE)
;; parses a string containing a WAE expression to a WAE AST
(define (parse str)
  (parse-sexpr (string->sexpr str)))

#| Formal specs for `subst':
   (`N' is a <num>, `E1', `E2' are <WAE>s, `x' is some <id>, `y' is a
   *different* <id>)
      N[v/x]                = N
      {+ E1 E2}[v/x]        = {+ E1[v/x] E2[v/x]}
      {- E1 E2}[v/x]        = {- E1[v/x] E2[v/x]}
      {* E1 E2}[v/x]        = {* E1[v/x] E2[v/x]}
      {/ E1 E2}[v/x]        = {/ E1[v/x] E2[v/x]}
      y[v/x]                = y
      x[v/x]                = v
      {with {y E1} E2}[v/x] = {with {y E1[v/x]} E2[v/x]}
      {with {x E1} E2}[v/x] = {with {x E1[v/x]} E2}
|#

(: subst : WAE Symbol WAE -> WAE)
;; substitutes the second argument with the third argument in the
;; first argument, as per the rules of substitution; the resulting
;; expression contains no free instances of the second argument
(define (subst expr from to)
  (cases expr
    [(Num n) expr]
    [(Add l r) (Add (subst l from to) (subst r from to))]
    [(Sub l r) (Sub (subst l from to) (subst r from to))]
    [(Mul l r) (Mul (subst l from to) (subst r from to))]
    [(Div l r) (Div (subst l from to) (subst r from to))]
    [(Id name) (if (eq? name from) to expr)]
    [(With bound-id named-expr bound-body)
     (With bound-id
           (subst named-expr from to)
           (if (eq? bound-id from)
             bound-body
             (subst bound-body from to)))]))

#| Formal specs for `eval':
     eval(N)         = N
     eval({+ E1 E2}) = eval(E1) + eval(E2)
     eval({- E1 E2}) = eval(E1) - eval(E2)
     eval({* E1 E2}) = eval(E1) * eval(E2)
     eval({/ E1 E2}) = eval(E1) / eval(E2)
     eval(id)        = error!
     eval({with {x E1} E2}) = eval(E2[eval(E1)/x])
|#

(: eval : WAE -> Number)
;; evaluates WAE expressions by reducing them to numbers
(define (eval expr)
  (cases expr
    [(Num n) n]
    [(Add l r) (+ (eval l) (eval r))]
    [(Sub l r) (- (eval l) (eval r))]
    [(Mul l r) (* (eval l) (eval r))]
    [(Div l r) (/ (eval l) (eval r))]
    [(With bound-id named-expr bound-body)
     (eval (subst bound-body
                  bound-id
                  (Num (eval named-expr))))]
    [(Id name) (error 'eval "free identifier: ~s" name)]))

(: run : String -> Number)
;; evaluate a WAE program contained in a string
(define (run str)
  (eval (parse str)))

;; tests
(test (run "5") => 5)
(test (run "{+ 5 5}") => 10)
(test (run "{with {x {+ 5 5}} {+ x x}}") => 20)
(test (run "{with {x 5} {+ x x}}") => 10)
(test (run "{with {x {+ 5 5}} {with {y {- x 3}} {+ y y}}}") => 14)
(test (run "{with {x 5} {with {y {- x 3}} {+ y y}}}") => 4)
(test (run "{with {x 5} {+ x {with {x 3} 10}}}") => 15)
(test (run "{with {x 5} {+ x {with {x 3} x}}}") => 8)
(test (run "{with {x 5} {+ x {with {y 3} x}}}") => 10)
(test (run "{with {x 5} {with {y x} y}}") => 5)
(test (run "{with {x 5} {with {x x} x}}") => 5)
(test (run "{with {x 1} y}") =error> "free identifier")
----------------------------------------------------------------------

========================================================================

Reminder:

* We started doing substitution, with a `let'-like form: `with'.

* Reasons for using bindings:
  - Avoid writing expressions twice.
    -> More expressive language (can express identity).
    -> Duplicating is bad!  (=> DRY, Don't Repeat Yourself)
    --> Static redundancy.
  - Avoid redundant computations.
    --> Dynamic redundancy.

* BNF:

    <WAE> ::= <num>
            | { + <WAE> <WAE> }
            | { - <WAE> <WAE> }
            | { * <WAE> <WAE> }
            | { / <WAE> <WAE> }
            | { with { <id> <WAE> } <WAE> }
            | <id>

  Note that we had to introduce two new rules: one for introducing an
  identifier, and one for using it.

* Type definition:

    (define-type WAE
      [Num  Number]
      [Add  WAE WAE]
      [Sub  WAE WAE]
      [Mul  WAE WAE]
      [Div  WAE WAE]
      [Id   Symbol]
      [With Symbol WAE WAE])

* Parser:

    (: parse-sexpr : Sexpr -> WAE)
    ;; to convert s-expressions into WAEs
    (define (parse-sexpr sexpr)
      (match sexpr
        [(number: n)    (Num n)]
        [(symbol: name) (Id name)]
        [(cons 'with more)
         (match sexpr
           [(list 'with (list (symbol: name) named) body)
            (With name (parse-sexpr named) (parse-sexpr body))]
           [else (error 'parse-sexpr "bad `with' syntax in ~s" sexpr)])]
        [(list '+ lhs rhs) (Add (parse-sexpr lhs) (parse-sexpr rhs))]
        [(list '- lhs rhs) (Sub (parse-sexpr lhs) (parse-sexpr rhs))]
        [(list '* lhs rhs) (Mul (parse-sexpr lhs) (parse-sexpr rhs))]
        [(list '/ lhs rhs) (Div (parse-sexpr lhs) (parse-sexpr rhs))]
        [else (error 'parse-sexpr "bad syntax in ~s" sexpr)]))

* We need to define substitution.
  Terms:
  1. Binding Instance.
  2. Scope.
  3. Bound Instance.
  4. Free Instance.

* After lots of attempts:

    e[v/i] -- To substitute an identifier `i' in an expression `e' with
    an expression `v', replace all instances of `i' that are free in
    `e' with the expression `v'.
* Implemented the code, and again, needed to fix a few bugs:

    (: subst : WAE Symbol WAE -> WAE)
    ;; substitutes the second argument with the third argument in the
    ;; first argument, as per the rules of substitution; the resulting
    ;; expression contains no free instances of the second argument
    (define (subst expr from to)
      (cases expr
        [(Num n) expr]
        [(Add l r) (Add (subst l from to) (subst r from to))]
        [(Sub l r) (Sub (subst l from to) (subst r from to))]
        [(Mul l r) (Mul (subst l from to) (subst r from to))]
        [(Div l r) (Div (subst l from to) (subst r from to))]
        [(Id name) (if (eq? name from) to expr)]
        [(With bound-id named-expr bound-body)
         (With bound-id
               (subst named-expr from to)
               (if (eq? bound-id from)
                 bound-body
                 (subst bound-body from to)))]))

  (Note that the bugs that we fixed clarify the exact way that our
  scopes work: in

    {with {x 2} {with {x {+ x 2}} x}}
                         ^^^^^^^
  the marked part is the scope of the first `x'.)

* We then extended the AE evaluation rules:

    eval(...) = ... same as the AE rules ...
    eval({with {x E1} E2}) = eval(E2[eval(E1)/x])
    eval(id) = error!

  and noted the possible type problem.

* The above translated into a Racket definition for an `eval' function
  (with a hack to avoid the type issue):

    (: eval : WAE -> Number)
    ;; evaluates WAE expressions by reducing them to numbers
    (define (eval expr)
      (cases expr
        [(Num n) n]
        [(Add l r) (+ (eval l) (eval r))]
        [(Sub l r) (- (eval l) (eval r))]
        [(Mul l r) (* (eval l) (eval r))]
        [(Div l r) (/ (eval l) (eval r))]
        [(With bound-id named-expr bound-body)
         (eval (subst bound-body
                      bound-id
                      (Num (eval named-expr))))]
        [(Id name) (error 'eval "free identifier: ~s" name)]))

========================================================================
>>> Formal Specs

Note the formal definitions that were included in the WAE code.  They
are ways of describing pieces of our language that are more formal than
plain English, but still not as formal (and as verbose) as the actual
code.
A formal definition of `subst':

(`N' is a <num>, `E1', `E2' are <WAE>s, `x' is some <id>, `y' is a
*different* <id>)

  N[v/x]                = N

  {+ E1 E2}[v/x]        = {+ E1[v/x] E2[v/x]}

  {- E1 E2}[v/x]        = {- E1[v/x] E2[v/x]}

  {* E1 E2}[v/x]        = {* E1[v/x] E2[v/x]}

  {/ E1 E2}[v/x]        = {/ E1[v/x] E2[v/x]}

  y[v/x]                = y
  x[v/x]                = v

  {with {y E1} E2}[v/x] = {with {y E1[v/x]} E2[v/x]}

  {with {x E1} E2}[v/x] = {with {x E1[v/x]} E2}

And a formal definition of `eval':

  eval(N)         = N

  eval({+ E1 E2}) = eval(E1) + eval(E2)

  eval({- E1 E2}) = eval(E1) - eval(E2)

  eval({* E1 E2}) = eval(E1) * eval(E2)

  eval({/ E1 E2}) = eval(E1) / eval(E2)

  eval(id)        = error!

  eval({with {x E1} E2}) = eval(E2[eval(E1)/x])

========================================================================
>>> Lazy vs Eager Evaluation

As we have previously seen, there are two basic approaches for
evaluation: either eager or lazy.  In lazy evaluation, bindings are
used for sort of textual references -- it is only for avoiding writing
an expression twice, but the associated computation is done twice
anyway.  In eager evaluation, we eliminate not only the textual
redundancy, but also the computation.

Which evaluation method did our evaluator use?  The relevant piece of
formalism is the treatment of `with':

  eval({with {x E1} E2}) = eval(E2[eval(E1)/x])

And the matching piece of code is:

  [(With bound-id named-expr bound-body)
   (eval (subst bound-body
                bound-id
                (Num (eval named-expr))))]

How do we make this lazy?
In the formal equation:

  eval({with {x E1} E2}) = eval(E2[E1/x])

and in the code:

  (: eval : WAE -> Number)
  ;; evaluates WAE expressions by reducing them to numbers
  (define (eval expr)
    (cases expr
      [(Num n) n]
      [(Add l r) (+ (eval l) (eval r))]
      [(Sub l r) (- (eval l) (eval r))]
      [(Mul l r) (* (eval l) (eval r))]
      [(Div l r) (/ (eval l) (eval r))]
      [(With bound-id named-expr bound-body)
       (eval (subst bound-body
                    bound-id
                    named-expr))] ; <- no eval and no Num wrapping
      [(Id name) (error 'eval "free identifier: ~s" name)]))

We can verify the way this works by tracing `eval' (compare the trace
you get for the two versions):

  > (trace eval)
  > (run "{with {x {+ 1 2}} {* x x}}")

Ignoring the traces for now, the modified WAE interpreter works as
before, specifically, all tests pass.  So the question is whether the
language we get is actually different than the one we had before.  One
difference is in execution speed, but we can't really notice a
difference, and we care more about meaning.  Is there any program that
will run differently in the two languages?

The main feature of the lazy evaluator is that it is not evaluating the
named expression until it is actually needed.  As we have seen, this
leads to duplicating computations if the bound identifier is used more
than once -- meaning that it does not eliminate the dynamic redundancy.
But what if the bound identifier is not used at all?  In that case the
named expression simply evaporates.  This is a good hint at an
expression that behaves differently in the two languages -- with
division in both languages, we get a different result when we try
running:

  {with {x {/ 8 0}} 7}

The eager evaluator stops with an error when it tries evaluating the
division -- and the lazy evaluator simply ignores it.

Even without division, we get a similar behavior for

  {with {x y} 7}

but it is questionable whether the fact that this evaluates to 7 is
correct behavior -- we really want to forbid programs that use free
variables.
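The difference between the two `with' rules can be tried side by side
with the following sketch.  It is written in plain `#lang racket' so
that it runs without the course language: the structs are hypothetical
stand-ins for the `define-type' declaration, and `eval-wae' with its
`lazy?' flag is a name invented for this illustration, not part of the
course code.

```racket
#lang racket
;; Stand-ins for the WAE syntax (an assumption of this sketch)
(struct Num  (n)               #:transparent)
(struct Add  (l r)             #:transparent)
(struct Sub  (l r)             #:transparent)
(struct Mul  (l r)             #:transparent)
(struct Div  (l r)             #:transparent)
(struct Id   (name)            #:transparent)
(struct With (name named body) #:transparent)

;; subst : WAE Symbol WAE -> WAE
;; follows the same rules as the final `subst' in the notes
(define (subst expr from to)
  (match expr
    [(Num _) expr]
    [(Add l r) (Add (subst l from to) (subst r from to))]
    [(Sub l r) (Sub (subst l from to) (subst r from to))]
    [(Mul l r) (Mul (subst l from to) (subst r from to))]
    [(Div l r) (Div (subst l from to) (subst r from to))]
    [(Id name) (if (eq? name from) to expr)]
    [(With bound-id named-expr bound-body)
     (With bound-id
           (subst named-expr from to)
           (if (eq? bound-id from)
               bound-body
               (subst bound-body from to)))]))

;; eval-wae : WAE Boolean -> Number
;; `lazy?' selects which of the two `with' rules is used
(define (eval-wae expr lazy?)
  (define (ev e) (eval-wae e lazy?))
  (match expr
    [(Num n) n]
    [(Add l r) (+ (ev l) (ev r))]
    [(Sub l r) (- (ev l) (ev r))]
    [(Mul l r) (* (ev l) (ev r))]
    [(Div l r) (/ (ev l) (ev r))]
    [(With bound-id named-expr bound-body)
     (ev (subst bound-body
                bound-id
                (if lazy?
                    named-expr                  ; lazy:  E2[E1/x]
                    (Num (ev named-expr)))))]   ; eager: E2[eval(E1)/x]
    [(Id name) (error 'eval-wae "free identifier: ~s" name)]))

;; {with {x {/ 8 0}} 7}
(define unused-division (With 'x (Div (Num 8) (Num 0)) (Num 7)))

;; lazy: the named expression is never evaluated, the result is 7
(eval-wae unused-division #t)
;; eager: evaluating {/ 8 0} raises Racket's division-by-zero error,
;; which is caught here and reported as a symbol
(with-handlers ([exn:fail:contract:divide-by-zero?
                 (lambda (e) 'division-by-zero)])
  (eval-wae unused-division #f))
```

Keeping both rules in one function makes it clear that the two
languages differ only in the third argument that is passed to `subst'.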
Furthermore, there is an issue with name capturing -- we don't want to
substitute an expression into a context that captures some of its free
variables.  But our substitution allows just that, which is usually not
a problem because by the time we do the substitution, the named
expression should not have free variables that need to be replaced.
However, consider evaluating this program:

  {with {y x}
    {with {x 2}
      {+ x y}}}

under the two evaluation regimens: the eager version stops with an
error, and the lazy version succeeds (the `x' that is substituted for
`y' is captured by the inner `with', so the result is 4).  This points
at a bug in our substitution, or rather at an issue that it does not
deal with because we have not encountered it until now.

So the summary is: as long as the initial program is correct, both
evaluation regimens produce the same results.  If a program contains
free variables, they might get captured in a naive lazy evaluator
implementation (but this is a bug that should be fixed).  Also, there
are some cases where eager evaluation runs into a run-time problem which
does not happen in a lazy evaluator because the expression is not used.
It is possible to prove that when you evaluate an expression, if there
is an error that can be avoided, lazy evaluation will always avoid it,
whereas an eager evaluator will always run into it.  On the other hand,
lazy evaluators are usually slower than eager evaluators, so it's a
speed vs. robustness trade-off.

Note that with lazy evaluation we say that an identifier is bound to an
expression rather than a value.  (Again, this is why the eager version
needed to wrap `eval's result in a `Num' and this one doesn't.)

(It is possible to change things and get a better-behaved substitution:
we basically need to find out whether a capture might happen, and rename
things to avoid it.
For example,

  {with {y E1} E2}[v/x]
    if `x' and `y' are equal
      = {with {y E1[v/x]} E2} = {with {x E1[v/x]} E2}
    if `y' has a free occurrence in `v'
      = {with {y1 E1[v/x]} E2[y1/y][v/x]}   (`y1' is a fresh name)
    otherwise
      = {with {y E1[v/x]} E2[v/x]}

But you can see that this is much more complicated (more code: requires
a `free-in' predicate, being able to invent new "fresh" names, etc).
And it's not even the end of that story...)

========================================================================
>>> de Bruijn Indexes

This whole story revolves around names; specifically, name capture is a
problem that should always be avoided (it is one major source of PL
headaches).

But are names the only way we can use bindings?

There is at least one alternative way: note that the only thing we used
names for is references.  We don't really care what the name is, which
is pretty obvious when we consider the two WAE expressions:

  {with {x 5} {+ x x}}
  {with {y 5} {+ y y}}

or the two Racket function definitions:

  (define (foo x) (list x x))
  (define (foo y) (list y y))

Both of these show a pair of expressions that we should consider as
equal in some sense (this is called "alpha-equality").  The only thing
we care about is what variable points where: the binding structure is
the only thing that matters.  In other words, as long as DrRacket
produces the same arrows when we use Check Syntax, we consider the
program to be the same, regardless of name choices (for argument names
and local names, not for global names like `foo' in the above).

The alternative idea uses this principle: if all we care about is where
the arrows go, then simply get rid of the names...  Instead of
referencing a binding through its name, just specify which of the
surrounding scopes we want to refer to.
For example, instead of:

  {with {x 5} {with {y 6} {+ x y}}}

we can use a new `reference' syntax -- "[N]" -- and use this instead of
the above:

  {with 5 {with 6 {+ [1] [0]}}}

So the rules for [N] are -- [0] is the value bound in the current scope,
[1] is the value from the next one up etc.

Of course, to do this translation, we have to know the precise scope
rules.  Two more complicated examples:

  {with {x 5} {+ x {with {y 6} {+ x y}}}}

is translated to:

  {with 5 {+ [0] {with 6 {+ [1] [0]}}}}

(note how `x' appears as a different reference based on where it
appeared in the original code.)  Even more subtle:

  {with {x 5} {with {y {+ x 1}} {+ x y}}}

is translated to:

  {with 5 {with {+ [0] 1} {+ [1] [0]}}}

because the inner `with' does not have its own named expression in its
scope, so the named expression is immediately in the scope of the outer
`with'.

This is called "de Bruijn Indexes": instead of referencing identifiers
by their name, we use an index into the surrounding binding context.
The major disadvantage, as can be seen in the above examples, is that it
is not convenient for humans to work with.  Specifically, the same
identifier is referenced using different numbers, which makes it hard to
understand what some code is doing.  However, practically all compilers
use this for compiled code (think about stack pointers).  For example,
GCC compiles this code:

  {
    int x = 5;
    {
      int y = x + 1;
      return x + y;
    }
  }

to:

  subl $8, %esp
  movl $5, -4(%ebp)
  movl -4(%ebp), %eax
  incl %eax
  movl %eax, -8(%ebp)
  movl -8(%ebp), %eax
  addl -4(%ebp), %eax

========================================================================
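The translation used in the de Bruijn examples above can be sketched as
a short function.  This is an illustration in plain Racket, not the
course's code: programs are quoted s-expressions, `to-de-bruijn' is a
made-up name, and the [N] reference is written as `(ref N)'.  The
function carries a list of the binders that are in scope, innermost
first, so the index of a name is simply its position in that list.

```racket
#lang racket
;; Sketch only: converts a named WAE s-expression to a nameless one.
;; `scope' holds the identifiers currently in scope, innermost first.
(define (to-de-bruijn expr [scope '()])
  (match expr
    [(? number? n) n]
    [(? symbol? name)
     ;; index-of returns the position of the nearest binder, or #f
     (let ([n (index-of scope name)])
       (if n
           (list 'ref n)
           (error 'to-de-bruijn "free identifier: ~s" name)))]
    [(list 'with (list id named) body)
     (list 'with
           ;; the named expression is outside the scope of `id'
           (to-de-bruijn named scope)
           ;; the body sees `id' as the innermost binding
           (to-de-bruijn body (cons id scope)))]
    [(list op l r)
     (list op (to-de-bruijn l scope) (to-de-bruijn r scope))]))
```

Under these assumptions, `(to-de-bruijn '(with (x 5) (with (y (+ x 1))
(+ x y))))' produces `(with 5 (with (+ (ref 0) 1) (+ (ref 1) (ref 0))))',
and two expressions that differ only in the names they choose translate
to the same result.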