2013-02-27

  - Variable Mutation
  - State and Environments
  - Implementing Objects with State
  - The Toy Language
  - "Compilation" and Partial Evaluation

========================================================================
>>> Variable Mutation

[[[ PLAI Chapter 12 & 13 (different: adds boxes to the language) ]]]
[[[ PLAI Chapter 14 (what we do) ]]]

The code that we now have implements recursion by *changing* bindings,
and to make that possible we made environments hold boxes for all
bindings, therefore bindings are *all* mutable now. We can use this to
add more functionality to our evaluator, by allowing changing any
variable -- we can add a `set!' form:

  {set! <id> <FLANG>}

to the evaluator that will modify the value of a variable. To implement
this functionality, all we need to do is to use `lookup' to retrieve
some box, then evaluate the expression and put the result in that box.
The actual implementation is left as a homework exercise.

One thing that should be considered here is -- all of the expressions in
our language evaluate to some value, the question is what should be the
value of a `set!' expression? There are three obvious choices:

  1. return some bogus value,
  2. return the value that was assigned,
  3. return the value that was previously in the box.

Each one of these has its own advantage -- for example, C uses the
second option to `chain' assignments (eg, "x = y = 0") and to allow side
effects where an expression is expected (eg, "while (x = x-1) ...").

The third one is useful in cases where you might use the old value that
is overwritten -- for example, if C had this behavior, you could `pop' a
value from a linked list using something like:

  first(l = rest(l));

because the argument to `first' will be the old value of `l', before it
changed to be its `rest'. You could also swap two variables in a single
expression: "x = y = x".

(Note that the expression "x = x + 1" has the meaning of C's "++x" when
option (2) is used, and "x++" when option (3) is used.)
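The three choices can be sketched in plain Racket using boxes. This is
only an illustration: the helper names (`assign/void!', `assign/new!',
`assign/old!') are hypothetical, and this is not the FLANG `set!'
implementation, which remains the homework exercise.

```racket
#lang racket

;; Hypothetical helpers, one per option, operating on a Racket box.

;; Option 1: return a bogus value (here, the void value).
(define (assign/void! b v)
  (set-box! b v))

;; Option 2: return the value that was assigned.
(define (assign/new! b v)
  (set-box! b v)
  v)

;; Option 3: return the value that was previously in the box.
(define (assign/old! b v)
  (let ([old (unbox b)])
    (set-box! b v)
    old))

;; The "x = y = x" swap under option 3: the inner assignment stores
;; x's value in y and returns y's old value, which is then stored in x.
(define x (box 1))
(define y (box 2))
(assign/void! x (assign/old! y (unbox x)))
(list (unbox x) (unbox y)) ; => '(2 1)
```

With option 2 in place of option 3 the same expression would leave both
boxes holding 1, which is why the swap trick depends on the choice.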
Racket chooses the first option, and we will do the same in our
language. The advantage here is that you get no discounts, therefore you
must be explicit about what values you want to return in situations
where there is no obvious choice. This leads to more robust programs
since you do not get other programmers that will rely on a feature of
your code that you did not plan on.

In any case, the modification that introduces mutation is small, but it
has a tremendous effect on our language: it was true for Racket, and it
is true for FLANG. We have seen how mutation affects the language subset
that we use, and in the extension of our FLANG the effect is even
stronger, since *any* variable can change (there is no need for an
explicit `box' value). In other words, a binding is not always the
same -- it can change as a result of a `set!' expression. Of course, we
could extend our language with boxes (using Racket boxes to implement
FLANG boxes), but that will be a little more verbose.

(Note that Racket does have a `set!' form, and fields in structs can
also be made modifiable. However, we do not use any of these. At least
not for now.)

========================================================================
>>> State and Environments

A quick example of how mutation can be used:

  (define counter
    (let ([counter (box 0)])
      (lambda ()
        (set-box! counter (+ 1 (unbox counter)))
        (unbox counter))))

and compare that to:

  (define (make-counter)
    (let ([counter (box 0)])
      (lambda ()
        (set-box! counter (+ 1 (unbox counter)))
        (unbox counter))))

It is a good idea if you follow the exact evaluation of

  (define foo (make-counter))
  (define bar (make-counter))

and see how both bindings have separate environments so each one gets
its own private state. The equivalent code in the homework interpreter
extended with `set!' doesn't need boxes:

  {with {make-counter
         {fun {}
           {with {counter 0}
             {fun {}
               {set!
                 counter {+ counter 1}}
               counter}}}}
    {with {foo {call make-counter}}
      {with {bar {call make-counter}}
        ...}}}

(To see multiple values from a single expression you can extend the
language with a `list' binding.)

Note that we cannot describe this behavior with substitution rules! We
now use the environments to make it possible to change bindings -- so
finally an environment is actually an environment rather than a
substitution cache.

When you look at the above, note that we still use lexical scope -- in
fact, the local binding is actually a private state that nobody can
access. For example, if we write this:

  (define counter
    (let ([counter (box 0)])
      (lambda ()
        (set-box! counter (+ 1 (unbox counter)))
        (if (zero? (modulo (unbox counter) 4)) 'tock 'tick))))

then the resulting function that is bound to `counter' keeps a local
integer state which no other code can access -- you cannot modify it,
reset it, or even know if it is really an integer that is used in there.

========================================================================
>>> Implementing Objects with State

We have already seen how several pieces of information can be
encapsulated in a Racket closure that keeps them all; now we can do a
little more -- we can actually have mutable state, which leads to a
natural way to implement objects. For example:

  (define (make-point x y)
    (let ([xb (box x)]
          [yb (box y)])
      (lambda (msg)
        (match msg
          ['getx (unbox xb)]
          ['gety (unbox yb)]
          ['incx (set-box! xb (add1 (unbox xb)))]))))

implements a constructor for `point' objects which keep two values and
can move one of them. Note that the messages act as a form of methods,
and that the values themselves are hidden and are accessible only
through the interface that these messages make.
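A short usage sketch of this constructor follows. It repeats the
`make-point' definition so that it can be run on its own, and it assumes
plain `#lang racket' rather than the course language; the names `p' and
`q' are made up for the illustration.

```racket
#lang racket

;; Same definition as in the text, repeated to keep the sketch runnable.
(define (make-point x y)
  (let ([xb (box x)]
        [yb (box y)])
    (lambda (msg)
      (match msg
        ['getx (unbox xb)]
        ['gety (unbox yb)]
        ['incx (set-box! xb (add1 (unbox xb)))]))))

;; Two points, each closing over its own pair of boxes.
(define p (make-point 1 2))
(define q (make-point 10 20))

(p 'incx)   ; mutates only p's private x box
(p 'getx)   ; => 2
(p 'gety)   ; => 2
(q 'getx)   ; => 10, q is unaffected
```

Sending a message that is not listed (for example `(p 'reset)') raises a
`match' error, since nothing outside the closure can reach the boxes.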
For example, if these points correspond to some graphic object on the
screen, we can easily incorporate a necessary screen update:

  (define (make-point x y)
    (let ([xb (box x)]
          [yb (box y)])
      (lambda (msg)
        (match msg
          ['getx (unbox xb)]
          ['gety (unbox yb)]
          ['incx (set-box! xb (add1 (unbox xb)))
                 (update-screen)]))))

and be sure that this is always done when the value changes -- since
there is no way to change the value except through this interface.

A more complete example would define functions that actually send these
messages -- here is a better implementation of a point object and the
corresponding accessors and mutators:

  (define (make-point x y)
    (let ([xb (box x)]
          [yb (box y)])
      (lambda (msg)
        (match msg
          ['getx (unbox xb)]
          ['gety (unbox yb)]
          [(list 'setx newx)
           (set-box! xb newx)
           (update-screen)]
          [(list 'sety newy)
           (set-box! yb newy)
           (update-screen)]))))
  (define (point-x p) (p 'getx))
  (define (point-y p) (p 'gety))
  (define (set-point-x! p x) (p (list 'setx x)))
  (define (set-point-y! p y) (p (list 'sety y)))

And a quick imitation of inheritance can be achieved using delegation to
an instance of the super-class:

  (define (make-colored-point x y color)
    (let ([p (make-point x y)])
      (lambda (msg)
        (match msg
          ['getcolor color]
          [else (p msg)]))))

You can see how all of these could come from some preprocessing of a
more normal-looking class definition form, like:

  (defclass point (x y)
    (public (getx) x)
    (public (gety) y)
    (public (setx newx) (set! x newx))
    (public (sety newy) (set! y newy)))

  (defclass colored-point point (c)
    (public (getcolor) c))

========================================================================
>>> The Toy Language

[[[ Not in PLAI ]]]

A quick note: from now on, we will work with a variation of our
language -- it will change the syntax to look a little more like Racket,
and we will use Racket values for values in our language and Racket
functions for built-ins in our language.
Main highlights:

* There can be multiple bindings in function arguments and local `bind'
  forms -- the names are required to be distinct.

* There are now a few keywords like `bind' that are parsed in a special
  way. Other forms are taken as function application, which means that
  there are no special parse rules (and AST entries) for arithmetic
  functions. They're now bindings in the global environment, and
  treated in the same way as all bindings. For example, `*' is an
  expression that evaluates to the "primitive" multiplication function,
  and {bind {{+ *}} {+ 2 3}} evaluates to 6.

* Since function applications are now the same for primitive functions
  and user-bound functions, there is no need for a `call' keyword. Note
  that the function call part of the parser must be last, since it
  should apply only if the input is not some other known form.

* Note the use of `make-untyped-list-function': it's a library function
  (included in the course language) that can convert a few known Racket
  functions to a function that consumes a list of *any* Racket values,
  and returns the result of applying the given Racket function on these
  values. For example:

    (define add (make-untyped-list-function +))
    (add (list 1 2 3 4))

  evaluates to 10.

* Another important aspect of this is its type -- the type of `add' in
  the previous example is (List -> Any), so the resulting function can
  consume *any* input values. If it gets a bad value, it will throw an
  appropriate error. This is a hack: it basically means that the
  resulting `add' function has a very generic type (requiring just a
  list), so errors can be thrown at run-time. However, in this case, a
  better solution is not going to make these run-time errors go away
  because the language that we're implementing is not statically typed.

* The benefit of this is that we can avoid the hassle of more verbose
  code by letting these functions dynamically check the input values,
  so we can use a single `RktV' variant in `VAL' which wraps any Racket
  value.
  (Otherwise we'd need different wrappers for different types, and
  implement these dynamic checks.)

The following is the complete implementation.

---<<>>----------------------------------------------------------
  #lang pl

  ;;; ==================================================================
  ;;; Syntax

  #| The BNF:
     <TOY> ::= <num>
             | <id>
             | { bind {{ <id> <TOY> } ... } <TOY> }
             | { fun { <id> ... } <TOY> }
             | { if <TOY> <TOY> <TOY> }
             | { <TOY> <TOY> ... }
  |#

  ;; A matching abstract syntax tree datatype:
  (define-type TOY
    [Num  Number]
    [Id   Symbol]
    [Bind (Listof Symbol) (Listof TOY) TOY]
    [Fun  (Listof Symbol) TOY]
    [Call TOY (Listof TOY)]
    [If   TOY TOY TOY])

  (: unique-list? : (Listof Any) -> Boolean)
  ;; Tests whether a list is unique, used to guard Bind and Fun values.
  (define (unique-list? xs)
    (or (null? xs)
        (and (not (member (first xs) (rest xs)))
             (unique-list? (rest xs)))))

  (: parse-sexpr : Sexpr -> TOY)
  ;; to convert s-expressions into TOYs
  (define (parse-sexpr sexpr)
    (match sexpr
      [(number: n)    (Num n)]
      [(symbol: name) (Id name)]
      [(cons 'bind more)
       (match sexpr
         [(list 'bind (list (list (symbol: names) (sexpr: nameds))
                            ...)
            body)
          (if (unique-list? names)
            (Bind names (map parse-sexpr nameds) (parse-sexpr body))
            (error 'parse-sexpr
                   "`bind' got duplicate names: ~s" names))]
         [else (error 'parse-sexpr
                      "bad `bind' syntax in ~s" sexpr)])]
      [(cons 'fun more)
       (match sexpr
         [(list 'fun (list (symbol: names) ...) body)
          (if (unique-list? names)
            (Fun names (parse-sexpr body))
            (error 'parse-sexpr
                   "`fun' got duplicate names: ~s" names))]
         [else (error 'parse-sexpr
                      "bad `fun' syntax in ~s" sexpr)])]
      [(cons 'if more)
       (match sexpr
         [(list 'if cond then else)
          (If (parse-sexpr cond)
              (parse-sexpr then)
              (parse-sexpr else))]
         [else (error 'parse-sexpr "bad `if' syntax in ~s" sexpr)])]
      [(list fun (sexpr: args) ...) ; other lists are applications
       (Call (parse-sexpr fun)
             (map parse-sexpr args))]
      [else (error 'parse-sexpr "bad syntax in ~s" sexpr)]))

  (: parse : String -> TOY)
  ;; Parses a string containing a TOY expression to a TOY AST.
  (define (parse str)
    (parse-sexpr (string->sexpr str)))

  ;;; ==================================================================
  ;;; Values and environments

  (define-type ENV
    [EmptyEnv]
    [FrameEnv FRAME ENV])

  (define-type VAL
    [RktV  Any]
    [FunV  (Listof Symbol) TOY ENV]
    [PrimV ((Listof VAL) -> VAL)])

  ;; a frame is an association list of names and values.
  (define-type FRAME = (Listof (List Symbol VAL)))

  (: extend : (Listof Symbol) (Listof VAL) ENV -> ENV)
  ;; extends an environment with a new frame.
  (define (extend names values env)
    (if (= (length names) (length values))
      (FrameEnv (map (lambda: ([name : Symbol] [val : VAL])
                       (list name val))
                     names values)
                env)
      (error 'extend "arity mismatch for names: ~s" names)))

  (: lookup : Symbol ENV -> VAL)
  ;; looks for a name in an environment, searching through each frame.
  (define (lookup name env)
    (cases env
      [(EmptyEnv) (error 'lookup "no binding for ~s" name)]
      [(FrameEnv frame rest)
       (let ([cell (assq name frame)])
         (if cell
           (second cell)
           (lookup name rest)))]))

  (: racket-func->prim-val : Function -> VAL)
  ;; converts a racket function to a primitive evaluator function which
  ;; is a PrimV holding a ((Listof VAL) -> VAL) function. (the
  ;; resulting function will use the list function as is, and it is the
  ;; list function's responsibility to throw an error if it's given a
  ;; bad number of arguments or bad input types.)
  (define (racket-func->prim-val racket-func)
    (let ([list-func (make-untyped-list-function racket-func)])
      (PrimV (lambda (args)
               (let ([args (map (lambda: ([a : VAL])
                                  (cases a
                                    [(RktV v) v]
                                    [else (error 'racket-func
                                                 "bad input: ~s" a)]))
                                args)])
                 (RktV (list-func args)))))))

  ;; The global environment has a few primitives:
  (: global-environment : ENV)
  (define global-environment
    (FrameEnv (list (list '+ (racket-func->prim-val +))
                    (list '- (racket-func->prim-val -))
                    (list '* (racket-func->prim-val *))
                    (list '/ (racket-func->prim-val /))
                    (list '< (racket-func->prim-val <))
                    (list '> (racket-func->prim-val >))
                    (list '= (racket-func->prim-val =))
                    ;; values
                    (list 'true  (RktV #t))
                    (list 'false (RktV #f)))
              (EmptyEnv)))

  ;;; ==================================================================
  ;;; Evaluation

  (: eval : TOY ENV -> VAL)
  ;; evaluates TOY expressions.
  (define (eval expr env)
    ;; convenient helper
    (: eval* : TOY -> VAL)
    (define (eval* expr) (eval expr env))
    (cases expr
      [(Num n)   (RktV n)]
      [(Id name) (lookup name env)]
      [(Bind names exprs bound-body)
       (eval bound-body (extend names (map eval* exprs) env))]
      [(Fun names bound-body)
       (FunV names bound-body env)]
      [(Call fun-expr arg-exprs)
       (let ([fval (eval* fun-expr)]
             [arg-vals (map eval* arg-exprs)])
         (cases fval
           [(PrimV proc) (proc arg-vals)]
           [(FunV names body fun-env)
            (eval body (extend names arg-vals fun-env))]
           [else (error 'eval "function call with a non-function: ~s"
                        fval)]))]
      [(If cond-expr then-expr else-expr)
       (eval* (if (cases (eval* cond-expr)
                    [(RktV v) v] ; Racket value => use as boolean
                    [else #t])   ; other values are always true
                then-expr
                else-expr))]))

  (: run : String -> Any)
  ;; evaluate a TOY program contained in a string
  (define (run str)
    (let ([result (eval (parse str) global-environment)])
      (cases result
        [(RktV v) v]
        [else (error 'run
                     "evaluation returned a bad value: ~s" result)])))

  ;;; ==================================================================
  ;;; Tests

  (test (run "{{fun {x} {+ x 1}} 4}")
        => 5)
  (test (run "{bind {{add3 {fun {x} {+ x 3}}}} {add3 1}}")
        => 4)
  (test (run "{bind {{add3 {fun {x} {+ x 3}}}
                     {add1 {fun {x} {+ x 1}}}}
                {bind {{x 3}} {add1 {add3 x}}}}")
        => 7)
  (test (run "{bind {{identity {fun {x} x}}
                     {foo {fun {x} {+ x 1}}}}
                {{identity foo} 123}}")
        => 124)
  (test (run "{bind {{x 3}}
                {bind {{f {fun {y} {+ x y}}}}
                  {bind {{x 5}}
                    {f 4}}}}")
        => 7)
  (test (run "{{{fun {x} {x 1}}
                {fun {x} {fun {y} {+ x y}}}}
               123}")
        => 124)

  ;; More tests for complete coverage
  (test (run "{bind x 5 x}")      =error> "bad `bind' syntax")
  (test (run "{fun x x}")         =error> "bad `fun' syntax")
  (test (run "{if x}")            =error> "bad `if' syntax")
  (test (run "{}")                =error> "bad syntax")
  (test (run "{bind {{x 5} {x 5}} x}") =error> "bind* duplicate names")
  (test (run "{fun {x x} x}")     =error> "fun* duplicate names")
  (test (run "{+ x 1}")           =error> "no binding for")
  (test (run "{+ 1 {fun {x} x}}") =error> "bad input")
  (test (run "{1 2}")             =error> "with a non-function")
  (test (run "{{fun {x} x}}")     =error> "arity mismatch")
  (test (run "{if {< 4 5} 6 7}")  => 6)
  (test (run "{if {< 5 4} 6 7}")  => 7)
  (test (run "{if + 6 7}")        => 6)
  (test (run "{fun {x} x}")       =error> "returned a bad value")

  ;;; ==================================================================
----------------------------------------------------------------------

========================================================================
>>> "Compilation" and Partial Evaluation

Instead of interpreting an expression, which is performing a full
evaluation, we can think about "compiling" it, which is translating it
to a different language which we can later run more easily. Another
feature that is usually associated with compilation is that a lot more
work was done at the compilation stage, making the actual running of the
code faster.
For example, translating an AST into one that has de-Bruijn indexes
instead of identifier names is a form of compilation -- not only is it
translating one language into another, it does the work involved in name
lookup before the program starts running.

This is something that we can experiment with now. An easy way to
achieve this is to start with our evaluation function:

  (: eval : TOY ENV -> VAL)
  ;; evaluates TOY expressions.
  (define (eval expr env)
    ;; convenient helper
    (: eval* : TOY -> VAL)
    (define (eval* expr) (eval expr env))
    (cases expr
      [(Num n)   (RktV n)]
      [(Id name) (lookup name env)]
      [(Bind names exprs bound-body)
       (eval bound-body (extend names (map eval* exprs) env))]
      [(Fun names bound-body)
       (FunV names bound-body env)]
      [(Call fun-expr arg-exprs)
       (let ([fval (eval* fun-expr)]
             [arg-vals (map eval* arg-exprs)])
         (cases fval
           [(PrimV proc) (proc arg-vals)]
           [(FunV names body fun-env)
            (eval body (extend names arg-vals fun-env))]
           [else (error 'eval "function call with a non-function: ~s"
                        fval)]))]
      [(If cond-expr then-expr else-expr)
       (eval* (if (cases (eval* cond-expr)
                    [(RktV v) v] ; Racket value => use as boolean
                    [else #t])   ; other values are always true
                then-expr
                else-expr))]))

and change it so it compiles a given expression to a Racket function.
(This is, of course, just to demonstrate a conceptual point, it is only
the tip of what compilers actually do...) This means that we need to
turn it into a function that receives a TOY expression and compiles it.
In other words, `eval' no longer consumes an environment argument, which
makes sense because the environment is a place to hold run-time values,
so it is a data structure that is not part of the compiler (it is
usually represented as the call stack).
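The separation of a compile-time input from a run-time input can be seen
on a much smaller function than `eval'. The following is a minimal
sketch in plain `#lang racket' (an assumption; the notes use the course
language), with hypothetical names `power' and `compile-power' that are
not part of the TOY code. The exponent plays the role of the expression
that is known ahead of time, and the base plays the role of the run-time
environment.

```racket
#lang racket

;; Interpreter style: both inputs arrive together, so the recursion
;; over n is repeated on every call.
(define (power n x)
  (if (zero? n)
      1
      (* x (power (sub1 n) x))))

;; "Compiler" style: the recursion over n happens once, when
;; compile-power is called, and the result is a closure that only
;; needs the run-time input x.
(define (compile-power n)
  (if (zero? n)
      (lambda (x) 1)
      (let ([tail (compile-power (sub1 n))]) ; done at "compile time"
        (lambda (x) (* x (tail x))))))

(define cube (compile-power 3)) ; "compile" once
(cube 2)                        ; => 8
(cube 5)                        ; => 125
```

Note that the recursive call to `compile-power' sits outside the inner
`lambda'. If it were moved inside, the code would still be curried, but
no work would actually be done ahead of time.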
So we split the two arguments into a compile-time one and a run-time
one, which can be done by simply currying the `eval' function -- here
this is done, and all calls to `eval' are also curried:

  (: eval : TOY -> ENV -> VAL) ; <-- note the curried type
  ;; evaluates TOY expressions.
  (define (eval expr)
    (lambda (env)
      ;; convenient helper
      (: eval* : TOY -> VAL)
      (define (eval* expr) ((eval expr) env))
      (cases expr
        [(Num n)   (RktV n)]
        [(Id name) (lookup name env)]
        [(Bind names exprs bound-body)
         ((eval bound-body) (extend names (map eval* exprs) env))]
        [(Fun names bound-body)
         (FunV names bound-body env)]
        [(Call fun-expr arg-exprs)
         (let ([fval (eval* fun-expr)]
               [arg-vals (map eval* arg-exprs)])
           (cases fval
             [(PrimV proc) (proc arg-vals)]
             [(FunV names body fun-env)
              ((eval body) (extend names arg-vals fun-env))]
             [else (error 'eval "function call with a non-function: ~s"
                          fval)]))]
        [(If cond-expr then-expr else-expr)
         (eval* (if (cases (eval* cond-expr)
                      [(RktV v) v] ; Racket value => use as boolean
                      [else #t])   ; other values are always true
                  then-expr
                  else-expr))])))

We also need to change the `eval' call in the main `run' function:

  (: run : String -> Any)
  ;; evaluate a TOY program contained in a string
  (define (run str)
    (let ([result ((eval (parse str)) global-environment)])
      (cases result
        [(RktV v) v]
        [else (error 'run
                     "evaluation returned a bad value: ~s" result)])))

Not much has changed so far.

Note that in the general case of a compiler we need to run a program
several times, so we'd want to avoid parsing it over and over again. We
can do that by keeping a single parsed AST of the input.
Now we went one step further by making it possible to do more work ahead
and keep the result of the first "stage" of eval around (except that
"more work" is really not saying much at the moment):

  (: run : String -> Any)
  ;; evaluate a TOY program contained in a string
  (define (run str)
    (let* ([compiled (eval (parse str))]
           [result   (compiled global-environment)])
      (cases result
        [(RktV v) v]
        [else (error 'run
                     "evaluation returned a bad value: ~s" result)])))

At this point, even though our "compiler" is not much more than a
slightly different representation of the same functionality, we rename
`eval' to `compile' which is a more appropriate description of what we
intend it to do (so we change the purpose statement too):

  (: compile : TOY -> ENV -> VAL)
  ;; compiles TOY expressions to Racket functions.
  (define (compile expr)
    (lambda (env)
      (: compile* : TOY -> VAL)
      (define (compile* expr) ((compile expr) env))
      (cases expr
        [(Num n)   (RktV n)]
        [(Id name) (lookup name env)]
        [(Bind names exprs bound-body)
         ((compile bound-body)
          (extend names (map compile* exprs) env))]
        [(Fun names bound-body)
         (FunV names bound-body env)]
        [(Call fun-expr arg-exprs)
         (let ([fval (compile* fun-expr)]
               [arg-vals (map compile* arg-exprs)])
           (cases fval
             [(PrimV proc) (proc arg-vals)]
             [(FunV names body fun-env)
              ((compile body) (extend names arg-vals fun-env))]
             [else (error 'call ; this is *not* a compilation error
                          "function call with a non-function: ~s"
                          fval)]))]
        [(If cond-expr then-expr else-expr)
         (compile* (if (cases (compile* cond-expr)
                         [(RktV v) v] ; Racket value => use as boolean
                         [else #t])   ; other values are always true
                     then-expr
                     else-expr))])))

  (: run : String -> Any)
  ;; evaluate a TOY program contained in a string
  (define (run str)
    (let* ([compiled (compile (parse str))]
           [result   (compiled global-environment)])
      (cases result
        [(RktV v) v]
        [else (error 'run
                     "evaluation returned a bad value: ~s" result)])))

========================================================================