2010-03-12 - The Toy Language - "Compilation" and Partial Evaluation ======================================================================== >>> The Toy Language A quick note: from now on, we will work with a variation of our language -- it will change the syntax to look a little more like Scheme, and we will use Scheme values for values in our language and Scheme functions for built-ins in our language. Main highlights: * There can be multiple bindings in function arguments and local `bind' forms -- the names are required to be distinct. * There are now a few keywords like `bind' that are parsed in a special way. Other forms are taken as function application, which means that there are no special parse rules (and AST entries) for arithmetic functions. They're now bindings in the global environment, and treated in the same way as all bindings. For example, `*' is an expression that evaluates to the "primitive" multiplication function, and {bind {{+ *}} {+ 2 3}} evaluates to 6. * Since function applications are now the same for primitive functions and user-bound functions, there is no need for a `call' keyword. * Note the use of `make-untyped-list-function': it's a library function (included in the course language) that can convert a few known Scheme functions to a function that consumes a list of *any* Scheme values, and returns the result of applying the given Scheme function on these values. For example: (define add (make-untyped-list-function +)) (add (list 1 2 3 4)) evaluates to 10. * Another important aspect of this is its type -- the type of `add' in the previous example is (List -> Any), so the resulting function can consume *any* input values. If it gets a bad value, it will throw an appropriate error. This is a hack: it basically means that the resulting `add' function has a very generic type (requiring just a list), so errors can be thrown at run-time. However, in this case, a better solution is not going to make these run-time errors go away because the language that we're implementing is not statically typed. * The benefit of this is that we can avoid the hassle of more verbose code by letting these functions dynamically check the input values, so we can use a single `ScmV' variant in `VAL' which wraps any Scheme value. (Otherwise we'd need different wrappers for different types, and implement these dynamic checks.) The following is the complete implementation. ---<<>>---------------------------------------------------------- #lang pl ;;; ================================================================== ;;; Syntax #| The BNF: ::= | | { bind {{ } ... } } | { fun { ... } } | { if } | { ... } |# ;; A matching abstract syntax tree datatype: (define-type TOY [Num (n Number)] [Id (name Symbol)] [Bind (names (Listof Symbol)) (exprs (Listof TOY)) (body TOY)] [Fun (names (Listof Symbol)) (body TOY)] [Call (fun-expr TOY) (arg-exprs (Listof TOY))] [If (cond-expr TOY) (then-expr TOY) (else-expr TOY)]) (: unique-list? : (Listof Any) -> Boolean) ;; Tests whether a list is unique, used to guard Bind and Fun values. (define (unique-list? xs) (or (null? xs) (and (not (member (first xs) (rest xs))) (unique-list? (rest xs))))) (: parse-sexpr : Sexpr -> TOY) ;; to convert s-expressions into TOYs (define (parse-sexpr sexpr) (match sexpr [(number: n) (Num n)] [(symbol: name) (Id name)] [(cons 'bind more) (match sexpr [(list 'bind (list (list (symbol: names) (sexpr: nameds)) ...) body) (if (unique-list? names) (Bind names (map parse-sexpr nameds) (parse-sexpr body)) (error 'parse-sexpr "`bind' got duplicate names: ~s" names))] [else (error 'parse-sexpr "bad `bind' syntax in ~s" sexpr)])] [(cons 'fun more) (match sexpr [(list 'fun (list (symbol: names) ...) body) (if (unique-list? names) (Fun names (parse-sexpr body)) (error 'parse-sexpr "`fun' got duplicate names: ~s" names))] [else (error 'parse-sexpr "bad `fun' syntax in ~s" sexpr)])] [(cons 'if more) (match sexpr [(list 'if cond then else) (If (parse-sexpr cond) (parse-sexpr then) (parse-sexpr else))] [else (error 'parse-sexpr "bad `if' syntax in ~s" sexpr)])] [(list fun (sexpr: args) ...) ; other lists are applications (Call (parse-sexpr fun) (map parse-sexpr args))] [else (error 'parse-sexpr "bad syntax in ~s" sexpr)])) (: parse : String -> TOY) ;; Parses a string containing an TOY expression to a TOY AST. (define (parse str) (parse-sexpr (string->sexpr str))) ;;; ================================================================== ;;; Values and environments (define-type ENV [EmptyEnv] [FrameEnv (frame FRAME) (rest ENV)]) (define-type VAL [ScmV (x Any)] [FunV (names (Listof Symbol)) (body TOY) (env ENV)] [PrimV (prim ((Listof VAL) -> VAL))]) ;; a frame is an association list of names and values. (define-type FRAME = (Listof (List Symbol VAL))) (: extend : (Listof Symbol) (Listof VAL) ENV -> ENV) ;; extends an environment with a new frame. (define (extend names values env) (if (= (length names) (length values)) (FrameEnv (map (lambda: ([name : Symbol] [val : VAL]) (list name val)) names values) env) (error 'extend "arity mismatch for names: ~s" names))) (: lookup : Symbol ENV -> VAL) ;; looks a name in an environment, searching through each frame. (define (lookup name env) (cases env [(EmptyEnv) (error 'lookup "no binding for ~s" name)] [(FrameEnv frame rest) (let ([cell (assq name frame)]) (if cell (second cell) (lookup name rest)))])) (: scheme-func->prim-val : Function -> VAL) ;; converts a scheme function to a primitive evaluator function which ;; is a PrimV holding a ((Listof VAL) -> VAL) procedure. (the ;; resulting procedure will use the list function as is, and it is the ;; list function's responsibility to throw an error if it's given a ;; bad number of arguments or bad input types.) (define (scheme-func->prim-val scheme-func) (let ([list-func (make-untyped-list-function scheme-func)]) (PrimV (lambda: ([args : (Listof VAL)]) (let ([args (map (lambda: ([a : VAL]) (cases a [(ScmV v) v] [else (error 'scheme-func "bad input: ~s" a)])) args)]) (ScmV (list-func args))))))) ;; The global environment has a few primitives: (: global-environment : ENV) (define global-environment (FrameEnv (list (list '+ (scheme-func->prim-val +)) (list '- (scheme-func->prim-val -)) (list '* (scheme-func->prim-val *)) (list '/ (scheme-func->prim-val /)) (list '< (scheme-func->prim-val <)) (list '> (scheme-func->prim-val >)) (list '= (scheme-func->prim-val =)) ;; values (list 'true (ScmV #t)) (list 'false (ScmV #f))) (EmptyEnv))) ;;; ================================================================== ;;; Evaluation (: eval : TOY ENV -> VAL) ;; evaluates TOY expressions. (define (eval expr env) ;; convenient helper (: eval* : TOY -> VAL) (define (eval* expr) (eval expr env)) (cases expr [(Num n) (ScmV n)] [(Id name) (lookup name env)] [(Bind names exprs bound-body) (eval bound-body (extend names (map eval* exprs) env))] [(Fun names bound-body) (FunV names bound-body env)] [(Call fun-expr arg-exprs) (let ([fval (eval* fun-expr)] [arg-vals (map eval* arg-exprs)]) (cases fval [(PrimV proc) (proc arg-vals)] [(FunV names body fun-env) (eval body (extend names arg-vals fun-env))] [else (error 'eval "function call with a non-function: ~s" fval)]))] [(If cond-expr then-expr else-expr) (eval* (if (cases (eval* cond-expr) [(ScmV v) v] ; Scheme value => use as boolean [else #t]) ; other values are always true then-expr else-expr))])) (: run : String -> Any) ;; evaluate a TOY program contained in a string (define (run str) (let ([result (eval (parse str) global-environment)]) (cases result [(ScmV v) v] [else (error 'run "evaluation returned a bad value: ~s" result)]))) ;;; ================================================================== ;;; Tests (test (run "{{fun {x} {+ x 1}} 4}") => 5) (test (run "{bind {{add3 {fun {x} {+ x 3}}}} {add3 1}}") => 4) (test (run "{bind {{add3 {fun {x} {+ x 3}}} {add1 {fun {x} {+ x 1}}}} {bind {{x 3}} {add1 {add3 x}}}}") => 7) (test (run "{bind {{identity {fun {x} x}} {foo {fun {x} {+ x 1}}}} {{identity foo} 123}}") => 124) (test (run "{bind {{x 3}} {bind {{f {fun {y} {+ x y}}}} {bind {{x 5}} {f 4}}}}") => 7) (test (run "{{{fun {x} {x 1}} {fun {x} {fun {y} {+ x y}}}} 123}") => 124) ;; More tests for complete coverage (test (run "{bind x 5 x}") =error> "bad `bind' syntax") (test (run "{fun x x}") =error> "bad `fun' syntax") (test (run "{if x}") =error> "bad `if' syntax") (test (run "{}") =error> "bad syntax") (test (run "{bind {{x 5} {x 5}} x}") =error> "bind* duplicate names") (test (run "{fun {x x} x}") =error> "fun* duplicate names") (test (run "{+ x 1}") =error> "no binding for") (test (run "{+ 1 {fun {x} x}}") =error> "bad input") (test (run "{+ 1 {fun {x} x}}") =error> "bad input") (test (run "{1 2}") =error> "with a non-function") (test (run "{{fun {x} x}}") =error> "arity mismatch") (test (run "{if {< 4 5} 6 7}") => 6) (test (run "{if {< 5 4} 6 7}") => 7) (test (run "{if + 6 7}") => 6) (test (run "{fun {x} x}") =error> "returned a bad value") ;;; ================================================================== ---------------------------------------------------------------------- ======================================================================== >>> "Compilation" and Partial Evaluation Instead of interpreting an expression, which is performing a full evaluation, we can think about "compiling" it, which is translating it to a different language which we can later run more easily. Another feature that is usually associated with compilation is that a lot more work was done at the compilation stage, making the actual running of the code faster. For example, translating an AST into one that has de-Bruijn indices instead of identifier names is a form of compilation -- not only is it translating one language into another, it does the work involved in name lookup before the program starts running. This is something that we can experiment with now. An easy way to achieve this is to start with our evaluation function: (: eval : TOY ENV -> VAL) ;; evaluates TOY expressions. (define (eval expr env) ;; convenient helper (: eval* : TOY -> VAL) (define (eval* expr) (eval expr env)) (cases expr [(Num n) (ScmV n)] [(Id name) (lookup name env)] [(Bind names exprs bound-body) (eval bound-body (extend names (map eval* exprs) env))] [(Fun names bound-body) (FunV names bound-body env)] [(Call fun-expr arg-exprs) (let ([fval (eval* fun-expr)] [arg-vals (map eval* arg-exprs)]) (cases fval [(PrimV proc) (proc arg-vals)] [(FunV names body fun-env) (eval body (extend names arg-vals fun-env))] [else (error 'eval "function call with a non-function: ~s" fval)]))] [(If cond-expr then-expr else-expr) (eval* (if (cases (eval* cond-expr) [(ScmV v) v] ; Scheme value => use as boolean [else #t]) ; other values are always true then-expr else-expr))])) and change it so it compiles a given expression to a Scheme function. (This is, of course, just to demonstrate a conceptual point, it is only the tip of what compilers actually do...) This means that we need to turn it into a function that receives a TOY expression and compiles it. In other words, `eval' no longer consumes and environment argument which makes sense because the environment is a place to hold run-time values, so it is a data structure that is not part of the compiler (it is usually represented as the call stack). So we split the two arguments into a compile-time and run-time, which can be done by simply currying the `eval' function -- here this is done, and all calls to `eval' are also curried: (: eval : TOY -> (ENV -> VAL)) ; <-- note the curried type ;; evaluates TOY expressions. (define (eval expr) (lambda (env) ;; convenient helper (: eval* : TOY -> VAL) (define (eval* expr) ((eval expr) env)) (cases expr [(Num n) (ScmV n)] [(Id name) (lookup name env)] [(Bind names exprs bound-body) ((eval bound-body) (extend names (map eval* exprs) env))] [(Fun names bound-body) (FunV names bound-body env)] [(Call fun-expr arg-exprs) (let ([fval (eval* fun-expr)] [arg-vals (map eval* arg-exprs)]) (cases fval [(PrimV proc) (proc arg-vals)] [(FunV names body fun-env) ((eval body) (extend names arg-vals fun-env))] [else (error 'eval "function call with a non-function: ~s" fval)]))] [(If cond-expr then-expr else-expr) (eval* (if (cases (eval* cond-expr) [(ScmV v) v] ; Scheme value => use as boolean [else #t]) ; other values are always true then-expr else-expr))]))) We also need to change the `eval' call in the main `run' function: (: run : String -> Any) ;; evaluate a TOY program contained in a string (define (run str) (let ([result ((eval (parse str)) global-environment)]) (cases result [(ScmV v) v] [else (error 'run "evaluation returned a bad value: ~s" result)]))) Not much has changed so far. Note that in the general case of a compiler we need to run a program several times, so we'd want to avoid parsing it over and over again. We can do that by keeping a single parsed AST of the input. Now we went one step further by making it possible to do more work ahead and keep the result of the first "stage" of eval around (except that "more work" is really not saying much at the moment): (: run : String -> Any) ;; evaluate a TOY program contained in a string (define (run str) (let* ([compiled (eval (parse str))] [result (compiled global-environment)]) (cases result [(ScmV v) v] [else (error 'run "evaluation returned a bad value: ~s" result)]))) At this point, even though our "compiler" is not much more than a slightly different representation of the same functionality, we rename `eval' to `compile' which is a more appropriate description of what we intend it to do (so we change the purpose statement too): (: compile : TOY -> (ENV -> VAL)) ;; compiles TOY expressions to Scheme functions. (define (compile expr) (lambda (env) (: compile* : TOY -> VAL) (define (compile* expr) ((compile expr) env)) (cases expr [(Num n) (ScmV n)] [(Id name) (lookup name env)] [(Bind names exprs bound-body) ((compile bound-body) (extend names (map compile* exprs) env))] [(Fun names bound-body) (FunV names bound-body env)] [(Call fun-expr arg-exprs) (let ([fval (compile* fun-expr)] [arg-vals (map compile* arg-exprs)]) (cases fval [(PrimV proc) (proc arg-vals)] [(FunV names body fun-env) ((compile body) (extend names arg-vals fun-env))] [else (error 'call ; this is *not* a compilation error "function call with a non-function: ~s" fval)]))] [(If cond-expr then-expr else-expr) (compile* (if (cases (compile* cond-expr) [(ScmV v) v] ; Scheme value => use as boolean [else #t]) ; other values are always true then-expr else-expr))]))) (: run : String -> Any) ;; evaluate a TOY program contained in a string (define (run str) (let* ([compiled (compile (parse str))] [result (compiled global-environment)]) (cases result [(ScmV v) v] [else (error 'run "evaluation returned a bad value: ~s" result)]))) ========================================================================