2013-03-13 - Lazy Evaluation: Shell Examples - Lazy Evaluation: Programming Examples - Side Note: Similarity with Generators and Channels - Call by Need vs Call by Name - Example of Feature Embedding - Implementing Laziness (in plain Racket) - Sloth: A Lazy Evaluator ======================================================================== >>> Lazy Evaluation: Shell Examples There is a very simple and elegant principle in shell programming -- we get a single data type, a character stream, and many small functions, each doing a single simple job. With these small building blocks, we can construct more sequences that achieve more complex tasks, for example -- a sorted frequency table of lines in a file: sort foo | uniq -c | sort -nr This is very much like a programming language -- we get small blocks, and build stuff out of them. Of course there are swiss army knives like awk that try to do a whole bunch of stuff, (the same attitude that brought Perl to the world...) and even these respect the "stream" data type. For example, a simple "{ print $1 }" statement will work over all lines, one by one, making it a program over an infinite input stream, which is what happens in reality in something like: cat /dev/console | awk ... But there is something else in shell programming that makes so effective: it is implementing a sort of a lazy evaluation. For example, compare this: cat foo | awk '{ print $1+$2; }' | uniq to: cat foo | awk '{ print $1+$2; }' | uniq | head -5 Each element in the pipe is doing its own small job, and it is always doing just enough to feed its output. Each basic block is designed to work even on infinite inputs! (Even sort works on unlimited inputs...) (Soon we will see a stronger connection with lazy evaluation.) As an aside: (Alan Perlis) "It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures"... But the uniformity comes at a price: the biggest problem shells have is in their lack of a recursive structure, contaminating the world with way too many hacked up solutions. More than that, it is extremely inefficient and usually leads to data being re-parsed over and over and over -- each small Unix command needs to always output stuff that is human readable, but the next command in the pipe will need to re-parse that, eg, rereading decimal numbers. If you look at pipelines as composing functions, then a pipe of numeric commands translates to something like: itoa(baz(atoi(itoa(bar(atoi(itoa(foo(atoi(inp))))))))) and it is impossible to get rid of the redundant "atoi(itoa(...))"s. ======================================================================== >>> Lazy Evaluation: Programming Examples We already know that when we use lazy evaluation, we are guaranteed to have more robust programs. For example, a function like: (define (my-if x y z) (if x y z)) is completely useless in Racket because all functions are eager, but in a lazy language, it would behave exactly like the real if. Note that we still need *some* primitive conditional, but this primitive can be a function (and it is, in Lazy Racket). But we get more than that. If we have a lazy language, then *computations* are pushed around as if they were values (computations, because these are expressions that are yet to be evaluated). In fact, there is no distinction between computations and values, it just happens that some values contain "computational promises", things that will do something in the future. To see how this happens, we write a simple program to compute the (infinite) list of prime numbers using the sieve of Eratosthenes. To do this, we begin by defining the list of all natural numbers: (define nats (cons 1 (map add1 nats))) And now define a `sift' function: it receives an integer `n' and an infinite list of integers `l', and returns a list without the numbers that can be divided by `n'. This is simple to write using `filter': (define (sift n l) (filter (lambda (x) (not (divides? n x))) l)) and it requires a definition for `divides?' -- we use Racket's `modulo' for this: (define (divides? n m) (zero? (modulo m n))) Now, a `sieve' is a function that consumes a list that begins with a prime number, and returns the prime numbers from this list. To do this, it returns a list that has the same first number, and for its tail it sifts out numbers that are divisible by the first from the original list's tail, and calls itself recursively on the result: (define (sieve l) (cons (first l) (sieve (sift (first l) (rest l))))) Finally, the list of prime numbers is the result of applying `sieve' on the list of numbers from 2. The whole program is now: ---------------------------------------------------------------------- #lang pl lazy (define nats (cons 1 (map add1 nats))) (define (divides? n m) (zero? (modulo m n))) (define (sift n l) (filter (lambda (x) (not (divides? n x))) l)) (define (sieve l) (cons (first l) (sieve (sift (first l) (rest l))))) (define primes (sieve (rest nats))) ---------------------------------------------------------------------- To see how this runs, we trace `modulo' to see which tests are being used. The effect of this is that each time `divides?' is actually required to return a value, we will see a line with its inputs, and its output. This output looks quite tricky -- things are computed only on a "need to know" basis, meaning that debugging lazy programs can be difficult, since things happen when they are needed which takes time to get used to. However, note that the program actually performs the same tests that you'd do using any eager-language implementation of the sieve of Eratosthenes, and the advantage is that we don't need to decide in advance how many values we want to compute -- all values will be computed when you want to see the corresponding result. Implementing *this* behavior in an eager language is more difficult than a simple program, yet we don't need such complex code when we use lazy evaluation. Note that if we trace `divides?' we see results that are some promise struct -- these are unevaluated expressions, and they point at the fact that when `divides?' is used, it doesn't really force its arguments -- this happens later when these results are forced. The analogy with shell programming using pipes should be clear now -- for example, we have seen this: cat foo | awk '{ print $1+$2; }' | uniq | head -5 The last `head -5' means that no computation is done on parts of the original file that are not needed. It is similar to a `(take 5 l)' expression in Lazy Racket. ======================================================================== >>> Side Note: Similarity with Generators and Channels Using infinite lists can often be similar to using channels (see the talk at http://www.youtube.com/watch?v=hB05UFqOtFA), and generators (as they exist in Python). Here are examples of both, note how similar they both are, and how similar they are to the above definition of `primes'. (But note that there is an important difference, can you see it? It has to be with whether a stream is reusable or not.) ---------------------------------------------------------------------- #lang racket (define-syntax-rule (bg expr ...) (thread (lambda () expr ...))) (define nats (let ([out (make-channel)]) (define (loop i) (channel-put out i) (loop (add1 i))) (bg (loop 1)) out)) (define (divides? n m) (zero? (modulo m n))) (define (filter pred c) (define out (make-channel)) (define (loop) (let ([x (channel-get c)]) (when (pred x) (channel-put out x)) (loop))) (bg (loop)) out) (define (sift n c) (filter (lambda (x) (not (divides? n x))) c)) (define (sieve c) (define out (make-channel)) (define (loop c) (define first (channel-get c)) (channel-put out first) (loop (sift first c))) (bg (loop c)) out) (define primes (begin (channel-get nats) (sieve nats))) (define (take n c) (if (zero? n) null (cons (channel-get c) (take (sub1 n) c)))) (take 10 primes) ---------------------------------------------------------------------- ---------------------------------------------------------------------- #lang racket (require racket/generator) (define nats (generator () (define (loop i) (yield i) (loop (add1 i))) (loop 1))) (define (divides? n m) (zero? (modulo m n))) (define (filter pred g) (generator () (define (loop) (let ([x (g)]) (when (pred x) (yield x)) (loop))) (loop))) (define (sift n g) (filter (lambda (x) (not (divides? n x))) g)) (define (sieve g) (define (loop g) (define first (g)) (yield first) (loop (sift first g))) (generator () (loop g))) (define primes (begin (nats) (sieve nats))) (define (take n g) (if (zero? n) null (cons (g) (take (sub1 n) g)))) (take 10 primes) ---------------------------------------------------------------------- ======================================================================== >>> Call by Need vs Call by Name Finally, note that on requiring different parts of the `primes', the same calls are not repeated. This indicates that our language implements "call by need" rather than "call by name": once an expression is forced, its value is remembered, so subsequent usages of this value do not require further computations. Using "call by name" means that we actually use expressions which can lead to confusing code. An old programming language that used this is Algol. A confusing example that demonstrates this evaluation strategy is: ---------------------------------------------------------------------- #lang algol60 begin integer procedure SIGMA(x, i, n); value n; integer x, i, n; begin integer sum; sum := 0; for i := 1 step 1 until n do sum := sum + x; SIGMA := sum; end; integer q; printnln(SIGMA(q*2-1, q, 7)); end ---------------------------------------------------------------------- `x' and `i' are arguments that are passed by name, which means that they can use the same memory location. This is called "aliasing", a problem that happens when pointers are involved (eg, pointers in C and `reference' arguments in C++). ======================================================================== >>> Example of Feature Embedding Another interesting behavior that we can now observe, is that the TOY evaluation rule for `with': eval({with {x E1} E2}) = eval(E2[eval(E1)/x]) is specifying an eager evaluator *only if* the language that this rule is written in is itself eager. Indeed, if we run the TOY interpreter in Lazy Racket (or other interpreters we have implemented), we can verify that running: (run "{bind {{x {/ 1 0}}} 1}") is perfectly fine -- the call to Racket's division is done in the evaluation of the TOY division expression, but since Lazy Racket is lazy, then if this value is never used then we never get to do this division! On the other hand, if we evaluate (run "{bind {{x {/ 1 0}}} {+ x 1}}") we do get an error when DrRacket tries to display the result, which forces strictness. Note how the arrows in DrRacket that show where the computation is are quite confusing: the computation seem to go directly to the point of the arithmetic operations (arith-op) since the rest of the evaluation that the evaluator performed was already done, and succeeded. The actual failure happens when we try to force the resulting promise which contains only the strict points in our code. ======================================================================== >>> Implementing Laziness (in plain Racket) [[[ PLAI Chapter 8 ]]] Generally, we know how lazy evaluation works when we use the substitution model. We even know that if we have: {bind {{x y}} {bind {{y 2}} {+ x y}}} then the result should be an error because we cannot substitute the `y' expression in because it will capture the `y' -- changing the binding structure. As an indication, the original expression contains a free reference to `y', which is exactly why we shouldn't substitute it. But what about: {bind {{x {+ 4 5}}} {bind {{y {+ x x}}} {bind {{z y}} {bind {{x 4}} z}}}} Evaluating this eagerly returns 18, we therefore expect any other evaluation (eager or lazy, using substitutions or environments) to return 18, because any of these options should not change the meaning of numbers, of addition, *or* of the scoping rules. For example, try using lazy evaluation with substitutions: {bind {{x {+ 4 5}}} {bind {{y {+ x x}}} {bind {{z y}} {bind {{x 4}} z}}}} --> {bind {{y {+ {+ 4 5} {+ 4 5}}}} {bind {{z y}} {bind {{x 4}} z}}} --> {bind {{z {+ {+ 4 5} {+ 4 5}}}} {bind {{x 4}} z}} --> {bind {{x 4}} {+ {+ 4 5} {+ 4 5}}} --> {+ {+ 4 5} {+ 4 5}} --> {+ 9 9} --> 18 And what about lazy evaluation using environments: {bind {{x {+ 4 5}}} {bind {{y {+ x x}}} {bind {{z y}} {bind {{x 4}} z}}}} --> {bind {{y {+ x x}}} {bind {{z y}} {bind {{x 4}} z}}} [x:={+ 4 5}] --> {bind {{z y}} {bind {{x 4}} z}} [x:={+ 4 5}, y:={+ x x}] --> {bind {{x 4}} z} [x:={+ 4 5}, y:={+ x x}, z:=y] --> z [x:=4, y:={+ x x}, z:=y] --> y [x:=4, y:={+ x x}, z:=y] --> {+ x x} [x:=4, y:={+ x x}, z:=y] --> {+ 4 4} [x:=4, y:={+ x x}, z:=y] --> 8 [x:=4, y:={+ x x}, z:=y] We have a problem! This problem should be familiar now, it is very similar to the problem that led us down the mistaken path of dynamic scoping when we tried to have first-class functions. In both cases, substitution always worked, and it looks like in both cases the problem is that we don't remember the environment of an expression: in the case of functions, it is the environment at the time of creating the closure that we want to capture and use when we go back later to evaluate the body of the function. Here we have a similar situation, except that we don't need a function to defer computation: *most* expressions get evaluated at some time in the future, so every time we defer such a computation we need to remember the lexical environment of the expression. This is the major point that will make things work again: every expression creates something like a closure -- an object that closes over an expression and an environment at the (lexical) place where that expression was used, and when we actually want to evaluate it later, we need to do it in the right lexical context. So it is like a closure except it doesn't need to be applied, and there are no arguments. In fact it is also a form of a closure -- instead of closing over a function body and an environment, it closes over any expression and an environment. (As we shall see, lazy evaluation is tightly related to using nullary functions: "thunks".) ======================================================================== >>> Sloth: A Lazy Evaluator So we implement this by creating such `closure' values for all expressions that are not evaluated right now. We begin with the Toy language, and rename it to "Sloth". We then add one more case to the data type of values which implements the new kind of expression closures, which contains the expression and its environment: (define-type VAL [RktV Any] [FunV (Listof Symbol) SLOTH ENV] [ExprV SLOTH ENV] ;*** new: expression and scope [PrimV ((Listof VAL) -> VAL)]) Where should we use the new `ExprV'? -- At any place where we want to be lazy and defer evaluating an expression for later. The two places in the interpreter where we want to delay evaluation is the named expressions in a bind form and the argument expressions in a function application. Both of these cases use the helper `eval*' function to do their evaluations, for example: [(Bind names exprs bound-body) (eval bound-body (extend names (map eval* exprs) env))] To delay these evaluations, we need to change `eval*' so it returns an expression closure instead of actually doing the evaluation -- change: (: eval* : SLOTH -> VAL) (define (eval* expr) (eval expr env)) to: (: eval* : SLOTH -> VAL) (define (eval* expr) (ExprV expr env)) Note how simple this change is -- instead of an `eval' function call, we create a value that contains the parts that would have been used in the `eval' function call. This value serves as a promise to do this evaluation (the `eval' call) later, if needed. (This is exactly why a Lazy Racket would make this a lazy evaluator: in it, *all* function calls are promises.) ======================================================================== Side note: this can be used in any case when you're using an eager language, and you want to delay some function call -- all you need to do is replace (using a C-ish syntax) int foo(int x, str y) { ...do some work... } with // rename `foo': int real_foo(int x, str y) { ...same work... } // `foo' is a delayed constructor, instead of a plain function struct delayed_foo { int x; str y; } delayed_foo foo(int x, str y) { return new delayed_foo(x, y); } now all calls to `foo' return a `delayed_foo' instance instead of an integer. Whenever we want to force the delayed promise, we can use this function: int force_foo(delayed_foo promise) { return real_foo(promise.x, promise.y); } You might even want to make sure that each such promise is evaluated exactly once -- this is simple to achieve by adding a cache field to the struct: int real_foo(int x, str y) { ...same work... } struct delayed_foo { int x; str y; bool is_computed; int result; } delayed_foo foo(int x, str y) { return new delayed_foo(x, y, false, 0); } int force_foo(delayed_foo promise) { if (!promise.is_computed) { promise.result = real_foo(promise.x, promise.y); promise.is_computed = true; } return promise.result; } As we will see shortly, this corresponds to switching from a call-by-name lazy language to a call-by-need one. ======================================================================== Back to our Sloth interpreter -- given the `eval*' change, we expect that `eval'-uating: {bind {{x 1}} x} will return: (ExprV (Num 1) ...the-global-environment...) and the same goes for `eval'-uating {{fun {x} x} 1} Similarly, evaluating {bind {{x {+ 1 2}}} x} should return (ExprV (Call (Id +) (Num 1) (Num 2)) ...the-global-environment...) But what about evaluating an expression like this one: {bind {{x 2}} {+ x x}} ? Using what we have so far, we will get to evaluate the body, which is a (Call ...) expression, but when we evaluate the arguments for this function call, we will get `ExprV' values -- so we will not be able to perform the addition. Instead, we will get an error from the function that `racket-func->prim-val' creates, due to the value being an `ExprV' instead of a `RktV'. What we really want is to actually add two *values*, not promises. So maybe distinguish the two applications -- treat PrimV differently from FunV closures? (: eval* : SLOTH -> VAL) (define (eval* expr) (ExprV expr env)) (: real-eval* : SLOTH -> VAL) (define (real-eval* expr) (eval expr env)) (cases expr ... [(Call fun-expr arg-exprs) (let ([fval (eval fun-expr env)] ;; move: [arg-vals (map eval* arg-exprs)] ) (cases fval [(PrimV proc) (proc (map real-eval* arg-exprs))] ; change [(FunV names body fun-env) (eval body (extend names (map eval* arg-exprs) fun-env))] ...))] ...) This still doesn't work -- the problem is that the function now gets a bunch of values, where some of these can still be `ExprV's because the evaluation itself can return such values... Another way to see this problem is to consider the code for evaluating an `If' conditional expression: [(If cond-expr then-expr else-expr) (eval* (if (cases (real-eval* cond-expr) [(RktV v) v] ; Racket value => use as boolean [else #t]) ; other values are always true then-expr else-expr))] ...we need to take care of a possible ExprV here. What should we do? The obvious solution is to use `eval' if we get an `ExprV' value: [(If cond-expr then-expr else-expr) (eval* (if (cases (real-eval* cond-expr) [(RktV v) v] ; Racket value => use as boolean [(ExprV expr env) (eval expr env)] ; force a promise [else #t]) ; other values are always true then-expr else-expr))] Note how this translates back the data structure that represents a delayed `eval' promise back into a real `eval' call... Going back to our code for `Call', there is a problem with it -- the (define (real-eval* expr) (eval expr env)) will indeed evaluate the expression instead of lazily deferring this to the future, but this evaluation might itself return such lazy values. So we need to inspect the resulting value again, forcing the promise if needed: (define (real-eval* expr) (let ([v (eval expr env)]) (cases v [(ExprV expr env) (eval expr env)] [else v]))) But we *still* have a problem -- programs can get an arbitrarily long nested chains of `ExprV's that get forced to other `ExprV's. {bind {{x true}} {bind {{y x}} {bind {{z y}} {if z {foo} {bar}}}}} What we really need is to write a loop that keeps forcing promises over and over until it gets a proper non-`ExprV' value. (: strict : VAL -> VAL) ;; forces a (possibly nested) ExprV promise, returns a VAL that is ;; not an ExprV (define (strict v) (cases v [(ExprV expr env) (strict (eval expr env))] ; loop back [else v])) Note that it's close to `real-eval*', but there's no need to mix it with `eval'. The recursive call is important: we can never be sure that `eval' didn't return an ExprV promise, so we have to keep looping until we get a "real" value. Now we can change the evaluation of function calls to something more manageable: [(Call fun-expr arg-exprs) (let ([fval (strict (eval* fun-expr))] ;*** strict! [arg-vals (map eval* arg-exprs)]) (cases fval [(PrimV proc) (proc (map strict arg-vals))] ;*** strict! [(FunV names body fun-env) (eval body (extend names arg-vals fun-env))] [else (error 'eval "function call with a non-function: ~s" fval)]))] The code is fairly similar to what we had previously -- the only difference is that we wrap a `strict' call where a proper value is needed -- the function value itself, and arguments to primitive functions. The `If' case is similar (note that it doesn't matter if `strict' is used with the result of `eval' or `eval*' (which returns an `ExprV')): [(If cond-expr then-expr else-expr) (eval* (if (cases (strict (eval* cond-expr)) [(RktV v) v] ; Racket value => use as boolean [else #t]) ; other values are always true then-expr else-expr))] Note that, like before, we always return `#t' for non-`RktV' values -- this is because we know that the value there is never an `ExprV'. All we need now to get a working evaluator, is one more strictness point: the outermost point that starts our evaluation -- `run' -- needs to use `strict' to get a proper result value. (: run : String -> Any) ;; evaluate a SLOTH program contained in a string (define (run str) (let ([result (strict (eval (parse str) global-environment))]) (cases result [(RktV v) v] [else (error 'run "evaluation returned a bad value: ~s" result)]))) With this, all of the tests that we took from the Toy evaluator run successfully. To make sure that the interpreter is lazy, we can add a test that will fail if the language is strict: ;; Test laziness (test (run "{{fun {x} 1} {/ 9 0}}") => 1) (test (run "{{fun {x} 1} {{fun {x} {x x}} {fun {x} {x x}}}}") => 1) (test (run "{bind {{x {{fun {x} {x x}} {fun {x} {x x}}}}} 1}") => 1) [In fact, we can continue and replace all `eval' calls with `ExprV', leaving only the one call in `strict'. This doesn't make any difference, because the resulting promises will eventually be forced by `strict' anyway.] ======================================================================== >> Getting more from Sloth As we've seen, using `strict' in places where we need an actual value rather than a delayed promise is enough to get a working lazy evaluator. Our current implementation assumes that all primitive functions need strict values, therefore the argument values are all passed through the `strict' function -- but this is not always the case. Specifically, if we have constructor functions, then we don't need (and usually don't want) to force the promises. This is basically what allows us to use infinite lists in Lazy Racket: the fact that `list' and `cons' do not require forcing their arguments. To allow some primitive functions to consume strict values and some to leave them as is, we're going to change `racket-func->prim-val' and add a flag that indicates whether the primitive function is strict or not. Obviously, we also need to move the `strict' call around arguments to a primitive function application into the `racket-func->prim-val' generated function -- which simplifies the `Call' case in `eval' (we go from (proc (map strict arg-vals)) back to (proc arg-vals)). The new code for `racket-func->prim-val' is: (: racket-func->prim-val : Function Boolean -> VAL) ;; converts a racket function to a primitive evaluator function ... (define (racket-func->prim-val racket-func strict?) (let ([list-func (make-untyped-list-function racket-func)]) (PrimV (lambda (args) (let ([args (if strict? (map (lambda: ([a : VAL]) (let ([v (strict a)]) (cases v [(RktV x) x] [else (error 'racket-func "bad input: ~s" v)]))) args) args)]) ;*** use values as is! (RktV (list-func args))))))) We now need to annotate the primitives in the global environment, as well as add a few constructors: ;; The global environment has a few primitives: (: global-environment : ENV) (define global-environment (FrameEnv (list (list '+ (racket-func->prim-val + #t)) (list '- (racket-func->prim-val - #t)) (list '* (racket-func->prim-val * #t)) (list '/ (racket-func->prim-val / #t)) (list '< (racket-func->prim-val < #t)) (list '> (racket-func->prim-val > #t)) (list '= (racket-func->prim-val = #t)) ;; note flags: (list 'cons (racket-func->prim-val cons #f)) (list 'list (racket-func->prim-val list #f)) (list 'first (racket-func->prim-val first #t)) (list 'rest (racket-func->prim-val rest #t)) (list 'null? (racket-func->prim-val null? #t)) ;; values (list 'true (RktV #t)) (list 'false (RktV #f)) (list 'null (RktV null))) (EmptyEnv))) Note that this last change raises a subtle type issue: we're actually abusing the Racket `list' and `cons' constructors to hold Sloth values. One way in which this becomes a problem is the current assumption that a primitive function always returns a Racket value (it is always wrapped in a `RktV') -- but this is no longer the case for `first' and `rest': when we use {cons 1 null} in Sloth, the resulting value will be (RktV (cons (ExprV (Num 1) ...) (ExprV (Id null) ...))) so if we try and grab the first value of this {first {cons 1 null}} we will eventually get back the `ExprV' and wrap it in a `RktV': (RktV (ExprV (Num 1) ...)) and finally `run' will strip off the `RktV' and return the `ExprV'. A solution to this is to make our `first' and `rest' functions return a value *without* wrapping it in a `RktV' -- we can identify this situation by the fact that the returned value is already a VAL instead of some other Racket value. We can identify such values with the `VAL?' predicate that gets defined by our `define-type': (: racket-func->prim-val : Function Boolean -> VAL) ;; converts a racket function to a primitive evaluator function ... (define (racket-func->prim-val racket-func strict?) (let ([list-func (make-untyped-list-function racket-func)]) (PrimV (lambda (args) (let* ([args (if strict? (map (lambda: ([a : VAL]) (let ([v (strict a)]) (cases v [(RktV x) x] [else (error 'racket-func "bad input: ~s" v)]))) args) args)] [result (list-func args)]) ;; Because there are non-strict constructors, ;; primitives like `first' might be returning promises ;; which are already VAL objects. (if (VAL? result) result (RktV result))))))) Note that we don't need to worry about the result being an `ExprV' -- that will eventually be taken care of by `strict'. The complete Sloth code follows. It can be used to do the same fun things we did with Lazy Racket. ---<<>>-------------------------------------------------------- #lang pl ;;; ================================================================== ;;; Syntax #| The BNF: ::= | | { bind {{ } ... } } | { fun { ... } } | { if } | { ... } |# ;; A matching abstract syntax tree datatype: (define-type SLOTH [Num Number] [Id Symbol] [Bind (Listof Symbol) (Listof SLOTH) SLOTH] [Fun (Listof Symbol) SLOTH] [Call SLOTH (Listof SLOTH)] [If SLOTH SLOTH SLOTH]) (: unique-list? : (Listof Any) -> Boolean) ;; Tests whether a list is unique, used to guard Bind and Fun values. (define (unique-list? xs) (or (null? xs) (and (not (member (first xs) (rest xs))) (unique-list? (rest xs))))) (: parse-sexpr : Sexpr -> SLOTH) ;; to convert s-expressions into SLOTHs (define (parse-sexpr sexpr) (match sexpr [(number: n) (Num n)] [(symbol: name) (Id name)] [(cons 'bind more) (match sexpr [(list 'bind (list (list (symbol: names) (sexpr: nameds)) ...) body) (if (unique-list? names) (Bind names (map parse-sexpr nameds) (parse-sexpr body)) (error 'parse-sexpr "`bind' got duplicate names: ~s" names))] [else (error 'parse-sexpr "bad `bind' syntax in ~s" sexpr)])] [(cons 'fun more) (match sexpr [(list 'fun (list (symbol: names) ...) body) (if (unique-list? names) (Fun names (parse-sexpr body)) (error 'parse-sexpr "`fun' got duplicate names: ~s" names))] [else (error 'parse-sexpr "bad `fun' syntax in ~s" sexpr)])] [(cons 'if more) (match sexpr [(list 'if cond then else) (If (parse-sexpr cond) (parse-sexpr then) (parse-sexpr else))] [else (error 'parse-sexpr "bad `if' syntax in ~s" sexpr)])] [(list fun (sexpr: args) ...) ; other lists are applications (Call (parse-sexpr fun) (map parse-sexpr args))] [else (error 'parse-sexpr "bad syntax in ~s" sexpr)])) (: parse : String -> SLOTH) ;; Parses a string containing an SLOTH expression to a SLOTH AST. (define (parse str) (parse-sexpr (string->sexpr str))) ;;; ================================================================== ;;; Values and environments (define-type ENV [EmptyEnv] [FrameEnv FRAME ENV]) (define-type VAL [RktV Any] [FunV (Listof Symbol) SLOTH ENV] [ExprV SLOTH ENV] [PrimV ((Listof VAL) -> VAL)]) ;; a frame is an association list of names and values. (define-type FRAME = (Listof (List Symbol VAL))) (: extend : (Listof Symbol) (Listof VAL) ENV -> ENV) ;; extends an environment with a new frame. (define (extend names values env) (if (= (length names) (length values)) (FrameEnv (map (lambda: ([name : Symbol] [val : VAL]) (list name val)) names values) env) (error 'extend "arity mismatch for names: ~s" names))) (: lookup : Symbol ENV -> VAL) ;; looks for a name in an environment, searching through each frame. (define (lookup name env) (cases env [(EmptyEnv) (error 'lookup "no binding for ~s" name)] [(FrameEnv frame rest) (let ([cell (assq name frame)]) (if cell (second cell) (lookup name rest)))])) (: racket-func->prim-val : Function Boolean -> VAL) ;; converts a racket function to a primitive evaluator function which ;; is a PrimV holding a ((Listof VAL) -> VAL) function. (the ;; resulting function will use the list function as is, and it is the ;; list function's responsibility to throw an error if it's given a ;; bad number of arguments or bad input types.) (define (racket-func->prim-val racket-func strict?) (let ([list-func (make-untyped-list-function racket-func)]) (PrimV (lambda (args) (let* ([args (if strict? (map (lambda: ([a : VAL]) (let ([v (strict a)]) (cases v [(RktV x) x] [else (error 'racket-func "bad input: ~s" v)]))) args) args)] [result (list-func args)]) ;; Because there are non-strict constructors, ;; primitives like `first' might be returning promises ;; which are already VAL objects. (if (VAL? result) result (RktV result))))))) ;; The global environment has a few primitives: (: global-environment : ENV) (define global-environment (FrameEnv (list (list '+ (racket-func->prim-val + #t)) (list '- (racket-func->prim-val - #t)) (list '* (racket-func->prim-val * #t)) (list '/ (racket-func->prim-val / #t)) (list '< (racket-func->prim-val < #t)) (list '> (racket-func->prim-val > #t)) (list '= (racket-func->prim-val = #t)) ;; note flags: (list 'cons (racket-func->prim-val cons #f)) (list 'list (racket-func->prim-val list #f)) (list 'first (racket-func->prim-val first #t)) (list 'rest (racket-func->prim-val rest #t)) (list 'null? (racket-func->prim-val null? #t)) ;; values (list 'true (RktV #t)) (list 'false (RktV #f)) (list 'null (RktV null))) (EmptyEnv))) ;;; ================================================================== ;;; Evaluation (: strict : VAL -> VAL) ;; forces a (possibly nested) ExprV promise, returns a VAL that is ;; not an ExprV (define (strict v) (cases v [(ExprV expr env) (strict (eval expr env))] [else v])) (: eval : SLOTH ENV -> VAL) ;; evaluates SLOTH expressions. (define (eval expr env) ;; convenient helper (: eval* : SLOTH -> VAL) (define (eval* expr) (ExprV expr env)) (cases expr [(Num n) (RktV n)] [(Id name) (lookup name env)] [(Bind names exprs bound-body) (eval bound-body (extend names (map eval* exprs) env))] [(Fun names bound-body) (FunV names bound-body env)] [(Call fun-expr arg-exprs) (let ([fval (strict (eval* fun-expr))] [arg-vals (map eval* arg-exprs)]) (cases fval [(PrimV proc) (proc arg-vals)] [(FunV names body fun-env) (eval body (extend names arg-vals fun-env))] [else (error 'eval "function call with a non-function: ~s" fval)]))] [(If cond-expr then-expr else-expr) (eval* (if (cases (strict (eval* cond-expr)) [(RktV v) v] ; Racket value => use as boolean [else #t]) ; other values are always true then-expr else-expr))])) (: run : String -> Any) ;; evaluate a SLOTH program contained in a string (define (run str) (let ([result (strict (eval (parse str) global-environment))]) (cases result [(RktV v) v] [else (error 'run "evaluation returned a bad value: ~s" result)]))) ;;; ================================================================== ;;; Tests (test (run "{{fun {x} {+ x 1}} 4}") => 5) (test (run "{bind {{add3 {fun {x} {+ x 3}}}} {add3 1}}") => 4) (test (run "{bind {{add3 {fun {x} {+ x 3}}} {add1 {fun {x} {+ x 1}}}} {bind {{x 3}} {add1 {add3 x}}}}") => 7) (test (run "{bind {{identity {fun {x} x}} {foo {fun {x} {+ x 1}}}} {{identity foo} 123}}") => 124) (test (run "{bind {{x 3}} {bind {{f {fun {y} {+ x y}}}} {bind {{x 5}} {f 4}}}}") => 7) (test (run "{{{fun {x} {x 1}} {fun {x} {fun {y} {+ x y}}}} 123}") => 124) ;; More tests for complete coverage (test (run "{bind x 5 x}") =error> "bad `bind' syntax") (test (run "{fun x x}") =error> "bad `fun' syntax") (test (run "{if x}") =error> "bad `if' syntax") (test (run "{}") =error> "bad syntax") (test (run "{bind {{x 5} {x 5}} x}") =error> "bind* duplicate names") (test (run "{fun {x x} x}") =error> "fun* duplicate names") (test (run "{+ x 1}") =error> "no binding for") (test (run "{+ 1 {fun {x} x}}") =error> "bad input") (test (run "{+ 1 {fun {x} x}}") =error> "bad input") (test (run "{1 2}") =error> "with a non-function") (test (run "{{fun {x} x}}") =error> "arity mismatch") (test (run "{if {< 4 5} 6 7}") => 6) (test (run "{if {< 5 4} 6 7}") => 7) (test (run "{if + 6 7}") => 6) (test (run "{fun {x} x}") =error> "returned a bad value") ;; Test laziness (test (run "{{fun {x} 1} {/ 9 0}}") => 1) (test (run "{{fun {x} 1} {{fun {x} {x x}} {fun {x} {x x}}}}") => 1) (test (run "{bind {{x {{fun {x} {x x}} {fun {x} {x x}}}}} 1}") => 1) ;; Test lazy constructors (test (run "{bind {{l {list 1 {/ 9 0} 3}}} {+ {first l} {first {rest {rest l}}}}}") => 4) ;;; ================================================================== ---------------------------------------------------------------------- ========================================================================