2013-03-15 - Implementing Call by Need - Side Effects in a Lazy Language - Designing Domain Specific Languages (DSLs) ======================================================================== >>> Implementing Call by Need As we have seen, there are a number of advantages for lazy evaluation, but its main disadvantage is the fact that it is extremely inefficient, to the point of rendering lots of programs impractical, for example, in: {bind {{x {+ 4 5}}} {bind {{y {+ x x}}} y}} we end up adding 4 and 5 twice. In other words, we don't suffer from textual redundancy (each expression is written once), but we don't avoid dynamic redundancy. We can get it back by simply caching evaluation results, using a box that will be used to remember the results. The box will initially hold `#f', and it will change to hold the VAL that results from evaluation: (define-type VAL [RktV Any] [FunV (Listof Symbol) SLOTH ENV] [ExprV SLOTH ENV (Boxof (U #f VAL))] ;*** new: mutable cache field [PrimV ((Listof VAL) -> VAL)]) We need a utility function to create an evaluation promise, because when an ExprV is created, its initial cache box needs to be initialized. (: eval-promise : SLOTH ENV -> VAL) ;; used instead of `eval' to create an evaluation promise (define (eval-promise expr env) (ExprV expr env (box #f))) (And note that Typed Racket needs to figure out that the `#f' in this definition has a type of (U #f VAL) and not just `#f'.) This `eval-promise' is used instead of `ExprV' in eval. Finally, whenever we force such an ExprV promise, we need to check if it was already evaluated, otherwise force it and cache the result. This is simple to do since there is a single field that is used both as a flag and a cached value: (: strict : VAL -> VAL) ;; forces a (possibly nested) ExprV promise, returns a VAL that is ;; not an ExprV (define (strict v) (cases v [(ExprV expr env cache) (or (unbox cache) (let ([val (strict (eval expr env))]) (set-box! 
cache val) val))]
    [else v]))

But note that this makes using side-effects in our interpreter even more
confusing.  (It was true with call-by-name too.)

The resulting code follows.

---<<>>-------------------------------------------------
  ;; A call-by-need version of the SLOTH interpreter

  #lang pl

  ;;; ==================================================================
  ;;; Syntax

  #| The BNF:
     <SLOTH> ::= <num>
               | <id>
               | { bind {{ <id> <SLOTH> } ... } <SLOTH> }
               | { fun { <id> ... } <SLOTH> }
               | { if <SLOTH> <SLOTH> <SLOTH> }
               | { <SLOTH> <SLOTH> ... }
  |#

  ;; A matching abstract syntax tree datatype:
  (define-type SLOTH
    [Num  Number]
    [Id   Symbol]
    [Bind (Listof Symbol) (Listof SLOTH) SLOTH]
    [Fun  (Listof Symbol) SLOTH]
    [Call SLOTH (Listof SLOTH)]
    [If   SLOTH SLOTH SLOTH])

  (: unique-list? : (Listof Any) -> Boolean)
  ;; Tests whether a list is unique, used to guard Bind and Fun values.
  (define (unique-list? xs)
    (or (null? xs)
        (and (not (member (first xs) (rest xs)))
             (unique-list? (rest xs)))))

  (: parse-sexpr : Sexpr -> SLOTH)
  ;; to convert s-expressions into SLOTHs
  (define (parse-sexpr sexpr)
    (match sexpr
      [(number: n)    (Num n)]
      [(symbol: name) (Id name)]
      [(cons 'bind more)
       (match sexpr
         [(list 'bind (list (list (symbol: names) (sexpr: nameds)) ...)
            body)
          (if (unique-list? names)
            (Bind names (map parse-sexpr nameds) (parse-sexpr body))
            (error 'parse-sexpr "`bind' got duplicate names: ~s" names))]
         [else (error 'parse-sexpr "bad `bind' syntax in ~s" sexpr)])]
      [(cons 'fun more)
       (match sexpr
         [(list 'fun (list (symbol: names) ...) body)
          (if (unique-list? names)
            (Fun names (parse-sexpr body))
            (error 'parse-sexpr "`fun' got duplicate names: ~s" names))]
         [else (error 'parse-sexpr "bad `fun' syntax in ~s" sexpr)])]
      [(cons 'if more)
       (match sexpr
         [(list 'if cond then else)
          (If (parse-sexpr cond) (parse-sexpr then) (parse-sexpr else))]
         [else (error 'parse-sexpr "bad `if' syntax in ~s" sexpr)])]
      [(list fun (sexpr: args) ...)
; other lists are applications (Call (parse-sexpr fun) (map parse-sexpr args))] [else (error 'parse-sexpr "bad syntax in ~s" sexpr)])) (: parse : String -> SLOTH) ;; Parses a string containing an SLOTH expression to a SLOTH AST. (define (parse str) (parse-sexpr (string->sexpr str))) ;;; ================================================================== ;;; Values and environments (define-type ENV [EmptyEnv] [FrameEnv FRAME ENV]) (define-type VAL [RktV Any] [FunV (Listof Symbol) SLOTH ENV] [ExprV SLOTH ENV (Boxof (U #f VAL))] [PrimV ((Listof VAL) -> VAL)]) ;; a frame is an association list of names and values. (define-type FRAME = (Listof (List Symbol VAL))) (: extend : (Listof Symbol) (Listof VAL) ENV -> ENV) ;; extends an environment with a new frame. (define (extend names values env) (if (= (length names) (length values)) (FrameEnv (map (lambda: ([name : Symbol] [val : VAL]) (list name val)) names values) env) (error 'extend "arity mismatch for names: ~s" names))) (: lookup : Symbol ENV -> VAL) ;; looks for a name in an environment, searching through each frame. (define (lookup name env) (cases env [(EmptyEnv) (error 'lookup "no binding for ~s" name)] [(FrameEnv frame rest) (let ([cell (assq name frame)]) (if cell (second cell) (lookup name rest)))])) (: racket-func->prim-val : Function Boolean -> VAL) ;; converts a racket function to a primitive evaluator function which ;; is a PrimV holding a ((Listof VAL) -> VAL) function. (the ;; resulting function will use the list function as is, and it is the ;; list function's responsibility to throw an error if it's given a ;; bad number of arguments or bad input types.) (define (racket-func->prim-val racket-func strict?) (let ([list-func (make-untyped-list-function racket-func)]) (PrimV (lambda (args) (let* ([args (if strict? 
(map (lambda: ([a : VAL]) (let ([v (strict a)]) (cases v [(RktV x) x] [else (error 'racket-func "bad input: ~s" v)]))) args) args)] [result (list-func args)]) ;; Because there are non-strict constructors, ;; primitives like `first' might be returning promises ;; which are already VAL objects. (if (VAL? result) result (RktV result))))))) ;; The global environment has a few primitives: (: global-environment : ENV) (define global-environment (FrameEnv (list (list '+ (racket-func->prim-val + #t)) (list '- (racket-func->prim-val - #t)) (list '* (racket-func->prim-val * #t)) (list '/ (racket-func->prim-val / #t)) (list '< (racket-func->prim-val < #t)) (list '> (racket-func->prim-val > #t)) (list '= (racket-func->prim-val = #t)) ;; note flags: (list 'cons (racket-func->prim-val cons #f)) (list 'list (racket-func->prim-val list #f)) (list 'first (racket-func->prim-val first #t)) (list 'rest (racket-func->prim-val rest #t)) (list 'null? (racket-func->prim-val null? #t)) ;; values (list 'true (RktV #t)) (list 'false (RktV #f)) (list 'null (RktV null))) (EmptyEnv))) ;;; ================================================================== ;;; Evaluation (: eval-promise : SLOTH ENV -> VAL) ;; used instead of `eval' to create an evaluation promise (define (eval-promise expr env) (ExprV expr env (box #f))) (: strict : VAL -> VAL) ;; forces a (possibly nested) ExprV promise, returns a VAL that is ;; not an ExprV (define (strict v) (cases v [(ExprV expr env cache) (or (unbox cache) (let ([val (strict (eval expr env))]) (set-box! cache val) val))] [else v])) (: eval : SLOTH ENV -> VAL) ;; evaluates SLOTH expressions. 
(define (eval expr env) ;; convenient helper (: eval* : SLOTH -> VAL) (define (eval* expr) (eval-promise expr env)) (cases expr [(Num n) (RktV n)] [(Id name) (lookup name env)] [(Bind names exprs bound-body) (eval bound-body (extend names (map eval* exprs) env))] [(Fun names bound-body) (FunV names bound-body env)] [(Call fun-expr arg-exprs) (let ([fval (strict (eval* fun-expr))] [arg-vals (map eval* arg-exprs)]) (cases fval [(PrimV proc) (proc arg-vals)] [(FunV names body fun-env) (eval body (extend names arg-vals fun-env))] [else (error 'eval "function call with a non-function: ~s" fval)]))] [(If cond-expr then-expr else-expr) (eval* (if (cases (strict (eval* cond-expr)) [(RktV v) v] ; Racket value => use as boolean [else #t]) ; other values are always true then-expr else-expr))])) (: run : String -> Any) ;; evaluate a SLOTH program contained in a string (define (run str) (let ([result (strict (eval (parse str) global-environment))]) (cases result [(RktV v) v] [else (error 'run "evaluation returned a bad value: ~s" result)]))) ;;; ================================================================== ;;; Tests (test (run "{{fun {x} {+ x 1}} 4}") => 5) (test (run "{bind {{add3 {fun {x} {+ x 3}}}} {add3 1}}") => 4) (test (run "{bind {{add3 {fun {x} {+ x 3}}} {add1 {fun {x} {+ x 1}}}} {bind {{x 3}} {add1 {add3 x}}}}") => 7) (test (run "{bind {{identity {fun {x} x}} {foo {fun {x} {+ x 1}}}} {{identity foo} 123}}") => 124) (test (run "{bind {{x 3}} {bind {{f {fun {y} {+ x y}}}} {bind {{x 5}} {f 4}}}}") => 7) (test (run "{{{fun {x} {x 1}} {fun {x} {fun {y} {+ x y}}}} 123}") => 124) ;; More tests for complete coverage (test (run "{bind x 5 x}") =error> "bad `bind' syntax") (test (run "{fun x x}") =error> "bad `fun' syntax") (test (run "{if x}") =error> "bad `if' syntax") (test (run "{}") =error> "bad syntax") (test (run "{bind {{x 5} {x 5}} x}") =error> "bind* duplicate names") (test (run "{fun {x x} x}") =error> "fun* duplicate names") (test (run "{+ x 1}") =error> "no 
binding for") (test (run "{+ 1 {fun {x} x}}") =error> "bad input") (test (run "{+ 1 {fun {x} x}}") =error> "bad input") (test (run "{1 2}") =error> "with a non-function") (test (run "{{fun {x} x}}") =error> "arity mismatch") (test (run "{if {< 4 5} 6 7}") => 6) (test (run "{if {< 5 4} 6 7}") => 7) (test (run "{if + 6 7}") => 6) (test (run "{fun {x} x}") =error> "returned a bad value") ;; Test laziness (test (run "{{fun {x} 1} {/ 9 0}}") => 1) (test (run "{{fun {x} 1} {{fun {x} {x x}} {fun {x} {x x}}}}") => 1) (test (run "{bind {{x {{fun {x} {x x}} {fun {x} {x x}}}}} 1}") => 1) ;; Test lazy constructors (test (run "{bind {{l {list 1 {/ 9 0} 3}}} {+ {first l} {first {rest {rest l}}}}}") => 4) ;;; ================================================================== ---------------------------------------------------------------------- ======================================================================== >>> Side Effects in a Lazy Language We've seen that a lazy language without the call-by-need optimization is too slow to be practical, but the optimization makes using side-effects extremely confusing. Specifically, when we deal with side-effects (I/O, mutation, errors, etc) the order of evaluation matters, but in our interpreter expressions are getting evaluated as needed. (Remember tracing the prime-numbers code in Lazy Racket -- numbers are tested as needed, not in order.) If we can't do these things, the question is whether there is any point in using a purely functional lazy language at all -- since computer programs often interact with an imperative world. There is a solution for this: the lazy language does not have any (sane) facilities for *doing* things (like `printf' that prints something in plain Racket), but it can use a data structure that *describes* such operations. 
For example, in Lazy Racket we cannot print stuff sanely using `printf', but we can construct a string using `format' (which is just like `printf', except that it returns the formatted string instead of printing it). So (assuming Racket syntax for simplicity), instead of: (define (foo n) (printf "~s + 1 = ~s\n" n (+ n 1))) we will write: (define (foo n) (format "~s + 1 = ~s\n" n (+ n 1))) and get back a string. We can now change the way that our interpreter deals with the output value that it receives after evaluating a lazy expression: if it receives a string, then it can take that string as denoting a request for printout, and simply print it. Such an evaluator will do the printout when the lazy evaluation is done, and everything works fine because we don't try to use any side-effects in the lazy language -- we just describe the desired side-effects, and constructing such a description does not require *performing* side-effects. But this only solves printing a single string, and nothing else. If we want to print two strings, then the only thing we can do is concatenate the two strings -- but that is not only inefficient, it cannot describe infinite output (since we will not be able to construct the infinite string in memory). So we need a better way to chain several printout representations. One way to do so is to use a list of strings, but to make things a little easier to manage, we will create a type for I/O descriptions -- and populate it with one variant holding a string (for plain printout) and one for holding a chain of two descriptions (which can be used to construct an arbitrarily long sequence of descriptions): (define-type IO [Print String] [Begin2 IO IO]) Now we can use this to chain any number of printout representations by turning them into a single `Begin2' request, which is very similar to simply using a loop to print the list. For example, the eager printout code: (: print-list : (Listof A) -> Void) (define (print-list l) (if (null? 
l) (printf "\n") (begin (printf "~s " (first l)) (print-list (rest l))))) turns to the following code: (: print-list : (Listof A) -> IO) (define (print-list l) (if (null? l) (Print "\n") (Begin2 (Print (format "~s " (first l))) (print-list (rest l))))) This will basically scan an input list like the eager version, but instead of printing the list, it will convert it into a single output request that forms a recipe for this printout. Note that within the lazy world, the result of `print-list' is just a value, there are no side effects involved. Turning this value into the actual printout is something that needs to be done on the eager side, which must be part of the implementation. In the case of Lazy Racket, we have no access to the implementation, but we can do so in our Sloth implementation: again, `run' will inspect the result and either print a given string (if it gets a `Print' value), or print two things recursively (if it gets a `Begin2' value). (To implement this, we will add an `IOV' variant to the `VAL' type definition, and have it contain an `IO' description of the above type.) Because the sequence is constructed in the lazy world, it will not require allocating the whole sequence in memory -- it can be forced bits by bits (using `strict') as the imperative back-end (the `run' part of the implementation) follows the instructions in the resulting IO description. More concretely, it will also work on an infinite list: the translation of an infinite-loop printout function will be one that returns an infinite IO description tree of `Begin2' values. This loop will also force only what it needs to print and will go on recursively printing the whole sequence (possibly not terminating). 
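The eager back-end described above can be sketched as follows.  This is only
an illustration under stated assumptions, not the Sloth implementation: it
uses plain Racket structs instead of an `IOV' variant, it models laziness
with thunks (a `Begin2' field may hold either an IO value or a zero-argument
function that returns one), and the names `run-io' and this stand-alone
`strict' are hypothetical.

```racket
#lang racket

;; Assumed stand-ins for the IO description type from the text.
(struct Print (str))
(struct Begin2 (head tail))

;; Hypothetical helper: force a (possibly nested) thunk, in the spirit
;; of the interpreter's `strict'.  Thunks play the role of promises.
(define (strict v)
  (if (procedure? v) (strict (v)) v))

;; The imperative side: follow the description and perform the output.
;; The tail of a `Begin2' is forced only when we get to it.
(define (run-io io)
  (match (strict io)
    [(Print s) (display s)]
    [(Begin2 head tail)
     (run-io head)
     (run-io tail)]))   ; tail call, so long chains do not grow the stack

;; The description-building `print-list' from the text, with the
;; recursive part delayed by hand since plain Racket is eager.
(define (print-list l)
  (if (null? l)
      (Print "\n")
      (Begin2 (Print (format "~s " (first l)))
              (lambda () (print-list (rest l))))))
```

With these definitions, `(run-io (print-list '(1 2 3)))' prints the three
numbers, each followed by a space, and then a newline.  Since each tail is
forced only when the printer reaches it, the same loop can follow an
infinite description.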
For example (again, using Racket syntax), the infinite printout loop (: print-loop : -> Void) (define (print-loop) (printf "foo\n") (print-loop)) is translated into a function that returns an infinite tree of print operations: (: print-loop : -> IO) (define (print-loop) (Begin2 (Print "foo\n") (print-loop))) When this tree is converted to actions, it will result in an infinite loop that produces the same output -- it is essentially the same infinite loop, only now it's derived by an infinite description rather than an infinite process. Finally, how should we deal with inputs? We can add another variant to our type definition that represents a `read-line' operation, assuming that like `read-line' it does not require any arguments: (define-type IO [Print String] [ReadLine ] [Begin2 IO IO]) Now the eager implementation can invoke `read-line' when it encounters a `ReadLine' value -- but what should it do with the resulting string? The solution is to use a `receiver' function as part of the `ReadLine' operation description. This receiver value is a kind of a "continuation" of the computation, provided as a callback value -- it will get the string that was read on the terminal, and will return a new description of side-effects that represents the rest of the process: (define-type IO [Print String] [ReadLine (String -> IO)] [Begin2 IO IO]) Now, when the eager side sees a `ReadLine' value, it will read a line, and invoke the callback function with the string that it has read. By doing this, the control goes back to the lazy world to process the value and get back another IO value to continue the processing. This results in a process where the lazy code generates some IO descriptions, then the imperative side will execute it and control goes back to the lazy code, then back to the imperative side, etc. For example, this silly loop: (: silly-loop : -> Void) (define (silly-loop) (printf "What is your name? ") (let ([name (read-line)]) (if (equal? 
name "quit")
        (printf "bye\n")
        (begin (printf "Your name is ~s\n" name)
               (silly-loop)))))

is translated to:

  (: silly-loop : -> IO)
  (define (silly-loop)
    (Begin2 (Print "What is your name? ")
            (ReadLine
             (lambda (name)
               (if (equal? name "quit")
                 (Print "bye\n")
                 (Begin2 (Print (format "Your name is ~s\n" name))
                         (silly-loop)))))))

Using this strategy to implement side-effects is possible, and you will do
that in the homework -- some technical details are going to be different but
the principle is the same as discussed above.  The last problem is that the
above code is difficult to work with -- in the homework you will see how to
use syntactic abstractions to make things much simpler.

========================================================================
>>> Designing Domain Specific Languages (DSLs)

[[[ PLAI Chapter 35 ]]]

Programming languages differ in numerous ways:

1. Each uses different notations for writing down programs.  As we've
   observed, however, syntax is only partially interesting.  (This is,
   however, less true of languages that are trying to mirror the notation
   of a particular domain.)

2. Control constructs: for instance, early languages didn't even support
   recursion, while most modern languages still don't have continuations.

3. The kinds of data they support.  Indeed, sophisticated languages like
   Racket blur the distinction between control and data by making
   fragments of control into data values (such as first-class functions
   and continuations).

4. The means of organizing programs: do they have functions, modules,
   classes, ...?

5. Automation such as memory management, run-time safety checks, and so
   on.

Each of these items suggests natural questions to ask when you design your
own languages in particular domains.

Obviously, there are a lot of domain specific languages these days -- and
that's not new.
For example, four of the oldest languages were conceived as domain specific
languages:

  Fortran -- Formula Translator
  Algol   -- Algorithmic Language
  Cobol   -- Common Business Oriented Language
  Lisp    -- List Processing

Only in the late 60s / early 70s did languages begin to break free from
their special purpose domains and become "general purpose" languages (GPLs).
These days, we usually use some GPL for our programs and often come up with
small "domain specific" languages (DSLs) for specific jobs.

The problem is designing such a specific language.  There are lots of
decisions to make, and as should be clear now, many ways of shooting
yourself in the foot.  You need to know:

  * What is your domain?

  * What are the common notations in this domain (need to be convenient
    both for the machine and for humans)?

  * What do you expect to get from your DSL?  (eg, performance gains when
    you know that you're dealing with a certain limited kind of
    functionality like arithmetic.)

  * Do you have any semantic reason for a new language?  (For example,
    using special scoping rules, or a mixture of lazy and eager evaluation,
    maybe a completely different way of evaluation (eg, makefiles).)

  * Is your language expected to envelop other functionality (eg, shell
    scripts, TCL), perhaps pushing some functionality onto a different
    language (makefiles and shell scripts), or is it going to be embedded
    in a bigger application (eg, PHP), or embedded in a way that exposes
    parts of an application to user automation (Emacs Lisp, Word Basic,
    Visual Basic for Office Application or Some Other Long List of
    Buzzwords).

  * If you have one language embedded in another enveloping language --
    how do you handle syntax?  How can they communicate (eg, share
    variables)?

And very important:

  * Is there a benefit for implementing a DSL over using a GPL -- how much
    will your DSL grow (usually more than you think)?  Will it get to a
    point where it will need the power of a full GPL?
    Do you want to risk doing this just to end up admitting that you need
    a "Real Language" and dump your solution for "Visual Basic for
    Applications"?

    => It might be useful to think ahead about things that you know you
       don't need, rather than things you need.

To clarify why this can be applicable in more situations than you think,
consider what programming languages are used for.  One example that should
not be ignored is using a programming language to implement a programming
language -- for example, what we did so far (or any other interpreter or
compiler).  In the same way that some piece of code in a PL represents
functions about the "real world", there are other programs that represent
things in a language -- possibly even the same one.  To make a
side-effect-full example, the meaning of `one-brick' might abstract over
laying a brick when making a wall -- it abstracts all the little details
into a function:

  (define (one-brick wall brick-pile)
    (move-eye (location brick-pile))
    (let ([pos (find-available-brick-position brick-pile)])
      (move-hand pos)
      (grab-object))
    (move-eye wall)
    (let ([pos (find-next-brick-position wall)])
      (move-hand pos)
      (drop-object)))

and we can now write

  (one-brick my-wall my-brick-pile)

instead of all of the above.  We might use that in a loop:

  (define (build-wall wall pile)
    (define (loop n)
      (when (< n 500)
        (one-brick wall pile)
        (loop (add1 n))))
    (loop 0))

This is a common piece of looping code that we've seen in many forms, and a
common complaint of newcomers to functional languages is the lack of some
kind of a loop.  But once you know the template, writing such loops is easy
-- and in fact, you can write code that would take something like:

  (define (build-wall wall pile)
    (loop-for i from 1 to 500
      (one-brick wall pile)))

and produce the previous code.  Note the main point here: we switch from
code that deals with bricks to code that deals with code.
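One possible way to write such code-producing code in plain Racket is a
pattern-based macro.  The following is a minimal sketch, assuming that
`from' and `to' are treated as literal keywords and that the upper bound is
inclusive; the counting `build-wall' is a hypothetical stand-in for the
brick-laying version, used only so that the example can run on its own.

```racket
#lang racket

;; Sketch of the `loop-for' form from the text: it rewrites
;;   (loop-for i from LO to HI body ...)
;; into the named-loop template shown earlier.
(define-syntax loop-for
  (syntax-rules (from to)
    [(_ var from lo to hi body ...)
     (let ([limit hi])                 ; evaluate the upper bound once
       (let loop ([var lo])
         (when (<= var limit)
           body ...
           (loop (add1 var)))))]))

;; Hypothetical stand-in for the wall builder: instead of calling
;; `one-brick', it counts how many bricks would have been laid.
(define (build-wall)
  (define laid 0)
  (loop-for i from 1 to 500
    (set! laid (add1 laid)))
  laid)
```

Evaluating `(build-wall)' returns 500, one step per brick.  The macro only
rearranges code: the loop body is copied into the template, and the
macro-introduced names `loop' and `limit' cannot clash with names in the
body because Racket macros are hygienic.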
Now, a viable option for implementing a new DSL is to do so by transforming
it into an existing language.  Such a process is usually tedious and error
prone -- tedious because you need to deal with the boring parts of a
language (making a parser etc), and error prone because it's easy to
generate bad code (especially when you're dealing with strings) and you get
bad errors in terms of the translated code instead of the actual code,
resorting to debugging the intermediate generated programs.

Lisp languages traditionally have taken this idea one level further than
other languages: instead of writing a new transformer for your language,
you use the host language, but you extend and customize it by adding your
own forms.

========================================================================
>> Side-note: WSJ on the proliferation of PLs

  "Computer Languages Multiply, Pleasing Many--But Not All"
  Wall Street Journal (12/14/05) P. B1; Gomes, Lee

  While the proliferation of languages has been a boon to software
  programmers, the extensive variety often frustrates their bosses and
  confounds the larger software companies.  C and the subsequent C++ may
  be the most popular languages in use today, but any programmer working
  on the Web must also include languages such as Perl, Python, PHP, and
  TCL in his resume.  The explosion has been partially fueled by the
  ability of an individual programmer or a small group to create and
  market a language, as was the case with Ruby on Rails, which became an
  overnight sensation thanks to a 15-minute demonstration video the Danish
  programmer David Hansson circulated over the Web.  Once a language has
  gained a core following, blogs and Web sites appear to track its
  developments.  Many languages owe their origins to small design firms
  trying to make a commercial success of themselves, while others are
  labors of love, as is the case with many open source projects.
As new languages continue to emerge, however, more programmers are defecting from mainstream systems such as .NET and Java in favor of niche offerings that are more tailored to a specific project. CIOs are often assailed by complaints from their programmers when they try to impose restrictions on the number of languages that are permissible. While it has been demonstrated theoretically that each language is the rough equivalent of any other, it is no more likely for a consensus to appear within the programming community than it is for a single car to be met with a universal embrace from the entire fleet of motorists. ========================================================================