2010-01-22
- Semantics (= Evaluation)
- Implementing an Evaluator
- Introduction to Typed Scheme
- Implementing The AE Language
========================================================================
>>> Semantics (= Evaluation)

Back to BNF -- now, meaning.

An important feature of these BNF specifications: we can use the
derivations to specify *meaning* (and meaning in our context is
"running" a program (or "interpreting", "compiling", but we will use
"evaluating")).  For example:

  <AE> ::= <num>          ; <AE> evaluates to the number
         | <AE1> + <AE2>  ; <AE> evaluates to the sum of evaluating
                          ;   <AE1> and <AE2>
         | <AE1> - <AE2>  ; ... the subtraction of <AE2> from <AE1>
                          ;   (... roughly!)

To do this a little more formally:

  a. eval(<num>) = <num>  ; <-- special rule: moves syntax into a value
  b. eval(<AE1> + <AE2>) = eval(<AE1>) + eval(<AE2>)
  c. eval(<AE1> - <AE2>) = eval(<AE1>) - eval(<AE2>)

Note the completely different roles of the two "+"s and "-"s.  In fact,
it might have been more correct to write:

  a. eval("<num>") = <num>
  b. eval("<AE1> + <AE2>") = eval("<AE1>") + eval("<AE2>")
  c. eval("<AE1> - <AE2>") = eval("<AE1>") - eval("<AE2>")

or even using a marker to denote meta-holes in these strings:

  a. eval("$<num>") = <num>
  b. eval("$<AE1> + $<AE2>") = eval("$<AE1>") + eval("$<AE2>")
  c. eval("$<AE1> - $<AE2>") = eval("$<AE1>") - eval("$<AE2>")

but we will avoid pretending that we're doing that kind of string
manipulation.  (For example, it would require specifying what it means
to return <num> for "$<num>" (which involves `string->number'), and the
fragments on the right side mean that we would need to specify these as
substring operations.)

Note that there's a similar kind of informality in our BNF
specifications, where we assume that a name in angle brackets (like
<AE>) refers to some terminal or non-terminal.  In texts where more
formality is required (for example, in RFC specifications), each
literal part of the BNF is usually marked with double quotes, so we'd
get

  <AE> ::= <num> | <AE1> "+" <AE2> | <AE1> "-" <AE2>

An alternative popular notation for eval(X) is [[X]]:

  a. [[<num>]] = <num>
  b. [[<AE1> + <AE2>]] = [[<AE1>]] + [[<AE2>]]
  c. [[<AE1> - <AE2>]] = [[<AE1>]] - [[<AE2>]]

Is there a problem with this definition?  Ambiguity:

  eval(1 - 2 + 3) = ?
Depending on the way the expression is parsed, we get either 2 or -4:

  eval(1 - 2 + 3) = eval(1 - 2) + eval(3)          [b]
                  = eval(1) - eval(2) + eval(3)    [c]
                  = 1 - 2 + 3                      [a,a,a]
                  = 2

  eval(1 - 2 + 3) = eval(1) - eval(2 + 3)          [c]
                  = eval(1) - (eval(2) + eval(3))  [b]
                  = 1 - (2 + 3)                    [a,a,a]
                  = -4

Again, be very aware of confusing subtleties which are extremely
important: we need parens around a sub-expression in only one of these
cases.  Why?  When we write:

  eval(1 - 2 + 3) = ... = 1 - 2 + 3

we have two expressions, but one stands for an *input syntax*, and one
stands for a `real' mathematical expression.  The right-hand side
follows the usual mathematical convention that `-' and `+' group to the
left, so `1 - 2 + 3' there already means (1 - 2) + 3; only the second
derivation, which groups the other way, needs explicit parens.  In the
case of a computer implementation, the syntax on the left is (as always)
an AE syntax, and the `real' expression on the right is an expression in
whatever language we use to implement our AE language.

Like we said earlier, ambiguity is not a real problem until the actual
parse tree matters.  With `eval' it definitely matters, so we must not
make it possible to derive any syntax in multiple ways, or our
evaluation will be non-deterministic.

========================================================================

Quick exercise:

We can define a meaning for <digit>s and then <number>s in a similar
way:

  eval(0) = 0
  eval(1) = 1
  eval(2) = 2
  ...
  eval(9) = 9
  eval(<digit> <number>) = 10*eval(<digit>) + eval(<number>)

Is this exactly what we want? -- Depends on what we actually want...

* There is a bug there -- having a BNF derivation like

    <number> ::= <digit> | <digit> <number>

  is unambiguous, but makes it hard to give a meaning to a number: the
  weight of the first digit depends on how many digits follow it, so
  the rule above gives eval(123) = 10*eval(1) + eval(23) = 10 + 23 = 33.
  Changing the order in the last rule works much better:

    <number> ::= <digit> | <number> <digit>

  and then:

    eval(<number> <digit>) = 10*eval(<number>) + eval(<digit>)

* Example for free stuff that looks trivial: if we were to define the
  meaning of <number>s this way, would it always work?  Think about an
  average language that does not give you bignums, making the above
  rules fail when the numbers are too big.  In Scheme, we happen to be
  using an integer representation for the syntax of integers, and both
  are unlimited.  But what if we wanted to write a Scheme compiler in C
  or a C compiler in Scheme?
  What about a C compiler in C, where the compiler runs on a 64 bit
  machine, and the result needs to run on a 32 bit machine?

========================================================================
>>> Implementing an Evaluator

Now we continue to implement the semantics of our syntax -- we express
that through an `eval' function that evaluates an expression.

We use a basic programming principle -- splitting the code into two
layers, one for parsing the input, and one for doing the evaluation.
Doing this avoids the mess we'd get into otherwise, for example:

  (define (eval sexpr)
    (match sexpr
      [(number: n) n]
      [(list '+ left right) (+ (eval left) (eval right))]
      [(list '- left right) (- (eval left) (eval right))]
      [else (error 'eval "bad syntax in ~s" sexpr)]))

This is messy because it combines two very different things -- syntax
and semantics -- into a single lump of code.  For this particular kind
of evaluator it looks simple enough, but only because all we do is
replace constructors by arithmetic operations.  Later on things will get
more complex, and bundling the evaluator with the parser will be more
problematic.  (Note: the fact that we can replace constructors with the
run-time operators means that we have a very simple, calculator-like
language, and that we can, in fact, "compile" all programs down to a
number.)

If we split the code, we can easily include decisions like making

  {+ 1 {- 3 "a"}}

syntactically invalid.  (Which is not, BTW, what Scheme does...)  (Also,
this is like the distinction between well-formed XML and valid XML:
{+ 1 {- 3 "a"}} is a well-formed s-expression, but not a valid AE.)

An additional advantage is that by using two separate components, it is
simple to replace each one, making it possible to change the input
syntax and the semantics independently -- we only need to keep the same
interface data (the AST) and things will work fine.

Our `parse' function converts an input syntax to an abstract syntax tree
(AST).
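A minimal sketch of this two-layer split, written in plain Typed Racket (the successor of Typed Scheme) rather than the course's `pl' language: structs and `match' stand in for `define-type' and `cases', and Racket's `read' (which treats {...} like (...) by default) stands in for the s-expression reader.  The names `parse-sexpr', `eval-ae' and `unparse' are illustrative only.  The point is the interface: `parse' is the only code that looks at concrete syntax, and two unrelated consumers, an evaluator and a printer that renders infix text, share the same AST.

```racket
#lang typed/racket
;; Sketch of the parse/eval split, assuming plain Typed Racket instead of
;; the `pl' language.  All names here are illustrative.

;; The AST: the interface shared by the parser and its consumers.
(define-type AE (U Num Add Sub))
(struct Num ([n : Number]) #:transparent)
(struct Add ([lhs : AE] [rhs : AE]) #:transparent)
(struct Sub ([lhs : AE] [rhs : AE]) #:transparent)

;; Layer 1, syntax: turns an s-expression into an AE, or fails.
(: parse-sexpr (-> Any AE))
(define (parse-sexpr sexpr)
  (match sexpr
    [(? number? n) (Num n)]
    [(list '+ lhs rhs) (Add (parse-sexpr lhs) (parse-sexpr rhs))]
    [(list '- lhs rhs) (Sub (parse-sexpr lhs) (parse-sexpr rhs))]
    [_ (error 'parse-sexpr "bad syntax in ~s" sexpr)]))

;; Racket's reader treats {...} like (...), so it can read AE syntax.
(: parse (-> String AE))
(define (parse str)
  (parse-sexpr (read (open-input-string str))))

;; Layer 2a, semantics: knows nothing about concrete syntax.
(: eval-ae (-> AE Number))
(define (eval-ae expr)
  (match expr
    [(Num n) n]
    [(Add l r) (+ (eval-ae l) (eval-ae r))]
    [(Sub l r) (- (eval-ae l) (eval-ae r))]))

;; Layer 2b: a different consumer of the same AST, producing infix text.
(: unparse (-> AE String))
(define (unparse expr)
  (match expr
    [(Num n) (number->string n)]
    [(Add l r) (format "(~a + ~a)" (unparse l) (unparse r))]
    [(Sub l r) (format "(~a - ~a)" (unparse l) (unparse r))]))
```

With this split, (parse "{+ 1 {- 3 \"a\"}}") fails inside `parse-sexpr' with a "bad syntax" error before any evaluation happens, and adding or swapping a back end like `unparse' requires no change to the parser.  Neither consumer ever sees a brace or a string: both work on the AST alone.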
The AST is abstract exactly because it is independent of any actual
concrete syntax that you type in, print out, etc.

========================================================================
>>> Introduction to Typed Scheme

The plan:
* Why Types?
* Why Typed Scheme?
* What's Different about Typed Scheme?
* Some Examples of Typed Scheme for Course Programs

>> Types

- Who has used a (statically) typed language?
- Who has used a typed language that's not Java?

Typed Scheme will be both similar to and very different from anything
you've seen before.

>> Why Types?

- Types help structure programs.
- Types provide enforced and mandatory documentation.
- Types help catch errors.

--> They *will* help you.  A *lot*.

>> Structuring Programs

- Data definitions

    ;; An AE is one of:          ; \
    ;; (make-Num Number)         ;  > HtDP
    ;; (make-Add AE AE)          ; /

    (define-type AE              ; \
      [Num (n number?)]          ;  > predicates (contracts)
      [Add (l AE?) (r AE?)])     ; /

    (define-type AE              ; \
      [Num (n Number)]           ;  > Typed Scheme
      [Add (l AE) (r AE)])       ; /

- Data-first

  The structure of your program is derived from the structure of your
  data.  You have seen this in 211 and 213 with the design recipe and
  with templates.  In this class, we will see it extensively with type
  definitions and the (cases ...) form.  Types make this pervasive -- we
  have to think about our data before our code.

- A language for describing data

  Instead of having an informal language for describing types in
  contract lines, and a more formal description of predicates in a
  `define-type' form, we will have a single, unified language for both
  of these.  Having such a language means that we get to be more precise
  and more expressive (since the typed language covers cases that you
  would otherwise dismiss with some hand waving, like "function").

>> Why Typed Scheme?

Scheme is the language we all know, and it has the benefits that we
discussed earlier.  Mainly, it is an excellent language for
experimenting with programming languages.
- Typed Scheme allows us to take our Scheme programs and typecheck them,
  so we get the benefits of a statically typed language.

- Types are an important programming language feature; Typed Scheme
  will help us understand them.

[Also: the development of Typed Scheme is happening here at
Northeastern, and will benefit from your feedback.]

>> How is Typed Scheme different from Scheme

- Typed Scheme will reject your program if there are type errors!

- Typed Scheme files start like this:

    #lang typed-scheme
    ;; Program goes here.

  but we will use a variant of the Typed Scheme language, which has a
  few additional constructs:

    #lang pl
    ;; Program goes here.

- Typed Scheme requires you to write the contracts on your functions.

  Scheme:

    ;; f : Number -> Number
    (define (f x)
      (* x (+ x 1)))

  Typed Scheme:

    #lang pl
    (: f : Number -> Number)
    (define (f x)
      (* x (+ x 1)))

  [In Typed Scheme you can also have the type annotations appear inside
  the definition:

    #lang pl
    (define: (f [x : Number]) : Number
      (* x (+ x 1)))

  but we will not use this form.]

- As we've seen, Typed Scheme uses types, not predicates, in
  define-type.

    (define-type AE
      [Num (n Number)]
      [Add (l AE) (r AE)])

  versus

    (define-type AE
      [Num (n number?)]
      [Add (l AE?) (r AE?)])

- There are other differences, but these will suffice for now.

>> Examples

  (: digit-num : Number -> (U Number String))
  (define (digit-num n)
    (cond [(<= n 9)    1]
          [(<= n 99)   2]
          [(<= n 999)  3]
          [(<= n 9999) 4]
          [else        "a lot"]))

  (: fact : Number -> Number)
  (define (fact n)
    (if (zero? n)
      1
      (* n (fact (- n 1)))))

  (: helper : Number Number -> Number)
  (define (helper n acc)
    (if (zero? n)
      acc
      (helper (- n 1) (* acc n))))
  (: fact : Number -> Number)
  (define (fact n)
    (helper n 1))

  (: fact : Number -> Number)
  (define (fact n)
    (: helper : Number Number -> Number)
    (define (helper n acc)
      (if (zero? n)
        acc
        (helper (- n 1) (* acc n))))
    (helper n 1))

  (: every? : (All (A) ((A -> Boolean) (Listof A) -> Boolean)))
  ;; Returns false if any element of lst fails the given pred, true if
  ;; all pass pred.
  (define (every? pred lst)
    (or (null? lst)
        (and (pred (car lst))
             (every? pred (cdr lst)))))

  (define-type AE
    [Num (n Number)]
    [Add (lhs AE) (rhs AE)]
    [Sub (lhs AE) (rhs AE)])

  ;; the only difference in the following definition is
  ;; using (: <name> : <type>) instead of ";; <name> : <type>"

  (: parse-sexpr : Sexpr -> AE)
  ;; to convert s-expressions into AEs
  (define (parse-sexpr sexpr)
    (match sexpr
      [(number: n) (Num n)]
      [(list '+ left right)
       (Add (parse-sexpr left) (parse-sexpr right))]
      [(list '- left right)
       (Sub (parse-sexpr left) (parse-sexpr right))]
      [else (error 'parse-sexpr "bad syntax in ~s" sexpr)]))

>> More Interesting Examples

* Typed Scheme is designed to be a language that is friendly to the
  kind of programs that people write in Scheme.  For example, it has
  unions:

    (: foo : (U String Number) -> Number)
    (define (foo x)
      (if (string? x)
        (string-length x)
        ;; at this point it knows that `x' is not a string, therefore it
        ;; must be a number
        (+ 1 x)))

  This is not common in statically typed languages, which are usually
  limited to only "disjoint unions".  For example, in OCaml you'd write
  this definition:

    type string_or_number = Str of string | Int of int ;;
    let foo x = match x with Str s -> String.length s
                           | Int i -> i+1 ;;

  And use it with an explicit constructor:

    foo (Str "bar") ;;
    foo (Int 3) ;;

* Typed Scheme has a concept of subtypes -- which is also something that
  many statically typed languages lack, or provide only through class
  hierarchies.  In fact, having (arbitrary) unions means that it must
  have subtypes too, since a type is always a subtype of any union that
  contains it.

* Another result of this feature is that there is an `Any' type that is
  the union of all other types.  Note that you can always use this type
  since everything is in it -- but it gives you the *least* information
  about a value.
* Another interesting thing to look at is the type of `error': it's a
  function that returns a type of (U) -- an *empty* union.  This is a
  type that has no values in it -- it fits `error' because it doesn't
  return any value.  In addition, it means that an `error' expression
  can be used anywhere you want, because it is a subtype of anything at
  all.

* An `else' clause in a `cond' expression is almost always needed.  For
  example, this version is rejected:

    (: digit-num : Number -> (U Number String))
    (define (digit-num n)
      (cond [(<= n 9)    1]
            [(<= n 99)   2]
            [(<= n 999)  3]
            [(<= n 9999) 4]
            [(> n 9999)  "a lot"]))

  Without an `else', the type checker must assume that all of the tests
  can fail, in which case the `cond' returns a void value, which is
  neither a Number nor a String.  (And if you think that the type
  checker should know what this is doing, then how about

    (> (* n 10) (/ (* (- 10000 1) 20) 2))

  or

    (>= n 10000)

  for the last test?)

* Note that Typed Scheme can keep track of information that is gathered
  via predicates:

    (: foo : (U String Number) -> Number)
    (define (foo x)
      (if (string? x)
        (string-length x)
        ;; at this point it knows that `x' is not a string, therefore it
        ;; must be a number
        (+ 1 x)))

* In some rare cases you will run into one limitation of Typed Scheme:
  it is difficult (that is: a generic solution is not known at the
  moment) to do the right inference when polymorphic functions are
  passed around to higher-order functions.  For example:

    (: call : (All (A B) ((A -> B) A -> B)))
    (define (call f x)
      (f x))
    (call rest (list 4))

  In such cases, we can use `inst' to "instantiate" a function with a
  polymorphic type to a given type -- in this case, we can use it to
  make it treat `rest' as a function that is specific for numeric lists:

    (call (inst rest Number) (list 4))

  In other rare cases, Typed Scheme will infer a type that is not
  suitable for us -- there is another `ann' form that allows us to
  specify a certain type.  Using this in the `call' example is more
  verbose:

    (call (ann rest : ((Listof Number) -> (Listof Number))) (list 4))

  However, these are going to be rare and will be mentioned explicitly
  whenever they're needed.
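Two of the points above can be tried out in Typed Racket.  This is a sketch assuming `#lang typed/racket' rather than the `pl' language, and `safe-div' and `describe' are made-up names.  In Typed Racket the empty union (U) is also available under the name `Nothing', and `error' returns it, so an `error' call can sit in a branch whose sibling produces a Number.  A value of type `Any' can only be used after a predicate test refines it, and the final `else' keeps the `cond' from possibly falling through to a void result.

```racket
#lang typed/racket
;; `error' returns Nothing (the empty union), a subtype of every type,
;; so this `if' still has type Number.
(: safe-div (-> Number Number Number))
(define (safe-div x y)
  (if (zero? y)
      (error 'safe-div "division by zero")
      (/ x y)))

;; `Any' says nothing about its value: each use is guarded by a predicate,
;; after which the type checker refines the type of `v' in that branch.
(: describe (-> Any String))
(define (describe v)
  (cond [(string? v) (format "a string of length ~a" (string-length v))]
        [(number? v) (string-append "twice it is " (number->string (* v 2)))]
        ;; without this clause the result type would include Void
        [else "something else"]))
```

Removing the `else' clause from `describe' should make the type checker reject it, for the same reason as the `digit-num' example above.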
========================================================================
>>> Implementing The AE Language

Back to our `eval' -- this will be its (obvious) type:

  (: eval : AE -> Number)
  ;; consumes an AE and computes the corresponding number

which leads to some obvious test cases:

  (equal? 3 (eval (parse "3")))
  (equal? 7 (eval (parse "{+ 3 4}")))
  (equal? 6 (eval (parse "{+ {- 3 4} 7}")))

which from now on we will write using the new `test' form that the `pl'
language provides:

  (test (eval (parse "3"))             => 3)
  (test (eval (parse "{+ 3 4}"))       => 7)
  (test (eval (parse "{+ {- 3 4} 7}")) => 6)

Like everything else, the structure of the recursive `eval' code follows
the recursive structure of its input.  The template is therefore:

  (: eval : AE -> Number)
  (define (eval expr)
    (cases expr
      [(Num n)   ...]
      [(Add l r) ... (eval l) ... (eval r) ...]
      [(Sub l r) ... (eval l) ... (eval r) ...]))

In this case, filling in the gaps is very simple:

  (: eval : AE -> Number)
  (define (eval expr)
    (cases expr
      [(Num n)   n]
      [(Add l r) (+ (eval l) (eval r))]
      [(Sub l r) (- (eval l) (eval r))]))

We can further combine `eval' and `parse' into a single `run' function
that evaluates an AE string.
  (: run : String -> Number)
  ;; evaluate an AE program contained in a string
  (define (run str)
    (eval (parse str)))

The resulting *full* code is:

---<<<AE>>>-----------------------------------------------------------
  #lang pl

  #| BNF for the AE language:
     <AE> ::= <num>
            | { + <AE> <AE> }
            | { - <AE> <AE> }
            | { * <AE> <AE> }
            | { / <AE> <AE> }
  |#

  ;; AE abstract syntax trees
  (define-type AE
    [Num (n Number)]
    [Add (lhs AE) (rhs AE)]
    [Sub (lhs AE) (rhs AE)]
    [Mul (lhs AE) (rhs AE)]
    [Div (lhs AE) (rhs AE)])

  (: parse-sexpr : Sexpr -> AE)
  ;; to convert s-expressions into AEs
  (define (parse-sexpr sexpr)
    (match sexpr
      [(number: n) (Num n)]
      [(list '+ lhs rhs) (Add (parse-sexpr lhs) (parse-sexpr rhs))]
      [(list '- lhs rhs) (Sub (parse-sexpr lhs) (parse-sexpr rhs))]
      [(list '* lhs rhs) (Mul (parse-sexpr lhs) (parse-sexpr rhs))]
      [(list '/ lhs rhs) (Div (parse-sexpr lhs) (parse-sexpr rhs))]
      [else (error 'parse-sexpr "bad syntax in ~s" sexpr)]))

  (: parse : String -> AE)
  ;; parses a string containing an AE expression to an AE AST
  (define (parse str)
    (parse-sexpr (string->sexpr str)))

  (: eval : AE -> Number)
  ;; consumes an AE and computes the corresponding number
  (define (eval expr)
    (cases expr
      [(Num n)   n]
      [(Add l r) (+ (eval l) (eval r))]
      [(Sub l r) (- (eval l) (eval r))]
      [(Mul l r) (* (eval l) (eval r))]
      [(Div l r) (/ (eval l) (eval r))]))

  (: run : String -> Number)
  ;; evaluate an AE program contained in a string
  (define (run str)
    (eval (parse str)))

  ;; tests
  (test (run "3") => 3)
  (test (run "{+ 3 4}") => 7)
  (test (run "{+ {- 3 4} 7}") => 6)
----------------------------------------------------------------------

(Note that the tests are done with a `test' form, which we will talk
about shortly.)

For anyone who thinks that Scheme is a bad choice, this is a good point
to think about how much code would be needed in some other language to
do the same as above.

========================================================================
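Since each AE operator is implemented directly by the corresponding Scheme operator, AE arithmetic is whatever the host language's arithmetic is, which ties back to the bignum question in the quick exercise.  The following sketch re-creates the full AE code under stated assumptions: plain `#lang typed/racket' instead of `pl', structs and `match' instead of `define-type' and `cases', `read' instead of `string->sexpr', and the hypothetical name `eval-ae' to avoid shadowing Racket's own `eval'.  It makes the inherited behavior easy to check.

```racket
#lang typed/racket
;; The full AE language (+ - * /) re-created in plain Typed Racket.
;; Sketch only: structs + match stand in for the pl language's
;; define-type + cases, and `read' stands in for string->sexpr.

(define-type AE (U Num Add Sub Mul Div))
(struct Num ([n : Number]) #:transparent)
(struct Add ([lhs : AE] [rhs : AE]) #:transparent)
(struct Sub ([lhs : AE] [rhs : AE]) #:transparent)
(struct Mul ([lhs : AE] [rhs : AE]) #:transparent)
(struct Div ([lhs : AE] [rhs : AE]) #:transparent)

(: parse-sexpr (-> Any AE))
(define (parse-sexpr sexpr)
  (match sexpr
    [(? number? n) (Num n)]
    [(list '+ lhs rhs) (Add (parse-sexpr lhs) (parse-sexpr rhs))]
    [(list '- lhs rhs) (Sub (parse-sexpr lhs) (parse-sexpr rhs))]
    [(list '* lhs rhs) (Mul (parse-sexpr lhs) (parse-sexpr rhs))]
    [(list '/ lhs rhs) (Div (parse-sexpr lhs) (parse-sexpr rhs))]
    [_ (error 'parse-sexpr "bad syntax in ~s" sexpr)]))

(: parse (-> String AE))
(define (parse str)
  (parse-sexpr (read (open-input-string str))))

;; Each AE operator is implemented by the corresponding Racket operator,
;; so AE inherits Racket's unlimited integers and exact rationals.
(: eval-ae (-> AE Number))
(define (eval-ae expr)
  (match expr
    [(Num n) n]
    [(Add l r) (+ (eval-ae l) (eval-ae r))]
    [(Sub l r) (- (eval-ae l) (eval-ae r))]
    [(Mul l r) (* (eval-ae l) (eval-ae r))]
    [(Div l r) (/ (eval-ae l) (eval-ae r))]))

(: run (-> String Number))
(define (run str)
  (eval-ae (parse str)))
```

Here (run "{* 10000000000 10000000000}") yields 100000000000000000000, which is past the range of a 64-bit integer; (run "{/ 1 3}") yields the exact rational 1/3; and (run "{/ 1 0}") raises Racket's own division-by-zero exception rather than an AE-specific error.  An implementation in a language with fixed-size integers would need to decide explicitly what each of these programs means.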