2013-01-16 - Names are Important - Note on Types - BNF, Grammars, the Simple AE Language - Simple Parsing - The `match' Form - Semantics (= Evaluation) - Implementing an Evaluator - Implementing The AE Language - Introduction to Typed Racket ======================================================================== >>> Names are Important Side-note: An important "discovery" in computer science is that we *don't* need names for every intermediate sub-expression -- for example, in almost any language we can write the equivalent of: s = (-b + sqrt(b^2 - 4*a*c)) / 2a instead of x = b * b y = 4 * a y = y * c x = x - y x = sqrt(x) y = -b x = y + x y = 2 * a s = x / y Such languages are put in contrast to assembly languages, and were all put under the generic label of "high level languages". (Here's an interesting idea -- why not do the same for function values?) ======================================================================== The fact that in Racket we can use functions as values is very useful -- for example, `map', `foldl' & `foldr', many more. Example: ;; every? : (A -> Boolean) (Listof A) -> Boolean ;; Returns false if any element of lst fails the given pred, true if ;; all pass pred. (define (every? pred lst) (or (null? lst) (and (pred (first lst)) (every? pred (rest lst))))) ======================================================================== >>> Note on Types Types can become interesting when dealing with higher-order functions. For example, `map' receives a function and a list of some type, and applies the function over this list to accumulate its output, so its type is: ;; map : (A -> B) (Listof A) -> (Listof B) Actually, `map' can use more than a single list, it will apply the function on the first element in all lists, then the second and so on. So the type of `map' with two lists can be described as: ;; map : (A B -> C) (Listof A) (Listof B) -> (Listof C) Here's a hairy example -- what is the type of this function: (define (foo x y) (map map x y)) Begin by what we know -- both `map's, call them `map1' and `map2', have the double- and single-list types of `map' respectively, here they are, with different names for types: ;; the first `map', consumes a function and two lists map1 : (A B -> C) (Listof A) (Listof B) -> (Listof C) ;; the second `map', consumes a function and one list map2 : (X -> Y) (Listof X) -> (Listof Y) Now, we know that `map2' is the first argument to `map1', so the type of `map1's first argument should be the type of `map2': (A B -> C) = (X -> Y) (Listof X) -> (Listof Y) From here we can conclude that A = (X -> Y) B = (Listof X) C = (Listof Y) If we use these equations in `map1's type, we get: map1 : ((X -> Y) (Listof X) -> (Listof Y)) (Listof (X -> Y)) (Listof (Listof X)) -> (Listof (Listof Y)) Now, `foo's two arguments are the 2nd and 3rd arguments of `map1', and its result is `map1's result, so we can now write the type of `foo': ;; foo : (Listof (X -> Y)) ;; (Listof (Listof X)) ;; -> (Listof (Listof Y)) (define (foo x y) (map map x y)) This should help you understand why, for example, this will cause a type error: (foo (list add1 sub1 add1) (list 1 2 3)) and why this is valid: (foo (list add1 sub1 add1) (map list (list 1 2 3))) ======================================================================== >>> BNF, Grammars, the Simple AE Language Getting back to the theme of the course: we want to investigate programming languages, and we want to do that *using* a programming language. The first thing when we design a language is to specify the language. For this we use BNF (Backus-Naur Form). For example, here is the definition of a simple arithmetic language: ::= | + | - Explain the different parts. Specifically, this is a mixture of low-level (concrete) syntax definition with parsing. We use this to derive expressions in some language. We start with , which should be on of these: * a number * , the text "+", and another * the same but with "-" is a terminal: when we reach it in the derivation, we're done. is a non-terminal: when we reach it, we have to continue with one of the options. It should be clear that the "+" and the "-" are things we expect to find in the input -- because they are not wrapped in <>s. We could specify what is (turning it into a non-terminal): ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | But we don't -- why? Because in Racket we have numbers as primitives and we want to use Racket to implement our languages. This makes life a lot easier, and we get free stuff like floats, rationals etc. For example, we can use this to prove that "1-2+3" is a valid expression: + ; (2) + ; (1) - + ; (3) - + 3 ; (num) - + 3 ; (1) - + 3 ; (1) 1 - + 3 ; (num) 1 - 2 + 3 ; (num) This would be one way of doing this. Instead, we can can visualize the derivation using a tree, with the rules used at every node. (Leave this on -- later show that this removes some confusion but not all.) These specifications suffer from being ambiguous: an expression can be derived in multiple ways. Even the little syntax for a number is ambiguous -- a number like "123" can be derived in two ways that result in trees that look different. This ambiguity is not a "real" problem now, but it will become one very soon. We want to get rid of this ambiguity, so that there is a single (= deterministic) way to derive all expressions. There is a standard way to resolve that -- we add another non-terminal to the definition, and make it so that each rule can continue to exactly one of its alternatives. For example, this is what we can do with numbers: ::= | ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 Similar solutions can be applied to the BNF -- we either restrict the way derivations can happen or we come up with new non-terminals to force a deterministic derivation trees. As an example of restricting derivations, we look at the current grammar: ::= | + | - and instead of allowing an on both sides of the operation, we force one to be a number: ::= | + | - Now there is a single way to derive any expression, and it is always associating operations to the right: an expression like "1+2+3" can only be derived as "1+(2+3)". To change this to left-association, we would use this: ::= | + | - But what if we want to force precedence? Say that our AE syntax has addition and multiplication: ::= | + | * We can do that same thing as above and add new non-terminals -- say one for "factors": ::= | + | ::= | * Now we must parse any AE expression as additions of multiplications (or numbers). First, note that if goes to and that goes to , then there is no need for an to go to a , so this is the same syntax: ::= + | ::= | * Now, if we want to still be able to multiply additions, we can force them to appear in parentheses: ::= + | ::= | * | ( ) Next, note that AE is still ambiguous about additions, which can be fixed by forcing the left hand side of an addition to be a factor: ::= + | ::= | * | ( ) We still have an ambiguity for multiplications, so we do the same thing and add another non-terminal for "atoms": ::= + | ::= * | ::= | ( ) And you can try to derive several expressions to be convinced that derivation is always deterministic now. But as you can see, this is exactly the cosmetics that we want to avoid -- it will lead us to things that might be interesting, but unrelated to the principles behind programming languages. It will also become much much worse when we have a real language rather such a tiny one. Is there a good solution? -- It is right in our face: do what Racket does -- always use fully parenthesized expressions: ::= | ( + ) | ( - ) To prevent confusing Racket code with code in our language(s), we also change the parentheses to curly ones: ::= | { + } | { - } But in Racket *everything* has a value -- including those `+'s and `-'s, which makes this extremely convenient with future operations that might have either more or less arguments than 2 as well as treating these arithmetic operators as plain functions. In our toy language we will not do this initially (that is, `+' and `-' are second order operators: they cannot be used as values). but since we will get to it later, we'll adopt the Racket solution and use a fully-parenthesized prefix notation: ::= | { + } | { - } (Remember that in a sense, Racket code is written in a form of already-parsed syntax...) ======================================================================== >>> Simple Parsing Implementing a "parser" Unrelated to what the syntax actually looks like, we want to parse it as soon as possible -- converting the concrete syntax to an abstract syntax tree. No matter how we write our syntax: - 3+4 (infix), - 3 4 + (postfix), - +(3,4) (prefix with args in parens), - (+ 3 4) (parenthesized prefix), we always mean the same abstract thing -- adding the number 3 and the number 4. The essence of this is basically a tree structure with an addition operation as the root and two leaves holding the two numerals. With the right data definition, we can describe this in Racket as the expression (Add (Num 3) (Num 4)) where `Add' and `Num' are constructors of a tree type for syntax, or in a C-like language, it could be something like Add(Num(3),Num(4)). Similarly, the expression (3-4)+7 will be described in Racket as the expression: (Add (Sub (Num 3) (Num 4)) (Num 7)) Important note: "expression" was used in two *different* ways in the above -- each way corresponds to a different language. To define the data type and the necessary constructors we will use this: (define-type AE [Num Number] [Add AE AE] [Sub AE AE]) * Note -- Racket follows the tradition of Lisp which makes syntax issues almost negligible -- the language we use is almost as if we are using the parse tree directly. Actually, it is a very simple syntax for parse trees, one that makes parsing extremely easy. [This has an interesting historical reason... Some Lisp history -- M-expressions vs. S-expressions, and the fact that we write code that is isomorphic to an AST. Later we will see some of the advantages that we get by doing this. See also "The Evolution of Lisp", section 3.5.1 (especially the last sentence).] To make things at a very simple level, we will use the above fact through a double-level approach: * we first "parse" our language into an intermediate representation -- a Racket list -- this is mostly done by a modified version of Racket's `read' that uses curly braces "{}"s instead of round parens "()"s, * then we write our own `parse' function that will parse the resulting list into an instance of the AE type -- an abstract syntax tree (AST). This is achieved by the following simple recursive function: (: parse-sexpr : Sexpr -> AE) ;; to convert s-expressions into AEs (define (parse-sexpr sexpr) (cond [(number? sexpr) (Num sexpr)] [(and (list? sexpr) (= 3 (length sexpr))) (let ([make-node (match (first sexpr) ['+ Add] ['- Sub] [else (error 'parse-sexpr "don't know about ~s" (first sexpr))]) #| the above is the same as: (cond [(equal? '+ (first sexpr)) Add] [(equal? '- (first sexpr)) Sub] [else (error 'parse-sexpr "don't know about ~s" (first sexpr))]) |#]) (make-node (parse-sexpr (second sexpr)) (parse-sexpr (third sexpr))))] [else (error 'parse-sexpr "bad syntax in ~s" sexpr)])) This function is pretty simple, but as our languages grow, they will become more verbose and more difficult to write. So, instead, we use a new special form: `match', which is matching a value and binds new identifiers to different parts (try it with "Check Syntax"). Re-writing the above code using `match': (: parse-sexpr : Sexpr -> AE) ;; to convert s-expressions into AEs (define (parse-sexpr sexpr) (match sexpr [(number: n) (Num n)] [(list '+ left right) (Add (parse-sexpr left) (parse-sexpr right))] [(list '- left right) (Sub (parse-sexpr left) (parse-sexpr right))] [else (error 'parse-sexpr "bad syntax in ~s" sexpr)])) To make things less confusing, we will combine this with the function that parses a string into a sexpr so we can use strings to represent our programs: (: parse : String -> AE) ;; parses a string containing an AE expression to an AE (define (parse str) (parse-sexpr (string->sexpr str))) ======================================================================== >>> The `match' Form The syntax for `match' is (match value [pattern result-expr] ...) The value is matched against each pattern, possibly binding names in the process, and if a pattern matches it evaluates the result expression. The simplest form of a pattern is simply an identifier -- it always matches and binds that identifier to the value: (match (list 1 2 3) [x x]) ; evaluates to the list Another simple pattern is a quoted symbol, which matches that symbol. For example: (match foo ['x "yes"] [else "no"]) will evaluate to "yes" if `foo' is the symbol `x', and to "no" otherwise. Note that `else' is not a keyword here -- it happens to be a pattern that always succeeds, so it behaves like an else clause except that it binds `else' to the unmatched-so-far value. Many patterns look like function application -- but don't confuse them with applications. A `(list x y z)' pattern matches a list of exactly three items and binds the three identifiers; or if the "arguments" are themselves patterns, `match' will descend into the values and match them too. This means that the patterns can be nested: (match (list 1 2 3) [(list x y z) (+ x y z)]) ; evaluates to 6 (match '((1) (2) 3) [(list (list x) (list y) z) (+ x y z)]) ; evaluates to 6 There is also a `cons' pattern that matches a non-empty list and then matches the first part against the head for the list and the second part against the tail of the list. In a `list' pattern, you can use `...' to specify that the previous pattern is repeated zero or more times, and bound names get bound to the list of respective matching. One simple consequent is that the `(list hd tl ...)' pattern is exactly the same as `(cons hd tl)', but being able to repeat an arbitrary pattern is very useful: (match '((1 2) (3 4) (5 6) (7 8)) [(list (list x y) ...) (append x y)]) ; evaluates to (1 3 5 7 2 4 6 8) A few more useful patterns: id -- matches anything, binds `id' to it _ -- matches anything, but does not bind (number: n) -- matches any number and binds it to `n' (symbol: s) -- same for symbols (string: s) -- strings (sexpr: s) -- S-expressions (needed sometimes for Typed Racket) (and pat1 pat2) -- matches both patterns (or pat1 pat2) -- matches either pattern (careful with bindings) The patterns are tried one by one *in-order*, and if no pattern matches the value, an error is raised. Note that `...' in a `list' pattern can follow *any* pattern, including all of the above, and including nested list patterns. Here are a few examples -- you can try them out with "#lang pl untyped" at the top of the definitions window. This: (match x [(list (symbol: syms) ...) syms]) matches `x' against a pattern that accepts only a list of symbols, and binds `syms' to those symbols. And here's an example that matches a list of any number of lists, where each of the sub-lists begins with a symbol and then has any number of numbers. Note how the `n' and `s' bindings get values for a list of all symbols and a list of lists of the numbers: > (define (foo x) (match x [(list (list (symbol: s) (number: n) ...) ...) (list 'symbols: s 'numbers: n)])) > (foo (list (list 'x 1 2 3) (list 'y 4 5))) '(symbols: (x y) numbers: ((1 2 3) (4 5))) Here is a quick example for how `or' is used with two literal alternatives, how `and' is used to name a specific piece of data, and how `or' is used with a binding: > (define (foo x) (match x [(list (or 1 2 3)) 'single] [(list (and x (list 1 _)) 2) x] [(or (list 1 x) (list 2 x)) x])) > (foo (list 3)) 'single > (foo (list (list 1 99) 2)) '(1 99) > (foo (list 1 10)) 10 > (foo (list 2 10)) 10 ======================================================================== >>> Semantics (= Evaluation) [[[ PLAI Chapter 2 ]]] Back to BNF -- now, meaning. An important feature of these BNF specifications: we can use the derivations to specify *meaning* (and meaning in our context is "running" a program (or "interpreting", "compiling", but we will use "evaluating")). For example: ::= ; evaluates to the number | + ; evaluates to the sum of evaluating ; and | - ; ... the subtraction of from (... roughly!) To do this a little more formally: a. eval() = ; <-- special rule: moves syntax into a value b. eval( + ) = eval() + eval() c. eval( - ) = eval() - eval() Note the completely different roles of the two "+"s and "-"s. In fact, it might have been more correct to write: a. eval("") = b. eval(" + ") = eval("") + eval("") c. eval(" - ") = eval("") - eval("") or even using a marker to denote meta-holes in these strings: a. eval("$") = b. eval("$ + $") = eval("$") + eval("$") c. eval("$ - $") = eval("$") - eval("$") but we will avoid pretending that we're doing that kind of string manipulation. (For example, it will require specifying what does it mean to return for "$" (involves `string->number'), and the fragments on the right side mean that we need to specify these as substring operations.) Note that there's a similar kind of informality in our BNF specifications, where we assume that "" refers to some terminal or non-terminal. In texts where more formality is required (for example, in RFC specifications), each literal part of the BNF is usually marked with double quotes, so we'd get ::= | "+" | "-" An alternative popular notation for eval(X) is [[X]]: a. [[]] = b. [[ + ]] = [[]] + [[]] c. [[ - ]] = [[]] - [[]] Is there a problem with this definition? Ambiguity: eval(1 - 2 + 3) = ? Depending on the way the expression is parsed, we get either 2 or -4: eval(1 - 2 + 3) = eval(1 - 2) + eval(3) [b] = eval(1) - eval(2) + eval(3) [c] = 1 - 2 + 3 [a,a,a] = 2 eval(1 - 2 + 3) = eval(1) - eval(2 + 3) [c] = eval(1) - (eval(2) + eval(3)) [a] = 1 - (2 + 3) [a,a,a] = -4 Again, be very aware of confusing subtleties which are extremely important: We need parens around a sub-expression only in one side, why? -- When we write: eval(1 - 2 + 3) = ... = 1 - 2 + 3 we have two expressions, but one stands for an *input syntax*, and one stands for a `real' mathematical expression. In a case of a computer implementation, the syntax on the left is (as always) an AE syntax, and the `real' expression on the right is an expression in whatever language we use to implement our AE language. Like we said earlier, ambiguity is not a real problem until the actual parse tree matters. With `eval' it definitely matters, so we must not make it possible to derive any syntax in multiple ways or our evaluation will be non-deterministic. ======================================================================== Quick exercise: We can define a meaning for s and then s in a similar way: ::= | eval(0) = 0 eval(1) = 1 eval(2) = 2 ... eval(9) = 9 eval() = eval( ) = 10*eval() + eval() Is this exactly what we want? -- Depends on what we actually want... * First, there's a bug in this code -- having a BNF derivation like ::= | is unambiguous, but makes it hard to parse a number. We get: eval(123) = 10*eval(1) + eval(23) = 10*1 + 10*eval(2) + eval(3) = 10*1 + 10*2 + 3 = 33 Changing the order of the last rule works much better: ::= | and then: eval( ) = 10*eval() + eval() * As a concrete example see how you would make it work with "107", which demonstrates how the compositionality is important. * Example for free stuff that looks trivial: if we were to define the meaning of numbers this way, would it always work? Think an average language that does not give you bignums, making the above rules fail when the numbers are too big. In Racket, we happen to be using an integer representation for the syntax of integers, and both are unlimited. But what if we wanted to write a Racket compiler in C or a C compiler in Racket? What about a C compiler in C, where the compiler runs on a 64 bit machine, and the result needs to run on a 32 bit machine? ======================================================================== Side comment on compositionality The example of ::= | being a language that is easier to to write an evaluator for leads us to an important concept -- compositionality. This definition is easier to write an evaluator for, since the resulting language is compositional: the meaning of an expression -- for example `123' -- is composed out of the meaning of its two parts, which in this BNF are `12' and `3'. Specifically, the evaluation of ` ' is 10 * the evaluation of the first, plus the evaluation of the second. In the ` ' case this is more difficult -- the meaning of such a number depends not only on the *meaning* of the two parts, but also on the `' *syntax*: eval( ) = eval() * 10^length() + eval() This this case this can be tolerable, since the meaning of the expression is still made out of its parts -- but imperative programming (when you use side effects) is much more problematic since it is not compositional (at least not in the obvious sense). This is compared to functional programming, where the meaning of an expression is a combination of the meanings of its subexpressions. For example, every sub-expression in a functional program has some known meaning, and these all make up the meaning of the expression that contains them -- but in an imperative program we can have a part of the code be `x++' -- and that doesn't have a meaning by itself, at least not one that contributes to the meaning of the whole program in a direct way. (Actually, we can have a well-defined meaning for such an expression: the meaning is going from a world where `x' is a container of some value N, to a world where the same container has a different value N+1. You can probably see now how this can make things more complicated. On an intuitive level -- if we look at a random part of a functional program we can tell its meaning, so building up the meaning of the whole code is easy, but in an imperative program, the meaning of a random part is pretty much useless.) ======================================================================== >>> Implementing an Evaluator Now continue to implement the semantics of our syntax -- we express that through an `eval' function that evaluates an expression. We use a basic programming principle -- splitting the code into two layers, one for parsing the input, and one for doing the evaluation. Doing this avoids the mess we'd get into otherwise, for example: (define (eval sexpr) (match sexpr [(number: n) n] [(list '+ left right) (+ (eval left) (eval right))] [(list '- left right) (- (eval left) (eval right))] [else (error 'eval "bad syntax in ~s" sexpr)])) This is messy because it combines two very different things -- syntax and semantics -- into a single lump of code. For this particular kind of evaluator it looks simple enough, but this is only because it's simple enough that all we do is replace constructors by arithmetic operations. Later on things will get more complex, and bundling the evaluator with the parser will be more problematic. (Note: the fact that we can replace constructors with the run-time operators mean that we have a very simple, calculator-like language, and that we can, in face, "compile" all programs down to a number.) If we split the code, we can easily include decisions like making {+ 1 {- 3 "a"}} syntactically invalid. (Which is not, BTW, what Racket does...) (Also, this is like the distinction between XML syntax and well-formed XML syntax.) An additional advantage is that by using two separate components, it is simple to replace each one, making it possible to change the input syntax, and the semantics independently -- we only need to keep the same interface data (the AST) and things will work fine. Our `parse' function converts an input syntax to an abstract syntax tree (AST). It is abstract exactly because it is independent of any actual concrete syntax that you type in, print out etc. ======================================================================== >>> Implementing The AE Language Back to our `eval' -- this will be its (obvious) type: (: eval : AE -> Number) ;; consumes an AE and computes the corresponding number which leads to some obvious test cases: (equal? 3 (eval (parse "3"))) (equal? 7 (eval (parse "{+ 3 4}"))) (equal? 6 (eval (parse "{+ {- 3 4} 7}"))) which from now on we will write using the new `test' form that the `#lang pl' language provides: (test (eval (parse "3")) => 3) (test (eval (parse "{+ 3 4}")) => 7) (test (eval (parse "{+ {- 3 4} 7}")) => 6) Note that we're testing *only* at the interface level -- only running whole functions. For example, you could think about a test like: (test (parse "{+ {- 3 4} 7}") => (Add (Sub (Num 3) (Num 4)) (Num 7))) but the details of parsing and of the constructor names are things that nobody outside of our evaluator cares about -- so we're not testing them. In fact, we shouldn't even mention `parse' in these tests, since it is not part of the public interface of our users; they only care about using it as a compiler-like black box. (This is sometimes called "integration tests".) We'll address this shortly. Like everything else, the structure of the recursive `eval' code follows the recursive structure of its input. In HtDP terms, our template is: (: eval : AE -> Number) (define (eval expr) (cases expr [(Num n) ... n ...] [(Add l r) ... (eval l) ... (eval r) ...] [(Sub l r) ... (eval l) ... (eval r) ...])) In this case, filling in the gaps is very simple (: eval : AE -> Number) (define (eval expr) (cases expr [(Num n) n] [(Add l r) (+ (eval l) (eval r))] [(Sub l r) (- (eval l) (eval r))])) We now further combine `eval' and `parse' into a single `run' function that evaluates an AE string. (: run : String -> Number) ;; evaluate an AE program contained in a string (define (run str) (eval (parse str))) This function becomes the single public entry point into our code, and the only thing that should be used in tests that verify our interface: (test (run "3") => 3) (test (run "{+ 3 4}") => 7) (test (run "{+ {- 3 4} 7}") => 6) The resulting *full* code is: ---<<>>----------------------------------------------------------- #lang pl #| BNF for the AE language: ::= | { + } | { - } | { * } | { / } |# ;; AE abstract syntax trees (define-type AE [Num Number] [Add AE AE] [Sub AE AE] [Mul AE AE] [Div AE AE]) (: parse-sexpr : Sexpr -> AE) ;; to convert s-expressions into AEs (define (parse-sexpr sexpr) (match sexpr [(number: n) (Num n)] [(list '+ lhs rhs) (Add (parse-sexpr lhs) (parse-sexpr rhs))] [(list '- lhs rhs) (Sub (parse-sexpr lhs) (parse-sexpr rhs))] [(list '* lhs rhs) (Mul (parse-sexpr lhs) (parse-sexpr rhs))] [(list '/ lhs rhs) (Div (parse-sexpr lhs) (parse-sexpr rhs))] [else (error 'parse-sexpr "bad syntax in ~s" sexpr)])) (: parse : String -> AE) ;; parses a string containing an AE expression to an AE AST (define (parse str) (parse-sexpr (string->sexpr str))) (: eval : AE -> Number) ;; consumes an AE and computes the corresponding number (define (eval expr) (cases expr [(Num n) n] [(Add l r) (+ (eval l) (eval r))] [(Sub l r) (- (eval l) (eval r))] [(Mul l r) (* (eval l) (eval r))] [(Div l r) (/ (eval l) (eval r))])) (: run : String -> Number) ;; evaluate an AE program contained in a string (define (run str) (eval (parse str))) ;; tests (test (run "3") => 3) (test (run "{+ 3 4}") => 7) (test (run "{+ {- 3 4} 7}") => 6) ---------------------------------------------------------------------- (Note that the tests are done with a `test' form, which we mentioned above.) For anyone who thinks that Racket is a bad choice, this is a good point to think how much code would be needed in some other language to do the same as above. ======================================================================== >>> Introduction to Typed Racket The plan: * Why Types? * Why Typed Racket? * What's Different about Typed Racket? * Some Examples of Typed Racket for Course Programs >> Types - Who has used a (statically) typed language? - Who has used a typed language that's not Java? Typed Racket will be both similar to and very different from anything you've seen before. >> Why types? - Types help structure programs. - Types provide enforced and mandatory documentation. - Types help catch errors. --> They *will* help you. A *lot*. >> Structuring programs - Data definitions ;; An AE is one of: ; \ ;; (make-Num Number) ; > HtDP ;; (make-Add AE AE) ; / (define-type AE ; \ [Num number?] ; > Predicates =~= contracts (PLAI) [Add AE? AE?]) ; / (actually, the PLAI form has names too) (define-type AE ; \ [Num Number] ; > Typed Racket (our PL) [Add AE AE]) ; / - Data-first The structure of your program is derived from the structure of your data. You have seen this in Fundamentals with the design recipe and with templates. In this class, we will see it extensively with type definitions and the (cases ...) form. Types make this pervasive -- we have to think about our data before our code. - A language for describing data Instead of having an informal language for describing types in contract lines, and a more formal description of predicates in a `define-type' form, we will have a single, unified language for both of these. Having such a language means that we get to be more precise and more expressive (since the typed language covers cases that you would otherwise dismiss with some hand waving, like "function"). >> Why Typed Racket? Racket is the language we all know, and it has the benefits that we discussed earlier. Mainly, it is an excellent language for experimenting with programming languages. - Typed Racket allows us to take our Racket programs and typecheck them, so we get the benefits of a statically typed language. - Types are an important programming language feature; Typed Racket will help us understand them. [Also: the development of Typed Racket is happening here in Northeastern, and will benefit from your feedback.] >> How is Typed Racket different from Racket - Typed Racket will reject your program if there are type errors! This means that it does that at compile-time, *before* any code gets to run. - Typed Racket files start like this: #lang typed/racket ;; Program goes here. but we will use a variant of the Typed Racket language, which has a few additional constructs: #lang pl ;; Program goes here. - Typed Racket requires you to write the contracts on your functions. Racket: ;; f : Number -> Number (define (f x) (* x (+ x 1))) Typed Racket: #lang pl (: f : Number -> Number) (define (f x) (* x (+ x 1))) [In Typed Racket you can also have the type annotations appear inside the definition: #lang pl (define: (f [x : Number]) : Number (* x (+ x 1))) but we will not use this form.] - As we've seen, Typed Racket uses types, not predicates, in `define-type'. (define-type AE [Num Number] [Add AE AE]) versus (define-type AE [Num number?] [Add AE? AE?]) - There are other differences, but these will suffice for now. >> Examples (: digit-num : Number -> (U Number String)) (define (digit-num n) (cond [(<= n 9) 1] [(<= n 99) 2] [(<= n 999) 3] [(<= n 9999) 4] [else "a lot"])) (: fact : Number -> Number) (define (fact n) (if (zero? n) 1 (* n (fact (- n 1))))) (: helper : Number Number -> Number) (define (helper n acc) (if (zero? n) acc (helper (- n 1) (* acc n)))) (: fact : Number -> Number) (define (fact n) (helper n 1)) (: fact : Number -> Number) (define (fact n) (: helper : Number Number -> Number) (define (helper n acc) (if (zero? n) acc (helper (- n 1) (* acc n)))) (helper n 1)) (: every? : (All (A) (A -> Boolean) (Listof A) -> Boolean)) ;; Returns false if any element of lst fails the given pred, true if ;; all pass pred. (define (every? pred lst) (or (null? lst) (and (pred (first lst)) (every? pred (rest lst))))) (define-type AE [Num Number] [Add AE AE] [Sub AE AE]) ;; the only difference in the following definition is ;; using (: : ) instead of ";; : " (: parse-sexpr : Sexpr -> AE) ;; to convert s-expressions into AEs (define (parse-sexpr sexpr) (match sexpr [(number: n) (Num n)] [(list '+ left right) (Add (parse-sexpr left) (parse-sexpr right))] [(list '- left right) (Sub (parse-sexpr left) (parse-sexpr right))] [else (error 'parse-sexpr "bad syntax in ~s" sexpr)])) >> More interesting examples * Typed Racket is designed to be a language that is friendly to the kind of programs that people write in Racket. For example, it has unions: (: foo : (U String Number) -> Number) (define (foo x) (if (string? x) (string-length x) ;; at this point it knows that `x' is not a string, therefore it ;; must be a number (+ 1 x))) This is not common in statically typed languages, which are usually limited to only "disjoint unions". For example, in OCaml you'd write this definition: type string_or_number = Str of string | Int of int ;; let foo x = match x with Str s -> String.length s | Int i -> i+1 ;; And use it with an explicit constructor: foo (Str "bar") ;; foo (Int 3) ;; * Note that in the Typed Racket case, the language keeps track of information that is gathered via predicates -- which is why it knows that one `x' is a String, and the other is a Number. * Typed Racket has a concept of subtypes -- which is also something that most statically typed languages lack. In fact, the fact that it has (arbitrary) unions means that it must have subtypes too, since a type is always a subtype of a union that contains this type. * Another result of this feature is that there is an `Any' type that is the union of all other types. Note that you can always use this type since everything is in it -- but it gives you the *least* information about a value. * Another interesting thing to look at is the type of `error': it's a function that returns a type of `Nothing' -- a type that is the same as an *empty* union: (U). It's a type that has no values in it -- it fits `error' because it is a function that doesn't return any value, in fact, it doesn't return at all. In addition, it means that an `error' expression can be used anywhere you want because it is a subtype of anything at all. * An `else' clause in a `cond' expression is almost always needed, for example: (: digit-num : Number -> (U Number String)) (define (digit-num n) (cond [(<= n 9) 1] [(<= n 99) 2] [(<= n 999) 3] [(<= n 9999) 4] [(> n 9999) "a lot"])) (and if you think that the type checker should know what this is doing, then how about (> (* n 10) (/ (* (- 10000 1) 20) 2)) or (>= n 10000) for the last test?) * In some rare cases you will run into one limitation of Typed Racket: it is difficult (that is: a generic solution is not known at the moment) to do the right inference when polymorphic functions are passed around to higher-order functions. For example: (: call : (All (A B) (A -> B) A -> B)) (define (call f x) (f x)) (call rest (list 4)) In such cases, we can use `inst' to "instantiate" a function with a polymorphic type to a given type -- in this case, we can use it to make it treat `rest' as a function that is specific for numeric lists: (call (inst rest Number) (list 4)) In other rare cases, Typed Racket will infer a type that is not suitable for us -- there is another `ann' form that allows us to specify a certain type. Using this in the `call' example is more verbose: (call (ann rest : ((Listof Number) -> (Listof Number))) (list 4)) However, these are going to be rare and will be mentioned explicitly whenever they're needed. ========================================================================