2013-04-05  - Implementing Picky (contd.)
            - Typing Recursion
            - Typing Data
            - Type soundness

========================================================================

One thing that is very obvious when you look at the examples is that
this language is way too verbose to be practical -- types are repeated
over and over again. If you look carefully at the typechecking
fragments for the two relevant expressions -- `fun' and `with' -- you
can see that we can actually get rid of almost all of the type
annotations.

The following version does that: no types are mentioned except for the
input type of a function. Note that we can do that at this point
because our language is so simple that many pieces of code have a
specific type. (For example, if we add polymorphism things get more
complicated.)

---<<>>-------------------------------------------------------
;; The Picky interpreter, almost no explicit types

#lang pl

#|
The grammar:
  <PICKY> ::= <num>
            | <id>
            | { + <PICKY> <PICKY> }
            | { - <PICKY> <PICKY> }
            | { = <PICKY> <PICKY> }
            | { < <PICKY> <PICKY> }
            | { fun { <id> : <TYPE> } <PICKY> }
            | { call <PICKY> <PICKY> }
            | { with { <id> <PICKY> } <PICKY> }
            | { if <PICKY> <PICKY> <PICKY> }
  <TYPE>  ::= Num | Number
            | Bool | Boolean
            | { <TYPE> -> <TYPE> }

Evaluation rules:
  eval(N,env)                = N
  eval(x,env)                = lookup(x,env)
  eval({+ E1 E2},env)        = eval(E1,env) + eval(E2,env)
  eval({- E1 E2},env)        = eval(E1,env) - eval(E2,env)
  eval({= E1 E2},env)        = eval(E1,env) = eval(E2,env)
  eval({< E1 E2},env)        = eval(E1,env) < eval(E2,env)
  eval({fun {x} E},env)      = <{fun {x} E}, env>
  eval({call E1 E2},env1)    = eval(Ef,extend(x,eval(E2,env1),env2))
                               if eval(E1,env1) = <{fun {x} Ef}, env2>
                             = error!
                               otherwise -- but this doesn't happen
  eval({with {x E1} E2},env) = eval(E2,extend(x,eval(E1,env),env))
  eval({if E1 E2 E3},env)    = eval(E2,env)  if eval(E1,env) is true
                             = eval(E3,env)  otherwise

Type checking rules (note how implicit types are made):

  Γ ⊢ n : Number

  Γ ⊢ x : Γ(x)

  Γ ⊢ a : Number  Γ ⊢ b : Number
  ———————————————————————————————
       Γ ⊢ {+ a b} : Number

  Γ ⊢ a : Number  Γ ⊢ b : Number
  ———————————————————————————————
       Γ ⊢ {< a b} : Boolean

          Γ[x:=τ₁] ⊢ E : τ₂
  —————————————————————————————————
  Γ ⊢ {fun {x : τ₁} E} : (τ₁ -> τ₂)

  Γ ⊢ F : (τ₁ -> τ₂)  Γ ⊢ V : τ₁
  ——————————————————————————————
       Γ ⊢ {call F V} : τ₂

  Γ ⊢ V : τ₁  Γ[x:=τ₁] ⊢ E : τ₂
  ——————————————————————————————
     Γ ⊢ {with {x V} E} : τ₂

  Γ ⊢ C : Boolean  Γ ⊢ T : τ  Γ ⊢ E : τ
  ———————————————————————————————————————
           Γ ⊢ {if C T E} : τ

|#

(define-type PICKY
  [Num   Number]
  [Id    Symbol]
  [Add   PICKY PICKY]
  [Sub   PICKY PICKY]
  [Equal PICKY PICKY]
  [Less  PICKY PICKY]
  [Fun   Symbol TYPE PICKY] ; no output type
  [Call  PICKY PICKY]
  [With  Symbol PICKY PICKY] ; no types here
  [If    PICKY PICKY PICKY])

(define-type TYPE
  [NumT]
  [BoolT]
  [FunT TYPE TYPE])

(: parse-sexpr : Sexpr -> PICKY)
;; to convert s-expressions into PICKYs
(define (parse-sexpr sexpr)
  (match sexpr
    [(number: n)    (Num n)]
    [(symbol: name) (Id name)]
    [(list '+ lhs rhs) (Add   (parse-sexpr lhs) (parse-sexpr rhs))]
    [(list '- lhs rhs) (Sub   (parse-sexpr lhs) (parse-sexpr rhs))]
    [(list '= lhs rhs) (Equal (parse-sexpr lhs) (parse-sexpr rhs))]
    [(list '< lhs rhs) (Less  (parse-sexpr lhs) (parse-sexpr rhs))]
    [(list 'call fun arg)
     (Call (parse-sexpr fun) (parse-sexpr arg))]
    [(list 'if c t e)
     (If (parse-sexpr c) (parse-sexpr t) (parse-sexpr e))]
    [(cons 'fun more)
     (match sexpr
       [(list 'fun (list (symbol: name) ': itype) body)
        (Fun name (parse-type-sexpr itype) (parse-sexpr body))]
       [else (error 'parse-sexpr "bad `fun' syntax in ~s" sexpr)])]
    [(cons 'with more)
     (match sexpr
       [(list 'with (list (symbol: name) named) body)
        (With name (parse-sexpr named) (parse-sexpr body))]
       [else (error 'parse-sexpr "bad `with' syntax in ~s" sexpr)])]
    [else (error 'parse-sexpr "bad expression syntax in ~s" sexpr)]))

(: parse-type-sexpr : Sexpr -> TYPE)
;; to convert s-expressions into TYPEs
(define (parse-type-sexpr sexpr)
  (match sexpr
    ['Number  (NumT)]
    ['Boolean (BoolT)]
    ;; allow shorter names too
    ['Num     (NumT)]
    ['Bool    (BoolT)]
    [(list itype '-> otype)
     (FunT (parse-type-sexpr itype) (parse-type-sexpr otype))]
    [else (error 'parse-type-sexpr "bad type syntax in ~s" sexpr)]))

(: parse : String -> PICKY)
;; parses a string containing a PICKY expression to a PICKY AST
(define (parse str)
  (parse-sexpr (string->sexpr str)))

;; Typechecker and related types and helpers

;; this is similar to ENV, but it holds type information for the
;; identifiers during typechecking
(define-type TYPEENV
  [EmptyTypeEnv]
  [ExtendTypeEnv Symbol TYPE TYPEENV])

(: type-lookup : Symbol TYPEENV -> TYPE)
;; similar to `lookup' for type environments; note that the error is
;; phrased as a typecheck error, since this indicates a failure at the
;; type checking stage
(define (type-lookup name typeenv)
  (cases typeenv
    [(EmptyTypeEnv) (error 'typecheck "no binding for ~s" name)]
    [(ExtendTypeEnv id type rest-env)
     (if (eq? id name) type (type-lookup name rest-env))]))

(: typecheck : PICKY TYPE TYPEENV -> Void)
;; Checks that the given expression has the specified type. Used only
;; for side-effects (to throw a type error), so return a void value.
(define (typecheck expr type type-env)
  (unless (equal? type (typecheck* expr type-env))
    (error 'typecheck "type error for ~s: expecting a ~s" expr type)))

(: typecheck* : PICKY TYPEENV -> TYPE)
;; Returns the type of the given expression (which also means that it
;; checks it). This is a helper for the real typechecker that also
;; checks a specific return type.
(define (typecheck* expr type-env)
  (: two-nums : PICKY PICKY -> Void)
  (define (two-nums e1 e2)
    (typecheck e1 (NumT) type-env)
    (typecheck e2 (NumT) type-env))
  (cases expr
    [(Num n)     (NumT)]
    [(Id name)   (type-lookup name type-env)]
    [(Add   l r) (two-nums l r) (NumT)]
    [(Sub   l r) (two-nums l r) (NumT)]
    [(Equal l r) (two-nums l r) (BoolT)]
    [(Less  l r) (two-nums l r) (BoolT)]
    [(Fun bound-id in-type bound-body)
     (FunT in-type
           (typecheck* bound-body
                       (ExtendTypeEnv bound-id in-type type-env)))]
    [(Call fun arg)
     (cases (typecheck* fun type-env)
       [(FunT in-type out-type)
        (typecheck arg in-type type-env)
        out-type]
       [else (error 'typecheck
                    "type error for ~s: expecting a function"
                    expr)])]
    [(With bound-id named-expr bound-body)
     (typecheck* bound-body
                 (ExtendTypeEnv bound-id
                                (typecheck* named-expr type-env)
                                type-env))]
    [(If cond-expr then-expr else-expr)
     (typecheck cond-expr (BoolT) type-env)
     (let ([type (typecheck* then-expr type-env)])
       (typecheck else-expr type type-env) ; enforce same type
       type)]))

;; Evaluator and related types and helpers

(define-type ENV
  [EmptyEnv]
  [Extend Symbol VAL ENV])

(define-type VAL
  [NumV  Number]
  [BoolV Boolean]
  [FunV  Symbol PICKY ENV])

(: lookup : Symbol ENV -> VAL)
(define (lookup name env)
  (cases env
    [(EmptyEnv) (error 'lookup "no binding for ~s" name)]
    [(Extend id val rest-env)
     (if (eq? id name) val (lookup name rest-env))]))

(: strip-numv : Symbol VAL -> Number)
;; converts a VAL to a Racket number if possible, throws an error if
;; not using the given name for the error message
(define (strip-numv name val)
  (cases val
    [(NumV n) n]
    [else (error name "expects a number, got: ~s" val)]))

(: arith-op : (Number Number -> Number) VAL VAL -> VAL)
;; gets a Racket numeric binary operator, and uses it within a NumV
;; wrapper
(define (arith-op op val1 val2)
  (NumV (op (strip-numv 'arith-op val1)
            (strip-numv 'arith-op val2))))

(: bool-op : (Number Number -> Boolean) VAL VAL -> VAL)
;; gets a Racket numeric binary predicate, and uses it within a BoolV
;; wrapper
(define (bool-op op val1 val2)
  (BoolV (op (strip-numv 'bool-op val1)
             (strip-numv 'bool-op val2))))

(: eval : PICKY ENV -> VAL)
;; evaluates PICKY expressions by reducing them to values
(define (eval expr env)
  (cases expr
    [(Num n)   (NumV n)]
    [(Id name) (lookup name env)]
    [(Add   l r) (arith-op + (eval l env) (eval r env))]
    [(Sub   l r) (arith-op - (eval l env) (eval r env))]
    [(Equal l r) (bool-op  = (eval l env) (eval r env))]
    [(Less  l r) (bool-op  < (eval l env) (eval r env))]
    [(Fun bound-id in-type bound-body)
     ;; note that types are not used at runtime, so they're not stored
     ;; in the closure
     (FunV bound-id bound-body env)]
    [(Call fun-expr arg-expr)
     (let ([fval (eval fun-expr env)])
       (cases fval
         [(FunV bound-id bound-body f-env)
          (eval bound-body
                (Extend bound-id (eval arg-expr env) f-env))]
         ;; `cases' requires complete coverage of all variants, but
         ;; this `else' is never used since we typecheck programs
         [else (error 'eval "`call' expects a function, got: ~s"
                      fval)]))]
    [(With bound-id named-expr bound-body)
     (eval bound-body (Extend bound-id (eval named-expr env) env))]
    [(If cond-expr then-expr else-expr)
     (let ([bval (eval cond-expr env)])
       (if (cases bval
             [(BoolV b) b]
             ;; same as above: this case is never reached
             [else (error 'eval "`if' expects a boolean, got: ~s"
                          bval)])
         (eval then-expr env)
         (eval else-expr env)))]))

(: run : String -> Number)
;; evaluate a PICKY program contained in a string
(define (run str)
  (let ([prog (parse str)])
    (typecheck prog (NumT) (EmptyTypeEnv))
    (let ([result (eval prog (EmptyEnv))])
      (cases result
        [(NumV n) n]
        ;; this is another error that is never reached, since we make
        ;; sure that the program always evaluates to a number above
        [else (error 'run "evaluation returned a non-number: ~s"
                     result)]))))

;; tests -- including translations of the FLANG tests
(test (run "5") => 5)
(test (run "{fun {x : Num} {+ x 1}}") =error> "type error")
(test (run "{call {fun {x : Num} {+ x 1}} 4}") => 5)
(test (run "{with {x 3} {+ x 1}}") => 4)
(test (run "{with {identity {fun {x : Num} x}} {call identity 1}}")
      => 1)
(test (run "{with {add3 {fun {x : Num} {+ x 3}}}
              {call add3 1}}")
      => 4)
(test (run "{with {add3 {fun {x : Num} {+ x 3}}}
              {with {add1 {fun {x : Num} {+ x 1}}}
                {with {x 3}
                  {call add1 {call add3 x}}}}}")
      => 7)
(test (run "{with {identity {fun {x : {Num -> Num}} x}}
              {with {foo {fun {x : Num} {+ x 1}}}
                {call {call identity foo} 123}}}")
      => 124)
(test (run "{with {x 3}
              {with {f {fun {y : Num} {+ x y}}}
                {with {x 5}
                  {call f 4}}}}")
      => 7)
(test (run "{call {with {x 3}
                    {fun {y : Num} {+ x y}}}
                  4}")
      => 7)
(test (run "{call {call {fun {x : {Num -> {Num -> Num}}}
                          {call x 1}}
                        {fun {x : Num}
                          {fun {y : Num} {+ x y}}}}
                  123}")
      => 124)
(test (run "{call {fun {x : Num} {if {< x 2} {+ x 10} {+ x 20}}} 1}")
      => 11)
(test (run "{call {fun {x : Num} {if {< x 2} {+ x 10} {+ x 20}}} 2}")
      => 22)
----------------------------------------------------------------------

Finally, an obvious question is whether we can get rid of *all* of the
type declarations.
The main point here is that we need to somehow be able to typecheck
expressions and assign "temporary types" to them that will later on
change -- for example, when we typecheck this:

  {with {identity {fun {x} x}}
    {call identity 1}}

we need to somehow decide that the named expression has a general
function type, with no commitment on the actual input and output types
-- and then change them after we typecheck the body. (We could try to
resolve that somehow by typechecking the body first, but that will not
work, since the body must be checked with *some* type assigned to the
identifier, or it will fail.)

This can be done using "type variables" -- things that contain boxes
that can be used to change types as typechecking progresses. The
following version does that. (Also, it gets rid of the `typecheck*'
thing, since it can be achieved by using a type-variable and a call to
`typecheck'.) Note the interesting tests at the end.

---<<>>-------------------------------------------------------
;; The Picky interpreter, no explicit types

#lang pl

#|
The grammar:
  <PICKY> ::= <num>
            | <id>
            | { + <PICKY> <PICKY> }
            | { - <PICKY> <PICKY> }
            | { = <PICKY> <PICKY> }
            | { < <PICKY> <PICKY> }
            | { fun { <id> } <PICKY> }
            | { call <PICKY> <PICKY> }
            | { with { <id> <PICKY> } <PICKY> }
            | { if <PICKY> <PICKY> <PICKY> }

The types are no longer part of the input syntax.

Evaluation rules:
  eval(N,env)                = N
  eval(x,env)                = lookup(x,env)
  eval({+ E1 E2},env)        = eval(E1,env) + eval(E2,env)
  eval({- E1 E2},env)        = eval(E1,env) - eval(E2,env)
  eval({= E1 E2},env)        = eval(E1,env) = eval(E2,env)
  eval({< E1 E2},env)        = eval(E1,env) < eval(E2,env)
  eval({fun {x} E},env)      = <{fun {x} E}, env>
  eval({call E1 E2},env1)    = eval(Ef,extend(x,eval(E2,env1),env2))
                               if eval(E1,env1) = <{fun {x} Ef}, env2>
                             = error!
                               otherwise -- but this doesn't happen
  eval({with {x E1} E2},env) = eval(E2,extend(x,eval(E1,env),env))
  eval({if E1 E2 E3},env)    = eval(E2,env)  if eval(E1,env) is true
                             = eval(E3,env)  otherwise

Type checking rules (note the ambiguity of the `fun' rule):

  Γ ⊢ n : Number

  Γ ⊢ x : Γ(x)

  Γ ⊢ a : Number  Γ ⊢ b : Number
  ———————————————————————————————
       Γ ⊢ {+ a b} : Number

  Γ ⊢ a : Number  Γ ⊢ b : Number
  ———————————————————————————————
       Γ ⊢ {< a b} : Boolean

          Γ[x:=τ₁] ⊢ E : τ₂
  ——————————————————————————————————————
     Γ ⊢ {fun {x} E} : (τ₁ -> τ₂)

  Γ ⊢ F : (τ₁ -> τ₂)  Γ ⊢ V : τ₁
  ——————————————————————————————
       Γ ⊢ {call F V} : τ₂

  Γ ⊢ C : Boolean  Γ ⊢ T : τ  Γ ⊢ E : τ
  ———————————————————————————————————————
           Γ ⊢ {if C T E} : τ

  Γ ⊢ V : τ₁  Γ[x:=τ₁] ⊢ E : τ₂
  ——————————————————————————————
     Γ ⊢ {with {x V} E} : τ₂

|#

(define-type PICKY
  [Num   Number]
  [Id    Symbol]
  [Add   PICKY PICKY]
  [Sub   PICKY PICKY]
  [Equal PICKY PICKY]
  [Less  PICKY PICKY]
  [Fun   Symbol PICKY] ; no types even here
  [Call  PICKY PICKY]
  [With  Symbol PICKY PICKY]
  [If    PICKY PICKY PICKY])

(: parse-sexpr : Sexpr -> PICKY)
;; to convert s-expressions into PICKYs
(define (parse-sexpr sexpr)
  (match sexpr
    [(number: n)    (Num n)]
    [(symbol: name) (Id name)]
    [(list '+ lhs rhs) (Add   (parse-sexpr lhs) (parse-sexpr rhs))]
    [(list '- lhs rhs) (Sub   (parse-sexpr lhs) (parse-sexpr rhs))]
    [(list '= lhs rhs) (Equal (parse-sexpr lhs) (parse-sexpr rhs))]
    [(list '< lhs rhs) (Less  (parse-sexpr lhs) (parse-sexpr rhs))]
    [(list 'call fun arg)
     (Call (parse-sexpr fun) (parse-sexpr arg))]
    [(list 'if c t e)
     (If (parse-sexpr c) (parse-sexpr t) (parse-sexpr e))]
    [(cons 'fun more)
     (match sexpr
       [(list 'fun (list (symbol: name)) body)
        (Fun name (parse-sexpr body))]
       [else (error 'parse-sexpr "bad `fun' syntax in ~s" sexpr)])]
    [(cons 'with more)
     (match sexpr
       [(list 'with (list (symbol: name) named) body)
        (With name (parse-sexpr named) (parse-sexpr body))]
       [else (error 'parse-sexpr "bad `with' syntax in ~s" sexpr)])]
    [else (error 'parse-sexpr "bad expression syntax in ~s" sexpr)]))

(: parse : String -> PICKY)
;; parses a string containing a PICKY expression to a PICKY AST
(define (parse str)
  (parse-sexpr (string->sexpr str)))

;; Typechecker and related types and helpers

;; this is not a part of the AST now, and it also has a new variant
;; for type variables (see `same-type' for how it's used)
(define-type TYPE
  [NumT]
  [BoolT]
  [FunT TYPE TYPE]
  [?T (Boxof (U TYPE #f))])

;; this is similar to ENV, but it holds type information for the
;; identifiers during typechecking
(define-type TYPEENV
  [EmptyTypeEnv]
  [ExtendTypeEnv Symbol TYPE TYPEENV])

(: type-lookup : Symbol TYPEENV -> TYPE)
;; similar to `lookup' for type environments; note that the error is
;; phrased as a typecheck error, since this indicates a failure at the
;; type checking stage
(define (type-lookup name typeenv)
  (cases typeenv
    [(EmptyTypeEnv) (error 'typecheck "no binding for ~s" name)]
    [(ExtendTypeEnv id type rest-env)
     (if (eq? id name) type (type-lookup name rest-env))]))

(: typecheck : PICKY TYPE TYPEENV -> Void)
;; Checks that the given expression has the specified type. Used only
;; for side-effects, so return a void value. There are two
;; side-effects that it can do: throw an error if the input expression
;; doesn't typecheck, and type variables can be mutated once their
;; values are known -- this is done by the `types=' utility function
;; that follows.
(define (typecheck expr type type-env)
  ;; convenient helpers
  (: type= : TYPE -> Void)
  (define (type= type2) (types= type type2 expr))
  (: two-nums : PICKY PICKY -> Void)
  (define (two-nums e1 e2)
    (typecheck e1 (NumT) type-env)
    (typecheck e2 (NumT) type-env))
  (cases expr
    [(Num n)     (type= (NumT))]
    [(Id name)   (type= (type-lookup name type-env))]
    [(Add   l r) (two-nums l r) (type= (NumT))]
    [(Sub   l r) (two-nums l r) (type= (NumT))]
    [(Equal l r) (two-nums l r) (type= (BoolT))]
    [(Less  l r) (two-nums l r) (type= (BoolT))]
    [(Fun bound-id bound-body)
     (let (;; the identity of these type variables is important!
           [itype (?T (box #f))]
           [otype (?T (box #f))])
       (type= (FunT itype otype))
       (typecheck bound-body otype
                  (ExtendTypeEnv bound-id itype type-env)))]
    [(Call fun arg)
     (let ([type2 (?T (box #f))]) ; same here
       (typecheck arg type2 type-env)
       (typecheck fun (FunT type2 type) type-env))]
    [(With bound-id named-expr bound-body)
     (let ([type2 (?T (box #f))]) ; and here
       (typecheck named-expr type2 type-env)
       (typecheck bound-body type
                  (ExtendTypeEnv bound-id type2 type-env)))]
    [(If cond-expr then-expr else-expr)
     (typecheck cond-expr (BoolT) type-env)
     (typecheck then-expr type type-env)
     (typecheck else-expr type type-env)]))

(: types= : TYPE TYPE PICKY -> Void)
;; Compares the two input types, and throws an error if they don't
;; match. This function is the core of `typecheck', and it is used
;; only for its side-effect. Another side effect in addition to
;; throwing an error is when type variables are present -- they will
;; be mutated in an attempt to make the typecheck succeed. Note that
;; the two type arguments are not symmetric: the first type is the
;; expected one, and the second is the one that the code implies --
;; but this matters only for the error messages. Also, the expression
;; input is used only for these errors. As the code clearly shows,
;; the main work is done by `same-type' below.
(define (types= type1 type2 expr)
  (unless (same-type type1 type2)
    (error 'typecheck "type error for ~s: expecting ~a, got ~a"
           expr (type->string type1) (type->string type2))))

(: type->string : TYPE -> String)
;; Convert a TYPE to a human readable string, used for error messages
(define (type->string type)
  (format "~s" type)
  ;; The code below would be useful, but unfortunately it doesn't work
  ;; in some cases. To see the problem, try to run the example below
  ;; that applies identity on itself. It's left here so you can try
  ;; it out when you're not running into this problem.
  #|
  (cases type
    [(NumT)     "Num"]
    [(BoolT)    "Bool"]
    [(FunT i o)
     (string-append (type->string i) " -> " (type->string o))]
    [(?T box)   (let ([t (unbox box)]) (if t (type->string t) "?"))])
  |#)

;; Convenience type to make it possible to have a single `cases'
;; dispatch on two types instead of nesting `cases' in each branch
(define-type 2TYPES [PairT TYPE TYPE])

(: same-type : TYPE TYPE -> Boolean)
;; Compares the two input types, and returns true or false depending
;; on whether they're the same. The process might involve mutating
;; type variables.
(define (same-type type1 type2)
  ;; the `PairT' type is only used to conveniently match on both types
  ;; in a single `cases', it's not used in any other way
  (cases (PairT type1 type2)
    ;; flatten the first type, or set it to the second if it's unset
    [(PairT (?T box) type2)
     (let ([t1 (unbox box)])
       (if t1
         (same-type t1 type2)
         (begin (set-box! box type2) #t)))]
    ;; do the same for the second (reuse the above case)
    [(PairT type1 (?T box)) (same-type type2 type1)]
    ;; the rest are obvious
    [(PairT (NumT) (NumT)) #t]
    [(PairT (BoolT) (BoolT)) #t]
    [(PairT (FunT i1 o1) (FunT i2 o2))
     (and (same-type i1 i2) (same-type o1 o2))]
    [else #f]))

;; Evaluator and related types and helpers

(define-type ENV
  [EmptyEnv]
  [Extend Symbol VAL ENV])

(define-type VAL
  [NumV  Number]
  [BoolV Boolean]
  [FunV  Symbol PICKY ENV])

(: lookup : Symbol ENV -> VAL)
(define (lookup name env)
  (cases env
    [(EmptyEnv) (error 'lookup "no binding for ~s" name)]
    [(Extend id val rest-env)
     (if (eq? id name) val (lookup name rest-env))]))

(: strip-numv : Symbol VAL -> Number)
;; converts a VAL to a Racket number if possible, throws an error if
;; not using the given name for the error message
(define (strip-numv name val)
  (cases val
    [(NumV n) n]
    [else (error name "expects a number, got: ~s" val)]))

(: arith-op : (Number Number -> Number) VAL VAL -> VAL)
;; gets a Racket numeric binary operator, and uses it within a NumV
;; wrapper
(define (arith-op op val1 val2)
  (NumV (op (strip-numv 'arith-op val1)
            (strip-numv 'arith-op val2))))

(: bool-op : (Number Number -> Boolean) VAL VAL -> VAL)
;; gets a Racket numeric binary predicate, and uses it within a BoolV
;; wrapper
(define (bool-op op val1 val2)
  (BoolV (op (strip-numv 'bool-op val1)
             (strip-numv 'bool-op val2))))

(: eval : PICKY ENV -> VAL)
;; evaluates PICKY expressions by reducing them to values
(define (eval expr env)
  (cases expr
    [(Num n)   (NumV n)]
    [(Id name) (lookup name env)]
    [(Add   l r) (arith-op + (eval l env) (eval r env))]
    [(Sub   l r) (arith-op - (eval l env) (eval r env))]
    [(Equal l r) (bool-op  = (eval l env) (eval r env))]
    [(Less  l r) (bool-op  < (eval l env) (eval r env))]
    [(Fun bound-id bound-body) (FunV bound-id bound-body env)]
    [(Call fun-expr arg-expr)
     (let ([fval (eval fun-expr env)])
       (cases fval
         [(FunV bound-id bound-body f-env)
          (eval bound-body
                (Extend bound-id (eval arg-expr env) f-env))]
         ;; `cases' requires complete coverage of all variants, but
         ;; this `else' is never used since we typecheck programs
         [else (error 'eval "`call' expects a function, got: ~s"
                      fval)]))]
    [(With bound-id named-expr bound-body)
     (eval bound-body (Extend bound-id (eval named-expr env) env))]
    [(If cond-expr then-expr else-expr)
     (let ([bval (eval cond-expr env)])
       (if (cases bval
             [(BoolV b) b]
             ;; same as above: this case is never reached
             [else (error 'eval "`if' expects a boolean, got: ~s"
                          bval)])
         (eval then-expr env)
         (eval else-expr env)))]))

(: run : String -> Number)
;; evaluate a PICKY program contained in a string
(define (run str)
  (let ([prog (parse str)])
    (typecheck prog (NumT) (EmptyTypeEnv))
    (let ([result (eval prog (EmptyEnv))])
      (cases result
        [(NumV n) n]
        ;; this is another error that is never reached, since we make
        ;; sure that the program always evaluates to a number above
        [else (error 'run "evaluation returned a non-number: ~s"
                     result)]))))

;; tests -- including translations of the FLANG tests
(test (run "5") => 5)
(test (run "{fun {x} {+ x 1}}") =error> "type error")
(test (run "{call {fun {x} {+ x 1}} 4}") => 5)
(test (run "{with {x 3} {+ x 1}}") => 4)
(test (run "{with {identity {fun {x} x}} {call identity 1}}") => 1)
(test (run "{with {add3 {fun {x} {+ x 3}}}
              {call add3 1}}")
      => 4)
(test (run "{with {add3 {fun {x} {+ x 3}}}
              {with {add1 {fun {x} {+ x 1}}}
                {with {x 3}
                  {call add1 {call add3 x}}}}}")
      => 7)
(test (run "{with {identity {fun {x} x}}
              {with {foo {fun {x} {+ x 1}}}
                {call {call identity foo} 123}}}")
      => 124)
(test (run "{with {x 3}
              {with {f {fun {y} {+ x y}}}
                {with {x 5}
                  {call f 4}}}}")
      => 7)
(test (run "{call {with {x 3}
                    {fun {y} {+ x y}}}
                  4}")
      => 7)
(test (run "{call {call {fun {x} {call x 1}}
                        {fun {x} {fun {y} {+ x y}}}}
                  123}")
      => 124)
(test (run "{call {fun {x} {if {< x 2} {+ x 10} {+ x 20}}} 1}")
      => 11)
(test (run "{call {fun {x} {if {< x 2} {+ x 10} {+ x 20}}} 2}")
      => 22)

;; Note that we still have a language with the same type system, even
;; though it looks like it could be more flexible -- for example, the
;; following two examples work:
(test (run "{with {identity {fun {x} x}}
              {call identity 1}}")
      => 1)
(test (run "{with {identity {fun {x} x}}
              {if {call identity {< 1 2}} 1 2}}")
      => 1)
;; but this doesn't, since identity can not be used with different
;; types:
(test (run "{with {identity {fun {x} x}}
              {+ {call identity 1}
                 {if {call identity {< 1 2}} 1 2}}}")
      =error> "type error")
;; and this doesn't work either -- with an interesting error message:
(test (run "{with {identity {fun {x} x}}
              {call {call identity identity} 1}}")
      =error> "type error")
;; ... but these two work fine:
(test (run "{with {identity1 {fun {x} x}}
              {with {identity2 {fun {x} x}}
                {+ {call identity1 1}
                   {if {call identity2 {< 1 2}} 1 2}}}}")
      => 2)
(test (run "{with {identity1 {fun {x} x}}
              {with {identity2 {fun {x} x}}
                {call {call identity1 identity2} 1}}}")
      => 1)

;; Here's another interesting thing to try out:
;;   (define t (?T (box #f)))
;;   (typecheck (parse "{fun {x} x}") t (EmptyTypeEnv))
;;   t
;; And another:
;;   (define t (?T (box #f)))
;;   (define expr "{call {fun {x} {call x x}} {fun {x} {call x x}}}")
;;   (typecheck (parse expr) t (EmptyTypeEnv))
----------------------------------------------------------------------

========================================================================
>>> Typing Recursion

We already know that without recursion life can be very boring... So
we obviously want to be able to have recursive functions -- but the
question is how will they interact with our type system. One thing
that we have seen is that by just having functions we get recursion.
This was achieved by the Y combinator function. It seems like the same
should apply to our simple typed language. The core of the Y
combinator was using an expression similar to Omega that generates the
infinite loop that is needed. In our language:

  {call {fun {x} {call x x}} {fun {x} {call x x}}}

This expression was impossible to evaluate completely since it never
terminates, but it served as a basis for the Y combinator so we need to
be able to perform this kind of infinite loop. Now, consider the type
of the first `x' -- it's used in a `call' expression as a function, so
its type must be a function type, say τ₁->τ₂. In addition, its
argument is `x' itself so its type is also τ₁ -- this means that we
have:

  τ₁ -> τ₂ = τ₁

and from this we get:

  => τ₁ = τ₁ -> τ₂
        = (τ₁ -> τ₂) -> τ₂
        = ((τ₁ -> τ₂) -> τ₂) -> τ₂
        = ...

And this is a type that does not exist in our type system, since we can
only have finite types.
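
As a rough illustration of that chain, here is a small sketch that uses
the equation as a rewrite rule. It is written in plain `#lang racket'
rather than the course language, types are just quoted lists, and the
names `unfold', `arrows' and `show-unfoldings' are made up for this
example. Every step adds one arrow, so the process never arrives at a
finite type.

```racket
#lang racket

;; unfold : use the equation t1 = (t1 -> t2) once, replacing every
;; occurrence of the variable t1 in a type by (t1 -> t2)
(define (unfold type)
  (match type
    ['t1 '(t1 -> t2)]
    [(list in '-> out) (list (unfold in) '-> (unfold out))]
    [other other]))

;; arrows : count the arrows in a type
(define (arrows type)
  (match type
    [(list in '-> out) (+ 1 (arrows in) (arrows out))]
    [_ 0]))

;; show-unfoldings : print the first n types in the chain, returning
;; the type that would come next
(define (show-unfoldings n)
  (for/fold ([type 't1]) ([i (in-range n)])
    (printf "~a arrows: ~s\n" (arrows type) type)
    (unfold type)))

;; prints:
;;   0 arrows: t1
;;   1 arrows: (t1 -> t2)
;;   2 arrows: ((t1 -> t2) -> t2)
;;   3 arrows: (((t1 -> t2) -> t2) -> t2)
(void (show-unfoldings 4))
```
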
Therefore, we have a proof by contradiction that this expression cannot
be typed in our system.

This is closely related to the fact that the typed language we have
described so far is "strongly normalizing": no matter what program you
write, it will always terminate! To see this, very informally,
consider this language without functions -- this is clearly a language
where all programs terminate, since the only way to create a loop is
through function applications. Now add functions and function
application -- in the typing rules for the resulting language, each
`fun' creates a function type (creates an arrow), and each function
application consumes a function type (deletes one arrow) -- since types
are finite, the number of arrows is finite, which means that the number
of possible applications is finite, so all programs must run in finite
time.

[Note that when we discussed how to type the Y combinator we needed to
use a `Rec' constructor -- something that the current Typed Racket type
system has. Using that, we could have easily solved the
`τ₁ = τ₁ -> τ₂' equation with (Rec τ₁ (τ₁ -> τ₂)).]

In our language, therefore, the "halting problem" doesn't even exist,
since all programs (that are properly typed) are guaranteed to halt.
This property is useful in many real-life situations (consider firewall
rules, configuration files, devices with embedded code). But the
language that we get is very limited as a result -- we really want the
power to shoot our feet...

========================================================================
>> Extending Picky with recursion

As we have seen, our language is strongly normalizing, which means that
to get general recursion, we must introduce a new construct (unlike
previously, when we didn't really need one). We can do this as we
previously did -- by adding a new construct to the language, or we can
somehow extend the (sub) language of type descriptions to allow a new
kind of type that can be used to solve the `τ₁ = τ₁ -> τ₂' equation.
An example of the latter solution would be similar to the `Rec' type
constructor in Typed Racket: a new type constructor that allows a type
to refer to itself -- and using (Rec τ₁ (τ₁ -> τ₂)) as the solution.
However, this will make things a bit more complicated: type
descriptions are no longer unique, since we have Num, (Rec this Num),
and (Rec this (Rec that Num)) that are all equal.

So for simplicity we will now take the first route and add `rec' -- an
explicit recursive binder form to the language (as with `with', we're
going back to `rec' rather than `bindrec' to keep things simple).

First, the new BNF:

  <PICKY> ::= <num>
            | <id>
            | { + <PICKY> <PICKY> }
            | { < <PICKY> <PICKY> }
            | { fun { <id> : <TYPE> } : <TYPE> <PICKY> }
            | { call <PICKY> <PICKY> }
            | { with { <id> : <TYPE> <PICKY> } <PICKY> }
            | { if <PICKY> <PICKY> <PICKY> }
            | { rec { <id> : <TYPE> <PICKY> } <PICKY> }
  <TYPE>  ::= Number | Boolean | ( <TYPE> -> <TYPE> )

We now need to add a typing judgment for `rec' expressions. What
should it look like?

              ???
  ———————————————————————————
  Γ ⊢ {rec {x : τ₁ V} E} : τ₂

`rec' is similar to all the other local binding forms: like `with', it
can be seen as a combination of a function and an application. So we
need to check the two things that those rules checked -- first, check
that the body expression has the right type assuming that the type
annotation given to `x' is valid:

  Γ[x:=τ₁] ⊢ E : τ₂    ???
  ———————————————————————————
  Γ ⊢ {rec {x : τ₁ V} E} : τ₂

Now, we also want to add the other side -- making sure that the τ₁ type
annotation is valid:

  Γ[x:=τ₁] ⊢ E : τ₂    Γ ⊢ V : τ₁
  ——————————————————————————————
    Γ ⊢ {rec {x : τ₁ V} E} : τ₂

But that will not be possible in general -- `V' is an expression that
can include `x' itself -- that's the whole point.
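
To see the problem concretely, here is a minimal sketch of a checker
for a fragment of the annotated language. It is an illustration only,
not the course implementation: it is written in plain `#lang racket',
programs and types are quoted s-expressions, the type environment is an
immutable hash table, and the `extend-for-named?' flag is a made-up
switch that chooses which environment the named expression is checked
in, so the candidate rules can be compared.

```racket
#lang racket

;; type-of : returns the type of `expr' in `env', or raises an error.
;; Types are 'Number or (list t1 '-> t2).
(define (type-of expr env extend-for-named?)
  ;; check : verify that `e' has exactly the type `type' in `env'
  (define (check e type env)
    (unless (equal? type (type-of e env extend-for-named?))
      (error 'typecheck "type error for ~s: expecting ~s" e type)))
  (match expr
    [(? number?) 'Number]
    [(? symbol? x)
     (hash-ref env x
               (lambda () (error 'typecheck "no binding for ~s" x)))]
    [(list '+ a b)
     (check a 'Number env)
     (check b 'Number env)
     'Number]
    [(list 'fun (list x ': t1) ': t2 body)
     (check body t2 (hash-set env x t1))
     (list t1 '-> t2)]
    [(list 'call f v)
     (match (type-of f env extend-for-named?)
       [(list t1 '-> t2) (check v t1 env) t2]
       [_ (error 'typecheck
                 "type error for ~s: expecting a function" expr)])]
    [(list 'rec (list x ': t1 v) body)
     (let ([env+ (hash-set env x t1)])
       ;; the only place where the candidate rules differ: which
       ;; environment is used for the named expression
       (check v t1 (if extend-for-named? env+ env))
       (type-of body env+ extend-for-named?))]
    [_ (error 'typecheck "bad syntax in ~s" expr)]))

;; a function that refers to itself in its own definition
(define loop
  '{rec {f : (Number -> Number)
          {fun {n : Number} : Number {call f n}}}
     {call f 1}})

;; try : the resulting type, or the error message if checking fails
(define (try extend-for-named?)
  (with-handlers ([exn:fail? exn-message])
    (type-of loop (hash) extend-for-named?)))
```

With the named expression checked in the unextended Γ, (try #f) fails
with a "no binding for f" message, since `f' is not yet known when its
own definition is checked. With the flag set, (try #t) returns Number.
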
The conclusion is that we should use a similar trick to the one that we
used to specify evaluation of recursive binders -- the same environment
is used for both the named expression and for the body expression:

  Γ[x:=τ₁] ⊢ E : τ₂    Γ[x:=τ₁] ⊢ V : τ₁
  —————————————————————————————————————
       Γ ⊢ {rec {x : τ₁ V} E} : τ₂

You can also see now that this rule adds an arrow type to the Γ type
environment, in a way that makes it possible to use it over and over,
making it possible to run infinite loops in this language.

Our complete language specification is below.

----------------------------------------------------------------------
  <PICKY> ::= <num>
            | <id>
            | { + <PICKY> <PICKY> }
            | { < <PICKY> <PICKY> }
            | { fun { <id> : <TYPE> } : <TYPE> <PICKY> }
            | { call <PICKY> <PICKY> }
            | { with { <id> : <TYPE> <PICKY> } <PICKY> }
            | { rec { <id> : <TYPE> <PICKY> } <PICKY> }
            | { if <PICKY> <PICKY> <PICKY> }
  <TYPE>  ::= Number | Boolean | ( <TYPE> -> <TYPE> )

  Γ ⊢ n : Number

  Γ ⊢ x : Γ(x)

  Γ ⊢ a : Number  Γ ⊢ b : Number
  ———————————————————————————————
       Γ ⊢ {+ a b} : Number

  Γ ⊢ a : Number  Γ ⊢ b : Number
  ———————————————————————————————
       Γ ⊢ {< a b} : Boolean

          Γ[x:=τ₁] ⊢ E : τ₂
  ——————————————————————————————————————
  Γ ⊢ {fun {x : τ₁} : τ₂ E} : (τ₁ -> τ₂)

  Γ ⊢ F : (τ₁ -> τ₂)  Γ ⊢ V : τ₁
  ——————————————————————————————
       Γ ⊢ {call F V} : τ₂

  Γ ⊢ C : Boolean  Γ ⊢ T : τ  Γ ⊢ E : τ
  ———————————————————————————————————————
           Γ ⊢ {if C T E} : τ

  Γ ⊢ V : τ₁  Γ[x:=τ₁] ⊢ E : τ₂
  ——————————————————————————————
  Γ ⊢ {with {x : τ₁ V} E} : τ₂

  Γ[x:=τ₁] ⊢ V : τ₁  Γ[x:=τ₁] ⊢ E : τ₂
  —————————————————————————————————————
      Γ ⊢ {rec {x : τ₁ V} E} : τ₂
----------------------------------------------------------------------

========================================================================
>>> Typing Data
[[[ PLAI Chapter 27 ]]]

An important concept that we have avoided so far is user-defined types.
This issue exists in practically all languages, including the ones we
did so far, since a language without the ability to create new
user-defined types is a language with a major problem.
(As a side note, we did talk about mimicking an object system using
plain closures, but it turns out that this is insufficient as a
replacement for true user-defined types -- you can kind of see that in
the Schlac language, where the lack of all types means that there are
no type errors.)

In the context of a statically typed language, this issue is even more
important. Specifically, we talked about typing recursive code, but we
should also consider typing recursive data. For example, we will start
with a `length' function in an extension of the language that has
`empty?', `rest', and `NumCons' and `NumEmpty' constructors:

  {rec {length : ???
         {fun {l : ???} : Number
           {if {empty? l}
             0
             {+ 1 {call length {rest l}}}}}}
    {call length {NumCons 1 {NumCons 2 {NumCons 3 {NumEmpty}}}}}}

But adding all of these new functions as built-ins is getting messy: we
want our language to have a form for defining new kinds of data. In
this example -- we want to be able to define the `NumList' type for
lists of numbers. We therefore extend the language with a new
`with-type' form for creating new user-defined types, using variants in
a similar way to our own course language:

  {with-type {NumList [NumEmpty]
                      [NumCons {fst : Number} {rst : ???}]}
    {rec {length : ??? {fun {l : ???} : Number ...}}
      ...}}

We assume here that the `NumList' definition provides us with a number
of "new built-ins" -- `NumEmpty' and `NumCons' constructors, and assume
also a `cases' form that can be used to both test a value and access
its components (with the constructors serving as patterns). This makes
the code a little different than what we started with:

  {with-type {NumList [NumEmpty]
                      [NumCons {fst : Number} {rst : ???}]}
    {rec {length : ???
           {fun {l : ???} : Number
             {cases l
               [{NumEmpty} 0]
               [{NumCons x r} {+ 1 {call length r}}]}}}
      {call length {NumCons 1 {NumCons 2 {NumCons 3 {NumEmpty}}}}}}}

The question is what should the "???" be filled with? Clearly,
recursive data types are very common and we need to support them.
The scope of `with-type' should therefore be similar to `rec', except that it works at the type level: the new type is available for its own definition. This is the complete code now:

  {with-type {NumList [NumEmpty]
                      [NumCons {fst : Number} {rst : NumList}]}
    {rec {length : (NumList -> Number)
            {fun {l : NumList} : Number
              {cases l
                [{NumEmpty} 0]
                [{NumCons x r} {+ 1 {call length r}}]}}}
      {call length {NumCons 1 {NumCons 2 {NumCons 3 {NumEmpty}}}}}}}

(Note that in the course language we can do just that, and in addition the `Rec' type constructor can be used to make up recursive types.)

An important property that we would like this type to have is for it to be "well founded": that we'd never get stuck in some kind of type-level infinite loop. To see that this holds in this example, note that some of the variants are self-referential (only `NumCons' here), but there is at least one that is not (`NumEmpty') -- if there wasn't any simple variant, then we would have no way to construct instances of this type to begin with!

[As a side note, if the language has lazy semantics, we could use such types -- for example:

  {with-type {NumList [NumCons {fst : Number} {rst : NumList}]}
    {rec {ones : NumList {NumCons 1 ones}}
      ...}}

Reasoning about such programs requires more than just induction though.]

========================================================================
>> Judgments for recursive types

If we want to have a language that is basically similar to the course language, then -- as seen above -- we'd use a similar `cases' expression. How should we type-check such expressions? In this case, we want to verify this:

  Γ ⊢ {cases l [{NumEmpty} 0] [{NumCons x r} {+ 1 {call length r}}]} : Number

Similarly to the judgment for `if' expressions, we require that the two result expressions are numbers.
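As a rough, untyped approximation of the complete `length' program above, the same data and function can be written in plain Racket. This is only a sketch: `struct' and `match' stand in for `with-type' and `cases', the name `num-length' is made up for the example, and nothing here is statically checked.

```racket
#lang racket
;; NumList as two struct variants (an untyped stand-in for `with-type')
(struct NumEmpty () #:transparent)
(struct NumCons (fst rst) #:transparent)

;; `match' plays the role of `cases': it tests which variant we have
;; and binds the fields of that variant in the chosen branch
(define (num-length l)
  (match l
    [(NumEmpty) 0]
    [(NumCons x r) (+ 1 (num-length r))]))

(num-length (NumCons 1 (NumCons 2 (NumCons 3 (NumEmpty)))))  ; => 3
```

Note that the `NumCons' branch binds both `x' and `r', which is exactly why the typing rule for `cases' needs types for these names.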
Indeed, you can think about `cases' as a more primitive tool that has the functionality of `if' -- in other words, given such user-defined types we could implement booleans as a new type and implement `if' using `cases'. For example, wrap programs with:

  {with-type {Bool [True] [False]}
    ...}

and now translate {if E1 E2 E3} to {cases E1 [{True} E2] [{False} E3]}.

Continuing with typing `cases', we now have:

  Γ ⊢ 0 : Number
  Γ ⊢ {+ 1 {call length r}} : Number
  ————————————————————————————————————————————————————————————
  Γ ⊢ {cases l [{NumEmpty} 0] [{NumCons x r} {+ 1 {call length r}}]} : Number

But this will not work -- we have no type for `r' here, so we can't prove the second subgoal. We need to consider the `NumList' type definition as something that, in addition to the new built-ins, provides us with type judgments for these built-ins. In the case of the `NumCons' variant, we know that using {NumCons x r} is a pattern that matches `NumList' values that are a result of this variant constructor but it also binds `x' and `r' to the values of the two fields, and since all uses of the constructor are verified, the fields have the declared types. This means that we need to extend Γ in this rule so we're able to prove the two subgoals. Note that we do the same for the `NumEmpty' case, except that there are no new bindings there.

  Γ ⊢ 0 : Number
  Γ[x:=Number; r:=NumList] ⊢ {+ 1 {call length r}} : Number
  ————————————————————————————————————————————————————————————
  Γ ⊢ {cases l [{NumEmpty} 0] [{NumCons x r} {+ 1 {call length r}}]} : Number

Finally, we need to verify that the value itself -- `l' -- has the right type: that it is a `NumList'.

  Γ ⊢ l : NumList
  Γ ⊢ 0 : Number
  Γ[x:=Number; r:=NumList] ⊢ {+ 1 {call length r}} : Number
  ————————————————————————————————————————————————————————————
  Γ ⊢ {cases l [{NumEmpty} 0] [{NumCons x r} {+ 1 {call length r}}]} : Number

But why `NumList' and not some other defined type?
This judgment needs to do a little more work: it should inspect all of the variants that are used in the branches, find the type that defines them, then use that type as the subgoal. Furthermore, to make the type checker more useful, it can check that we have complete coverage of the variants, and that no variant is used twice:

  Γ ⊢ l : NumList
  (also need to show that NumEmpty and NumCons are all of
   the variants of NumList, with no repetition.)
  Γ ⊢ 0 : Number
  Γ[x:=Number; r:=NumList] ⊢ {+ 1 {call length r}} : Number
  ————————————————————————————————————————————————————————————
  Γ ⊢ {cases l [{NumEmpty} 0] [{NumCons x r} {+ 1 {call length r}}]} : Number

Note how this is different from the version in the textbook -- it has a `type-case' expression with the type name mentioned explicitly -- for example: {type-case l NumList {{NumEmpty} 0} ...}. This is essentially the same as having each defined type come with its own `cases' expression. Our rule needs to do a little more work, but overall it is a little easier to use. (And the same goes for the actual implementation of the two languages.)

In addition to `cases', we should also have typing judgments for the constructors. These are much simpler, for example:

  Γ ⊢ x : Number   Γ ⊢ r : NumList
  ————————————————————————————————
  Γ ⊢ {NumCons x r} : NumList

Alternatively, we could add the constructors as new functions instead of new special forms -- so in the Picky language they'd be used in `call' expressions. The `with-type' will then create the bindings for its scope at runtime, and for the typechecker it will add the relevant types to Γ:

  Γ[NumCons:=(Number NumList -> NumList); NumEmpty:=(-> NumList)]

(This requires functions of any arity, of course.) Using accessor functions could be similarly simpler than `cases', but less convenient for users.

Note about representation: a by-product of our type checker is that whenever we have a `NumList' value, we know that it *must* be an instance of either `NumEmpty' or `NumCons'.
Therefore, we could represent such values as a wrapped value container, with a single bit that distinguishes the two. This is in contrast to dynamically typed languages like Racket, where every new type needs to have its own globally unique tag.

========================================================================
>> "Runaway" instances

Consider this code:

  {with-type {NumList [NumEmpty] ...}
    {NumEmpty}}

We now know how to type check its validity, but what about the type of this whole expression? The obvious choice would be `NumList':

  {with-type {NumList [NumEmpty] ...}
    {NumEmpty}}
  : NumList

There is a subtle but important problem here: the expression evaluates to a `NumList', but we can no longer use this value, since we're out of the scope of the `NumList' type definition! In other words, we would typecheck a program that is pretty much useless.

Even if we were to allow such a value to flow to a different context with a `NumList' type definition, we wouldn't want the two to be confused -- following the principle of lexical scope, we'd want each type definition to be unique to its own scope even if it has the same concrete name. For example, using `NumList' as the type of the inner `with-type' here:

  {with-type {NumList something-completely-different}
    {with-type {NumList [NumEmpty] ...}
      {NumEmpty}}}

would make it wrong.

(In fact, we might want to have a new type even if the value goes outside of this scope and back in. The default struct definitions in Racket have exactly this property -- they're "generative" -- which means that each "call" to `define-struct' creates a new type, so:

  (define (two-foos)
    (define (foo x)
      (struct foo (x))
      (foo x))
    (list (foo 1) (foo 2)))

returns two instances of two *different* `foo' types!)

One way to resolve this is to just forbid the type from escaping the scope of its definition -- so we would forbid the type of the expression from being `NumList', which makes

  {with-type {NumList [NumEmpty] ...}
    {NumEmpty}}
  : NumList

invalid.
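The generative behavior of Racket structs mentioned above can be observed directly. This is a minimal sketch in plain Racket; `make-foo' is a hypothetical helper that returns an instance together with the predicate of the `foo' type that was created by that particular call.

```racket
#lang racket
;; Each call evaluates the `struct' form again, so each call creates
;; a fresh `foo' type with its own constructor and predicate.
(define (make-foo x)
  (struct foo (x))
  ;; pair up an instance with the predicate of *this* `foo' type
  (cons (foo x) foo?))

(define a (make-foo 1))
(define b (make-foo 2))

((cdr a) (car a))  ; => #t, an instance of its own `foo' type
((cdr a) (car b))  ; => #f, `b' holds an instance of a different `foo'
```

The second test fails even though both values were made by the same source-level `struct' form, which is what makes the definition "generative".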
But that's not enough -- what about returning a compound value that *contains* an instance of `NumList'? For example -- what if we return a list or a function with a `NumList' instance?

  {with-type {NumList [NumEmpty] ...}
    {fun {x} {NumEmpty}}}
  : Num -> NumList??

Obviously, we would need to extend this restriction: the resulting type should not mention the defined type *at all* -- not even in lists or functions or anything else. This is actually easy to do: if the overall expression is type-checked in the surrounding lexical scope, then it is type-checked in the surrounding type environment (Γ), and that environment has nothing in it about `NumList' (well, nothing about *this* `NumList').

Note that this is, very roughly speaking, what our course language does: `define-type' can only define new types when it is used at the top-level.

This works fine with the above assumption that such a value would be completely useless -- but there are aspects of such values that are useful. Such types are close to things that are known as "existential types", and they are for defining opaque values that you can do nothing with except pass them around, and only code in a specific lexical context can actually use them. For example, you could lump together the value with a function that can work on this value. If it wasn't for the `define-type' top-level restriction, we could write the following:

  (: foo : Integer -> (List ??? (???
      -> Integer)))
  (define (foo x)
    (define-type FOO [Foo Integer])
    (list (Foo x) (lambda (f) (cases f [(Foo n) (* n n)]))))

There is nothing that we can do with the resulting `Foo' instance (we don't even have a way to name it) -- but in the result of the above function we also get a function that could work on such values, even ones from different calls:

  ((second (foo 1)) (first (foo 2))) -> 4

Since these kinds of values are related to hiding information, they're useful (among other things) when talking about module systems (and object systems), where you want to have a local scope for a piece of code with bindings that are not available outside it.

========================================================================
>>> Type soundness

[[[ PLAI Chapter 28 ]]]

Having a type checker is obviously very useful -- but to be able to *rely* on it, we need to provide some kind of a formal account of the kind of guarantees that we get by using one. Specifically, we want to guarantee that a program that type-checks will *never* fail with a type error. Such type errors in Racket result in an exception -- but in C they can result in anything. In our simple Picky implementation, we still need to check the resulting value in `run':

  (typecheck prog (NumT) (EmptyTypeEnv))
  (let ([result (eval prog (EmptyEnv))])
    (cases result
      [(NumV n) n]
      ;; this is another error that is never reached, since we make
      ;; sure that the program always evaluates to a number above
      [else (error 'run "evaluation returned a non-number: ~s"
                   result)]))

A soundness proof for this would show that checking the result (in `cases') is not needed. However, the check must be there since Typed Racket (or any other typechecker) is far from making up and verifying such a proof by itself.

In this context we have a specific meaning for "fail with a type error", but these failures can be very different based on the kind of properties that your type checker verifies.
This property of a type system is called "soundness": a *sound* type system is one that will never allow such errors for type-checked code:

  For any program `p', if we can type-check `p : τ', then `p' will evaluate to a value that is in the type `τ'.

The importance of this can be seen in that it is the *only* connection between the type system and code execution. Without it, a type system is a bunch of syntactic rules that are completely disconnected from how the program runs. [Note also that -- "in the type" -- works only for the (very common) case where types are sets of values.]

But this statement isn't exactly what we need -- it states a property that is too strong: what if execution gets stuck in an infinite loop? (That wasn't needed before we introduced `rec', where we could extend the conclusion part to: "... then `p' will terminate and evaluate to a value that is in the type `τ'".) We therefore need to revise it:

  For any program `p', if we can type-check `p : τ', and if `p' terminates and returns `v', then `v' is in the type `τ'.

But there are still problems with this. Some programs evaluate to a value, some get stuck in an infinite loop, and some ... throw an error. Even with type checking, there are still cases when we get runtime errors. For example, in practically all statically typed languages the length of a list is not encoded in its type, so {first null} would throw an error. (It's possible to encode more information like that in types, but there is a downside to this too: putting more information in the type system means that things get less flexible, and it becomes more difficult to write programs since you're moving towards proving more facts about them.)

Even if we were to encode list lengths in the type, we would still have runtime errors: opening a missing file, writing to a read-only file, fetching a non-existent URL, etc, so we must find some way to account for these errors.
Some "solutions" are:

  * For all cases where an error should be raised, just return some value (of the appropriate type). For example, (first l) could return 0 if the list is empty; (substring "foo" 10 20) would return "huh?", etc. It seems like a dangerous way to resolve the issue, but in fact that's what most C library calls do: return some bogus value (for example, malloc() returns NULL when there is no available memory), and possibly set some global flag that specifies the exact error. (The main problem with this is that C programmers often don't check all of these conditions, leading to propagating undetected errors further down -- and all of this is a very rich source of security issues.)

  * For all cases where an error should be raised, just get stuck in an infinite loop. This approach is obviously impractical -- but it is actually popular in some theoretical circles. The reason for that is that theory people will often talk about "domains", and to express facts about computation on these domains, they're extended with a "bottom" value that represents a diverging computation. Since this introduction is costly in terms of work that it requires, adding one more such value can lead to more effort than re-using the same "bottom" value.

  * Raise an exception. This works out better than the above two extremes, and it is the approach taken by practically all modern languages.

So, assuming exceptions, we need to further refine what it means for a type system to be sound:

  For any program `p', if we can type-check `p : τ', and if `p' terminates without exceptions and returns `v', then `v' is in the type `τ'.

An important thing to note here is that languages can have very different ideas about where to raise an exception. For example, Scheme implementations often have a trivial type-checker and throw runtime exceptions when there is a type error.
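The difference between the first and the last of these "solutions" can be sketched in plain Racket. The helper names `first-or-zero' and `safe-first' are made up for the illustration; the example only assumes that `first' and `+' raise `exn:fail:contract' errors on bad arguments, which is what Racket does.

```racket
#lang racket
;; "Solution" 1 (hypothetical helper): return a bogus value of the
;; appropriate type instead of failing -- the caller can't tell this
;; apart from a list whose first element really is 0.
(define (first-or-zero l)
  (if (null? l) 0 (first l)))

;; "Solution" 3: the error surfaces as an exception, which can be
;; handled (or left to abort the program).
(define (safe-first l)
  (with-handlers ([exn:fail:contract? (lambda (e) 'caught)])
    (first l)))

(first-or-zero '())  ; => 0
(safe-first '())     ; => 'caught

;; A dynamic type error is reported in the same way, at runtime:
(with-handlers ([exn:fail:contract? (lambda (e) 'type-error)])
  (+ 1 "two"))       ; => 'type-error
```

In both of the exception cases the program terminates *with* an exception, so the refined soundness statement says nothing about a resulting value.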
On the other hand, there are systems that express much more in their type system, leaving much less room for runtime exceptions.

A soundness proof ties together a particular type system with the statement that it is sound. As such, it is where you tie the knot between type checking (which happens at the syntactic level) and execution (dealing with runtime values). These are two things that are usually separate -- we've seen throughout the course many examples of things that could be done only at runtime, and things that should happen completely on the syntax. `eval' is the important "semantic function" that connects the two worlds (`compile' also did this, when we converted our evaluator to a compiler) -- and in here, it is the soundness proof that makes the connection.

To demonstrate the kind of differences between the two sides, consider an `if' expression -- when it is executed, only one branch is evaluated, and the other is irrelevant, but when we check its type, *both* sides need to be verified. The same goes for a function whose execution gets stuck in an infinite loop: the type checker will not get into a loop since it is not executing the code; it only scans the (finite) syntax.

The bottom line here is that type soundness is really a claim that the type system provides some guarantees about the runtime behavior of programs, and its proof demonstrates that these guarantees do hold. A fundamental problem with the type system of C and C++ is that it is not sound: these languages *have* a type system, but it does not provide such runtime guarantees. (In fact, C is even worse in that it really has two type systems: there is the system that C programmers usually interact with, which has a conventional set of types -- including even higher-order function types; and there is the machine-level type system, which only talks about various bit lengths of data.
For example, using "%s" in a printf() format string will blindly copy characters from the address pointed to by the argument until it reaches a 0 character -- even if the actual argument is really a floating point number or a function.)

Note that people often talk about "strongly typed languages". This term is often meaningless in that different people take it to mean different things: it is sometimes used for a language that "has a static type checker", or a language that "has a non-trivial type checker", and sometimes it means that a language has a sound type system. For most people, however, it means some vague idea like "a language like C or Pascal or Java" rather than some concrete definition.

========================================================================