2010-04-13
  - What are Our Types -- The Picky Language
  - Typing control
  - Typing Recursion
========================================================================
>>> What are Our Types -- The Picky Language

The first thing we need to do is to agree on what types are. Earlier, we talked about two types: numbers and functions (ignore booleans and anything else for now), and we will use only these two types for now.

A type system is presented as a collection of rules called "type judgments", which describe how to determine the type of an expression. Besides the types and the judgments, a type system specification needs a (decidable) algorithm that can assign types to expressions. Such a specification should have one rule for every kind of syntactic construct, so that when we get a program we can determine the precise type of any of its expressions. Also, these judgments are usually recursive, since the type of an expression almost always depends on the types of its sub-expressions (if any).

For our restricted system, we have two rules (= judgments) that we can easily specify:

  n : Number                (any numeral `n' is a number)

  {fun {x} E} : Function

And what about an identifier? Well, it is clear that we need to keep some form of environment that keeps an account of the types assigned to identifiers (note: none of this happens at run-time). This environment is used in all type judgments, and is usually written as gamma (we use G in this text). The conventional way to write the two rules above is:

  G |- n : Number

  G |- {fun {x} E} : Function

The first one is read as "gamma proves that `n' has the type `Number'". Note that this is a syntactic environment, much like the DE-ENVs that you have seen in homework. So, we can write a rule for identifiers that simply uses the type assigned by the environment:

  G |- x : G(x)

We now need a rule for addition and a rule for application (note: we're using a very limited subset of our old language, where arithmetic operators are not function applications).
Addition is easy: if we can prove that both `a' and `b' are numbers in some environment G, then we know that `{+ a b}' is a number in the same environment. We write this as follows:

  G |- a : Number    G |- b : Number
  ----------------------------------
        G |- {+ a b} : Number

Now, what about application?

  G |- f : Function    G |- v : t_a    (we use `t' for `tau', the usual
  ---------------------------------     convention for type meta-variables)
       G |- {call f v} : ???

That is -- if we can prove that `f' is a function, and that `v' is a value of some type `t_a', then ... ??? Well, we need to know more about `f': we need to know what type it consumes and what type it returns. So a simple `Function' type is not enough -- we need some sort of function type that specifies both the input and the output types. We will use the arrow notation that we have seen throughout the semester and dump `Function'. Now we can write:

  G |- f : (t1 -> t2)    G |- v : t1
  ----------------------------------
        G |- {call f v} : t2

which makes sense -- if you take a function of type `t1->t2' and feed it what it expects, you get the obvious output type. But going back to the language -- where do we get these new arrow types from? We will modify the language and require that every function specifies its input and output types (and assume that we have only one-argument functions). For example, we will write something like this for a function that is the curried version of addition:

  {fun {x : Number} : (Number -> Number)
    {fun {y : Number} : Number
      {+ x y}}}

Now we revise the syntax of our limited language, which contains only additions, applications, and single-argument functions (and, for fun, goes back to using the `call' keyword).
The syntax we get is:

  <PICKY> ::= <num>
            | <id>
            | { fun { <id> : <TYPE> } : <TYPE> <PICKY> }
            | { + <PICKY> <PICKY> }
            | { call <PICKY> <PICKY> }

  <TYPE> ::= Number
           | ( <TYPE> -> <TYPE> )

and the typing rules are:

  G |- n : Number

  G |- {fun {x : t1} : t2 E} : (t1 -> t2)

  G |- x : G(x)

  G |- a : Number    G |- b : Number
  ----------------------------------
        G |- {+ a b} : Number

  G |- f : (t1 -> t2)    G |- v : t1
  ----------------------------------
        G |- {call f v} : t2

But we're still missing a big part -- the current rule for a `fun' expression is too weak: it does not allow us to conclude that this expression:

  {fun {x : Number} : (Number -> Number) 3}

is invalid. Instead, it will make us think that this program:

  {call {call {fun {x : Number} : (Number -> Number) 3} 5} 7}

is valid, and should return a number. What's missing? We need to check that the body of the function is correct, so the rule for typing a `fun' is no longer a simple axiom. Here is how we check the body instead of blindly believing program annotations:

         G[x:=t1] |- E : t2
  ---------------------------------------
  G |- {fun {x : t1} : t2 E} : (t1 -> t2)

That is -- we want to make sure that if `x' has type `t1', then the body expression `E' has type `t2'; if we can prove this, then we can trust these annotations.

There is an important relationship between this rule and the `call' rule for application:

* In this rule, we assume that the input will have the right type, and guarantee (via a proof) that the output will have the right type.

* In the application rule, we guarantee (via a proof) an input of the right type, and assume a result of the right type.

(Side note: PLT comes with a contract system that can identify type errors dynamically and assign blame to either the caller or the callee -- and these correspond to these two sides.)

Note that, as we said, `Number' is really just a property of a certain kind of values; we don't know exactly which numbers are actually used.
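The rules above already read as a recursive procedure with one case per syntactic form, where the `fun' case checks the body under G[x:=t1] instead of trusting the annotation. Here is a minimal Python sketch of that procedure; the tuple encoding and the names `type_of', `arrow' and `PickyTypeError' are illustrative assumptions, not part of the notes.

```python
# Hypothetical encoding (not from the notes):
#   types:       "Number"  or  ("->", t1, t2)
#   expressions: ("num", n) | ("id", x) | ("fun", x, t1, t2, body)
#                ("+", a, b) | ("call", f, v)

class PickyTypeError(Exception):
    pass

def arrow(t1, t2):
    return ("->", t1, t2)

def type_of(e, env):
    """Return t such that env |- e : t, or raise PickyTypeError."""
    tag = e[0]
    if tag == "num":                          # G |- n : Number
        return "Number"
    if tag == "id":                           # G |- x : G(x)
        if e[1] not in env:
            raise PickyTypeError(f"unbound identifier {e[1]}")
        return env[e[1]]
    if tag == "+":                            # both operands must be Numbers
        if type_of(e[1], env) != "Number" or type_of(e[2], env) != "Number":
            raise PickyTypeError("+ expects two Numbers")
        return "Number"
    if tag == "fun":                          # G[x:=t1] |- E : t2
        _, x, t1, t2, body = e
        body_t = type_of(body, {**env, x: t1})  # new dict: G itself is unchanged
        if body_t != t2:
            raise PickyTypeError(f"body has type {body_t}, declared {t2}")
        return arrow(t1, t2)
    if tag == "call":                         # f : (t1 -> t2) and v : t1
        f_t = type_of(e[1], env)
        if not (isinstance(f_t, tuple) and f_t[0] == "->"):
            raise PickyTypeError(f"cannot call a value of type {f_t}")
        v_t = type_of(e[2], env)
        if v_t != f_t[1]:
            raise PickyTypeError(f"argument has type {v_t}, expected {f_t[1]}")
        return f_t[2]
    raise PickyTypeError(f"unknown expression {e!r}")

N = "Number"
# {fun {x : Number} : (Number -> Number) {fun {y : Number} : Number {+ x y}}}
curried_add = ("fun", "x", N, arrow(N, N),
               ("fun", "y", N, N, ("+", ("id", "x"), ("id", "y"))))
print(type_of(curried_add, {}))  # ('->', 'Number', ('->', 'Number', 'Number'))

# {fun {x : Number} : (Number -> Number) 3} is rejected: its body is a Number
try:
    type_of(("fun", "x", N, arrow(N, N), ("num", 3)), {})
except PickyTypeError as err:
    print("type error:", err)
```

Extending the environment with `{**env, x: t1}' builds a new mapping, so the binding G[x:=t1] is visible only while checking the body, mirroring the rule.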
Just as `Number' does not tell us which number we have, the arrow function types don't tell us exactly which function we have: for example, `(Number -> Number)' can indicate a function that adds three to its argument, subtracts seven, or multiplies it by 7619. But it certainly contains much more information than the previous naive `Function' type. (Consider also Typed Scheme here: it goes much further in expressing facts about code.)

For reference, here is the complete BNF and the typing rules:

----------------------------------------------------------------------
  <PICKY> ::= <num>
            | <id>
            | { fun { <id> : <TYPE> } : <TYPE> <PICKY> }
            | { + <PICKY> <PICKY> }
            | { call <PICKY> <PICKY> }

  <TYPE> ::= Number
           | ( <TYPE> -> <TYPE> )

  G |- n : Number

  G |- x : G(x)

  G |- a : Number    G |- b : Number
  ----------------------------------
        G |- {+ a b} : Number

         G[x:=t1] |- E : t2
  ---------------------------------------
  G |- {fun {x : t1} : t2 E} : (t1 -> t2)

  G |- f : (t1 -> t2)    G |- v : t1
  ----------------------------------
        G |- {call f v} : t2
----------------------------------------------------------------------
========================================================================

Examples of using types (abbreviating `Number' as `Num'):

                      {} |- 5 : Num   {} |- 7 : Num
                      -----------------------------
  {} |- 2 : Num           {} |- {+ 5 7} : Num
  -------------------------------------------------
            {} |- {+ 2 {+ 5 7}} : Num

                  [x:=Num] |- x : Num   [x:=Num] |- 3 : Num
                  -----------------------------------------
                        [x:=Num] |- {+ x 3} : Num
                  ------------------------------------------------
  {} |- 5 : Num   {} |- {fun {x : Num} : Num {+ x 3}} : Num -> Num
  ----------------------------------------------------------------
        {} |- {call {fun {x : Num} : Num {+ x 3}} 5} : Num

Finally, try a buggy program like

  {+ 3 {fun {x : Number} : Number x}}

and see where it is impossible to continue.

The main thing here is that to know that this is a type error, we have to prove that there is no judgment for a certain type (in this case, that there is no way to prove that a `fun' expression has the type `Number'), which we (humans) can only do by inspecting all of the rules.
Because of this, we also need to add an algorithm to our type system: one that we can follow mechanically, and that tells us when it gives up.

========================================================================
>>> Typing control

We will now extend our typed Picky language to have a conditional expression and predicates. First, we extend the BNF with a predicate expression, and we also need a type for its results:

----------------------------------------------------------------------
  <PICKY> ::= <num>
            | <id>
            | { fun { <id> : <TYPE> } : <TYPE> <PICKY> }
            | { + <PICKY> <PICKY> }
            | { < <PICKY> <PICKY> }
            | { call <PICKY> <PICKY> }
            | { if <PICKY> <PICKY> <PICKY> }

  <TYPE> ::= Number
           | Boolean
           | ( <TYPE> -> <TYPE> )
----------------------------------------------------------------------

Initially, we use the same rules, and add the obvious type for the predicate:

  G |- a : Number    G |- b : Number
  ----------------------------------
        G |- {< a b} : Boolean

And what should the rule for `if' look like? Well, to make sure that the condition is a boolean, it should be something of this form:

  G |- c : Boolean    G |- t : ???    G |- e : ???
  ------------------------------------------------
              G |- {if c t e} : ???

What would be the types of `t' and `e'? A natural choice would be to let the programmer use any two types:

  G |- c : Boolean    G |- t : t1    G |- e : t2
  ----------------------------------------------
              G |- {if c t e} : ???

But what would the return type be? This is still a problem. (BTW, some kind of union would be nice, but it has some strong implications that we will not discuss.) In addition, we will have a problem detecting possible errors like this one (where `c' is some Boolean expression):

  {+ 2 {if c 3 {fun {x : Number} : Number x}}}

Since we know nothing about the condition, we can just as well be conservative and force both arms to have the same type. The rule is therefore:

  G |- c : Boolean    G |- t : t    G |- e : t
  --------------------------------------------
             G |- {if c t e} : t

-- using the same letter indicates that we expect the types to be identical, unlike the previous attempt. Consequently, this type system is fundamentally weaker than Typed Scheme, which we use in this class (Typed Scheme can give such an `if' a union of the two arm types).
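The same-type `if' rule adds only a few cases to a checker. The following is a self-contained Python sketch under a hypothetical tuple encoding (`type_of', `expect' and `PickyTypeError' are illustrative names); it accepts an `if' whose arms agree and rejects the mixed-arm example, assuming `c' is bound to `Boolean' in the environment.

```python
# Hypothetical tuple encoding (not from the notes):
#   types:       "Number" | "Boolean" | ("->", t1, t2)
#   expressions: ("num", n) | ("id", x) | ("fun", x, t1, t2, body)
#                ("+", a, b) | ("<", a, b) | ("call", f, v) | ("if", c, t, e)

class PickyTypeError(Exception):
    pass

def expect(actual, expected, what):
    if actual != expected:
        raise PickyTypeError(f"{what}: expected {expected}, got {actual}")

def type_of(e, env):
    """Return t such that env |- e : t, or raise PickyTypeError."""
    tag = e[0]
    if tag == "num":
        return "Number"
    if tag == "id":
        if e[1] not in env:
            raise PickyTypeError(f"unbound identifier {e[1]}")
        return env[e[1]]
    if tag in ("+", "<"):                     # both take two Numbers
        expect(type_of(e[1], env), "Number", tag)
        expect(type_of(e[2], env), "Number", tag)
        return "Number" if tag == "+" else "Boolean"
    if tag == "if":                           # Boolean test, both arms share type t
        expect(type_of(e[1], env), "Boolean", "if condition")
        t = type_of(e[2], env)
        expect(type_of(e[3], env), t, "if arms")
        return t
    if tag == "fun":                          # body checked under G[x:=t1]
        _, x, t1, t2, body = e
        expect(type_of(body, {**env, x: t1}), t2, "function body")
        return ("->", t1, t2)
    if tag == "call":                         # f : (t1 -> t2) and v : t1
        f_t = type_of(e[1], env)
        if not (isinstance(f_t, tuple) and f_t[0] == "->"):
            raise PickyTypeError(f"cannot call a value of type {f_t}")
        expect(type_of(e[2], env), f_t[1], "call argument")
        return f_t[2]
    raise PickyTypeError(f"unknown expression {e!r}")

N = "Number"
print(type_of(("if", ("<", ("num", 1), ("num", 2)), ("num", 3), ("num", 4)), {}))  # Number

# {+ 2 {if c 3 {fun {x : Number} : Number x}}}, with c : Boolean in the environment
mixed = ("+", ("num", 2),
         ("if", ("id", "c"), ("num", 3), ("fun", "x", N, N, ("id", "x"))))
try:
    type_of(mixed, {"c": "Boolean"})
except PickyTypeError as err:
    print("type error:", err)
```

Reusing the first arm's type `t' as the expected type of the second arm is the code counterpart of writing the same letter twice in the rule.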
========================================================================
>> Extending Picky

In general, we can extend this language in one of two ways. For example, let's say that we want to add the `with' form. One way to add it is what we did above -- simply add it to the language, and write the rule for it. In this case, we get:

  G |- v : t1    G[x:=t1] |- E : t2
  ---------------------------------
    G |- {with {x : t1 v} E} : t2

Note how this rule encapsulates information about the scope of `with'. Also note that we need to specify the types for the bound values.

Another way to achieve this extension is to add `with' as a derived form. We know that when we see a

  {with {x v} E}

expression, we can just translate it into

  {call {fun {x} E} v}

So we could achieve this extension by using a rewrite rule to translate all `with' expressions into `call's of anonymous functions (e.g., using the `with-stx' facility that we have seen recently). This could be done formally: begin with the `with' form, translate it to the `call' form, and finally show the necessary goals to prove its type. The only thing to be aware of is the need to translate the types too, and there is one type that is missing from the typed `with' version above -- the output type of the function. This is an indication that we don't really need to specify function output types -- we can just deduce them from the code, provided that we know the input type of the function. Indeed, if we do this on a general template for a `with' expression, then we end up with the same goals that need to be proved as in the above rule:

          G[x:=t1] |- E : t2
  ----------------------------------
  G |- {fun {x : t1} E} : (t1 -> t2)    G |- v : t1
  -------------------------------------------------
         G |- {call {fun {x : t1} E} v} : t2
         -----------------------------------
            G |- {with {x : t1 v} E} : t2

========================================================================
Conclusion -- we've seen type judgment rules, and how to use them in proof trees.
Note that in these trees there is a clear difference between rules that have no preconditions -- these are axioms that are always true (e.g., a numeral is always of type `Number') -- and rules whose premises must be proved first. The general way of proving a type seems similar to the evaluation of an expression, but there is a huge difference -- *nothing* is really getting evaluated. As an example, we always go into the body of a function expression, which is done to get the function's type, and this type is later used wherever the function is used. When you evaluate this:

  {with {f : (Number -> Number) {fun {x : Number} : Number x}}
    {+ {call f 1} {call f 2}}}

you first create a closure, which means that you don't touch the body of the function, and later you use it twice. In contrast, when you prove the type of this expression, you immediately go into the body of the function, which you have to do to prove that it has the expected `Number->Number' type, and then you just use this type twice.

Finally, we have seen the importance of using the same type letters to enforce equal types, and in the case of typing an `if' expression this played a major role: it is the difference between allowing the two arms to have any two types and requiring them to have the same type.

========================================================================
>>> Typing Recursion

We already know that without recursion life can be very boring... So we obviously want to be able to have recursive functions -- but the question is how they will interact with our type system.

One thing that we have seen is that by just having functions we get recursion. This was achieved by the Y combinator function. It seems like the same should apply to our simple typed language.

The core of the Y combinator was an expression similar to Omega that generates the infinite loop that is needed. In our language:

  {call {fun {x} {call x x}}
        {fun {x} {call x x}}}

This expression was impossible to evaluate completely since it never terminates, but it served as a basis for the Y combinator.
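For comparison, a dynamically typed language accepts the same self-application without complaint. In this minimal Python sketch (the names `self_apply' and `omega_demo' are illustrative), nothing rejects the program in advance; it stops only because CPython's recursion limit raises `RecursionError', and with an unbounded stack it would keep calling itself forever.

```python
def self_apply(x):
    """{fun {x} {call x x}}: apply the argument to itself."""
    return x(x)

def omega_demo():
    """{call {fun {x} {call x x}} {fun {x} {call x x}}} in Python."""
    try:
        # self_apply(self_apply) calls self_apply(self_apply) again, and again...
        return self_apply(self_apply)
    except RecursionError:
        # CPython's stack limit is the only thing that stops the loop.
        return "stopped by the recursion limit"

print(omega_demo())  # stopped by the recursion limit
```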
Let's examine its type: we begin with the internal function -- it is a function, so it must have an arrow type; assume that the input type is t_i and the output type is t_o:

  {fun {x} {call x x}} : t_i -> t_o

Now, the body of the function is `{call x x}', so `x' must be a function itself, which means that t_i must also be an arrow type, say t_1->t_2 (a) for some t_1 and t_2, and the type of the whole function is now:

  t_i -> t_o = (t_1->t_2) -> t_o

What can we say about t_1? Well, it is the input type of `x', but we see that `x' is applied to `x' itself, therefore t_1 must be the type of `x' (b), which is the type of the function's argument, t_i. What we get is:

  type-of(x) = t_1 -> t_2    (a)
  type-of(x) = t_1           (b)

and from this we get:

  => t_1 = t_1 -> t_2
         = (t_1 -> t_2) -> t_2
         = ((t_1 -> t_2) -> t_2) -> t_2
         = ...

And this is a type that does not exist in our type system, since we can only have finite types. Therefore, we have a proof by contradiction that this expression cannot be typed in our system.

This is closely related to the fact that the typed language we have described so far is "strongly normalizing": no matter what program you write, it will always terminate! To see this, very informally, consider the language without functions -- this is clearly a language where all programs terminate, since the only way to create a loop is through function applications. Now add functions and function application -- in the typing rules for the resulting language, each `fun' creates a function type (creates an arrow), and each function application consumes a function type (deletes one arrow) -- since types are finite, the number of arrows is finite, which means that the number of possible applications is finite, so all programs must run in finite time.

In our language, therefore, the "halting problem" doesn't even exist, since all programs (that are properly typed) are guaranteed to halt.
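The contradiction derived above for `{fun {x} {call x x}}' can also be found mechanically. Our language has no type inference, so the sketch below swaps in a standard technique from type inference: first-order unification with an occurs check, which refuses to solve an equation like t_1 = t_1 -> t_2 because any solution would be an infinite type. The encoding and all names here are hypothetical.

```python
# Hypothetical encoding: ("var", name) is a type variable and ("->", t1, t2)
# is an arrow type; only these two forms are needed for this example.

class InfiniteType(Exception):
    pass

def walk(t, subst):
    """Follow bindings until reaching an unbound variable or an arrow."""
    while t[0] == "var" and t[1] in subst:
        t = subst[t[1]]
    return t

def show(t):
    return t[1] if t[0] == "var" else f"({show(t[1])} -> {show(t[2])})"

def occurs(name, t, subst):
    """Does the type variable `name` appear anywhere inside t?"""
    t = walk(t, subst)
    if t[0] == "var":
        return t[1] == name
    return occurs(name, t[1], subst) or occurs(name, t[2], subst)

def unify(a, b, subst):
    """Return subst extended so that a = b, or raise InfiniteType."""
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if b[0] == "var":
        a, b = b, a                       # put a variable (if any) on the left
    if a[0] == "var":
        if occurs(a[1], b, subst):        # occurs check: a = ...a... is infinite
            raise InfiniteType(f"{a[1]} = {show(b)} has no finite solution")
        return {**subst, a[1]: b}
    subst = unify(a[1], b[1], subst)      # two arrows: unify inputs, then outputs
    return unify(a[2], b[2], subst)

def type_self_application():
    """Constraints from {fun {x} {call x x}}, where x : t_i."""
    t_i, t_1, t_2 = ("var", "t_i"), ("var", "t_1"), ("var", "t_2")
    subst = unify(t_i, ("->", t_1, t_2), {})  # (a) x is applied: t_i = t_1 -> t_2
    return unify(t_1, t_i, subst)             # (b) x is its own argument: t_1 = t_i

try:
    type_self_application()
except InfiniteType as err:
    print("untypable:", err)  # untypable: t_1 = (t_1 -> t_2) has no finite solution
```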
Strong normalization is useful in many real-life situations (consider firewall rules, configuration files, devices with embedded code). But the language that we get is very limited as a result -- we really want the power to shoot ourselves in the foot...

========================================================================
>> Extending the Typed Language with Recursion

As we have seen, our language is strongly normalizing, which means that to get general recursion, we must introduce a new construct (unlike previously, when we didn't really need one). We can do this as we previously did -- by adding a new construct to the language -- or we can somehow extend the (sub)language of type descriptions to allow a new kind of type that can be used to solve the `t_1 = t_1 -> t_2' equation. (Note the `Rec' type constructor in Typed Scheme.) For simplicity, we will now take the first route and add `rec' -- an explicit recursive binder form -- to the language (as with `with', we're going back to `rec' rather than `bindrec' to keep things simple).

First, the new BNF:

  <PICKY> ::= <num>
            | <id>
            | { fun { <id> : <TYPE> } : <TYPE> <PICKY> }
            | { with { <id> : <TYPE> <PICKY> } <PICKY> }
            | { + <PICKY> <PICKY> }
            | { < <PICKY> <PICKY> }
            | { call <PICKY> <PICKY> }
            | { if <PICKY> <PICKY> <PICKY> }
            | { rec { <id> : <TYPE> <PICKY> } <PICKY> }

  <TYPE> ::= Number
           | Boolean
           | ( <TYPE> -> <TYPE> )

We now need to add a typing judgment for `rec' expressions. What should it look like?

              ???
  ----------------------------
  G |- {rec {x : t_x v} E} : t

`rec' is similar to all the other local binding forms: like `with', it can be seen as a combination of a function and an application. So we need to check the two things that those rules checked -- first, check that the body expression has the right type, assuming that the type annotation given to `x' is valid:

  G[x:=t_x] |- E : t    ???
  ----------------------------
  G |- {rec {x : t_x v} E} : t

Now, we also want to add the other side -- making sure that the t_x type annotation is valid:

  G[x:=t_x] |- E : t    G |- v : t_x
  ----------------------------------
     G |- {rec {x : t_x v} E} : t

But that will not be possible in general -- `v' is an expression that can include `x' itself -- that's the whole point. The conclusion is that we should use a trick similar to the one that we used to specify the evaluation of recursive binders -- the same environment is used for both the named expression and the body expression:

  G[x:=t_x] |- E : t    G[x:=t_x] |- v : t_x
  ------------------------------------------
        G |- {rec {x : t_x v} E} : t

Our complete language specification is below.

----------------------------------------------------------------------
  <PICKY> ::= <num>
            | <id>
            | { fun { <id> : <TYPE> } : <TYPE> <PICKY> }
            | { with { <id> : <TYPE> <PICKY> } <PICKY> }
            | { rec { <id> : <TYPE> <PICKY> } <PICKY> }
            | { + <PICKY> <PICKY> }
            | { < <PICKY> <PICKY> }
            | { call <PICKY> <PICKY> }
            | { if <PICKY> <PICKY> <PICKY> }

  <TYPE> ::= Number
           | Boolean
           | ( <TYPE> -> <TYPE> )

  G |- n : Number

  G |- x : G(x)

  G |- a : Number    G |- b : Number
  ----------------------------------
        G |- {+ a b} : Number

  G |- a : Number    G |- b : Number
  ----------------------------------
        G |- {< a b} : Boolean

         G[x:=t1] |- E : t2
  ---------------------------------------
  G |- {fun {x : t1} : t2 E} : (t1 -> t2)

  G |- f : (t1 -> t2)    G |- v : t1
  ----------------------------------
        G |- {call f v} : t2

  G |- c : Boolean    G |- t : t    G |- e : t
  --------------------------------------------
             G |- {if c t e} : t

  G |- v : t1    G[x:=t1] |- E : t2
  ---------------------------------
    G |- {with {x : t1 v} E} : t2

  G[x:=t_x] |- E : t    G[x:=t_x] |- v : t_x
  ------------------------------------------
        G |- {rec {x : t_x v} E} : t
----------------------------------------------------------------------
========================================================================
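The complete specification above translates into a checker with one case per rule. The following is a minimal Python sketch under a hypothetical tuple encoding; `type_of', `expect', `with_to_call' and `PickyTypeError' are illustrative names. Two points to notice: the `rec' case checks the named expression in G[x:=t_x], which is what lets `f' appear inside its own definition; and, as an assumption beyond the rules, a `fun' whose output type is `None' gets its output type deduced from its body, which is enough for `with_to_call' to perform the `with'-to-`call' translation discussed earlier. The example uses a negative numeral because the language has no subtraction.

```python
# Hypothetical tuple encoding of the complete language (not from the notes):
#   types:       "Number" | "Boolean" | ("->", t1, t2)
#   expressions: ("num", n) | ("id", x) | ("fun", x, t1, t2, E) | ("+", a, b)
#                ("<", a, b) | ("call", f, v) | ("if", c, t, e)
#                ("with", x, t1, v, E) | ("rec", x, t_x, v, E)
# Assumption beyond the rules: a `fun` whose t2 is None gets its output type
# deduced from its body (the idea raised in the `with` discussion).

class PickyTypeError(Exception):
    pass

def expect(actual, expected, what):
    if actual != expected:
        raise PickyTypeError(f"{what}: expected {expected}, got {actual}")

def type_of(e, env):
    """Return t such that env |- e : t, or raise PickyTypeError."""
    tag = e[0]
    if tag == "num":
        return "Number"
    if tag == "id":
        if e[1] not in env:
            raise PickyTypeError(f"unbound identifier {e[1]}")
        return env[e[1]]
    if tag in ("+", "<"):                     # two Numbers in; Number/Boolean out
        expect(type_of(e[1], env), "Number", tag)
        expect(type_of(e[2], env), "Number", tag)
        return "Number" if tag == "+" else "Boolean"
    if tag == "if":                           # Boolean test, both arms of type t
        expect(type_of(e[1], env), "Boolean", "if condition")
        t = type_of(e[2], env)
        expect(type_of(e[3], env), t, "if arms")
        return t
    if tag == "fun":                          # body checked under G[x:=t1]
        _, x, t1, t2, body = e
        body_t = type_of(body, {**env, x: t1})
        if t2 is not None:
            expect(body_t, t2, "function body")
        return ("->", t1, body_t)
    if tag == "call":                         # f : (t1 -> t2) and v : t1
        f_t = type_of(e[1], env)
        if not (isinstance(f_t, tuple) and f_t[0] == "->"):
            raise PickyTypeError(f"cannot call a value of type {f_t}")
        expect(type_of(e[2], env), f_t[1], "call argument")
        return f_t[2]
    if tag == "with":                         # v checked in G, body in G[x:=t1]
        _, x, t1, v, body = e
        expect(type_of(v, env), t1, "with value")
        return type_of(body, {**env, x: t1})
    if tag == "rec":                          # v AND body checked in G[x:=t_x]
        _, x, t_x, v, body = e
        inner = {**env, x: t_x}
        expect(type_of(v, inner), t_x, "rec value")
        return type_of(body, inner)
    raise PickyTypeError(f"unknown expression {e!r}")

def with_to_call(e):
    """Rewrite {with {x : t1 v} E} into {call {fun {x : t1} E} v}."""
    _, x, t1, v, body = e
    return ("call", ("fun", x, t1, None, body), v)

N = "Number"
# {rec {f : (Number -> Number)
#        {fun {n : Number} : Number {if {< n 1} 0 {call f {+ n -1}}}}}
#   {call f 5}}
countdown = ("rec", "f", ("->", N, N),
             ("fun", "n", N, N,
              ("if", ("<", ("id", "n"), ("num", 1)),
               ("num", 0),
               ("call", ("id", "f"), ("+", ("id", "n"), ("num", -1))))),
             ("call", ("id", "f"), ("num", 5)))
print(type_of(countdown, {}))  # Number

# {with {f : (Number -> Number) {fun {x : Number} : Number x}}
#   {+ {call f 1} {call f 2}}}
use_twice = ("with", "f", ("->", N, N), ("fun", "x", N, N, ("id", "x")),
             ("+", ("call", ("id", "f"), ("num", 1)),
                   ("call", ("id", "f"), ("num", 2))))
print(type_of(use_twice, {}), type_of(with_to_call(use_twice), {}))  # Number Number
```

Checking the `rec' value in the plain environment G instead would make the lookup of `f' inside `countdown' fail with "unbound identifier f", which is exactly the problem the final rule avoids.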