Lecture #22, Tuesday, March 24th
================================
 Types
 What is a Type?
 Our Types: The Picky Language

# Types
> [PLAI §24]

In our Toy language implementation, there are certain situations that
are not covered. For example,

    {< {+ 1 2} 3}

is not a problem, but

    {+ {< 1 2} 3}

will eventually use Racket's addition function on a boolean value, which
will crash our evaluator. Assuming that we go back to the simple
language we once had, where there were no booleans, we can still run
into errors, except that now these are the errors that our code raises:

    {+ {fun {} 1} 2}

or

    {1 2 3}

or

    {{fun {x y} {+ x y}} 5}

In any case, it would be good to avoid such errors right from the start:
it seems like we should be able to identify such bad code and not even
try to run it. One thing that we can do is a little more work at parse
time, and declare the `{1 2 3}` program fragment invalid. We can even
try to forbid

    {bind {{x 1}} {x 2 3}}

in the same way, but what should we do with this?

    {fun {x} {x 2 3}}

The validity of this depends on how it is used. The same goes for some
invalid expressions: the bogus `{+ {< 1 2} 3}` expression above can be
fine if it's in a context that shadows `<`:

    {bind {{< *}}
      {+ {< 1 2} 3}}

Finally, consider this:

    {+ 3 {if <mystery> 5 {fun {x} x}}}

where `<mystery>` contains something like `random` or `read`. In general,
knowing whether a piece of code will run with no errors is a problem
that is equivalent to the halting problem, and because of this, there
is no way to create an "exact" type system: they are all either too
restrictive (rejecting programs that would run with no errors) or too
permissive (accepting programs that might crash). This is a very
practical issue: type safety means far fewer bugs in the system.
Designing a good type system is still an actively researched problem.
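
To make the opening example concrete, here is a minimal sketch in Python (not the course's Racket evaluator) of a dynamic evaluator that only discovers the problem with `{+ {< 1 2} 3}` while running it. The nested-tuple encoding and the `evaluate` name are assumptions made for this illustration.

```python
def evaluate(expr):
    """Evaluate a tiny prefix expression given as nested tuples."""
    if isinstance(expr, (int, bool)):
        return expr
    op, left, right = expr
    a, b = evaluate(left), evaluate(right)
    if op == "<":
        return a < b
    if op == "+":
        # Mimic a strict addition that refuses booleans, like Racket's `+`.
        # (Python itself would silently treat True as 1.)
        if isinstance(a, bool) or isinstance(b, bool):
            raise TypeError("+: expected numbers")
        return a + b
    raise ValueError(f"unknown operator: {op}")


fine = evaluate(("<", ("+", 1, 2), 3))      # {< {+ 1 2} 3}
try:
    evaluate(("+", ("<", 1, 2), 3))         # {+ {< 1 2} 3}
    crashed = False
except TypeError:
    crashed = True
```

The second program is rejected only after `{< 1 2}` has already been evaluated, which is exactly the kind of late failure a static check tries to rule out.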

# What is a Type?
> [PLAI §25]

A type is any property of a program (or an expression) that can be
determined without running the program. (This is different from what is
considered a `type` in Racket, which is a property that is known only at
runtime; this means that before runtime we know nothing, so in essence
we have a single type (in the static sense).) Specifically, we want to
use types in a way that predicts some aspects of the program's behavior,
for example, whether a program will crash.

Usually, types describe the kind of value that an expression can
evaluate to, not the precise value itself. For example, we might have
two kinds of values, functions and numbers, and we know that addition
always operates on numbers, therefore

    {+ 1 {fun {x} x}}

is a type error. Note that to determine this we don't care about the
actual function, just the fact that it is a function.

Important: types can discriminate certain programs as invalid, but they
cannot discriminate correct programs from incorrect ones. For example,
there is no way for any type system to know that this:

    {fun {x} {+ x 1}}

is an incorrect decrease-by-one function.

In general, type systems try to get to the optimal point where as much
information as possible is known, yet the language is not too
restricted, no significant computing resources are wasted, and
programmers don't spend much time annotating their code.

Why would you want to use a type system?

* They catch errors even in code that you don't execute, for example,
  when your tests are too weak (but they do *not* substitute for proper
  test suites).
* They help reduce the time spent on debugging (when they detect
  legitimate errors, rather than force you to change your code).
* As we have seen, they help in documenting code (but they do *not*
  substitute for proper documentation).
* Compilers can use type information to make programs run much faster.
* They encourage more organized code (for example, our use of
  `define-type` and `cases` helps in writing code; these two constructs
  are inspired by ML).

# Our Types: The Picky Language
The first thing we need to do is to agree on what types are. Earlier,
we talked about two types: numbers and functions (ignore booleans or
anything else for now); we will use these two types for now.

> In general, this means that we are using the *Types are Sets* meaning
> for types, and specifically, we will be implementing a type system
> known as a *Hindley-Milner* system. This is *not* what Typed Racket
> is using. In fact, one of the main differences is that in our type
> system each binding has exactly one type, whereas in Typed Racket an
> identifier can have different types in different places in the code.
> An example of this is something that we've talked about earlier:
>
>     (: foo : (U String Number) -> Number)
>     (define (foo x)           ; \ these `x`s have a
>       (if (number? x)         ; / (U Number String) type
>         (+ x 1)               ; > this one is a Number
>         (string-length x)))   ; > and this one is a String

A type system is presented as a collection of rules called "type
judgments", which describe how to determine the type of an expression.
Besides the types and the judgments, a type system specification needs a
(decidable) algorithm that can assign types to expressions.

Such a specification should have one rule for every kind of syntactic
construct, so when we get a program we can determine the precise type of
any expression. Also, these judgments are usually recursive, since a
type judgment will almost always rely on the types of subexpressions
(if any).

For our restricted system, we have two rules (= judgments) that we can
easily specify:

    n : Number                 (any numeral `n` is a number)

    {fun {x} E} : Function

And what about an identifier? Well, it is clear that we need to keep
some form of an environment that will keep an account of types assigned
to identifiers (note: all of this is not at runtime). This environment
is used in all type judgments, and is usually written as a capital Greek
Gamma character (in some places `G` is used to stick to ASCII texts).
The conventional way to write the two rules above is:

    Γ ⊢ n : Number

    Γ ⊢ {fun {x} E} : Function

The first one is read as "Gamma proves that `n` has the type `Number`".
Note that this is a syntactic environment, much like DE-ENVs that you
have seen in homework.

So, we can write a rule for identifiers that simply gives them the type
assigned by the environment:

    Γ ⊢ x : Γ(x)               ; "Γ(x)" is similar to a "lookup(x, Γ)"

We now need a rule for addition and a rule for application (note: we're
using a very limited subset of our old language, where arithmetic
operators are not function applications). Addition is easy: if we can
prove that both `A` and `B` are numbers in some environment Γ, then we
know that `{+ A B}` is a number in the same environment. We write this
as follows:

    Γ ⊢ A : Number   Γ ⊢ B : Number
    ───────────────────────────────
         Γ ⊢ {+ A B} : Number

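
The rules so far (numerals, the naive `Function` type, identifiers, and addition) can be sketched as a small recursive function. This is only an illustration in Python rather than the course's Racket; the tuple encoding of expressions, the use of a dict for Γ, and the name `type_of` are all assumptions of the sketch.

```python
def type_of(expr, gamma):
    """Return the type of `expr` under the type environment `gamma` (a dict)."""
    if isinstance(expr, int):            # Γ ⊢ n : Number
        return "Number"
    if isinstance(expr, str):            # Γ ⊢ x : Γ(x)
        if expr not in gamma:
            raise TypeError(f"unbound identifier: {expr}")
        return gamma[expr]
    if isinstance(expr, tuple) and expr[0] == "fun":
        return "Function"                # Γ ⊢ {fun {x} E} : Function (naive rule)
    if isinstance(expr, tuple) and expr[0] == "+":
        _, a, b = expr
        # Both premises must hold in the same environment.
        if type_of(a, gamma) == "Number" and type_of(b, gamma) == "Number":
            return "Number"
        raise TypeError("+: both operands must have type Number")
    raise TypeError(f"no typing rule for: {expr!r}")


# {+ 1 x} in an environment where x is a Number
sum_type = type_of(("+", 1, "x"), {"x": "Number"})
```

Note that the function never evaluates anything: it only inspects the shape of the expression and the types recorded in Γ.
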
Now, what about application? We need to refer to some arbitrary type
now, and the common letter for that is a Greek lowercase tau:

    Γ ⊢ F : Function   Γ ⊢ V : τᵥ
    ─────────────────────────────
        Γ ⊢ {call F V} : ???

that is, if we can prove that `F` is a function, and that `V` is a
value of some type `τᵥ`, then ... ??? Well, we need to know more about
`F`: we need to know what type it consumes and what type it returns. So
a simple `Function` is not enough: we need some sort of a function
type that specifies both input and output types. We will use the arrow
notation that was seen throughout the semester and drop `Function`. Now
we can write:

    Γ ⊢ F : (τ₁ -> τ₂)   Γ ⊢ V : τ₁
    ───────────────────────────────
          Γ ⊢ {call F V} : τ₂

which makes sense: if you take a function of type `τ₁ -> τ₂` and you
feed it what it expects, you get the obvious output type. But going
back to the language: where do we get these new arrow types from? We
will modify the language and require that every function specifies its
input and output type (and assume we have only one-argument functions).
For example, we will write something like this for a function that is
the curried version of addition:

    {fun {x : Number} : (Number -> Number)
      {fun {y : Number} : Number
        {+ x y}}}

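
The application rule works on types alone, so it can be sketched without any expressions at all. The following Python fragment is a hedged illustration (the course code is in Racket): the tuple encoding `("->", τ₁, τ₂)` and the names `arrow` and `call_type` are assumptions. It applies the rule twice to the type of the curried addition above.

```python
NUMBER = "Number"


def arrow(arg, result):
    """Build the function type (arg -> result) as a tuple."""
    return ("->", arg, result)


def call_type(f_type, v_type):
    """Application rule: from F : (τ₁ -> τ₂) and V : τ₁ conclude τ₂."""
    if not (isinstance(f_type, tuple) and f_type[0] == "->"):
        raise TypeError(f"call: {f_type} is not a function type")
    _, expected, result = f_type
    if v_type != expected:
        raise TypeError(f"call: expected {expected}, got {v_type}")
    return result


# The curried addition has type (Number -> (Number -> Number)).
curried_add = arrow(NUMBER, arrow(NUMBER, NUMBER))
after_one = call_type(curried_add, NUMBER)   # (Number -> Number)
after_two = call_type(after_one, NUMBER)     # Number
```

Feeding the function one argument peels off one arrow; a third application would fail because `Number` is not an arrow type.
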
So we revise the syntax of the limited language, which contains only
additions, applications, and single-argument functions, and which, for
fun, goes back to using the `call` keyword. The syntax we get is:

    <PICKY> ::= <num>
              | <id>
              | { + <PICKY> <PICKY> }
              | { fun { <id> : <TYPE> } : <TYPE> <PICKY> }
              | { call <PICKY> <PICKY> }

    <TYPE>  ::= Number
              | ( <TYPE> -> <TYPE> )

and the typing rules are:

    Γ ⊢ n : Number

    Γ ⊢ {fun {x : τ₁} : τ₂ E} : (τ₁ -> τ₂)

    Γ ⊢ x : Γ(x)

    Γ ⊢ A : Number   Γ ⊢ B : Number
    ───────────────────────────────
         Γ ⊢ {+ A B} : Number

    Γ ⊢ F : (τ₁ -> τ₂)   Γ ⊢ V : τ₁
    ───────────────────────────────
          Γ ⊢ {call F V} : τ₂

But we're still missing a big part: the current rule for a `fun`
expression is too weak. If we use it, we conclude that these
expressions:

    {fun {x : Number} : (Number -> Number)
      3}

    {fun {x : Number} : Number
      {call x 2}}

are valid, as well as concluding that this program:

    {call {call {fun {x : Number} : (Number -> Number)
                  3}
                5}
          7}

is valid, and should return a number. What's missing? We need to check
that the body part of the function is correct, so the rule for typing a
`fun` is no longer a simple one. Here is how we check the body instead
of blindly believing program annotations:

              Γ[x:=τ₁] ⊢ E : τ₂                 ; Γ[x:=τ₁] is similar to
    ──────────────────────────────────────      ; extend(Γ, x, τ₁)
    Γ ⊢ {fun {x : τ₁} : τ₂ E} : (τ₁ -> τ₂)

That is, we want to make sure that if `x` has type `τ₁`, then the
body expression `E` has type `τ₂`, and if we can prove this, then we can
trust these annotations.

There is an important relationship between this rule and the `call` rule
for application:

* In this rule we assume that the input will have the right type and
  guarantee (via a proof) that the output will have the right type.
* In the application rule, we guarantee (by a proof) an input of the
  right type and assume a result of the right type.

(Side note: Racket comes with a contract system that can identify type
errors dynamically, and assign blame to either the caller or the callee,
and these correspond to these two sides.)
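
The stronger `fun` rule can be sketched as follows: the body is checked in an extended environment, and the annotation is trusted only if the body's type matches it. This is a Python illustration under assumed names (`type_of`, the tuple form `("fun", x, τ₁, τ₂, body)`, a dict for Γ), covering only numerals, identifiers, and `fun`.

```python
NUM = "Number"


def type_of(expr, gamma):
    """Type of `expr` under `gamma`; handles numerals, identifiers and `fun`."""
    if isinstance(expr, int):                  # Γ ⊢ n : Number
        return NUM
    if isinstance(expr, str):                  # Γ ⊢ x : Γ(x)
        if expr not in gamma:
            raise TypeError(f"unbound identifier: {expr}")
        return gamma[expr]
    if isinstance(expr, tuple) and expr[0] == "fun":
        _, x, t1, t2, body = expr
        extended = {**gamma, x: t1}            # Γ[x:=τ₁], leaving Γ itself untouched
        body_type = type_of(body, extended)    # premise: Γ[x:=τ₁] ⊢ E : τ₂
        if body_type != t2:
            raise TypeError(f"fun: body has type {body_type}, annotation says {t2}")
        return ("->", t1, t2)
    raise TypeError(f"no typing rule for: {expr!r}")


# {fun {x : Number} : Number x} is accepted ...
identity = ("fun", "x", NUM, NUM, "x")
# ... but {fun {x : Number} : (Number -> Number) 3} is not: its body is a Number.
bogus = ("fun", "x", NUM, ("->", NUM, NUM), 3)
```

Calling `type_of(bogus, {})` raises an error, so the bogus annotation from the earlier example is no longer believed.
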
Note that, as we said, `Number` is really just a property of a certain
kind of values; we don't know exactly what numbers are actually used.
In the same way, the arrow function types don't tell us exactly what
function it is. For example, `(Number -> Number)` can indicate a
function that adds three to its argument, subtracts seven, or multiplies
it by 7619. But it certainly contains much more information than the
previous naive `Function` type. (Consider also Typed Racket here: it
goes much further in expressing facts about code.)

For reference, here is the complete BNF and typing rules:

    <PICKY> ::= <num>
              | <id>
              | { + <PICKY> <PICKY> }
              | { fun { <id> : <TYPE> } : <TYPE> <PICKY> }
              | { call <PICKY> <PICKY> }

    <TYPE>  ::= Number
              | ( <TYPE> -> <TYPE> )

    Γ ⊢ n : Number

    Γ ⊢ x : Γ(x)

    Γ ⊢ A : Number   Γ ⊢ B : Number
    ───────────────────────────────
         Γ ⊢ {+ A B} : Number

              Γ[x:=τ₁] ⊢ E : τ₂
    ──────────────────────────────────────
    Γ ⊢ {fun {x : τ₁} : τ₂ E} : (τ₁ -> τ₂)

    Γ ⊢ F : (τ₁ -> τ₂)   Γ ⊢ V : τ₁
    ───────────────────────────────
          Γ ⊢ {call F V} : τ₂

Examples of using types (abbreviate `Number` as `Num`). First, a
simple example:

                   {} ⊢ 5 : Num   {} ⊢ 7 : Num
                   ───────────────────────────
    {} ⊢ 2 : Num   {} ⊢ {+ 5 7} : Num
    ─────────────────────────────────
    {} ⊢ {+ 2 {+ 5 7}} : Num

and a little more involved one:

    [x:=Num] ⊢ x : Num   [x:=Num] ⊢ 3 : Num
    ───────────────────────────────────────
    [x:=Num] ⊢ {+ x 3} : Num
    ───────────────────────────────────────────────
    {} ⊢ {fun {x : Num} : Num {+ x 3}} : Num -> Num   {} ⊢ 5 : Num
    ──────────────────────────────────────────────────────────────
    {} ⊢ {call {fun {x : Num} : Num {+ x 3}} 5} : Num

Finally, try a buggy program like

    {+ 3 {fun {x : Number} : Number x}}

and see where it is impossible to continue.

The main thing here is that to know that this is a type error, we have
to prove that there is no judgment for a certain type (in this case, no
way to prove that a `fun` expression has the `Num` type), which we
(humans) can only do by inspecting all of the rules. Because of this,
we also need to add an algorithm to our type system, one that we can
follow mechanically and that tells us when it gives up.
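
One possible shape for such an algorithm is a single recursive function with one branch per typing rule, which raises an error at the point where no rule applies. The sketch below is in Python rather than the course's Racket, and everything in it (the tuple encoding, `type_of`, `TypeCheckError`) is an assumption made for illustration, not the course's implementation.

```python
class TypeCheckError(Exception):
    """Raised when no typing rule applies, i.e. the checker gives up."""


NUM = "Number"


def type_of(expr, gamma=None):
    """Return the type of a Picky expression, or raise TypeCheckError."""
    gamma = {} if gamma is None else gamma
    if isinstance(expr, int):                          # Γ ⊢ n : Number
        return NUM
    if isinstance(expr, str):                          # Γ ⊢ x : Γ(x)
        if expr not in gamma:
            raise TypeCheckError(f"unbound identifier: {expr}")
        return gamma[expr]
    if not isinstance(expr, tuple) or not expr:
        raise TypeCheckError(f"no typing rule for: {expr!r}")
    tag = expr[0]
    if tag == "+":                                     # both operands are Numbers
        _, a, b = expr
        for operand in (a, b):
            t = type_of(operand, gamma)
            if t != NUM:
                raise TypeCheckError(f"+: expected Number, got {t}")
        return NUM
    if tag == "fun":                                   # check the body under Γ[x:=τ₁]
        _, x, t1, t2, body = expr
        body_type = type_of(body, {**gamma, x: t1})
        if body_type != t2:
            raise TypeCheckError(f"fun: body is {body_type}, annotated as {t2}")
        return ("->", t1, t2)
    if tag == "call":                                  # F : (τ₁ -> τ₂) and V : τ₁
        _, f, v = expr
        f_type = type_of(f, gamma)
        if not (isinstance(f_type, tuple) and f_type[0] == "->"):
            raise TypeCheckError(f"call: {f_type} is not a function type")
        _, t1, t2 = f_type
        v_type = type_of(v, gamma)
        if v_type != t1:
            raise TypeCheckError(f"call: expected {t1}, got {v_type}")
        return t2
    raise TypeCheckError(f"no typing rule for: {expr!r}")


# {call {fun {x : Num} : Num {+ x 3}} 5}
good = ("call", ("fun", "x", NUM, NUM, ("+", "x", 3)), 5)
# {+ 3 {fun {x : Number} : Number x}}
buggy = ("+", 3, ("fun", "x", NUM, NUM, "x"))
```

Here `type_of(good)` follows the same steps as the second derivation above, while `type_of(buggy)` gives up in the `+` branch, because the only type it can derive for the `fun` expression is an arrow type.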