Lecture #5, Tuesday, January 24th
=================================
* Intro to Typed Racket
* Bindings & Substitution
* WAE: Adding Bindings to AE
* Evaluation of `with`

# Intro to Typed Racket
The plan:
* Why Types?
* Why Typed Racket?
* What's Different about Typed Racket?
* Some Examples of Typed Racket for Course Programs
### Types ##############################################################
* Who has used a (statically) typed language?
* Who has used a typed language that's not Java?
Typed Racket will be both similar to and very different from anything
you've seen before.
### Why types? #########################################################
* Types help structure programs.
* Types provide enforced and mandatory documentation.
* Types help catch errors.
Types ***will*** help you. A *lot*.
### Structuring programs ###############################################
* Data definitions

      ;; An AE is one of:               ; \
      ;; (make-Num Number)              ;  > HtDP
      ;; (make-Add AE AE)               ; /

      (define-type AE                   ; \
        [Num number?]                   ;  > Predicates =~= contracts (PLAI)
        [Add AE? AE?])                  ; /   (has names of defined types too)

      (define-type AE                   ; \
        [Num Number]                    ;  > Typed Racket (our PL)
        [Add AE AE])                    ; /

* Data-first

  The structure of your program is derived from the structure of your
  data.

  You have seen this in Fundamentals with the design recipe and with
  templates. In this class, we will see it extensively with type
  definitions and the `(cases ...)` form. Types make this pervasive: we
  have to think about our data before our code.

* A language for describing data

  Instead of having an informal language for describing types in
  contract lines, and a more formal description of predicates in a
  `define-type` form, we will have a single, unified language for both
  of these. Having such a language means that we get to be more precise
  and more expressive (since the typed language covers cases that you
  would otherwise dismiss with some hand waving, like "a function").
### Why Typed Racket? ##################################################
Racket is the language we all know, and it has the benefits that we
discussed earlier. Mainly, it is an excellent language for experimenting
with programming languages.
* Typed Racket allows us to take our Racket programs and typecheck them,
  so we get the benefits of a statically typed language.
* Types are an important programming language feature; Typed Racket will
  help us understand them.

[Also: the development of Typed Racket is happening here at
Northeastern, and will benefit from your feedback.]
### How is Typed Racket different from Racket ##########################
* Typed Racket will reject your program if there are type errors! This
  means that it does that at compile-time, *before* any code gets to
  run.
* Typed Racket files start like this:

      #lang typed/racket
      ;; Program goes here.

  but we will use a variant of the Typed Racket language, which has a
  few additional constructs:

      #lang pl
      ;; Program goes here.

* Typed Racket requires you to write the contracts on your functions.

  Racket:

      ;; f : Number -> Number
      (define (f x)
        (* x (+ x 1)))

  Typed Racket:

      #lang pl
      (: f : Number -> Number)
      (define (f x)
        (* x (+ x 1)))

  [In the "real" Typed Racket the preferred style is with prefix arrows:

      #lang typed/racket
      (: f (-> Number Number))
      (define (f x) : Number
        (* x (+ x 1)))

  and you can also have the type annotations appear inside the
  definition:

      #lang typed/racket
      (define (f [x : Number]) : Number
        (* x (+ x 1)))

  but we will not use these forms.]
* As we've seen, Typed Racket uses types, not predicates, in
  `define-type`.

      (define-type AE
        [Num Number]
        [Add AE AE])

  versus

      (define-type AE
        [Num number?]
        [Add AE? AE?])

* There are other differences, but these will suffice for now.
### Examples ###########################################################

    (: digit-num : Number -> (U Number String))
    (define (digit-num n)
      (cond [(<= n 9)    1]
            [(<= n 99)   2]
            [(<= n 999)  3]
            [(<= n 9999) 4]
            [else        "a lot"]))

    (: fact : Number -> Number)
    (define (fact n)
      (if (zero? n)
        1
        (* n (fact (- n 1)))))

    (: helper : Number Number -> Number)
    (define (helper n acc)
      (if (zero? n)
        acc
        (helper (- n 1) (* acc n))))
    (: fact : Number -> Number)
    (define (fact n)
      (helper n 1))

    (: fact : Number -> Number)
    (define (fact n)
      (: helper : Number Number -> Number)
      (define (helper n acc)
        (if (zero? n)
          acc
          (helper (- n 1) (* acc n))))
      (helper n 1))

    (: every? : (All (A) (A -> Boolean) (Listof A) -> Boolean))
    ;; Returns false if any element of lst fails the given pred,
    ;; true if all pass pred.
    (define (every? pred lst)
      (or (null? lst)
          (and (pred (first lst))
               (every? pred (rest lst)))))

    (define-type AE
      [Num Number]
      [Add AE AE]
      [Sub AE AE])

    ;; the only difference in the following definition is
    ;; using (: <name> : <type>) instead of ";; <name> : <type>"
    (: parse-sexpr : Sexpr -> AE)
    ;; parses s-expressions into AEs
    (define (parse-sexpr sexpr)
      (match sexpr
        [(number: n) (Num n)]
        [(list '+ left right)
         (Add (parse-sexpr left) (parse-sexpr right))]
        [(list '- left right)
         (Sub (parse-sexpr left) (parse-sexpr right))]
        [else (error 'parse-sexpr "bad syntax in ~s" sexpr)]))

### More interesting examples ##########################################
* Typed Racket is designed to be a language that is friendly to the kind
  of programs that people write in Racket. For example, it has unions:

      (: foo : (U String Number) -> Number)
      (define (foo x)
        (if (string? x)
          (string-length x)
          ;; at this point it knows that `x' is not a
          ;; string, therefore it must be a number
          (+ 1 x)))

  This is not common in statically typed languages, which are usually
  limited to only *disjoint* unions. For example, in OCaml you'd write
  this definition:

      type string_or_number = Str of string | Int of int ;;
      let foo x = match x with Str s -> String.length s
                             | Int i -> i+1 ;;

  And use it with an explicit constructor:

      foo (Str "bar") ;;
      foo (Int 3) ;;

* Note that in the Typed Racket case, the language keeps track of
  information that is gathered via predicates, which is why it knows
  that one `x` is a String, and the other is a Number.
* Typed Racket has a concept of subtypes, which is also something
  that most statically typed languages lack. In fact, the fact that it
  has (arbitrary) unions means that it must have subtypes too, since a
  type is always a subtype of a union that contains this type.
* Another result of this feature is that there is an `Any` type that is
  the union of all other types. Note that you can always use this type
  since everything is in it, but it gives you the *least* information
  about a value. In other words, Typed Racket gives you a choice: *you*
  decide which type to use, from one that is very restricted but has a
  lot of information about its values to one that is very permissive
  but has almost no useful information. This is in contrast to other
  type systems (HM systems) where there is always exactly one correct
  type.
  To demonstrate, consider the identity function:

      (define (id x) x)

  You could use a type of `(: id : Integer -> Integer)` which is very
  restricted, but you know that the function always returns an integer
  value.

  Or you can make it very permissive with a `(: id : Any -> Any)`, but
  then you know nothing about the result. In fact, `(+ 1 (id 2))`
  will throw a type error. It *does* return `2`, as expected, but the
  type checker doesn't know the type of that `2`. If you wanted to use
  this type, you'd need to check that the result is a number, eg:

      (let ([x (id 123)]) (if (number? x) (+ x 10) 999))

  This means that for this particular function there is no good
  *specific* type that we can choose, but there are *polymorphic*
  types. These types allow propagating their input type(s) to their
  output type. In this case, it's a simple "my output type is the same
  as my input type":

      (: id : (All (A) A -> A))

  This makes the output preserve the same level of information that you
  had on its input.
* Another interesting thing to look at is the type of `error`: it's a
  function that returns a type of `Nothing`, a type that is the same
  as an *empty* union: `(U)`. It's a type that has no values in it.
  It fits `error` because it *is* a function that doesn't return any
  value; in fact, it doesn't return at all. In addition, it means that
  an `error` expression can be used anywhere you want because it is a
  subtype of anything at all.
* An `else` clause in a `cond` expression is almost always needed, for
  example:

      (: digit-num : Number -> (U Number String))
      (define (digit-num n)
        (cond [(<= n 9)    1]
              [(<= n 99)   2]
              [(<= n 999)  3]
              [(<= n 9999) 4]
              [(> n 9999)  "a lot"]))

  (and if you think that the type checker should know what this is
  doing, then how about

      (> (* n 10) (/ (* (- 10000 1) 20) 2))

  or

      (>= n 10000)

  for the last test?)
* In some rare cases you will run into one limitation of Typed Racket:
  it is difficult (that is: a generic solution is not known at the
  moment) to do the right inference when polymorphic functions are
  passed around to higher-order functions. For example:

      (: call : (All (A B) (A -> B) A -> B))
      (define (call f x)
        (f x))
      (call rest (list 4))

  In such cases, we can use `inst` to *instantiate* a function with a
  polymorphic type to a given type. In this case, we can use it to
  make it treat `rest` as a function that is specific for numeric lists:

      (call (inst rest Number) (list 4))

  In other rare cases, Typed Racket will infer a type that is not
  suitable for us. There is another form, `ann`, that allows us to
  specify a certain type. Using this in the `call` example is more
  verbose:

      (call (ann rest : ((Listof Number) -> (Listof Number))) (list 4))

  However, these are going to be rare and will be mentioned explicitly
  whenever they're needed.
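
To see several of the points above in one place (unions with predicate tests, the `Any` versus polymorphic choice for the identity function, and `error` fitting any result type), here is a small sketch. It is written for plain `#lang typed/racket` with prefix arrows so that it runs without the course language; in `#lang pl` the declarations would use the `(: f : A -> B)` style instead. The names `id-any`, `add10`, and `safe-div` are made up for this illustration.

```racket
#lang typed/racket

;; Unions: the `string?` test tells the type checker which member of
;; the union `x` belongs to in each branch of the `if`.
(: foo (-> (U String Number) Number))
(define (foo x)
  (if (string? x)
      (string-length x)
      (+ 1 x)))

;; The permissive choice: callers only learn that the result is `Any`.
;; (`id-any` is a hypothetical name, so both versions can coexist.)
(: id-any (-> Any Any))
(define (id-any x) x)

;; The result of `id-any` must be tested before it is used as a number.
;; (`add10` is a hypothetical wrapper around the `let` from the text.)
(: add10 (-> Any Number))
(define (add10 v)
  (let ([x (id-any v)])
    (if (number? x) (+ x 10) 999)))

;; The polymorphic choice: the output type is the same as the input type.
(: id (All (A) (-> A A)))
(define (id x) x)

;; `error` never returns, so it can sit in a branch that is expected to
;; produce a Number. (`safe-div` is a hypothetical example.)
(: safe-div (-> Number Number Number))
(define (safe-div a b)
  (if (zero? b)
      (error 'safe-div "cannot divide ~s by zero" a)
      (/ a b)))

(foo "bar")     ; 3
(foo 3)         ; 4
(add10 123)     ; 133
(add10 "123")   ; 999
(+ 1 (id 2))    ; 3
(safe-div 6 3)  ; 2
```

Note that `(+ 1 (id 2))` needs no test on the result here, which is the information that the `Any -> Any` version throws away.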

# Bindings & Substitution
We now get to an important concept: substitution.
Even in our simple language, we encounter repeated expressions. For
example, if we want to compute the square of some expression:

    {* {+ 4 2} {+ 4 2}}

Why would we want to get rid of the repeated subexpression?

* It introduces a redundant computation. In this example, we want to
  avoid computing the same subexpression a second time.
* It makes the computation more complicated than it could be without the
  repetition. Compare the above with:

      with x = {+ 4 2},
        {* x x}

* This is related to a basic fact in programming that we have already
  discussed: duplicating information is always a bad thing. Among other
  bad consequences, it can even lead to bugs that could not happen if we
  didn't duplicate code. A toy example is "fixing" one of the numbers
  in one expression and forgetting to fix the corresponding one:

      {* {+ 4 2} {+ 4 1}}

  Real world examples involve much more code, which makes such bugs very
  difficult to find, but they still follow the same principle.
* This gives us more expressive power: we don't just say that we want
  to multiply two expressions that both happen to be `{+ 4 2}`, we say
  that we multiply the `{+ 4 2}` expression by *itself*. It allows us to
  express identity of two values as well as using two values that happen
  to be the same.

So, the normal way to avoid redundancy is to introduce an identifier.
Even when we speak, we might say: "let x be 4 plus 2, multiply x by x".
(These are often called "variables", but we will try to avoid this name:
what if the identifier does not change (vary)?)
To get this, we introduce a new form into our language:

    {with {x {+ 4 2}}
      {* x x}}

We expect to be able to reduce this to:

    {* 6 6}

by substituting 6 for `x` in the body subexpression of `with`.

A little more complicated example:

    {with {x {+ 4 2}}
      {with {y {* x x}}
        {+ y y}}}
    [add]  = {with {x 6} {with {y {* x x}} {+ y y}}}
    [subst]= {with {y {* 6 6}} {+ y y}}
    [mul]  = {with {y 36} {+ y y}}
    [subst]= {+ 36 36}
    [add]  = 72
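
Since `with` is not implemented at this point, one way to sanity-check the expected results is to write the same two computations with Racket's own `let`, under the assumption that `with` is meant to behave like a single-binding `let`. This is only a check of the arithmetic under that assumption, not a definition of `with`; the names `squared` and `nested` are made up for the illustration.

```racket
#lang typed/racket

;; {with {x {+ 4 2}} {* x x}}
(define squared
  (let ([x (+ 4 2)])
    (* x x)))

;; {with {x {+ 4 2}} {with {y {* x x}} {+ y y}}}
(define nested
  (let ([x (+ 4 2)])
    (let ([y (* x x)])
      (+ y y))))

squared ; 36
nested  ; 72
```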

# WAE: Adding Bindings to AE
> [PLAI §3]
To add this to our language, we start with the BNF. We now call our
language "WAE" (With+AE):

    <WAE> ::= <num>
            | { + <WAE> <WAE> }
            | { - <WAE> <WAE> }
            | { * <WAE> <WAE> }
            | { / <WAE> <WAE> }
            | { with { <id> <WAE> } <WAE> }
            | <id>

Note that we had to introduce *two* new rules: one for introducing an
identifier, and one for using it. This is common in many language
specifications, for example `define-type` introduces a new type, and it
comes with `cases` that allows us to destruct its instances.

For `<id>` we need to use some form of identifiers, the natural choice
in Racket is to use symbols. We can therefore write the corresponding
type definition:

    (define-type WAE
      [Num  Number]
      [Add  WAE WAE]
      [Sub  WAE WAE]
      [Mul  WAE WAE]
      [Div  WAE WAE]
      [Id   Symbol]
      [With Symbol WAE WAE])

The parser is easily extended to produce these syntax objects:

    (: parse-sexpr : Sexpr -> WAE)
    ;; parses s-expressions into WAEs
    (define (parse-sexpr sexpr)
      (match sexpr
        [(number: n)    (Num n)]
        [(symbol: name) (Id name)]
        [(list 'with (list (symbol: name) named) body)
         (With name (parse-sexpr named) (parse-sexpr body))]
        [(list '+ lhs rhs) (Add (parse-sexpr lhs) (parse-sexpr rhs))]
        [(list '- lhs rhs) (Sub (parse-sexpr lhs) (parse-sexpr rhs))]
        [(list '* lhs rhs) (Mul (parse-sexpr lhs) (parse-sexpr rhs))]
        [(list '/ lhs rhs) (Div (parse-sexpr lhs) (parse-sexpr rhs))]
        [else (error 'parse-sexpr "bad syntax in ~s" sexpr)]))

But note that this parser is inconvenient, since any of these
expressions:

    {* 1 2 3}
    {foo 5 6}
    {with x 5 {* x 8}}
    {with {5 x} {* x 8}}

would result in a "bad syntax" error, which is not very helpful. To make
things better, we can add another case for `with` expressions that are
malformed, and give a more specific message in that case:

    (: parse-sexpr : Sexpr -> WAE)
    ;; parses s-expressions into WAEs
    (define (parse-sexpr sexpr)
      (match sexpr
        [(number: n)    (Num n)]
        [(symbol: name) (Id name)]
        [(list 'with (list (symbol: name) named) body)
         (With name (parse-sexpr named) (parse-sexpr body))]
        [(cons 'with more)
         (error 'parse-sexpr "bad `with' syntax in ~s" sexpr)]
        [(list '+ lhs rhs) (Add (parse-sexpr lhs) (parse-sexpr rhs))]
        [(list '- lhs rhs) (Sub (parse-sexpr lhs) (parse-sexpr rhs))]
        [(list '* lhs rhs) (Mul (parse-sexpr lhs) (parse-sexpr rhs))]
        [(list '/ lhs rhs) (Div (parse-sexpr lhs) (parse-sexpr rhs))]
        [else (error 'parse-sexpr "bad syntax in ~s" sexpr)]))

and finally, to group all of the parsing code that deals with `with`
expressions (both valid and invalid ones), we can use a single case for
both of them:

    (: parse-sexpr : Sexpr -> WAE)
    ;; parses s-expressions into WAEs
    (define (parse-sexpr sexpr)
      (match sexpr
        [(number: n)    (Num n)]
        [(symbol: name) (Id name)]
        [(cons 'with more)
         ;; go in here for all sexpr that begin with a 'with
         (match sexpr
           [(list 'with (list (symbol: name) named) body)
            (With name (parse-sexpr named) (parse-sexpr body))]
           [else (error 'parse-sexpr "bad `with' syntax in ~s" sexpr)])]
        [(list '+ lhs rhs) (Add (parse-sexpr lhs) (parse-sexpr rhs))]
        [(list '- lhs rhs) (Sub (parse-sexpr lhs) (parse-sexpr rhs))]
        [(list '* lhs rhs) (Mul (parse-sexpr lhs) (parse-sexpr rhs))]
        [(list '/ lhs rhs) (Div (parse-sexpr lhs) (parse-sexpr rhs))]
        [else (error 'parse-sexpr "bad syntax in ~s" sexpr)]))

And now we're done with the syntactic part of the `with` extension.
> Quick note: why would we indent `With` like a normal function in
> code like this
>
>     (With 'x
>           (Num 2)
>           (Add (Id 'x) (Num 4)))
>
> instead of an indentation that looks like a `let`
>
>     (With 'x (Num 2)
>       (Add (Id 'x) (Num 4)))
>
> ?
>
> The reason for this is that the second indentation looks like a
> binding construct (eg, the indentation used in a `let` expression),
> but `With` is *not* a binding form: it's a *plain function* because
> it's at the Racket level. You should therefore keep in mind the huge
> difference between that `With` and the `with` that appears in WAE
> programs:
>
>     {with {x 2}
>       {+ x 4}}
>
> Another way to look at it: imagine that we intend for the language to
> be used by Spanish/Chinese/German/French speakers. In this case we
> would translate "`with`":
>
>     {con {x 2} {+ x 4}}
>     {he {x 2} {+ x 4}}
>     {mit {x 2} {+ x 4}}
>     {avec {x 2} {+ x 4}}
>     {c {x 2} {+ x 4}}
>
> but we will *not* do the same for `With` if we (the language
> implementors) are English speakers.

# Evaluation of `with`
Now, to make this work, we will need to do some substitutions.
We basically want to say that to evaluate:

    {with {id WAE1} WAE2}

we need to evaluate `WAE2` with id substituted by `WAE1`. Formally:

    eval( {with {id WAE1} WAE2} )
      = eval( subst(WAE2,id,WAE1) )

There is a more common syntax for substitution (quick: what do I mean by
this use of "syntax"?):

    eval( {with {id WAE1} WAE2} )
      = eval( WAE2[WAE1/id] )

> Side-note: this syntax originates with logicians who used `[x/v]e`,
> and later there was a convention that mimicked the more natural order
> of arguments to a function with `e[x->v]`, and eventually both of
> these got combined into `e[v/x]` which is a little confusing in that
> the left-to-right order of the arguments is not the same as for the
> `subst` function.
Now all we need is an exact definition of substitution.
> Note that substitution is not the same as evaluation, it's only a part
> of the evaluation process. In the previous examples, when we evaluated
> the expression we did substitutions as well as the usual arithmetic
> operations that were already part of the AE evaluator. In this last
> definition there is still a missing evaluation step, see if you can
> find it.
So let us try to define substitution now:
> Substitution (take 1): `e[v/i]` \
> To substitute an identifier `i` in an expression `e` with an
> expression `v`, replace all identifiers in `e` that have the same
> name `i` by the expression `v`.
This seems to work with simple expressions, for example:

    {with {x 5} {+ x x}}  -->  {+ 5 5}
    {with {x 5} {+ 10 4}} -->  {+ 10 4}

however, we crash with an invalid syntax if we try:

    {with {x 5} {+ x {with {x 3} 10}}}
    -->  {+ 5 {with {5 3} 10}} ???

that is, we got to an invalid expression.
To fix this, we need to distinguish normal occurrences of identifiers,
and ones that are used as new bindings. We need a few new terms for
this:
1. Binding Instance: a binding instance of an identifier is one that is
   used to name it in a new binding. In our `<WAE>` syntax, binding
   instances are only the `<id>` position of the `with` form.
2. Scope: the scope of a binding instance is the region of program text
   in which instances of the identifier refer to the value bound in the
   binding instance. (Note that this definition actually relies on a
   definition of substitution, because that is what is used to specify
   how identifiers refer to values.)
3. Bound Instance (or Bound Occurrence): an instance of an identifier is
   bound if it is contained within the scope of a binding instance of
   its name.
4. Free Instance (or Free Occurrence): an instance of an identifier that
   is not contained in the scope of any binding instance of its name is
   said to be free.
Using this we can say that the problem with the previous definition of
substitution is that it failed to distinguish between bound instances
(which should be substituted) and binding instances (which should not).
So we try to fix this:
> Substitution (take 2): `e[v/i]` \
> To substitute an identifier `i` in an expression `e` with an
> expression `v`, replace all instances of `i` that are not themselves
> binding instances with the expression `v`.
First of all, check the previous examples:

    {with {x 5} {+ x x}}  -->  {+ 5 5}
    {with {x 5} {+ 10 4}} -->  {+ 10 4}

still work, and

    {with {x 5} {+ x {with {x 3} 10}}}
    -->  {+ 5 {with {x 3} 10}}
    -->  {+ 5 10}

also works. However, if we try this:

    {with {x 5}
      {+ x {with {x 3}
             x}}}

we get:

    -->  {+ 5 {with {x 3} 5}}
    -->  {+ 5 5}
    -->  10

but we want that to be `8`: the inner `x` should be bound by the closest
`with` that binds it.
The problem is that the new definition of substitution that we have
respects binding instances, but it fails to deal with their scope. In
the above example, we want the inner `with` to *shadow* the outer
`with`'s binding for `x`.
> Substitution (take 3): `e[v/i]` \
> To substitute an identifier `i` in an expression `e` with an
> expression `v`, replace all instances of `i` that are not themselves
> binding instances, and that are not in any nested scope, with the
> expression `v`.
This avoids the bad substitution above, but it is now doing things too
carefully:

    {with {x 5} {+ x {with {y 3} x}}}

becomes

    -->  {+ 5 {with {y 3} x}}
    -->  {+ 5 x}

which is an error because `x` is unbound (and there is no reasonable
rule that we can specify to evaluate it).

The problem is that our substitution halts at every new scope, in this
case, it stopped at the new `y` scope, but it shouldn't have because it
uses a different name. In fact, that last definition of substitution
cannot handle any nested scope.
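
The two results that we are aiming for (`8` for the shadowing example, and a proper value rather than an unbound `x` for the nested `y` scope) can be cross-checked against Racket's own `let`, assuming that `with` should follow the same scoping rules. This is a sketch of the intended behavior only; the names `shadowed` and `nested-other-name` are made up for the illustration.

```racket
#lang typed/racket

;; {with {x 5} {+ x {with {x 3} x}}}
;; the inner x refers to the closest binding, which is 3
(define shadowed
  (let ([x 5])
    (+ x (let ([x 3])
           x))))

;; {with {x 5} {+ x {with {y 3} x}}}
;; the inner scope binds a different name, so x is still 5 inside it
(define nested-other-name
  (let ([x 5])
    (+ x (let ([y 3])
           x))))

shadowed          ; 8
nested-other-name ; 10
```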
Revise again:
> Substitution (take 4): `e[v/i]` \
> To substitute an identifier `i` in an expression `e` with an
> expression `v`, replace all instances of `i` that are not themselves
> binding instances, and that are not in any nested scope of `i`, with
> the expression `v`.
which, finally, is a good definition. This is just a little too
mechanical. Notice that we actually refer to all instances of `i` that
are not in a scope of a binding instance of `i`, which simply means all
*free occurrences* of `i`, free in `e` (why? remember the
definition of "free"?):
> Substitution (take 4b): `e[v/i]` \
> To substitute an identifier `i` in an expression `e` with an
> expression `v`, replace all instances of `i` that are free in `e`
> with the expression `v`.
Based on this we can finally write the code for it:

    (: subst : WAE Symbol WAE -> WAE)
    ;; substitutes the second argument with the third argument in the
    ;; first argument, as per the rules of substitution; the resulting
    ;; expression contains no free instances of the second argument
    (define (subst expr from to) ; returns expr[to/from]
      (cases expr
        [(Num n) expr]
        [(Add l r) (Add (subst l from to) (subst r from to))]
        [(Sub l r) (Sub (subst l from to) (subst r from to))]
        [(Mul l r) (Mul (subst l from to) (subst r from to))]
        [(Div l r) (Div (subst l from to) (subst r from to))]
        [(Id name) (if (eq? name from) to expr)]
        [(With bound-id named-expr bound-body)
         (if (eq? bound-id from)
           expr           ;*** don't go in!
           (With bound-id
                 named-expr
                 (subst bound-body from to)))]))

... and this is just the same as writing a formal "paper version" of the
substitution rule.
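
To try this definition on the two problematic bodies from the previous takes, here is a transcription into plain `#lang typed/racket`. It is a sketch under a few assumptions: the variants are written as structs joined by a union type, and `match` stands in for `cases`, since the variant form of `define-type` and `cases` come from `#lang pl`. The behavior of `subst` is kept exactly as above; `shadowing-body` and `other-name-body` are names made up for the illustration.

```racket
#lang typed/racket

;; Stand-ins for the variants of the WAE type definition (an assumption:
;; plain structs plus a union type replace the `#lang pl` form).
(struct Num  ([n : Number]) #:transparent)
(struct Add  ([l : WAE] [r : WAE]) #:transparent)
(struct Sub  ([l : WAE] [r : WAE]) #:transparent)
(struct Mul  ([l : WAE] [r : WAE]) #:transparent)
(struct Div  ([l : WAE] [r : WAE]) #:transparent)
(struct Id   ([name : Symbol]) #:transparent)
(struct With ([name : Symbol] [named : WAE] [body : WAE]) #:transparent)
(define-type WAE (U Num Add Sub Mul Div Id With))

(: subst (-> WAE Symbol WAE WAE))
;; returns expr[to/from], clause by clause the same as the version above
(define (subst expr from to)
  (match expr
    [(Num _) expr]
    [(Add l r) (Add (subst l from to) (subst r from to))]
    [(Sub l r) (Sub (subst l from to) (subst r from to))]
    [(Mul l r) (Mul (subst l from to) (subst r from to))]
    [(Div l r) (Div (subst l from to) (subst r from to))]
    [(Id name) (if (eq? name from) to expr)]
    [(With bound-id named-expr bound-body)
     (if (eq? bound-id from)
         expr ; a new scope for the same name: don't go in
         (With bound-id named-expr (subst bound-body from to)))]))

;; body of {with {x 5} {+ x {with {x 3} x}}} after substituting 5 for x:
;; {+ 5 {with {x 3} x}}, the inner scope of x is left alone
(define shadowing-body
  (subst (Add (Id 'x) (With 'x (Num 3) (Id 'x))) 'x (Num 5)))

;; body of {with {x 5} {+ x {with {y 3} x}}} after substituting 5 for x:
;; {+ 5 {with {y 3} 5}}, the substitution goes into the scope of y
(define other-name-body
  (subst (Add (Id 'x) (With 'y (Num 3) (Id 'x))) 'x (Num 5)))
```

These are exactly the two cases that takes 2 and 3 got wrong.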
We still have bugs, but we'll need some more work to get to them.