2010-04-09 - Types - What is a Type?
========================================================================
>>> Types

In our Toy language implementation, there are certain situations that
are not covered.  For example,

  {< {+ 1 2} 3}

is not a problem, but

  {+ {< 1 2} 3}

will eventually use Scheme's addition function on a boolean value,
which will crash our evaluator.

Even if we go back to the simpler language we once had, where there
were no booleans, we can still run into errors -- except now these are
errors that our own code raises:

  {+ {fun {} 1} 2}

or

  {1 2 3}

or

  {{fun {x y} {+ x y}} 5}

In any case, it would be good to avoid such errors right from the
start -- it seems like we should be able to identify such bad code and
not even try to run it.

One thing that we can do is a little more work at parse time, and
declare the {1 2 3} program fragment as invalid.  We can even try to
forbid

  {bind {{x 1}} {x 2 3}}

in the same way, but what should we do with this?

  {fun {x} {x 2 3}}

The validity of this depends on how it is used.  The same goes for some
invalid expressions -- the bogus {+ {< 1 2} 3} expression from above can
be fine if it appears in a context that shadows `<':

  {bind {{< *}} {+ {< 1 2} 3}}

Finally, consider this:

  {+ 3 {if mystery 5 {fun {x} x}}}

where mystery contains something like `random' or `read'.

In general, knowing whether a piece of code will run with no errors is
a problem that is equivalent to the halting problem -- and because of
this, there is no way to create an "exact" type system: every type
system is either too restrictive (rejecting programs that would run
with no errors) or too permissive (accepting programs that might crash).

This is a very practical issue -- type safety means far fewer bugs in
the system.  Designing a good type system is still an actively
researched problem.

========================================================================
>>> What is a Type?

A type is any property of a program (or an expression) that can be
determined without running the program.
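To make "determined without running the program" concrete, here is a
minimal sketch of a checker that assigns each expression a kind of
value and rejects some programs before they run.  It is written in
Python rather than the Scheme used for the course, and everything in it
is hypothetical: the AST classes (`Num', `Id', `Add', `Fun', `If'), the
`type_of' and `accepts' functions, and the assumption that there are
exactly three kinds of values ("num", "bool", "fun").  As a further
design assumption, both branches of an `if' must have the same kind.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union

# Hypothetical AST for a tiny subset of the Toy language (not the course's code).
@dataclass
class Num:
    value: float

@dataclass
class Id:
    name: str

@dataclass
class Add:
    lhs: "Expr"
    rhs: "Expr"

@dataclass
class Fun:
    params: List[str]
    body: "Expr"

@dataclass
class If:
    cond: "Expr"
    then: "Expr"
    other: "Expr"

Expr = Union[Num, Id, Add, Fun, If]

class ToyTypeError(Exception):
    """Raised when an expression is rejected before it is run."""

def type_of(expr: Expr, env: Dict[str, str]) -> str:
    """Return the kind of value expr would evaluate to ("num", "bool",
    or "fun"), or raise ToyTypeError, without evaluating anything."""
    if isinstance(expr, Num):
        return "num"
    if isinstance(expr, Id):
        if expr.name not in env:
            raise ToyTypeError(f"unbound identifier: {expr.name}")
        return env[expr.name]
    if isinstance(expr, Add):
        # Addition only operates on numbers.
        for side in (expr.lhs, expr.rhs):
            kind = type_of(side, env)
            if kind != "num":
                raise ToyTypeError(f"+ expects numbers, got {kind}")
        return "num"
    if isinstance(expr, Fun):
        # Only the fact that this is a function matters; the body is not examined.
        return "fun"
    if isinstance(expr, If):
        if type_of(expr.cond, env) != "bool":
            raise ToyTypeError("if expects a boolean condition")
        then_kind = type_of(expr.then, env)
        else_kind = type_of(expr.other, env)
        # Conservative rule: both branches must agree, because we cannot
        # know which one will be taken at run-time.
        if then_kind != else_kind:
            raise ToyTypeError(f"if branches differ: {then_kind} vs {else_kind}")
        return then_kind
    raise ToyTypeError(f"unknown expression: {expr!r}")

def accepts(expr: Expr, env: Optional[Dict[str, str]] = None) -> bool:
    """True if the checker accepts expr, False if it rejects it."""
    try:
        type_of(expr, env or {})
        return True
    except ToyTypeError:
        return False

# {+ 1 2} is accepted
print(accepts(Add(Num(1), Num(2))))                 # True
# {+ 1 {fun {x} x}} is rejected without running anything
print(accepts(Add(Num(1), Fun(["x"], Id("x")))))    # False
# {+ 3 {if mystery 5 {fun {x} x}}}, with mystery assumed to be a boolean
print(accepts(Add(Num(3), If(Id("mystery"), Num(5), Fun(["x"], Id("x")))),
              {"mystery": "bool"}))                 # False
```

The `if' rule shows the "too restrictive" side of the trade-off: the
last program is rejected even in runs where mystery is true and the
addition would succeed.  At the same time, `type_of' never looks inside
function bodies, so it is permissive there; checking bodies would need
parameter type annotations or some form of inference.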
This notion of type is different from what Scheme considers a `type',
which is a property that is known only at run-time.  Before run-time we
know nothing about a Scheme value, so in the static sense Scheme
essentially has a single type.

Specifically, we want to use types in a way that predicts some aspects
of the program's behavior, for example, whether a program will crash.
Usually, a type describes the kind of value that an expression can
evaluate to, not the precise value itself.  For example, we might have
two kinds of values -- functions and numbers -- and we know that
addition always operates on numbers, therefore

  {+ 1 {fun {x} x}}

is a type error.  Note that to determine this we don't care about the
actual function, just the fact that it is a function.

Important: types can rule out certain programs as invalid, but they
cannot distinguish correct programs from incorrect ones.  For example,
no type system can know that this:

  {fun {x} {+ x 1}}

is an incorrect decrease-by-one function.

In general, type systems try to reach the optimal point where as much
information as possible is known, yet the language is not too
restricted, no significant computing resources are wasted, and
programmers don't spend much time annotating their code.

Why would you want to use a type system?

* They help reduce the time spent on debugging (when they detect
  legitimate errors, rather than force you to change your code).

* They catch errors even in code that you don't execute, for example,
  when your tests are too weak (but they do *not* substitute for proper
  test suites).

* As we have seen, they help in documenting code (but they do *not*
  substitute for proper documentation).

* Compilers can use type information to make programs run much faster.

* They encourage more organized code (for example, our use of
  `define-type' and `cases' helps in writing code; these two constructs
  are inspired by ML).

========================================================================