Why is Fundies / HtDP / the Design Recipe relevant?
For this class, it is usually recommended to follow the same process you went through in Fundies (which is described in How to Design Programs, the Fundies textbook). There is a very technical reason for this: the main approach HtDP takes for designing your data structure and then your code follows the principle of types consisting of a union of a few variants with different names, each with a few “fields” or “slots” with their own types. (See in particular this section for HtDP’s take on implementing a language.)
This approach (look for "algebraic data types") is well-suited for many problems in general, and even more so for representing the main subject of most of the work in this class: the ASTs which list the kinds of expressions we're dealing with. It will make our type definitions extremely close to the BNF they are based on, and in many cases the restriction of `cases` being the only thing you can do with these values will lead to code that almost "writes itself".
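To make this concrete, here is a sketch (using the WAE language that comes up later; the exact set of variants depends on the version you're given) of how a BNF maps to such a type definition:

```racket
;; WAE ::= <num>
;;       | { + <WAE> <WAE> }
;;       | { with { <id> <WAE> } <WAE> }
;;       | <id>
(define-type WAE
  [Num  Number]          ; <num>
  [Add  WAE WAE]         ; { + <WAE> <WAE> }
  [With Symbol WAE WAE]  ; { with { <id> <WAE> } <WAE> }
  [Id   Symbol])         ; <id>
```

Each BNF production becomes one variant, and a `cases` expression over a `WAE` value must handle exactly these variants.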
However, when it comes to actually doing the work in this class, there are certain adjustments that you need to make for your work to be effective.
How to approach homework: a high-level view
The dangerous pothole
You will likely remember that in Fundies, there were no compiler-checked types — all you had were some semi-formal descriptions in comments. As you might imagine, the authors of HtDP are well aware of the huge benefits of having proper type declarations and a static type checker that will verify them for you. After all, the design recipe follows the idea that types are hugely useful in forming a solid "spine" that determines how your code is organized at the high level, and later how to materialize it further into a working solution. So how come HtDP sticks to contracts being "just comments"?
As useful as types can be, static type checking has a major disadvantage: as long as there is some problem with your types, the type checker throws an error and you cannot run any code. Inexperienced programmers can trip on this and fall into a deep hole of trying to "make everything work" — and since "everything" includes the types, you cannot even try running any of your code: no experiments on the REPL, and no tests! I have seen many students get very frustrated with all of that, and even worse: by the time you sort out your types, you're so exhausted that you can easily just declare victory and not suspect that the actual code you produced has any bugs.
How to avoid it
⇒ Code first, types later:
One possibility is to get the code right, leaving the types for later. You can do this by switching the language to `#lang pl untyped` (or `un`, for short), which is a language that is identical to `#lang pl`, except that it turns off the type checker. You proceed to make the code work, and once you're happy with it you turn the type checker back on (by removing the `un`) and deal with possible type errors. If you did get the code to work for whatever you needed to do, then the types are likely to be fine. (Note that even with the type checker off, you will still see errors if a `cases` form does not handle new variants.)
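For example (a sketch; in this mode the type declarations are kept in the file but effectively ignored, as described below):

```racket
#lang pl untyped  ; temporarily, instead of: #lang pl

(: square : Number -> Number) ; still written, but not checked
(define (square n) (* n n))

(test (square 4) => 16)
```

Switching back to `#lang pl` later re-enables checking of the same declarations.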
This mode of work is extremely close to how you worked in Fundies: the type declarations are effectively "commented out", which means that you're using them only as guidelines for your work. But of course, this means that you completely lose the benefit of writing checked code, so some mistakes will linger on and get caught later, when revising your code is more difficult. For example, consider the common confusion between a symbol value `'x` and an `(Id 'x)` identifier, or a number `n` vs a wrapped `(Num n)` value. You might end up working around such a mistake in the wrong place, adding more work for the stage of re-enabling the type checker.
⇒ Types first, code later:
A better mode of work is to use the types and the type checker first. This is the very thing that students often try, only to fall into the trap of making all of the code work first, leading to the frustration described above, which seems unavoidable. The thing that is easy to miss (even though it's kind of obvious when you think about it) is writing small code stubs just to make the type checker happy. This is perhaps best demonstrated with a concrete example.
Consider starting with the WAE language, and adding one `foo` form to extend it. You'll quickly add the form to the BNF and translate that to a new variant in the AST type definition. At this point, the type checker barfs at missing bits in the `cases` in both `subst` and `eval`. You obviously start with `subst`: add the `Foo` case, and write the code. You're now fighting with the type checker while not being able to verify that your code actually makes sense, since you cannot run it or test it as long as you didn't also add the `Foo` bit in `eval`.
Instead, the stub approach means that after you extend the type definition, you add some minimal bits just to make the types work. `subst` needs to return a `WAE`? Add a stub case, possibly with a comment:
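(a sketch, assuming the new variant is `Foo` with three subexpressions — adjust to your actual `foo` form:)

```racket
[(Foo x y z) (Num 999)] ; TODO: write the real Foo substitution
```

The `(Num 999)` is just a placeholder `WAE` value that satisfies the type.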
or if you have multiple extensions, you can add an `else`, which is a dangerous thing to do, but remember that this is only a temporary tool to use while you're working:
[else (Num 999)] ; TODO: implement Foo, Bar, Baz
Next, you do the same for `eval`, for example: `[(Foo x y z) 999]` (if `eval` is required to return a number).
Using a number like `999` here is useful in another way: if you run the code and see 999 in some result, then you can guess that it touched some of the code that wasn't implemented yet. A more extreme approach is to use an `error`, which will always work (since it doesn't return a value).
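A sketch of such an error stub, again assuming a `Foo` variant with three fields:

```racket
[(Foo x y z) (error 'eval "Foo is not implemented yet")]
```

Since `error` never returns, this clause fits any expected result type, and running into it is impossible to miss.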
And now that you have code that passes the type checker, you can add tests or play with the code as you're working on it. You might still run into type errors, but now that's a local issue of whatever code you're working on. If you discover a problem in your types (for example, maybe `Foo` needs a list of things instead of three things), you can fix the type, then quickly add stubs for the revised type (preserving whatever code you wrote in a comment), and then proceed to fix the code.
The bottom line in all of this is the main principle: get as fast as possible to a state where you can try out functions, write tests, etc., instead of working blindly towards "making everything work", where you can only verify it at the end.
On type checking in general
All of the above does not come in a vacuum of "how to be productive in this class" — it has some deep roots. For decades, there were silly flamewars between proponents of statically typed languages and those of dynamically typed ones. The arguments mostly revolved around the issues described here: static type checking makes for better and more organized code, while dynamic type checking makes it easier to make things work quickly (and perhaps roughly) — and each of these advantages is missing from the other side.
Some saner PL work came from people on either side admitting the benefits of the other (with the PLT group being a good example of this). There were various directions — most converged on the idea of “gradual typing”: the idea that you can incrementally add types to your code, to get as much of the static benefits as you need.
Typed Racket pioneered this approach. The idea is that you write your code in several modules, and you can gradually migrate parts of the code by migrating modules from `#lang racket` to `#lang typed/racket`. The problem with this is that you're working at a coarse level of whole modules. Specifically for this class, each homework (= each file that starts with `#lang`) is one module, so you don't have a smooth path for migrating code. The main challenge in the development of Typed Racket is handling a messier underlying language: a classic statically typed language like ML (eg, OCaml) enjoys the simplicity of all types being completely disjoint, whereas Racket has a lot of overlap between different values, most notably a very rich "numerical tower" with various types that overlap in different ways.
TypeScript, which appeared several years after Typed Racket, uses a very similar approach with a few tweaks. Unlike (typed) Racket, it has a very strong emphasis on "real-world needs", with a bunch of implications that make it "less typed" than TR. Some examples:
- Catering for code completion is a major motivation.
- TS doesn’t care about values that come from untyped code whereas TR defaults to checking them.
- There’s a bunch of cases where you can get TS to bless types that are completely broken…
But the approach that it has for gradual typing can be more convenient since it works at smaller chunks of code: it uses an `any` type. The TS `any` type is somewhat similar to TR's `Any`: a type that stands at the top of the type hierarchy. It's the most permissive type (any value is an `Any`) but has the least information about the value (since we know nothing about it). But TS's `any` goes beyond just a top-level type: using it actually turns off the type checker. The idea is that you start with code without any types — TS will assume that everything is an `any`, so you're essentially working without a type checker, and later you add types as needed, with an ultimate goal of having no `any` leftovers. (Note that TS also has a similar `unknown` type which doesn't turn off checking, making it the actual equivalent of TR's `Any`.)
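A small sketch (not from the course notes) of the difference: `any` silences the checker, while `unknown` forces narrowing before use:

```typescript
// `any` effectively turns the type checker off for this value:
const fromAny: any = "hello";
const oops: number = fromAny; // accepted by the checker, but wrong at runtime

// `unknown` is the safe top type: it accepts any value, but you
// must narrow it before using it.
const fromUnknown: unknown = "hello";
// const bad: number = fromUnknown; // rejected by the type checker
let len = 0;
if (typeof fromUnknown === "string") {
  len = fromUnknown.length; // fine after narrowing
}
```

Here `oops` holds a string at runtime even though its declared type is `number` — exactly the kind of hole that replacing `any` with `unknown` closes.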
In both the TR and TS cases (which were developed at roughly the same time), the main point is that we start with a dynamically-typed language, and instead of trying to change it into a "classic" statically typed language, they use the dynamic nature of the underlying language to get a more flexible (and more expressive) type system, which gets most of the benefits of a static language but in some aspects is even more convenient to use.
There's a bunch more that can be said on this subject: the parallel direction some static languages take to get a more dynamic language (eg, C#'s `dynamic` type); other practical benefits of a static type checker that motivate this (IDE tools like code completion, and more efficient compilation); the inherent tradeoff in the more limited type inference that TR and TS can do when encountering polymorphic higher-order functions; the extra expressiveness you get with TR/TS; etc.
We will talk about some of these when we get to talk about types and type checking later in the semester.