Work with #lang pl vs Fundies

Why is Fundies / HtDP / the Design Recipe relevant?

For this class, it is usually recommended to follow the same process you went through in Fundies (which is described in How to Design Programs, the Fundies textbook). There is a very technical reason for this: the main approach HtDP takes for designing your data structures and then your code follows the principle of defining a type as a union of a few named variants, each with a few “fields” or “slots” that have their own types. (See in particular this section for HtDP’s take on implementing a language.)

This approach (look for “algebraic data types”) is well-suited for many problems in general, and even more so for representing the main subject of most of the work in this class: the ASTs that list the kinds of expressions we’re dealing with. It will make our type definitions extremely close to the BNF they are based on, and in many cases the restriction that cases is the only thing you can do with these values will lead to code that almost “writes itself”.
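As a quick illustration of how close the two can be, here is a minimal sketch of a tiny WAE-like language, first as a BNF and then as the matching type definition in #lang pl (the names follow the class material, but treat this as an illustration rather than the official homework code):

#lang pl

#| A sketch of the BNF:
   <WAE> ::= <num>
           | { + <WAE> <WAE> }
           | { - <WAE> <WAE> }
           | { with { <id> <WAE> } <WAE> }
           | <id>
|#

;; the AST type: one variant per BNF production
(define-type WAE
  [Num  Number]
  [Add  WAE WAE]
  [Sub  WAE WAE]
  [Id   Symbol]
  [With Symbol WAE WAE])

Each production becomes one variant, and each meaningful part of a production becomes one typed field, which is why the type definition reads almost like the BNF itself.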

However, when it comes to actually doing the work, there are certain adjustments you need to make for your work to be effective.

How to approach homework: a high-level view

The dangerous pothole

You will likely remember that in Fundies, there were no compiler-checked types; all you had were some semi-formal descriptions in comments. As you might imagine, the authors of HtDP are well aware of the huge benefits of having proper type declarations and a static type checker that verifies them for you. After all, the design recipe follows the idea that types are hugely useful in forming a solid “spine” that determines how your code is organized at the high level, and later how to materialize it further into a working solution. So how come HtDP sticks to contracts being “just comments”?

As useful as types can be, static type checking has a major disadvantage: as long as there is any problem with your types and the type checker throws an error, you cannot run any code. Inexperienced programmers can trip on this and fall into a deep hole of trying to “make everything work”, and as long as “everything” includes the types, you cannot even try running any of your code: no experiments on the REPL, and no tests! I have seen many students who get very frustrated with all of this, and even worse: by the time you finally sort out your types you are so exhausted that you can easily just declare victory and never suspect that the code you produced has any bugs.

How to avoid it

Code first, types later:

One possibility is to get the code right, leaving the types for later. You can do this by switching the language to #lang pl untyped (or un, for short), which is a language that is identical to #lang pl, except that it turns off the type checker. You proceed to make the code work, and once you’re happy with it you turn the type checker on (by removing the un) and deal with possible type errors. If you did get the code to work for whatever you needed to do, then the types are likely to be fine. (Note that even with the type checker off, you will still see errors if a cases form does not handle new variants.)
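Concretely, the only thing that changes is the language line at the top of your homework file; a rough sketch of the workflow:

#lang pl untyped ; type checker off while you get the code working

;; ... your definitions, type declarations (ignored for now), and tests ...

;; once everything runs the way you want, change the first line back to
;; #lang pl and deal with whatever type errors show up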

This mode of work is extremely close to how you worked in Fundies: the type declarations are effectively “commented out”, which means that you’re using them only as guidelines for your work. But of course, this means that you completely lose the benefit of writing checked code, so some mistakes will linger on and get caught later, when revising your code is more difficult. For example, consider the common confusion between a symbol value 'x and an (Id 'x) identifier, or a plain number n vs a wrapped (Num n) value. You might end up working around such a mistake in the wrong place, adding more work for the stage of re-enabling the type checker.
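For instance, in the Id case of subst the correct code and the easy slip look deceptively similar; a sketch, assuming the usual subst from the class WAE code:

;; correct: return the whole (Id ...) value (expr here), not the bare symbol
[(Id name) (if (eq? name from) to expr)]
;; the tempting slip is to write name instead of expr, which returns a Symbol
;; where a WAE is expected; #lang pl untyped will happily run it anyway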

Types first, code later:

A better mode of work is to use the types and the type checker first. This is the very thing that students often try, only to fall into the trap of making all of the code work at once, leading to the frustration described above, which then seems unavoidable. The thing that is easy to miss (even though it’s kind of obvious once you think about it) is writing small code stubs just to make the type checker happy. This is perhaps best demonstrated with a concrete example.

Consider starting with the WAE language, and adding one foo form to extend it. You’ll quickly add the form to the BNF and translate that to a new variant in the AST type definition. At this point, the type checker barfs at the missing bits in the cases in both subst and eval. You obviously start with subst: add the Foo case, and naturally go on to write its code. You’re now fighting with the type checker while being unable to verify that your code actually makes sense, since you cannot run or test it as long as you haven’t also added the Foo bit in eval.
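For concreteness, the type extension itself, which is the same no matter how you proceed from here, might look like the following sketch, where Foo and its three subexpressions are purely hypothetical:

(define-type WAE
  [Num  Number]
  [Add  WAE WAE]
  [Sub  WAE WAE]
  [Id   Symbol]
  [With Symbol WAE WAE]
  [Foo  WAE WAE WAE]) ; the new, hypothetical variant

As soon as this is in place, every cases over WAE complains about the missing Foo clause.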

Instead, the stub approach means that after you extend the type definition, you add some minimal bits just to make the types work. subst needs to return a WAE? Then add a stub case, possibly with a comment:

[(Foo x y z) (Num 999)] ; TODO: implement

or, if you have multiple extensions, you can add an else clause. That is normally a dangerous thing to do, but remember that this is only a temporary tool to use while you’re working:

...
[else (Num 999)] ; TODO: implement Foo, Bar, Baz

Next, you do the same for eval, for example: [(Foo x y z) 999] (if eval is required to return a number).
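To see how such a stub sits inside a complete function, here is a sketch of eval for the extended language, assuming the usual WAE evaluator from class (with subst as before):

(: eval : WAE -> Number)
;; evaluates WAE expressions to numbers
(define (eval expr)
  (cases expr
    [(Num n)   n]
    [(Add l r) (+ (eval l) (eval r))]
    [(Sub l r) (- (eval l) (eval r))]
    [(With bound-id named-expr bound-body)
     (eval (subst bound-body bound-id (Num (eval named-expr))))]
    [(Id name) (error 'eval "free identifier: ~s" name)]
    [(Foo x y z) 999])) ; TODO: implement the real Foo semantics

The whole file now type-checks, so everything that does not involve Foo can already be run and tested.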

Using a number like 999 here is useful in another way: if you run the code and see 999 in some result, then you can guess that the run touched some code that wasn’t implemented yet. A more extreme approach is to use an error:

[(Foo x y z) (error 'subst "Foo unimplemented")]

which will type-check in any position (since it never returns a value).

And now that you have code that passes the type checker, you can add tests or play with the code as you’re working on it. You might still run into type errors, but now that’s a local issue in whatever code you’re working on. If you discover a problem in your types (for example, maybe Foo needs a list of things instead of three things), you can fix the type, then quickly add stubs for the revised type (preserving whatever code you already wrote in a comment), and then proceed to fix the code.
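For example, with the stubs in place you can immediately write tests for the parts that do work. A sketch, assuming the test form from #lang pl and a run helper like the one in the class WAE code:

(test (run "{+ 1 2}") => 3)               ; already works
(test (run "{with {x 5} {+ x x}}") => 10) ; already works
(test (run "{foo 1 2 3}") => 999)         ; placeholder, assuming parse already
                                          ; handles foo; update when Foo is done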

The bottom line in all of this is the main principle: get as fast as possible to a state where you can try out functions, write tests, etc., instead of working blindly towards “making everything work”, where you can only verify it at the end.

On type checking in general

All of the above does not exist in a vacuum of “how to be productive in this class”; it has some deep roots. For decades, there were silly flamewars between proponents of statically typed languages and those of dynamically typed ones. The arguments mostly revolved around the issues described here: static type checking makes for better and more organized code, while dynamic type checking makes it easier to get things working quickly (and perhaps roughly), and each of these advantages is missing from the other side.

Some saner PL work came from people on either side admitting the benefits of the other (with the PLT group being a good example of this). There were various directions, but most converged on the idea of “gradual typing”: the idea that you can incrementally add types to your code, to get as many of the static benefits as you need.

Typed Racket pioneered this approach. The idea is that you write your code in several modules, and you can gradually migrate parts of the code by moving modules from #lang racket to #lang typed/racket. The problem with this is that you’re working at the coarse level of whole modules. Specifically for this class, each homework (= each file that starts with #lang) is one module, so you don’t have a smooth path for migrating code. The main challenge in the development of Typed Racket is handling a messier underlying language: a classic statically typed language like ML (eg, OCaml) enjoys the simplicity of all types being completely disjoint, whereas Racket has a lot of overlap between different kinds of values, most notably a very rich “numerical tower” with various number types that overlap in different ways.
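To make the “numerical tower” point concrete, here is a tiny Typed Racket illustration (not from the class material): the same value is accepted at several overlapping numeric types, which a classic ML-style system never has to deal with:

#lang typed/racket
;; one value, many overlapping types: all of these definitions type-check
(: a Positive-Integer) (define a 3)
(: b Integer)          (define b 3)
(: c Exact-Rational)   (define c 3)
(: d Real)             (define d 3)
(: e Number)           (define e 3)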

TypeScript, which came along several years after Typed Racket, uses a very similar approach with a few tweaks. Unlike (typed) Racket, it has a very strong emphasis on “real-world needs”, with a bunch of implications that make it “less typed” than TR. Some examples are the any type, which acts as an explicit escape hatch from the checker, and a few checks that are deliberately unsound for the sake of convenience.

In both the TR and TS cases (which were developed at roughly the same time), the main point is that both start with a dynamically typed language and, instead of trying to change it into a “classic” statically typed language, use the dynamic nature of the underlying language to get a more flexible (and more expressive) type system, one that gets most of the benefits of a static language and in some aspects is even more convenient to use.

There’s a bunch more that can be said on this subject: the parallel direction some static languages take to get a more dynamic language (eg, C#'s dynamic type); other practical benefits of a static type checker that motivate this, such as IDE tools like code completion and more efficient compilation; the inherent tradeoff in the more limited type inference that TR and TS can do when encountering polymorphic higher-order functions; the extra expressiveness you get with TR/TS; etc.

We will talk about some of these when we get to talk about types and type checking later in the semester.