Consider web programming as a demonstration of a frequent problem. The HTTP protocol is stateless: each HTTP query can be thought of as running a program (or a function), getting a result, then killing it. This makes interactive applications hard to write.
For example, consider this behavior (which is based on a real story of a probably not-so-real bug known as “the ITA bug”):
You go on a flight reservation website, and look at flights to Paris or London for a vacation.
You get a list of options and choose one for Paris and one for London: you ctrl-click the first and then the second to open them in new tabs.
You look at the descriptions and decide that you like the first one best, so you click the button to buy the ticket.
A month later you go on your plane, and when you land you realize that you’re in the wrong country — the ticket you paid for was the second one after all…
Obviously there is some fundamental problem here, especially given that this problem plagued many websites early on. (These days, these kinds of problems can still be found in some places, like the registrar’s system, except that people are much more aware of them and much better prepared to deal with them.) In an attempt to clarify what exactly went wrong, we might require that each interaction result in something that is deterministically based on what the browser window shows when the interaction is made, but even that is not always the right requirement. Consider the same scenario except with a bookstore and an “add to my cart” button. In this case you want to be able to add one item to the cart in the first window, then switch to the second window and click “add” there too: you want to end up with a cart that has both items.
The basic problem here is HTTP’s statelessness, something that both web servers and web browsers use extensively. Browsers give you navigation buttons and sometimes will not even communicate with the web server when you use them (instead, they’ll show you cached pages), they give you the ability to open multiple windows or tabs from the current one, and they allow you to “clone” the current tab. If you view each sequence of HTTP queries as a session, this means that web browsers allow you to go back and forth in time, explore multiple futures in parallel, and clone your current world.
These are features that the HTTP protocol intentionally allows by being stateless, and that people have learned to use effectively. A stateful protocol (like ssh, or ftp) will run in a single process (or a program, or a function) that is interacting with you directly, and this process dies only when you disconnect. A big advantage of stateful protocols is their ability to be very interactive and rely on state (eg, an editor updates a character on the screen, relying on the rest of it showing the same text); but stateless protocols can scale up better, and deal with a more hectic kind of interaction (eg, open a page on an online store, keep it open and buy the item a month later; or any of the above “time manipulation” devices).
Side-note: Some people think that Ajax is the answer to all of these problems. In reality, Ajax is layered on top of (asynchronous) web queries, so in fact it is the exact same situation. You do have an option of creating an application that works completely on the client side, but that wouldn’t be as attractive — and even if you do so, you’re still working inside a browser that can play the same time tricks.
Obviously, writing programs to run on a web server is a profitable activity, and therefore highly desirable. But when we do so, we need to somehow cope with the web’s statelessness. To see the implications from a PL point of view we’ll use a small “toy” example that demonstrates the basic issues — an “addition” service:
[Such a small application is not realistic, of course: you can obviously ask for both numbers on the same page. We still use it, though, to minimize the general interaction problem to a more comprehensible core problem.]
Starting from just that, consider how you’d want to write the code for such a service. (If you have experience writing web apps, then try to forget all of that now, and focus on just this basic problem.) The plain version of what we want to implement is:
which is trivially “translated” to:
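A minimal sketch of the two versions, to make their shape concrete. The read-number helper and the prompt strings are assumptions made for illustration, and web-read and web-display are passed in as arguments because they are still imaginary at this point.

```racket
#lang racket

;; Hypothetical console helper (not from the text): show a prompt,
;; then read one datum from the current input port.
(define (read-number prompt)
  (printf "~a: " prompt)
  (read))

;; The plain version: two reads and a print, as one compound expression.
(define (plain-add)
  (printf "~a\n" (+ (read-number "First number")
                    (read-number "Second number"))))

;; The "translated" version has exactly the same shape.  web-read and
;; web-display are parameters here since they are only placeholders.
(define (web-add web-read web-display)
  (web-display (+ (web-read "First number")
                  (web-read "Second number"))))

;; Feeding canned input instead of typing at the console:
(with-input-from-string "1 2" plain-add)
```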
But this is never going to work. The interaction is limited to presenting the user with some data and that’s all: you cannot do any kind of interactive querying. For the purpose of making this more concrete, imagine that web-read and web-display both communicate information to the user via something like error: the information is sent and at the same time the whole computation is aborted. With this, the above code will just manage to ask for the first number and nothing else happens.
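The abort-on-communication behavior can be sketched directly. This assumes that both operations are modeled with raise-user-error, and it uses a hypothetical interaction helper that returns the text the user would see, so the example can run as a script.

```racket
#lang racket

;; Assumption for this sketch: each web operation "sends" its text by
;; raising a user error, which also aborts the rest of the computation.
(define (web-read prompt)
  (raise-user-error 'web-read "~a" prompt))
(define (web-display value)
  (raise-user-error 'web-display "~a" value))

;; Hypothetical helper: run one interaction and return the text that was
;; sent (the message of the aborting error).
(define (interaction thunk)
  (with-handlers ([exn:fail:user? exn-message])
    (thunk)))

(define shown
  (interaction
   (lambda ()
     (web-display (+ (web-read "First number")
                     (web-read "Second number"))))))

;; Only the first prompt is ever sent; the second read and the sum
;; are never reached.
(displayln shown)
```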
We therefore must turn this server function into three separate functions: one that shows the prompt for the first number, one that gets the value entered and shows the second prompt, and a third that shows the results page. Each of the first two functions would send its page to the browser (after which the respective computation dies), including a form submission URL that will invoke the next function.
Assuming a generic “query argument” that represents the browser request, and a return value that represents a page for the browser to render, we have:
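A rough skeleton of the three functions, under an assumed representation: the query is simply the submitted value (#f for the initial request), and a page is a list holding the text to show and the name of the function that the form submits to.

```racket
#lang racket

;; f1: initial request, show the first prompt and point the form at f2.
(define (f1 query)
  (list "First number:" 'f2))

;; f2: receives the first number, shows the second prompt.
(define (f2 query)
  (define n1 query)             ; the first number arrives here...
  (list "Second number:" 'f3))  ; ...but nothing passes it along

;; f3: receives only the second number, so the sum cannot be computed.
(define (f3 query)
  (define n2 query)
  (list (format "The sum is: ??? + ~a" n2) #f))

(displayln (f3 2))
```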
Note that f2 receives the first number directly, but f3 doesn’t. Yet, it is obviously needed to show the sum. A typical hack to get around this is to use a “hidden field” in the HTML form that f2 generates, where that field holds the first number. To make things more concrete, we’ll use some imaginary web API functions:
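One possible shape for this, as a sketch only. The helpers query-field and form-page are made up for the example, a query is assumed to be a hash from field names to strings, and a page is just an HTML string.

```racket
#lang racket

;; Imaginary web API (hypothetical names).
(define (query-field query name)
  (hash-ref query name))

(define (form-page prompt action [hidden '()])
  (string-append
   (format "<form action=\"~a\">" action)
   (format "~a: <input type=\"text\" name=\"n\">" prompt)
   (apply string-append
          (for/list ([h hidden])
            (format "<input type=\"hidden\" name=\"~a\" value=\"~a\">"
                    (car h) (cdr h))))
   "</form>"))

(define (f1 query)
  (form-page "First number" "f2"))

(define (f2 query)
  ;; stash the first number in the form so that f3 gets it back
  (form-page "Second number" "f3"
             (list (cons "n1" (query-field query "n")))))

(define (f3 query)
  (format "The sum is: ~a"
          (+ (string->number (query-field query "n1"))
             (string->number (query-field query "n")))))

;; Simulating a user who enters 1 and then 2:
(displayln (f1 (hash)))
(displayln (f2 (hash "n" "1")))
(displayln (f3 (hash "n1" "1" "n" "2")))
```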
Which would (supposedly) result in something like the following html forms when the user enters 1 and 2:
This is often a bad solution: it gets very difficult to manage with real services where the “state” of the server consists of much more than just a single number — and it might even include values that are not expressible as part of the form (for example an open database connection or a running process). Worse, the state is all saved in the client browser — if it dies, then the interaction is gone. (Imagine doing your taxes, and praying that the browser won’t crash a third time.)
Another common approach is to store the state information on the server and use a small handle (eg, in a cookie) to identify it; each function can then use the cookie to retrieve the current state of the service. But this is exactly how we get the above bugs: it will fail with any of the mentioned time-manipulation features.
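To see how the cookie approach reproduces the flight bug, here is a toy model. Everything in it is an assumption for illustration: a single mutable table keyed by the cookie, and two made-up operations, view-flight! and buy!.

```racket
#lang racket

;; Assumed model: one mutable table on the server, keyed by a cookie.
(define sessions (make-hash))

;; Viewing a flight page records it as "the" selected flight.
(define (view-flight! cookie flight)
  (hash-set! sessions cookie flight)
  (format "Details for the flight to ~a [Buy]" flight))

;; The buy button consults only the server-side state.
(define (buy! cookie)
  (format "Ticket bought: ~a" (hash-ref sessions cookie)))

;; Two tabs share the same cookie:
(define tab1 (view-flight! "cookie-42" "Paris"))   ; first tab
(define tab2 (view-flight! "cookie-42" "London"))  ; second tab

;; The user returns to the first tab, which still shows Paris, and buys:
(displayln tab1)
(displayln (buy! "cookie-42"))
```

The page in the first tab and the server state disagree, so the purchase follows whichever tab was opened last.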
To try and get a better solution, we’ll re-start with the original expression:
and assuming that web-read works as a regular function, we need to begin with executing the first read:
We then need to take that result and plug it into an expression that will read the second number and sum the results; that’s the same as the first expression, except that instead of the first web-read we use a “hole”:
where <*> marks the point into which we need to plug the result of the first question. A better way to explain this hole is to make the expression into a function:
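A sketch of the hole as a function. For now web-read and web-display are assumed to be ordinary console functions, and canned input stands in for the user.

```racket
#lang racket

;; Console stand-ins (an assumption for this sketch).
(define (web-read prompt)
  (printf "~a: " prompt)
  (read))
(define (web-display value)
  (printf "~a\n" value))

;; The rest of the computation, with the hole turned into a parameter.
;; <*> is a legal Racket identifier, so it can be used as is.
(define rest-of-computation
  (lambda (<*>)
    (web-display (+ <*> (web-read "Second number")))))

;; Plugging a first result of 1 into the hole, with 2 as canned input:
(with-input-from-string "2"
  (lambda () (rest-of-computation 1)))
```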
We can split the second and third interactions in the same way. First we can assemble the above two bits of code into an expression that has the same meaning as the original one:
And now we can continue doing this and split the body of the consumer:
into a “reader” and the rest of the computation (using a new hole):
Doing all of this gives us:
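The fully split expression could look roughly like this. The hole names <*1> and <*2> are chosen here for illustration, and web-read and web-display are again assumed to be plain console functions.

```racket
#lang racket

;; Console stand-ins (an assumption for this sketch).
(define (web-read prompt)
  (printf "~a: " prompt)
  (read))
(define (web-display value)
  (printf "~a\n" value))

;; Each read is followed by a function that consumes its result.
(define (add-server)
  ((lambda (<*1>)
     ((lambda (<*2>)
        (web-display (+ <*1> <*2>)))
      (web-read "Second number")))
   (web-read "First number")))

(with-input-from-string "1 2" add-server)
```

Reading from the bottom up gives the order of events: the first read, then its consumer, which performs the second read and hands the result to the final consumer.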
And now we can proceed to the main trick. Conceptually, we’d like to think about web-read as something that is implemented in a simple way:
except that the “real” thing would throw an error and die once the prompt is printed. The trick is one that we’ve already seen: we can turn the code inside-out by making the above “hole functions” be an argument to the reading function, a consumer callback for what needs to be done once the number is read. This callback is called a continuation, and we’ll use a /k suffix for names of functions that expect a continuation (k is a common name for a continuation argument):
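A sketch of the two readers side by side, assuming simple console input in place of a real web interaction:

```racket
#lang racket

;; The simple, direct-style reader: prompt, read, return.
(define (web-read prompt)
  (printf "~a: " prompt)
  (read))

;; The same reader in continuation style: instead of returning the
;; value, hand it to the consumer k.
(define (web-read/k prompt k)
  (printf "~a: " prompt)
  (k (read)))

;; With canned input 5 and a consumer that doubles its argument:
(with-input-from-string "5"
  (lambda () (web-read/k "A number" (lambda (n) (* n 2)))))
```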
This is not too different from the previous version: the only difference is that we make the function take a consumer function as an input, and hand it what we read instead of just returning it. It makes things a little easier, since we pass the hole function to web-read/k, and it will invoke it when needed:
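As a sketch, here both reads are converted so that each hole function is passed as a continuation. The console definitions and the hole names <*1> and <*2> are assumptions for the example.

```racket
#lang racket

;; Console stand-ins (an assumption for this sketch).
(define (web-read/k prompt k)
  (printf "~a: " prompt)
  (k (read)))
(define (web-display value)
  (printf "~a\n" value))

;; The hole functions are now arguments to the reader.
(define (add-server)
  (web-read/k "First number"
    (lambda (<*1>)
      (web-read/k "Second number"
        (lambda (<*2>)
          (web-display (+ <*1> <*2>)))))))

(with-input-from-string "1 2" add-server)
```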
You might notice that this looks too complicated; we could get exactly the same result with:
but then there’s not much point to having web-read/k at all… So why have it? Remember that the main problem is that in the context of a web server we think of web-read as something that throws an error and kills the computation. So if we use such a web-read/k with a continuation, we can make it save this continuation in some global state, so it can be used later when there is a value.
As a side note, all of this might start looking very familiar to you if you have any experience working with callback-heavy code. In fact, the continuation (or k) is basically just a callback, so the above is roughly:
if you follow the JS convention of having a plain name be a callback-able function (vs the fooSync variants).
We can now actually try all of this in plain Racket by simulating web interactions. This is useful to look at the core problem while avoiding the whole web mess that is uninteresting for the purpose of our discussion. The main feature that we need to emulate is statelessness, and as we’ve discussed, we can simulate that using error to guarantee that the process is properly killed for each interaction. We will do this in web-display, which simulates sending the results to the client and therefore terminates the server process:
More importantly, we need to do it in web-read/k, but in this case we need more than just an error: we need a way to store the k so the computation can be resumed later. To continue with the web analogy we do this in two steps: error is used to display the information (the input prompt), and the user action of entering a number and submitting it will be simulated by calling a function. Since the computation is killed after we show the prompt, the way to implement this is by making the user call a toplevel submit function, and before throwing the interaction error, we’ll save the k continuation in a global box:
submit uses the saved continuation:
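Put together, the three pieces might look like the following sketch. The message texts are assumptions, and the interaction helper is a hypothetical addition that catches the aborting error so the example can run as a script rather than at the REPL.

```racket
#lang racket

;; The saved continuation lives in a global box.
(define resumer (box #f))

;; Sending the result kills the computation.
(define (web-display value)
  (error 'web-display "~s" value))

;; Save k first, then show the prompt and die.
(define (web-read/k prompt k)
  (set-box! resumer k)
  (error 'web-read "enter (submit N) to continue the following\n  ~a:" prompt))

;; The user's "form submission": resume the saved continuation.
(define (submit n)
  ((unbox resumer) n))

;; Hypothetical helper: return the text of one interaction.
(define (interaction thunk)
  (with-handlers ([exn:fail? exn-message])
    (thunk)))

(displayln
 (interaction
  (lambda ()
    (web-read/k "A number" (lambda (n) (web-display (* n 2)))))))
(displayln (interaction (lambda () (submit 21))))
```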
For safety, we’ll initialize resumer with a function that throws an error (a real one, not intended for interactions), make web-display reset it to the same function, and also make submit do so after grabbing its value, meaning that submit can only be used after a web-read/k. And for convenience, instead of error we’ll use raise-user-error, a Racket function that throws an error without a stack trace (since our errors are intended). It’s also helpful to disable debugging in DrRacket, so it won’t take us back to the code over and over again.
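A sketch of the hardened simulation. The name nothing-to-do and the message texts are assumptions; the structure follows the description above.

```racket
#lang racket

;; A real error, for misuse of submit (not an interaction).
(define (nothing-to-do ignored)
  (error 'nothing-to-do "No computation to resume."))

(define resumer (box nothing-to-do))

(define (web-display value)
  (set-box! resumer nothing-to-do)    ; nothing left to resume
  (raise-user-error 'web-display "~s" value))

(define (web-read/k prompt k)
  (set-box! resumer k)                ; remember the rest of the work
  (raise-user-error 'web-read
                    "enter (submit N) to continue the following\n  ~a:"
                    prompt))

(define (submit n)
  (define k (unbox resumer))
  (set-box! resumer nothing-to-do)    ; each continuation is used once
  (k n))

;; With nothing pending, submit fails with a real error:
(with-handlers ([exn:fail? (lambda (e) (displayln (exn-message e)))])
  (submit 1))
```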
We can now try out our code for the addition server, using plain argument names instead of <*>s:
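A self-contained sketch of the whole session. The simulation definitions are repeated so that the block runs on its own, and the interaction helper (hypothetical) stands in for typing at the REPL by returning the text of each interaction.

```racket
#lang racket

(define (nothing-to-do ignored)
  (error 'nothing-to-do "No computation to resume."))
(define resumer (box nothing-to-do))
(define (web-display value)
  (set-box! resumer nothing-to-do)
  (raise-user-error 'web-display "~s" value))
(define (web-read/k prompt k)
  (set-box! resumer k)
  (raise-user-error 'web-read "~a (answer with submit)" prompt))
(define (submit n)
  (define k (unbox resumer))
  (set-box! resumer nothing-to-do)
  (k n))

;; Hypothetical helper: return the text of one interaction.
(define (interaction thunk)
  (with-handlers ([exn:fail:user? exn-message])
    (thunk)))

;; The addition server, with plain argument names for the holes.
(define (add-server)
  (web-read/k "First number"
    (lambda (n1)
      (web-read/k "Second number"
        (lambda (n2)
          (web-display (+ n1 n2)))))))

(displayln (interaction add-server))               ; first prompt
(displayln (interaction (lambda () (submit 1))))   ; second prompt
(displayln (interaction (lambda () (submit 2))))   ; the sum
```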
and see how everything works. You can also try now the bogus expression that we mentioned:
and see how it breaks: the first web-read/k saves the identity function as the global resumer, losing the rest of the computation.
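The breakage can be reproduced with a sketch like the following. Here both reads use an identity continuation, which is an assumption about the exact form of the bogus expression; the simulation definitions are repeated so that the block is self-contained.

```racket
#lang racket

(define (nothing-to-do ignored)
  (error 'nothing-to-do "No computation to resume."))
(define resumer (box nothing-to-do))
(define (web-display value)
  (set-box! resumer nothing-to-do)
  (raise-user-error 'web-display "~s" value))
(define (web-read/k prompt k)
  (set-box! resumer k)
  (raise-user-error 'web-read "~a (answer with submit)" prompt))
(define (submit n)
  (define k (unbox resumer))
  (set-box! resumer nothing-to-do)
  (k n))

;; Hypothetical helper: return the text (or value) of one interaction.
(define (interaction thunk)
  (with-handlers ([exn:fail:user? exn-message])
    (thunk)))

;; The reads are nested in the expression instead of being linearized.
(define (bogus-server)
  (web-display (+ (web-read/k "First number" (lambda (n) n))
                  (web-read/k "Second number" (lambda (n) n)))))

(displayln (interaction bogus-server))             ; first prompt
;; The saved continuation is just the identity: submit returns the
;; number itself, and the second read and the sum never happen.
(displayln (interaction (lambda () (submit 1))))
```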
Again, this should be familiar: we’ve taken a simple compound expression and “linearized” it as a sequence of an input operation and a continuation receiver for its result. This is essentially the same thing that we used for dealing with inputs in the lazy language — and the similarity is not a coincidence. The problem that we faced there was very different (representing IO as values that describe it), but it originates from a similar situation — some computation goes on (in whatever way the lazy language decides to evaluate it), and when we have a need to read something we must return a description of this read that contains “the rest of the computation” to the eager part of the interpreter that executes the IO. Once we get the user input, we send it to this computation remainder, which can return another read request, and so on.
Based on this intuition, we can guess that this can work for any piece of code, and that we can even come up with a nicer “flat” syntax for it. For example, here is a simple macro that flattens a sequence of reads and a final display:
and using it:
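One way such a macro could be written, as a sketch: the name web-pipe and its clause syntax are assumptions, and the simulation definitions are repeated so that the block runs on its own.

```racket
#lang racket

(define (nothing-to-do ignored)
  (error 'nothing-to-do "No computation to resume."))
(define resumer (box nothing-to-do))
(define (web-display value)
  (set-box! resumer nothing-to-do)
  (raise-user-error 'web-display "~s" value))
(define (web-read/k prompt k)
  (set-box! resumer k)
  (raise-user-error 'web-read "~a (answer with submit)" prompt))
(define (submit n)
  (define k (unbox resumer))
  (set-box! resumer nothing-to-do)
  (k n))
(define (interaction thunk)   ; hypothetical helper, as a REPL stand-in
  (with-handlers ([exn:fail:user? exn-message])
    (thunk)))

;; Each (read name prompt) clause becomes a web-read/k whose
;; continuation binds name and holds the remaining clauses.
(define-syntax web-pipe
  (syntax-rules (read display)
    [(_ (read name prompt) more ...)
     (web-read/k prompt (lambda (name) (web-pipe more ...)))]
    [(_ (display expr))
     (web-display expr)]))

(define (add-server)
  (web-pipe (read n1 "First number")
            (read n2 "Second number")
            (display (+ n1 n2))))

(displayln (interaction add-server))
(displayln (interaction (lambda () (submit 1))))
(displayln (interaction (lambda () (submit 2))))
```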
However, we’ll avoid such cuteness to make the transformation more explicit for the sake of the discussion. Eventually, we’ll see how things can become even better than that (as done in Racket): we can get to write plain-looking Racket expressions and avoid even the need for an imperative form for the code. In fact, it’s easy to write this addition server using Racket’s web server framework, and the core of the code looks very simple:
There is not much more than that: it has two utilities, page creates a well-formed web page, and web-read performs the reading. The main piece of magic there is in send/suspend, which makes the web server capture the computation’s continuation and store it in a hash table, to be retrieved when the user visits the given URL. Here’s the full code:
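A sketch of what such a servlet might look like, written against the web-server/insta language. It is a reconstruction under assumptions (the form field name n, the page layout, and no input validation), not necessarily the exact listing intended here.

```racket
#lang web-server/insta

;; Wrap some x-expression body elements in a well-formed page.
(define (page . body)
  (response/xexpr `(html (body ,@body))))

;; Show a form and suspend; the computation resumes with the request
;; that the browser sends when the form is submitted to the given URL.
(define (web-read prompt)
  (define request
    (send/suspend
     (lambda (k-url)
       (page `(form ([action ,k-url])
                    ,prompt ": "
                    (input ([type "text"] [name "n"])))))))
  ;; Assumes that the user typed a valid number.
  (string->number
   (extract-binding/single 'n (request-bindings request))))

;; Entry point required by web-server/insta.
(define (start initial-request)
  (page "The sum is: "
        (number->string (+ (web-read "First number")
                           (web-read "Second number")))))
```

Running this module starts a local server and opens a browser, so it is meant to be tried interactively, including with the back button and cloned tabs.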