Lecture #23, Tuesday, March 25th
================================

- Side Effects in a Lazy Language
- Designing Domain Specific Languages (DSLs)

------------------------------------------------------------------------
# Side Effects in a Lazy Language

We've seen that a lazy language without the call-by-need optimization is
too slow to be practical, but the optimization makes using side-effects
extremely confusing. Specifically, when we deal with side-effects (I/O,
mutation, errors, etc) the order of evaluation matters, but in our lazy
interpreter expressions are evaluated only when they are needed.
(Remember tracing the prime-numbers code in Lazy Racket --- numbers are
tested as needed, not in order.) If we can't use side-effects, is there
any point in using a purely functional lazy language at all, given that
computer programs often interact with an imperative world?

There is a solution for this: the lazy language does not have any (sane)
facilities for *doing* things (like `printf` that prints something in
plain Racket), but it can use a data structure that *describes* such
operations. For example, in Lazy Racket we cannot print stuff sanely
using `printf`, but we can construct a string using `format` (which is
just like `printf`, except that it returns the formatted string instead
of printing it). So (assuming Racket syntax for simplicity), instead of:

    (define (foo n)
      (printf "~s + 1 = ~s\n" n (+ n 1)))

we will write:

    (define (foo n)
      (format "~s + 1 = ~s\n" n (+ n 1)))

and get back a string. We can now change the way that our interpreter
deals with the output value that it receives after evaluating a lazy
expression: if it receives a string, then it can take that string as
denoting a request for printout, and simply print it. Such an evaluator
will do the printout when the lazy evaluation is done, and everything
works fine because we don't try to use any side-effects in the lazy
language --- we just describe the desired side-effects, and constructing
such a description does not require *performing* side-effects.
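To make the idea concrete, here is a minimal sketch in plain (eager) Racket rather than Lazy Racket or Sloth: `foo` only builds a string, and a separate driver plays the role of the modified interpreter by printing a string result. The name `run-output` is hypothetical and is not part of any of the course languages.

```racket
#lang racket
;; Sketch in plain Racket (not Lazy Racket or Sloth).
;; `run-output` is a hypothetical stand-in for the modified interpreter.

;; Pure: returns a *description* of the output (a string); prints nothing.
(define (foo n)
  (format "~s + 1 = ~s\n" n (+ n 1)))

;; Eager driver: treats a string result as a request to print it.
(define (run-output result)
  (if (string? result)
      (display result)
      (error 'run-output "not an output request: ~e" result)))

(run-output (foo 41))  ; prints: 41 + 1 = 42
```

Calling `foo` by itself has no visible effect; only the driver turns the description into output.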

But this only solves printing a single string, and nothing else. If we
want to print two strings, then the only thing we can do is concatenate
them --- but not only is that inefficient, it also cannot describe
infinite output (since we cannot construct an infinite string in
memory). So we need a better way to chain several printout
representations. One way to do so is to use a list of strings, but to
make things a little easier to manage, we will create a type for I/O
descriptions --- and populate it with one variant holding a string (for
plain printout) and one for holding a chain of two descriptions (which
can be used to construct an arbitrarily long sequence of descriptions):

    (define-type IO
      [Print  String]
      [Begin2 IO IO])

Now we can use this to chain any number of printout representations by
nesting them into a single tree of `Begin2` requests, which is very
similar to simply using a loop to print the list. For example, the eager
printout code:

    (: print-list : (Listof A) -> Void)
    (define (print-list l)
      (if (null? l)
        (printf "\n")
        (begin (printf "~s " (first l))
               (print-list (rest l)))))

turns to the following code:

    (: print-list : (Listof A) -> IO)
    (define (print-list l)
      (if (null? l)
        (Print "\n")
        (Begin2 (Print (format "~s " (first l)))
                (print-list (rest l)))))

This will basically scan an input list like the eager version, but
instead of printing the list, it will convert it into a single output
request that forms a recipe for this printout. Note that within the lazy
world, the result of `print-list` is just a value; there are no side
effects involved. Turning this value into the actual printout is
something that needs to be done on the eager side, which must be part of
the implementation. In the case of Lazy Racket, we have no access to the
implementation, but in our Sloth implementation we do: again, `run` will
inspect the result and either print a given string (if it gets a `Print`
value), or run two descriptions recursively, one after the other (if it
gets a `Begin2` value). (To implement this, we will add an `IOV` variant
to the `VAL` type definition, and have it contain an `IO` description of
the above type.)
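A minimal sketch of this eager back-end, written in plain Racket with `struct` standing in for the course's `define-type` (so there are no type annotations). The name `run-io` is hypothetical; in Sloth this logic would live inside `run`. Since plain Racket is eager, this version builds the whole description before running it, which is fine for a finite list.

```racket
#lang racket
;; Plain-Racket sketch: `struct`s replace `define-type`, and `run-io`
;; is a hypothetical name for the eager back-end.

(struct Print (str))        ; request to print a string
(struct Begin2 (fst snd))   ; run `fst`, then run `snd`

;; Pure: builds a description of the printout, prints nothing.
(define (print-list l)
  (if (null? l)
      (Print "\n")
      (Begin2 (Print (format "~s " (first l)))
              (print-list (rest l)))))

;; Eager back-end: walks the description and performs the printing.
(define (run-io io)
  (match io
    [(Print s)    (display s)]
    [(Begin2 a b) (run-io a) (run-io b)]))

(run-io (print-list '(1 2 3)))  ; prints "1 2 3 " followed by a newline
```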

Because the sequence is constructed in the lazy world, it will not
require allocating the whole sequence in memory --- it can be forced bit
by bit (using `strict`) as the imperative back-end (the `run` part of the
implementation) follows the instructions in the resulting IO
description. In particular, it will also work on an infinite list: the
translation of an infinite-loop printout function will be one that
returns an infinite IO description tree of `Begin2` values. The back-end
loop will force only what it needs to print, and will go on recursively
printing the whole sequence (possibly never terminating). For example
(again, using Racket syntax), the infinite printout loop

    (: print-loop : -> Void)
    (define (print-loop)
      (printf "foo\n")
      (print-loop))

is translated into a function that returns an infinite tree of print
operations:

    (: print-loop : -> IO)
    (define (print-loop)
      (Begin2 (Print "foo\n")
              (print-loop)))

When this tree is converted to actions, it will result in an infinite
loop that produces the same output --- it is essentially the same
infinite loop, only now it's driven by an infinite description rather
than an infinite process.
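Plain Racket is eager, so to sketch an infinite description there we can wrap the second half of `Begin2` in a promise (`delay`/`force`), which stands in for the lazy language. The `fuel` argument is a hypothetical addition, not something Sloth's `run` has; it only lets the demo stop after a few prints. The runner forces the next piece of the tree only when it is about to run it.

```racket
#lang racket
;; Sketch: `delay`/`force` simulate laziness in plain Racket.
;; `run-io` and its `fuel` argument are hypothetical, for the demo only.

(struct Print (str))
(struct Begin2 (fst snd))   ; `snd` holds a promise of an IO value

;; Describes an infinite printout, but builds it one step at a time.
(define (print-loop)
  (Begin2 (Print "foo\n")
          (delay (print-loop))))

;; Eager runner: performs at most `fuel` prints, forcing the rest of
;; the tree only when needed.  Returns the remaining fuel.
(define (run-io io fuel)
  (cond
    [(zero? fuel) 0]
    [(Print? io) (display (Print-str io)) (sub1 fuel)]
    [(Begin2? io)
     (let ([left (run-io (Begin2-fst io) fuel)])
       (if (zero? left)
           0                                  ; out of fuel: don't force more
           (run-io (force (Begin2-snd io)) left)))]))

(void (run-io (print-loop) 3))  ; prints "foo" three times, then stops
```

Without the fuel limit, the same runner would keep forcing and printing forever, which is exactly the infinite loop described above.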

Finally, how should we deal with inputs? We can add another variant to
our type definition that represents a `read-line` operation, assuming
that like `read-line` it does not require any arguments:

    (define-type IO
      [Print    String]
      [ReadLine ]
      [Begin2   IO IO])

Now the eager implementation can invoke `read-line` when it encounters a
`ReadLine` value --- but what should it do with the resulting string?
Even worse, naively binding a name to the result of `ReadLine`

    (let ([name (ReadLine)])
      (Print (format "Your name is ~a" name)))

doesn't get us the string that is read --- instead, the value is a
*description* of a read operation, which is very different from the
actual string value that we want in the binding.

The solution is to take the "code that acts on the string value" and
make *it* be the argument to `ReadLine`. In the above example, that
would be the `let` expression without the `(ReadLine)` part --- and as
you remember from the time we introduced `fun` into `WAE`, taking away a
named expression from a binding expression leads to a function. With
this in mind, it makes sense to make `ReadLine` take a function value
that represents what to do in the future, once the reading is actually
done.

    (ReadLine (lambda (name)
                (Print (format "Your name is ~a" name))))

This receiver value is a kind of *continuation* of the computation,
provided as a callback value --- it will get the string that was read on
the terminal, and will return a new description of side-effects that
represents the rest of the process:

    (define-type IO
      [Print    String]
      [ReadLine (String -> IO)]
      [Begin2   IO IO])

Now, when the eager side sees a `ReadLine` value, it will read a line,
and invoke the callback function with the string that it has read. By
doing this, the control goes back to the lazy world to process the value
and get back another IO value to continue the processing. This results
in a process where the lazy code generates some IO descriptions, then
the imperative side executes them and control goes back to the lazy
code, then back to the imperative side, etc.
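Here is one possible sketch of that back-and-forth in plain Racket, again using `struct`s instead of `define-type` and a hypothetical `run-io`. To keep it testable, the demo feeds input from a string with `with-input-from-string` instead of the real terminal.

```racket
#lang racket
;; Sketch: `struct`s replace `define-type`; `run-io` is hypothetical.

(struct Print (str))
(struct ReadLine (k))       ; k : String -> IO, the continuation
(struct Begin2 (fst snd))

;; Eager side: on `ReadLine` it performs the real read, then calls the
;; callback, which (in the pure code) returns the next IO description.
(define (run-io io)
  (match io
    [(Print s)    (display s)]
    [(ReadLine k) (run-io (k (read-line)))]
    [(Begin2 a b) (run-io a) (run-io b)]))

;; The example from the text: the callback receives the string that was read.
(define greet
  (ReadLine (lambda (name)
              (Print (format "Your name is ~a" name)))))

;; Simulated terminal input instead of the real one.
(with-input-from-string "Alice\n"
  (lambda () (run-io greet)))  ; prints: Your name is Alice
```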

As a more verbose example of all of the above, this silly loop:

    (: silly-loop : -> Void)
    (define (silly-loop)
      (printf "What is your name? ")
      (let ([name (read-line)])
        (if (equal? name "quit")
          (printf "bye\n")
          (begin (printf "Your name is ~s\n" name)
                 (silly-loop)))))

is now translated to:

    (: silly-loop : -> IO)
    (define (silly-loop)
      (Begin2 (Print "What is your name? ")
              (ReadLine
               (lambda (name)
                 (if (equal? name "quit")
                   (Print "bye\n")
                   (Begin2 (Print (format "Your name is ~s\n" name))
                           (silly-loop)))))))
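To try the translated `silly-loop`, here is a self-contained sketch in plain Racket (a hypothetical `run-io`, and `struct`s instead of `define-type`) with simulated terminal input. Even though plain Racket evaluates the arguments of `Begin2` eagerly, this does not loop forever: the recursive `(silly-loop)` call sits inside the `ReadLine` callback, so it runs only after a line has been read.

```racket
#lang racket
;; Sketch: `struct`s replace `define-type`; `run-io` is hypothetical.

(struct Print (str))
(struct ReadLine (k))       ; k : String -> IO
(struct Begin2 (fst snd))

;; The translated loop from the text: pure, it only builds descriptions.
(define (silly-loop)
  (Begin2 (Print "What is your name? ")
          (ReadLine
           (lambda (name)
             (if (equal? name "quit")
                 (Print "bye\n")
                 (Begin2 (Print (format "Your name is ~s\n" name))
                         (silly-loop)))))))

;; Eager back-end that performs the described I/O.
(define (run-io io)
  (match io
    [(Print s)    (display s)]
    [(ReadLine k) (run-io (k (read-line)))]
    [(Begin2 a b) (run-io a) (run-io b)]))

;; Simulated session: one name, then "quit".
(with-input-from-string "Alice\nquit\n"
  (lambda () (run-io (silly-loop))))
```

Note that `~s` prints the name with quotes, so this session shows `Your name is "Alice"` before the second prompt and the final `bye`.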

Using this strategy to implement side-effects is possible, and you will
do that in the homework --- some technical details are going to be
different but the principle is the same as discussed above. The last
problem is that the above code is difficult to work with --- in the
homework you will see how to use syntactic abstractions to make things
much simpler.


------------------------------------------------------------------------
# Designing Domain Specific Languages (DSLs)

> [PLAI §35]

Programming languages differ in numerous ways:

1. Each uses different notations for writing down programs. As we've
   observed, however, syntax is only partially interesting. (This is
   less true of languages that are trying to mirror the notation of a
   particular domain.)

2. Control constructs: for instance, early languages didn't even support
   recursion, while most modern languages still don't have
   continuations.

3. The kinds of data they support. Indeed, sophisticated languages like
   Racket blur the distinction between control and data by making
   fragments of control into data values (such as first-class functions
   and continuations).

4. The means of organizing programs: do they have functions, modules,
   classes, namespaces, ...?

5. Automation such as memory management, run-time safety checks, and so
   on.

Each of these items suggests natural questions to ask when you design
your own languages in particular domains.

Obviously, there are a lot of domain specific languages these days ---
and that's not new. For example, four of the oldest languages were
conceived as domain specific languages:

 * **Fortran** --- *Formula Translator*
 * **Algol**   --- *Algorithmic Language*
 * **Cobol**   --- *Common Business Oriented Language*
 * **Lisp**    --- *List Processing*

Only in the late 60s and early 70s did languages begin to break free of
their special purpose domains and become *general purpose* languages
(GPLs). These days, we usually use some GPL for our programs and often
come up with small *domain specific* languages (DSLs) for specific jobs.
The problem is designing such a specific language. There are lots of
decisions to make, and as should be clear now, many ways of shooting
yourself in the foot. You need to know:

* What is your domain?

* What are the common notations in this domain (need to be convenient
  both for the machine and for humans)?

* What do you expect to get from your DSL? (eg, performance gains when
  you know that you're dealing with a certain limited kind of
  functionality like arithmetic.)

* Do you have any semantic reason for a new language? (For example,
  using special scoping rules, or a mixture of lazy and eager
  evaluation, maybe a completely different way of evaluation (eg,
  makefiles).)

* Is your language expected to envelop other functionality (eg, shell
  scripts, TCL), perhaps delegating some functionality to a different
  language (makefiles and shell scripts), or is it going to be embedded
  in a bigger application (eg, PHP), or embedded in a way that exposes
  parts of an application to user automation (Emacs Lisp, Word Basic,
  Visual Basic for Office Application or Some Other Long List of
  Buzzwords)?

* If you have one language embedded in another enveloping language ---
  how do you handle syntax? How can they communicate (eg, share
  variables)?

And very important:

* Is there a benefit to implementing a DSL over using a GPL --- how
  much will your DSL grow (usually more than you think)? Will it get to
  a point where it will need the power of a full GPL? Do you want to
  risk doing this just to end up admitting that you need a "Real
  Language" and dump your solution for "Visual Basic for Applications"?
  (It might be useful to think ahead about things that you know you
  don't need, rather than things you need.)

To clarify why this can be applicable in more situations than you think,
consider what programming languages are used for. One example that
should not be ignored is using a programming language to implement a
programming language --- for example, what we did so far (or any other
interpreter or compiler). In the same way that some piece of code in a
PL represents functions about the "real world", there are other programs
that represent things in a language --- possibly even the same one. For
a side-effect-full example, a `one-brick` function might abstract over
laying a brick when making a wall --- it abstracts all the little
details into a function:

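    ;; move-eye, move-hand, grab-object, etc. are imaginary robot primitives
    (define (one-brick wall brick-pile)
      ;; pick up a brick from the pile
      (move-eye (location brick-pile))
      (let ([pos (find-available-brick-position brick-pile)])
        (move-hand pos)
        (grab-object))
      ;; put it at the next free spot on the wall
      (move-eye (location wall))
      (let ([pos (find-next-brick-position wall)])
        (move-hand pos)
        (drop-object)))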

and we can now write

    (one-brick my-wall my-brick-pile)

instead of all of the above. We might use that in a loop:

    (define (build-wall wall pile)
      (define (loop n)
        (when (< n 500)
          (one-brick wall pile)
          (loop (add1 n))))
      (loop 0))

This is a common piece of looping code that we've seen in many forms,
and a common complaint of newcomers to functional languages is the lack
of some kind of a loop. But once you know the template, writing such
loops is easy --- and in fact, you can write code that would take
something like:

    (define (build-wall wall pile)
      (loop-for i from 1 to 500
        (one-brick wall pile)))

and produce the previous code. Note the main point here: we switch from
code that deals with bricks to code that deals with code.
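As a rough sketch of such a code-to-code transformation, `loop-for` can be written as a Racket macro. The keyword syntax and the inclusive upper bound are assumptions based on the example above (so `from 1 to 500` runs the body 500 times, like the `build-wall` loop); the macro expands into the same kind of counting recursion that was written by hand earlier.

```racket
#lang racket
;; Sketch: `loop-for` as a macro; `from`/`to` are matched as literal keywords.
;; The inclusive range is an assumption for this example.

(define-syntax loop-for
  (syntax-rules (from to)
    [(_ var from lo to hi body ...)
     (let ([end hi])                 ; evaluate the upper bound once
       (let loop ([var lo])          ; counting recursion, like `build-wall`
         (when (<= var end)
           body ...
           (loop (add1 var)))))]))

;; Usage: collect the loop variable to show which values it takes.
(define seen '())
(loop-for i from 1 to 5
  (set! seen (cons i seen)))
(reverse seen)  ; '(1 2 3 4 5)
```

Because `syntax-rules` macros are hygienic, the `loop` and `end` names introduced by the expansion cannot clash with names used in the body.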

Now, a viable option for implementing a new DSL is to do so by
transforming it into an existing language. Such a process is usually
tedious and error prone --- tedious because you need to deal with the
boring parts of a language (making a parser etc), and error prone
because it's easy to generate bad code (especially when you're dealing
with strings) and you get bad errors in terms of the translated code
instead of the actual code, forcing you to debug the intermediate
generated programs. Lisp languages have traditionally taken this idea
one level further than other languages: instead of writing a new
transformer for your language, you use the host language, but you extend
and customize it by adding your own forms.