PL: Lecture #17  Tuesday, March 22nd

Compilation and Partial Evaluation (contd.)

Getting our thing closer to a compiler is done in a similar way — we push the (lambda (env) ...) inside the various cases. (Note that compile* depends on the env argument, so it also needs to move inside — this is done for all cases that use it, and will eventually go away.) We actually need to use (lambda ([env : ENV]) ...) though, to avoid upsetting the type checker:

(: compile : TOY -> ENV -> VAL)
;; compiles TOY expressions to Racket functions.
(define (compile expr)
  (cases expr
    [(Num n)   (lambda ([env : ENV]) (RktV n))]
    [(Id name) (lambda ([env : ENV]) (lookup name env))]
    [(Bind names exprs bound-body)
     (lambda ([env : ENV])
       (: compile* : TOY -> VAL)
       (define (compile* expr) ((compile expr) env))
       ((compile bound-body)
        (extend names (map compile* exprs) env)))]
    [(Fun names bound-body)
     (lambda ([env : ENV]) (FunV names bound-body env))]
    [(Call fun-expr arg-exprs)
     (lambda ([env : ENV])
       (: compile* : TOY -> VAL)
       (define (compile* expr) ((compile expr) env))
       (let ([fval (compile* fun-expr)]
             [arg-vals (map compile* arg-exprs)])
         (cases fval
           [(PrimV proc) (proc arg-vals)]
           [(FunV names body fun-env)
            ((compile body) (extend names arg-vals fun-env))]
           [else (error 'call ; this is *not* a compilation error
                        "function call with a non-function: ~s"
                        fval)])))]
    [(If cond-expr then-expr else-expr)
     (lambda ([env : ENV])
       (: compile* : TOY -> VAL)
       (define (compile* expr) ((compile expr) env))
       (compile* (if (cases (compile* cond-expr)
                       [(RktV v) v] ; Racket value => use as boolean
                       [else #t])   ; other values are always true
                     then-expr
                     else-expr)))]))

With this we have shifted a bit of actual work to compile time: the code that checks what structure we have and extracts its different slots. But this is still not good enough. Only the first, top-level cases is moved to compile time; the recursive calls to compile are still there in the resulting closures. In other words, the Racket closures that our compiler produces still contain calls to compile, which, as discussed above, means that it’s not an actual compiler yet.

For example, consider the Bind case:

[(Bind names exprs bound-body)
 (lambda ([env : ENV])
   (: compile* : TOY -> VAL)
   (define (compile* expr) ((compile expr) env))
   ((compile bound-body)
    (extend names (map compile* exprs) env)))]

At compile time we identify and deconstruct the Bind structure, then create the runtime closure that will access these parts when the code runs. But this closure will itself call compile on bound-body and on each of the expressions. Both of these calls can be done at compile time, since they only need the expressions and don’t depend on the environment. Note that compile* turns into run here, since all it does is run a compiled expression on the current environment.

[(Bind names exprs bound-body)
 (let ([compiled-body (compile bound-body)]
       [compiled-exprs (map compile exprs)])
   (lambda ([env : ENV])
     (: run : (ENV -> VAL) -> VAL)
     (define (run compiled-expr) (compiled-expr env))
     (compiled-body (extend names
                            (map run compiled-exprs)
                            env))))]

We can move run back up, out of the resulting functions, by making it a function that consumes an environment and returns a “caller” function:

(define (compile expr)
  ;; convenient helper
  (: caller : ENV -> (ENV -> VAL) -> VAL)
  (define (caller env)
    (lambda (compiled) (compiled env)))
  (cases expr
    ...
    [(Bind names exprs bound-body)
     (let ([compiled-body (compile bound-body)]
           [compiled-exprs (map compile exprs)])
       (lambda ([env : ENV])
         (compiled-body (extend names
                                (map (caller env) compiled-exprs)
                                env))))]
    ...))

Once this is done, we have a bunch of work that can happen at compile time: we pre-scan the main “bind spine” of the code.
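To make the compile-time/run-time split concrete, here is a minimal sketch in plain untyped Racket rather than the course language. The cut-down AST (only Num, Id and Bind), the association-list environments, and the compile-calls counter are all assumptions made for this illustration; they are not the course’s TOY definitions.

```racket
#lang racket
;; Assumed cut-down TOY AST for this sketch: numbers, identifiers, Bind.
(struct Num (n))
(struct Id (name))
(struct Bind (names exprs body))

;; Environments as association lists (an assumption of this sketch).
(define (extend names vals env)
  (append (map cons names vals) env))
(define (lookup name env)
  (cond [(assq name env) => cdr]
        [else (error 'lookup "no binding for ~s" name)]))

;; Hypothetical counter that makes the calls to compile visible.
(define compile-calls 0)

;; compile : AST -> (ENV -> value)
(define (compile expr)
  ;; convenient helper, as in the text
  (define ((caller env) compiled) (compiled env))
  (set! compile-calls (add1 compile-calls))
  (match expr
    [(Num n) (lambda (env) n)]
    [(Id name) (lambda (env) (lookup name env))]
    [(Bind names exprs body)
     ;; both recursive compilations happen before the closure is built
     (let ([compiled-body (compile body)]
           [compiled-exprs (map compile exprs)])
       (lambda (env)
         (compiled-body
          (extend names (map (caller env) compiled-exprs) env))))]))

(define program
  (Bind '(x y) (list (Num 1) (Num 2))
        (Bind '(z) (list (Id 'x))
              (Id 'z))))

(define compiled (compile program))        ; the whole bind spine is scanned here
(define calls-after-compiling compile-calls)
(define result-1 (compiled '()))           ; running does not call compile
(define result-2 (compiled '()))
```

After compiled is built, running it any number of times leaves compile-calls unchanged, which is the property the text is after for the Bind case.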

We can deal in a similar way with other occurrences of compile calls in compiled code. The two branches that need to be fixed are:

  1. In the If branch, there is not much to do. After we make it pre-compile the cond-expr, we also need to make it pre-compile both the then-expr and the else-expr. This might seem like doing more work (before the change, only one of them would get compiled), but since this is compile-time work, it’s not as important. Also, an If expression is often evaluated many times (being part of a loop, for example), so overall we still win.

  2. The Call branch is a little trickier: the problem here is that the expressions that are compiled are coming from the closure that is being applied. The solution for this is obvious: we need to change the closure type so that it closes over compiled expressions instead of over plain ones. This makes sense because closures are run-time values, so they need to close over the compiled expressions since this is what we use as “code” at run-time.
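Both fixes can be sketched in a cut-down, untyped version of the compiler. This is only an illustration under stated assumptions: the structs below are hypothetical stand-ins for the TOY AST, values are raw Racket values instead of RktV wrappers, primitives are plain Racket procedures stored in the environment, and FunV is changed to hold a compiled body.

```racket
#lang racket
;; Assumed cut-down TOY AST for this sketch.
(struct Num (n))
(struct Id (name))
(struct Fun (names body))
(struct Call (fun-expr arg-exprs))
(struct If (cond-expr then-expr else-expr))

;; Run-time closure: it closes over a *compiled* body, not over an AST.
(struct FunV (names compiled-body env))

(define (extend names vals env)
  (append (map cons names vals) env))
(define (lookup name env)
  (cond [(assq name env) => cdr]
        [else (error 'lookup "no binding for ~s" name)]))

(define (compile expr)
  (define ((caller env) compiled) (compiled env))
  (match expr
    [(Num n) (lambda (env) n)]
    [(Id name) (lambda (env) (lookup name env))]
    [(Fun names body)
     ;; the body is compiled once, when the Fun expression is compiled
     (let ([compiled-body (compile body)])
       (lambda (env) (FunV names compiled-body env)))]
    [(Call fun-expr arg-exprs)
     (let ([compiled-fun (compile fun-expr)]
           [compiled-args (map compile arg-exprs)])
       (lambda (env)
         (let ([fval (compiled-fun env)]
               [arg-vals (map (caller env) compiled-args)])
           (match fval
             [(FunV names compiled-body fun-env)
              (compiled-body (extend names arg-vals fun-env))]
             [(? procedure? prim) (apply prim arg-vals)]
             [_ (error 'call "function call with a non-function: ~s"
                       fval)]))))]
    [(If cond-expr then-expr else-expr)
     ;; all three parts are compiled ahead of time
     (let ([compiled-cond (compile cond-expr)]
           [compiled-then (compile then-expr)]
           [compiled-else (compile else-expr)])
       (lambda (env)
         ((if (compiled-cond env) compiled-then compiled-else) env)))]))

;; Primitives as plain Racket procedures (an assumption of this sketch).
(define global-env (list (cons '< <) (cons '+ +)))

(define program
  (Call (Fun '(x)
             (If (Call (Id '<) (list (Id 'x) (Num 10)))
                 (Call (Id '+) (list (Id 'x) (Num 1)))
                 (Num 0)))
        (list (Num 5))))

(define result ((compile program) global-env))
```

In this version no closure returned by compile mentions compile or any AST struct, so the run-time code never looks at the source program.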

Again, the goal is to have no compile calls that happen at runtime: they should all happen before that. This would allow us, for example, to obliterate the compiler once it has done its work, similar to how you don’t need GCC to run a C application. Yet another way to look at this is that we shouldn’t look at the AST at runtime. Again, the analogy to GCC is that the AST is a data structure that the compiler uses, and it does not exist at runtime. Any runtime reference to the TOY AST is, therefore, as bad as any runtime reference to compile.

When we’re done with this process we’ll have something that is an actual compiler: translating TOY programs into Racket closures. To see how this is an actual compiler consider the fact that Racket uses a JIT to translate bytecode into machine code when it’s running functions. This means that the compiled versions of our TOY programs are, in fact, translated all the way down to machine code.

Yet another way to see this is to change the compiler code so instead of producing a Racket closure it spits out the Racket code that makes up these closures when evaluated. For example, change

[(Bind names exprs bound-body)
 (let ([compiled-body (compile bound-body)]
       [compiled-exprs (map compile exprs)])
   (lambda ([env : ENV])
     (compiled-body (extend ...))))]

into

[(Bind names exprs bound-body)
 (let ([compiled-body (compile bound-body)]
       [compiled-exprs (map compile exprs)])
   (string-append
    "(lambda ([env : ENV]) ("
    compiled-body
    " (extend ...)))"))]

so we get a string that is a Racket program. But since we’re using a Lisp dialect, it’s generally better to use S-expressions instead:

[(Bind names exprs bound-body)
 (let ([compiled-body (compile bound-body)]
       [compiled-exprs (map compile exprs)])
   `(lambda ([env : ENV])
      (,compiled-body (extend ...))))]

(Later in the course we’ll talk about these “`”s and “,”s. For now, it’s enough to know that “`” is kind of like a quote, and “,” is an unquote.)
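As a rough illustration of the S-expression approach, the following sketch generates Racket code as data and then evaluates it. The mini-AST (Num and Add) and the name compile->sexpr are hypothetical and much smaller than TOY; the environment argument is kept only to mirror the shape of the compiler above.

```racket
#lang racket
;; Hypothetical mini-AST: numbers and binary additions.
(struct Num (n))
(struct Add (lhs rhs))

;; compile->sexpr : AST -> S-expression that is Racket source code
(define (compile->sexpr expr)
  (match expr
    [(Num n) `(lambda (env) ,n)]
    [(Add l r)
     ;; the code for the sub-expressions is spliced in with unquote
     `(lambda (env)
        (+ (,(compile->sexpr l) env)
           (,(compile->sexpr r) env)))]))

;; code is plain data: a list that happens to be a Racket program
(define code (compile->sexpr (Add (Num 1) (Add (Num 2) (Num 3)))))

;; Evaluating that data turns it into an actual closure.
(define compiled (eval code (make-base-namespace)))
(define result (compiled '()))
```

Printing code shows the generated program, and the generated text contains no reference to compile->sexpr or to the AST structs.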

Lazy Evaluation: Using a Lazy Racket

PLAI §7 (done with Haskell)

For this part, we will use a new language, Lazy Racket.

#lang pl lazy

As the name suggests, this is a version of the normal (untyped) Racket language that is lazy.

First of all, let’s verify that this is indeed a lazy language:

> (define (foo x) 3)
> (foo (+ 1 "2"))
3

That went without a problem: the argument expression was indeed not evaluated. In this language, you can treat all expressions as future promises to evaluate. There are certain points where such promises are actually forced, and all of these stem from some need to print a resulting value. In our case, it’s the REPL that prints such values:

> (+ 1 "2")
+: expects type <number> as 2nd argument,
given: "2"; other arguments were: 1

The expression by itself only generates a promise, but when we want to print it, this promise is forced to evaluate — this forces the addition, which forces its arguments (plain values rather than computation promises), and at this stage we get an error. (If we never want to see any results, then the language will never do anything at all.) So a promise is forced either when a value printout is needed, or if it is needed to recursively compute a value to print:

> (* 1 (+ 2 "3"))
+: expects type <number> as 2nd argument,
given: "3"; other arguments were: 2

Note that the error was raised by the internal expression: the outer expression uses *, which needs the value of its second argument, and the inner + in turn requires actual values, not promises.
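For comparison, plain (eager) Racket has explicit promises via delay and force. The sketch below only models what the lazy language does implicitly; the names ignored-result, bad and forced-outcome are made up for the illustration.

```racket
#lang racket
;; In eager Racket, promises must be created and forced by hand.
(define (foo x) 3)

;; The argument is wrapped in a promise, so the bogus addition never runs.
(define ignored-result (foo (delay (+ 1 "2"))))

;; Forcing the promise is what actually runs the addition and raises the error.
(define bad (delay (+ 1 "2")))
(define forced-outcome
  (with-handlers ([exn:fail:contract? (lambda (e) 'error-when-forced)])
    (force bad)))
```

Here ignored-result is 3, while forced-outcome records that the error shows up only at the point where the promise is forced.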

Another example, which is now obvious, is that we can now define an if function:

> (define (my-if x y z) (if x y z))
> (my-if (< 1 2) 3 (+ 4 "5"))
3

Actually, in this language if, and, and or are all function values instead of special forms:

> (list if and or)
(#<procedure:if> #<procedure:and> #<procedure:or>)
> ((third (list if and or)) #t (+ 1 "two"))
#t

(By now, you should know that these have no value in Racket: using them like this in plain Racket will lead to syntax errors.) There are some primitives that do not force their arguments. Constructors fall in this category, for example cons and list:

> (define (fib n) (if (<= n 1) n (+ (fib (- n 1)) (fib (- n 2)))))
> (define a (list (+ 1 2) (+ 3 "4") (fib 30) (* 5 6)))

Nothing — the definition simply worked, but that’s expected, since nothing is printed. If we try to inspect this value, we can get some of its parts, provided we do not force the bogus one:

> (first a)
3
> (fourth a)
30
> (third a)
196418
> (second a)
+: contract violation, expected: number?, given: "4" ...

The same holds for cons:

> (second (cons 1 (cons 2 (first null))))
2

Now if this is the case, then how about this:

> (define ones (cons 1 ones))

Everything is fine, as expected — but what is the value of ones now? Clearly, it is a list that has 1 as its first element:

> (first ones)
1

But what do we have in the tail of this list? We have ones which we already know is a list that has 1 in its first place — so following Racket’s usual rules, it means that the second element of ones is, again, 1. If we continue this, we can see that ones is, in fact, an infinite list of 1s:

> (second ones)
1
> (fifth ones)
1

In this sense, the way define behaves is that it defines a true equation: if ones is defined as (cons 1 ones), then the real value does satisfy

(equal? ones (cons 1 ones))

which means that the value is the fixpoint of the defined expression.

We can use append in a similar way:

> (define foo (append (list 1 2 3) foo))
> (fourth foo)
1

This looks like it has some common theme with the discussion of implementing recursive environments: it actually demonstrates that in this language, letrec can be used for simple values too. First of all, a side note: here is an expression that indicated a bug in our substituting evaluator:

> (let ([x (list y)])
    (let ([y 1])
      x))
reference to undefined identifier: y

When our evaluator returned 1 for this, we noticed that this was a bug: it does not obey the lexical scoping rules. As seen above, Lazy Racket is correctly using lexical scope. Now we can go back to the use of letrec — what do we get by this definition:

> (define twos (let ([xs (cons 2 xs)]) xs))

we get an error about xs being undefined.

xs is unbound because of the usual scope that let uses. How can we make this work? — We simply use letrec:

> (define twos (letrec ([xs (cons 2 xs)]) xs))
> (first twos)
2

As expected, trying to print an infinite list would cause an infinite loop, which DrRacket catches, printing the list in that weird way:

> twos
#0=(2 . #0#)

How would we inspect an infinite list? We write a function that returns part of it:

> (define (take n l)
    (if (or (<= n 0) (null? l))
      null
      (cons (first l) (take (sub1 n) (rest l)))))
> (take 10 twos)
(2 2 2 2 2 2 2 2 2 2)
> (define foo (append (list 1 2 3) foo))
> (take 10 foo)
(1 2 3 1 2 3 1 2 3 1)

Dealing with infinite lists can lead to lots of interesting things, for example:

> (define fibs (cons 1 (cons 1 (map + fibs (rest fibs)))))
> (take 10 fibs)
(1 1 2 3 5 8 13 21 34 55)

To see how it works, see what you know about fibs[n] which will be our notation for the nth element of fibs (starting from 1):

fibs[1] = 1  because of the first `cons'
fibs[2] = 1  because of the second `cons'

and for all n>2:

fibs[n] = (map + fibs (rest fibs))[n-2]
        = fibs[n-2] + (rest fibs)[n-2]
        = fibs[n-2] + fibs[n-2+1]
        = fibs[n-2] + fibs[n-1]

so it follows the exact definition of Fibonacci numbers.
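The same definition can be approximated in plain (eager) Racket using the lazy streams from racket/stream, where stream-cons delays both of its arguments. This is a sketch and not Lazy Racket itself; stream-map2 and take-stream are helper names introduced here.

```racket
#lang racket
(require racket/stream)

;; Lazily combine two streams element by element (helper for this sketch).
(define (stream-map2 f s t)
  (stream-cons (f (stream-first s) (stream-first t))
               (stream-map2 f (stream-rest s) (stream-rest t))))

;; stream-cons delays its arguments, so referring to fibs inside its own
;; definition is fine: the reference is only followed on demand.
(define fibs
  (stream-cons 1 (stream-cons 1 (stream-map2 + fibs (stream-rest fibs)))))

;; Like the take in the text, but over streams; returns an ordinary list.
(define (take-stream n s)
  (if (or (<= n 0) (stream-empty? s))
      '()
      (cons (stream-first s) (take-stream (sub1 n) (stream-rest s)))))

(define first-ten (take-stream 10 fibs))
```

first-ten is the list (1 1 2 3 5 8 13 21 34 55), matching the Lazy Racket interaction above.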

Note that the list examples demonstrate that laziness applies to nested values (actually, nested computations) too: a value that is not needed is not computed, even if it is contained in a value that is needed. For example, in:

(define x (/ 1 0))
(if (list (+ 1 x)) 1 2)

the if needs to know only whether its first argument (note: it is an argument, since this if is a function) is #f or not. Once it is determined that it is a pair (a cons cell), there is no need to actually look at the values inside the pair, and therefore (+ 1 x) (and more specifically, x) is never evaluated and we see no error.
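The nested case can also be modeled with explicit promises in plain Racket. This is a sketch of the idea, not of how Lazy Racket is implemented; answer and forced-x are names made up for the illustration.

```racket
#lang racket
;; x is a promise for a computation that would fail if it ever ran.
(define x (delay (/ 1 0)))

;; The list holds an unforced promise; if only needs to know that the
;; list is not #f, so the division is never attempted.
(define answer (if (list (delay (+ 1 (force x)))) 1 2))

;; Forcing x directly is what triggers the error.
(define forced-x
  (with-handlers ([exn:fail:contract:divide-by-zero?
                   (lambda (e) 'division-by-zero)])
    (force x)))
```

answer is 1, and the division error appears only when x itself is forced.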