PL: Lecture #16  Tuesday, March 8th

Compilation and Partial Evaluation

Instead of interpreting an expression, that is, performing a full evaluation, we can think about compiling it: translating it to a different language that we can later run more easily, more efficiently, on more platforms, etc. Another feature usually associated with compilation is that much more work is done at the compilation stage, making the actual running of the code faster.

For example, translating an AST into one that has de Bruijn indexes instead of identifier names is a form of compilation: not only does it translate one language into another, it also does the work involved in name lookup before the program starts running.
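To make the de Bruijn example concrete, here is a minimal sketch in plain `#lang racket` (not the course's `#lang pl`). It works on a hypothetical one-argument lambda-calculus syntax rather than the TOY AST, and `to-de-bruijn` is an illustrative name. Each identifier is replaced by the (0-based) number of binders between the reference and its `lambda`, so the search for names happens once, during the translation:

```racket
#lang racket
;; Hypothetical term syntax (not TOY):
;;   x                      ; a variable (a symbol)
;;   (lambda (x) body)      ; a one-argument function
;;   (f arg)                ; an application
;; scope holds the bound names, innermost first.
(define (to-de-bruijn expr [scope '()])
  (match expr
    [(? symbol? x)
     ;; the name lookup is done here, at translation time
     (or (index-of scope x)
         (error 'to-de-bruijn "free identifier: ~a" x))]
    [(list 'lambda (list x) body)
     ;; the parameter name disappears from the output
     (list 'lambda (to-de-bruijn body (cons x scope)))]
    [(list f arg)
     (list (to-de-bruijn f scope) (to-de-bruijn arg scope))]))

(to-de-bruijn '(lambda (x) (lambda (y) (x y))))
;; => '(lambda (lambda (1 0)))
```

Shadowing comes out right because `index-of` returns the first, that is innermost, match in the scope list.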

This is something that we can experiment with now. An easy way to achieve this is to start with our evaluation function:

(: eval : TOY ENV -> VAL)
;; evaluates TOY expressions
(define (eval expr env)
  ;; convenient helper
  (: eval* : TOY -> VAL)
  (define (eval* expr) (eval expr env))
  (cases expr
    [(Num n)   (RktV n)]
    [(Id name) (lookup name env)]
    [(Bind names exprs bound-body)
     (eval bound-body (extend names (map eval* exprs) env))]
    [(Fun names bound-body)
     (FunV names bound-body env)]
    [(Call fun-expr arg-exprs)
     (let ([fval     (eval* fun-expr)]
           [arg-vals (map eval* arg-exprs)])
       (cases fval
         [(PrimV proc) (proc arg-vals)]
         [(FunV names body fun-env)
          (eval body (extend names arg-vals fun-env))]
         [else (error 'eval "function call with a non-function: ~s"
                      fval)]))]
    [(If cond-expr then-expr else-expr)
     (eval* (if (cases (eval* cond-expr)
                  [(RktV v) v] ; Racket value => use as boolean
                  [else #t])   ; other values are always true
                then-expr
                else-expr))]))

and change it so it compiles a given expression to a Racket function. (This is, of course, just to demonstrate a conceptual point; it is only the tip of what compilers actually do…) This means that we need to turn it into a function that receives a TOY expression and compiles it. In other words, eval no longer consumes an environment argument. This makes sense, because the environment is a place to hold run-time values, so it is a data structure that is not part of the compiler (it is usually represented as the call stack).

So we split the two arguments into a compile-time part and a run-time part, which can be done by simply currying the eval function. Here is the result, with all calls to eval curried as well:

(: eval : TOY -> ENV -> VAL) ;*** note the curried type
;; evaluates TOY expressions
(define (eval expr)
  (lambda (env)
    ;; convenient helper
    (: eval* : TOY -> VAL)
    (define (eval* expr) ((eval expr) env))
    (cases expr
      [(Num n)   (RktV n)]
      [(Id name) (lookup name env)]
      [(Bind names exprs bound-body)
       ((eval bound-body) (extend names (map eval* exprs) env))]
      [(Fun names bound-body)
       (FunV names bound-body env)]
      [(Call fun-expr arg-exprs)
       (let ([fval     (eval* fun-expr)]
             [arg-vals (map eval* arg-exprs)])
         (cases fval
           [(PrimV proc) (proc arg-vals)]
           [(FunV names body fun-env)
            ((eval body) (extend names arg-vals fun-env))]
           [else (error 'eval
                        "function call with a non-function: ~s"
                        fval)]))]
      [(If cond-expr then-expr else-expr)
       (eval* (if (cases (eval* cond-expr)
                    [(RktV v) v] ; Racket value => use as boolean
                    [else #t])   ; other values are always true
                  then-expr
                  else-expr))])))
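The transformation applied to eval is purely mechanical. As a sketch in plain Racket, with a hypothetical helper `curry2` and a made-up function `scale-by`, here is the same currying step applied to an ordinary two-argument function. Racket's built-in `curry` (from `racket/function`, which `#lang racket` includes) does the same job more generally:

```racket
#lang racket
;; curry2 (hypothetical helper): turn a two-argument function into one
;; that takes the first argument and returns a function of the second.
(define (curry2 f)
  (lambda (x)
    (lambda (y) (f x y))))

;; An ordinary two-argument function...
(define (scale-by factor n) (* factor n))

;; ...and its curried form: `factor` is supplied in a first stage,
;; `n` in a second one.
(define scale-by/curried (curry2 scale-by))
(define triple (scale-by/curried 3))  ; first stage: only factor is known

(triple 14)               ; => 42
((curry scale-by 3) 14)   ; => 42, using the built-in curry
```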

We also need to change the eval call in the main run function:

(: run : String -> Any)
;; evaluate a TOY program contained in a string
(define (run str)
  (let ([result ((eval (parse str)) global-environment)])
    (cases result
      [(RktV v) v]
      [else (error 'run "evaluation returned a bad value: ~s"
                   result)])))

Not much has changed so far.

Note that in the general case of a compiler we need to run a program several times, so we’d want to avoid parsing it over and over again. We can do that by keeping a single parsed AST of the input. Now we can go one step further: we can do more work ahead of time and keep the result of the first “stage” of eval around (except that “more work” is really not saying much at the moment):

(: run : String -> Any)
;; evaluate a TOY program contained in a string
(define (run str)
  (let* ([compiled (eval (parse str))]
         [result   (compiled global-environment)])
    (cases result
      [(RktV v) v]
      [else (error 'run "evaluation returned a bad value: ~s"
                   result)])))
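A rough sketch of what keeping the first stage around buys us, using a hypothetical mini-language instead of TOY: an expression is a number, a symbol, or `(list '+ a b)`, and the environment is an immutable hash. `eval/curried` is an illustrative name, and the dispatch uses `match` from `racket/match`:

```racket
#lang racket
;; Curried evaluator for the hypothetical mini-language:
;; first stage takes the expression, second stage the environment.
(define (eval/curried expr)
  (lambda (env)
    (match expr
      [(? number? n) n]
      [(? symbol? x) (hash-ref env x)]
      [(list '+ a b) (+ ((eval/curried a) env)
                        ((eval/curried b) env))])))

;; Do the first stage once and keep the result...
(define compiled (eval/curried '(+ x (+ x 1))))

;; ...then run it as often as needed, with different environments.
(compiled (hash 'x 1))    ; => 3
(compiled (hash 'x 20))   ; => 41
```

The same `compiled` value is reused with two different run-time environments, which is the point of keeping the result of the first stage.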

At this point, even though our “compiler” is not much more than a slightly different representation of the same functionality, we rename eval to compile, which is a more appropriate description of what we intend it to do (so we change the purpose statement too):

(: compile : TOY -> ENV -> VAL)
;; compiles TOY expressions to Racket functions.
(define (compile expr)
  (lambda (env)
    (: compile* : TOY -> VAL)
    (define (compile* expr) ((compile expr) env))
    (cases expr
      [(Num n)   (RktV n)]
      [(Id name) (lookup name env)]
      [(Bind names exprs bound-body)
       ((compile bound-body)
        (extend names (map compile* exprs) env))]
      [(Fun names bound-body)
       (FunV names bound-body env)]
      [(Call fun-expr arg-exprs)
       (let ([fval     (compile* fun-expr)]
             [arg-vals (map compile* arg-exprs)])
         (cases fval
           [(PrimV proc) (proc arg-vals)]
           [(FunV names body fun-env)
            ((compile body) (extend names arg-vals fun-env))]
           [else (error 'call ; this is *not* a compilation error
                        "function call with a non-function: ~s"
                        fval)]))]
      [(If cond-expr then-expr else-expr)
       (compile* (if (cases (compile* cond-expr)
                       [(RktV v) v] ; Racket value => use as boolean
                       [else #t])   ; other values are always true
                   then-expr
                   else-expr))])))

(: run : String -> Any)
;; evaluate a TOY program contained in a string
(define (run str)
  (let* ([compiled (compile (parse str))]
         [result   (compiled global-environment)])
    (cases result
      [(RktV v) v]
      [else (error 'run "evaluation returned a bad value: ~s"
                   result)])))

Still, not much has changed. We curried the eval function and renamed it to compile. But when we actually call compile, almost nothing happens: all it does is create a Racket closure that will do the rest of the work. (And this closure closes over the given expression.)

Running this “compiled” code is going to be very much like the previous usage of eval, except a little slower, because now every recursive call involves calling compile to generate a closure, which is then immediately used; so we just added some allocations at the recursive call points! (Actually, the extra cost is minimal, because the Racket compiler will optimize away such immediate closure applications.)

Another way to see how this is not really a compiler yet is to consider when compile gets called. A proper compiler is something that does all of its work before running the code, which means that once it spits out the compiled code it shouldn’t be used again (except for compiling other code, of course). Our current code is not really a compiler, since it violates this property: compile is still called at run time, from inside the closures it returns. (For example, if GCC behaved this way, then it would “compile” files by producing code that invokes GCC to compile the next step, which, when run, invokes GCC again, etc.)
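The following sketch makes this visible by counting calls to the compiler. It uses a hypothetical mini-language rather than TOY (a number, a symbol, or `(list '+ a b)`, with an immutable hash as the environment), and `compile/naive` is an illustrative name that mimics the structure of the curried compile: the recursive calls sit inside the `(lambda (env) ...)`.

```racket
#lang racket
;; Counts how many times the "compiler" is invoked.
(define compile-calls 0)

;; Naive "compiler": the curried evaluator under a new name. The
;; recursive calls to compile/naive are inside the (lambda (env) ...),
;; so they happen only when the compiled code runs.
(define (compile/naive expr)
  (set! compile-calls (add1 compile-calls))
  (lambda (env)
    (match expr
      [(? number? n) n]
      [(? symbol? x) (hash-ref env x)]
      [(list '+ a b) (+ ((compile/naive a) env)
                        ((compile/naive b) env))])))

(define prog (compile/naive '(+ x (+ x 1))))
compile-calls        ; => 1: "compiling" only built one closure
(prog (hash 'x 1))   ; => 3
compile-calls        ; => 5: running called the compiler 4 more times
(prog (hash 'x 2))   ; => 5
compile-calls        ; => 9: and it does so again on every run
```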

However, the conceptual change is substantial: we now have a function that does its work in two stages. The first part gets an expression and can do some compile-time work, and the second part does the run-time work and includes anything inside the (lambda (env) …). The catch is that so far, the code does nothing at the compilation stage (remember: it only creates a closure). But because we have two stages, we can now shift work from the second stage (run time) to the first (compile time).
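As a sketch of what such a shift can look like, here is a staged compiler for a hypothetical mini-language (numbers, symbols, and `(list '+ a b)`, with an immutable hash as the environment), not for TOY. In the illustrative `compile/staged`, the dispatch on the expression and the compilation of subexpressions happen before the `(lambda (env) ...)` is returned, so running the result never calls the compiler again:

```racket
#lang racket
;; Counts how many times the compiler is invoked.
(define compile-calls 0)

;; Staged compiler: all case analysis and all recursive compilation
;; happen here; the returned closures only do run-time work.
(define (compile/staged expr)
  (set! compile-calls (add1 compile-calls))
  (match expr
    [(? number? n) (lambda (env) n)]
    [(? symbol? x) (lambda (env) (hash-ref env x))]
    [(list '+ a b)
     (let ([ca (compile/staged a)]    ; compiled once, ahead of time
           [cb (compile/staged b)])
       (lambda (env) (+ (ca env) (cb env))))]))

(define prog (compile/staged '(+ x (+ x 1))))
compile-calls        ; => 5: all compilation happened up front
(prog (hash 'x 1))   ; => 3
(prog (hash 'x 2))   ; => 5
compile-calls        ; => 5: running did not call the compiler
```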

For example, consider the following simple program:

#lang pl

(: foo : Number Number -> Number)
(define (foo x y)
  (* x y))

(: bar : Number -> Number)
(define (bar c)
  (: loop : Number Number -> Number)
  (define (loop n acc)
    (if (< 0 n)
        (loop (- n 1) (+ (foo c n) acc))
        acc))
  (loop 40000000 0))

(time (bar 0))

We can do the same thing here: separate foo into two stages using currying, and modify bar appropriately:

#lang pl

(: foo : Number -> Number -> Number)
(define (foo x)
  (lambda (y)
    (* x y)))

(: bar : Number -> Number)
(define (bar c)
  (: loop : Number Number -> Number)
  (define (loop n acc)
    (if (< 0 n)
        (loop (- n 1) (+ ((foo c) n) acc))
        acc))
  (loop 40000000 0))

(time (bar 0))

Now, instead of a simple multiplication, let’s expand it a little by doing a case split on the common cases where x is 0, 1, or 2:

(: foo : Number -> Number -> Number)
(define (foo x)
  (lambda (y)
    (cond [(= x 0) 0]
          [(= x 1) y]
          [(= x 2) (+ y y)] ; assume that this is faster
          [else (* x y)])))

This is not much faster, since Racket already optimizes multiplication in a similar way.

Now comes the real magic: deciding which branch of the cond to take depends only on x, so we can push the lambda inside the cond:

(: foo : Number -> Number -> Number)
(define (foo x)
  (cond [(= x 0) (lambda (y) 0)]
        [(= x 1) (lambda (y) y)]
        [(= x 2) (lambda (y) (+ y y))]
        [else (lambda (y) (* x y))]))

We just made an improvement: the comparisons for the common cases are now done as soon as (foo x) is called, rather than being delayed until the resulting function is used. Now go back to the way this is used in bar and make it call foo once for the given c:

#lang pl

(: foo : Number -> Number -> Number)
(define (foo x)
  (cond [(= x 0) (lambda (y) 0)]
        [(= x 1) (lambda (y) y)]
        [(= x 2) (lambda (y) (+ y y))]
        [else (lambda (y) (* x y))]))

(: bar : Number -> Number)
(define (bar c)
  (define foo-c (foo c))
  (: loop : Number Number -> Number)
  (define (loop n acc)
    (if (< 0 n)
        (loop (- n 1) (+ (foo-c n) acc))
        acc))
  (loop 40000000 0))

(time (bar 0))

Now foo-c is generated once, and if c happens to be one of the three common cases (as in the last expression), we can avoid doing any multiplication. (And if we hit the default case, then we’re doing the same thing we did before.)

[However, the result runs a little slower! The reason is that dealing with functions can have a higher cost when the compiler cannot “simplify closures away”, which is what happens in the last version. The additional overhead is much higher than the multiplication we save (the Racket compiler inlines multiplications, so their cost is close to just executing a single machine-code instruction).]
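Since timings can go either way here, a sketch that counts comparisons instead of measuring time shows where the work moved. It is plain Racket; `x-is?`, `foo/late`, `foo/early`, and `sum-with` are hypothetical names, and `sum-with` calls a foo variant once and then applies the result to 1 through 1000, loosely like `bar`:

```racket
#lang racket
;; Counts how many comparisons against x are performed.
(define tests 0)
(define (x-is? x k)
  (set! tests (add1 tests))
  (= x k))

;; Case split inside the lambda: it runs on every call of the result.
(define (foo/late x)
  (lambda (y)
    (cond [(x-is? x 0) 0]
          [(x-is? x 1) y]
          [(x-is? x 2) (+ y y)]
          [else (* x y)])))

;; Case split outside the lambda: it runs once, when foo/early is called.
(define (foo/early x)
  (cond [(x-is? x 0) (lambda (y) 0)]
        [(x-is? x 1) (lambda (y) y)]
        [(x-is? x 2) (lambda (y) (+ y y))]
        [else (lambda (y) (* x y))]))

;; Like bar: call the foo variant once for c, then use the result
;; for n = 1..1000 and add everything up.
(define (sum-with make-foo c)
  (define foo-c (make-foo c))
  (for/sum ([n (in-range 1 1001)])
    (foo-c n)))

(sum-with foo/late 1)    ; => 500500
tests                    ; => 2000: two comparisons per call of foo-c
(set! tests 0)
(sum-with foo/early 1)   ; => 500500
tests                    ; => 2: two comparisons in total
```

Fewer comparisons do not automatically mean faster code, as the note above explains; the counts only show at which stage the work is done.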

Here is another useful example of the same kind of transformation:

(define (foo list)
  (map (lambda (n) (if ...something... E1 E2))
       list))

-->

(define (foo list)
  (map (if ...something...
           (lambda (n) E1)
           (lambda (n) E2))
       list))

(Question: when can you do that?)

This is not unique to Racket; it can happen in any language. Racket (or any language with first-class function values) only makes it easy to create a local function that is specialized for the flag (the ...something... test).
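A concrete instance of this rewrite, as a sketch: `adjust/late` and `adjust/early` are hypothetical names, and the test is simply a flag argument `double?` that does not depend on `n`.

```racket
#lang racket
;; Before: the flag is tested once per list element.
(define (adjust/late double? lst)
  (map (lambda (n) (if double? (* 2 n) n))
       lst))

;; After: the flag is tested once, and map receives a function
;; that is already specialized for it.
(define (adjust/early double? lst)
  (map (if double?
           (lambda (n) (* 2 n))
           (lambda (n) n))
       lst))

(adjust/late #t '(1 2 3))    ; => '(2 4 6)
(adjust/early #t '(1 2 3))   ; => '(2 4 6)
(adjust/early #f '(1 2 3))   ; => '(1 2 3)
```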