CS 241e

Back to cs241e

Foundations of Sequential Programs Enriched

Prof. Ondr̂ej Lhoták

Introduction

Bits

Conventions

There are infinitely many integers, but only 232 can be represented on a 32-bit system, so arithmetic is done on the finite ring of equivalence classes mod 232.

Numeric Interpretation

Our Computer

The computer we're using happens to have state equivalent to {0, 1}226 + 32 * 34

+------------------------------------------------+          Memory
|                                                |            32
|                            Registers           |    0 +-----------+
|   +-----------------+          32              |    4 |           |
|   |   Control Unit  |    +---------------+     |    8 |           |
|   |                 |   1| 0 1 0 ...     |     |    12|           |
|   +-----------------+   2| 1             |     | ->   |           |
|   |       ALU       |   .| 0             |     |      |           |
|   |                 |   .|               |34   |      |           |
|   |                 |  31|               |     |    . |           |
|   |                 |  LO|               |     |    . |           |
|   |                 |  HI|               |     |    . |           |
|   +-----------------+  PC+---------------+     |      |           |
|                                                |      +-----------+
+------------------------------------------------+   2^24-4

PC = program counter (the register we're on)

The CPU implements a function step:

step: state -> state
s_(i+1) = step(s_i)
Def: step*(s) = if (step(s) defined) step*(step(s)) else s

input
  | encode
  v       step*
  s_0 -----------> s_n

Want:
For all i, f(i) = decode(step*(encode(i)))

(program, input)           (program, data)      
       |                         ^
       | encode (compiler)       | decode
       |                         |
       v         step*           |
       s_0 -------------------> s_n

Definitions

A stored program computer (von Neumann machine) includes the program as part of its input

Semantics: meaning of a program

Defining step functions

def step(state) = {

  // fetch
  instruction = state.memory[state.register[PC]]

  // increment PC
  state2 = state.setRegister(PC, state.register(PC)+4)

  // decode, execute
  instruction.match {
    // ...
  }
}

MIPS

Reference: https://www.student.cs.uwaterloo.ca/~cs241/mips/mipsref.pdf

Definitions:

Compiler features

Labels

An abstraction of memory addresses

e.g.: absolute value

SLT 2, 1, 0
BEQ 2, 0, 1
SUB 1, 0, 1
JR 31

Same example, using labels:

SLT 2, 1, 0
BEQ 2, 0, label
SUB 1, 0, 1
Define label
JR 31

e.g. a procedure

... ; main
... ; program
LIS 1
USE label
JALR 1
...

DEFINE label
... ; procedure
...
...
JR 31

To compile out labels, we need two passes:

  1. Determine the address of each label
  2. Generate code for all the instructions with labels converted to their corresponding addresses

Relocation, Linking

An object file is a file that contains:

Relocation is the process of adjusting machine language code using object file metatada so that it can be loaded at a different address by:

Linking is the process of combining multiple object files into a machine code program

Typical C build process:

C source              Assembly        Object
files              files           files
a.c -----------> a.s -----------> a.o ---+
   Compiler (cc)    Assembler (as)       |
                                         |
b.c -----------> b.s -----------> b.o ---+--> linker (ld) ------> executable machine language program
                                         |
                                         |
c.c -----------> c.s -----------> c.o ---+

Variables

A variable is an abstraction of a storage location (register, fixed or dynamically determined address in memory) that can hold a value

Read from address ALPHA to $1:

LIS 2
WORD ALPHA ; Saves address ALPHA into register 2
LW 1, 0, 2 ; Loads value from memory address ALPHA into register 1

Variable instances

e.g.

def fact(x: Int): Int = if (x < 2) 1 else x*fact(x-1)
fact(3)
// fact(3) = 3 * fact(2) = 3 * 2 * fact(1) = 3*2*1 = 6
// Three instances of x occur in this execution
               fact(1)
               +--------+
       fact(2) |        |
       +-------+        +-------+
fact(3)|                        |
-------+                        +--------
                 time -->

The extent of a variable instance is the time interval in which it can be accessed

e.g.

The Stack

The extent of local variables begin and end in a last in, first out order. A stack allows us to create and destroy storage locations this way.

Implementation

After entering a procedure with variables a, b, c

                  memory
               +-----------+
               |           |
               +-----------+
               |    42     |
               +-----------+
      SP = 100 |     a     | \
               +-----------+ |
           104 |     b     | | stack frame (all local
               +-----------+ | vars from procedure)
           108 |     c     | /
               +-----------+
           112 |   stack   |
               +-----------+

e.g. read variable c at offset 8 from SP

LW 1, 8, 30 ; 8, 30 = offset, register

Definitions

In this course, all data in memory will be in chunks

Evaluating expressions

e.g. a*b + c*d

Technique 1 (stack)

t1 = a*b
t2 = c*d
t3 = t1+t2
def evaluate(e1, op, e2): Code = {
  evaluate(e1)
  push $3
  evaluate(e2)
  pop $4
  $3 = $4 op $3
}
def evaluateVar(v: Variable): Code = read v (into $3)

Technique 2 (temporary variables)

t1 = a*b
t2 = c*d    // generate code to eval e1 op e2, put result in some variable
t3 = t1+t2  // return code, variable
def evaluate(e1, op, e2): (Code, Variable) = {
  (code1, var1) = evaluate(e1)
  (code2, var2) = evaluate(e2)
  v3 = new Variable
  code = block(code1, code2, var3 = var1 op var2)
  return (code, var3)
}

Register allocation is the process of assigning variables to real registers or memory addresses (abstract)

+------------------+                 +----------------+
|       Code       |                 | IR with real   |
|Intermediate      |---------------->| registers      |
|Representation    |  register       | addresses/     |
|with virtual regs |  allocation     | offsets        |
+------------------+                 +----------------+

Technique 3 (hybrid) (temp vars, but operations on real registers)

t1 = a*b  // $4 = a, $3 = b, $3 = $4 * $3, t1 = $3
t2 = c*d
t3 = t1+t2
// generated code leaves result in $3
def evaluate(e1 op e2):Code = {
  val t1 = new Variable
  block(
    evaluate(e1),
    write(t1, 3),
    evaluate(e2),
    read(4, t1),
    $3 = $4 op $3
  )
}

Register allocation

e.g. a+b+c+d+e

e.g.

t1 = a * b

e.g.

t1 = a * b

Inference graph

Control structures

If statements

if (e1, op, e2, then, else) => {
  withTempVar{t1 =>
    evaluate(e1)
    t1 = $3
    evaluate(e2)
    $4 = t1
    evaluate op
    T
    beq
    define(else)
    E
    define(end)
  }
}

Procedures

A Procedure is an abstraction that encapsulates a reusable sequence of code

e.g.

Caller Procedure (callee)
Define(proc)
Reg.savedParamPtr = Reg.allocated
stack.allocate(frame)
evaluate args in temp vars
Stack.allocate(parameters)
copy arguments from temp vars into parameter chunk
dynamicLink = Reg.fp (29)
savedPC = Reg.savedPC (31)
Reg.fp = Reg.allocated (6)
paramPtr = Reg.savedParamPtr (get it out of a register)
LIS(Reg.targetPC) body
Use(proc)
JALR(Reg.targetPC) Reg.savedPC = savedPC
Reg.fp = dynamicLink
stack.pop //frame
stack.pop //parameters
JR(31)

A prologue/epilogue are the instructions at the beginning or end of a procedure

Conventions:

Eliminating variable accesses (A5)

Nested Procedures

e.g.
As a loop:

def m() = {
  var i = 0
  var j = 0
  while (i < 10) {
    i = i + 1
    j = j + i
  }
  i + j
}

As a procedure:

def m() = {
  var i = 0
  var j = 0
  def loop() = {
    if (i < 10) {
      i = i + 1
      j = j + i
      loop()
    }
  }
  loop()
  i + j
}

e.g.

def f() = {
  val x = 2
  val w = 5
  def g() = {
    val y = 3
    val w = 7
    x + y + h()
  }
  def h() = {
    val z = 4
    z + w
  }
  g()
}
        Frame
      +--------+
   h  |        |      Params
      |   pp   | --> +-------+
      |   dl   |     |       |
      +--------+     |  sl   | ------+
          |          +-------+       |
          v                          |
      +--------+                     |
   g  |        |                     |
      |   pp   | --> +-------+       |
      |   dl   |     |       |       |
      +--------+     |  sl   | ------+
          |          +-------+       |
          v                          |
      +--------+ <-------------------+
   f  |        |
      |   pp   | --> +-------+
      |   dl   |     |       |
      +--------+     |  sl   |
                     +-------+
f()={
  g()={}
  h()={
    k()={}
  }
}

Variable Access (eliminateVarAccessesA6)

let n = depth(current procedure) - depth(proc declaring var)

Follow static link n times, then look for variable

Midterm

Review: Oct 28 in the tutorial

Topics

First-Class Functions

def procedure(x: Int) = { x + 1 }
var increase: (Int)=>Int = procedure
increase = { x => x+2 }
def twice(fun: (Int)=>Int): (Int)=>Int = {
  x => fun(fun(x))
}
increase = twice(increase)
def increaseBy2(increment: Int): (Int)=>Int {
  def procedure(x: Int) = {x + increment}
  procedure
}

Free variables

A free variable in some expression is a variable that is not bound (defined) in that expression.

An expression is closed if it contains no free variables.

The closure closes over its environment

Stack vs Heap

def constructor(a: Int): (Int)=>Int = {
  var b = a*2
  def procedure(c: Int): Int = {
    a+b+c
  }
  procedure //closure creation: will be represented by a `Closure` for us
}
var functionValue: (Int)=>Int = constructor(42)
functionValue(5)

The extent of a and b:

Implementation

After closure creation, store:

To call a closure, pass environment as the static link.

Objects

class Counter {
  private var value = 0
  def get() = value
  def incrementBy(amount: Int) = {
    value = value + amount
  }
}

val c = new Counter
c.incrementBy(42)
c.incrementBy(-5)
c.get()

def newCounter: (()=>Int, (Int)=>Unit) = {
  var value = 0
  def get() = value
  def incrementBy(amount: Int) = {
    value = value + amount
  }
  (get, incrementBy)
}
val c2 = newCounter
c2._2(42)
c2._2(-5)
c2._1()

Tail calls

This will overflow the stack if there's a large number of recursion and we don't make it a tail call:

def m() = {
  var i = 0
  var j = 0
  def loop() = {
    if (i < 10000000) {
      i = i + 1
      j = j + i
      loop()
    }
  }
  loop()
  i + j
}

A call is a tail call if it is the last action before the epilogure

def f() = {
  var v = ???
  def g() = {
    // do something with v
  }
  // epilogue?
  g()
}

Implementation

Before After tail call transformation Notes
evaluate arguments
(call loop) allocate parameters (1) (of callee/target)
evaluate arguments write arguments into parameters
allocate parameters
write arguments into parameters
LIS(8)
Use(label)
JALR(8)
(epilogue) (epilogue)
Reg.savedPC = savedPC Reg.savedPC = savedPC
Reg.fp = dynamicLink Reg.fp = dynamicLink
pop (frame)
pop (parameters)
JR(31)
pop (new parameters (1))
pop (frame)
pop (parameters of caller)
allocate parameters (2) (of callee/target)
copy parameters (1) to (2) This is ok because nothing should have written to this memory space yet
Also, make sure you copy from the bottom up to make sure you don't overwrite anything
LIS(8)
Use(label)
JR(31)

What if the caller and/or callee frame is on the heap instead of the stack?

Formal Languages

(CS 360, 365, 462)

alphabet(Σ): finite set of symbols
e.g. {a, b, c, ..., z}, {0, 1, ..., 9}, {0, 1}, {def, var, val, ..., =, (, ), ...}

word (over Σ): finite sequence of symbols
e.g. hello, 42, 1010101, def proc(), ε (sequence of length 0 - empty word)

language: set of words (possibly infinite)
e.g.

A language class is a collection of languages

Deterministic Finite Automata

                   v──┐
start ┌──┐  a   ╔═══╗ │b   ┌──────┐
   ──>│  ├─────>║   ╟─┘    │      │
      │  │      ║   ╟─────>│error │
      └──┘      ╚═══╝      └──────┘
              accepting/
              final state

e.g.
Σ = {a, b} where all words with an even number of as are in the language

    ┌─────┐  ┌───────────┐
   b│     │  v     a     │
    │   ┌─┴────┐       ╔═╧════╗
 ───┴──>│ odd  │       ║ even ║<──┐
        └────┬─┘       ╚════╤═╝   │
             │     a     ^  │     │b
             └───────────┘  └─────┘

A DFA is a 5-tuple (Σ, Q, q0, A, δ) where:

Symbol Meaning Example
Σ input alphabet Σ = { a, b }
Q a finite set of states Q = { odd, even }
q0 ∈ Q start state q0 = even
δ : Q x × → Q transition function (partial, not defined for all of domain) δ(even, a) = odd
δ(even, b) = even
δ(odd, a) = even
δ(odd, b) = odd

Non-Deterministic Finite Automata

e.g.

DECIMAL NUMBERS
        ┌───┐ 0 ╔══╗
      ┌─┴┐  └──>║  ║
   ──>│  │      ╚══╝
      └─┬┘        v──┐
        └───┐   ╔══╗ │0-9
         1-9└──>║  ╟─┘
                ╚══╝

HEXADECIMAL NUMBERS
                    ┌──┐
                    v  │0-9,A-F
    ┌──┐ 0 ┌──┐ x ╔══╗ │
 ──>│  ├──>│  ├──>║  ╟─┘
    └──┘   └──┘   ╚══╝

ALL NUMBERS
        ┌───┐ 0 ╔══╗
      ┌─┴┐  └──>║  ║
   ──>│  │      ╚══╝
      └┬┬┘        v──┐
      ┌┘└───┐   ╔══╗ │0-9
      │  1-9└──>║  ╟─┘
      │         ╚══╝
     0│             ┌──┐
      └─────v       v  │0-9,A-F
           ┌──┐ x ╔══╗ │
           │  ├──>║  ╟─┘
           └──┘   ╚══╝

Non-determinism is when the algorithm has a choice of multiple paths
How does one choose the right path?

An NFA is a 5-tuple (Σ, Q, q0, A, δ), same as a DFA, except:

A word is accepted by the NFA if δ*(q0, w) ∩ A ≠ {}

ε Transitions

Whenever we are in state A, we may (but don't have to) move to state B. (E = ε in the diagram)

  ┌───┐ E ┌───┐
  │ A ├──>│ B │
  └───┘   └───┘
        ║
        ║
        v
    ┌─────────────┐
  ┌─┤             v
  └─┴───>┌───┐   ╔═══╗
         │ A │   ║ B ║
  ┌─┬───>└───┘   ╚═══╝
  └─┤             ^
    └─────────────┘

Regular Expressions

Regular Expression Meaning
ε L(ε) = {ε}
R1 \ R2 where R1, R2 are REs
R1R2 where R1, R2 are REs L(R1R2) = {xy \
R* where R is a RE L(R*) = {x1, x2, ..., xn \
Example Meaning
(a*)\ (b*)
a*b* some number of as followed by some number of bs
(a\ b)*
a(a*) 1 or more as
(aa)* even number of as
(a\ b)*aa(a\
(b\ ab)*(a\

Kleene's Theorem

For a language L, the following statements are equivalent:

  1. ∃ a DFA specifying L
  2. ∃ an NFA without ε-transitions specifying L
  3. ∃ an NFA with ε-expressions specifying L
  4. ∃ a regular expression specifying L
  5. L is a regular language

Scanning

Scanning output is not always unique

maximal Munch (Greedy) Scanning

A common hack: Bite off the largest piece of w possible at each step

Algorithm:

  1. Run DFA on remaining input until it gets stuck
  2. If in a non-accepting state, backtrack to the last seen accepting state
  3. Output a token corresponding to the current (accepting) state
  4. Set DFA state to start state, and repeat from step 1

In A7:

def scanOne(input: List[Char], state: State, backtrack: (List[Char], State)): (List[Char], State) = ???
// Arguments:
//  input: rest of input to scan
//  state: current state of DFA
// Return value: (rest of input after token, accepting state reached on token)

Context-free Languages

e.g. A DFA for arithmetic, such as a + b * c - d

(this only parses singly nested brackets)

          a-z
      ┌────────v
   ┌──┴┐       ┌───┐
 ─>│   │       │   │
   └┬──┘       └┬──┘
    │ ^─────────┘ ^
   (│   v,-,*,/   │)
    v ┌─────────v │
   ┌──┴┐  a-z  ┌──┴┐
   │   │       │   │
   └───┘       └┬──┘
      ^─────────┘
        v,-,*,/

Regular expressions can't express unlimited recursive structures. We want nesting in programming languages, so we need something more powerful than regular languages.

Context-free Grammars

Regular Expressions use iteration, Context-free Grammars use recursion

e.g.

Parsing of (a+b)*c

                 expr
                  │
           ┌──────┴──────────────┬───────────────┐
           │                     │               │
          expr                  op              expr
           │                     │               │
   ┌───────┼─────────────────┐   *               ID
   │       │                 │                   │
   (      expr               )                   c
           │
     ┌─────┴─────┬────────┐
     │           │        │
    expr         op      expr
     │           │        │
     ID          +        ID
     │                    │
     a                    b

A context-free grammar (CFG) is a 4-tuple (V, Ε P, S) where:

Let:

αAβ directly derives αγβ if A → γ ∈ P

α derives αn (0 or more ⇒ steps) if α1 ⇒ ... ⇒ αn

The language generate/specified by G=(V, Σ, P, S) is L(G) = { w ∈ Σ* \| S ⇒* w }

e.g. 1+2*3 = 7 or 9?

  WRONG

              e
              │
   ┌───┬──────┴──┐
   │   │         │
   │   │         │
   e   op        e
   │   │         │                 
   │   │     ┌───┴──┬─────┐
   ID  +     │      │     │
            expr    op    e
             │      │     │
             │      │     │
             ID     *     ID

  RIGHT

              e
              │
        ┌─────┴────────┬───────┐
        │              │       │
        e              op      e
        │              │       │
   ┌────┼─────┐        │       │
   │    │     │        *       ID
   e    op    e
   │    │     │
   │    │     │
   ID   +     ID

A CFG is ambiguous if it allows multiple parse trees for the same input string

We can design a grammar for desired precedence and associativity

e.g. unambiguous for the examples used:

CYK (Cocke, Younger, Kasami) Parsing

parse(alpha, w) = { // returns a sequence of parse trees if alpha =>* w for symbols in alpha
  if (alpha.isEmpty) {
    if (w.isEmpty) Some(Seq()) else None
  } else if (alpha == a,beta) { // starts with terminal
    if (w == a,z && parse(beta, z)) Some(a +: parse(beta, z).get) // Create a new tree node here
    else None
  } else if (alpha == A) {
    (A -> gamma in P).forEach {
      if (parse(gamma, w)) return Some(Seq(tree A where children are parse(gamma, w).get)) // Create new tree node here
    }
    None
  } else { // alpha = A, beta
    split(w == u,v).forEach { // for each ways of splitting w into u and v
      if (parse(A, u) && parse(beta, v)) return Some(parse(A, u).get ++ parse(beta, v).get)
    }
    None
  }
}
p(expr, ID + ID)
│ 
├─ p(ID, ID + ID)
│    p(epsilon, + ID) X
│
└─p(expr op expr, ID + ID) // u = epsilon (*<=expr), u = ID + ID (*<=op expr)
  ├─ p(expr, epsilon)    <────────────────────────────────────┐
  │  │                                                        │
  │  ├─p(ID, epsilon) X                                       │
  │  │                                                        │
  │  └─p(expr op expr, epsilon) // u = epsilon, u = epsilon   │
  │      p(expr, epsilon) X (memo) ───────────────────────────┘
  │        ...infinite loop
  └┬─p(expr, ID)
   │   p(ID, ID) √
   │
   └─p(op expr, ID)

Remember (memoize) values at (α, w) → Boolean (for validity testing) or Option[Seq[Tree]] (for parse tree)

Using the table (memoization):

Parser comparison:

Category CYK Earley LR(k) LR(1) LL(k) LL(1)
time O(n3) O(n3) for ambiguous, O(n2) for unambiguous, O(n) for almost all LR(1) grammars O(n) O(n)
grammars all all most unambiguous grammars, test if it's LR(1) to see if it's unambiguous (approximate test) very few practical grammars, no left-associativity
difficulty 1 lecture 1.5 weeks 3 weeks 1.5 weeks

Let w = xaz ∉ L be an incorrect input.

To augment a grammar G with start symbol S means:

LR(1) parsing

Algorithm (rough idea)

let beta = input string
while (beta != S) {
  find alpha element of F(beta)
  output s"$alpha => &beta"
  beta = alpha
}

What needs to be added:

Context-Sensitive Analysis

Purpose

For Lacs:

  1. Resolve names/symbols
  2. compute and check types

Uses of types

Lacs types

A type system is a set of rules that compute the type of an expression from the types of its subexpressions.

Lacs type inference rules

  Premises
__________            means if premises then conclusion
Conclusion                                              

|¬ |- e : tau         means in scope |¬, e has type tau

hastype(|¬, e, tao)   means the same



__________
 NUM: Int


|¬(ID) = tau
_____________
|¬ |- ID : tau


Let E in {expras, expra, expr, term, factor}

                |¬ |- E1 : Int
                |¬ |- E2 : Int
_______________________________
      |¬ |- E1 + E2 : Int
               -
               *
               /
               %


               |¬(ID) = tau
              |¬ |- E : tau
____________________________
     |¬ |- ID = E : tau


        |¬ |- E1 : tau1   
        |¬ |- E2 : tau2
________________________
  |¬ |- E1 ; E2 : tau2


                        |¬ |- E1 : Int
                        |¬ |- E2 : Int
                        |¬ |- E3 : tau
                        |¬ |- E4 : tau
______________________________________
 |¬ |- if (E1 == E2) E3 else E4 : tau
              !=
              >=
              <=


 |¬ |- E' : (tau-) => tau'
  forall i . |¬ |- Ei : Ti
___________________________
       |¬ |- E' (E-)


forall i . |¬, vardef-, vardef'-, defdef- |- defdef i wf
             |¬, vardef-, vardef'-, defdef- |- E : tau
_____________________________________________________
    def ID(vardef-): tau = {vardef- defdef- E} wf


forall i . defdef- |- defdef i wf
_________________
empty env |- defdef- wf

Assignment 10


    +----------+                +-------------+                    ++
    |          |  A7 (scanner)  |             |   A8 (parser)     ++++
    | program  +--------------->|   tokens    +--------------->  ++  ++  A9 (parse tree)
    |          |                |             |                 ++    ++     
    +----------+                +-------------+                -+--+---+-
                                                                   |
                                         context-sensitive <-------+
                                            information           A10
                                         scopes and types  --------+
                                                                   |
                                                                   v
                                                        +--------------------+
                                                        |                    |
                                                        |    intermediate    |
                                                        |   representation   |
                                                        |       (code)       |
                                                        |                    |
                                                        +----------+---------+
                                                                   |
                                                                   |
                             +-----------------------+             |  A1-6
                             |                       |             |
                             |   machine language    |<------------+
                             |                       |
                             +-----------------------+

Memory Management

Heap

A heap is a data structure to manage memory that can be allocated/freed at any time

Which blocks are used/free?

  1. add a bit in the header of each block to indicate if it is free or used
  2. linked list of free blocks
1.
 +-+-------------+
 | | size        |
 +-+-------------+
 |^ free/used    |
 |     bit       |
 |               |
 |               |
 |               |
 |               |
 |               |
 +---------------+

2.
freelist
 V
 +---------------+  
 | size          |
 +---------------+
 | next          +--+
 +---------------+  |
 |               |  |
 |               |  |
 | free          |  |
 |               |  |
 +---------------+  |
                    |
                    |
 +---------------+  |
 | size          |  |
 +---------------+  |
 | next          |  |
 +---------------+  |
 |               |  |
 |               |  |
 | used          |  |
 |               |  |
 +---------------+  |
                    |
                    |
 +---------------+  |
 | size          | <+
 +---------------+  
 | next          +-------|
 +---------------+  
 |               |  
 |               |  
 | free          | 
 |               |  
 +---------------+
      +-----------------+
  100 | 8               |
      +-----------------+
  104 | freelist        +--+
      +-----------------+  |
  108 | 36              | <+
      +-----------------+
  112 |                 +--+
      +-----------------+  |
      |                 |  |
      |                 |  |
      |                 |  |
  140 |                 |  |
      +-----------------+  |
  144                 <----+
def size(block) = deref(block)
def next(block) = deref(block+4)
def setSize(block, size) = assignToAddr(block, size)
def setNext(block, next) = assignToAddr(block+4, next)
def init() = {
  val block = heapStart + 8 // 100 + 8
  setSize(heapStart, 8)
  setNext(heapStart, block)
  setSize(block, heapSize - 8) // 44 - 8
  setNext(block, heapStart + heapSize)
}

def malloc(wanted) = {
  def find(previous) = {
    val current = next(previous)
    if (size(current) < wanted + 4) {
      find(current)
    } else {
      if (size(current) > wanted + 12) {
        //split block
        val newBlock = current + wanted + 4
        setSize(newBlock, size(current) - size(wanted + 4))
        setNext(newBlock, next(current))
        setSize(current, wanted + 4)
        setNext(previous, newBlock)
      } else {
        setNext(previous, next(current))
      }
      setNext(previous, next(current))
      current
    }
  }
}

def free(toFree) = {
  def find(previous) = {
    val current = next(previous)
    if (current < toFree) {
      find(current)
    } else {
      if (toFree + size(toFree) == current && current < heapStart + heapSize) {
        // merge with current
        setSize(toFree, size(toFree)+size(current))
        setNext(toFree, next(current))
      } else {
        setNext(toFree, current)
      }
      if (previous + size(previous) == toFree && previous != heapStart) {
        // merge with previous block
        setSize(previous, size(previous) + size(toFree))
        setNext(previous, next(toFree))
      } else {
        setNext(previous, toFree)
      }
    }
  }
}

A heap is fragmentedif the free space is split into small blocks.

Compaction

We need to identify which words in memory are addresses, so we need sound types.

New chunk diagram:

+---------------+
| size          |
+---------------+
| 3 (number of  |
|    pointers)  |
+---------------+
|               |
| pointer vars  |
|               |
+---------------+
|               |
|  other vars   |
|               |
+---------------+

After compaction, allocation is simple/constant time (increment a pointer).

Which blocks should we compact? A block is live or dead if it will or will not be accessed in the future.

Cheney's Copying Garbage Collector, 1970

              +------------+
              |            |
              | code       |
              |            |
              +------------+
    heapStart | used       |
              |............|
              |            | heapPointer
              | from-space |
              |            |
              +------------+
   heamMiddle | to-space   | semiSpaceTop
              |            |
              +------------+
      heapTop |            |
              | stack      |
              |            |
              +------------+
def init = {
  heapPointer = heapStart
  semiSpaceTop = heapMiddle
}
def allocate(wanted) = {
  if (heapPointer + wanted > semispaceTop) {
    gc()
  }
  val ret = heapPointer
  heapPointer = heapPointer + wanted
  ret
}
def gc = {
  // Swap segments of memory
  val newBottom =
    if (semiSpaceTop == heapMiddle) heapMiddle
    else heapStart
  var free = newBottom
  var scan = dynamicLink // top of stack
  while (scan < memSize) {
    forwardPtrs(scan) // will also increment free
    scan = scan + size(scan)
  }
  scan = newBottom
  while (scan < free) {
    forwardPtrs(scan)
    scan = scan + size(scan)
  }
  semispaceTop = newBottom + semiSpaceSize
  heapPointer = free
}

def forwardPtrs(block) = {
  for each offset o in block that holds a pointer {
    assignToAddr(block + o, copy(deref(block + o)))
  }
}

def copy(block) = {
  if (block is not in from-space) {
    block;
  } else {
    if (size(block) >= 0) { // not yet copied
      copyChunk(free, block)
      setNext(block, free) // forwarding address
      setSize(block, -size(block))
      free = free + size(free)
    }
    next(block)
  }
}

Time complexity

Generational GC

The Lambda Calculus

e → v \| λv.e \| e e

(λv.e1) e2 → e1[e2/v]

e1 --> e1'
______________
e1e2 --> e1'e2

e2 --> e2'
______________
e1e2 --> e1e2'

e --> e'
__________________________
lambda v.e --> lambda v.e'
abstract class Expr
case class Var(name: String) extends Expr
case class Lambda(name: String, expr: Expr) extends Expr
case class Apply(function: Expr, argument: Expr) extends Expr

val ID = Lambda("x", Var("x")) // lambda x . x
val IDID = Apply(ID, ID) // (lambda x . x) (lambda x . x)

val step: PartialFunction[Expr, Expr] = {
  case Apply(Lambda(variable, e1), e2) =>
    subst(e1, variable, e2)
}
def subst(expr: Expr, name: String, replacement: Expr): Expr = expr match {
  case Var(name2) =>
    if (name == name2) replacement
    else expr
  case Apply(e1, e2) => Apply(
    subst(e1, name, replacement),
    subst(e2, name, replacement)
  )
  case Lambda(name2, expr2) =>
}