Implementing A Graph Coloring For Register Allocation With Chaitin Briggs And Iterated Register Coalescing

2024-11-15 · Leonardo Benicio

Summary

A comprehensive technical exploration of implementing a graph coloring for register allocation with chaitin briggs and iterated register coalescing, covering key concepts, practical implementations, and real-world applications.

Contents

Here is the expanded version of the blog post, structured to meet your requirements for depth, technical accuracy, and length. It builds directly on your excellent introduction and provides a comprehensive, tutorial-like walkthrough of these classic algorithms.

Title: The Great Squeeze: Implementing Graph Coloring for Register Allocation with Chaitin-Briggs and Iterated Register Coalescing

Introduction (Expanded)

Imagine, for a moment, the most brilliant piece of software ever written. A sprawling, elegant symphony of algorithms, data structures, and logic, designed to push the boundaries of human computation. Now, imagine trying to run it on a single 8-bit micro-controller with 256 bytes of RAM. It wouldn’t just be slow; it would be impossible. The program, for all its intellectual beauty, is a sculpture made of marble that cannot fit through a standard door.

This is the fundamental tension at the heart of every compiler. A developer writes code that assumes an almost infinite, idealized machine—a “virtual” machine where variables are abundant and instantly accessible. In the high-level language, a function may declare dozens of local variables, temporary values, and parameters, all of which feel equally fast and unlimited. The hardware, however, is a miserly, finite reality. A modern x86-64 CPU offers only 16 general-purpose registers. An ARM Cortex-M0 microcontroller might have just 8. These are the only locations where the CPU can perform arithmetic directly. Everything else must reside in memory—cache, RAM, or even disk—and moving data between memory and registers (a process called a load or store) is orders of magnitude slower than a register-to-register operation.

The compiler’s job, in essence, is to perform a high-stakes act of logistical magic. It must take the programmer’s infinite-seeming web of variables and squeeze them into the procrustean bed of the target architecture’s limited register file. Fail at this, and the program spends its life waiting for data to be fetched from memory—a slow death by a thousand (or a million) clock cycles. This is not just a concern for embedded systems. Even in a desktop application, poor register allocation can cause a cascade of performance problems: cache misses, increased power consumption, and a saturated memory bus. A program that is register-allocated well can run 10x or more faster than one that is not, simply because the CPU is never idle waiting for operands.

This process is Register Allocation, and it is arguably the single most important optimization a compiler performs. It is the critical bridge between the architecturally unfettered world of high-level language semantics and the brutally constrained, performance-critical world of machine code. For decades, the best solution to this NP-hard problem has been elegantly borrowed from a seemingly unrelated field: cartography. The technique is Graph Coloring.

The connection is profound. In map-making, you want to color adjacent countries with different colors to make boundaries clear. The question is: what is the smallest number of colors needed so that no two neighbors share the same hue? In register allocation, we want to assign different registers to variables that are “alive” at the same time—so that they don’t interfere with each other. The question is: what is the smallest number of registers needed so that no two interfering variables share the same register? The mathematical structure is identical.

This blog post will demystify this elegant transfer of ideas. We will start from the concrete problem of register allocation, build the interference graph, and then walk through the classic Chaitin-Briggs algorithm. We will see why pure graph coloring fails in practice, and then dissect the state-of-the-art solution: Iterated Register Coalescing, a framework that combines coloring with copy propagation and spill heuristics. By the end, you will understand not just the theory, but the practical code that makes your compiler smart enough to squeeze an infinite program into a finite machine.

1. The Problem: When Variables Collide

Before we can solve the puzzle, we must define its pieces. A compiler’s intermediate representation (IR) is a sea of virtual registers—temporary names that the compiler treats as if there were an infinite supply. The actual CPU has a small, fixed set of physical registers (e.g., %rax, %rbx on x86-64). The goal of register allocation is to map every virtual register to either a physical register or to a location in memory (the stack). Mapping to memory is called spilling, and it is a last resort because it introduces expensive loads and stores.

The core constraint is simple: two virtual registers cannot be assigned to the same physical register if their live ranges overlap.

What is a live range? A variable v is live at a point in the program if its current value will be read in the future. More formally, live range analysis tracks the set of instructions between a definition (a write to v) and its last use (a read of v). If two registers are both live at the same program point, they interfere.

Example 1: Simple Interference

Consider this pseudo-code:

// Virtual registers: %v1, %v2, %v3
1. %v1 = 5
2. %v2 = %v1 + 10  // %v1 is live here (read)
3. %v3 = %v2 * 2    // %v2 is live here (read)
4. return %v3       // %v3 is live here (read)

Let’s trace liveness:

After line 1, %v1 is live.
At line 2, we read %v1. After line 2, %v1 is no longer live (its last use was here). Now %v2 is live.
At line 3, we read %v2. After line 3, %v2 is dead. Now %v3 is live.
At line 4, we read %v3. After line 4, %v3 is dead.

Notice that the live ranges of %v1, %v2, and %v3 do not overlap. Therefore, we could assign all three to the same physical register (say %rax). The code would still work correctly because each value is written before the previous one is read. This is a trivial case.

Example 2: True Interference

Now consider:

1. %v1 = 5
2. %v2 = 10
3. %v3 = %v1 + %v2   // Both %v1 and %v2 are live here
4. return %v3

Here, at instruction 3, both %v1 and %v2 are live (they are being read). Therefore, they must be assigned to different physical registers. If we only had one register, we would need to spill one of them to memory, load it back, do the addition, and store the result. This is where the “squeeze” begins.

The art of register allocation is to find a mapping that respects all these interference constraints, using the minimum number of physical registers. This is exactly the graph coloring problem.

2. The Model: Building the Interference Graph

The first step in any coloring-based allocator is to translate the program into a data structure that captures the interference constraints. This is the Interference Graph.

Nodes: Each node represents a virtual register or a pre-colored register (a specific physical register required by the instruction set, like the stack pointer).
Edges: An undirected edge exists between two nodes u and v if their live ranges overlap—meaning they cannot share a physical register.

How to build it efficiently? The standard algorithm uses live ranges computed via a data-d8e18db4 analysis (liveness analysis) on the control flow graph (CFG). For each basic block, we compute liveOut (the set of registers live on exit). Then we walk the block backwards. At each instruction i, we define the set of registers that are simultaneously live. For every pair of registers in that set, we add an interference edge.

This is computationally expensive if done naively (O($V^2$) per instruction). In practice, compilers use bit matrices or efficient adjacency structures.

Example 3: Building the Graph

Let’s build the graph for a small C function:

int example(int a, int b) {
    int c = a + b;
    int d = c * 2;
    int e = a - b;
    return d + e;
}

After converting to three-address code (with virtual registers):

// Assume %v1 = a, %v2 = b (arguments)
1. %v3 = %v1 + %v2   // c
2. %v4 = %v3 * 2     // d
3. %v5 = %v1 - %v2   // e
4. %v6 = %v4 + %v5   // result
5. return %v6

Now perform liveness analysis:

At instruction 4: %v4, %v5 are live. Edge between %v4 and %v5.
At instruction 3: %v1, %v2, %v4 are live (because %v4 was just defined and will be used at 4, %v1 and %v2 are used here). So edges: %v1-%v2, %v1-%v4, %v2-%v4.
At instruction 2: %v1, %v2, %v3 are live. Edges: %v1-%v2 (already), %v1-%v3, %v2-%v3.
At instruction 1: %v1, %v2 are live. Edge: %v1-%v2 (already).

The final interference graph has nodes {v1, v2, v3, v4, v5} and edges connecting them all except v3 and v5 (they never live together). This graph is a “clique” minus one edge. The chromatic number here? With 4 registers (a,b,c,d), we could color it: v1->reg1, v2->reg2, v3->reg3, v4->reg4, v5->reg2. But if we only have 3 registers, coloring fails.

3. The Classic Algorithm: Chaitin’s Graph Coloring

Preston Briggs, in his seminal work, refined the algorithm originally proposed by Chaitin et al. The core idea is a greedy, iterative simplification process that tries to find a K-coloring (where K is the number of physical registers). The algorithm has four phases that may be repeated:

Build: Construct the interference graph (as described above).
Simplify: Remove nodes from the graph in a specific order.
Select: Assign colors to the removed nodes in reverse order.
Spill: If a node cannot be colored, mark it for spilling, insert loads/stores in the code, and restart from Build.

Let’s examine each phase.

3.1 Simplify: The Recursive Removal

The key insight is that a node with fewer than K neighbors is always colorable, regardless of the colors assigned to its neighbors. Why? Because if a node has, say, 3 neighbors, and we have 4 colors, there will always be at least one color left for it, no matter what colors the neighbors take.

The simplify phase exploits this. It repeatedly removes any node whose degree (number of interference edges) is less than K. These nodes are pushed onto a stack. When a node is removed, we decrement the degree of all its neighbors. This can cause cascading: a previously high-degree node might now become low-degree and also be removable.

What about high-degree nodes? If every node has degree >= K, the graph is said to be constrained. We cannot guarantee a coloring. The algorithm then picks a candidate to spill—a node that seems like the least harmful to spill (e.g., the one with the highest cost/degree ratio). This node is pushed onto the stack as a “potential spill.”

3.2 Select: The Reverse Pass

Once the graph is empty (all nodes on the stack), we pop nodes one by one. For each node, we look at its neighbors that have already been colored (popped and assigned). We assign it a color different from all of them.

For low-degree nodes, this will always succeed. For “potential spill” nodes, we may discover that, in the context of the already-colored graph, there is a color available. This is called a spill-free assignment. If not, it becomes an actual spill.

3.3 Spill: The Practical Nightmare

When a node is an actual spill, we must modify the program. We insert a store after every definition of the spilled virtual register, and a load before every use. This effectively creates two new live ranges: one for the stored value (which now lives in memory) and one for the loaded value (a new temporary). These new virtual registers must then be fed back into the Build phase.

This is why the algorithm is iterative. The insertion of loads and stores changes the live ranges and the interference graph. The allocator must rebuild the graph and try again. This can lead to multiple passes, and in pathological cases, the program can expand significantly, requiring multiple spilling rounds.

The Cost of Spilling is Huge. A single spill introduces two memory operations (a load and a store) that would not exist otherwise. On a modern CPU, a cache miss can cost hundreds of cycles. Therefore, the choice of which node to spill is critical.

Example 4: Spilling in Practice

Consider a target with K=2 registers. Our graph from Example 3 has nodes {v1, v2, v3, v4, v5}. The maximum degree is… v1 and v2 each have 3 neighbors. v4 has 2 neighbors. So all nodes have degree >= 2. The graph is completely uncolorable with 2 colors.

The allocator must pick a spill candidate. Perhaps v4 (the result of c*2) is chosen because it is used only once. The compiler inserts store %v4, mem after instruction 2 and load %v4_new, mem before instruction 4. Now the live range of v4 is broken into two separate live ranges, potentially reducing interference.

4. The Failure of Pure Chaitin: Optimistic Coloring

The original Chaitin algorithm was pessimistic. If a node had a degree >= K, it was immediately marked as a potential spill. However, Briggs observed that many such nodes would be colorable in the select phase, because their high-degree neighbors might end up sharing colors, freeing up a slot.

This is the innovation of Chaitin-Briggs (Optimistic Coloring).

Simplify: Remove all nodes with degree < K. When only nodes with degree >= K remain, do not mark them as spills. Instead, push them onto the stack anyway (as “optimistic” nodes).
Select: Pop nodes. For optimistic nodes, check if their already-colored neighbors have used up all K colors. If not, assign a color. If yes, then it’s an actual spill.

This simple change was revolutionary. It drastically reduced the number of spilled variables. In many real-world programs, up to 80% of the “potential spill” nodes would find a color in the select phase.

Why does this work? Imagine a graph with 4 nodes (A, B, C, D) fully connected to each other (a K4 clique) and K=3. Each node has degree 3. In a pessimistic algorithm, all would be marked for spill. In optimistic coloring:

No node has degree < 3, so all are pushed onto the stack (say A, B, C, D).
Pop D: no neighbors colored. Color it red.
Pop C: neighbors {D}. C cannot be red. Color it blue.
Pop B: neighbors {C, D}. Cannot be red or blue. Color it green.
Pop A: neighbors {B, C, D}. All three colors are used. Spill!

This is correct: we only spilled one node, not four. The algorithm successfully found a 3-coloring for 3 out of 4 nodes.

5. The Problem of Copies

Register allocation is not just about interfering variables. It is also about copy instructions. A typical IR has many mov instructions:

%v2 = %v1

This instruction simply copies a value from one register to another. The compiler would love to assign %v1 and %v2 to the same physical register, so that the mov can be eliminated entirely. This is called coalescing.

Why is this hard? If we merge %v1 and %v2 into a single node in the interference graph, we are effectively saying they must get the same color. But this merged node might have a very high degree (the union of the neighbors of both), making it harder to color. A overly aggressive coalescing can lead to unnecessary spilling.

The classic approach: Coalescing during Simplify.

The Chaitin-Briggs algorithm attempted coalescing during the Build phase. For every copy x = y, if x and y do not interfere (no edge between them), they can safely be coalesced into a single node. This is called Briggs’s criterion: coalescing is safe if the merged node has fewer than K neighbors of degree >= K. This is because high-degree nodes are the riskiest.

This approach is simple but suboptimal. It coalesces eagerly, which can cause the graph to become fully constrained (all nodes high-degree) too early, leading to spills that could have been avoided if the copies were kept separate.

6. The State of the Art: Iterated Register Coalescing (IRC)

The standard solution taught in modern compiler courses and used in production compilers (like LLVM’s register allocator, which uses a variant) is the Iterated Register Coalescing algorithm, presented by Appel and George. It elegantly interleaves the processes of building, simplifying, coalescing, and spilling into a single, tightly coupled loop.

The key insight is that coalescing should be attempted during simplification, not before. By simplifying low-degree nodes first, we reduce the degree of other nodes, which can then become eligible for coalescing.

The Five-Phase Loop:

The IRC algorithm works on a graph with two types of nodes: pre-colored (physical registers) and non-pre-colored (virtual registers). It maintains a set of worklists and a simplification stack.

Build: Construct the interference graph and list of move instructions.
Simplify (the core loop): While there is a node with degree < K, remove it from the graph and push it on the stack. This is identical to Chaitin’s simplify. Crucially, this removal may reduce the degree of its neighbors, potentially enabling new coalescing opportunities.
Coalesce: After simplification is stuck (no low-degree nodes left), look for a copy instruction x = y that can be safely coalesced. The safety check is more sophisticated than Briggs’s. There are two common criteria:
- Briggs’s criterion: The merged node xy has fewer than K neighbors of degree >= K. This is conservative but safe.
- George’s criterion: All neighbors of x with degree >= K also interfere with y, or have degree < K. This is a different heuristic that can coalesce more aggressively in some cases.
If a safe coalesce is found, the two nodes are merged into one. This reduces the number of nodes and may create new low-degree nodes, allowing simplification to resume.
Freeze: If no safe coalesce can be found, we must give up on some moves. We pick a move-related node (a node that is part of a move) that has a low degree, and we “freeze” it—we simply stop trying to coalesce it. Its moves are removed from consideration. This makes it eligible for simplification again.
Spill: If none of the above works (no low-degree nodes, no safe coalesce, no freeze candidate), we pick a spill node (a high-degree node that is not useful for coalescing). We push it on the stack as a potential spill.
Select: Pop the stack, assigning colors. As before, potential spills may or may not find a color.
Actual Spill: If select fails, rewrite the program with loads/stores and restart from Build.

The beauty of IRC is that it combines all operations into a single, iterative framework. The algorithm doesn’t just color; it intelligently decides when merging two variables (via coalescing) is worth the risk of increased degrees.

7. A Concrete Walkthrough (Conceptual)

Let’s revisit a small example to see IRC in action.

Assume K=3 physical registers. Virtual registers: %v1, %v2, %v3, %v4. Interference edges: %v1-%v2, %v1-%v3, %v2-%v3, %v3-%v4. Also, move instruction: %v4 = %v2 (we want to coalesce %v2 and %v4).

Degrees: %v1:2, %v2:2, %v3:3, %v4:1.

Standard Chaitin-Briggs (No Coalescing):

Simplify: %v4 (degree 1 < 3). Push %v4. Now %v3 degree becomes 2. Simplify %v3. Push %v3. Simplify %v1. Push %v1. Simplify %v2. Push %v2.
Select: Pop %v2->color1. Pop %v1->color2. Pop %v3: neighbors are %v1(2), %v2(1). Cannot be 1 or 2. Use color3. Pop %v4: neighbors are %v3(3). Use color1. Success! But the move %v4 = %v2 was not eliminated; they have different colors (1 and 1? No, %v4 was given color1, %v2 was given color1. Wait, if both get color1, the move could be eliminated. In this case, by chance, they got the same color. But we cannot rely on chance.

Iterated Register Coalescing:

Build graph and moves.
Simplify: %v4 (degree 1<3). Push %v4. Now degree of %v3 = 2.
Simplify: %v3 (degree 2<3). Push %v3. Now degree of %v1=1, %v2=1.
Simplify: %v1 (degree 1). Push %v1.
Now graph has only %v2 (degree 0). But the move %v4=%v2 is still pending. Wait, %v4 has been simplified, but the move refers to %v4. The IRC algorithm manages move lists carefully. In IRC, after simplifying %v4, the move %v4=%v2 is still alive. The algorithm would now look for a coalesce opportunity. Since %v2 currently has no neighbors (all others are simplified), it has degree 0. Coalescing %v2 with %v4 would create a node with degree? The merged node would have neighbors: %v1, %v3 (from the original %v2). That’s degree 2 < K. The move is safe! Coalesce %v2 and %v4.
Now push the merged node onto the stack.
Select: Pop merged node -> color1. Pop %v1 -> color2. Pop %v3 -> color3. Pop %v4 is gone.

The result: %v2 and %v4 share color1, and the move instruction is eliminated. IRC achieved coalescing that pure Chaitin would not have guaranteed.

8. Implementing in Code (Pseudo-Python)

Here is a simplified, pedagogical implementation of the core IRC loop. It omits many details (e.g., actual liveness analysis, rewrite for spill).

# Simplified IRC Core

class Graph:
    def __init__(self, K):
        self.K = K  # number of colors
        self.nodes = {}  # name -> Node
        self.moves = set()  # set of (src, dst) moves
        self.stack = []
        self.select_stack = []

    class Node:
        def __init__(self, name):
            self.name = name
            self.degree = 0
            self.neighbors = set()
            self.color = None
            self.move_list = []  # moves this node is part of

    def add_move(self, src, dst):
        self.moves.add((src, dst))
        self.nodes[src].move_list.append((src,dst))
        self.nodes[dst].move_list.append((src,dst))

    def add_edge(self, u, v):
        if u == v: return
        self.nodes[u].neighbors.add(v)
        self.nodes[v].neighbors.add(u)
        # Degree is not simply len(neighbors) due to coalescing; we track 'significant degree'
        # This is a simplification.

    def simplify(self):
        # Simplified: while node with degree < K exists, remove it.
        while True:
            candidate = None
            for n in self.nodes:
                node = self.nodes[n]
                if node.degree < self.K and node not on stack (use flags):
                    candidate = n
                    break
            if candidate is None:
                break
            # Remove node: decrease degree of neighbors
            for neighbor in self.nodes[candidate].neighbors:
                if neighbor in self.nodes:  # not yet removed
                    self.nodes[neighbor].degree -= 1
            self.select_stack.append(candidate)
            # Remove from active consideration
            # ... (complex bookkeeping)

    def coalesce(self):
        # Find a move (src,dst) where src and dst can be merged
        for (src, dst) in self.moves:
            if src not in self.nodes or dst not in self.nodes:
                continue
            # Check Briggs criterion (simplified):
            merged_deg = len(set(self.nodes[src].neighbors | self.nodes[dst].neighbors) - {src, dst})
            if merged_deg < self.K:
                # Merge: rename dst to src everywhere
                self.merge(src, dst)
                self.moves.remove((src,dst))
                return True
        return False

    def merge(self, src, dst):
        # Combine two nodes; keep src, discard dst.
        src_node = self.nodes[src]
        dst_node = self.nodes[dst]
        # All neighbors of dst become neighbors of src
        for neighbor_name in dst_node.neighbors:
            if neighbor_name != src:
                self.add_edge(src, neighbor_name)
        # Move list union
        src_node.move_list.extend(dst_node.move_list)
        # Delete dst
        del self.nodes[dst]
        # Update degrees (this is rough; actual needs recalculation)
        # ...

    def allocate(self):
        # Main loop: interleave simplify, coalesce, freeze, spill.
        while len(self.nodes) > 0:
            # Phase 1: Simplify as much as possible
            self.simplify()
            if len(self.nodes) == 0:
                break
            # Phase 2: Try to coalesce a move
            if self.coalesce():
                continue
            # Phase 3: Freeze (not implemented)
            # Phase 4: Spill - pick a high-degree node and push to stack as potential
            candidate = max(self.nodes, key=lambda n: self.nodes[n].degree)
            self.select_stack.append(candidate)
            # Remove it temporarily (same as simplify but forced)
            for neighbor in self.nodes[candidate].neighbors:
                if neighbor in self.nodes:
                    self.nodes[neighbor].degree -= 1
            del self.nodes[candidate]
        # Select phase: pop stack and assign colors
        colors = list(range(self.K))
        for node_name in reversed(self.select_stack):
            used_colors = set()
            for neighbor_name in original_graph_neighbors(node_name):
                if neighbor_name in final_colored_set:
                    used_colors.add(final_colored_set[neighbor_name])
            for c in colors:
                if c not in used_colors:
                    final_colored_set[node_name] = c
                    break
            else:
                # Actual spill: need to rewrite program
                handle_spill(node_name)
        return final_colored_set

This pseudo-code is drastically simplified, but it illustrates the core loop. Real implementations are massive (e.g., LLVM’s RegAllocGreedy is thousands of lines) due to the complexity of data structures, cost heuristics, and handling pre-colored registers.

9. Modern Challenges and Extensions

The principles of graph coloring remain the foundation, but modern compilers have introduced significant enhancements:

SSA-based Allocation (Million et al.): The dominance property of Static Single Assignment (SSA) form makes the interference graph more structured (chordal). Chordal graphs can be colored in polynomial time, and allocators like LLVM’s “basic” allocator and GCC’s modern allocator exploit this. They perform a global register allocation on SSA, then out-of-SSA translation handles the phi-nodes.
Greedy Allocation (LLVM): LLVM’s primary register allocator (RegAllocGreedy) is not a pure graph coloring algorithm. Instead, it uses a priority-queue based approach. It orders all live ranges by some heuristic (e.g., spill cost divided by size). It then iterates through them, trying to assign a register. If none is available, it proactively may split a live range—inserting loads and stores to turn a long, interfering live range into two shorter, non-interfering ones. This is a form of “spill everywhere” vs. “spill only where needed” and often performs better than pure graph coloring.
Live Range Splitting: Instead of spilling an entire variable, a compiler can spill only a portion of its live range. This is the idea behind region-based or interval-based allocation. The allocator can identify a “hot” portion that needs a register and a “cold” portion that can tolerate memory.
Rematerialization: Instead of spilling a value to memory, the compiler can re-compute it if it is cheap to compute (e.g., a constant or a simple expression). This eliminates the load entirely.
Register Allocation for SMT: Simultaneous Multithreading (Hyper-Threading) introduces new complexities. A physical register may be shared between threads, and the allocator must consider partitioned register files or port constraints.

10. Conclusion

We began with a simple metaphor: a program is a marble statue that must fit through a small door. The door is the CPU’s register file. The compiler’s task is to smash the statue into pieces and reassemble it on the other side, ensuring all parts arrive in the correct order.

Graph coloring provided the elegant theoretical framework: model virtual registers as nodes, interference as edges, and physical registers as colors. The Chaitin-Briggs algorithm showed how to greedily simplify the graph and optimistically color it, achieving near-perfect results for most programs.

The Iterated Register Coalescing algorithm elevated this into a practical, compile-time framework that intelligently manages the tension between removing copies and avoiding spills. It demonstrates that compiler optimization is not just about applying a single brilliant idea, but about carefully orchestrating multiple competing strategies.

While modern compilers have moved beyond pure graph coloring in favor of greedy heuristics and SSA-based methods, the core concepts remain. Understanding graph coloring gives you insight into the fundamental sacrifice every modern computer must make: the trade between infinite virtual space and finite physical resources.

The next time you compile a C++ template-heavy codebase in a few seconds, or run a real-time application on a tiny embedded chip, remember the “Great Squeeze” happening behind the scenes—a mathematical marvel that turns your infinite program into finite, fast machine code.

Further Reading:

Modern Compiler Implementation in C/Java/ML by Andrew Appel (Chapters on Register Allocation)
Optimization of Compilers for a New Generation by Preston Briggs (PhD Thesis, contains the detailed algorithm)
Register Allocation via Hierarchical Graph Coloring by Callahan and Koblenz
LLVM Source Code: lib/CodeGen/RegAllocGreedy.cpp