Designing A Lagrangian Relaxation Algorithm For The Traveling Salesman Problem With 1 Trees

A comprehensive technical exploration of designing a lagrangian relaxation algorithm for the traveling salesman problem with 1 trees, covering key concepts, practical implementations, and real-world applications.
Contents
Here is the introduction for your blog post.
The Allure of the Impossible: Why We Chase Lower Bounds
There is a moment every operations researcher, computer scientist, or logistics manager knows well. It usually happens late at night, staring at a map. You have a list of cities, a set of distances, and one simple, maddening question: What is the shortest possible route that visits every city exactly once and returns to the start?
The Traveling Salesman Problem (TSP) is, on the surface, an almost childishly simple puzzle. It asks for a permutation—a sequence of cities. Yet, this elegant simplicity is a trap. It is one of the most famous NP-hard problems in existence, a computational Siren that has lured mathematicians and computer scientists for over a century. For 20 cities, there are roughly 121 quadrillion possible tours to consider. For 50 cities, that number explodes past the number of atoms in the universe.
The brute-force approach—checking every possible tour—is impossible from the moment you look at more than a handful of locations. And yet, the TSP is not an abstract mathematical curiosity. It is a fundamental economic engine. The TSP is the beating heart of logistics, supply chain management, genome sequencing, circuit board drilling, and even the pathfinding algorithms that power modern robotics. Every package shipped overnight, every chromosome mapped, every microchip manufactured—these processes are, at their core, wrestling with the Traveling Salesman.
So, how do we solve the unsolvable? We stop looking for the exact answer in all cases, and we start building algorithms that can get within a hair’s breadth of perfection, provably so.
This post is about one of the most elegant and powerful weapons in the algorithmic arsenal for doing exactly that: Lagrangian Relaxation. Specifically, we are going to design a Lagrangian Relaxation algorithm for the TSP using a deceptively simple structure called the 1-Tree.
The Problem with the Problem
Before we talk about the cure, we need to fully understand the disease. The standard TSP is an Integer Linear Program (ILP). You have binary variables ( x*{ij} ) for every edge between city ( i ) and city ( j ). Put simply, ( x*{ij} = 1 ) if the edge is part of the final tour; ( x_{ij} = 0 ) if it is not.
The constraints are logical, but punishingly difficult to encode:
- Degree Constraints: Every city must have exactly two edges in the tour—one to arrive, one to leave. This is easy.
- The Subtour Elimination Constraints: This is the killer. You cannot have a small, closed loop of cities that doesn’t include all of them. For a set of 10 cities out of 100, you cannot have a perfect little cycle that prevents the salesman from visiting the other 90.
The first set of constraints is linear. The second set is exponential. There are ( 2^n ) possible subsets of cities, meaning there are ( 2^n ) constraints to write down if we want to be sure we stop those “cheating” subtours. We cannot solve a model with billions upon billions of constraints for a real-world problem.
This is where we stop trying to solve the problem exactly and start trying to bound it. We want to know: what is the theoretical best possible answer? If we can prove that the optimal tour must be at least 1000 miles long, and we find a tour that is 1010 miles long, we know we are within 1% of perfection.
The classic way to get a lower bound (an optimistic estimate) for the TSP is to relax the problem. We drop the hard part—the subtour elimination constraints—and solve what remains. The result is the Assignment Problem or a Minimum Spanning Tree (MST). An MST gives you a set of edges connecting all cities with minimum total distance, without worrying about the salesman’s requirement to visit in order. It is a tree, not a tour.
But the gap between an MST and the true optimal tour can be enormous. The MST bound is weak. It tells us the tour must be at least as long as the tree, but the tree is often a star-shaped mess that ignores the structure of a tour. We need a tighter relaxation.
Enter the 1-Tree: A Slightly Less Naive Relaxation
The 1-Tree is a brilliant refinement of the MST idea. Instead of just building a tree connecting all ( n ) cities, we take an evolutionary step towards a tour.
Here is the construction:
- Pick a reference point. Choose one city, let’s call it City 1.
- Remove City 1. Take all the other ( n-1 ) cities.
- Build an MST. Compute the Minimum Spanning Tree for those ( n-1 ) cities.
- Re-attach City 1. Connect City 1 back to the rest of the tree using the two cheapest edges from City 1 to any two other cities.
The result is a structure that looks almost like a tour. It has exactly ( n ) edges (like a tour). It is connected (like a tour). The key difference is that the MST structure in the interior might have branches and forks (nodes with degree 3 or more), whereas a true tour is a simple cycle where every node has degree exactly 2.
But here is the magic: The minimum 1-Tree provides a lower bound on the TSP. Any tour is a 1-Tree (a specific, expensive 1-Tree where all nodes have degree 2). Therefore, the cheapest possible 1-Tree cannot be more expensive than the cheapest possible tour. It is a valid, and often much better, lower bound than a simple MST.
But it’s still not tight enough. The problem is the degree constraint.
The Tension: Lagrangian Relaxation
The minimum 1-Tree is too loose because it allows “hub” cities (nodes with degree 3, 4, or more) which are illegal in a tour. It also allows cities to have degree 1 (which they cannot in a tour). We want to force every node (except maybe City 1, which will have degree 2 by definition) to have degree exactly 2.
This is a constraint. We do not want to add it directly (that would just be the original TSP). Instead, we want to nudge the solution towards it. We want to say: “Mr. 1-Tree, if you give a city a degree that is not 2, I am going to punish you.”
This is the core philosophy of Lagrangian Relaxation. Instead of enforcing a hard constraint, we move it into the objective function. We attach a price, a Lagrangian multiplier (λ), to each city.
The logic is simple:
- If a city’s degree (( d_i )) is greater than 2, we add a penalty to the cost of the 1-Tree.
- If a city’s degree is less than 2, we subtract a penalty (a reward for helping us get closer to a tour).
- The penalty is proportional to how far the degree deviates from the ideal (( d_i - 2 ).
We modify the original cost of every edge in our graph. The new “effective cost” of an edge (( i, j )) becomes:
[ c’{ij} = c{ij} + \lambda_i + \lambda_j ]
Where λ is our vector of penalties.
Think of it like this: If you are trying to visit a specific city (City 5), you are willing to “pay” a little more to enter and exit it. The Lagrangian multiplier ( \lambda_5 ) represents this willingness to pay. By artificially raising the costs of edges attached to City 5, we make the 1-Tree less likely to build a star-hub around City 5 and more likely to force it to have exactly two connections.
Now, the problem becomes a game. We solve the minimum 1-Tree problem on these adjusted costs. If the resulting 1-Tree has some nodes with degree 3, we know we didn’t penalize them enough. If it has nodes with degree 1, we penalized them too much.
We need to find the perfect set of penalties, the optimal λ vector, that forces every node to have degree exactly 2.
The Road Ahead
This is the algorithmic knife fight we are about to enter. We are going to design a system that iteratively:
- Computes the Minimum 1-Tree on a set of modified costs.
- Checks the resulting degrees.
- Updates the penalties (λ) using a subgradient optimization method—a form of gradient descent for non-smooth functions.
When we find the λ that forces the 1-Tree into a tour (or as close as possible), the cost of that final 1-Tree is the Lagrangian Lower Bound. This bound is mathematically proven to be at least as good as the naive MST bound, and often it is incredibly tight—sometimes within 0.01% of the true optimal tour length.
In this post, we will not just talk theory. We will get our hands dirty. We will:
- Define the exact mathematical model for the TSP and the 1-Tree.
- Implement the Lagrangian relaxation step-by-step, showing how to update the multipliers.
- Code the Subgradient Optimization to converge on the best bounds.
- Analyze the results, showing the gap between the initial MST, the simple 1-Tree, and the final Lagrangian bound on a classic benchmark problem (e.g., the 48-city USA problem, att48).
By the end, you will understand why Lagrangian Relaxation is not just an algorithm—it is a philosophy of problem-solving. You will see how to solve the unsolvable by breaking it into manageable pieces and then bribing those pieces to work together.
The TSP will still be NP-hard. But we are no longer powerless. We are going to hunt for its perfect lower bound.
Let’s design the algorithm.
Designing a Lagrangian Relaxation Algorithm For The Traveling Salesman Problem With 1‑Trees
The Traveling Salesman Problem (TSP) is one of the most studied combinatorial optimization problems. Given a set of cities and distances between them, the goal is to find the shortest possible tour that visits each city exactly once and returns to the starting city. While the problem statement is deceptively simple, TSP is NP‑hard, meaning that large instances cannot be solved exactly in a reasonable amount of time using brute‑force or exact methods alone.
To tackle large TSP instances, practitioners turn to relaxations – simplified versions of the problem that can be solved efficiently and that provide a lower bound on the optimal tour length. A good relaxation not only gives a tight bound but also offers insight into the structure of near‑optimal solutions. Among the most powerful relaxation techniques for TSP is the 1‑tree relaxation paired with Lagrangian relaxation. This approach underpins the famous Held–Karp lower bound, which is the foundation of many state‑of‑the‑art exact solvers. In this article, we will dive deep into the theory, design, and implementation of a Lagrangian relaxation algorithm using 1‑trees. We’ll cover everything from the mathematical formulation to practical code examples, and conclude with real‑world applications that demonstrate why this technique remains relevant today.
1. The Traveling Salesman Problem: Why We Need Relaxations
Let’s formalize TSP. We have a complete undirected graph ( G = (V, E) ) with ( n = |V| ) nodes (cities). Each edge ((i,j)) has a cost (c_{ij}) (distance, time, fuel, etc.). A tour is a Hamiltonian cycle – a cycle that visits every node exactly once. The objective is:
[ \min \sum*{(i,j) \in E} c*{ij} x_{ij} ]
subject to constraints that ensure each node has degree exactly 2 and that the chosen edges form a single connected cycle (subtour elimination constraints). The degree constraints alone do not guarantee a single cycle; they could produce a collection of disjoint cycles. The subtour elimination constraints make TSP hard.
Relaxations drop some of these constraints. The classic spanning tree relaxation ignores the degree constraints and asks for a minimum spanning tree (MST). The MST length is a lower bound for the TSP, but often a very loose one because a tree has exactly (n-1) edges (while a tour has (n) edges) and the degree of each node in an MST can be far from 2.
The 1‑tree relaxation improves on the MST. A 1‑tree is defined as follows:
- Choose a special node (e.g., node 1).
- Build a spanning tree on the remaining nodes ( V \setminus {1}).
- Add two edges from node 1 to the tree (any two edges). The result has exactly (n) edges, like a tour.
A Hamiltonian tour is a special kind of 1‑tree where every node (including node 1) has degree exactly 2. Thus the TSP can be thought of as a degree‑constrained 1‑tree. The 1‑tree relaxation drops the degree constraints on all nodes except node 1 (which is allowed any degree). The optimal 1‑tree can be computed in polynomial time: find an MST on nodes (2,\dots,n), then add the two cheapest edges from node 1 to that MST.
The lower bound from a 1‑tree is weak, but we can tighten it dramatically by transforming edge costs with Lagrange multipliers. This is where Lagrangian relaxation enters.
2. Lagrangian Relaxation: The Core Idea
Lagrangian relaxation is a technique for handling “complicating” constraints. Instead of enforcing a constraint directly, we penalize its violation in the objective function using a multiplier (or price). By adjusting the multipliers, we search for the best lower bound.
For TSP, we treat the degree constraints (except for node 1) as the complicating constraints. Let’s denote:
- For each node (i \in V \setminus {1}), the desired degree in a tour is 2.
- In a 1‑tree, the actual degree of node (i) is (\deg_T(i)), where (T) is the 1‑tree.
We relax the constraints (\deg_T(i) = 2) by moving them to the objective with a multiplier (\lambda_i) (unrestricted in sign). The Lagrangian function for a given vector (\lambda = (\lambda_2, \lambda_3, \dots, \lambda_n)) becomes:
[ L(\lambda) = \min*{1\text{-tree } T} \left( \sum*{(i,j)\in T} c*{ij} + \sum*{i=2}^n \lambda_i (\deg_T(i) - 2) \right) ]
Since the term (\sum_i \lambda_i (\deg_T(i) - 2)) can be rewritten as:
[ \sum*{(i,j)\in T} (\lambda_i + \lambda_j) - 2\sum*{i=2}^n \lambda_i ]
(Note: (\lambda1) is fixed to 0 because node 1 degree is not constrained.) The constant term (-2\sum{i=2}^n \lambda_i) does not depend on the choice of 1‑tree. So we can define modified edge costs:
[ c*{ij}^\lambda = c*{ij} + \lambda_i + \lambda_j \quad \text{for } i,j \neq 1 ]
and for edges incident to node 1:
[ c*{1j}^\lambda = c*{1j} + \lambda_j \quad (\text{since } \lambda_1=0) ]
Then the Lagrangian lower bound is:
[ L(\lambda) = \left( \min*{1\text{-tree } T} \sum*{(i,j)\in T} c*{ij}^\lambda \right) - 2\sum*{i=2}^n \lambda_i ]
For a fixed (\lambda), computing the minimum 1‑tree is easy (MST + two cheapest edges from node 1). The result is a lower bound for the original TSP: because any feasible tour would have (\deg_T(i)=2) for all (i), so its modified cost equals its original cost, and the 1‑tree we find is at most the tour cost under the modified costs.
The dual problem is to find the multipliers that maximize this lower bound:
[ \max_{\lambda \in \mathbb{R}^{n-1}} L(\lambda) ]
The optimal value of this dual is called the Held–Karp lower bound, which is known to be exactly the same as the linear programming relaxation of the TSP (the Subtour Elimination Polytope). For Euclidean TSP instances, the Held–Karp bound is typically within 1–2% of the optimal tour length, making it extremely tight.
How do we maximize (L(\lambda))? The function (L(\lambda)) is concave and piecewise linear, but not differentiable everywhere. We use subgradient optimization, a gradient‑like method that works with subgradients instead of gradients.
3. Subgradient Optimization for the Dual
A subgradient at a point (\lambda) is a vector (g) such that for all (\mu):
[ L(\mu) \leq L(\lambda) + g \cdot (\mu - \lambda) ]
For our Lagrangian, one convenient subgradient is the vector of degree violations:
[ g_i = \deg_T(i) - 2 \quad \text{for } i = 2,\dots,n ]
where (T) is the optimal 1‑tree for the current (\lambda). (There can be multiple optimal trees; any one works.) The intuition: if node (i) has degree > 2 in the 1‑tree, we want to increase its penalty (\lambdai) to discourage high degree; if degree < 2, we decrease (\lambda_i). The subgradient tells us the direction of steepest ascent _in the dual space.
The subgradient method updates (\lambda) as follows:
[ \lambda^{(k+1)} = \lambda^{(k)} + \alpha_k \cdot \frac{UB - LB^{(k)}}{|g^{(k)}|^2} \cdot g^{(k)} ]
Here:
- (UB) is an upper bound (a feasible tour length obtained heuristically).
- (LB^{(k)} = L(\lambda^{(k)})) is the current lower bound.
- (\alpha_k) is a step‑size parameter, usually chosen to start around 2 and halved periodically (e.g., after a certain number of iterations without improvement).
The numerator (UB - LB^{(k)}) is the current duality gap. The step size is scaled so that if we could jump directly to the dual optimum, the lower bound would equal the upper bound – but in practice the gap closes slowly.
Key practical considerations:
- The subgradient is not a true gradient; the method may oscillate. We must keep the best (LB) found so far.
- Compute (UB) using a fast TSP heuristic (e.g., nearest neighbor + 2‑opt improvement) every few iterations. The upper bound should ideally decrease as we get better lower bounds.
- If the subgradient becomes zero, we have reached a point where (L(\lambda)) is maximized (or at least a stationary point). For TSP, the multipliers can be adjusted for a long time before the bound converges.
We now have all the pieces to design a complete algorithm.
4. Designing the Algorithm: Step by Step
4.1 Data Structures and Prerequisites
Assume we have an (n \times n) distance matrix dist. We’ll keep node indices (0) to (n-1), with node 0 as the special “root” (node 1 in our earlier notation). We need:
- A function to compute the MST on a set of nodes (excluding root) given modified costs. We’ll use Prim’s algorithm (efficient for dense graphs).
- A function to find the two cheapest edges from root to the MST.
- A placeholder for an upper bound solver (we can use a simple nearest‑neighbor heuristic with 2‑opt, but we’ll keep it modular).
4.2 Computing a 1‑Tree with Modified Costs
The core routine:
def compute_one_tree(cost, root=0):
"""Return the list of edges in a min-cost 1-tree with given cost matrix.
cost[i][j] should be symmetric, and we assume root's index is 0.
"""
n = len(cost)
# 1. MST on nodes 1..n-1 (exclude root)
mst_edges = prim_mst(cost, excluded_nodes=[root])
# 2. Collect all nodes that are already in MST (all except root)
# 3. Add two cheapest edges from root to any node in MST
root_edges = []
for j in range(1, n):
root_edges.append((root, j, cost[root][j]))
root_edges.sort(key=lambda x: x[2])
# take the two cheapest
edges = mst_edges + [root_edges[0], root_edges[1]]
return edges
prim_mst can be implemented using a priority queue, but for dense graphs a simple array‑based version (choose min each iteration) is fine.
4.3 Initialization
- Set all (\lambda_i = 0).
- Compute initial 1‑tree and lower bound (LB = L(\lambda)).
- Compute an initial upper bound using, e.g., nearest neighbor tour (start from root, always go to nearest unvisited city, then return to root).
- Set
best_lambda = lambda,best_lb = LB. - Choose initial step size parameter (\alpha = 2.0) (a common start).
4.4 Main Loop
We iterate for a fixed number of iterations (e.g., 10,000) or until the relative gap ((UB - LB)/LB < \epsilon).
At each iteration k:
- Compute the optimal 1‑tree for current (\lambda) using modified costs.
- Compute (LB = L(\lambda) = \text{(1‑tree cost with modified costs)} - 2 \sum_{i=2}^n \lambda_i).
- If (LB > best_lb), update
best_lbandbest_lambda. - Compute subgradient: for each node (i\neq root), (g_i = \deg_T(i) - 2).
- Compute step size: (s = \alpha_k \cdot (UB - LB) / |g|^2).
- Update: (\lambda_i \leftarrow \lambda_i + s \cdot g_i).
- (Optional) Every, say, 100 iterations, recompute the upper bound using a heuristic that uses the current (\lambda) as guidance. (Often we just use a simple TSP solver independent of (\lambda).)
- If the lower bound hasn’t improved for a long time (e.g., 100 iterations), reduce (\alpha): (\alpha \leftarrow \alpha / 2).
Important: The subgradient may become zero (if the 1‑tree already satisfies all degree constraints – but that would be a tour). Then we have the optimal solution.
4.5 Upper Bound Heuristics
While the dual provides a lower bound, we also want a feasible (upper bound) tour. Simple heuristics:
- Nearest Neighbor (greedy): Start at root, repeatedly go to nearest unvisited city, then return. Usually gives a tour about 15–20% above optimum.
- 2‑opt improvement: Repeatedly replace two crossing edges to shorten the tour. This can bring the gap to 5–10%.
- For better performance, use Christofides algorithm or Lin‑Kernighan.
We can run the upper bound heuristic every few outer iterations using the current modified costs as a guide? Actually, a common trick is to consider that the 1‑tree edges often “almost” form a tour; one can build a tour by breaking the cycles in the 1‑tree and reconnecting. But for simplicity, we’ll just compute a standalone NN+2‑opt tour every 200 iterations.
4.6 Termination
We stop when:
- The gap falls below a threshold (e.g., 0.1%).
- A maximum number of iterations is reached.
- The step size becomes too small (< 1e-6).
At the end, we return the best lower bound and the best upper bound, along with the final multipliers.
5. Full Code Example (Python‑like Pseudocode)
Below is a self‑contained implementation of the algorithm for a small TSP instance. We omit the MST and 2‑opt code for brevity; assume they are available.
import random
import math
def lagrangian_tsp(dist, max_iter=10000):
n = len(dist)
root = 0
# Initialize multipliers
lam = [0.0] * n # lam[0] stays 0
# Compute initial upper bound (using a simple heuristic)
ub = nearest_neighbor_tour_cost(dist)
# We'll occasionally run 2-opt to improve ub
best_ub = ub
# Initialize lower bound
one_tree_edges = compute_one_tree(dist, root, lam)
lb = one_tree_cost(one_tree_edges, lam) # includes lambda penalty
best_lb = lb
best_lam = lam[:]
# Step size control
alpha = 2.0
no_improve_count = 0
for iteration in range(max_iter):
# 1. Get 1-tree with current lam
one_tree_edges = compute_one_tree(dist, root, lam)
# 2. Compute lower bound L(lam)
raw_cost = sum(dist[i][j] for (i,j,w) in one_tree_edges)
lambda_penalty = sum(lam[i] * (deg[i] - 2) for i in range(1, n))
# Actually easier: compute modified cost and subtract 2*sum(lam)
# We'll compute directly
modified_cost = 0
deg = [0]*n
for (i,j,w) in one_tree_edges:
modified_cost += (w + lam[i] + lam[j])
deg[i] += 1
deg[j] += 1
# For root, lam[0]=0, so no change
lb = modified_cost - 2 * sum(lam[1:])
if lb > best_lb:
best_lb = lb
best_lam = lam[:]
no_improve_count = 0
else:
no_improve_count += 1
# 3. Subgradient
subgrad = [0.0]*n
for i in range(1, n):
subgrad[i] = deg[i] - 2.0
norm_sq = sum(g*g for g in subgrad)
if norm_sq < 1e-12:
break # exact dual optimum reached
# 4. Step size
step = alpha * (best_ub - lb) / norm_sq
# 5. Update lambda
for i in range(1, n):
lam[i] += step * subgrad[i]
# 6. Occasionally recompute upper bound
if iteration % 200 == 0:
new_ub = two_opt_tour(dist, lam) # heuristic using lam to bias? We'll use standard dist.
if new_ub < best_ub:
best_ub = new_ub
# 7. Reduce alpha if stuck
if no_improve_count > 100:
alpha *= 0.5
no_improve_count = 0
if alpha < 1e-6:
break
return best_lb, best_ub
Notes on the code:
- The
compute_one_treefunction must use modified costs: (c*{ij}^\lambda = dist[i][j] + lam[i] + lam[j]) for (i,j\neq 0), and (c*{0j}^\lambda = dist[0][j] + lam[j]). - The upper bound heuristic
two_opt_tourcan be standard (no λ dependence) or can incorporate the idea of “minimum spanning” 1‑tree to generate a tour. For simplicity, we just run a normal NN+2‑opt. - The step size formula is classic; many implementations clamp the step or use a smoothing term.
6. Theory: Why It Works
The Lagrangian dual of the 1‑tree relaxation is equivalent to the linear programming relaxation of the TSP with subtour elimination constraints. This equivalence was established by Held and Karp (1970, 1971) and is one of the most elegant results in combinatorial optimization.
The 1‑tree relaxation itself can be viewed as a polyhedron: the convex hull of all 1‑trees. The degree constraints ( \deg(i)=2 ) are linear inequalities. By dualizing them, we are essentially finding the best possible lower bound given by the intersection of the 1‑tree polytope with a hyperplane defined by the degree constraints. The final multipliers (\lambda_i) represent the shadow prices of the degree constraints – how much the optimal tour cost would change if we allowed a node to have degree other than 2.
The dual function (L(\lambda)) is concave and piecewise linear. Each “piece” corresponds to a specific 1‑tree being optimal for a region of (\lambda) space. The subgradient algorithm converges to the optimal dual value (the Held–Karp bound) as long as the step sizes (\alpha_k) satisfy the conditions (\sum \alpha_k = \infty) and (\sum \alpha_k^2 < \infty). In practice, we use a geometric reduction schedule.
Why is the Held–Karp bound so tight? In Euclidean TSP, the optimal tour length is often within 1% of the Held–Karp bound. The reason is that the fractional solutions of the linear program tend to be “close” to integer tours for metric distances. The 1‑tree relaxation captures a lot of the connectivity structure.
7. Practical Example: 5‑City TSP
Let’s work through a concrete example. Coordinates (x,y):
- City 0: (0,0)
- City 1: (1,2)
- City 2: (3,1)
- City 3: (4,3)
- City 4: (1,4)
Euclidean distances. We’ll run the algorithm manually a few iterations.
Initial λ = 0:
- MST on nodes 1-4: edges: (1,4) ≈ 2.24, (1,2) ≈ 2.24, (3,4) ≈ 3.16 – total 7.64. Wait, we need Prim’s algorithm. Better compute numerically.
But conceptually: The 1‑tree will be: MST (nodes 1-4) + two cheapest edges from node 0. The cheapest edges from 0: (0,1) ≈ 2.24, (0,2) ≈ 3.16. So 1‑tree cost ≈ 7.64 + 5.40 = 13.04. Lower bound = 13.04.
Upper bound (NN): start 0 → 1 (2.24) → 4 (1.0) → 3 (3.16) → 2 (3.16) → 0 (3.16) total ≈ 12.72. Wait that’s lower than the lower bound? That cannot happen – our lower bound must be ≤ UB. Did we compute correctly? Let’s compute distances:
0-1: sqrt((1)^2+(2)^2)=√5≈2.236 0-2: √10≈3.162 0-3: √25=5 0-4: √(1+16)=√17≈4.123 1-2: √(4+1)=√5≈2.236 1-3: √(9+1)=√10≈3.162 1-4: √(0+4)=2 2-3: √(1+4)=√5≈2.236 2-4: √(4+9)=√13≈3.606 3-4: √(9+1)=√10≈3.162
Now MST on nodes 1-4: edges sorted: 1-4 (2), 1-2 (2.236), 2-3 (2.236), 3-4 (3.162), 1-3 (3.162), 2-4 (3.606). Prim starting from node1: add 1-4 (2) nodes {1,4} cost2; next cheapest to {1,4}: 1-2 (2.236) nodes {1,2,4} cost4.236; next: 2-3 (2.236) nodes {1,2,3,4} cost6.472. MST total = 6.472. Add two cheapest from 0: 0-1 (2.236) and 0-2 (3.162) → total 1-tree cost = 6.472+5.398 = 11.87.
Now NN tour: 0→1 (2.236), from 1 nearest unvisited: 4 (2), from 4 nearest: 3 (3.162) or 1 already visited, so 3; from 3 nearest unvisited: 2 (2.236); from 2 back to 0 (3.162). Total = 2.236+2+3.162+2.236+3.162 = 12.796. So UB=12.796, LB=11.87, gap about 7.2%.
Now we run subgradient. Compute degrees in 1-tree: node1 deg=2 (edges 0-1, MST edges? 1-4 and 1-2? Yes, MST uses 1-4,1-2; plus 0-1 gives degree 3? Wait careful: In 1‑tree, we added two edges from root (0-1,0-2). So 1‑tree edges: MST edges (1-4,1-2,2-3) plus root edges (0-1,0-2). So nodes: 0 deg=2, 1 deg=3, 2 deg=3 (MST uses 1-2 and 2-3, plus root 0-2), 3 deg=1, 4 deg=1. Subgradient: g1 = 3-2=1, g2=1, g3=-1, g4=-1. Norm^2 = 1+1+1+1=4.
Suppose UB constant 12.796, LB=11.87 → gap=0.926. Step size s = 2 * 0.926 / 4 = 0.463. Update lam1=0+0.463*1=0.463, lam2=0.463, lam3= -0.463, lam4= -0.463.
Now compute new 1‑tree with modified costs:
- modified distances: e.g., 0-1: 2.236 + 0.463 = 2.699; 0-2: 3.162+0.463=3.625; 0-3: 5-0.463=4.537; 0-4: 4.123-0.463=3.66.
- For edges not involving root: add both lam: e.g., 1-2: 2.236+0.463+0.463=3.162; 1-3: 3.162+0.463-0.463=3.162; 1-4: 2+0.463-0.463=2; 2-3: 2.236+0.463-0.463=2.236; 2-4: 3.606+0.463-0.463=3.606; 3-4: 3.162-0.463-0.463=2.236.
Now MST on nodes 1-4 with these modified costs: edges sorted: 1-4 (2), 2-3 (2.236), 3-4 (2.236), 1-2 (3.162), 1-3 (3.162), 2-4 (3.606). Prim: start 1, add 1-4 (2) {1,4}; next cheapest to set: from 4: 3-4 (2.236) {1,4,3}; then from set: 2-3 (2.236) {1,2,3,4}. MST cost = 2+2.236+2.236 = 6.472 (same as before?). Actually sum=6.472. Add two cheapest from root: 0-4 (3.66) and 0-3 (4.537) → total modified 1‑tree = 6.472+3.66+4.537=14.669. Then lower bound L(λ) = modified cost - 2∑ lam (i=1..4) = 14.669 - 2*(0.463+0.463-0.463-0.463) = 14.669 - 0 = 14.669? Wait sum lam = 0, so L=14.669. That’s higher than UB! That can’t be – we must have made an error. The lower bound should never exceed UB because UB is feasible tour cost. Let’s recompute.
The formula: L(λ) = min1tree ( sum c_ij^λ ) - 2∑{i≠1} λ_i. Our sum c_ij^λ = 14.669. λ sum = 0. So L=14.669 > UB=12.796. This indicates that our 1‑tree computation might be wrong. Let’s double-check.
When we add two cheapest edges from root, we must add the two smallest among all edges from root to any node in the MST (which now includes all non‑root nodes). With modified costs, the cheapest from root are: 0-4 (3.66), 0-3 (4.537), 0-1 (2.699? Wait earlier I computed 0-1 = 2.236+0.463=2.699, which is cheaper than 0-4 (3.66)! Let’s recalculate.
Modified costs for root edges: For j=1: dist(0,1)=2.236 + lam1 (0.463) = 2.699; j=2: 3.162 + lam2(0.463)=3.625; j=3: 5 + lam3(-0.463)=4.537; j=4: 4.123 + lam4(-0.463)=3.66. So the two smallest are 0-1 (2.699) and 0-2 (3.625). So the 1‑tree should include those, not 0-4 and 0-3. I made a quick mistake. Let’s correct.
Two cheapest root edges: 0-1 (2.699) and 0-2 (3.625). Then modified 1‑tree cost = MST modified cost (6.472) + 2.699+3.625 = 12.796. So total = 12.796. Then L(λ)=12.796 - 0 = 12.796. That equals UB exactly! In fact, the 1‑tree we just found is the same as the nearest neighbor tour? The edges: MST edges (1-4,1-2,2-3) plus root 0-1,0-2. This forms a graph: 0-1, 1-2, 2-3, 1-4, and 0-2. That graph has a cycle 0-1-2-0, and leaves 3 and 4 attached as leaves. It is not a tour. But its modified cost equals the UB tour cost? That’s coincidence (UB tour is 0-1-4-3-2-0, cost 12.796). So L=12.796 = UB. This is the best possible – we have reached dual optimum. The gap is zero.
In practice, the two are rarely equal, but this small example shows that with correct multipliers the bound can match the heuristic tour. The final bound after convergence may be exactly the optimal tour length if the instance is nice.
8. Real‑World Applications
The Held–Karp bound and the 1‑tree Lagrangian algorithm are not merely academic. They form the backbone of many practical TSP solvers. Here are a few real‑world domains:
8.1 Vehicle Routing and Logistics
Delivery companies must plan routes for hundreds of vehicles. The Vehicle Routing Problem (VRP) is a generalization of TSP with multiple vehicles and capacity constraints. Most high‑quality VRP solvers (e.g., OR‑Tools, LKH) use a TSP solver as a core subroutine. The Lagrangian 1‑tree algorithm provides an excellent lower bound that guides branch‑and‑bound search trees, allowing exact solution of instances with up to a few hundred cities. For larger instances, the same bound is used within branch‑and‑cut algorithms.
8.2 Circuit Design and VLSI Layout
In printed circuit board (PCB) and VLSI design, a common subproblem is to find the shortest path to drill holes or to route wires. TSP appears in drill‑path optimization (minimize travel time of a drill head) and wire routing (minimize total wire length while connecting pins on a chip). The 1‑tree relaxation produces tight bounds that allow efficient exact solvers for boards with dozens of perforations – a significant improvement over simple heuristics.
8.3 DNA Sequencing and Genome Assembly
One of the earliest applied uses of TSP was in DNA sequencing via hybridization. When reconstructing a DNA sequence from short fragments, the problem reduces to finding a Hamiltonian path in a graph of fragment overlaps (or a shortest superstring). This is a variant of TSP (the shortest common superstring problem). Lagrangian relaxation with 1‑trees (or directed versions) is used to compute lower bounds that help limit the search space. For instance, the Concorde TSP solver, which incorporates the Held–Karp bound, has been used to solve genome assembly instances with hundreds of fragments.
8.4 Job Scheduling and Production Planning
In manufacturing, TSP arises in scheduling a single machine to process jobs with sequence‑dependent setup times. Minimizing makespan is equivalent to TSP. Lagrangian relaxation provides a way to decompose the problem: for each job, we can relax the “visit once” constraint, leading to a problem similar to the 1‑tree. The multipliers then reflect the opportunity cost of scheduling a job at a certain time.
9. Variations and Extensions
The basic 1‑tree algorithm can be extended in several ways:
- Asymmetric TSP: Use a 1‑arborescence (directed version) and Lagrangian multipliers on indegree/outdegree. The bound is still tight.
- Multiple 1‑trees: Use a combination of different root nodes to tighten the bound further.
- Incorporating Subtour Elimination: The 1‑tree does not explicitly prohibit subtours. For very tight bounds, one can add additional Lagrangian terms for subtour elimination constraints, but that increases complexity.
- Branch and Bound: The lower bound from the dual can be used in a branch‑and‑bound tree. Each node computes a bound, and if it exceeds the current UB, the node is pruned. The multipliers from the parent node often provide a warm start, accelerating convergence.
Conclusion
Designing a Lagrangian relaxation algorithm for the Traveling Salesman Problem using 1‑trees is a powerful and elegant way to obtain tight lower bounds. By relaxing the degree constraints and optimizing the multipliers with subgradient methods, we achieve the Held–Karp bound, which is often within a few percent of the optimum. The algorithm is not only theoretically significant but also practically useful in logistics, chip design, genomics, and beyond. With the code skeleton and examples provided, you now have a solid foundation to implement this algorithm yourself and even extend it to more complex routing problems. The beauty of the method lies in its simplicity: a few lines of MST code plus a subgradient loop can yield bounds accurate enough to solve TSP instances that would otherwise be intractable. Whether you’re a researcher exploring combinatorial optimization or an engineer building a route planner, the 1‑tree Lagrangian relaxation is an indispensable tool in your arsenal.
Designing A Lagrangian Relaxation Algorithm For The Traveling Salesman Problem With 1-Trees
Introduction: Why 1-Trees and Lagrangian Relaxation?
The Traveling Salesman Problem (TSP) stands as one of the most studied NP-hard optimization problems. Its deceptively simple formulation—find the shortest Hamiltonian cycle visiting every node exactly once—belies immense computational complexity. For decades, researchers have developed exact algorithms (branch-and-bound, branch-and-cut) and heuristics (nearest neighbor, Lin-Kernighan), but the most powerful exact approaches rely on tight lower bounds.
One of the most elegant and historically significant bounding techniques is the 1-tree relaxation combined with Lagrangian relaxation. Conceptually, a 1-tree is a spanning tree on all nodes except a designated “special” node (say node 1), plus two edges connecting that special node to the tree. This structure approximates a Hamiltonian cycle: remove one edge from a cycle and you get a spanning tree; a cycle must enter and leave the special node exactly twice. By associating Lagrange multipliers (penalties) with each node’s degree constraints, we can systematically tighten the 1-tree bound until it converges to the optimal TSP tour length.
This post dives into the advanced design of such an algorithm: how to handle degeneracy, accelerate convergence, avoid numerical pitfalls, and integrate with modern branch-and-bound frameworks. Whether you are implementing a TSP solver from scratch or studying combinatorial optimization, mastering Lagrangian relaxation on 1-trees provides a foundation for other constrained tree problems (e.g., vehicle routing, Steiner trees).
1. The Mathematical Core: From TSP to Lagrangian Dual
1.1 TSP Formulation and Relaxation
Let ( G = (V, E) ) be a complete undirected graph with ( n = |V| ), edge costs ( c_e \ge 0 ). A Hamiltonian cycle can be characterized as a 2-regular connected spanning subgraph where each vertex has degree exactly 2.
The 1-tree relaxation drops the degree constraints for all but one vertex (say vertex 1). A 1-tree is a spanning tree on ( V \setminus {1} ) (a minimum spanning tree – MST) plus two edges incident to vertex 1. It satisfies:
- Vertex 1 has degree 2.
- Every other vertex has degree at least 1 (by tree connectivity), but the degree is unrestricted.
The minimum-cost 1-tree is computed easily: find the MST on the ( n-1 ) other vertices, then add the two cheapest edges from vertex 1 to that tree. This yields a lower bound, but it is typically weak because the degrees of vertices ( i \neq 1 ) can deviate wildly from 2.
1.2 Lagrangian Relaxation of Degree Constraints
We want to penalize vertices whose degree in the 1-tree is not 2. Introduce a Lagrange multiplier ( \lambda_i \in \mathbb{R} ) for each vertex ( i \neq 1 ). The Lagrangian function for the TSP becomes:
[ L(\lambda) = \min*{T \text{ (1-tree)}} \left{ \sum*{e \in T} ce + \sum{i \neq 1} \lambda_i (d_T(i) - 2) \right} ]
where ( dT(i) ) is the degree of vertex ( i ) in the 1-tree. Equivalently, we can modify edge costs: ( \tilde{c}{ij} = c_{ij} + \lambda_i + \lambda_j ), because each edge incident to ( i ) contributes ( \lambda_i ) when the degree of ( i ) is counted. (Vertex 1 is special: no penalty applied to it, so edges to vertex 1 only have the other endpoint’s multiplier.)
Thus, the Lagrangian problem reduces to finding a minimum-cost 1-tree with respect to modified edge costs ( \tilde{c} ). This is trivial: compute MST on ( V \setminus {1} ) with ( \tilde{c} ), then add the two cheapest edges from 1 using ( \tilde{c} ).
The Lagrangian dual is:
[ L^* = \max_{\lambda \in \mathbb{R}^{n-1}} L(\lambda) ]
The optimal value ( L^* ) equals the cost of the optimal TSP tour when the multipliers are optimal (if the polyhedron of the TSP is integral, which it is for many instances). In practice, we obtain a lower bound that approaches the optimal tour cost.
2. Algorithmic Framework: Subgradient Optimization
We maximize ( L(\lambda) ) using subgradient ascent. Since ( L ) is concave piecewise linear, it has subgradients. For a given ( \lambda^k ), let ( T^k ) be an optimal 1-tree under modified costs. A subgradient ( g^k \in \mathbb{R}^{n-1} ) is:
[ g^ki = d{T^k}(i) - 2, \quad i \neq 1 ]
We update:
[ \lambda^{k+1}_i = \lambda^k_i + \alpha_k \cdot g^k_i ]
where ( \alpha_k > 0 ) is the step size. Choose ( \alpha_k ) to satisfy ( \sum \alpha_k \to \infty ) and ( \sum \alpha_k^2 < \infty ) for theoretical convergence, e.g., ( \alpha_k = \frac{\beta}{|g^k|} ) or ( \alpha_k = \frac{\beta_k}{|g^k|} ) with decreasing ( \beta_k ). Common practice: ( \beta_k = \rho_k \cdot (UB - L(\lambda^k)) / |g^k| ) where ( UB ) is an upper bound on the optimal tour (from a heuristic) and ( \rho_k \in (0, 2] ). This “Polyak step” ensures that the bound never exceeds the optimal tour cost.
Example: Pseudocode Sketch
def lagrangian_tsp_1tree(c, n, max_iter=500, rho=1.5):
lam = [0.0] * (n-1) # multipliers for nodes 2..n
best_lower = -inf
# compute an upper bound via e.g. Christofides or nearest neighbor
UB = get_upper_bound(c, n)
for k in range(max_iter):
# modify edge costs
mod = [[c[i][j] + (lam[i-1] if i>0 else 0) + (lam[j-1] if j>0 else 0)
for j in range(n)] for i in range(n)]
# compute MST on nodes 2..n (index 1..n-1 in 0-based)
mst = prim_mst(mod, start=1, valid_nodes=range(1,n))
# add two cheapest edges from node 0 (special vertex 1) to any node in MST
edges1 = [(mod[0][i], i) for i in range(1,n)]
edges1.sort()
tree = mst + [edges1[0], edges1[1]] # cost + nodes
lower = sum(e[0] for e in tree)
# degrees d_i for i>=2
deg = [0]*(n)
for e in tree:
deg[e[1]] += 1
deg[e[2]] += 1
# subgradient
sub = [deg[i] - 2 for i in range(1,n)]
# step size
norm = sum(x*x for x in sub)**0.5
if norm < 1e-6:
break
step = rho * (UB - lower) / (norm*norm) # Polyak step
for i in range(n-1):
lam[i] += step * sub[i]
best_lower = max(best_lower, lower)
# optionally decrease rho every stagnation step
return best_lower
3. Advanced Techniques and Edge Cases
3.1 Handling Degeneracy: Ties and Multiple Optimal 1-Trees
A critical issue: at any iteration, the optimal 1-tree under modified costs may not be unique. If multiple 1-trees have the same cost but different degree patterns, the subgradient we compute is only one element of the subdifferential. This can cause slow convergence or oscillation.
Strategy: Use a bundle method or aggregate subgradients from a set of near-optimal 1-trees. When the MST is degenerate (equal edge costs), explore alternatives. A practical approach: store the last few 1-trees and, if the subgradient direction does not improve, compute a convex combination of their subgradients. More advanced: implement a cutting plane or analytic center cutting plane method (ACCPM) for more robust convergence.
3.2 Step Size and Convergence Acceleration
The Polyak step is sensitive to the initial multiplier values. Common pitfalls:
- Initial multipliers: Setting ( \lambda_i = 0 ) works but often produces a huge gap initially. Better: initialize ( \lambda_i ) as ( \max(0, \text{minimal feasible penalty}) ) or from a dual ascent heuristic.
- Too large step: Can overshoot, causing the lower bound to exceed the optimal (not possible if UB is correct). Small adjustments: clip the step if ( L(\lambda^{k+1}) < L(\lambda^k) ).
- Stagnation: If the bound does not improve for several iterations, reduce ( \rho_k ) (e.g., halve it). Counter-reset: occasionally reinitialize multipliers from a past best point (restarting).
- Adaptive step: Use volume algorithm (a variant of subgradient) that keeps an average of past subgradients to smooth the direction. Or implement a Gaussian stepsize based on the variance of subgradients.
3.3 Dealing with “Star” Tree: Avoiding Trivial 1-Trees
When multipliers are too high, the modified costs can become negative for some edges, causing the MST to include many edges (forming a star) and then the two edges from vertex 1 may be both to the same node (or cause cycles?). Ensure the 1-tree is a simple tree: the MST plus two edges from 1 should not create a cycle (standard algorithm does this correctly). However, if many edges are negative, the MST may include all edges of the complete graph? No, MST on ( n-1 ) vertices will have exactly ( n-2 ) edges, so it’s fine.
Edge case: The two cheapest edges from vertex 1 might be the same if we use undirected graph (impossible). But if edge (1,i) appears twice? No, they are distinct by nature.
3.4 Numerical Stability
Edge costs often are floating-point numbers. Computing MST with floating-point comparisons can produce inconsistent degrees due to rounding. Use integer costs if possible, or apply a small epsilon when comparing near-equal edges. When modifying costs by adding ( \lambda_i + \lambda_j ), the values can become arbitrarily large or negative. Keep a decay factor or reset multipliers if they exceed a threshold (e.g., 100 times the max edge cost) – very large multipliers indicate the tree is covering that vertex too much.
3.5 Scaling to Large Instances
The algorithm requires solving MST in each iteration. With a good implementation (Prim using Fibonacci heap) the complexity is ( O(m + n \log n) ) per iteration, typically ( O(n^2) ) for dense graphs. For ( n=1000 ), each MST takes ~10^6 operations; with 500 iterations that’s 5e8 – manageable. For n=10000, MST becomes heavy (100 million ops per iteration). Optimizations:
- Use static edge list and maintain a priority queue with lazy deletion for Prim’s algorithm; or use binary heap.
- Reuse previous MST: after small changes in multipliers, the MST may not change much. Use sensitivity analysis to update incrementally. However, full recomputation is often simpler and avoids bugs.
- Averaged subgradient: reduce iterations by improving step quality (e.g., using barycentric steps).
4. Best Practices and Common Pitfalls
4.1 The Upper Bound (UB) Must Be Valid
Polyak step size ( \alpha_k = \rho_k \frac{UB - L(\lambda^k)}{|g^k|^2} ) relies on UB being an upper bound on the optimal TSP tour. If UB is too low (i.e., not a valid upper bound), the step becomes negative, decreasing multipliers and possibly diverging. Always verify UB via a constructive heuristic (nearest neighbor, 2-opt improvement, or Christofides for metric TSP). For non-metric graphs, use any feasible tour.
4.2 Stop Criteria
Do not rely solely on a maximum iteration count. Implement:
- Relative gap: ( (UB - L(\lambda^k))/UB < \epsilon )
- Subgradient norm: if ( |g^k| < \delta ), all degrees are nearly 2, so the 1-tree is a tour (yet it might not be feasible due to subtours? Actually if degrees all = 2 and the graph is a single cycle, it’s a valid tour. But the 1-tree could still have subtours? A 1-tree is connected by definition, so all degrees = 2 implies a Hamiltonian cycle.)
- Stagnation: if no improvement for ( p ) iterations, break.
4.3 The 1-Tree May Not Be a Tour Even at Optimal Multipliers
Because we only penalize degrees, the final 1-tree may have all degrees = 2 but still contain a disconnected cycle? No, a connected graph with all degrees 2 is a single cycle (a tour). Therefore at optimal multipliers, if the 1-tree is unique and degree-regular, we get a tour. This tour is optimal only if the primal-dual pair satisfies complementary slackness. In practice, the Lagrangian bound often approaches the optimum but remains slightly below; the final 1-tree may not be a tour (some degrees ≠ 2). This is the duality gap.
4.4 Handling Vertices with Degree Zero in MST
During MST computation on ( V \setminus {1} ), the tree must span all those vertices. If some vertex disappears because its incident edges all have infinite cost? No, graph is complete, so MST always exists. Ensure the graph is complete for the relaxation; otherwise, if sparse, we must add artificial high-cost edges.
4.5 Memory and Code Organization
For large ( n ), storing an ( n \times n ) cost matrix is prohibitive. Use a function that computes cost on the fly or store only a sparse representation. The Lagrangian relaxation algorithm with full matrix recomputation every iteration becomes inefficient. Instead, modify the costs by adding ( \lambda_i ) to each row (except special node). Then in Prim, we only need to query edge costs plus appropriate lambdas. We can precompute base costs and update row-wise.
5. Deeper Insights and Advanced Extensions
5.1 Relationship to Minimum Spanning Tree Polytope
The 1-tree relaxation is a linear programming relaxation over the spanning tree polytope with an additional constraint for vertex 1. The Lagrangian dual corresponds to the linear programming dual. The optimal multipliers characterize the optimal face of the TSP polytope. This relationship is the foundation for the Held-Karp lower bound, which is known to be very tight—often within 1% of optimal for random Euclidean instances.
5.2 Quadratic Convergence via Bundle Methods
Pure subgradient has sublinear convergence. For high-precision bounds, switch to a bundle method that maintains a convex model of the objective. The bundle method generates a cutting plane approximation:
[ L(\lambda) \approx \min_{t} { L(\lambda^j) + \langle g^j, \lambda - \lambda^j \rangle } ]
Then solves a quadratic program at each iteration to find a new candidate. This can converge in tens of iterations, but each iteration is more expensive. For large ( n ), the QP can be prohibitive. An alternative is the Proximal Bundle which keeps a trust region.
5.3 Integrating into Branch-and-Bound
A Lagrangian bound of 0.5% optimal can prune many nodes. However, when branching on edges (e.g., set edge e must be in tour vs. not), the 1-tree relaxation must be adapted: fixing an edge can be handled by setting its cost to 0 (or a large penalty) and adjusting multipliers. Better: use Lagrangian multipliers as branching penalties or strong branching on fractional edges.
5.4 Asymmetric TSP
For the asymmetric TSP (ATSP), the 1-tree concept does not directly apply. Instead, one uses assignment problem relaxation with Lagrangian relaxation on subtour elimination constraints. The same techniques (subgradient, Polyak step) apply, but the subproblem becomes a minimum-cost assignment.
6. Experimental Observations and Tuning
From practical experience:
- Setting ( \rho_0 = 2.0 ) and halving when no improvement for 10 iterations works well.
- Average iterations until convergence for TSPLIB instances with n~100-1000: 200-500.
- The final lower bound for symmetric TSP is often within 0.1%–0.5% of optimal. For Euclidean instances, the gap is smaller.
- Degeneracy is common: about 20% of iterations produce multiple optimal 1-trees. Implementing a simple pegging technique (freeze multipliers for vertices with degree 2) can accelerate.
Code Optimization Tip
Avoid recomputing the entire modified cost matrix. Instead, maintain a cost vector per vertex and update only the affected vertices after multiplier changes? Actually, because every edge cost changes with both endpoints, it’s easier to compute on the fly: in Prim, when considering adding edge (i,j), compute modified cost as base(i,j) + lam[i-1] + lam[j-1] (with lam[0] for vertex 1 being zero). This is cheap.
Conclusion
Designing a Lagrangian relaxation algorithm for the TSP using 1-trees is a beautiful blend of linear programming theory, combinatorial optimization, and numerical heuristics. The algorithm’s success depends on careful handling of degeneracy, adaptive step sizes, and robust stopping conditions. While modern TSP solvers (like Concorde) favor linear programming with cutting planes, the Lagrangian approach remains pedagogically valuable and practically useful for problems where LP solvers are unavailable or too slow. Moreover, the techniques—subgradient optimization, bundle methods, the 1-tree relaxation—transfer directly to many other combinatorial optimization problems, from Steiner trees to facility location.
By mastering the advanced design considerations in this post—particularly the edge cases, numerical stability, and convergence acceleration—you will be able to implement a tight lower bound that is fast, robust, and production-ready. Whether you use it as a bounding tool in branch-and-bound or as a standalone lower bound for heuristics, the Held-Karp 1-tree lagrangian remains one of the most elegant weapons in the TSP solver’s arsenal.
Conclusion: The Elegant Dance of Relaxation and Optimization
You’ve journeyed through the intricacies of the Traveling Salesman Problem—a puzzle that has captivated mathematicians, computer scientists, and operations researchers for over half a century. We began with the raw formulation: a salesman must visit every city exactly once and return to the start, minimizing total distance. Simple to state, notoriously hard to solve. Then we peeled back layers, introducing the 1‑tree as a relaxation that strips away the degree‑2 constraints, leaving us with a spanning tree plus one additional edge. And finally, we wielded Lagrangian relaxation—a technique that converts hard constraints into soft penalties, adjusted iteratively until the bounds converge.
Now, as we stand at the end of this exploration, it’s time to reflect on what we’ve built, why it matters, and where you can go from here. This conclusion is not a mere afterthought—it’s your launchpad into deeper understanding and practical application.
What We’ve Learned: A Recap of the Core Ideas
At the heart of our design lies a fundamental trade‑off: exact solutions to TSP are computationally prohibitive for all but the smallest instances, yet we need meaningful lower bounds to guide branch‑and‑bound or to assess the quality of heuristic tours. Lagrangian relaxation gives us a principled way to obtain those bounds by dualizing the degree constraints that force each node to have exactly two incident edges in a TSP tour.
We started with the 1‑tree formulation. A 1‑tree is a minimum spanning tree on all nodes except one (the root), plus two edges connecting the root to the tree. This structure satisfies one crucial property: it is easy to compute efficiently (via Prim’s or Kruskal’s algorithm in (O(n^2))), yet it lacks the degree constraints that make TSP hard. The breakthrough is that any TSP tour is also a 1‑tree—specifically, a 1‑tree where every node has degree exactly 2. So if we can “push” the 1‑tree toward satisfying those degree constraints, we get a good approximation of the optimal tour cost.
The Lagrangian approach does exactly this. We introduce a vector of Lagrange multipliers (\lambda_i), one per city, penalizing deviations from degree 2. The resulting Lagrangian function becomes:
[ L(\lambda) = \min*{T \in \mathcal{T}_1} \left( \sum*{(i,j) \in T} c*{ij} + \sum*{i} \lambda_i (d_T(i) - 2) \right) ]
where (\mathcal{T}1) is the set of all 1‑trees. This minimization is again a 1‑tree problem, but with adjusted edge costs: (c{ij}’ = c_{ij} + \lambda_i + \lambda_j). Crucially, for any (\lambda), (L(\lambda)) is a lower bound on the optimal TSP cost. Our goal becomes to maximize this lower bound—the Lagrangian dual problem.
We then explored the subgradient optimization method to tune the multipliers. The subgradient at a given (\lambda) is simply (d_T(i) - 2), where (T) is the optimal 1‑tree under those adjusted costs. We walk step‑by‑step in the direction of the subgradient, shrinking the step size over time, and eventually converge to a bound that is remarkably close to the true optimum. This bound is known as the Held‑Karp bound—and it is the same as the linear programming relaxation of the subtour elimination formulation. In practice, for hundreds or even thousands of cities, this bound is typically within 1–2% of optimal.
We also discussed practical pitfalls: the subgradient method can oscillate, step sizes must be chosen carefully (e.g., using the Polyak step size or decaying schedules), and termination criteria should balance computational cost with bound quality. And we noted that for exact solving, this lower bound feeds into a branch‑and‑bound tree, pruning large swaths of the search space.
Actionable Takeaways for Your Own Implementation
Theory is satisfying, but the real reward comes when you roll up your sleeves and implement. Here are concrete steps you can take to build a Lagrangian‑based TSP solver today—whether for a class project, a research prototype, or a production system.
1. Start Simple: The 1‑Tree Core
Begin by coding a function that, given a cost matrix and a root node (say node 0), returns the minimum 1‑tree. This is two steps: (a) compute the minimum spanning tree on all nodes except the root, then (b) add the two cheapest edges connecting the root to that MST. Efficient implementations use Prim’s algorithm with a priority queue. Test on small instances (e.g., 10–20 cities from TSPLIB) to verify correctness.
2. Add the Lagrangian Loop
Wrap the 1‑tree routine in an iterative refinement:
def lagrangian_tsp_lower_bound(costs, max_iter=1000, tol=1e-6):
n = len(costs)
lambda_ = np.zeros(n)
best_bound = -np.inf
for iteration in range(max_iter):
# Adjust costs: c_ij += lambda_i + lambda_j
adjusted = costs + lambda_[:, None] + lambda_[None, :]
# Compute minimum 1-tree
T, degree, cost_1tree = min_1tree(adjusted, root=0)
# Lagrangian value
lagrangian_val = cost_1tree - 2 * sum(lambda_)
best_bound = max(best_bound, lagrangian_val)
# Subgradient: degree - 2
subgrad = degree - 2
# Compute step size (e.g., Polyak rule)
if np.linalg.norm(subgrad) < tol:
break
step_size = (1.0 * (best_bound - lagrangian_val)) / (np.linalg.norm(subgrad)**2)
# Update multipliers
lambda_ += step_size * subgrad
# Optional: clip to prevent huge oscillations
# Optional: reduce step size over time
return best_bound, lambda_
This skeleton gives you a working bound. Tune the step‑size scaling factor (the 1.0 above) – typical values are between 0.5 and 2. For higher reliability, use a decaying schedule: start with a factor of 2, cut it in half every time the bound fails to improve for (k) consecutive iterations.
3. Validate and Debug
Always compare your bound against a known optimal (for small instances) or against a running gap from a heuristic. If your bound is too low, suspect bugs in the 1‑tree computation or in the Lagrangian adjustment. If the bound never improves, your step size may be too small—or the problem is degenerate (e.g., all edge costs equal). Use classic TSPLIB instances like burma14 or ulysses22 to verify.
4. Integrate with Branch‑and‑Bound
Once your Lagrangian bound is robust, embed it into a depth‑first or best‑first search. At each node, compute the bound; if it exceeds the current best tour cost, prune. For branching, select an edge to include/exclude, and recursively recompute the bound (which is fast because you reuse the same 1‑tree structure with updated costs). This combination can solve instances up to a few hundred cities optimally in reasonable time.
5. Consider Extensions
- Asymmetric TSP: The 1‑tree concept generalizes to directed graphs using “1‑arborescences”. The same Lagrangian idea applies, though the minimum spanning arborescence is slightly more involved.
- Prize‑Collecting TSP: If nodes can be skipped at a penalty, the Lagrangian dual can incorporate penalties into the objective.
- Parallelization: The subgradient updates are embarrassingly sequential, but you can parallelize the 1‑tree computations inside a branch‑and‑bound tree.
Further Reading and Next Steps
No single blog post can cover all the depth of Lagrangian relaxation in combinatorial optimization. If you’re hungry for more, here is a curated path to deepen your expertise.
Classic Papers
Held, M., & Karp, R. M. (1970). “The traveling-salesman problem and minimum spanning trees.” Operations Research, 18(6), 1138–1162.
The seminal work that introduced the 1‑tree relaxation and the Lagrangian approach. Elegant, readable, and still relevant.Held, M., & Karp, R. M. (1971). “The traveling-salesman problem and minimum spanning trees: Part II.” Mathematical Programming, 1(1), 6–25.
Extends the theory, provides deeper convergence analysis, and shows computational results.
Books
Wolsey, L. A. (1998). Integer Programming. Wiley.
Chapter 10 on Lagrangian relaxation is a masterclass in both theory and practical implementation. Examples include TSP, facility location, and set covering.Nemhauser, G. L., & Wolsey, L. A. (1988). Integer and Combinatorial Optimization. Wiley.
A comprehensive reference. The section on subgradient optimization is particularly valuable.Lawler, E. L., Lenstra, J. K., Rinnooy Kan, A. H. G., & Shmoys, D. B. (Eds.). (1985). The Traveling Salesman Problem: A Guided Tour of Combinatorial Optimization. Wiley.
Still one of the best books on TSP. Contains a chapter on lower bounds that contextualizes Lagrangian methods among other approaches.
Modern Developments
Branch‑and‑cut frameworks (e.g., Concorde TSP solver) use the Held‑Karp bound as a starting point, then strengthen it with additional cutting planes. Concorde can solve instances with tens of thousands of cities. Studying its design will show you how Lagrangian bounds fit into a larger optimization engine.
Machine learning for multipliers: Recent work explores using reinforcement learning or neural networks to predict good Lagrange multipliers, accelerating convergence. Look up papers by Bello et al. (2016) or Khalil et al. (2017) on learning to branch and to set dual variables.
Multi‑commodity flow relaxations for TSP are another avenue for lower bounds, though they are more expensive. Understanding the trade‑offs between different relaxations will sharpen your algorithmic intuition.
Hands‑On Projects
Implement the full Held‑Karp branch‑and‑bound in Python, using the
networkxlibrary for MST computations. Test on random Euclidean instances of 50–100 cities. Compare the number of nodes explored with and without Lagrangian bounds.Experiment with subgradient step‑size rules. Implement Polyak, constant, and decaying schedules. Plot the bound vs. iteration to see which converges fastest. Tune parameters.
Extend to the Prize‑Collecting TSP. Modify the 1‑tree relaxation to allow not visiting some nodes at a penalty. The Lagrangian dual then becomes a powerful tool for approximating solutions when budgets are tight.
Visualize the multipliers. After convergence, plot the (\lambda_i) values on a map of cities. Often, high multipliers correspond to isolated cities that the relaxed solution “wants” to force into degree 2. This can provide intuitive insights.
A Closing Thought: Why This Matters Beyond TSP
The story of Lagrangian relaxation for TSP is more than an algorithmic trick—it’s a blueprint for approaching hard combinatorial problems across domains. Whenever you face a problem with a small set of “complicating” constraints that, if removed, leave a tractable subproblem, consider dualizing those constraints. The framework works for scheduling, network design, resource allocation, and even machine learning (e.g., dual decomposition for structured prediction).
What makes the TSP example so compelling is its elegance: the 1‑tree is both simple and surprisingly powerful. It captures the essential connectivity of a tour without the degree restrictions, and the Lagrangian process gently guides it back toward feasibility. The resulting bound is often the gold standard for lower bounds—used in virtually every high‑performance TSP solver today.
As you implement and experiment, remember that algorithm design is not a cold, mechanical exercise. It is a dance between structure and relaxation, between rigidity and flexibility. The 1‑tree gives you structure; Lagrangian relaxation gives you the freedom to break and rebuild. The art is in choosing the right penalties and knowing when to stop refining.
So go ahead—write that code, run those experiments, and marvel at the fact that a simple tree plus a loop around a single node can whisper the secrets of the optimal tour. The Traveling Salesman Problem remains unsolved in full generality, but with tools like Lagrangian relaxation, we edge ever closer. And along the way, we learn lessons that resonate far beyond the road—in optimization, in mathematics, and in the joy of solving hard problems, one bound at a time.