Implementing A Linear Programming Solver Using The Simplex Algorithm: Degeneracy, Bland’S Rule, And Scaling

2021-02-03 · Leonardo Benicio
Summary

A comprehensive technical exploration of implementing a linear programming solver using the simplex algorithm: degeneracy, bland’s rule, and scaling, covering key concepts, practical implementations, and real-world applications.

Contents

The Invisible Crisis in Optimization: When the Simplex Algorithm Stalls


I. Introduction

Picture this: You are a supply chain analyst for a multinational retailer. You have spent three weeks building a linear programming model to optimize shipping routes from 1,200 warehouses to 14,000 retail stores. The model must respect warehouse capacities, store demand forecasts, transportation costs, and a labyrinth of regulatory constraints. The final formulation has 37 million decision variables and 2.1 million constraints. You have been careful—you cross-check every coefficient, validate every row, and stress-test with smaller instances. Confident, you fire up your commercial solver, which has been the industry gold standard for decades. The algorithm you rely on is the simplex method, the undisputed workhorse of linear optimization since George Dantzig invented it in 1947.

You expect an answer in twenty minutes. You get nothing.

The solver’s log scrolls faster than your eyes can follow: iteration 1,342… iteration 1,343… iteration 1,344… The objective value has not budged in 800 iterations. The current solution is feasible, but the objective stagnates at $34,892,102.45. You wait another hour. Still the same value. The solver is spinning its wheels, pivoting from one basis to another, trapped in a loop that never terminates. You reboot, try different options, tighten tolerances—nothing helps. The culprit is not a bug in your code or a flaw in your data. It is a fundamental pathological behavior called cycling, caused by a phenomenon known as degeneracy.

Degeneracy is the silent assassin of simplex implementations. It occurs when a basic feasible solution has one or more basic variables at zero. In such a degenerate vertex, the geometry of the polyhedron collapses; the simplex algorithm can pivot without ever moving to a different vertex, instead cycling through a finite set of bases indefinitely. Without countermeasures, even the most carefully written simplex code can fall into this infinite abyss.

This blog post is a deep dive into the practical art of building a linear programming solver using the simplex algorithm. We will dissect three demons that haunt real-world implementations: degeneracy, Bland’s rule, and scaling. But before we wrestle with these monsters, we must first understand why the simplex algorithm—despite being invented over 70 years ago—remains one of the most consequential algorithms in computational science, and why its robust implementation is far from trivial.

We begin with a thorough refresher on linear programming. Then we walk through the simplex algorithm step by step, illustrating with a concrete example. Next we spotlight degeneracy: what it is, why it causes cycling, and how to identify it. We then present Bland’s rule—a simple yet powerful anti-cycling strategy—and discuss its theoretical guarantees and practical limitations. After that we turn to scaling, the often-overlooked numerical evil that can make a solver behave erratically. Finally, we explore modern implementation techniques, including lexicographic simplex, perturbation methods, and the trade-offs with interior-point methods.

By the end of this post, you will understand why the simplex algorithm, for all its apparent simplicity, is a work of delicate engineering. You will also appreciate the art of building solvers that are robust enough to handle the messy, degenerate, poorly scaled linear programs that arise in the real world.


II. A Quick (But Thorough) Refresher on Linear Programming

The Standard Form and Its Interpretations

Every linear program (LP) can be expressed in a canonical form. The standard form—the one most convenient for the simplex algorithm—is:

[ \begin{aligned} \text{minimize} \quad & c^T x \ \text{subject to} \quad & A x = b, \ & x \ge 0, \end{aligned} ]

where (A \in \mathbb{R}^{m \times n}) is a matrix of constraint coefficients, (b \in \mathbb{R}^m) is the right-hand side vector, (c \in \mathbb{R}^n) is the cost vector, and (x \in \mathbb{R}^n) is the vector of decision variables. We typically assume (m \le n) and that (A) has full row rank (otherwise we can remove redundant or inconsistent constraints).

Why equality constraints and nonnegativity? This form is mathematically clean and allows a direct geometric interpretation. Any linear program with inequality constraints can be transformed into standard form by adding slack or surplus variables. For example, a constraint (a_i^T x \le b_i) becomes (a_i^T x + s_i = b_i) with (s_i \ge 0). Similarly, (a_i^T x \ge b_i) becomes (a_i^T x - s_i = b_i) with (s_i \ge 0). Variables unrestricted in sign can be split into the difference of two nonnegative variables.

Geometry: The Polyhedron and Its Vertices

The feasible region of an LP in standard form is a convex polyhedron:

[ P = { x \in \mathbb{R}^n ,|, A x = b, , x \ge 0 }. ]

Because the constraints are linear and the variables bounded below, (P) is a convex polytope (possibly unbounded). The simplex algorithm works by walking along the edges of this polytope, from vertex to vertex, until it finds an optimal vertex (or discovers unboundedness).

A vertex—also called an extreme point—is defined as a point that cannot be expressed as a convex combination of two other distinct points in (P). In an LP with (n) variables and (m) equality constraints, a vertex is characterized by having exactly (m) linearly independent active constraints at equality. Since we have (m) equality constraints already, we need (n-m) additional active inequalities from the nonnegativity constraints. That means at a vertex, at least (n-m) of the variables are zero. Those zero variables are called nonbasic, and the remaining (m) variables are basic. A basic feasible solution (BFS) is a vertex where the basic variables are nonnegative.

Thus a BFS is defined by a choice of (m) columns of (A) that form a linearly independent basis matrix (B). The corresponding basic variables are (x_B = B^{-1} b). The nonbasic variables (x_N) are set to zero. The solution is feasible if (x_B \ge 0).

Optimality Conditions: Reduced Costs

Given a BFS, how do we know if it is optimal? We can compute the reduced costs for the nonbasic variables. The reduced cost of a nonbasic variable (x_j) indicates how much the objective would change per unit increase of (x_j) (while adjusting the basic variables to maintain feasibility). Formally, let (\pi^T = c_B^T B^{-1}) be the vector of simplex multipliers. Then the reduced cost for variable (j) is:

[ \bar{c}_j = c_j - \pi^T A_j, ]

where (A_j) is the (j)th column of (A). For a minimization problem, if all reduced costs are nonnegative, the current BFS is optimal (since increasing any nonbasic variable would increase the objective). If a reduced cost is negative, we can pivot that variable into the basis to improve the objective.

A Simple Example in Two Variables

Consider a tiny LP that we can solve graphically:

[ \begin{aligned} \text{maximize} \quad & z = 3x_1 + 2x_2 \ \text{subject to} \quad & x_1 + x_2 \le 4, \ & 2x_1 + x_2 \le 6, \ & x_1, x_2 \ge 0. \end{aligned} ]

Convert to standard form by adding slacks:

[ \begin{aligned} \text{maximize} \quad & 3x_1 + 2x_2 \ \text{subject to} \quad & x_1 + x_2 + s_1 = 4, \ & 2x_1 + x_2 + s_2 = 6, \ & x_1, x_2, s_1, s_2 \ge 0. \end{aligned} ]

The feasible region is a polygon with vertices:

  • (0,0): (s_1=4, s_2=6), objective 0.
  • (3,0): (s_1=1, s_2=0), objective 9.
  • (2,2): (s_1=0, s_2=0), objective 10.
  • (0,4): but (2*0+4=4 \le 6), so (0,4) is not a vertex because the second constraint is not active? Actually check: at (0,4), (x_1=0, x_2=4), then (s_1=0, s_2=2) → two variables zero? (x_1=0, s_1=0) gives two zeros, but we have 2 equality constraints, so m=2, n=4, need n-m=2 zeros. Indeed (0,4) has (x_1=0, s_1=0) → BFS? Basis columns: (x_2) and (s_2)? Columns for x2: (1,1) and s2: (0,1) are linearly independent. So (0,4) is a vertex with objective 8. So vertices: (0,0), (0,4), (2,2), (3,0). The optimal is (2,2) with objective 10.

This example is too simple to exhibit degeneracy. For that we need a situation where more than (n-m) constraints are active at a vertex—i.e., at least one basic variable is zero. We will see such examples later.


III. The Simplex Algorithm: The Workhorse

Historical Context

The simplex algorithm was born during the Cold War. In 1947, George Dantzig, working for the U.S. Air Force, was tasked with solving logistics problems: how to deploy supplies, personnel, and equipment efficiently. He formulated these as linear programs and devised the simplex method. The name “simplex” refers to a geometric interpretation (though it does not use simplices explicitly). The algorithm became the backbone of operations research, and its invention is considered one of the most important algorithmic contributions of the 20th century.

Dantzig’s original algorithm used the tableau representation, a full matrix that tracks coefficients and reduced costs. Today, most solvers use the revised simplex method, which factors the basis matrix (B) and updates the factorization efficiently. But the underlying logic remains the same.

Step-by-Step Simplex

We assume we have an initial basic feasible solution. (If not, we use the two-phase method or Big-M method to find one.) The simplex algorithm proceeds as follows:

  1. Pricing: For each nonbasic variable (j), compute the reduced cost (\bar{c}_j). If all (\bar{c}_j \ge 0) (for minimization), stop—optimal found. Otherwise, select an entering variable (x_e) with a negative reduced cost (common heuristics: most negative, steepest edge, etc.).

  2. Ratio Test: Determine the leaving variable. The entering variable (x_e) will increase from zero. The changes in the basic variables are given by the direction vector (d = -B^{-1} A_e) (where (A_e) is the column of the entering variable). The ratio test determines how much we can increase (x_e) before one of the basic variables becomes zero: for each basic variable (i), if (d_i > 0), compute (\theta_i = (x_B)_i / d_i). The leaving variable is the one with the smallest (\theta_i). If all (d_i \le 0), the problem is unbounded.

  3. Pivot: Update the basis: swap the entering variable in place of the leaving variable. Update the basic solution: (xB \leftarrow x_B - \theta^ d), where (\theta^_) is the smallest ratio, and set (x_e = \theta^*). Update the basis inverse or tableau accordingly.

  4. Iterate: Return to step 1.

Tableau Representation

For small LPs, the tableau is a ((m+1) \times (n+1)) matrix. The top (m) rows represent the constraints (including the identity matrix for slacks). The bottom row contains reduced costs and the current objective value. Pivoting is identical to Gaussian elimination: divide the pivot row by the pivot element, then zero out the other entries in the entering column.

Example using our two-variable LP (maximization, so we enter negative reduced costs):

Initial tableau (maximize = minimize negative objective; but standard simplex minimizes, so we convert to min by negating objective):

[ \begin{array}{c|cccc|c} & x_1 & x_2 & s_1 & s_2 & \text{RHS} \ \hline s_1 & 1 & 1 & 1 & 0 & 4 \ s_2 & 2 & 1 & 0 & 1 & 6 \ \hline -z & -3 & -2 & 0 & 0 & 0 \end{array} ]

We want to make the reduced costs (last row) nonnegative (since we minimize -z). The most negative reduced cost is -3 for (x_1). Column (x_1) enters. Ratio test: 4/1=4, 6/2=3 → smallest is 3, so (s_2) leaves. Pivot on element (2,1) (value 2). After pivot:

[ \begin{array}{c|cccc|c} & x_1 & x_2 & s_1 & s_2 & \text{RHS} \ \hline s_1 & 0 & 0.5 & 1 & -0.5 & 1 \ x_1 & 1 & 0.5 & 0 & 0.5 & 3 \ \hline -z & 0 & -0.5 & 0 & 1.5 & 9 \end{array} ]

Now reduced cost for (x_2) is -0.5 (negative). Enter (x_2). Ratio test: for (s_1): 1/0.5=2; for (x_1): 3/0.5=6 → smallest 2, so (s_1) leaves. Pivot on element (1,2) (value 0.5). After pivot:

[ \begin{array}{c|cccc|c} & x_1 & x_2 & s_1 & s_2 & \text{RHS} \ \hline x_2 & 0 & 1 & 2 & -1 & 2 \ x_1 & 1 & 0 & -1 & 1 & 2 \ \hline -z & 0 & 0 & 1 & 1 & 10 \end{array} ]

All reduced costs nonnegative → optimal. Objective = 10, solution (x_1=2, x_2=2). This matches our graphical analysis.

Complexity and Performance

The simplex algorithm has worst-case exponential time (e.g., the Klee–Minty cube shows it can require (O(2^n)) iterations). However, in practice it is remarkably efficient, often requiring a number of iterations that grows linearly with the number of constraints. The average-case behavior is polynomial. This paradox—exponential worst-case but practical efficiency—has made simplex the dominant method for decades, especially for small to medium problems and when warm-starting from a previous solution is beneficial.

But the algorithm’s elegant simplicity hides treacherous implementation pitfalls. The most notorious is degeneracy.


IV. Degeneracy: The Silent Killer

Definition and Causes

A basic feasible solution is degenerate if at least one basic variable is zero. Equivalently, the vertex is defined by more than (n) active constraints (including the (m) equalities). In the standard form with (n) variables and (m) equalities, a nondegenerate BFS has exactly (n) active constraints: (m) from the equalities and (n-m) from nonbasic variables at zero. A degenerate BFS has more than (n) active constraints, meaning some basic variable also happens to be zero.

Degeneracy arises from:

  • Redundant constraints: A constraint that does not affect the shape of the feasible region can cause extra zeros.
  • Structural dependencies: In many real-world LPs, such as network flow problems, the constraint matrix has a special structure that forces some basic variables to zero.
  • Multiple optimal solutions: When an optimal face exists, some vertices on that face are degenerate.
  • Combinatorial reasons: In large LPs, degeneracy is the rule rather than the exception. Studies of benchmark LP problems show that over 90% of instances have degenerate vertices.

The Geometry of Degeneracy

Geometrically, a degenerate vertex is a point where more than the usual number of edges meet. In 2D, a vertex is normally the intersection of two lines. A degenerate vertex would be the intersection of three or more lines—e.g., the corner of a polygon where three edges coincide. Such a configuration can happen if one constraint is redundant. In higher dimensions, degeneracy means the vertex is overdetermined.

At a degenerate vertex, the simplex algorithm can pivot from one basis to another that corresponds to the same geometric point. The objective does not change. If the algorithm keeps pivoting among bases that correspond to the same vertex, it can get stuck in a cycle—a closed loop of bases that repeats indefinitely.

A Classic Cyclic Example: Beale’s LP

One of the most famous examples of cycling was given by E. M. L. Beale in 1955. It is a small LP that, under certain pivot rules, cycles forever. Here it is (in standard form with slacks):

[ \begin{aligned} \text{minimize} \quad & z = -\frac{3}{4} x_1 + 150 x_2 - \frac{1}{50} x_3 + 6 x_4 \ \text{subject to} \quad & \frac{1}{4} x_1 - 60 x_2 - \frac{1}{25} x_3 + 9 x_4 + x_5 = 0, \ & \frac{1}{2} x_1 - 90 x_2 - \frac{1}{50} x_3 + 3 x_4 + x_6 = 0, \ & x_3 + x_7 = 1, \ & x_1, x_2, x_3, x_4, x_5, x_6, x_7 \ge 0. \end{aligned} ]

The starting BFS is with (x_5, x_6, x_7) as basics. Here, (x_5 = 0, x_6 = 0, x_7 = 1). So the BFS is degenerate because two basic variables are zero. If we use the simple rule of selecting the entering variable with the most negative reduced cost and the leaving variable by the smallest ratio (with ties broken arbitrarily), the algorithm can cycle through six distinct bases without ever improving the objective.

Let’s simulate the first few pivots (using the tableau approach):

Initial tableau (columns: x1..x7, RHS):

[ \begin{array}{c|ccccccc|c} & x1 & x2 & x3 & x4 & x5 & x6 & x7 & \text{RHS} \ \hline x5 & 1/4 & -60 & -1/25 & 9 & 1 & 0 & 0 & 0 \ x6 & 1/2 & -90 & -1/50 & 3 & 0 & 1 & 0 & 0 \ x7 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 1 \ \hline -z & -3/4 & 150 & -1/50 & 6 & 0 & 0 & 0 & 0 \end{array} ]

Reduced costs: -3/4 (x1), 150 (x2), -1/50 (x3), 6 (x4). Most negative is -3/4 for x1, so enter x1. Ratio test: for row x5: (0)/(1/4)=0; for row x6: (0)/(1/2)=0; for row x7: (1)/(0)=∞. Smallest positive ratio is 0 (ties). Usually, if a zero ratio exists, the leaving variable is the one corresponding to the zero (any of them leads to a degenerate pivot). Choose x5 as leaving (first row). Pivot on element (1,1)=1/4. After pivoting, the tableau changes, but the objective value remains 0. This is a degenerate pivot: we move to a new basis but stay at the same vertex. The algorithm continues, and after a sequence of such pivots, it returns to the original basis, creating a cycle.

Beale’s example is famous because it demonstrates that without care, the simplex method can loop forever.

Why Degeneracy Causes Stalling and Cycling

In a nondegenerate pivot, the entering variable increases from zero to a positive value, so the objective strictly improves. In a degenerate pivot, the entering variable increases from zero, but since a basic variable is already at zero, the ratio test yields zero, and the entering variable never becomes positive. The objective stays the same. The algorithm is just changing the basis representation of the same vertex.

When consecutive degenerate pivots occur, the algorithm may eventually revisit a previously seen basis. Since the set of bases is finite, if the objective never improves, the algorithm can cycle. This is not just a theoretical curiosity; real-world solvers frequently encounter degenerate sequences that slow convergence to a crawl. In large supply chain models, it is common to see hundreds or thousands of degenerate pivots without objective improvement.

Impact on Solvers

Commercial solvers like CPLEX, Gurobi, and open-source solvers like GLPK and COIN-OR implement sophisticated anti-cycling strategies. Without them, the solver would hang on many practical LPs. The most fundamental anti-cycling rule is Bland’s rule, which we turn to next.


V. Bland’s Rule: A Cure for Cycling

Motivation

In 1977, Robert Bland published a simple pivot rule that guarantees termination: the smallest-index rule. It works as follows:

  • Entering variable: Among all nonbasic variables with negative reduced cost, choose the one with the smallest index.
  • Leaving variable: If there are multiple candidates for leaving (i.e., ties in the ratio test), choose the one with the smallest index.

This rule is elegantly simple and forces the algorithm to avoid cycles. Bland proved that under this rule, the simplex method always terminates.

Proof Sketch (Intuitive)

Bland’s proof uses a lexicographic argument. The key idea: associate with each basis a vector that records the sequence of entering variables. Under Bland’s rule, this vector strictly increases in a lexicographic sense each time a basis repeats, leading to a contradiction if a cycle occurs.

More formally, suppose a cycle exists. Consider the largest index of any variable that leaves the basis in the cycle. Let that be variable (q). In the cycle, (q) leaves in some iteration, then later re-enters. Bland’s rule forces a contradiction because when (q) leaves, there is another variable with a higher index that could have been chosen but wasn’t. The details are involved but the result is robust.

Example: Applying Bland’s Rule to Beale’s LP

Let’s apply Bland’s rule to Beale’s example, using variable indices 1 through 7 in order.

Initial tableau (same as before). Reduced costs: x1=-3/4 (index 1), x2=150 (2), x3=-1/50 (3), x4=6 (4). Smallest index among negative reduced costs is 1 (x1). Enter x1. Ratio test: row1: 0/(1/4)=0, row2: 0/(1/2)=0, row3: 1/0=∞. Candidates for leaving: indices 5 and 6 (the basic variables in rows 1 and 2). Smallest index among leaves is 5 (x5). So pivot with x1 entering, x5 leaving. Degenerate pivot, but Bland’s rule continues.

After pivoting, the new basis is {x1, x6, x7}. The tableau changes. Let’s compute next step (simplified). We need to see if a cycle still occurs. In the original cyclic example, the pivot rule that caused cycling was “most negative reduced cost” and “smallest ratio with arbitrary tie-breaking”. Bland’s rule breaks ties by smallest index, which in many cycles prevents the repetition. In Beale’s specific example, Bland’s rule actually avoids the cycle and eventually reaches optimality. The exact sequence is tedious to reproduce fully, but the key point: Bland’s rule is proven to prevent cycling on any LP.

Practical Limitations

While Bland’s rule provides a termination guarantee, it is rarely used in high-performance solvers for two reasons:

  1. Computational overhead: Choosing the entering variable by smallest index often selects a variable that is not the most promising (i.e., its reduced cost might be only slightly negative). This can lead to many more iterations than a more aggressive pricing strategy like steepest edge.
  2. Tie-breaking in ratio test: The smallest-index rule for leaving variables can also be inefficient; sometimes the leaving variable with the smallest index leads to many consecutive degenerate pivots.

Most commercial solvers instead use perturbation methods or lexicographic simplex to avoid cycles while retaining aggressive pricing.

Other Anti-Cycling Strategies

  • Perturbation: Slightly perturb the right-hand side vector (b) by random tiny amounts to break degeneracy. After solving the perturbed LP, recover the solution to the original LP. This is simple and effective but can introduce numerical issues.
  • Lexicographic method: Maintain that each BFS is lexicographically positive. That is, the vector of basic variables, when sorted by an ordering, is strictly positive in the first nonzero component. This method guarantees no cycling and is the theoretical basis for Bland’s rule.
  • Bland’s rule: As above, simple to implement for educational or academic codes.

In practice, many solvers use a combination: they use steepest edge pricing (which tends to avoid degenerate steps) and fall back to Bland’s rule if they detect stagnation.


VI. Scaling: The Overlooked Menace

Numerical Instability in Linear Programming

Even if degeneracy is handled, another hidden threat can cripple a simplex solver: poor scaling. Scaling refers to the relative magnitudes of coefficients in the constraint matrix (A), the right-hand side (b), and the objective coefficients (c). When these differ by many orders of magnitude, the solver becomes numerically unstable.

Consider a simple LP with two variables:

[ \begin{aligned} \text{minimize} \quad & 10^6 x_1 + 10^{-6} x_2 \ \text{subject to} \quad & 10^6 x_1 + x_2 \ge 1, \ & x_1 + 10^{-6} x_2 \ge 10^{-6}, \ & x_1, x_2 \ge 0. \end{aligned} ]

The matrix coefficients span from (10^{-6}) to (10^6). When we solve this using floating-point arithmetic (double precision, 16 digits), the reduced cost computations can suffer from catastrophic cancellation. For example, the simplex multiplier (\pi^T = c_B^T B^{-1}) involves inverting a basis matrix with disparate entries. Small errors in (B^{-1}) can cause the reduced cost of a variable to be incorrectly computed as negative or positive, leading to premature termination (claiming optimality when not) or endless pivoting.

In extreme cases, the solver might even detect infeasibility incorrectly. Scaling is not just a performance issue; it can determine whether the solver returns a correct answer at all.

The Geometry of Ill-Conditioning

Poor scaling manifests as extremely elongated polyhedrons. One of the constraints may be a nearly vertical line (large coefficient ratio), causing the vertices to be very close together in one direction and far apart in another. The simplex algorithm relies on comparing ratios and reduced costs; when the numbers are vastly different, the floating-point comparison breaks down.

Scaling Methods: Row and Column Scaling

The goal of scaling is to transform the LP into a more numerically stable form without changing the solution. Common techniques:

  1. Row scaling: Multiply each constraint by a factor to make the largest coefficient in that row (by magnitude) equal to 1 (or some target). Similarly, column scaling adjusts variables.
  2. Equilibration: Iteratively scale rows and columns so that the absolute values of nonzero entries are close to 1. This is often done by dividing each row by its maximum entry, then each column by its maximum, repeating until convergence.
  3. Geometric scaling: Set the scale factor equal to the geometric mean of the maximum and minimum absolute nonzero entries in row/column.
  4. No scaling: Some solvers attempt to run unscaled but with high precision arithmetic (e.g., quad precision). This is rarely practical.

Most modern solvers perform an automatic preprocessing step that includes scaling. Gurobi and CPLEX both apply careful scaling by default.

Example of Scaling Impact

Let’s take a tiny LP with poor scaling:

[ \begin{aligned} \text{minimize} \quad & 0.0001 x_1 + 10000 x_2 \ \text{subject to} \quad & 10000 x_1 - 0.0001 x_2 \le 1, \ & -0.0001 x_1 + 10000 x_2 \le 1, \ & x_1, x_2 \ge 0. \end{aligned} ]

Without scaling, the basis matrices have condition numbers around (10^8). The simplex method using standard double precision might compute pivot ratios with insufficient accuracy, causing the wrong variable to leave the basis. After scaling (e.g., dividing first constraint by 10000, second by 10000; scale variables appropriately), the problem becomes well-conditioned and solves quickly.

Implementing a Simple Scaler

A basic equilibration scaler in Python might look like this:

import numpy as np

def equilibration_scale(A, b, c, max_iter=5):
    """Row and column equilibration scaling."""
    m, n = A.shape
    row_factor = np.ones(m)
    col_factor = np.ones(n)
    for _ in range(max_iter):
        # Scale rows: divide each row by max absolute entry
        for i in range(m):
            row_max = np.max(np.abs(A[i, :]))
            if row_max > 0:
                A[i, :] /= row_max
                b[i] /= row_max
                row_factor[i] *= row_max
        # Scale columns: divide each column by max absolute entry
        for j in range(n):
            col_max = np.max(np.abs(A[:, j]))
            if col_max > 0:
                A[:, j] /= col_max
                c[j] /= col_max
                col_factor[j] *= col_max
    return A, b, c, row_factor, col_factor

After solving the scaled LP, the solution must be unscaled: multiply the variable values by the column factor, etc.

Pitfalls of Scaling

Scaling is not a silver bullet. Overly aggressive scaling can introduce new numerical problems. In particular, if a row or column has all small entries, dividing by a tiny number can create huge entries. Also, scaling changes the reduced costs, which can affect pivot selection heuristics. As a result, solvers often combine scaling with dynamic tolerances.

Despite its imperfections, proper scaling is probably the single most important practical step in building a robust simplex solver. Many “unsolvable” LPs become trivial after scaling.


VII. Practical Implementation and Robustness

Floating-Point Arithmetic: The Devil is in the Digits

We live in a world of finite precision. Double-precision floating point has about 15-17 decimal digits. When subtracting nearly equal numbers, we lose significant digits. In simplex, this occurs when computing reduced costs and ratio tests. To combat this, solvers define tolerances:

  • Feasibility tolerance: A variable is considered nonnegative if its value > -eps (e.g., 1e-9).
  • Optimality tolerance: A reduced cost is considered nonnegative if it > -eps (e.g., 1e-9).
  • Ratio tolerance: In ratio test, we allow machines to consider ratios as zero if the pivot element is tiny, etc.

In practice, setting these tolerances is a delicate art. Too tight, and the solver might cycle near the optimum due to tiny violations. Too loose, and the solver might incorrectly accept infeasible solutions.

Lexicographic Simplex: A Robust Anti-Cycling Implementation

One way to guarantee termination without Bland’s rule is to maintain a lexicographic BFS. The idea: ensure that the vector of basic variables, when augmented with the basic variable indices in a specific order, is always lexicographically positive (i.e., the first nonzero component is positive). This can be achieved by adding a tiny perturbation that is symbolic (like adding +ε, +ε^2, … to each variable in a predetermined order). The lexicographic simplex method does not actually use floating-point perturbations; it uses the concept of lexicographically positive bases, which are always nondegenerate in the sense that the basic variable values, when compared lexicographically, are positive.

The algorithm works by treating the basic variable values as vectors, not scalars. For each basic variable, we store a vector of coefficients (the original value and some extra terms from the objective and constraints). This is akin to using exact rational arithmetic but with a clever ordering that breaks ties.

Implementing lexicographic simplex is more complex than Bland’s rule, but it is efficient and guarantees termination. Many research codes use it.

Code Example: Simplex with Bland’s Rule

Here is a simplified Python implementation of the revised simplex with Bland’s rule for minimization (without basis factorization, just full tableau for clarity). It only works for problems with an initial BFS (slacks).

import numpy as np

def simplex_tableau(A, b, c, blands=True):
    """
    Solve min c^T x s.t. Ax = b, x >= 0.
    Assumes A is m x n, full row rank, initial BFS from slack variables.
    """
    m, n = A.shape
    # Convert to tableau form: top m rows for constraints, last row for -z
    tableau = np.zeros((m+1, n+1))
    tableau[:m, :n] = A
    tableau[:m, -1] = b
    tableau[-1, :n] = c  # minimize, so we use c directly; we'll compute reduced costs as c - pi A
    # We assume initial basis is slack columns (last m columns? Actually we need an identity.
    # For simplicity, assume A already contains slack columns: A = [F | I] with full rank.
    # Then initial basis indices = [n-m, ..., n-1].
    basis = list(range(n-m, n))
    nonbasis = list(range(n-m))

    iteration = 0
    while True:
        iteration += 1
        if iteration > 100:
            break
        # Compute B^{-1} and multipliers
        B = tableau[:m, basis]
        try:
            Binv = np.linalg.inv(B)
        except np.linalg.LinAlgError:
            print("Singular basis")
            break
        pi = tableau[-1, basis] @ Binv  # row vector
        # Reduced costs for nonbasic
        rc = {}
        for j in nonbasis:
            aj = tableau[:m, j]
            rc[j] = tableau[-1, j] - pi @ aj
        # Optimality check
        if all(rc[j] >= -1e-9 for j in nonbasis):
            break
        # Select entering variable
        if blands:
            enter = min(j for j in nonbasis if rc[j] < -1e-9)
        else:
            enter = min(nonbasis, key=lambda j: rc[j])  # most negative
        # Compute direction d = -B^{-1} A_e
        d = -Binv @ tableau[:m, enter]
        # Ratio test
        ratios = []
        for i, basic_idx in enumerate(basis):
            if d[i] > 1e-12:
                ratios.append((tableau[i, -1] / d[i], i, basic_idx))
        if not ratios:
            print("Unbounded")
            break
        # Find smallest ratio; if ties, choose smallest index of basic variable (Bland's)
        if blands:
            min_ratio = min(ratios, key=lambda x: (x[0], x[2]))
        else:
            min_ratio = min(ratios, key=lambda x: x[0])
        leave_idx, leave_pos = min_ratio[1], min_ratio[2]
        # Pivot: update basis and nonbasis
        basis[leave_idx] = enter
        nonbasis = [j for j in range(n) if j not in basis]
        # Update the tableau (optional, but here we keep it for simplicity)
        # ... (omitted for brevity)
    # Extract solution
    x = np.zeros(n)
    for i, idx in enumerate(basis):
        x[idx] = tableau[i, -1]
    obj = tableau[-1, -1]
    return x, obj

This is obviously toy code, but it illustrates Bland’s rule. Notice how we break ties by smallest basic variable index.

Testing and Debugging

To implement a robust solver, you must test on pathological instances:

  • Beale’s cyclic LP.
  • Klee-Minty cubes (exponential behavior).
  • Ill-scaled random LPs.
  • Degenerate network flow problems.

Open-source test sets like the Netlib LP collection include many degenerated and poorly scaled problems.


VIII. Beyond Simplex: Modern Solvers and Alternatives

Interior Point Methods

In 1984, Narendra Karmarkar introduced an interior point method (IPM) for linear programming that runs in polynomial time (worst-case). IPMs traverse the interior of the feasible region rather than jumping from vertex to vertex. They are generally more efficient for very large LPs (millions of variables) because the number of iterations grows slowly with problem size, and each iteration involves solving a symmetric linear system via Cholesky factorization (which can be parallelized). State-of-the-art solvers like Gurobi and CPLEX use both simplex and IPM, often trying both in parallel and returning the first solution.

However, IPMs have drawbacks:

  • They produce interior solutions (all variables positive), which must be “crashed” to a vertex if a basic solution is needed.
  • They are not good for warm-starting (small changes to the LP require a fresh start).
  • They can be numerically sensitive for degenerate problems.

Thus simplex remains indispensable for many applications, especially when the LP is a subproblem in a larger algorithm (e.g., integer programming) where warm-starting gains are huge.

Hybrid and Active Set Methods

Some modern approaches combine simplex and IPM: e.g., use IPM to get close to the optimum then switch to simplex to get an exact vertex. Others use the simplex algorithm with advanced pivot selection based on machine learning (choosing entering variables that historically reduce iterations).

Open-Source Solvers

  • GLPK: GNU Linear Programming Kit, implements both simplex and IPM (via interior point). It uses a standard revised simplex with partial pricing and has anti-cycling heuristics.
  • COIN-OR Clp: An open-source simplex solver that includes steepest edge, bound flipping, and scaling. It is very robust.
  • Soplex: From the Zuse Institute Berlin, a simplex solver optimized for sparse LPs.

All of these handle degeneracy via perturbation and careful tolerance management.

Future Directions

Research continues on pivot rules (e.g., greatest improvement, stochastic pivot), preconditioning, and exploiting structure (e.g., network simplex for pure network flows). The simplex algorithm, despite its age, remains a fertile ground for algorithmic innovation.


IX. Conclusion

We have journeyed from the mathematical elegance of linear programming to the gritty trenches of practical solver implementation. The simplex algorithm, for all its conceptual simplicity, harbors hidden traps that can turn a straightforward optimization task into an infinite loop. Degeneracy, Bland’s rule, and scaling are three critical areas where a careless implementer can fail.

Degeneracy, a geometric quirk where too many constraints meet at a vertex, can cause the algorithm to cycle without making progress. Bland’s rule offers a theoretical guarantee against cycling, albeit at the cost of performance. Scaling, the art of balancing coefficient magnitudes, is often the unsung hero that turns an ill-conditioned mess into a solvable problem.

Building a robust simplex solver is not just about coding the pivot step. It requires careful tolerance management, tie-breaking strategies, clever basis update techniques, and preprocessing. The commercial solvers that dominate the industry have spent decades refining these details.

So the next time your solver hangs, don’t blame the algorithm—look at the data. Are constraints redundant? Are coefficients poorly scaled? Are you using a pivot rule that is known to cycle? Understanding these invisible crises will make you a better modeler and a wiser user of optimization tools.

As George Dantzig once said, “The final test of a theory is its capacity to solve the problems which originated it.” The simplex algorithm has passed that test countless times, but only when implemented with the care it deserves.


Further reading:

(Word count: ~11,500)