Inside the algorithmic vise that closes around your network design problem at 3 a.m. — branch, bound, cut, and the dark art of making NP-hard problems sit down and behave.
Mixed-integer programming is the art of solving problems that can't be solved. The trick is that we lie to ourselves twice — once optimistically, once pessimistically — and then squeeze the truth out of the gap between the two lies.
You've used Gurobi or CPLEX or HiGHS or whatever Lyric Studio wraps. You've watched the log scroll. You've seen Best Bound and Best Solution march toward each other and felt vaguely that something profound was happening. Something is. This is that something.
I'm going to walk you through the machinery in seven moves. We start with a problem so small you could solve it in your head. Then I take that problem away from you and force you to watch a solver do it — because the way the solver does it is not the way you did it, and that difference is where all the action is.
By the end, you'll be able to read a solver log and know which knob to turn. Diagnose why your formulation is slow. Understand why your colleague's "obvious" big-M reformulation made the problem 100× worse. And — most importantly — develop a physical intuition for what's happening inside, the way a good mechanic feels an engine.
A thief breaks into a museum. Capacity is finite. Greed is not. Pick the items that maximize loot under a weight constraint. This is the fruit fly of integer programming — small enough to study, vicious enough to teach you everything.
The last line is the crux of the whole thing. That little ∈ {0, 1} is what turns a problem you can solve in milliseconds into one that is NP-hard at scale — no known algorithm solves every instance efficiently. Drop the integrality constraint, replace it with 0 ≤ xᵢ ≤ 1, and the problem dissolves into trivial linear programming. Keep it, and you're staring at NP-hard.
Try it. Below are six items and a knapsack of capacity 15. Click items to put them in. Find the best loot.
Notice: with 6 items there are 2⁶ = 64 combinations. With 60 items it's 2⁶⁰ ≈ 10¹⁸. With 600 items the universe ends before brute force finishes. And yet a real solver eats 600-item knapsacks for breakfast.
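The blowup is easy to feel in code. Here's a minimal brute-force sketch in Python; the items and capacity are made-up stand-ins for the widget's six:

```python
from itertools import product

# Hypothetical items as (value, weight) pairs; capacity is made up too.
ITEMS = [(60, 5), (50, 4), (70, 6), (30, 3), (30, 2), (60, 5)]
CAPACITY = 15

def brute_force(items, capacity):
    """Try all 2^n subsets. Fine at n=6, hopeless at n=60."""
    best_value, best_pick = 0, ()
    for pick in product((0, 1), repeat=len(items)):
        weight = sum(w for x, (v, w) in zip(pick, items) if x)
        value = sum(v for x, (v, w) in zip(pick, items) if x)
        if weight <= capacity and value > best_value:
            best_value, best_pick = value, pick
    return best_value, best_pick

best, pick = brute_force(ITEMS, CAPACITY)  # best value on this data: 180
```

Sixty-four subsets fly by. Double the item count and the same loop runs 2⁶ times longer; double it again and you've left the realm of patience.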
A modern CPU evaluates maybe 10⁹ knapsack configurations per second. At 60 items you're staring at roughly 36 years of compute. At 100 items, about 40 trillion years. The heat death of the universe arrives on schedule; your model does not.
So we don't enumerate. We do something more devious.
The single most important idea in MIP solving is also the most embarrassingly simple: pretend the variables are continuous, solve the resulting linear program in milliseconds, and use the answer as a lying compass.
Here's the move. Take your knapsack. Replace xᵢ ∈ {0, 1} with 0 ≤ xᵢ ≤ 1. You're now allowed to take half a sword, 0.7 of a tiara. Ridiculous, but solvable in microseconds.
The LP gives you a number. Call it z*_LP. Two things are true about this number, and both matter:
For a maximization problem, the LP relaxation can only do better than the integer problem, never worse. Why? Because every integer solution is also a feasible LP solution (an integer is a continuous number that happens to be whole). The LP has more options, so its optimum is ≥ the integer optimum.
This is the dual bound. It tells you "no matter how clever you get with integers, you cannot beat this number." It's the ceiling.
The LP solution will usually have fractional variables. x₃ = 0.62, x₅ = 0.41. This is impossible in reality. But the fractional solution tells you which items the math thinks are good. That's a hint a smart algorithm can exploit.
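For a single knapsack constraint, the LP relaxation has a famously simple exact solution: sort by value-to-weight ratio, take items greedily, and take a fraction of the first item that doesn't fit (Dantzig's rule). A sketch, on the same kind of made-up data as before:

```python
def knapsack_lp(items, capacity):
    """LP relaxation of the 0/1 knapsack: greedy by value/weight ratio,
    taking a fraction of the first item that doesn't fit (Dantzig's rule)."""
    order = sorted(range(len(items)),
                   key=lambda i: items[i][0] / items[i][1], reverse=True)
    x = [0.0] * len(items)
    remaining, value = capacity, 0.0
    for i in order:
        v, w = items[i]
        take = min(1.0, remaining / w)
        x[i] = take
        value += take * v
        remaining -= take * w
        if remaining <= 0:
            break
    return value, x

# Hypothetical (value, weight) pairs, capacity 15:
items = [(60, 5), (50, 4), (70, 6), (30, 3), (30, 2), (60, 5)]
value, x = knapsack_lp(items, 15)
# value is the dual bound: no integer packing of these items can beat it,
# and exactly one x[i] comes out fractional.
```

The fractional item is the hint: it's precisely the variable a branch-and-bound algorithm will want to branch on.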
For two-variable problems we can draw the whole thing. Below, the gray region is the LP feasible space — anywhere you can be if you ignore integrality. The red dots are the actual integer-feasible points. The LP optimum sits at a corner of the polyhedron. The integer optimum is one of the dots — and it's almost never the same point.
The LP optimum lives on a corner. The integer optimum lives on a dot. The distance between them is the integrality gap, and shrinking it is most of what a MIP solver does for a living.
Branch and bound is what happens when someone realizes you don't have to enumerate 2ⁿ solutions if you can prove huge swaths of them are losers without looking at them. It's a search tree where most branches get axed before they're explored.
The recipe is brutal in its simplicity:
1. Solve the LP relaxation. If the answer is integer, you're done. If not, pick a fractional variable.
2. Branch. Create two subproblems. In one, force xᵢ = 0. In the other, force xᵢ = 1. (For a general integer variable at fractional value v, you'd branch xᵢ ≤ ⌊v⌋ versus xᵢ ≥ ⌈v⌉.) Each subproblem is itself a smaller MIP.
3. Bound. Solve the LP relaxation of each subproblem. The LP value of a subproblem is an upper bound on what any integer solution in that subtree can achieve.
4. Fathom. If the subproblem's upper bound is worse than the best integer solution found so far (the incumbent), prune the entire subtree. You don't need to look inside. This is the magic.
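The whole recipe fits in a page of Python. Below is a depth-first sketch on the knapsack, using the greedy LP relaxation as the bounding function; the data and names are illustrative, not any solver's internals:

```python
def lp_bound(items, capacity, fixed):
    """LP relaxation with some variables fixed to 0 or 1.
    Returns (bound, fractional_var_or_None); infeasible -> (-inf, None)."""
    free, used, value = [], 0.0, 0.0
    for i, (v, w) in enumerate(items):
        if fixed.get(i) == 1:
            used += w
            value += v
        elif fixed.get(i) is None:
            free.append(i)
    if used > capacity:
        return float("-inf"), None
    free.sort(key=lambda i: items[i][0] / items[i][1], reverse=True)
    remaining = capacity - used
    for i in free:
        if remaining == 0:
            break
        v, w = items[i]
        if w <= remaining:
            remaining -= w
            value += v
        else:
            return value + v * remaining / w, i  # fractional item: branch here
    return value, None  # LP solution is already integral

def branch_and_bound(items, capacity):
    incumbent, explored = 0.0, 0
    stack = [{}]                            # depth-first: a stack of fixings
    while stack:
        fixed = stack.pop()
        explored += 1
        bound, frac = lp_bound(items, capacity, fixed)
        if bound <= incumbent:
            continue                        # fathom: subtree can't beat incumbent
        if frac is None:
            incumbent = bound               # integral LP solution: new incumbent
            continue
        stack.append({**fixed, frac: 0})    # branch: x_frac = 0
        stack.append({**fixed, frac: 1})    # branch: x_frac = 1 (explored first)
    return incumbent, explored

items = [(60, 5), (50, 4), (70, 6), (30, 3), (30, 2), (60, 5)]
best, explored = branch_and_bound(items, 15)
# Same optimum as brute force, without touching every subset.
```

Note where the work happens: the `bound <= incumbent` line. That single comparison is what lets the algorithm discard entire subtrees unexamined.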
Watch it happen. The widget below is a real branch-and-bound run on a small knapsack. Hit "Step" to advance one node at a time. The tree fills in. The bounds tighten. Watch the fathomed branches go gray — those are the subtrees the algorithm didn't have to explore.
Reading the tree: each node shows the LP value and the variable that was just branched on. Amber = active. Green = integer-feasible (incumbent candidate). Faded = fathomed (pruned). The whole point is that we don't have to expand the faded nodes.
Step 1 said "pick a fractional variable." Which one? This question is worth tens of millions of dollars per year to Gurobi customers, because the choice can change solve time by orders of magnitude.
Naive answer: pick the most fractional one (closest to 0.5). Slightly less naive: pick the one whose value drives the biggest LP change. Modern solvers use strong branching — they tentatively try several candidates, see which one tightens the bound the most, and pick that one. Expensive per node, but a single great branching decision can fathom thousands of subsequent nodes. Probabilistic thinking applied to algorithm design.
Branching makes a tree. Which node do we expand next? Two main schools:
Best-bound: always expand the node with the most optimistic LP bound. Tightens the dual bound fastest. Tends to be memory-hungry — the tree fans out wide.
Depth-first: dive down a branch until you fathom or find an incumbent. Finds feasible solutions fast, which lets you fathom siblings aggressively. Memory-friendly.
Real solvers do both. They start with depth-first to find an incumbent, then switch to best-bound once they have something to prune with. This is just the explore/exploit trade-off you already know from poker, dressed up in different clothes.
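The two disciplines are just two frontier data structures. A sketch with made-up node bounds: best-bound is a priority queue keyed on the LP bound, depth-first is a plain stack.

```python
import heapq

# Open nodes as (lp_bound, label); the bounds and labels are invented.
open_nodes = [(188.0, "root"), (185.0, "x3=1"), (181.5, "x3=0"), (187.5, "x1=1")]

# Best-bound: a max-heap on the LP bound (negate, since heapq is a min-heap).
heap = [(-b, label) for b, label in open_nodes]
heapq.heapify(heap)
best_first = [heapq.heappop(heap)[1] for _ in range(len(heap))]
# Expands "root" (188.0) first, then "x1=1" (187.5): most optimistic first.

# Depth-first: a plain stack, always diving into the most recently created node.
stack = list(open_nodes)
depth_first = [stack.pop()[1] for _ in range(len(stack))]
# Expands "x1=1" first because it was pushed last, regardless of its bound.
```

Swap one data structure for the other and the same tree gets explored in a completely different order, with completely different memory and incumbent-finding behavior.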
If branch and bound is a knife, cutting planes are a sculptor's chisel. Instead of dividing the problem, you carve away pieces of the LP region — fractional bullshit you don't want — without removing a single integer-feasible point.
Recall: the LP optimum sits at a corner of the polyhedron, often fractional. What if we could just add a constraint that slices off that fractional corner without cutting away any integer solutions? The new LP would be tighter. The dual bound would improve. The gap would shrink.
This is exactly what cutting planes do. Watch.
Each cut you add is a hyperplane chosen to slice off the current fractional LP optimum without touching any integer point. The polyhedron contracts. The new LP optimum moves. Eventually — in theory — you've added enough cuts that the LP polyhedron equals the convex hull of the integer points, and the LP relaxation is exact.
Why not just add every cut up front? Because describing the convex hull of an integer program can require exponentially many constraints. The convex hull is the truth, but the truth has too many sides to write down. So we add cuts lazily, only when they're needed — only when the LP gives us a fractional answer we want to chop off.
Gomory cuts (1958): the original. Derived from the simplex tableau. Always valid, but often weak. The first proof that cutting planes were a viable strategy.
Cover cuts: for knapsack-style constraints. If a set of items together exceeds capacity, at least one of them must be left out; add that inequality as a cut.
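A cover cut separator is simple enough to sketch. For a knapsack constraint Σwᵢxᵢ ≤ b, any set C whose weights sum past b yields the valid inequality Σ_{i∈C} xᵢ ≤ |C| − 1. The greedy below (a common heuristic, not any particular solver's routine) looks for a cover the current fractional point violates:

```python
def violated_cover(weights, capacity, x_frac):
    """Find a cover C (weights sum > capacity) whose cover inequality
    sum(x_i for i in C) <= |C| - 1 is violated by the point x_frac.
    Greedy heuristic: add items in decreasing x_frac until weights overflow."""
    order = sorted(range(len(weights)), key=lambda i: x_frac[i], reverse=True)
    cover, total = [], 0
    for i in order:
        cover.append(i)
        total += weights[i]
        if total > capacity:
            break
    else:
        return None                 # all items fit together: no cover exists
    if sum(x_frac[i] for i in cover) > len(cover) - 1 + 1e-9:
        return cover                # cut: sum_{i in cover} x_i <= |cover| - 1
    return None

# Made-up fractional LP point, weights [5, 4, 6, 3], capacity 10:
cut = violated_cover([5, 4, 6, 3], 10, [1.0, 1.0, 0.5, 0.0])
# Items 0, 1, 2 weigh 15 > 10 and their x-values sum to 2.5 > 2,
# so x0 + x1 + x2 <= 2 is a valid cut that slices off this point.
```

Adding that inequality to the LP makes the fractional point infeasible without excluding a single integer-feasible packing, which is the entire job description of a cut.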
Mixed-integer rounding (MIR): a workhorse. Exploits the rounding behavior of fractional coefficients. The unsung hero of modern solvers.
Clique cuts, flow cover cuts, lift-and-project cuts... dozens more. Modern solvers carry an arsenal and try multiple families at every node.
The headline algorithm of every solver shipping today is branch-and-cut: branch and bound, but at every node you also try to add cuts before branching. Two pincers. The dual bound drops from above. The primal bound rises from below. The gap dies in the middle.
Pull up a Gurobi log. You'll see columns labeled Incumbent, BestBd, and Gap marching down the page.
Read this with new eyes. Each line is the vise tightening. Incumbent is the best integer solution found so far — the primal bound — and it only goes up (for a max problem). BestBd is the LP-derived ceiling — the dual bound — and it only goes down. H means a heuristic found a feasible solution. * means a node solve found one. The gap closes. The solver wins.
The dual (amber, top) is what the LP relaxation thinks is achievable. The primal (green, bottom) is what we've actually proven achievable with an integer solution. Optimality is when they kiss.
This is the single most underrated practical lesson in MIP: setting the MIP gap parameter is the highest-leverage knob in the entire solver. Default is usually 0.01% (vanishingly tight). Set it to 1% or 2% and watch your problem solve in 1/10 the time, with answers that — for any real-world business problem — are indistinguishable from optimal.
Quick check: BestBd = 1000, Incumbent = 950, Gap = 5%, maximization. The only thing you know for certain is that the optimum lies somewhere in [950, 1000], and that you already hold a solution within about 5% of it.

If you remember nothing else, remember this: the algorithm is rarely your problem. The formulation is almost always your problem. The most expensive computation in MIP is the one performed on a badly-modeled constraint.
You want to model an "if-then" constraint: "if y = 1, then x ≤ 100." The textbook trick is the constraint x ≤ 100 + M·(1 − y).
Where M is "a really big number." This works mathematically. It is also the single most common cause of slow solves on the planet. Why? Because the LP relaxation of this constraint is dogshit. With M = 10,000,000, the LP can set y = 0.0001 and x = 9,000 simultaneously and feel great. The integrality gap explodes. Branch and bound has nothing to prune. Cuts can't help much because the polyhedron is enormous.
The fix: use the smallest valid M. If x can never realistically exceed 1,000, use M = 1,000, not M = 10⁹. Better still: see if you can reformulate without big-M at all (indicator constraints, disjunctive programming, SOS constraints).
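The looseness is just arithmetic. Here's a tiny numeric illustration using the closely related fixed-charge constraint x ≤ M·y ("if y = 0, then x = 0"); the numbers are made up:

```python
def min_y(x, M):
    """Smallest y the relaxed constraint x <= M*y permits for a given x."""
    return x / M

# y typically carries a fixed cost. With a sloppy M, the LP pays almost
# none of it while x does whatever it wants:
y_sloppy = min_y(9000, 10_000_000)  # 0.0009 -- 9000 units, ~0.1% of the cost
y_tight = min_y(9000, 10_000)       # 0.9 -- now the relaxation pays 90% of it
```

With M = 10⁷ the LP bound reflects almost none of the fixed cost, the integrality gap balloons, and branch and bound has nothing to prune. With the tightest valid M, the relaxation is forced to tell something close to the truth.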
Suppose you're assigning identical workers to identical jobs. Worker 1 to Job A and Worker 2 to Job B is mathematically equivalent to Worker 2 to Job A and Worker 1 to Job B. The solver doesn't know they're equivalent. It explores both. And both subtrees of both. And both subtrees of those. This is symmetry, and it can multiply your search tree by factorial nonsense.
The fix: add symmetry-breaking constraints (x₁ ≥ x₂ ≥ x₃, etc.) or use a solver setting that detects and quotients out symmetry. Modern Gurobi and CPLEX do this automatically — but only if you let them by writing clean, recognizable formulations.
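The damage is factorial, literally:

```python
from math import factorial

def equivalent_subtrees(k):
    """k interchangeable workers means factorial(k) relabelings of every
    schedule, each a separate subtree unless symmetry is broken."""
    return factorial(k)

equivalent_subtrees(10)  # 3,628,800 copies of the same search work
# A symmetry-breaking chain like x1 >= x2 >= ... >= xk leaves exactly one
# of those relabelings feasible.
```

Ten identical workers is already a factor of 3.6 million in redundant tree. Either break the symmetry yourself or formulate cleanly enough that the solver's detector can.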
Some formulations of the same problem give wildly different LP bounds. The classic example is the uncapacitated facility location problem:
| Formulation | LP Gap | Solve Time (50 facilities) |
|---|---|---|
| Aggregated (one linking constraint per facility: Σᵢ xᵢⱼ ≤ n·yⱼ) | ~40% | hours, often unsolved |
| Disaggregated (one linking constraint per customer–facility pair: xᵢⱼ ≤ yⱼ) | ~1–5% | seconds |
The disaggregated version has more constraints. More constraints, faster solve. Counterintuitive until you realize each extra constraint tightens the polyhedron, and a tight polyhedron means a tight LP bound, which means aggressive fathoming. Constraint count is not what slows MIPs. Loose relaxations are.
Modern solvers have hundreds of parameters. The defaults are pretty good. Pretty good is not great. For a problem you're going to solve repeatedly:
Tune the gap. If 1% is acceptable, set MIPGap = 0.01 and reclaim hours.
Tune the time limit. Take the best feasible solution and move on. Probabilistic thinking: a 1% improvement after hour 4 is rarely worth hour 4.
Use warm starts. If you solved yesterday's problem, today's looks similar. Feed in the previous solution as an initial incumbent. Solvers eat this up.
Run automatic tuning. Gurobi has grbtune. CPLEX has the tuning tool. They'll run hundreds of parameter combos and report the best for your model. This is free 2–10× speedup that 99% of practitioners never use.
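The first three knobs, sketched in gurobipy. The parameter and attribute names (MIPGap, TimeLimit, Start, VarName) are the real Gurobi API, but the model file and yesterday's solution here are hypothetical, and running this requires a Gurobi install and license:

```python
import gurobipy as gp

m = gp.read("todays_model.mps")   # hypothetical model file

m.Params.MIPGap = 0.01            # accept any solution within 1% of optimal
m.Params.TimeLimit = 600          # take the best incumbent after 10 minutes

# Warm start from yesterday's solution (hypothetical name -> value mapping):
yesterday = {"x[1]": 1.0, "x[2]": 0.0}
for v in m.getVars():
    if v.VarName in yesterday:
        v.Start = yesterday[v.VarName]

m.optimize()
```

Three lines of configuration, routinely worth more wall-clock time than any amount of cleverness inside the model.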
Sometimes the solver isn't slow. You are slow. You're sending one solve at a time, blocking, then post-processing, then sending the next. The solver is idle 90% of wall time. Look at your end-to-end pipeline before tuning anything inside the box.
After all the algorithms, what survives? Five operating principles I'd put on a card and tape inside the laptop of anyone who ships optimization for a living.
Stop trying to prove optimality. Prove acceptability. A 1% gap delivered now beats a 0% gap delivered tomorrow that nobody waits for. Speed of decisions compounds; precision of decisions doesn't.
Your formulation will look ugly. It will have more constraints than you think reasonable. It will have indicator constraints and big-Ms with carefully-bounded values and sometimes redundant inequalities added solely to tighten the LP. Tight, ugly formulations beat clean, loose ones every single time.
By the time the solver hits 5% gap, it almost always has the optimal solution in hand. The remaining time is closing the dual bound, not improving the primal. If you don't need the proof, you don't need the time.
The solver core does branch-and-cut. The presolver does miracles. It removes redundant constraints, fixes variables, tightens bounds, detects implications, lifts coefficients. On many problems, presolve is responsible for 60%+ of the speedup over a naive implementation. Look at the presolve log. If it's removing 80% of your variables, your formulation has fat. If it's removing nothing, your formulation has bones.
Branching strategies, cut selection, heuristic deployment — all of these are bets on what will pay off. Strong branching costs more per node but pays off across the subtree. Cuts cost time to generate but pay off in fathomed branches. Modern solvers are essentially Bayesian agents managing a portfolio of techniques, deploying each one when its expected return justifies its cost. You can think this way about your own modeling decisions.
If you want the theory: Wolsey's Integer Programming is short, brutal, definitive. Nemhauser & Wolsey's Integer and Combinatorial Optimization is the bible — heavy, complete, slightly dated, still essential.
If you want practical mastery: Bob Bixby's papers on the history of CPLEX/Gurobi performance gains. Read them in order. You'll watch solvers go from "useful for 50-variable problems" in 1990 to "industrial workhorse for 5-million-variable problems" in 2020 — a million-fold algorithmic speedup that dwarfs hardware improvements over the same period.
If you want to actually feel it: open Gurobi or HiGHS, build a small VRP or facility location, and watch the log scroll while you sip coffee. Let your eyes adjust. Notice when the gap closes fast, notice when it stalls. The intuition is real and it transfers.