Inside the algorithmic vise that closes around your network design problem at 3 a.m. — branch, bound, cut, and the dark art of making NP-hard problems sit down and behave.
Mixed-integer programming is the art of solving problems that can't be solved. The trick is that we lie to ourselves twice — once optimistically, once pessimistically — and then squeeze the truth out of the gap between the two lies.
You've used Gurobi or CPLEX or HiGHS or whatever Lyric Studio wraps. You've watched the log scroll. You've seen Best Bound and Best Solution march toward each other and felt vaguely that something profound was happening. Something is. This is that something.
I'm going to walk you through the machinery in seven moves. We start with a problem so small you could solve it in your head. Then I take that problem away from you and force you to watch a solver do it — because the way the solver does it is not the way you did it, and that difference is where all the action is.
By the end, you'll be able to read a solver log and know which knob to turn. Diagnose why your formulation is slow. Understand why your colleague's "obvious" big-M reformulation made the problem 100× worse. And — most importantly — develop a physical intuition for what's happening inside, the way a good mechanic feels an engine.
A thief breaks into a museum. Capacity is finite. Greed is not. Pick the items that maximize loot under a weight constraint. This is the fruit fly of integer programming — small enough to study, vicious enough to teach you everything.
The last line is the crux of the whole thing. That little ∈ {0, 1} is what turns a problem you can solve in milliseconds into one that is NP-hard at scale — no known algorithm solves every instance efficiently. Drop the integrality constraint, replace it with 0 ≤ xᵢ ≤ 1, and the problem dissolves into trivial linear programming. Keep it, and you're staring at NP-hard.
Try it. Below are six items and a knapsack of capacity 15. Click items to put them in. Find the best loot.
Notice: with 6 items there are 2⁶ = 64 combinations. With 60 items it's 2⁶⁰ ≈ 10¹⁸. With 600 items the universe ends before brute force finishes. And yet a real solver eats 600-item knapsacks for breakfast.
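The blowup is easy to feel in code. Here's a minimal brute-force sketch in Python; the items and capacity are made-up stand-ins for the widget's six:

```python
from itertools import product

# Hypothetical items as (value, weight) pairs; capacity is made up too.
ITEMS = [(60, 5), (50, 4), (70, 6), (30, 3), (30, 2), (60, 5)]
CAPACITY = 15

def brute_force(items, capacity):
    """Try all 2^n subsets. Fine at n=6, hopeless at n=60."""
    best_value, best_pick = 0, ()
    for pick in product((0, 1), repeat=len(items)):
        weight = sum(w for x, (v, w) in zip(pick, items) if x)
        value = sum(v for x, (v, w) in zip(pick, items) if x)
        if weight <= capacity and value > best_value:
            best_value, best_pick = value, pick
    return best_value, best_pick

best, pick = brute_force(ITEMS, CAPACITY)  # best value on this data: 180
```

Sixty-four subsets fly by. Double the item count and the same loop runs 2⁶ times longer; double it again and you've left the realm of patience.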
A modern CPU evaluates maybe 10⁹ knapsack configurations per second. At 60 items you're staring at roughly 36 years of compute. At 100 items, about 40 trillion years. The heat death of the universe arrives on schedule; your model does not.
So we don't enumerate. We do something more devious.
The single most important idea in MIP solving is also the most embarrassingly simple: pretend the variables are continuous, solve the resulting linear program in milliseconds, and use the answer as a lying compass.
Here's the move. Take your knapsack. Replace xᵢ ∈ {0, 1} with 0 ≤ xᵢ ≤ 1. You're now allowed to take half a sword, 0.7 of a tiara. Ridiculous, but solvable in microseconds.
The LP gives you a number. Call it z*_LP. Two things are true about this number, and both matter:
For a maximization problem, the LP relaxation can only do better than the integer problem, never worse. Why? Because every integer solution is also a feasible LP solution (an integer is a continuous number that happens to be whole). The LP has more options, so its optimum is ≥ the integer optimum.
This is the dual bound. It tells you "no matter how clever you get with integers, you cannot beat this number." It's the ceiling.
The LP solution will usually have fractional variables. x₃ = 0.62, x₅ = 0.41. This is impossible in reality. But the fractional solution tells you which items the math thinks are good. That's a hint a smart algorithm can exploit.
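For a single knapsack constraint, the LP relaxation has a famously simple exact solution: sort by value-to-weight ratio, take items greedily, and take a fraction of the first item that doesn't fit (Dantzig's rule). A sketch, on the same kind of made-up data as before:

```python
def knapsack_lp(items, capacity):
    """LP relaxation of the 0/1 knapsack: greedy by value/weight ratio,
    taking a fraction of the first item that doesn't fit (Dantzig's rule)."""
    order = sorted(range(len(items)),
                   key=lambda i: items[i][0] / items[i][1], reverse=True)
    x = [0.0] * len(items)
    remaining, value = capacity, 0.0
    for i in order:
        v, w = items[i]
        take = min(1.0, remaining / w)
        x[i] = take
        value += take * v
        remaining -= take * w
        if remaining <= 0:
            break
    return value, x

# Hypothetical (value, weight) pairs, capacity 15:
items = [(60, 5), (50, 4), (70, 6), (30, 3), (30, 2), (60, 5)]
value, x = knapsack_lp(items, 15)
# value is the dual bound: no integer packing of these items can beat it,
# and exactly one x[i] comes out fractional.
```

The fractional item is the hint: it's precisely the variable a branch-and-bound algorithm will want to branch on.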
For two-variable problems we can draw the whole thing. Below, the gray region is the LP feasible space — anywhere you can be if you ignore integrality. The red dots are the actual integer-feasible points. The LP optimum sits at a corner of the polyhedron. The integer optimum is one of the dots — and it's almost never the same point.
The LP optimum lives on a corner. The integer optimum lives on a dot. The distance between them is the integrality gap, and shrinking it is most of what a MIP solver does for a living.
Branch and bound is what happens when someone realizes you don't have to enumerate 2ⁿ solutions if you can prove huge swaths of them are losers without looking at them. It's a search tree where most branches get axed before they're explored.
The recipe is brutal in its simplicity:
1. Solve the LP relaxation. If the answer is integer, you're done. If not, pick a fractional variable.
2. Branch. Create two subproblems. In one, force xᵢ = 0. In the other, force xᵢ = 1. (For a general integer variable at fractional value v, you'd branch xᵢ ≤ ⌊v⌋ versus xᵢ ≥ ⌈v⌉.) Each subproblem is itself a smaller MIP.
3. Bound. Solve the LP relaxation of each subproblem. The LP value of a subproblem is an upper bound on what any integer solution in that subtree can achieve.
4. Fathom. If the subproblem's upper bound is worse than the best integer solution found so far (the incumbent), prune the entire subtree. You don't need to look inside. This is the magic.
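The whole recipe fits in a page of Python. Below is a depth-first sketch on the knapsack, using the greedy LP relaxation as the bounding function; the data and names are illustrative, not any solver's internals:

```python
def lp_bound(items, capacity, fixed):
    """LP relaxation with some variables fixed to 0 or 1.
    Returns (bound, fractional_var_or_None); infeasible -> (-inf, None)."""
    free, used, value = [], 0.0, 0.0
    for i, (v, w) in enumerate(items):
        if fixed.get(i) == 1:
            used += w
            value += v
        elif fixed.get(i) is None:
            free.append(i)
    if used > capacity:
        return float("-inf"), None
    free.sort(key=lambda i: items[i][0] / items[i][1], reverse=True)
    remaining = capacity - used
    for i in free:
        if remaining == 0:
            break
        v, w = items[i]
        if w <= remaining:
            remaining -= w
            value += v
        else:
            return value + v * remaining / w, i  # fractional item: branch here
    return value, None  # LP solution is already integral

def branch_and_bound(items, capacity):
    incumbent, explored = 0.0, 0
    stack = [{}]                            # depth-first: a stack of fixings
    while stack:
        fixed = stack.pop()
        explored += 1
        bound, frac = lp_bound(items, capacity, fixed)
        if bound <= incumbent:
            continue                        # fathom: subtree can't beat incumbent
        if frac is None:
            incumbent = bound               # integral LP solution: new incumbent
            continue
        stack.append({**fixed, frac: 0})    # branch: x_frac = 0
        stack.append({**fixed, frac: 1})    # branch: x_frac = 1 (explored first)
    return incumbent, explored

items = [(60, 5), (50, 4), (70, 6), (30, 3), (30, 2), (60, 5)]
best, explored = branch_and_bound(items, 15)
# Same optimum as brute force, without touching every subset.
```

Note where the work happens: the `bound <= incumbent` line. That single comparison is what lets the algorithm discard entire subtrees unexamined.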
Watch it happen. The widget below is a real branch-and-bound run on a small knapsack. Hit "Step" to advance one node at a time. The tree fills in. The bounds tighten. Watch the fathomed branches go gray — those are the subtrees the algorithm didn't have to explore.
Reading the tree: each node shows the LP value and the variable that was just branched on. Amber = active. Green = integer-feasible (incumbent candidate). Faded = fathomed (pruned). The whole point is that we don't have to expand the faded nodes.
Step 1 said "pick a fractional variable." Which one? This question is worth tens of millions of dollars per year to Gurobi customers, because the choice can change solve time by orders of magnitude.
Naive answer: pick the most fractional one (closest to 0.5). Slightly less naive: pick the one whose value drives the biggest LP change. Modern solvers use strong branching — they tentatively try several candidates, see which one tightens the bound the most, and pick that one. Expensive per node, but a single great branching decision can fathom thousands of subsequent nodes. Probabilistic thinking applied to algorithm design.
Branching makes a tree. Which node do we expand next? Two main schools:
Best-bound: always expand the node with the most optimistic LP bound. Tightens the dual bound fastest. Tends to be memory-hungry — the tree fans out wide.
Depth-first: dive down a branch until you fathom or find an incumbent. Finds feasible solutions fast, which lets you fathom siblings aggressively. Memory-friendly.
Real solvers do both. They start with depth-first to find an incumbent, then switch to best-bound once they have something to prune with. This is just the explore/exploit trade-off you already know from poker, dressed up in different clothes.
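The two disciplines are just two frontier data structures. A sketch with made-up node bounds: best-bound is a priority queue keyed on the LP bound, depth-first is a plain stack.

```python
import heapq

# Open nodes as (lp_bound, label); the bounds and labels are invented.
open_nodes = [(188.0, "root"), (185.0, "x3=1"), (181.5, "x3=0"), (187.5, "x1=1")]

# Best-bound: a max-heap on the LP bound (negate, since heapq is a min-heap).
heap = [(-b, label) for b, label in open_nodes]
heapq.heapify(heap)
best_first = [heapq.heappop(heap)[1] for _ in range(len(heap))]
# Expands "root" (188.0) first, then "x1=1" (187.5): most optimistic first.

# Depth-first: a plain stack, always diving into the most recently created node.
stack = list(open_nodes)
depth_first = [stack.pop()[1] for _ in range(len(stack))]
# Expands "x1=1" first because it was pushed last, regardless of its bound.
```

Swap one data structure for the other and the same tree gets explored in a completely different order, with completely different memory and incumbent-finding behavior.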
If branch and bound is a knife, cutting planes are a sculptor's chisel. Instead of dividing the problem, you carve away pieces of the LP region — fractional bullshit you don't want — without removing a single integer-feasible point.
Recall: the LP optimum sits at a corner of the polyhedron, often fractional. What if we could just add a constraint that slices off that fractional corner without cutting away any integer solutions? The new LP would be tighter. The dual bound would improve. The gap would shrink.
This is exactly what cutting planes do. Watch.
Each cut you add is a hyperplane chosen to slice off the current fractional LP optimum without touching any integer point. The polyhedron contracts. The new LP optimum moves. Eventually — in theory — you've added enough cuts that the LP polyhedron equals the convex hull of the integer points, and the LP relaxation is exact.
Why not just add every cut up front? Because describing the convex hull of an integer program can require exponentially many constraints. The convex hull is the truth, but the truth has too many sides to write down. So we add cuts lazily, only when they're needed — only when the LP gives us a fractional answer we want to chop off.
Gomory cuts (1958): the original. Derived from the simplex tableau. Always valid, but often weak. The first proof that cutting planes were a viable strategy.
Cover cuts: for knapsack-style constraints. If a set of items together exceeds capacity, at least one of them must be left out; add that inequality as a cut.
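A cover cut separator is simple enough to sketch. For a knapsack constraint Σwᵢxᵢ ≤ b, any set C whose weights sum past b yields the valid inequality Σ_{i∈C} xᵢ ≤ |C| − 1. The greedy below (a common heuristic, not any particular solver's routine) looks for a cover the current fractional point violates:

```python
def violated_cover(weights, capacity, x_frac):
    """Find a cover C (weights sum > capacity) whose cover inequality
    sum(x_i for i in C) <= |C| - 1 is violated by the point x_frac.
    Greedy heuristic: add items in decreasing x_frac until weights overflow."""
    order = sorted(range(len(weights)), key=lambda i: x_frac[i], reverse=True)
    cover, total = [], 0
    for i in order:
        cover.append(i)
        total += weights[i]
        if total > capacity:
            break
    else:
        return None                 # all items fit together: no cover exists
    if sum(x_frac[i] for i in cover) > len(cover) - 1 + 1e-9:
        return cover                # cut: sum_{i in cover} x_i <= |cover| - 1
    return None

# Made-up fractional LP point, weights [5, 4, 6, 3], capacity 10:
cut = violated_cover([5, 4, 6, 3], 10, [1.0, 1.0, 0.5, 0.0])
# Items 0, 1, 2 weigh 15 > 10 and their x-values sum to 2.5 > 2,
# so x0 + x1 + x2 <= 2 is a valid cut that slices off this point.
```

Adding that inequality to the LP makes the fractional point infeasible without excluding a single integer-feasible packing, which is the entire job description of a cut.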
Mixed-integer rounding (MIR): a workhorse. Exploits the rounding behavior of fractional coefficients. The unsung hero of modern solvers.
Clique cuts, flow cover cuts, lift-and-project cuts... dozens more. Modern solvers carry an arsenal and try multiple families at every node.
The headline algorithm of every solver shipping today is branch-and-cut: branch and bound, but at every node you also try to add cuts before branching. Two pincers. The dual bound drops from above. The primal bound rises from below. The gap dies in the middle.
Pull up a Gurobi log. You'll see columns labeled Incumbent, BestBd, and Gap marching down the page.
Read this with new eyes. Each line is the vise tightening. Incumbent is the best integer solution found so far — the primal bound — and it only goes up (for a max problem). BestBd is the LP-derived ceiling — the dual bound — and it only goes down. H means a heuristic found a feasible solution. * means a node solve found one. The gap closes. The solver wins.
The dual (amber, top) is what the LP relaxation thinks is achievable. The primal (green, bottom) is what we've actually proven achievable with an integer solution. Optimality is when they kiss.
This is the single most underrated practical lesson in MIP: setting the MIP gap parameter is the highest-leverage knob in the entire solver. Default is usually 0.01% (vanishingly tight). Set it to 1% or 2% and watch your problem solve in 1/10 the time, with answers that — for any real-world business problem — are indistinguishable from optimal.
Quick check: BestBd = 1000, Incumbent = 950, Gap = 5%, maximization. The only thing you know for certain is that the optimum lies somewhere in [950, 1000], and that you already hold a solution within about 5% of it.

If you remember nothing else, remember this: the algorithm is rarely your problem. The formulation is almost always your problem. The most expensive computation in MIP is the one performed on a badly-modeled constraint.
You want to model an "if-then" constraint: "if y = 1, then x ≤ 100." The textbook trick is the constraint x ≤ 100 + M·(1 − y).
Where M is "a really big number." This works mathematically. It is also the single most common cause of slow solves on the planet. Why? Because the LP relaxation of this constraint is dogshit. With M = 10,000,000, the LP can set y = 0.0001 and x = 9,000 simultaneously and feel great. The integrality gap explodes. Branch and bound has nothing to prune. Cuts can't help much because the polyhedron is enormous.
The fix: use the smallest valid M. If x can never realistically exceed 1,000, use M = 1,000, not M = 10⁹. Better still: see if you can reformulate without big-M at all (indicator constraints, disjunctive programming, SOS constraints).
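The looseness is just arithmetic. Here's a tiny numeric illustration using the closely related fixed-charge constraint x ≤ M·y ("if y = 0, then x = 0"); the numbers are made up:

```python
def min_y(x, M):
    """Smallest y the relaxed constraint x <= M*y permits for a given x."""
    return x / M

# y typically carries a fixed cost. With a sloppy M, the LP pays almost
# none of it while x does whatever it wants:
y_sloppy = min_y(9000, 10_000_000)  # 0.0009 -- 9000 units, ~0.1% of the cost
y_tight = min_y(9000, 10_000)       # 0.9 -- now the relaxation pays 90% of it
```

With M = 10⁷ the LP bound reflects almost none of the fixed cost, the integrality gap balloons, and branch and bound has nothing to prune. With the tightest valid M, the relaxation is forced to tell something close to the truth.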
Suppose you're assigning identical workers to identical jobs. Worker 1 to Job A and Worker 2 to Job B is mathematically equivalent to Worker 2 to Job A and Worker 1 to Job B. The solver doesn't know they're equivalent. It explores both. And both subtrees of both. And both subtrees of those. This is symmetry, and it can multiply your search tree by factorial nonsense.
The fix: add symmetry-breaking constraints (x₁ ≥ x₂ ≥ x₃, etc.) or use a solver setting that detects and quotients out symmetry. Modern Gurobi and CPLEX do this automatically — but only if you let them by writing clean, recognizable formulations.
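The damage is factorial, literally:

```python
from math import factorial

def equivalent_subtrees(k):
    """k interchangeable workers means factorial(k) relabelings of every
    schedule, each a separate subtree unless symmetry is broken."""
    return factorial(k)

equivalent_subtrees(10)  # 3,628,800 copies of the same search work
# A symmetry-breaking chain like x1 >= x2 >= ... >= xk leaves exactly one
# of those relabelings feasible.
```

Ten identical workers is already a factor of 3.6 million in redundant tree. Either break the symmetry yourself or formulate cleanly enough that the solver's detector can.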
Some formulations of the same problem give wildly different LP bounds. The classic example is the uncapacitated facility location problem:
| Formulation | LP Gap | Solve Time (50 facilities) |
|---|---|---|
| Aggregated (one linking constraint per facility: Σᵢ xᵢⱼ ≤ n·yⱼ) | ~40% | hours, often unsolved |
| Disaggregated (one linking constraint per customer–facility pair: xᵢⱼ ≤ yⱼ) | ~1–5% | seconds |
The disaggregated version has more constraints. More constraints, faster solve. Counterintuitive until you realize each extra constraint tightens the polyhedron, and a tight polyhedron means a tight LP bound, which means aggressive fathoming. Constraint count is not what slows MIPs. Loose relaxations are.
Modern solvers have hundreds of parameters. The defaults are pretty good. Pretty good is not great. For a problem you're going to solve repeatedly:
Tune the gap. If 1% is acceptable, set MIPGap = 0.01 and reclaim hours.
Tune the time limit. Take the best feasible solution and move on. Probabilistic thinking: a 1% improvement after hour 4 is rarely worth hour 4.
Use warm starts. If you solved yesterday's problem, today's looks similar. Feed in the previous solution as an initial incumbent. Solvers eat this up.
Run automatic tuning. Gurobi has grbtune. CPLEX has the tuning tool. They'll run hundreds of parameter combos and report the best for your model. This is free 2–10× speedup that 99% of practitioners never use.
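The first three knobs, sketched in gurobipy. The parameter and attribute names (MIPGap, TimeLimit, Start, VarName) are the real Gurobi API, but the model file and yesterday's solution here are hypothetical, and running this requires a Gurobi install and license:

```python
import gurobipy as gp

m = gp.read("todays_model.mps")   # hypothetical model file

m.Params.MIPGap = 0.01            # accept any solution within 1% of optimal
m.Params.TimeLimit = 600          # take the best incumbent after 10 minutes

# Warm start from yesterday's solution (hypothetical name -> value mapping):
yesterday = {"x[1]": 1.0, "x[2]": 0.0}
for v in m.getVars():
    if v.VarName in yesterday:
        v.Start = yesterday[v.VarName]

m.optimize()
```

Three lines of configuration, routinely worth more wall-clock time than any amount of cleverness inside the model.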
Sometimes the solver isn't slow. You are slow. You're sending one solve at a time, blocking, then post-processing, then sending the next. The solver is idle 90% of wall time. Look at your end-to-end pipeline before tuning anything inside the box.
After all the algorithms, what survives? Five operating principles I'd put on a card and tape inside the laptop of anyone who ships optimization for a living.
Stop trying to prove optimality. Prove acceptability. A 1% gap delivered now beats a 0% gap delivered tomorrow that nobody waits for. Speed of decisions compounds; precision of decisions doesn't.
Your formulation will look ugly. It will have more constraints than you think reasonable. It will have indicator constraints and big-Ms with carefully-bounded values and sometimes redundant inequalities added solely to tighten the LP. Tight, ugly formulations beat clean, loose ones every single time.
By the time the solver hits 5% gap, it almost always has the optimal solution in hand. The remaining time is closing the dual bound, not improving the primal. If you don't need the proof, you don't need the time.
The solver core does branch-and-cut. The presolver does miracles. It removes redundant constraints, fixes variables, tightens bounds, detects implications, lifts coefficients. On many problems, presolve is responsible for 60%+ of the speedup over a naive implementation. Look at the presolve log. If it's removing 80% of your variables, your formulation has fat. If it's removing nothing, your formulation has bones.
Branching strategies, cut selection, heuristic deployment — all of these are bets on what will pay off. Strong branching costs more per node but pays off across the subtree. Cuts cost time to generate but pay off in fathomed branches. Modern solvers are essentially Bayesian agents managing a portfolio of techniques, deploying each one when its expected return justifies its cost. You can think this way about your own modeling decisions.
If you want the theory: Wolsey's Integer Programming is short, brutal, definitive. Nemhauser & Wolsey's Integer and Combinatorial Optimization is the bible — heavy, complete, slightly dated, still essential.
If you want practical mastery: Bob Bixby's papers on the history of CPLEX/Gurobi performance gains. Read them in order. You'll watch solvers go from "useful for 50-variable problems" in 1990 to "industrial workhorse for 5-million-variable problems" in 2020 — a million-fold algorithmic speedup that dwarfs hardware improvements over the same period.
If you want to actually feel it: open Gurobi or HiGHS, build a small VRP or facility location, and watch the log scroll while you sip coffee. Let your eyes adjust. Notice when the gap closes fast, notice when it stalls. The intuition is real and it transfers.