Decision · Council Debate

Should I rewrite my legacy monolith or refactor it incrementally?

Updated 2026-06-26

The dilemma
"I inherited a 5-year-old monolith that's slow to change and intimidates new hires. I'm tempted to rewrite it from scratch, but we're 4 engineers shipping features every week — a full rewrite could mean 6+ months with no new features."

Every engineer who inherits a scary monolith feels the same pull: burn it down and start clean. But the council pulled the question apart and found the slowness usually isn't the architecture — it's fear, manufactured by near-zero test coverage. A rewrite is the most expensive possible cure for a fear problem. The cheaper, faster path is a safety net: characterization tests on your worst module first, on a tight clock.

Decision Record

Don't rewrite. Refactor incrementally, starting with characterization tests on the reporting engine and a 6-week checkpoint measured by cycle-time improvement.

90% · Strong consensus

Fear, not architecture: the bottleneck is ~15% test coverage, not coupling — the same engineers ship boldly on side projects, so the caution is environmental.
Rewrites lose: large rewrites fail 50–70% of the time and even successes run 2–3x over — a year of zero features while competitors move.
Tests are the unlearning loop: each passing test is proof that "touching this is safe," which is what dissolves conditioned caution; a clean codebase alone can't buy that.
Start with the worst module: the reporting engine (highest churn + defects) is the adversarial target, not a cherry-picked easy win — fix it first.
Characterization tests first: they lock in current behavior, bugs included; "tests pass" is the starting line for fixing those bugs, not the finish line.
Bound it with a 6-week checkpoint: baseline 4-day cycle time and 15% coverage; if the needle doesn't move, reassess instead of grinding on faith — and pre-sell that metric so it isn't killed as "no new features."

WhyThe real bottleneck is fear from missing tests, not architecture. A rewrite would be a costly, year-long detour that doesn't directly solve the team's caution; targeted testing creates a safety net that unblocks faster onboarding and bolder changes.

Main riskThe reporting engine's high defect rate and possible lost institutional knowledge may make test-writing slower and reveal more hidden coupling than expected, risking an open-ended effort without visible progress if the 6-week checkpoint isn't tightly managed.

What would change itIf at the 6-week review cycle time hasn't meaningfully decreased and the team still relies on manual-QA caution, the incremental test-first approach should be re-evaluated in favor of more radical extraction.

Next stepAssign an owner (paired to cover retention and domain knowledge) to begin characterization tests on the reporting engine, define the exact success metric (e.g., reporting-engine cycle time from 4 days to 1 day), and align your manager on that scorecard before writing the first test.

Run this on your own decision →Free · local-first · bring your own API key

Inside the debate

The council4 agents · 4 models

The Analyst

GLM-5.2

The Strategist

DeepSeek V4 Pro

The Integrator

GPT-5.5

The Humanist

Kimi K2.7

4 independent frontier models debated this — not one model playing 4 roles.

The Analyst· GLM-5.2

Rewrites have a track record, and it's bad: 50–70% fail — cancelled, never shipped, or abandoned. Netscape's rewrite helped kill the company. Even successes run 2–3x over estimate, so "6+ months" with 4 engineers is realistically a year of zero customer-facing output while competitors move. And "slow to change" and "intimidates new hires" are symptoms, not a diagnosis. Get cycle time and a churn-vs-defect heat map first — otherwise you're prescribing surgery without imaging.

The Strategist· DeepSeek V4 Pro

"Intimidates new hires" isn't a productivity metric — it's a talent-pipeline signal, and the data the Analyst wants measures code health, not team health. The risks aren't symmetric either: a failed rewrite costs a year of velocity; a failed retention strategy costs the company. If the monolith is quietly driving people out, a slow incremental path isn't preserving optionality — it's preserving a death spiral. Test the assumption cheaply before betting either way.

The Integrator· GPT-5.5

"Rewrite" and "refactor" aren't the only paths — and the real question is why it's slow. Two opposite causes: architectural coupling (the code won't let you move) or organizational fear (no tests, every change risks a cascade). A rewrite fixes coupling but does nothing for fear — you'll rebuild the same defensive habits and have a clean monolith that's just as slow in 18 months. The tell: are risky PRs rejected as "too risky" (fear) or "this breaks three other things" (coupling)?

The Analyst· GLM-5.2

Careful with the spike everyone likes: you'll pick the most separable module, so it's cherry-picked, not random. Failure is strong signal (deep coupling); success tells you almost nothing — a demo driving a commitment, not data driving a decision. If you spike, pick a high-churn module at random. And one question sits upstream of all of this: is this acute (losing customers, can't hire) or chronic friction? You ship every week — that flips the entire calculus.

The Humanist· Kimi K2.7

The data settles it: chronic, the team is staying, and they ship boldly on side projects — so this is structural fear, not personality. The ~15% test coverage is the human problem nobody named. A new hire takes three weeks to a first PR because without tests there's no map and every merge feels like defusing a bomb. Each test is a small proof that "touching this is safe" — that's the unlearning loop a rewrite can't buy. Start with the reporting engine.

The Strategist· DeepSeek V4 Pro

One strategic guardrail: at six weeks, someone above you will ask "why no new features?" and measure this by the wrong yardstick. Pre-position it now. The first metric that moves is cycle time on the reporting engine — not overall feature velocity. Get your manager to agree that "4-day cycle time → 1-day on the reporting engine" counts as the win, or the checkpoint becomes the moment the effort gets killed for looking like nothing happened. And don't put the lukewarm engineer on this solo — pair them.