Should I rewrite my legacy system or refactor it?

It depends on whether the core architecture is fundamentally broken. If the architecture is sound but the code is messy, refactor. If the architecture itself prevents you from achieving your goals, consider a strangler fig migration — an incremental rewrite that replaces the system piece by piece while it stays live.

How much does a rewrite cost compared to a refactor?

Rewrites typically cost 2–5× more than estimated because of undocumented business rules, scope creep, and the burden of maintaining the old system while the new one is built. Refactoring costs are more predictable because the work is incremental and delivers value continuously.

What is the strangler fig pattern?

The strangler fig pattern is a migration strategy where you build new functionality around the old system, gradually shift traffic to the new implementation, and retire legacy code only once its replacement is proven in production. It combines the benefits of a rewrite with the safety of a refactor.

When to Rewrite vs. Refactor — A Decision Framework for CTOs

Every CTO, VP Engineering, or technical founder eventually faces the same question: should we rewrite this system from scratch, or refactor what we have?

It is one of the most consequential decisions in software — and one of the most frequently made on instinct rather than analysis. The developers want a rewrite because the code is painful to work in. The business wants a refactor because it sounds faster and cheaper. Both are often wrong.

Here is a structured way to make the decision.

The rewrite temptation

Rewrites are seductive because they promise a clean start. No legacy constraints. Modern stack. Full test coverage. Everything done "the right way" from day one.

The reality is less appealing. Most rewrites:

Take 2–3× longer than estimated. The old system encodes years of business rules, edge cases, and regulatory requirements that were never documented. You only discover them when you miss them in the new system.
Cost 2–5× more than estimated. Longer timelines mean more developer-months, more project management overhead, and more opportunity cost from features that aren't being built.
Frequently get cancelled. The business runs on the old system while you build the new one. If the rewrite takes too long, leadership loses confidence, funding gets cut, and you're left maintaining two half-working systems.

Joel Spolsky called rewriting from scratch the "single worst strategic mistake that any software company can make" in his 2000 essay "Things You Should Never Do, Part I." The failure modes above suggest the warning still applies.

The refactor reality

Refactoring — incrementally improving the existing codebase without changing its external behavior — is the conservative choice. It's also often the right one.

But refactoring has its own failure modes:

Refactoring without clear goals becomes busywork. "Let's clean up the code" is not a strategy. Refactoring should target specific, measurable outcomes: reduce deployment time from 2 hours to 10 minutes, eliminate the class of bugs caused by shared mutable state, make the order processing module testable.
Some systems resist incremental improvement. If the fundamental architecture is wrong — a monolith with no module boundaries, a database schema that conflates unrelated concerns, a language runtime that's end-of-life — refactoring within that architecture may not solve the underlying problem.
Refactoring takes discipline. It requires writing tests for code you didn't write, understanding business logic nobody documented, and making changes that are invisible to stakeholders. It's not glamorous, and it's hard to maintain momentum.

The decision framework

Instead of asking "rewrite or refactor?" as a binary choice, evaluate your system against five dimensions:

1. Is the core architecture fundamentally wrong?

If the system's structure makes it impossible to achieve your goals — regardless of code quality — a refactor won't help. Examples:

The application is a single-threaded monolith and you need to handle 100× more concurrent users.
The database schema mixes accounting, user management, and product inventory in a single table.
The runtime is end-of-life and an in-place upgrade is impractical (e.g., Python 2, or PHP 5.x code that relies throughout on the mysql_* functions, which were removed in PHP 7).

If yes: A rewrite is likely necessary, but consider a strangler fig approach rather than a big-bang rewrite.

If no: Refactor. The architecture is sound; the code just needs improvement.

2. How much institutional knowledge exists?

Does your team understand the system? Can they explain why decisions were made? Is there documentation?

High knowledge, low documentation: Refactor. The team can guide the improvement.
Low knowledge, low documentation: Dangerous territory either way. Start with a system audit before making any decision.
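No knowledge at all (e.g., acquired system, original team gone): A rewrite may be more practical than trying to improve a system nobody can explain, but audit its observable behavior first. The replacement still has to reproduce business rules that nobody documented.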

3. What's the cost of feature freeze?

During a rewrite, the old system continues to run but typically receives no new features. Can the business tolerate that?

If the system needs new features immediately: Refactor. You can improve the system while continuing to deliver business value.
If the business can wait 6–12 months: A rewrite becomes more viable — but estimate carefully.

4. What's the risk tolerance?

A rewrite has a bimodal outcome: it either succeeds spectacularly or fails expensively. A refactor has a more predictable outcome: gradual, measurable improvement.

Low risk tolerance (regulated industry, production-critical system): Refactor. You cannot afford a failed cutover.
High risk tolerance (early-stage startup, internal tool): A rewrite is more acceptable if the timeline is short enough.

5. What does the data say?

Measure before deciding. You need:

Metric	What it tells you
Deployment frequency	How much friction exists in the delivery pipeline
Mean time to recovery (MTTR)	How quickly you can fix production issues
Change failure rate	What percentage of deployments cause incidents
Test coverage on critical paths	How much confidence you have in changes
Developer onboarding time	How long before a new developer is productive
Cyclomatic complexity (P95)	How resistant the codebase is to safe modification

If deployment frequency is high, MTTR is low, and the system is mostly working — refactor. The system's problems are at the code level, not the architecture level.

If deployment frequency is near-zero, MTTR is measured in days, and every change breaks something — the architecture is the problem, and refactoring within it won't help.
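As a rough illustration, three of the metrics above can be derived from a simple deployment log. The sketch below is a minimal example under stated assumptions: the `Deployment` record, its field names, and the `delivery_metrics` function are hypothetical, the sample history is invented for illustration, and recovery time is approximated as the gap between a deployment and the resolution of the incident it caused.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical record of one production deployment."""
    deployed_at: datetime
    caused_incident: bool = False
    # Set only when the deployment caused an incident that has been resolved.
    recovered_at: Optional[datetime] = None


def delivery_metrics(deployments: list[Deployment], period_days: int) -> dict[str, float]:
    """Summarize deployment frequency, change failure rate, and MTTR."""
    if not deployments or period_days <= 0:
        raise ValueError("need at least one deployment and a positive period")
    failures = [d for d in deployments if d.caused_incident]
    # Assumption: recovery time is measured from deployment to resolution.
    recovery_hours = [
        (d.recovered_at - d.deployed_at).total_seconds() / 3600
        for d in failures
        if d.recovered_at is not None
    ]
    return {
        "deploys_per_week": len(deployments) / (period_days / 7),
        "change_failure_rate": len(failures) / len(deployments),
        "mttr_hours": sum(recovery_hours) / len(recovery_hours) if recovery_hours else 0.0,
    }


# Invented sample data: four weekly deployments, one of which caused an incident.
start = datetime(2024, 1, 1)
history = [
    Deployment(start),
    Deployment(
        start + timedelta(days=7),
        caused_incident=True,
        recovered_at=start + timedelta(days=7, hours=6),
    ),
    Deployment(start + timedelta(days=14)),
    Deployment(start + timedelta(days=21)),
]
metrics = delivery_metrics(history, period_days=28)
print(metrics)
```

In practice these numbers would come from your CI/CD and incident tooling rather than a hand-built list. The point is to put a baseline on paper before the rewrite-or-refactor debate starts.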

The hybrid option: strangler fig

The best answer is often neither a pure rewrite nor a pure refactor, but a controlled extraction. The strangler fig pattern lets you:

Identify the worst part of the system.
Build a replacement for that part on a modern stack.
Route traffic to the new implementation gradually.
Delete the old code once the replacement is proven.
Repeat for the next worst part.

This combines the benefits of a rewrite (clean new code) with the safety of a refactor (no big-bang cutover). It may take longer than a big-bang rewrite is estimated to take, but it delivers value incrementally and carries far lower risk.
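The "route traffic gradually" step is the mechanical heart of the pattern. One possible way to sketch it is a thin router that sits in front of both implementations and sends a configurable share of requests to the new one. Everything here is an assumption for illustration: the `StranglerRouter` class, the handler signatures, and the choice to bucket requests by a hash of a customer key so that a given customer always sees the same implementation.

```python
import hashlib
from typing import Callable

# Hypothetical handler shape: takes a request key, returns a response string.
Handler = Callable[[str], str]


class StranglerRouter:
    """Sends a configurable share of requests to the new implementation."""

    def __init__(self, legacy: Handler, replacement: Handler, rollout_percent: int = 0):
        self.legacy = legacy
        self.replacement = replacement
        self.set_rollout(rollout_percent)

    def set_rollout(self, percent: int) -> None:
        if not 0 <= percent <= 100:
            raise ValueError("rollout percent must be between 0 and 100")
        self.rollout_percent = percent

    def _bucket(self, key: str) -> int:
        # Stable hash, so the same key always lands in the same bucket (0-99).
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % 100

    def handle(self, key: str) -> str:
        # Keys whose bucket falls below the rollout threshold use the new code.
        if self._bucket(key) < self.rollout_percent:
            return self.replacement(key)
        return self.legacy(key)


router = StranglerRouter(
    legacy=lambda key: f"legacy:{key}",
    replacement=lambda key: f"new:{key}",
)
before = router.handle("customer-42")  # rollout is 0, so the legacy path answers
router.set_rollout(100)
after = router.handle("customer-42")   # rollout is 100, so the new path answers
```

Because the bucket for a key never changes, raising the rollout percentage only ever moves customers from the old path to the new one, and lowering it back to zero is an immediate rollback. In a real system this logic usually lives in a reverse proxy, API gateway, or feature-flag service rather than in application code.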

Making the call

Situation	Recommendation
Architecture is sound, code is messy	Refactor
Architecture is broken, team understands the system	Strangler fig
Architecture is broken, no institutional knowledge	Audit first, then decide
Runtime is EOL, no migration path	Strangler fig rewrite
Business needs features now	Refactor (you can ship while improving)
Internal tool, small user base	Rewrite may be acceptable
Revenue-critical production system	Strangler fig (never big-bang)
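For teams that want to make the reasoning explicit, the table above can be sketched as a small rule function. This is only an illustration: the `SystemProfile` fields and the `recommend` function are hypothetical, and the order in which the rules are checked is an assumption, since the table does not say which row wins when several apply. The "business needs features now" row is left out because it reinforces the refactor case rather than adding a separate outcome.

```python
from dataclasses import dataclass


@dataclass
class SystemProfile:
    """Hypothetical summary of the answers to the framework's questions."""
    architecture_sound: bool
    team_understands_system: bool
    runtime_end_of_life: bool = False
    revenue_critical: bool = False
    internal_tool: bool = False


def recommend(profile: SystemProfile) -> str:
    """Map a profile to a recommendation from the table.

    The precedence of the checks below is an assumption, not part of the table.
    """
    if profile.architecture_sound and not profile.runtime_end_of_life:
        return "refactor"
    if profile.runtime_end_of_life:
        return "strangler fig rewrite"
    # From here on, the architecture itself is the problem.
    if not profile.team_understands_system:
        return "audit first, then decide"
    if profile.internal_tool and not profile.revenue_critical:
        return "rewrite may be acceptable"
    return "strangler fig"


messy_but_sound = SystemProfile(architecture_sound=True, team_understands_system=True)
broken_and_opaque = SystemProfile(architecture_sound=False, team_understands_system=False)
print(recommend(messy_but_sound))
print(recommend(broken_and_opaque))
```

Writing the rules down this way tends to surface disagreements early: if two people fill in the profile differently for the same system, that gap is worth resolving before anyone argues about the recommendation.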

If you're facing this decision and want a second opinion grounded in data rather than instinct — a system audit maps the architecture, quantifies the debt, and gives you the evidence to make the call with confidence. It's a two-to-three week engagement, not a multi-month commitment.