The Software Factory Is Coming
What shape will autonomous software development take, how soon could it arrive, and what does it mean for a $300+ billion market?
What if building enterprise software looked less like hiring a development team and more like placing an order at a factory? I spent weeks stress-testing this idea against the evidence. Here's what holds up, what doesn't, and six futures for who captures the value.
The Vision
Imagine a platform where anyone — a founder, a department head, a solo entrepreneur — describes the software they need. In the background, a coordinated fleet of AI agents turns that description into a working product. The platform provider continuously optimizes this pipeline: reducing time-to-value, lowering token costs, improving output quality. Think of what Amazon did with logistics infrastructure, but for software creation.
The user wouldn’t need to understand code. They’d need to articulate what they want — and the platform would help them do that well. A sophistication slider would let them choose between a quick prototype and an enterprise-grade system, with pricing scaled accordingly.
If this works at scale, the implications are enormous. The global enterprise software market stands at roughly $317 billion in 2025 (over $660 billion using Gartner’s broader definition that includes infrastructure and security software). SaaS alone is growing at 19% annually. Even capturing a fraction of the value currently locked up in custom development, integration, and maintenance represents a massive opportunity.
To understand how realistic this is, it helps to break the software lifecycle into a value chain and ask: where can AI actually deliver today?
| Stage | % of lifecycle cost (approx.) | AI capability — Feb 2026 | Gap |
|---|---|---|---|
| Requirements & design | 5–10% | Good for structured prompts. Poor at capturing unstated business context, domain nuance, regulatory constraints | Large |
| Code generation | 10–15% | Strong and accelerating fast. ~80% on SWE-bench Verified. Multi-agent orchestration proven at scale | Small |
| Testing & QA | 15–25% | AI generates unit tests well. Integration, security, and performance testing remain gaps. Only ~7% adoption for AI-assisted testing | Medium |
| Deployment & ops | 5–10% | CI/CD automation mature. AI-assisted monitoring emerging | Medium |
| Maintenance & evolution | 40–60% | Early-stage but moving. AI bug triage, security patch automation, and dependency updates emerging. AI starting to scan telemetry and suggest fixes | Large |
The central irony: the stage where AI excels — code generation — accounts for only 10–15% of total lifecycle cost. The stage that dominates the economics — maintenance and evolution, which the table puts at 40–60% and which some research puts even higher — is where AI has the most room to grow. That gap defines who captures value and how fast the factory arrives.
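The arithmetic behind that irony is worth making explicit. It is Amdahl's law applied to lifecycle cost: the savings from speeding up one stage are capped by that stage's share of the total. A minimal sketch, using midpoints of the table's ranges as stage shares; the per-stage reduction figures (90%, 50%) are illustrative assumptions, not measured numbers.

```python
# Back-of-envelope: how much total lifecycle cost can AI remove?
# Stage shares are midpoints of the ranges in the table above (they sum to 1.0).
stage_share = {
    "requirements": 0.075,
    "codegen":      0.125,
    "testing":      0.200,
    "deployment":   0.075,
    "maintenance":  0.525,
}

def total_savings(reductions):
    """Overall fractional cost reduction, given per-stage fractional reductions."""
    return sum(stage_share[s] * reductions.get(s, 0.0) for s in stage_share)

# Scenario A: AI cuts code-generation cost by 90%; nothing else changes.
a = total_savings({"codegen": 0.9})

# Scenario B: maintenance cost also drops by 50%.
b = total_savings({"codegen": 0.9, "maintenance": 0.5})

print(f"codegen only: {a:.1%}, plus maintenance: {b:.1%}")
# → codegen only: 11.2%, plus maintenance: 37.5%
```

Even fully free code generation caps overall savings at roughly 13% under these shares; a modest dent in maintenance is worth three times as much. That is the article's central point restated numerically.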
The Evidence — February 2026
Here’s what makes this analysis different from the hundreds of “AI will eat software” takes: I’m trying to be specific about what’s real, what’s hype, and what changed in the last few months.
What’s working in production
Boris Cherny, creator and head of Claude Code at Anthropic, hasn’t hand-written a line of code since November; 100% of his output is authored by AI, and he ships 10 to 30 pull requests per day. Anthropic reports a 200% increase in engineer productivity since adopting Claude Code internally. Cowork, a desktop tool for non-developers, was built entirely by AI agents in 10 days.
Spotify’s co-CEO Gustav Söderström revealed on the company’s February 2026 earnings call that their most experienced developers “have not written a single line of code since December.” Using an internal system called Honk, built on Claude Code, engineers fix bugs and ship features from Slack on their phones — and merge to production before arriving at the office. Spotify shipped over 50 new features in 2025.
Rakuten tested Claude Code on implementing an activation vector extraction method in vLLM — a 12.5-million-line codebase. Claude Code finished the job in seven hours of autonomous work, achieving 99.9% numerical accuracy.
Claude Code now accounts for 4% of all public GitHub commits, with daily active users doubling in the past month. SemiAnalysis predicts 20%+ of all daily commits by end of 2026.
Frontier experiments
Anthropic’s C compiler (February 2026): 16 AI agents worked in parallel for two weeks, producing a 100,000-line Rust compiler that builds Linux 6.9 on three architectures. It passes 99% of the GCC torture test suite. Cost: $20,000. Previous model generations couldn’t produce anything functional — the jump happened within one model generation.
Cursor’s autonomous browser (January 2026): Hundreds of AI agents coordinated to build a web browser from scratch — 3 million lines of code in one week. The CEO’s verdict? “It kind of works.” Independent reviewers found CI/CD pipelines failing throughout most of the experiment.
Vibe coding platforms: Lovable reportedly reached $100M ARR in 8 months. Replit jumped from $10M to $100M in 9 months after launching their Agent. Demand is massive. But users consistently hit a wall: at around 15–20 components, context retention degrades and the AI starts creating more bugs than it fixes.
"The thought of programmers deploying software they've never personally verified is a real concern. So, while this experiment excites me, it also leaves me feeling uneasy."
The skeptics aren’t wrong — but the data is aging fast
The strongest counter-evidence comes from METR’s randomized controlled trial (July 2025): sixteen experienced open-source developers completed 246 real tasks and were 19% slower with AI tools. Bain & Company described enterprise savings as “unremarkable.” The Stack Overflow 2025 Developer Survey showed favorable views of AI tools dropping, with 46% questioning output accuracy. But a critical caveat: the METR study measured Cursor Pro with Claude 3.5/3.7 Sonnet — models that are now two generations old. Since that study, Claude Opus 4.5 and 4.6 arrived, Claude Code went from experimental to 4% of GitHub commits, and Anthropic’s engineers report 200% productivity gains. The evidence base has a shorter shelf life than the publication cycle.
What's proven: AI produces production-quality code when directed by skilled engineers. Multi-agent orchestration at scale is feasible. Demand is massive. At the frontier — Anthropic, Spotify, Rakuten — productivity gains of 200%+ are reported.
What's not proven: That these frontier results generalize to average teams, legacy codebases, and complex enterprise environments. Whether the factory can work without the kind of skilled technical judgment that Cherny and Spotify's senior engineers bring.
The key question: The distance between "experts directing AI" and "anyone ordering software from a platform" is where the remaining difficulty lives. How fast does that gap close?
The Timeline
In his recent conversation with Dwarkesh Patel (February 2026), Anthropic CEO Dario Amodei put specific confidence levels on his predictions. He describes a “country of geniuses in a data center” — AI systems matching or exceeding Nobel Prize–level intellectual capabilities. His confidence: 90% this arrives within 10 years. His personal hunch — framed as a 50/50 bet — is one to two years.
"We don't have the country of geniuses in the data center now. That is very clear. But I have a hunch — this is more like a 50/50 thing — that it's going to be more like one to two, maybe one to three years."
On coding specifically, he calls end-to-end automation essentially guaranteed on a 10-year horizon. He notes AI already writes 90% of lines at some organizations — but stresses this is “a very weak criterion” and worlds apart from not needing software engineers.
What makes his framing most relevant to the factory vision is the distinction between capability and economic diffusion. Even after the capability exists, trillions in revenue take years to materialize — due to procurement, compliance, change management, and enterprise inertia. His metaphor: we had the COVID vaccine fast, but it took a year and a half to distribute. Amodei has every incentive to be optimistic. But his 10-year confidence seems well-grounded. The 50/50 hunch on one-to-two years is exactly that — a hunch.
Cherny’s framing is complementary. He considers coding “largely solved” and sees AI already moving beyond code generation into maintenance-adjacent work — scanning feedback, bug reports, and telemetry to suggest features and fixes autonomously. His advice to product teams: build for the model six months from now, not today.
Mapping these predictions against the value chain:
| Timeframe | Capability | Factory implication |
|---|---|---|
| Now–2027 | End-to-end coding for well-specified tasks. Multi-agent orchestration proven. | Factory works for prototypes, simple apps, and internal tools. Skilled engineers direct AI for production work. |
| 2027–2029 | "Country of geniuses" capability arrives. Testing and deployment gaps close. | Factory viable for medium-complexity applications. Early adopters run real operations on AI-generated software. |
| 2029–2032 | Economic diffusion. Enterprises adopt at scale. Maintenance and governance mature. | Factory extends to managed IT landscapes. SaaS vendors face existential pressure. |
| 2032–2035 | Deep domain knowledge captured. Complex regulatory logic handled reliably. | SAP-class systems rebuildable from specifications. The concept of buying "software" as a product begins to dissolve. |
Six Possible Futures
This is the part I find most useful for strategic thinking. Rather than predicting a single outcome, I mapped six paths the factory could take. These are not mutually exclusive — the real future will combine elements of several. The question is which dynamic dominates.
Path 1: The dominant factory platform
One or a few dominant platforms emerge as the "factory" — AWS-scale but for application generation. Winner-take-most dynamics. Most SaaS vendors either die or become domain knowledge providers feeding into these platforms. Network effects from accumulated optimization data could drive consolidation, but open source and vertical complexity push back hard.
Path 2: The factory inside the frontier models
The factory never becomes a separate platform because frontier model providers build it directly into their offerings. "Build me an app" becomes as natural as "write me an email." This is where Claude artifacts, Cowork, and ChatGPT's code interpreter are already heading. The most capital-efficient outcome — billions of R&D already flowing this direction. The main gap: frontier labs lack enterprise sales motions and domain expertise.
Path 3: Vertical factories
No single factory wins. Instead, dozens of vertical-specific factories emerge — healthcare, fintech, logistics — each embedding domain knowledge that horizontal platforms can't match. History favors this: Shopify dominated commerce without becoming the everything-platform. But if models become general enough, domain knowledge can be supplied through context rather than specialization.
Path 4: Open-source commoditization
Frontier-quality models become freely available. Orchestration frameworks commoditize. The factory becomes a recipe, not a product. The evidence: DeepSeek, Qwen3-Coder, and Meta's Llama have compressed the gap between open and closed models to six to twelve months on key benchmarks. MCP and A2A protocols now sit with the Linux Foundation. But raw model capability is only part of the factory — and the convenience premium is real. Even with open-source web frameworks available for twenty years, most businesses use AWS and Shopify.
Path 5: Enterprise inertia
Enterprises resist far longer than expected. Procurement, security, compliance, internal politics, and existing vendor contracts create massive inertia. AI-generated software stays limited to internal tools and prototypes. The boring outcome — but don't underestimate it. Only 6% of companies fully trust AI to run core business processes.
Path 6: The disappearance of the application
The concept of "an application" disappears. People interact with agents that dynamically assemble capabilities on the fly. No app to build, deploy, or maintain. No factory needed because there's no product to manufacture. The most radical outcome — least likely on a 10-year horizon, but possibly inevitable on a longer one.
What This Means
Follow the logic one step further and the factory doesn’t just build software — it manages entire IT landscapes. “We give you the IT you need and maintain it. You think about your business.” This is what cloud providers promised but never fully delivered, because the abstraction layer was too low. If the factory can generate, deploy, integrate, and maintain — that’s the real unlock. Not Software-as-a-Service. Business-as-a-Service.
Three dynamics cut across all six paths. The value of understanding what a business needs — and translating between human intent and technical capability — goes up, not down. Open source acts as a gravitational force across every path, compressing margins and distributing capability even where it doesn’t “win” outright. And the generation layer is racing ahead while the reliability layer — trust, verification, governance — is only now beginning to follow. Whoever closes that gap first captures the most value.
If you’re a technology leader: the factory changes what you build in-house versus what you buy. The strategic question isn’t “should we use AI coding tools” — it’s which of these six futures you’re positioning for, and what capabilities become core versus commodity in each.
If you’re a developer: the floor is rising fast. Cherny and Spotify’s engineers suggest the latest models have crossed a production threshold. Invest in the skills at the top of the value chain — requirements, architecture, the judgment calls that can’t be verified against a test suite.
If you’re a founder: Lovable and Replit prove people will pay for AI-generated software. But the durable moat isn’t code generation — that’s commoditizing in real time. The defensible position is the reliability layer: whoever makes AI-built software trustworthy enough to run a business on.
The factory is coming. Faster than skeptics expect. Slower than optimists promise. The open question isn’t whether — it’s who builds it, what shape it takes, and who captures the value.