When Multi-Agent Architecture Is the Wrong Answer

The Google AI Overview for "multi-agent architecture" reads like a marketing pitch: purely positive language about scalability, modularity, handling greater complexity, enhanced reliability. Every benefit is real on the right problem. What the AI Overview never mentions are the cases where multi-agent is the wrong answer, and there are several, named explicitly in Anthropic's own engineering retrospective on the company's multi-agent Research feature.
This article is the contrarian counterweight. The AI Overview tells you when multi-agent works; this article tells you when it doesn't.
Anthropic's published exclusions
Anthropic makes Claude. They published the retrospective on their multi-agent Research feature. They have every commercial incentive to celebrate the architecture. And yet, in the same blog post, they specifically wrote:
Some domains that require all agents to share the same context or involve many dependencies between agents are not a good fit for multi-agent systems today. For instance, most coding tasks involve fewer truly parallelizable tasks than research, and LLM agents are not yet great at coordinating and delegating to other agents in real time.
That's a deliberate non-recommendation from the company that publishes Claude Code. The reason matters. Anthropic's own coding tool is single-agent, despite the company having more multi-agent expertise than almost anyone. They didn't ship a multi-agent Claude Code because they concluded the architecture would make the product worse.
The exclusions Anthropic names work as a generalizable rule. Multi-agent is a poor fit when the work requires:
- Shared evolving context that all agents need to operate on
- Many dependencies between agents (each waiting on the others)
- Real-time coordination and delegation
When any of those three conditions is true, multi-agent imports overhead without delivering its parallelism advantage. The system gets slower, more expensive, and harder to debug — the inverse of what the architecture was supposed to provide.
The cost cliff most teams don't model
The other reason multi-agent is the wrong answer for many workloads: the unit economics don't work. Anthropic's published number, that multi-agent systems use roughly 15× the tokens of an equivalent chat interaction and roughly 4× the tokens of a single-agent system, is the constraint to internalize.
For a workflow that runs at high volume with thin margins, that 4-15× multiplier is decisive. A customer support agent that handles 50,000 tickets a month at $0.10 per ticket on a single-agent setup becomes $0.40-$1.50 per ticket on multi-agent. The single-agent version costs $5,000 a month. The multi-agent version costs $20,000-$75,000. The performance gain — usually marginal on routine tickets, because routine tickets don't need the parallelism multi-agent provides — doesn't justify the multiplier.
A workable economic test before committing to multi-agent: multiply your expected single-agent monthly token spend by 15. Ask whether you'd still ship at that price. If the answer is no, multi-agent isn't economically viable for that workload regardless of how much it might marginally improve quality.
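To make that test concrete, here is a minimal sketch of the pre-check in Python. The ticket volume, per-task cost, and spending ceiling are illustrative assumptions carried over from the support example above; the 4× and 15× multipliers are the published Anthropic figures.

```python
# Rough unit-economics pre-check before committing to multi-agent.
# All inputs are illustrative; substitute your own measured numbers.

def multi_agent_viability(monthly_volume: int,
                          single_agent_cost_per_task: float,
                          max_monthly_spend: float,
                          low_multiplier: float = 4.0,
                          high_multiplier: float = 15.0) -> bool:
    single_agent_monthly = monthly_volume * single_agent_cost_per_task
    best_case = single_agent_monthly * low_multiplier
    worst_case = single_agent_monthly * high_multiplier

    print(f"Single-agent:       ${single_agent_monthly:,.0f}/month")
    print(f"Multi-agent (4x):   ${best_case:,.0f}/month")
    print(f"Multi-agent (15x):  ${worst_case:,.0f}/month")

    # The test from the text: would you still ship at 15x the spend?
    return worst_case <= max_monthly_spend

# Support-ticket example: 50,000 tickets at $0.10 each, $25,000 ceiling.
if not multi_agent_viability(50_000, 0.10, max_monthly_spend=25_000):
    print("Fails the 15x test: multi-agent is not viable for this workload.")
```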
Five workflows where multi-agent is almost always wrong
Five concrete categories where the structural mismatch shows up.
Coding tasks. Anthropic's explicit exclusion. Coding requires shared evolving context (the codebase, recent changes, build errors) that doesn't decompose cleanly into parallel subtasks. Single-agent tools (Claude Code, Cursor, GitHub Copilot Workspace) have consistently outperformed multi-agent coding experiments.
Customer-facing chat. Two-second latency budgets are common. Multi-agent adds latency at every handoff. A user waiting six seconds for an answer drops off, even if the answer is better than the single-agent version. The latency cost outweighs the quality gain for most chat applications.
High-volume low-margin processing. Email triage, basic content moderation, simple classification — anything where the per-task value is under a few dollars and the volume is high. The 15× token multiplier turns the project's unit economics negative. Single-agent or simple workflow patterns ship; multi-agent doesn't.
Decisions that need to be deterministic. Multi-agent introduces additional sources of variance — each agent's prompt, the orchestrator's coordination, the merging of subagent outputs. For decisions that must be repeatable for audit purposes, the additional non-determinism is a problem. Stick with single-agent or pure rules-based systems for these.
Workflows with serial dependencies. When step 2 can't start until step 1 is done, parallelism is illusory. The "multi-agent" version is just a sequential pipeline. Build it as a sequential pipeline. Calling it multi-agent imports orchestration overhead for no benefit.
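To make the last point concrete, here's a minimal sketch of what a serially dependent workflow looks like when you build it as the sequential pipeline it already is: one model call per step, each consuming the previous step's output. The call_llm function is a hypothetical placeholder for whatever client you actually use.

```python
# A serially dependent workflow written as the sequential pipeline it is.
# call_llm is a hypothetical stand-in for your model client of choice.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire this up to your model provider")

def ticket_pipeline(ticket: str) -> str:
    # Step 2 can't start before step 1 finishes, so there's nothing to
    # parallelize: each call simply feeds the next. No orchestrator needed.
    summary = call_llm(f"Summarize this support ticket in two sentences:\n{ticket}")
    category = call_llm(f"Classify the summary as billing, bug, or other:\n{summary}")
    return call_llm(f"Draft a reply for a {category} ticket using this summary:\n{summary}")
```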
Why teams build multi-agent anyway
Three reasons in roughly the order I see them.
The architecture diagram looks impressive. Multi-agent diagrams have many boxes connected by many arrows. They photograph well in a board deck. Single-agent looks boring on a slide. Vendor pitches lean into the diagram bias.
Frameworks make it easy to spin up four agents in an afternoon. LangGraph, AutoGen, and CrewAI all let you stand up a multi-agent system fast. Easy to spin up isn't the same as cheap to run, simple to debug, or correct under load. The teams who build multi-agent because the framework made it easy usually rebuild it as single-agent within six months.
The engineer wanted to learn multi-agent. A real and honest motivation, but not a reason for the workload to be multi-agent. Find a project where the architecture genuinely fits and use that one to build the skill.
None of these are good reasons. Each one shows up in failed multi-agent rollouts.
A working test before you build multi-agent
Five questions. If you can't answer yes to at least four, build a single-agent system or a sequential pipeline instead.
1. Is the work decomposable into 3+ independent subtasks that can run in parallel?
2. Does each subtask need different specialization (different prompts, different tools, different context)?
3. Does the per-task value support 4-15× the cost of the single-agent equivalent?
4. Is latency at the level of seconds (not milliseconds) acceptable?
5. Do you have observability and eval discipline to debug a multi-agent system in production?
Four or five yeses, build it carefully. Three or fewer, single-agent ships faster, costs less, and is easier to maintain. The multi-agent version isn't an upgrade; it's a tax you'll pay forever.
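For teams that want the rubric in a form they can keep next to a design doc, here's one way to encode it. The question wording and the four-yes threshold come from the test above; the function and variable names are just illustration.

```python
# The five-question test as a checklist. Answers are plain booleans.

QUESTIONS = [
    "Decomposable into 3+ independent subtasks that can run in parallel?",
    "Each subtask needs different specialization (prompts, tools, context)?",
    "Per-task value supports 4-15x the single-agent cost?",
    "Latency measured in seconds, not milliseconds, is acceptable?",
    "Observability and eval discipline exist to debug it in production?",
]

def architecture_call(answers: list[bool]) -> str:
    assert len(answers) == len(QUESTIONS), "answer all five questions"
    if sum(answers) >= 4:
        return "Multi-agent is defensible: build it carefully and instrument everything."
    return "Build single-agent or a sequential pipeline: multi-agent would be a tax."

# Example: parallelizable, specialized work, but thin margins, tight latency,
# and no eval tooling. Result: stay single-agent.
print(architecture_call([True, True, False, False, False]))
```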
The honest multi-agent recommendation
Build single-agent first. Measure where it actually fails. Add a second agent only to address a specific failure mode the data confirms. Keep the architecture as small as the work allows.
This isn't anti-multi-agent. The 90.2% performance gain Anthropic reported on research tasks is real. The architecture is genuinely powerful when the work matches the shape. The problem is that most workloads don't match the shape, most teams build multi-agent before they've confirmed the shape, and the resulting systems pay 4-15× the cost of single-agent versions for marginal or negative quality gains.
The teams who run AI well in 2026 don't have the most agents. They have the right number — usually one. The most successful AI engineering doesn't end with "and then we added a second agent." It ends with "and then we shipped." Build the simplest version. Make it work. Add complexity only when the data forces the choice. Skip multi-agent for workloads that don't fit the shape Anthropic itself documented. The architecture is a tool, not a goal.
Frequently Asked Questions
Why does Anthropic say multi-agent is bad for coding?
Their published reasoning: most coding tasks involve fewer truly parallelizable subtasks than research, and current LLM agents are not yet great at coordinating and delegating to other agents in real time. A coding session typically requires shared evolving context that multi-agent systems don't handle well.
Are there enterprise tasks where multi-agent is almost always wrong?
Tasks with strict latency budgets (sub-second responses), high-volume low-margin tasks where the 15× token cost exceeds the value per task, regulated decisions where audit trails need to be deterministic, and any workflow where steps depend serially on each other. Each of these is a structural mismatch with multi-agent's strengths.
Is single-agent enough for most production AI work?
For most mid-market business workflows, yes. Anthropic's broader Building Effective Agents guidance recommends starting with single-agent and only adding multi-agent when a measured limit forces the choice. Most teams over-architect.
Sources
- Anthropic Engineering — How we built our multi-agent research system
- Anthropic Research — Building Effective Agents
- LangChain — Choosing the Right Multi-Agent Architecture
- Azure Architecture Center — AI Agent Orchestration Patterns
- NIST — AI Risk Management Framework
- McKinsey QuantumBlack — The state of AI in 2026

Founder, Tech10
Doreid Haddad is the founder of Tech10. He has spent over a decade designing AI systems, marketing automation, and digital transformation strategies for global enterprise companies. His work focuses on building systems that actually work in production, not just in demos. Based in Rome.