Multi-Agent Architectures: Orchestration Patterns That Work in Production
Supervisor, router, chain, and consensus patterns for multi-agent systems. Failure modes, recovery strategies, and production code.
TL;DR
- Four orchestration patterns dominate production multi-agent systems: supervisor (central coordinator), router (classifier-based dispatch), chain (sequential handoff), and consensus (parallel execution with aggregation)
- Each pattern trades off between coordination overhead, failure isolation, latency, and complexity — there is no universal best pattern
- Failure recovery is the difference between a demo and a production system: timeouts, circuit breakers, fallback agents, and state checkpointing are non-negotiable
Multi-agent systems decompose a complex task into subtasks handled by specialized agents. The promise: agents that are experts in their domain produce better results than a single generalist agent handling everything. The reality: the orchestration layer that coordinates these agents is where production systems succeed or fail.
This guide covers the four orchestration patterns that work in production, their failure modes, and the recovery mechanisms that make them reliable.
Pattern 1: Supervisor
The supervisor pattern uses a central coordinator agent that receives the user request, decomposes it into subtasks, delegates each subtask to a specialist agent, and synthesizes the results into a final response.
```python
import asyncio

class SupervisorAgent:
    """Central coordinator that plans, delegates, and synthesizes."""

    def __init__(self, planner_llm, specialists: dict[str, Agent]):
        self.planner = planner_llm
        self.specialists = specialists

    async def run(self, request: str) -> SupervisorResult:
        # Step 1: Plan — decompose into subtasks. The supervisor decides
        # what needs to happen and in what order.
        plan = await self.planner.generate(
            prompt=f'Decompose this request into subtasks: {request}',
            schema=TaskPlan,
        )

        # Step 2: Delegate — send each subtask to the right specialist.
        results = {}
        for task in plan.tasks:
            agent = self.specialists[task.agent_type]
            try:
                # Each delegation has a timeout — no unbounded waits.
                result = await asyncio.wait_for(
                    agent.execute(task),
                    timeout=task.timeout_seconds,
                )
                results[task.id] = result
            except asyncio.TimeoutError:
                results[task.id] = TaskResult(status='timeout', output=None)

        # Step 3: Synthesize — combine results into the final response.
        # Synthesis handles partial failures — missing results are noted, not fatal.
        return await self.synthesize(request, plan, results)
```
When to use: The supervisor pattern works well when tasks have clear dependencies (task B depends on the output of task A), when the decomposition logic is complex enough to benefit from LLM planning, and when you need a single point of coordination for monitoring and debugging.
Failure modes:
- Supervisor bottleneck: Every request passes through the supervisor. If the planning LLM is slow or the synthesis step is expensive, the supervisor becomes the throughput bottleneck.
- Planning errors: The supervisor might decompose the task incorrectly — assigning a subtask to the wrong specialist, missing a required subtask, or creating unnecessary subtasks. Planning errors cascade into wasted computation and incorrect results.
- Single point of failure: If the supervisor fails, the entire pipeline fails. Unlike the router pattern, there is no fallback path that bypasses the coordinator.
Mitigation: Add a planning validation step where the supervisor checks its own plan against a schema before executing. Cache plans for recurring request types to avoid repeated planning overhead. Implement supervisor-level circuit breakers that switch to a simplified pipeline when the supervisor is degraded.
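To make the validation step concrete, here is a minimal sketch of schema-level plan checking, assuming the `TaskPlan` and specialist registry from the `SupervisorAgent` example above. The `depends_on` field and the helper name are illustrative, not part of the original code:

```python
def validate_plan(plan: TaskPlan, specialists: dict[str, Agent]) -> list[str]:
    """Return a list of problems; an empty list means the plan is executable."""
    problems = []
    known_ids = {task.id for task in plan.tasks}
    for task in plan.tasks:
        # Every subtask must map to a registered specialist.
        if task.agent_type not in specialists:
            problems.append(f'{task.id}: unknown specialist {task.agent_type!r}')
        # Dependencies must reference tasks that exist in the plan (illustrative field).
        for dep in getattr(task, 'depends_on', []):
            if dep not in known_ids:
                problems.append(f'{task.id}: missing dependency {dep!r}')
    return problems
```

If validation returns problems, re-prompt the planner with the problem list appended rather than executing a broken plan.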
Pattern 2: Router
The router pattern uses a classifier (LLM-based or trained) to dispatch each request to a single specialist agent based on the request type. There is no central planning or synthesis — each request goes to exactly one agent.
```python
import asyncio

class RouterAgent:
    """Classifier-based dispatch — each request goes to one specialist."""

    def __init__(self, classifier, specialists: dict[str, Agent], fallback: Agent):
        self.classifier = classifier
        self.specialists = specialists
        self.fallback = fallback

    async def run(self, request: str) -> RouterResult:
        # Classify the request. Classification must be fast — this runs on every request.
        classification = await self.classifier.classify(request)

        # High confidence: route to the matching specialist.
        if classification.confidence > 0.8:
            agent = self.specialists.get(classification.category)
            if agent:
                return await self.execute_with_fallback(agent, request)

        # Low confidence or unknown category: the fallback handles
        # anything the router cannot classify.
        return await self.fallback.execute(request)

    async def execute_with_fallback(self, agent: Agent, request: str) -> RouterResult:
        try:
            result = await asyncio.wait_for(agent.execute(request), timeout=15)
            if result.quality_score < 0.5:
                return await self.fallback.execute(request)  # Quality gate
            return result
        except (asyncio.TimeoutError, AgentError):
            return await self.fallback.execute(request)
```
When to use: The router pattern works well when requests fall into distinct categories, each category maps cleanly to a specialist, and you do not need to combine multiple specialists’ outputs. Customer support triage (billing questions, technical support, account management) is the canonical router use case.
Failure modes:
- Misclassification: The router sends a request to the wrong specialist. The specialist produces a confident but incorrect response because it is operating outside its domain. This is harder to detect than a failure — the output looks plausible but is wrong.
- Category gaps: A request that does not fit any category gets routed to the fallback agent. If the fallback is a generalist model, quality drops. If there is no fallback, the request fails.
- Boundary ambiguity: Requests that span multiple categories (a billing question that requires technical context) get routed to one specialist that only has part of the required knowledge.
Mitigation: Monitor misclassification rates by comparing the router’s classification with post-hoc evaluation of the specialist’s response quality. Add quality gates after specialist execution — if the output quality is below threshold, fall back to a generalist or re-route to a different specialist. Track category distribution and boundary cases to identify when new specialist categories are needed.
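As a sketch of that monitoring loop, the class below records post-hoc quality scores per routed category and flags categories whose average quality suggests misrouting. The evaluator that produces the scores and all thresholds here are assumptions:

```python
from collections import defaultdict

class RoutingMonitor:
    """Track post-hoc quality per routed category to surface misclassification."""

    def __init__(self, alert_threshold: float = 0.6):
        self.scores = defaultdict(list)  # category -> post-hoc quality scores
        self.alert_threshold = alert_threshold

    def record(self, category: str, quality_score: float) -> None:
        self.scores[category].append(quality_score)

    def degraded_categories(self, min_samples: int = 50) -> list[str]:
        """Categories whose average quality is low enough to suggest misrouting."""
        return [
            cat for cat, vals in self.scores.items()
            if len(vals) >= min_samples
            and sum(vals) / len(vals) < self.alert_threshold
        ]
```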
Pattern 3: Chain
The chain pattern connects agents in a fixed sequence. Each agent processes the request (or the previous agent’s output) and passes its result to the next agent. Think of it as a pipeline where each stage adds a transformation.
```python
import asyncio

class ChainOrchestrator:
    """Sequential pipeline — each agent transforms and passes to the next."""

    def __init__(self, agents: list[ChainAgent]):
        self.agents = agents

    async def run(self, request: str) -> ChainResult:
        context = ChainContext(original_request=request, current_input=request)
        checkpoints = []

        # Each agent sees the previous agent's output as input.
        for i, agent in enumerate(self.agents):
            try:
                # Checkpoint state before each step
                checkpoints.append(context.snapshot())

                result = await asyncio.wait_for(
                    agent.process(context),
                    timeout=agent.timeout_seconds,
                )
                context.current_input = result.output
                context.chain_history.append(StepResult(agent=agent.name, output=result))

            except asyncio.TimeoutError:
                # On failure: either skip the step or roll back to the last checkpoint.
                if agent.required:
                    return ChainResult(status='failed', step=i, context=context)
                # Optional step: skip and continue
                context.chain_history.append(StepResult(agent=agent.name, skipped=True))

        return ChainResult(status='complete', output=context.current_input, context=context)
```
When to use: The chain pattern works well when the task has a natural sequential structure: extract entities, then classify them, then generate a summary based on the classified entities. Each agent in the chain has a clear, narrow responsibility. The chain pattern is also the easiest to debug because the execution path is fixed and each intermediate output can be inspected.
Failure modes:
- Error propagation: If agent 2 in a 5-agent chain produces a bad output, agents 3-5 operate on corrupted input. The final output may look plausible but is based on an early-stage error that is hard to trace back.
- Latency accumulation: Total latency is the sum of all agents’ latencies. A 5-agent chain where each agent takes 3 seconds has a 15-second minimum latency.
- Rigidity: The fixed sequence cannot adapt to inputs that would benefit from a different processing order or skipping unnecessary steps.
Mitigation: Add quality checkpoints between agents — verify the output of each stage before passing it to the next. Mark agents as required or optional so the chain can skip non-essential steps on failure. Implement state checkpointing so the chain can resume from the last successful step rather than restarting.
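A minimal sketch of those inter-stage checkpoints, assuming each chain agent optionally carries a `validator` callable (an illustrative addition to the `ChainAgent` interface above):

```python
async def run_with_stage_checks(agents: list[ChainAgent], context: ChainContext) -> ChainResult:
    for i, agent in enumerate(agents):
        result = await agent.process(context)
        # Verify the stage output before the next stage consumes it.
        validator = getattr(agent, 'validator', None)
        if validator is not None and not validator(result.output):
            if agent.required:
                # Fail fast instead of letting a bad output propagate downstream.
                return ChainResult(status='failed', step=i, context=context)
            continue  # optional stage: drop its output, keep the prior input
        context.current_input = result.output
    return ChainResult(status='complete', output=context.current_input, context=context)
```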
Pattern 4: Consensus
The consensus pattern runs multiple agents in parallel on the same task and aggregates their outputs. This is useful when reliability matters more than speed — multiple independent opinions reduce the probability of a single agent’s error reaching the user.
```python
import asyncio

class ConsensusOrchestrator:
    """Run multiple agents in parallel, aggregate for reliability."""

    def __init__(self, agents: list[Agent], aggregator, min_agreement: float = 0.6):
        self.agents = agents
        self.aggregator = aggregator
        self.min_agreement = min_agreement

    async def run(self, request: str) -> ConsensusResult:
        # Run all agents in parallel — total latency is max(agent latencies), not sum.
        tasks = [agent.execute(request) for agent in self.agents]
        results = await asyncio.gather(*tasks, return_exceptions=True)

        # Filter out failures
        valid_results = [r for r in results if not isinstance(r, Exception)]

        if len(valid_results) < 2:
            return ConsensusResult(status='insufficient_responses', confidence=0)

        # Aggregate — measure agreement between agents to assess confidence.
        consensus = self.aggregator.aggregate(valid_results)

        if consensus.agreement_score < self.min_agreement:
            return ConsensusResult(
                status='low_agreement',
                confidence=consensus.agreement_score,
                outputs=valid_results,  # Return all for human review
            )

        return ConsensusResult(
            status='consensus_reached',
            output=consensus.merged_output,
            confidence=consensus.agreement_score,
        )
```
When to use: High-stakes decisions where a single agent’s error is costly — medical triage, financial analysis, legal review. Also useful when you want to detect uncertainty: low agreement between agents signals that the task is ambiguous or that the agents lack sufficient information.
Failure modes:
- Correlated errors: If all agents use the same underlying model, they tend to make the same mistakes. Running GPT-4 three times gives you three copies of the same bias, not three independent opinions. Use different model families or different prompting strategies to get genuine diversity.
- Aggregation difficulty: For open-ended generation (as opposed to classification), merging multiple responses is non-trivial. The aggregator itself may introduce errors or lose nuance from individual responses.
- Cost multiplication: Running N agents in parallel multiplies cost by N. For tasks where a single agent is usually correct, the consensus pattern wastes compute.
Mitigation: Use diverse agent configurations — different models, different prompting strategies, different context. For classification tasks, use majority voting. For generation tasks, use an LLM aggregator that identifies the consensus elements across responses. Set a cost budget and only apply consensus to high-stakes requests identified by a lightweight classifier.
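For the classification case, majority voting is straightforward. A minimal aggregator sketch that plugs into the `ConsensusOrchestrator` above (the `Consensus` dataclass is illustrative):

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Consensus:
    merged_output: str
    agreement_score: float

class MajorityVoteAggregator:
    """Majority vote for classification; agreement is the winning label's vote share."""

    def aggregate(self, results: list) -> Consensus:
        labels = [r.output for r in results]
        label, votes = Counter(labels).most_common(1)[0]
        return Consensus(merged_output=label, agreement_score=votes / len(labels))
```

With three agents voting ['billing', 'billing', 'technical'], this returns 'billing' with an agreement score of about 0.67, which clears the orchestrator's default 0.6 threshold.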
Choosing a Pattern
Pattern Selection Criteria
- Supervisor: tasks have dependencies, need central coordination
- Router: requests fall into distinct non-overlapping categories
- Chain: task has a natural sequential processing order
- Consensus: reliability matters more than latency or cost
What Each Pattern Optimizes For
- Supervisor: flexibility and complex task decomposition
- Router: latency and throughput (single agent per request)
- Chain: debuggability and narrow agent responsibilities
- Consensus: reliability and uncertainty detection
In practice, production systems combine patterns. A router dispatches requests to different chains. A supervisor uses consensus for high-stakes subtasks. A chain includes a router step that selects the next agent based on intermediate results.
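As a composition sketch, assuming the orchestrators above are wrapped to expose the same `execute()` interface as a plain agent, a router can dispatch to chains directly. Every agent instance named here is a placeholder:

```python
# Illustrative wiring: a router whose "specialists" are chains.
support_chain = ChainOrchestrator([extract_agent, diagnose_agent, respond_agent])
billing_chain = ChainOrchestrator([extract_agent, billing_agent, respond_agent])

router = RouterAgent(
    classifier=classifier,
    specialists={'technical': support_chain, 'billing': billing_chain},
    fallback=generalist_agent,
)
```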
Failure Recovery: The Production Requirement
Every orchestration pattern needs four failure recovery mechanisms to be production-ready.
1. Timeouts
Every agent call must have a timeout. Without timeouts, a single slow agent can block the entire pipeline indefinitely. Set timeouts based on the agent’s observed latency distribution — typically p99 latency plus a buffer.
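A minimal sketch of deriving that timeout from observed latencies, assuming you already collect per-agent latency samples; the buffer factor is a tunable assumption:

```python
import statistics

def timeout_from_latencies(latencies_s: list[float], buffer_factor: float = 1.25) -> float:
    """Timeout = observed p99 latency times a safety buffer, in seconds."""
    p99 = statistics.quantiles(latencies_s, n=100)[98]  # 99th percentile cut point
    return p99 * buffer_factor
```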
2. Circuit Breakers
If an agent fails repeatedly (3-5 consecutive failures), stop sending it requests for a cooldown period. This prevents cascading failures where a degraded agent consumes resources while producing bad outputs. After the cooldown, send a single probe request to check if the agent has recovered.
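A minimal circuit-breaker sketch following those thresholds; the failure count and cooldown are tunable assumptions:

```python
import time

class CircuitBreaker:
    """Open after N consecutive failures; allow one probe after the cooldown."""

    def __init__(self, failure_threshold: int = 5, cooldown_s: float = 60.0):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.consecutive_failures = 0
        self.opened_at: float | None = None
        self.probe_in_flight = False

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True  # closed: traffic flows normally
        cooled_down = time.monotonic() - self.opened_at >= self.cooldown_s
        if cooled_down and not self.probe_in_flight:
            self.probe_in_flight = True  # half-open: let exactly one probe through
            return True
        return False  # open: shed load to the fallback path

    def record_success(self) -> None:
        self.consecutive_failures = 0
        self.opened_at = None
        self.probe_in_flight = False

    def record_failure(self) -> None:
        self.consecutive_failures += 1
        self.probe_in_flight = False
        if self.consecutive_failures >= self.failure_threshold:
            self.opened_at = time.monotonic()  # re-open and restart the cooldown
```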
3. Fallback Agents
Every specialist agent needs a fallback path. This might be a generalist agent that handles the task at lower quality, a cached response from a similar previous request, or a graceful degradation message that tells the user what the system cannot do right now.
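A sketch of that ordered fallback path: specialist, then generalist, then a cached similar response, then graceful degradation. `AgentError`, the cache interface, and `DegradedResponse` are illustrative names:

```python
import asyncio

async def execute_with_fallbacks(specialist, generalist, cache, request: str):
    """Try specialist, then generalist, then cache, then degrade gracefully."""
    for agent in (specialist, generalist):
        try:
            return await asyncio.wait_for(agent.execute(request), timeout=15)
        except (asyncio.TimeoutError, AgentError):
            continue  # fall through to the next option
    cached = cache.lookup_similar(request)  # illustrative cache interface
    if cached is not None:
        return cached
    # Last resort: tell the user what the system cannot do right now.
    return DegradedResponse('This request cannot be completed right now.')
```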
4. State Checkpointing
For long-running multi-agent pipelines, checkpoint intermediate state so the pipeline can resume from the last successful step rather than restarting from scratch. This matters for chains and supervisor patterns where early steps may complete successfully before a later step fails.
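A minimal resume-from-checkpoint sketch for a chain, assuming a persistent store keyed by run id; the store interface is illustrative:

```python
async def run_resumable(agents: list[ChainAgent], run_id: str, store, context: ChainContext):
    """Resume from the last successful step instead of restarting from scratch."""
    checkpoint = store.load_latest(run_id)  # illustrative store interface
    last_done = -1
    if checkpoint is not None:
        context, last_done = checkpoint.context, checkpoint.step
    for i, agent in enumerate(agents):
        if i <= last_done:
            continue  # this step already succeeded in a previous attempt
        result = await agent.process(context)
        context.current_input = result.output
        # Persist state after each successful step so a later failure can resume here.
        store.save_checkpoint(run_id, step=i, context=context)
    return context.current_input
```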
Observability
Multi-agent systems require observability beyond what single-agent systems need. Track the following; a minimal instrumentation sketch appears after the list:
- Per-agent latency: Identify bottleneck agents and set informed timeouts.
- Per-agent error rate: Detect degraded agents before they affect users.
- Orchestration overhead: Time spent in routing, planning, and synthesis versus time in agent execution.
- Token consumption per agent: Identify agents that consume disproportionate tokens relative to their contribution.
- End-to-end traces: Full request traces that show the path through the agent system, including retries and fallbacks.
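A minimal per-agent instrumentation sketch that wraps every agent call to record latency, errors, and token usage. The metrics and tracing APIs here are assumptions standing in for whatever backend you use:

```python
import time

async def traced_execute(agent, request, metrics, trace):
    """Wrap an agent call with a trace span and per-agent metrics."""
    span = trace.start_span(f'agent.{agent.name}')  # illustrative tracing API
    start = time.monotonic()
    try:
        result = await agent.execute(request)
        metrics.observe(f'agent.{agent.name}.latency_s', time.monotonic() - start)
        metrics.incr(f'agent.{agent.name}.tokens', result.tokens_used)
        return result
    except Exception:
        metrics.incr(f'agent.{agent.name}.errors', 1)
        raise  # let the orchestrator's fallback logic handle it
    finally:
        span.end()
```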
Where Clarity Fits
Clarity’s self-model API adds user context to multi-agent orchestration. A supervisor that understands the user can make better delegation decisions — routing technical users to detailed analysis agents and non-technical users to summary agents. A router that knows the user’s history can disambiguate requests that would otherwise be ambiguous. The self-model is the shared context that makes agent coordination user-aware.
Key Takeaways
- Four patterns cover most production multi-agent architectures: supervisor, router, chain, and consensus — each trades off between coordination, latency, reliability, and cost
- The orchestration layer is where production systems succeed or fail — individual agent quality matters less than how agents are coordinated
- Failure recovery (timeouts, circuit breakers, fallbacks, checkpointing) is the difference between a demo and a production system
- Observe per-agent metrics, not just end-to-end metrics — bottleneck agents and cascading failures are invisible in aggregate dashboards
- Combine patterns: production systems use routers that dispatch to chains, supervisors that use consensus for high-stakes subtasks, and chains with routing steps