The hidden bottleneck isn't your model. It's your architecture.
You can have the best LLM, the perfect prompts, and unlimited context, but if your architecture is wrong you can still see over 100× higher latency, accuracy drops of up to 30%, and coordination failures. The good news? Architecture is something you control. These results show how to design systems that scale.
Most teams optimize the wrong things. Here's what actually moves the needle.
Note: All numerical values and tables on this page should be verified against the paper PDF for the most accurate and up-to-date results.
Same LLM. Same task. Only architecture changes. The results are stark: architectural choices alone create order-of-magnitude performance differences. This isn't theory—it's measured across multiple frameworks with controlled experiments.
| Architecture design | Latency (vs. direct LLM calls) | Accuracy | Coordination |
|---|---|---|---|
| Lightweight orchestration | ~1.3× | high | high |
| Role-based pipelines | 5–30× | medium | medium |
| Simulation-style agents | 100×+ | low | fails |
Results from controlled experiments in MAFBench across multiple frameworks, using identical LLMs and tasks.
The execution model, scheduling overhead, and interaction semantics determine how agents are coordinated. Results show that orchestration overhead can create latency multipliers ranging from 1.3× to over 117× compared to direct LLM calls.
| Orchestration type | Latency impact |
|---|---|
| Direct calls | baseline |
| Graph workflows | moderate |
| Simulation loops | extreme |
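To see why orchestration style alone multiplies latency, here is a minimal back-of-the-envelope cost model. It assumes sequential execution, a fixed latency per LLM call, and a fixed framework overhead per call; the `Orchestration` class and every number in it are hypothetical and are not MAFBench measurements.

```python
from dataclasses import dataclass


@dataclass
class Orchestration:
    """Hypothetical cost model; every number used below is an illustrative assumption."""
    name: str
    llm_calls: int               # LLM invocations needed to finish one task
    overhead_ms_per_call: float  # framework scheduling/serialization cost added to each call


def latency_ms(o: Orchestration, call_ms: float) -> float:
    # Sequential execution: each call waits for the previous one to finish,
    # so call count and per-call cost multiply together.
    return o.llm_calls * (call_ms + o.overhead_ms_per_call)


def multiplier(o: Orchestration, baseline: Orchestration, call_ms: float) -> float:
    # Latency relative to the baseline, e.g. 3.3 means "3.3x slower".
    return latency_ms(o, call_ms) / latency_ms(baseline, call_ms)


CALL_MS = 800.0  # assumed latency of one direct LLM call

direct = Orchestration("direct call", llm_calls=1, overhead_ms_per_call=0.0)
graph = Orchestration("graph workflow", llm_calls=3, overhead_ms_per_call=80.0)
# Simulation loop: 5 agents x 10 rounds, every agent calls the LLM each round.
simulation = Orchestration("simulation loop", llm_calls=5 * 10, overhead_ms_per_call=80.0)

for o in (direct, graph, simulation):
    print(f"{o.name:16s} {multiplier(o, direct, CALL_MS):6.1f}x")
```

The point of the sketch is structural: per-call overhead is modest, but an architecture that turns one task into dozens of sequential calls pays for it dozens of times.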
The choice between accumulation and retrieval strategies fundamentally affects performance. Accumulation leads to runtime blow-up and recall degradation as context grows. Retrieval-based approaches maintain stable performance with bounded cost. Experiments show memory scores ranging from 6.1% (accumulation) to 23.8% (retrieval).
| Memory strategy | Accuracy | Cost |
|---|---|---|
| Retrieval | stable | bounded |
| Accumulation | drops | grows rapidly |
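A quick way to see the cost difference is to count prompt tokens. The sketch below assumes each interaction adds a fixed number of tokens, accumulation replays the full history on every call, and retrieval returns at most `k` past interactions; the function names and constants are hypothetical.

```python
from typing import Callable


def accumulation_context(turn: int, tokens_per_turn: int) -> int:
    # Every prior interaction (including this one) is replayed into the prompt.
    return turn * tokens_per_turn


def retrieval_context(turn: int, tokens_per_turn: int, k: int) -> int:
    # Only the k most relevant past interactions are retrieved, so the prompt is capped.
    return min(turn, k) * tokens_per_turn


def cumulative_tokens(context_fn: Callable[..., int], turns: int, **kwargs) -> int:
    # Total prompt tokens processed across all turns.
    return sum(context_fn(t, **kwargs) for t in range(1, turns + 1))


TOKENS_PER_TURN = 200  # assumed size of one interaction
TURNS = 100
K = 5

acc = cumulative_tokens(accumulation_context, TURNS, tokens_per_turn=TOKENS_PER_TURN)
ret = cumulative_tokens(retrieval_context, TURNS, tokens_per_turn=TOKENS_PER_TURN, k=K)
print(f"accumulation: {acc:,} tokens, retrieval: {ret:,} tokens")
```

Under these assumptions accumulation processes 200 × (1 + 2 + … + 100) = 1,010,000 tokens, which grows quadratically with the number of turns, while retrieval processes 98,000 tokens, which grows only linearly once the `k` cap is reached.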
Free-form reasoning allows LLMs to plan naturally, while rigid schema constraints force predefined formats that break LLM reasoning patterns. Results show accuracy changes ranging from +15% (free-form iterative) to -30% (rigid schema), with runtime multipliers from 1.2× to 30×.
| Interface | Accuracy change |
|---|---|
| Free-form (iterative) | up to +15% |
| Rigid schema | up to −30% |
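One concrete failure mode is easy to reproduce: a model that reasons before answering breaks a parser that insists the entire reply be a JSON object. The sketch below contrasts a rigid parser with a free-form one that lets the model think and then extracts a final `Answer:` line. The reply text and the `Answer:` convention are hypothetical, chosen only to illustrate the formatting failures described above.

```python
import json
import re
from typing import Optional

# A hypothetical model reply: reasoning first, then the answer.
REPLY = """The order has 3 items at 12 each plus 4 for shipping.
3 * 12 + 4 = 40.
Answer: 40"""


def parse_rigid(reply: str) -> Optional[dict]:
    """Rigid interface: the whole reply must be a JSON object with an "answer" key."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return None  # any free-form reasoning around the JSON is a hard failure
    return data if isinstance(data, dict) and "answer" in data else None


def parse_free_form(reply: str) -> Optional[dict]:
    """Free-form interface: allow reasoning, then take the last 'Answer:' line."""
    matches = re.findall(r"^Answer:\s*(.+)$", reply, flags=re.MULTILINE)
    return {"answer": matches[-1].strip()} if matches else None


print(parse_rigid(REPLY))
print(parse_free_form(REPLY))
```

The rigid parser rejects a correct answer purely because of its surrounding reasoning; the free-form parser keeps the reasoning and still produces a structured result.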
Procedural guidance and expert conditioning enable agents to specialize effectively. Role clarity through procedural expert-guided conditioning improves F1 scores by up to 58 points compared to generic agent designs.
| Conditioning | F1 gain (points) |
|---|---|
| None | baseline |
| Expert-guided | up to +58 |
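What does expert-guided, procedural conditioning look like in practice? A minimal sketch, assuming conditioning is delivered through the agent's system prompt: the `RoleSpec` fields, the `system_prompt` helper, and the contract-review example are all hypothetical and are not the prompts used in MAFBench.

```python
from dataclasses import dataclass, field


@dataclass
class RoleSpec:
    """Hypothetical role description; field names are illustrative only."""
    role: str
    expertise: str = ""
    procedure: list[str] = field(default_factory=list)


def system_prompt(spec: RoleSpec) -> str:
    lines = [f"You are a {spec.role}."]
    if spec.expertise:
        # Expert conditioning: state the domain knowledge the agent should apply.
        lines.append(f"You are an expert in {spec.expertise}.")
    if spec.procedure:
        # Procedural guidance: an explicit, ordered method instead of an open-ended role.
        lines.append("Follow these steps in order:")
        lines += [f"{i}. {step}" for i, step in enumerate(spec.procedure, start=1)]
    return "\n".join(lines)


generic = RoleSpec(role="helpful assistant")
expert = RoleSpec(
    role="contract-review agent",
    expertise="commercial contract law",
    procedure=[
        "List every clause that assigns liability.",
        "Flag clauses with uncapped liability.",
        "Report each flagged clause with its section number.",
    ],
)
print(system_prompt(generic))
print(system_prompt(expert))
```

The generic agent gets a one-line identity; the conditioned agent gets a domain and a concrete procedure, which is the kind of role clarity the results above reward.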
Connectivity patterns determine information flow and coordination limits. Fully connected topologies enable success rates above 90%, while geometric topologies with isolated clusters drop below 30%. The topology structure directly limits coordination capability.
| Topology | Success rate |
|---|---|
| Fully connected | >90% |
| Small-world | medium |
| Geometric | <30% |
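Why does topology cap coordination? A breadth-first search over the communication graph shows how many hops information needs, and whether it can arrive at all. This sketch compares a fully connected graph with a geometric graph where agents link only to nearby positions; the positions, `radius`, and helper names are hypothetical.

```python
import math
from collections import deque
from itertools import combinations


def fully_connected(n: int) -> dict[int, set[int]]:
    # Every agent can message every other agent directly.
    return {i: set(range(n)) - {i} for i in range(n)}


def geometric(points: list[tuple[float, float]], radius: float) -> dict[int, set[int]]:
    # Agents are linked only if their (assumed) 2-D positions are within `radius`.
    adj = {i: set() for i in range(len(points))}
    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= radius:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def hops_from(adj: dict[int, set[int]], source: int) -> dict[int, int]:
    # Breadth-first search: message hops from `source` to each reachable agent.
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def components(adj: dict[int, set[int]]) -> int:
    # Count groups of agents that can reach each other.
    seen, count = set(), 0
    for node in adj:
        if node not in seen:
            seen.update(hops_from(adj, node))
            count += 1
    return count


def diameter(adj: dict[int, set[int]]) -> float:
    # Worst-case hops between two agents; infinite when clusters are isolated.
    if components(adj) > 1:
        return math.inf
    return max(max(hops_from(adj, s).values()) for s in adj)


# Two tight clusters of three agents, far apart (positions are hypothetical).
clusters = geometric([(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)], radius=1.5)
full = fully_connected(6)
print("fully connected:", components(full), "component(s), diameter", diameter(full))
print("geometric:", components(clusters), "component(s), diameter", diameter(clusters))
```

In the fully connected graph any agent reaches any other in one hop; in the clustered geometric graph the two groups form separate components, so no amount of message passing lets them coordinate.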
Understanding the root causes is the first step to building better systems.
Framework orchestration introduces overhead at every interaction point. Sequential execution models force agents to wait, while simulation-style loops create exponential call chains. Architecture alone drives latency from 1.3× to over 117× with identical models.
Accumulation strategies grow context linearly with each interaction, so the cumulative number of tokens processed grows quadratically and runtime cost rises sharply. Retrieval-based approaches maintain bounded cost by accessing only relevant information when needed. Memory structure, not context size, determines performance.
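The two memory strategies can be sketched as two small classes with the same interface. As a stand-in for embedding-based similarity, this sketch scores entries by Jaccard word overlap with the query; the class names, the tokenizer, and the stored facts are hypothetical.

```python
import re


def tokens(text: str) -> set[str]:
    # Crude bag-of-words tokenizer; a production system would more likely use embeddings.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class AccumulatingMemory:
    """Replays every stored entry into each prompt, so context grows with every interaction."""

    def __init__(self) -> None:
        self.entries: list[str] = []

    def add(self, text: str) -> None:
        self.entries.append(text)

    def context(self, query: str) -> list[str]:
        return list(self.entries)  # the query is ignored: everything is sent every time


class RetrievalMemory:
    """Returns at most k entries most similar to the query, so context size stays bounded."""

    def __init__(self, k: int = 2) -> None:
        self.k = k
        self.entries: list[str] = []

    def add(self, text: str) -> None:
        self.entries.append(text)

    def _score(self, query_tokens: set[str], entry: str) -> float:
        # Jaccard overlap between query and entry tokens.
        entry_tokens = tokens(entry)
        union = query_tokens | entry_tokens
        return len(query_tokens & entry_tokens) / len(union) if union else 0.0

    def context(self, query: str) -> list[str]:
        q = tokens(query)
        ranked = sorted(self.entries, key=lambda e: self._score(q, e), reverse=True)
        return [e for e in ranked[: self.k] if self._score(q, e) > 0]


facts = [
    "The deploy target is us-east-1.",
    "The user prefers Python.",
    "Budget is capped at 500 dollars.",
    "The user's timezone is UTC+2.",
]
acc, ret = AccumulatingMemory(), RetrievalMemory(k=2)
for fact in facts:
    acc.add(fact)
    ret.add(fact)

query = "Which region is the deploy target?"
print(len(acc.context(query)), acc.context(query))
print(len(ret.context(query)), ret.context(query))
```

Both memories hold the same facts, but only the retrieval version keeps the prompt at `k` entries no matter how many facts accumulate, and it puts the relevant fact first.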
LLMs reason through natural language patterns. Forcing rigid schema constraints conflicts with these patterns, causing formatting failures and reasoning breakdowns. Free-form interfaces that match LLM capabilities improve accuracy, while rigid schemas reduce it by up to 30%.
Communication topology determines information diffusion speed and coordination capability. Isolated clusters prevent agents from reaching consensus, while fully connected networks enable fast agreement. Topology structure alone can change coordination success from above 90% to below 30%.
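To make "reaching consensus" concrete, the sketch below swaps in a simple iterative-averaging model (DeGroot-style consensus): each round, every agent replaces its opinion with the mean of its own and its neighbors' opinions. This is an illustrative model, not the coordination task used in the benchmark; the topologies, initial opinions, and round count are assumptions.

```python
def fully_connected(n: int) -> dict[int, list[int]]:
    # Every agent hears every other agent each round.
    return {i: [j for j in range(n) if j != i] for i in range(n)}


def ring(n: int) -> dict[int, list[int]]:
    # Each agent hears only its two neighbors, so information spreads hop by hop.
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def isolated_clusters(n: int, size: int) -> dict[int, list[int]]:
    # Agents talk only inside their own block of `size` agents.
    return {i: [j for j in range(n) if j != i and j // size == i // size] for i in range(n)}


def average_consensus(adj: dict[int, list[int]], values: list[float], rounds: int) -> list[float]:
    values = list(values)
    for _ in range(rounds):
        # Synchronous update: every agent averages its own value with its neighbors'.
        values = [
            (values[i] + sum(values[j] for j in adj[i])) / (1 + len(adj[i]))
            for i in range(len(values))
        ]
    return values


def spread(values: list[float]) -> float:
    # Zero spread means every agent holds the same opinion.
    return max(values) - min(values)


initial = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]  # two camps of opinions
topologies = {
    "fully connected": fully_connected(6),
    "ring": ring(6),
    "isolated clusters": isolated_clusters(6, size=3),
}
for name, adj in topologies.items():
    print(f"{name:18s} spread after 10 rounds: {spread(average_consensus(adj, initial, 10)):.4f}")
```

Under this model the fully connected group agrees after a single round, the ring is still converging after ten, and the isolated clusters never move from their starting disagreement.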
These architectural choices don't just affect benchmarks—they determine whether your system ships, scales, and succeeds.
Learn how to make the architectural choices that actually matter.
→ Architecture Guide: Design Systems That Scale