Scaling to 120+ AI Agents Without Losing Control
Devang Vashistha
How two-tier orchestration keeps multi-agent systems debuggable
When Single-Agent Systems Fall Apart
You know the moment. You built a perfectly capable AI agent that writes code, answers questions, and searches through your docs. It works great. Then you ask it to review code for security issues and synthesize three different research papers. It returns something that’s half right and half wrong, delivered with full confidence.
I used to think this was a model problem. Better prompts, bigger context window, maybe switch to the latest Sonnet release. Wrong. The problem is architectural, and no amount of prompt engineering fixes it.
A single agent with 40+ tools, a 2,000-word prompt spanning five different domains, and retrieval tuned for one job at a time collapses under its own weight. The context window bloats. Tool selection becomes a mess. Quality tanks.
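To see why a flat registry degrades tool selection, here is a minimal sketch. The tool names, domains, and keyword matching are all invented for illustration (this is not the VoltAgent API): a naive selector over one flat pool returns candidates from several unrelated domains, while scoping to a single domain first leaves one clear match.

```typescript
// Hypothetical tool registry: names, domains, and keywords are illustrative only.
type Tool = { name: string; domain: string; keywords: string[] };

const tools: Tool[] = [
  { name: "grepNotes", domain: "search", keywords: ["search", "notes"] },
  { name: "searchDocs", domain: "search", keywords: ["search", "docs"] },
  { name: "reviewDiff", domain: "review", keywords: ["review", "code"] },
  { name: "auditDeps", domain: "security", keywords: ["review", "security"] },
  { name: "genModule", domain: "codegen", keywords: ["code", "generate"] },
];

// Naive selector: any tool whose keywords overlap the request is a candidate.
function candidates(request: string[], pool: Tool[]): Tool[] {
  return pool.filter((t) => t.keywords.some((k) => request.includes(k)));
}

const request = ["review", "code"];

// Flat registry: candidates span three domains, so selection is ambiguous.
const flat = candidates(request, tools);
console.log(flat.map((t) => t.name)); // reviewDiff, auditDeps, genModule

// Domain-scoped: routing to "review" first leaves a single unambiguous match.
const scoped = candidates(
  request,
  tools.filter((t) => t.domain === "review"),
);
console.log(scoped.map((t) => t.name)); // reviewDiff
```

With five tools the ambiguity is visible; with 40+ it dominates. The same keyword that means "inspect a diff" in one domain means "scan dependencies" in another, and the model has to disambiguate on every call.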
This happened to me with Screech, a personal agent I built for my side projects. It started simple: basically a smarter search over my notes. Then I kept adding: code generation, documentation, code reviews, security audits, and research synthesis. The single-agent approach worked beautifully until it very suddenly didn't.
The stack is not exotic. It’s VoltAgent for runtime and workflows, SurrealDB as the “one DB to store everything” experiment, and Claude as the default model tier.