The architectural pattern delivering the most consistent results in production separates the system into two distinct responsibilities: a deterministic control layer that manages workflow progression, and a creative execution layer where AI agents do their actual work. Getting this separation right is, in my assessment, the single highest-leverage design decision in any agentic development system.
This isn't just an ad hoc pattern emerging from early adopters. It's being formalized into production toolkits. Google's Agent Development Kit (ADK), for instance, codifies exactly this separation in its agent type taxonomy. It distinguishes between LLM-based agents (non-deterministic, using models like Gemini for reasoning and creative work) and workflow agents (deterministic, executing sub-agents in fixed sequences, parallel branches, or iterative loops). The ADK's SequentialAgent, ParallelAgent, and LoopAgent types are all deterministic workflow primitives. They don't reason. They orchestrate. The reasoning happens inside the LLM-based agents they dispatch. When I see a major cloud platform encoding the same architectural pattern that emerged independently across multiple production teams, that tells me the pattern has crossed from experimentation into industrial consensus.
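To make the distinction concrete, here is a minimal sketch of what deterministic workflow primitives look like. These classes are illustrative stand-ins, not ADK's actual API: the point is that the orchestrators contain no model calls and no judgment, only control flow, while all reasoning lives in the callables they dispatch.

```python
class SequentialWorkflow:
    """Runs sub-agents in a fixed order, passing state forward.
    Illustrative only; not Google ADK's SequentialAgent API."""

    def __init__(self, sub_agents):
        self.sub_agents = sub_agents

    def run(self, state):
        for agent in self.sub_agents:
            state = agent(state)  # each sub-agent is any callable
        return state


class LoopWorkflow:
    """Repeats a sub-agent until a deterministic exit condition holds,
    with a hard iteration cap so it can never spin forever."""

    def __init__(self, sub_agent, done, max_iterations=10):
        self.sub_agent = sub_agent
        self.done = done
        self.max_iterations = max_iterations

    def run(self, state):
        for _ in range(self.max_iterations):
            if self.done(state):
                break
            state = self.sub_agent(state)
        return state
```

In a real system the callables would wrap LLM invocations; the workflow classes themselves would stay exactly this boring.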
Let me explain with something I observed during a large mining company engagement. Early in our exploration of multi-agent systems for operational analytics, we experimented with letting the agent system self-organize. The agents would assess what needed to happen next, choose their own sequencing, and decide where to store their outputs. On small, contained problems, this worked surprisingly well. The system could hold the entire problem in its working memory and make reasonable choices about what came next.
But when we scaled this to a complex, multi-module codebase with interdependent components and several features being developed in parallel? The wheels came off. The agent system would jump ahead to implementation before the design was settled. It would create dependency tangles between tasks. Sometimes it would fall into a loop, reconsidering the same design question repeatedly without converging on a decision. Anyone who's managed a team of junior engineers will recognize these failure modes. Except with agents, they happen faster and at larger scale.
A November 2025 research paper by Drammeh and colleagues (arXiv:2511.15755) put hard numbers on this. Across 348 controlled experiments comparing single-agent and multi-agent orchestrated approaches, the single-agent setup produced useful, actionable outputs only 1.7 percent of the time. The orchestrated multi-agent system achieved actionable results in every single trial. That's not an incremental improvement. That's the difference between a tool you can rely on and one you can't.
The takeaway is clear: AI agents are remarkably capable at producing work when given a well-defined, bounded task. They are remarkably poor at managing their own workflow, deciding what to work on next, or coordinating across phases. So design the system accordingly: use conventional, deterministic software to handle the coordination, and let agents focus purely on the creative work within each phase.
The Control Layer: Boring by Design
The control layer is deliberately unintelligent. No language models, no neural networks, no stochastic behavior. It's a rule-based state machine (think CI/CD pipeline or workflow engine) that moves work through a sequence of defined phases based on explicit conditions.
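A table-driven sketch shows how little machinery this requires. The phase names and the gate condition are hypothetical examples, not a prescribed schema; the point is that every transition is a lookup plus an explicit condition, with nothing stochastic anywhere.

```python
# Deliberately "boring" control layer: phase transitions as a plain table.
# Phase names are illustrative, not a prescribed schema.
TRANSITIONS = {
    "draft": "under_review",
    "under_review": "approved",
    "approved": "done",
}


def advance(artifact, gate_passed):
    """Move an artifact one phase forward, but only when its gate passes.
    No model calls, no judgment: pure lookup plus an explicit condition."""
    if not gate_passed:
        return artifact  # stay put until the gate clears
    next_status = TRANSITIONS.get(artifact["status"])
    if next_status is None:
        return artifact  # terminal state, nothing to do
    return {**artifact, "status": next_status}
```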
Its responsibilities are straightforward:
Phase gating. A requirement must be complete and pass evaluation before implementation tasks can be derived from it. An architecture proposal must be reviewed before coding begins. These transitions follow explicit rules, not judgment.
Dependency resolution. If Task C depends on the output of Task A, the engine ensures Task A is finished before Task C begins. This is graph traversal, nothing more.
State tracking. Every artifact in the system (a requirement document, a design proposal, a task specification) carries a status: draft, under review, approved, done. The engine reads these statuses to determine what's ready to proceed, what's waiting on a dependency, and what's finished.
Agent dispatch. When a requirement reaches 'approved' status, the engine triggers task generation. When all tasks for a requirement reach 'done,' the engine marks the requirement complete. These are conditional triggers, not decisions. If X then Y.
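One possible shape for an engine loop that combines these responsibilities is sketched below. Field names like "status" and "deps" are assumptions for illustration; the dependency check is ordinary graph traversal, and dispatch is a conditional trigger, not a decision.

```python
def ready_tasks(tasks):
    """Dependency resolution as plain graph traversal: a task is ready
    when it is approved and every task it depends on is done."""
    done = {t["id"] for t in tasks if t["status"] == "done"}
    return [
        t for t in tasks
        if t["status"] == "approved" and all(d in done for d in t["deps"])
    ]


def dispatch(tasks, run_agent):
    """Conditional triggers, not decisions: if a task is ready, hand it
    to an agent. The engine never reasons about HOW the work gets done."""
    for task in ready_tasks(tasks):
        task["status"] = "done" if run_agent(task) else "blocked"
    return tasks
```

Note that `ready_tasks` is evaluated once per pass, so a task whose dependency completes during a pass becomes eligible on the next pass; the engine converges by repetition, not foresight.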
Think of it like a conveyor belt in a manufacturing plant. The belt itself doesn't know anything about the products on it. It just moves items from station to station according to fixed rules. The intelligence lives at the stations (the agents), not in the conveyor.
The Execution Layer: Constrained Creativity
Within each phase, agents handle the work that actually requires intelligence: interpreting a business requirement and decomposing it into technical specifications, proposing technology choices with reasoned tradeoffs, writing implementation code and accompanying tests, generating documentation.
The most reliable implementations I've seen assign specialized agents to distinct tasks rather than relying on a single general-purpose agent. You have one agent whose entire job is analyzing requirements. Another focuses exclusively on architectural decisions. A third writes code. A fourth acts as a shared knowledge service that the others query when they encounter questions they can't answer from their immediate context.
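Routing to specialists can itself be deterministic. The sketch below uses a plain registry, with hypothetical agent names and placeholder callables standing in for real agent invocations; the design choice worth noting is that unknown work types fail loudly rather than falling back to a generalist.

```python
# Specialist routing as a plain lookup. Agent names and the placeholder
# lambdas are hypothetical; real entries would wrap LLM-backed agents.
AGENTS = {
    "requirements": lambda spec: {"kind": "analysis", "input": spec},
    "architecture": lambda spec: {"kind": "design", "input": spec},
    "coding":       lambda spec: {"kind": "patch", "input": spec},
}


def route(task_kind, spec):
    """Send the task to its specialist; fail loudly on unknown work
    types instead of letting a generalist improvise."""
    if task_kind not in AGENTS:
        raise ValueError(f"no specialist registered for {task_kind!r}")
    return AGENTS[task_kind](spec)
```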
This mirrors how well-functioning engineering organizations actually work. Your best requirements analyst doesn't write production code. Your best systems architect doesn't manage the test suite. Specialization allows each participant (human or agent) to be evaluated against a clear, narrow definition of quality, which makes the evaluation much more actionable than trying to assess a generalist who does everything adequately but nothing excellently.
The tradeoff is real: you need well-designed interfaces between the specialized agents, and the orchestration layer needs to manage those handoffs reliably. But the predictability gains are worth the engineering investment, especially in enterprise contexts where reliability matters more than raw speed.
A January 2026 survey paper on multi-agent orchestration (arXiv:2601.13671) documented this pattern across industries. In one case, a large bank applied an agentic 'digital factory' to modernize legacy software across hundreds of applications. Different agents documented existing code, generated new modules, reviewed peers' output, and ran integration tests. Parallel execution with continuous quality checks cut development time by over 50 percent. The key wasn't any individual agent's intelligence. It was the orchestration layer managing handoffs, enforcing quality gates, and maintaining the dependency graph.
This resonates with something I learned building analytics platforms in Australia. The most common failure mode wasn't bad models or bad data. It was bad handoffs between teams. Data engineers built pipelines, analytics teams built models on top, and the two drifted apart because nobody managed the interface. Multi-agent systems face the identical coordination challenge, and the solution is the same: explicit contracts, automated validation at every handoff, and a central system that enforces sequencing.
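An explicit contract with automated validation at the handoff boundary can be as simple as a typed artifact plus a check. The field names below are illustrative assumptions about what an architecture-to-coding handoff might carry; the principle is that incomplete artifacts get rejected at the boundary instead of drifting downstream.

```python
from dataclasses import dataclass, field


@dataclass
class DesignHandoff:
    """Hypothetical contract the architecture agent must satisfy before
    the coding agent accepts its output. Field names are illustrative."""
    requirement_id: str
    components: list = field(default_factory=list)
    decisions: list = field(default_factory=list)


def validate_handoff(handoff):
    """Automated validation at the handoff boundary: return a list of
    problems, empty when the contract is satisfied."""
    problems = []
    if not handoff.requirement_id:
        problems.append("missing requirement_id")
    if not handoff.components:
        problems.append("design lists no components")
    return problems
```

The control layer would call `validate_handoff` as part of its phase gate, blocking the transition to implementation until the list comes back empty.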
This article is from The Agentic SDLC by Carlos Aggio.