The typical agent roster in a mature implementation includes four to five purpose-built agents, each defined by a narrow scope, explicit input/output contracts, and a clear rubric for what constitutes acceptable output:
The Requirements Agent receives a natural-language feature description and produces a structured specification artifact. It identifies boundary conditions, queries the project's knowledge base for existing conventions, and generates acceptance criteria that can be objectively verified. Its output follows a mandated template with defined sections and machine-parseable metadata. When it encounters a question it can't answer from existing project documentation, it doesn't improvise. It escalates to the knowledge service.
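To make the input/output contract concrete, here is a minimal sketch of what the structured specification artifact might look like. The class and field names are hypothetical, not the book's mandated template; the point is that the artifact is machine-parseable and that unanswered questions are carried as explicit escalations rather than improvised answers.

```python
from dataclasses import dataclass, field

# Hypothetical artifact shape -- illustrates the contract, not a mandated schema.
@dataclass
class Specification:
    feature: str
    boundary_conditions: list       # edge cases the implementation must handle
    acceptance_criteria: list       # objectively verifiable checks
    open_questions: list = field(default_factory=list)  # escalated, never improvised

    def ready_for_review(self) -> bool:
        # A spec is reviewable only when criteria exist and nothing awaits escalation.
        return bool(self.acceptance_criteria) and not self.open_questions
```

A pending `open_questions` entry blocks review until the knowledge service resolves it, which enforces the "escalate, don't improvise" rule structurally rather than by convention.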
The Architecture Agent consumes an approved requirement and proposes the technical approach. It cross-references the project's established technology patterns, identifies where established precedent exists (and follows it), and proposes new conventions where gaps exist. Critically, it documents not just what it recommends but why, including which alternatives it considered and rejected. This produces a decision record that's immediately useful for both human reviewers and future agents working on related features.
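The decision record can be sketched the same way. The shape below is illustrative (the field names are not taken from any specific ADR tooling); what matters is that rejected alternatives travel with the decision, so future readers see the "why" alongside the "what".

```python
from dataclasses import dataclass

# Hypothetical decision-record shape; field names are illustrative.
@dataclass(frozen=True)
class DecisionRecord:
    decision: str
    rationale: str                       # the "why", not just the "what"
    alternatives_rejected: tuple = ()    # (alternative, reason) pairs

    def render(self) -> str:
        lines = [f"Decision: {self.decision}", f"Why: {self.rationale}"]
        lines += [f"Rejected: {alt} -- {why}"
                  for alt, why in self.alternatives_rejected]
        return "\n".join(lines)
```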
The Task Agent takes a requirement and its associated architecture and decomposes them into concrete, implementable work packages. Each package specifies which files to create or modify, what other packages it depends on, and what the acceptance criteria are. The agent validates that its packages form a directed acyclic graph (no circular dependencies) and that every acceptance criterion from the parent requirement is addressed by at least one package.
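Both of the Task Agent's validation checks are mechanical, which means they can run as code rather than as model judgment. The sketch below (data shapes are my own, assumed for illustration) checks criteria coverage with a set difference and acyclicity with Kahn's topological sort:

```python
from collections import deque

def validate_packages(packages, required_criteria):
    """packages: {name: {"deps": [...], "criteria": [...]}} -- assumed shape."""
    # Check 1: every parent acceptance criterion is addressed by >= 1 package.
    covered = {c for p in packages.values() for c in p["criteria"]}
    missing = set(required_criteria) - covered

    # Check 2: the dependency graph is acyclic (Kahn's algorithm).
    indegree = {name: len(p["deps"]) for name, p in packages.items()}
    dependents = {name: [] for name in packages}
    for name, p in packages.items():
        for dep in p["deps"]:
            dependents[dep].append(name)
    queue = deque(n for n, d in indegree.items() if d == 0)
    visited = 0
    while queue:
        n = queue.popleft()
        visited += 1
        for m in dependents[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                queue.append(m)
    acyclic = visited == len(packages)  # a cycle leaves nodes unvisited
    return acyclic, missing
```

If a cycle exists, some node never reaches indegree zero, so `visited` falls short of the package count and the plan is rejected before any coding agent sees it.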
The Coding Agent receives a single task package and produces implementation code plus tests. It reads the project's coding standards, respects the architectural boundaries defined in the architecture proposal, and works within strictly bounded scope: one task at a time, not the entire feature.
The Knowledge Service acts as the collective memory of the system. Other agents invoke it through structured calls whenever they encounter questions their immediate context can't answer. I cover this in depth in Chapter 7 because its design has outsized impact on overall system quality.
The Economics of Context Windows
Beyond the organizational clarity, there's a pragmatic reason specialization outperforms generalism in agent systems: context window budgets. A language model's context window (the total amount of information it can hold while reasoning) is a fixed, finite resource. Every token spent loading background context is a token unavailable for actual reasoning about the task at hand.
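The budget arithmetic is worth seeing with numbers. The token counts below are invented for illustration (real figures depend on the model and the project), but the structure of the calculation is the point: everything loaded as background subtracts from reasoning headroom.

```python
def reasoning_headroom(window, loaded):
    # Tokens left for actual reasoning after background context is loaded.
    return window - sum(loaded.values())

WINDOW = 128_000  # illustrative context window size

# Hypothetical generalist payload: everything at once.
generalist = {"project_overview": 8_000, "architecture_docs": 20_000,
              "coding_standards": 5_000, "requirement": 3_000,
              "task_spec": 2_000, "source_files": 60_000, "tests": 20_000}

# Hypothetical coding-agent payload: only what its task needs.
specialist = {"coding_standards": 5_000, "task_spec": 2_000,
              "relevant_files": 12_000}

print(reasoning_headroom(WINDOW, generalist))   # 10000 tokens of headroom
print(reasoning_headroom(WINDOW, specialist))   # 109000 tokens of headroom
```

With these (made-up) numbers the generalist keeps under 8% of its window for reasoning, the specialist over 85%, which is the gap the next paragraphs describe.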
A generalist agent attempting to hold the project overview, architecture documentation, coding standards, the current requirement, the task specification, all relevant source files, and the test suite simultaneously will burn most of its context budget on retrieval and navigation, leaving little headroom for the creative work of actually producing good output.
Specialized agents load only what's relevant to their specific job. A requirements agent doesn't need source code. A coding agent doesn't need the full set of architecture decision records. Each agent operates with a focused context, which translates directly into higher quality output per token spent.
Internally, each specialized agent follows what the research community calls the ReAct paradigm: a cycle of Reasoning (the model analyzes its inputs and forms a hypothesis), Action (invokes a tool or delegates to another agent), and Observation (captures the result and feeds it into the next reasoning step). This isn't a novel concept, but frameworks like ADK have formalized it with structured abstractions that handle the transitions between stages. What matters for practitioners is that each agent's ReAct loop operates within its narrow scope. The architecture agent reasons about technology choices, not business requirements. The coding agent reasons about implementation, not architecture. Scope containment keeps each reasoning cycle focused and productive.
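Stripped of framework abstractions, the ReAct cycle reduces to a small control loop. This is a sketch, not ADK's API: `reason` stands in for a model call that returns either an action or a final answer, and the step bound is how scope containment shows up in code.

```python
def react_loop(reason, tools, max_steps=5):
    """Minimal ReAct skeleton: Reason -> Act -> Observe, within a bounded scope.
    `reason(observation)` returns ("act", tool_name, args) or ("finish", answer).
    """
    observation = None
    for _ in range(max_steps):
        step = reason(observation)            # Reasoning: analyze, form a hypothesis
        if step[0] == "finish":
            return step[1]
        _, tool, args = step
        observation = tools[tool](*args)      # Action, then Observation feeds back
    raise RuntimeError("scope exceeded without a conclusion")
```

A narrowly scoped agent is one whose `reason` function only ever proposes tools from its own small `tools` dict; the architecture agent's dict and the coding agent's dict simply don't overlap.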
The interoperability story is also maturing rapidly. Two emerging open protocols are worth understanding. The Model Context Protocol (MCP) standardizes how agents connect to external data sources and tools, functioning as a universal adapter layer that eliminates custom point-to-point integrations. The Agent-to-Agent protocol (A2A) enables agents built on different frameworks to discover each other's capabilities and coordinate work through structured task requests. Together, these protocols mean that the specialized agents in your development factory don't have to live inside a single vendor ecosystem. Your requirements agent could run on one framework, your coding agent on another, and they can still collaborate through standardized interfaces. For enterprise teams, this matters because it prevents vendor lock-in and allows you to choose the best tool for each specific role.
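The capability-discovery idea can be sketched without committing to either wire format. The dictionaries below are loosely modeled on A2A's published agent cards and structured task requests, but every field name here is illustrative, not the actual protocol schema:

```python
# Illustrative only: field names are hypothetical, not the A2A wire format.
task_request = {
    "task_id": "task-001",
    "skill": "generate-requirements",
    "message": {"role": "user",
                "parts": [{"type": "text",
                           "text": "Draft a spec for password reset"}]},
}

def route(agent_cards, request):
    # Dispatch to the first agent whose advertised capabilities cover the skill.
    for card in agent_cards:
        if request["skill"] in card["skills"]:
            return card["endpoint"]
    return None  # no capable agent found; caller decides how to escalate
```

The routing decision depends only on what agents advertise, not on how they are implemented, which is exactly what lets a requirements agent on one framework hand work to a coding agent on another.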
Modern agent platforms have formalized this through what they call 'skills': modular instruction packages (typically SKILL.md files) that encode domain-specific expertise for a specific type of work. Each specialized agent is effectively a skill definition: a bounded set of instructions, templates, reference materials, and evaluation criteria. Skills are portable across projects, versionable, and (critically) testable independently of the broader system.
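As a rough illustration, a skill definition for the requirements agent might look like the file below. The frontmatter fields and section names here are my own sketch of the pattern, not a specific platform's schema:

```markdown
---
name: requirements-author
description: Turns a natural-language feature request into a structured specification.
---

# Requirements Author

## Instructions
1. Identify boundary conditions before drafting acceptance criteria.
2. Query the knowledge service for existing conventions; never improvise answers.
3. Emit the specification using the mandated template.

## Evaluation criteria
- Every acceptance criterion is objectively verifiable.
- Open questions are escalated, not guessed.
```

Because the skill is just a versioned file, it can be diffed, reviewed, and tested against sample inputs independently of the agents that load it.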
This article is from The Agentic SDLC by Carlos Aggio.