flowchart LR
subgraph HUMAN ["HUMAN LAYER"]
direction LR
HR["Roles: Product, Architecture,<br/>Engineering, QA, Operations"]
HF["Functions: Decisions,<br/>Governance, Accountability"]
end
subgraph AGENT ["AGENT LAYER"]
direction LR
AC["Capabilities: Generate,<br/>Analyze, Test, Review"]
AB["Boundaries: Scoped authority,<br/>context-dependent, auditable"]
end
subgraph PLATFORM ["PLATFORM LAYER"]
direction LR
PI["Infrastructure: SCM, CI/CD,<br/>Auth, Observability"]
PN["Integrations: APIs, Webhooks,<br/>Context Sources, Artifacts"]
end
HUMAN -- "sets constraints,<br/>delegates tasks" --> AGENT
AGENT -- "escalations,<br/>approvals, results" --> HUMAN
AGENT -- "invokes tools,<br/>reads context" --> PLATFORM
PLATFORM -- "build results,<br/>test output, telemetry" --> AGENT
4 The Agentic SDLC Reference Architecture
Most organizations planning AI adoption ask “which tool should we buy?” The better question is “which layers of our software lifecycle should agents touch first, and what infrastructure do they need?” This chapter provides the diagram that answers both — and explains why the context you accumulate along the way matters more than any tool you select.
4.1 The Three Layers
At every phase of the software development lifecycle, work happens across three distinct layers. Understanding them is the difference between a coherent AI strategy and a collection of disconnected tool purchases.
The Human Layer is where judgment, accountability, and strategic decisions live. Humans set objectives, make architectural choices, define quality standards, and bear responsibility for what ships. No current AI system replaces these functions. The question is not whether humans remain in the loop (they do) but which decisions require human judgment and which are better delegated.
The Agent Layer is where AI capabilities execute within defined boundaries. Agents generate code, produce reviews, draft tests, surface patterns in data, and automate repetitive cognitive work. They operate with varying degrees of autonomy, from passive suggestion to autonomous task execution, but always within constraints set by the Human Layer. The PROSE framework from Chapter 1 defines those constraints: Progressive Disclosure determines what context agents receive, Safety Boundaries determine what they can do with it.
The Platform Layer is the infrastructure that enables both humans and agents: source control, CI/CD pipelines, identity and access management, observability, artifact registries, and the APIs that connect them. This layer is often invisible in AI adoption conversations, which is precisely why adoption stalls. An agent that can generate code but cannot run tests, read build output, or access your dependency graph is an agent working blind.
The layers are not independent. They form a stack where each depends on the one below it, as the flowchart at the head of this chapter shows: the Human Layer sets constraints and delegates tasks to the Agent Layer; agents invoke tools and read context from the Platform Layer; and results, escalations, and telemetry flow back up.
This structure is not a proposal. It is a description of what already exists in any organization using AI coding tools — whether they’ve designed it deliberately or not. The agent in your developer’s editor is already operating across these three layers. The question is whether the boundaries, the context flow, and the governance are intentional.
4.2 Mapping the Layers Across the Lifecycle
The three layers apply at every phase of software delivery. The table below maps what each layer does in each phase and, critically, which capabilities are available today versus emerging or directional.
Maturity tiers: Now = available across two or more vendors. Emerging = available in one or two tools, or in limited preview. Directional = announced, demonstrated, or on public roadmaps but not production-ready.
| | Ideate | Plan | Code | Build | Test | Review | Release | Operate |
|---|---|---|---|---|---|---|---|---|
| Bucket | Intent | Intent | Build | Build | Build | Build | Operate | Operate |
| Human | Set objectives and scope | Make architecture choices | Review agent output | Own build config | Define test policy | Final code sign-off | Go/no-go decision | Own incident response |
| Agent | Research prior art, surface conflicts | Draft ADRs, decompose tasks | Multi-file code generation | Diagnose build failures | Generate tests, find coverage gaps | Automated review, catch defects | Draft changelogs, flag breaking changes | Correlate alerts, suggest actions |
| Platform | Knowledge bases, collaboration tools | Issue trackers, project management | IDE, SCM, context APIs | CI/CD pipelines, dependency management | Test frameworks, infrastructure | Pull request APIs, policy engines | Deployment pipelines, gates | Monitoring, alerting, log systems |
| Maturity | Emerging | Emerging | Now | Now | Emerging | Now | Emerging | Directional |
Executives do not need to think in eight phases. They need three buckets that map to planning cadences, budget lines, and organizational accountability:
Intent (Ideate + Plan) answers “what are we building and why?” Agent assistance here is mostly emerging. Research agents that surface prior art, planning agents that draft architecture decision records and decompose epics into tasks. These exist in early forms, but no tool reliably automates the judgment calls that make planning valuable.
Build (Code + Build + Test + Review) answers “how do we turn intent into verified software?” This is where agent capabilities are most mature. Code generation, build diagnostics, test generation, and automated code review all have production-ready implementations across multiple vendors. This is also where most organizations start, and where the Vibe Coding Cliff from Chapter 1 hits hardest if context is not structured.
Operate (Release + Operate) answers “how do we get software to users and keep it running?” Agent assistance in release management is emerging; in incident response, it is directional. Correlating alerts to recent deployments, drafting incident timelines, suggesting rollback actions. These capabilities exist in point solutions (e.g., PagerDuty’s AIOps, Datadog’s Watchdog) but not yet in workflows integrated end-to-end with the development lifecycle.
The practical implication: based on vendor maturity and published adoption patterns, most organizations appear to have concentrated investment in the Build bucket, with minimal coverage in Intent and almost none in Operate. This is not a failure. It reflects where the technology is mature. But it means the next high-value investments are in Plan, Test, and Review — where the work is expensive, the feedback loops are slow, and structured context makes the difference between useful automation and expensive noise.
4.2.1 Three-Tier Honesty
Vendor presentations tend to show the full architecture as if it were all available today. It is not. Presenting the vision as current reality is a habit this book avoids: every capability below is tagged honestly, with the justification for each tag.
| Phase | Agent capability | Maturity | Justification |
|---|---|---|---|
| Ideate | Research synthesis, prior-art surfacing | Emerging | Available in conversational tools; no reliable autonomous implementation |
| Plan | ADR drafting, task decomposition, estimation | Emerging | Early implementations exist (GitHub Copilot, Claude); accuracy varies significantly |
| Code | Multi-file generation, refactoring, boilerplate | Now | Production-ready across GitHub Copilot, Cursor, Claude Code, Windsurf, others |
| Build | Build failure diagnosis, dependency resolution | Now | CI integration available in multiple tools; quality depends on structured error output |
| Test | Test generation, coverage gap analysis | Emerging | Generation works; strategic test design still requires human judgment |
| Review | Automated code review, defect detection | Now | Shipping in GitHub Copilot, Amazon Q; effectiveness depends on documented standards |
| Release | Changelog drafting, breaking-change detection | Emerging | Partial implementations in CI tools; end-to-end release automation is not production-ready |
| Operate | Alert correlation, incident timeline drafting | Directional | Research demos and early integrations; no vendor ships reliable autonomous incident response |
The pattern is clear. Build-phase capabilities are mature. Intent-phase and Operate-phase capabilities are early. If a vendor tells you they have end-to-end lifecycle automation today, ask which cells in this table they would tag as Now, and how they define the term. The honest answer will tell you more about the vendor than any feature demo.
4.3 What Changes About Roles
The three-layer model clarifies what happens to human roles when agents enter the lifecycle. The answer is not that roles disappear. The answer is that the proportion of activities within each role shifts.
| Human role | What stays human | What agents handle | What shifts |
|---|---|---|---|
| Product Manager | Strategic prioritization, stakeholder alignment, go/no-go decisions | Research synthesis, competitive analysis, requirement drafting from rough notes | More time on judgment, less on information gathering |
| Architect | System design decisions, technology selection, cross-team coordination | ADR drafting, dependency analysis, pattern detection across codebases | More time on review, less on documentation |
| Developer | Code review, architectural compliance, complex problem-solving | Routine implementation, boilerplate, test generation, refactoring | More time specifying intent, less time typing code |
| QA Engineer | Test strategy, edge case identification, exploratory testing | Test generation, coverage analysis, regression detection | More time on test design, less on test writing |
| SRE / Ops | Incident ownership, capacity planning, reliability decisions | Alert correlation, runbook execution, incident timeline drafting | More time on system understanding, less on routine response |
The pattern across every row: agents absorb the mechanical and information-processing work, while humans focus on the judgment, strategy, and accountability work. This is not a temporary state. It reflects the fundamental properties of language models described in Chapter 1: they process and generate; they do not decide or bear responsibility.
Chapter 6 covers the organizational design implications in detail: team structures, the junior pipeline, and new hiring profiles. Here, the point is architectural: the Human Layer does not shrink. It concentrates on the activities that require human judgment, and those activities become more visible and more important.
4.4 The Context Moat
Your competitors have access to the same AI models you do. They can license the same coding tools. What they cannot replicate is your organization’s accumulated engineering knowledge — if you have made it structured and agent-consumable. If you have not, your AI tools are working with the same generic training data as everyone else’s. This is the context moat.
Why context beats models. Model quality commoditizes. In 2022, OpenAI’s Codex was the dominant commercial code-generation offering. By mid-2025, several model families compete credibly (OpenAI, Anthropic, Google, Meta, Mistral, among others). Pricing has trended downward as competition increases. The model powering your agent is a procurement decision, not a strategic advantage. Context is the opposite: it is proprietary, it accumulates over time, and it directly determines the quality of every agent interaction.
Consider two teams of similar size, working on codebases of similar complexity, using identical AI tools on the same underlying model. Team A has documented its coding conventions, API patterns, error-handling standards, and module boundaries in structured instruction files that agents load automatically. Team B has not; its conventions exist in senior engineers’ heads and scattered code comments.
Team A’s agents generate code that passes linting on the first attempt, follows the internal API surface, and produces pull requests that reviewers approve with minor comments. Team B’s agents generate plausible code that calls deprecated APIs, invents its own error patterns, and produces pull requests that require substantial rework. The rework cost compounds across every developer, every day. Over six months, Team A’s context investment has paid for itself many times over, while Team B is still debating whether AI tools are “worth it.”
The difference is not the tool. It is the context.
Context operates across three domains, each with different characteristics:
| Context Layer | What It Contains | Examples | Sources |
|---|---|---|---|
| Work Context | Decisions, requirements, meeting outcomes, strategic priorities | ADRs, sprint plans, product briefs, stakeholder notes | Collaboration tools, wikis, project management systems |
| Data Context | Business intelligence, domain models, analytics, structured domain knowledge | Data dictionaries, domain glossaries, schema docs | BI platforms, data catalogs, knowledge graphs |
| Code Context | Architecture, conventions, dependency graphs, API surfaces | Coding standards, instruction files, module boundaries | Repositories, CI/CD systems, artifact registries |
Work context captures why decisions were made and what the organization intends. Most of this knowledge exists today in meeting notes, Slack threads, and individual memory. Making it machine-readable — through structured ADRs, specification templates, and decision logs — is a documentation investment with a new payoff: agents that understand the reasoning behind the code, not just the code itself.
Data context captures the domain the software operates in. A financial services team whose agents can reference the company’s data dictionary and regulatory terminology will produce more accurate code than one whose agents work from generic training data alone. This context is often the hardest to structure because it lives in specialized systems outside the engineering toolchain.
Code context captures how the codebase works and what conventions it follows. This is the most immediately actionable domain, because it maps directly to the instruction files, custom rules, and agent configurations that current AI coding tools support. Documenting your API conventions, error-handling patterns, module boundaries, and architectural invariants — and structuring them so agents can consume them — is the highest-ROI starting investment for any team.
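The scoping described in the case study later in this chapter, where global rules are refined by directory-specific overrides, can be sketched as a simple resolution walk. The sketch below is illustrative, not any specific tool's algorithm, and the `.instructions.md` file name is one convention among several:

```python
from pathlib import Path

def collect_instructions(target_dir: Path, repo_root: Path,
                         name: str = ".instructions.md") -> list[Path]:
    """Collect instruction files from the repo root down to target_dir.

    Returned shallowest-first, so an agent reading them in order sees
    global rules first and directory-specific overrides last.
    """
    d = target_dir.resolve()
    repo_root = repo_root.resolve()
    chain = []
    while True:
        candidate = d / name
        if candidate.is_file():
            chain.append(candidate)
        if d == repo_root or d == d.parent:  # stop at repo (or filesystem) root
            break
        d = d.parent
    return list(reversed(chain))  # global first, most specific last
```

The ordering matters: later files can tighten or override earlier ones, which is exactly how directory-scoped conventions refine repository-wide standards.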
4.4.1 The Compounding Mechanism
Structured context is not a one-time cost. It compounds.
When an agent produces a code review using your team’s documented quality standards, that review generates structured feedback. When a developer resolves the feedback and updates a convention document, the convention becomes richer. When the next agent interaction loads that richer convention, the output improves. Each cycle reinforces the next:
graph LR
A["Structured<br/>context"] --> B["Better agent<br/>output"]
B --> C["Richer<br/>artifacts"]
C --> A
This flywheel means the gap between organizations that invest in context early and those that defer widens over time. It is not a linear gap. An organization that starts structuring context in 2025 does not just have two years’ head start over one that starts in 2027; it has two years of compounding context that the late starter must build from scratch while the early adopter’s agents are already leveraging it.
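The widening-gap claim can be made concrete with a toy model. Assume each feedback cycle improves agent output quality by a fixed percentage; the 5% rate below is an illustrative placeholder, not a measured figure:

```python
def context_quality(cycles: int, rate: float = 0.05, base: float = 1.0) -> float:
    """Toy model: output quality compounds by `rate` per feedback cycle."""
    return base * (1 + rate) ** cycles

# An organization with a 24-cycle head start does not hold a constant
# lead: the absolute gap grows every cycle.
gaps = [context_quality(m + 24) - context_quality(m) for m in range(12)]
assert all(later > earlier for earlier, later in zip(gaps, gaps[1:]))
```

Under any positive compounding rate the gap is multiplicative, not additive, which is why "we'll catch up later" underestimates the cost of deferral.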
Evidence: the 75-file PR. This book’s primary case study — PR #394, a pull request touching 75 files — did not succeed because the AI model was unusually powerful. It succeeded because the repository had accumulated context primitives over the preceding weeks: coding conventions, architecture decision records, module boundary definitions, error-handling patterns, and instruction files scoped from global rules down to directory-specific overrides. Each was created when an agent interaction failed due to a context gap. Without that accumulated context, the same task on the same model would have produced 75 files of plausible code riddled with convention violations — the exact failure mode described in Chapter 1’s Vibe Coding Cliff. The delta between “75 files of rework” and “75 files merged” is the context moat, demonstrated. Full metrics and escalation details are documented in the APM Overhaul case study.
Convergence evidence: the platform intelligence layer. This pattern is not unique to one vendor. Microsoft’s own platform architecture illustrates the convergence: WorkIQ (workplace intelligence from M365 — meetings, email, calendar), FoundryIQ (AI model and deployment intelligence), and FabricIQ (data and analytics intelligence) each contribute organizational context that no coding tool can replicate independently. This three-layer intelligence stack — workplace, AI, and data — mirrors the broader industry pattern: the defensible advantage in AI-assisted development is not the model or the IDE, but the organizational context layer that connects them. As platform vendors integrate these intelligence surfaces, the context moat deepens for organizations that structure their knowledge to take advantage.
4.4.2 Technical Debt Gets a New Cost
AI changes the ROI calculus for documentation debt, convention debt, and knowledge-base debt. Consider an illustrative scenario: documenting your API conventions takes two days of engineering time. Without that documentation, agents hallucinate your internal patterns; in teams we have observed, pull request reviews frequently caught three to five convention violations per PR that required rework. With the documentation in place, the agent generates convention-compliant code on the first attempt. In these cases, the payback period was roughly two weeks.
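The payback arithmetic is simple enough to write down. Every input below is an illustrative placeholder; substitute your own team's rates:

```python
def payback_days(doc_cost_days: float,
                 violations_per_pr: float,
                 rework_hours_per_violation: float,
                 prs_per_day: float,
                 hours_per_day: float = 8.0) -> float:
    """Working days until documentation cost is recovered via avoided rework."""
    saved_per_day = violations_per_pr * rework_hours_per_violation * prs_per_day
    return doc_cost_days * hours_per_day / saved_per_day

# Illustrative inputs: 2 days to document conventions, 3 violations
# per PR at 30 minutes of rework each, 1 agent-assisted PR per day.
print(payback_days(2, 3, 0.5, 1))  # about 10.7 working days, roughly two weeks
```

Higher PR volume shortens the payback proportionally, which is why the calculus shifts fastest on teams that adopt agents most aggressively.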
This recalculation applies across the codebase. Undocumented module boundaries, implicit architectural decisions, tribal knowledge stored only in senior engineers’ heads: all of these were always technical debt. AI makes the cost of that debt visible on every agent interaction, because the agent fails precisely where the documentation fails.
The implication for leaders: re-prioritize your backlog. Items that were perpetually deferred — “document the authentication flow,” “write down the module ownership model,” “formalize the error-handling conventions” — now have a concrete, measurable payoff that they did not have before AI tools existed. Chapter 11 provides the methodology for auditing and structuring this context systematically.
4.5 The Architecture Decision Matrix
The reference architecture is an adoption map, not a prerequisite checklist. Any phase can run as a single agent-assisted loop — or expand into governed, multi-agent workflows as maturity grows. The question for leaders is where to start and how to expand.
The matrix below maps adoption decisions across two dimensions: the lifecycle phase and the investment required. Use it to scope your first pilot and plan the expansion path.
| Phase | Start here if… | First investment | Maturity prerequisite | Expected timeline |
|---|---|---|---|---|
| Code | Your developers already use AI tools | Custom instructions encoding your conventions | Linter, test suite, CI pipeline | 2-4 weeks |
| Review | PR review is a bottleneck | Agent-assisted review with human sign-off | Documented quality standards, clear review criteria | 4-8 weeks |
| Test | Test coverage is low or tests are brittle | Agent-generated tests with human-defined strategy | Test framework, coverage tooling, defined test policy | 4-8 weeks |
| Plan | Planning is slow and produces inconsistent artifacts | ADR templates, specification structures for agent drafting | Issue tracker, documented architecture decisions | 8-12 weeks |
| Build | CI failures consume significant developer time | Agent-assisted build diagnostics and fix suggestions | CI/CD pipeline with structured error output | 4-8 weeks |
| Release | Release process is manual and error-prone | Agent-drafted changelogs and breaking-change detection | Semantic versioning, structured commit history | 8-12 weeks |
| Ideate | Research and discovery are ad hoc | Agent-assisted research synthesis and prior-art surfacing | Knowledge base, searchable decision history | 12-18 weeks |
| Operate | Incident response is slow to diagnose | Agent-assisted alert correlation and timeline drafting | Observability stack, structured runbooks | 12-18 weeks |
Three observations from the matrix:
Start where the tooling is mature and the payoff is immediate. Code, Review, and Test have production-ready agent capabilities across multiple vendors. They are also the phases where structured context produces the most measurable improvement. Most organizations should start here.
Invest in context before investing in agents. Every row in the matrix lists a maturity prerequisite — and most of those prerequisites are documentation, structure, and tooling that should exist regardless of AI adoption. If your conventions are not documented, your tests are not reliable, and your CI pipeline does not produce structured output, no agent tool will compensate. Fix the foundation first.
Expand based on evidence, not ambition. Move to the next phase when the current one is producing measurable results — faster reviews, fewer convention violations, higher test coverage. Do not expand because a vendor demo looked impressive. Chapter 8 provides the metrics that distinguish real improvement from optimistic interpretation.
4.6 Build, Buy, or Compose
For each context domain, leaders face a decision: build internal tooling, buy from a platform vendor, or compose solutions from multiple sources.
| Context domain | Build | Buy | Compose |
|---|---|---|---|
| Work context | Internal knowledge base, custom ADR tooling | Platform-integrated wikis (Notion, Confluence, GitHub Wikis) | API connectors bridging collaboration tools to agent context |
| Data context | Custom domain model documentation, internal domain taxonomies | Data catalog platforms (Collibra, Alation, dbt) | Federation layers that expose data definitions to coding agents |
| Code context | Instruction files, custom rules, agent configurations | IDE-integrated context from platform vendors | Open-source primitive packages, shared community configurations |
The pattern: work and data context often require building or composing because they are organization-specific. Code context is the most composable because the formats are increasingly standardized — custom instructions in GitHub Copilot, rule files in Cursor, CLAUDE.md in Claude Code — and community-shared configurations can provide a starting point that teams customize.
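Because these code-context formats are converging on well-known file locations, auditing a repository for them takes a few lines. The paths below reflect the conventions named above as of this writing; treat them as a snapshot and verify against each tool's current documentation:

```python
from pathlib import Path

# Conventional context-file locations per tool (snapshot; subject to change).
CONTEXT_FILES = {
    "GitHub Copilot": [".github/copilot-instructions.md"],
    "Cursor": [".cursorrules", ".cursor/rules"],
    "Claude Code": ["CLAUDE.md"],
}

def audit_context(repo: Path) -> dict[str, bool]:
    """Report which tools would find structured context in this repository."""
    return {
        tool: any((repo / p).exists() for p in paths)
        for tool, paths in CONTEXT_FILES.items()
    }
```

Running such an audit across an organization's repositories is a cheap first measurement of how much of the context moat already exists in machine-readable form.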
No single vendor covers all three context domains comprehensively today. This is a practical observation, not a criticism; the integration surface is large, and the standards are still forming. Plan for a composed solution and evaluate vendors on how well they expose APIs for context integration, not on whether they claim to cover everything.
4.7 The Agentic Computing Stack
The components in the build-buy-compose decision above are not independent purchases. They are layers in a technology stack, and as with any stack, each layer depends on the ones below it.
flowchart BT
L1["<b>LLM</b><br/>GPT, Claude, Gemini<br/><i>≈ CPU</i>"]
L2["<b>Harness</b><br/>Copilot CLI, Cursor, Claude Code<br/><i>≈ Runtime / OS</i>"]
L3["<b>PROSE Constraints</b><br/><i>≈ Arch Standards</i>"]
L4["<b>Markdown Primitives</b><br/>.instructions.md, .agent.md<br/><i>≈ Source / Config</i>"]
L5["<b>Package Managers</b><br/>(APM, plugin.json, …)<br/><i>≈ Package Manager</i>"]
L6["<b>Spec-Kit, Squad</b><br/><i>≈ Frameworks</i>"]
L7["<b>Agentic Workflows</b><br/><i>≈ Applications</i>"]
L1 --> L2 --> L3 --> L4 --> L5 --> L6 --> L7
style L1 fill:#e8f0fe,stroke:#4285f4,color:#1a1a1a
style L2 fill:#e8f0fe,stroke:#4285f4,color:#1a1a1a
style L3 fill:#fef7e0,stroke:#f9ab00,color:#1a1a1a
style L4 fill:#fef7e0,stroke:#f9ab00,color:#1a1a1a
style L5 fill:#e6f4ea,stroke:#34a853,color:#1a1a1a
style L6 fill:#e6f4ea,stroke:#34a853,color:#1a1a1a
style L7 fill:#fce8e6,stroke:#ea4335,color:#1a1a1a
This stack is not a theoretical proposal — it is forming independently across vendors, which is the strongest evidence that the layers are real. Anthropic’s Claude plugin.json converged on manifest-based primitive bundling independently of APM¹. GitHub’s Agentic Workflows bring CI/CD-native execution to the application layer. Brady Gaster’s Squad² and GitHub’s Spec-Kit³ represent framework-layer emergence: opinionated ways to compose primitives into multi-agent orchestration and spec-driven development. Spec-Kit and Squad are to agentic development what Spring and React are to traditional development — they make orchestration easier in one dimension, constrain freedom in another. They consume primitives via package managers. Any harness can run the resulting workflows. When independent efforts converge on the same layering without coordination, the layers are not an abstraction. They are a discovery.
The maturity distribution across this stack tells you where to invest. The processing layer (LLMs) is powerful and improving on a cadence measured in weeks. The package management and framework layers are embryonic — roughly where npm was in 2012 or where web frameworks were in the early Rails era. This maturity gap between the bottom and top of the stack is exactly what the early PC era looked like: powerful CPUs, primitive operating systems, no standardized distribution. The strategic implication is clear: invest in the layers that compound — primitives, context infrastructure, distribution standards — not the layers that commoditize. Models get cheaper. Context gets more valuable.
4.8 Start Anywhere, Expand Deliberately
The architecture presented in this chapter is designed to be adopted incrementally. There is no prerequisite checklist that must be completed before you begin. The following thresholds are starting points based on patterns observed across early-adopter teams; calibrate them to your baseline. The most common — and most effective — starting point:
Month 1. Pick one team, one phase (usually Code), and one investment (custom instructions encoding your top five conventions). Measure agent output quality before and after. Gate to expand: agent-generated code passes linting on first attempt ≥70% of the time, and the team has documented at least five conventions in machine-readable form.
Month 3. Extend to Review. Add agent-assisted code review with human sign-off on every PR. Measure review turnaround time and defect escape rate. Gate to expand: agent-assisted PRs achieve a review rejection rate no worse than the team’s human-only baseline, and median review turnaround time has decreased by ≥15%.
Month 6. Add Test. Use agents to generate test cases for new features, with human-defined test strategy. Measure coverage change and test maintenance cost. Gate to expand: test coverage has increased by ≥10 percentage points on agent-covered modules, and agent-generated tests require human rework less than 30% of the time.
Month 12. Evaluate Plan and Build phases. By this point, your team has accumulated six months of structured context, and your agents are materially more effective than they were on day one — the compounding flywheel at work. Gate to expand: the team’s human intervention rate on agent tasks has declined by ≥20% from the Month 3 baseline, and at least two context feedback cycles have produced measurable improvement in agent output quality.
Month 18. Assess readiness for Operate phase automation. This requires the most mature infrastructure and the strongest governance — Chapter 5 covers the governance requirements in detail. Gate: the team has structured runbooks for ≥80% of common incident types, and agent-assisted alert correlation achieves ≥90% accuracy in retrospective testing against the past quarter’s incidents.
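Gates like these are only useful if they are checked against measured numbers rather than impressions. A minimal sketch of the Month 1 gate, using the thresholds stated above (the function and metric names are hypothetical):

```python
def month1_gate(lint_pass_rate: float, documented_conventions: int) -> bool:
    """Month 1 expansion gate from the text: at least a 70% first-attempt
    lint pass rate and at least five machine-readable conventions."""
    return lint_pass_rate >= 0.70 and documented_conventions >= 5

# A team at a 74% pass rate with six documented conventions may expand;
# the same pass rate with only three conventions may not.
assert month1_gate(0.74, 6)
assert not month1_gate(0.74, 3)
```

Encoding each gate this way, fed from CI and review telemetry, turns the expansion decision into a dashboard check instead of a debate.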
This is a planning horizon, not a schedule. Some organizations will move faster; some will spend longer at each stage. The sequence matters more than the timeline: start where the tooling is mature and the context is structured, expand where the evidence supports it, and invest in context continuously.
The reference architecture gives you a shared vocabulary for planning. The context moat gives you a reason to start now rather than wait. But architecture without governance is a blueprint without building codes. Chapter 5 addresses the hardest question in AI-assisted delivery: who is accountable when agents are participants in your software lifecycle, and how do you build the trust frameworks that make autonomous work auditable and safe?
Anthropic, “Claude Code Plugins,” https://docs.anthropic.com/en/docs/claude-code/plugins↩︎
Brady Gaster, “How Squad Runs Coordinated AI Agents Inside Your Repository,” GitHub Blog, March 2026. https://github.blog/ai-and-ml/github-copilot/how-squad-runs-coordinated-ai-agents-inside-your-repository/↩︎
GitHub, “Spec Kit — Build High-Quality Software Faster,” https://github.com/github/spec-kit↩︎