21 The Reference Architecture, Earned

Composition is recursive. Almost everything else in this part is a consequence of that.

Pull request #4711 on the payments service is a twelve-file diff that touches payments/charge.py and bumps stripe-python from 7.2 to 7.4. Maya labels it needs-review and switches windows.

Behind the label, the team’s review Skill has loaded. It fans out to six Personas in their own subagent threads — Tech Lead, Senior Backend Engineer, Security Reviewer, Accessibility Reviewer, Documentation Editor, Product Reviewer — each reading the diff against its own rubric and its own slice of context. The Security Reviewer hits the version bump, treats it as a class of change its rubric does not cover in line, and dispatches a CVE Triage Skill in a fresh thread. CVE Triage loads its own pair of Personas — Threat Modeller, Compliance Check — against the advisory feed and the org’s allow-list, returns a verdict, persists its plan, and exits.

By the time Maya’s notification fires, two Skills have run, eight Personas have weighed in, and a Tech Lead synthesiser has turned the reasoning traces into a single comment on the PR. She reads one verdict. The audit trail underneath it is reconstructible from a lockfile and the persisted plans.

That is the architecture this part has been building.

21.1 Composition is recursive

The five-layer picture from Chapter 4 (Figure 4.2) names the vocabulary. It does not yet name the property of the system that carries the most weight at scale, which is this: the same Skill–Persona–Context triplet repeats at every depth. A Skill dispatches Personas. A Persona, in its own thread, may dispatch further Skills. Each of those Skills may dispatch further Personas. One composition rule, applied many times, is the whole shape.
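The rule can be made concrete with a small data structure. The sketch below is illustrative only: the class names `Skill` and `Persona`, their fields, and the `walk` helper are assumptions made for this example, not part of any harness or package-manager API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Persona:
    name: str
    context: list[str] = field(default_factory=list)       # sources this lens reads
    dispatches: list[Skill] = field(default_factory=list)  # nested Skills, if any


@dataclass
class Skill:
    name: str
    version: str
    personas: list[Persona] = field(default_factory=list)


def walk(skill: Skill, depth: int = 1):
    """Yield (depth, skill) for this Skill and every Skill nested beneath it."""
    yield depth, skill
    for persona in skill.personas:
        for nested in persona.dispatches:
            yield from walk(nested, depth + 1)


# Maya's PR, reduced to the two Skills the walkthrough shows.
cve_triage = Skill("cve-triage", "1.4.2", [
    Persona("Threat Modeller", ["CVE feed"]),
    Persona("Compliance Check", ["Allow-list"]),
])
review = Skill("payments-review", "2.3.0", [
    Persona("Tech Lead"),
    Persona("Senior Backend Engineer"),
    Persona("Security Reviewer", dispatches=[cve_triage]),
    Persona("Accessibility Reviewer"),
    Persona("Documentation Editor"),
    Persona("Product Reviewer"),
])

tree = list(walk(review))
```

The same `walk` function handles any depth because a nested Skill is the same type as its parent; nothing in the traversal is specific to depth one or depth two.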

The figure below is that rule applied once, with Maya’s PR substituted for the variables. The team’s Agentic SDLC sits at the top of the frame; the Review phase is the cell the team is currently in, and that phase triggers the top-level Skill, payments-review v2.3.0. From the Skill downward, the recursion takes over — Personas, then Context, then (via the subagent thread from Security) the same shape one frame deeper, with CVE Triage v1.4.2 and its own Personas and Context — all of it still inside the same Review phase. The substrate row at the foot of the frame names the three layers of Figure 4.2 that carry this structure but are not the subject of this chapter. It is the bridge to the manifest below, because the manifest is what makes the recursion declarable on disk.

%%{init: {'theme':'base','themeVariables':{'fontSize':'16px','fontFamily':'Helvetica, Arial, sans-serif'}}}%%
block-beta
  columns 3

  L0H["AGENTIC SDLC — the team's lifecycle"]:3
  SPEC["Spec"] PLAN["Plan"] BUILD["Build"]
  TEST["Test"] REVIEW["Review"] REFINE["Refine"]
  RELEASE["Release"] sA["&nbsp;"] sB["&nbsp;"]

  GAP0["▼ Review triggers"]:3

  D1H["DEPTH 1 — payments-review v2.3.0
(SKILL.md · PR #4711)"]:3

  L2HA["PERSONA POOL — 6 .agent.md files"]:3
  P_TL["Tech Lead"] P_BE["Backend"] P_SR["Security"]
  P_AX["A11y"] P_DE["Docs"] P_PR["Product"]

  L3HA["CONTEXT POOL — 6 sources"]:3
  C_DIFF["Diff (git)"] C_CODE["Code"] C_POL["Policy"]
  C_JIRA["Jira (MCP)"] C_RFC["RFC (web)"] C_FIG["Figma"]

  GAP1["&nbsp;"]:3

  D2H["DEPTH 2 — CVE Triage v1.4.2
(SKILL.md · same shape, deeper · still inside Review)"]:3

  L2HB["PERSONA POOL — 2 .agent.md files"]:3
  P_TM["Threat Modeller"] P_CC["Compliance"] sC["&nbsp;"]

  L3HB["CONTEXT POOL — 2 sources"]:3
  C_FEED["CVE feed (web)"] C_AL["Allow-list (files)"] sD["&nbsp;"]

  GAP2["&nbsp;"]:3

  SUB["SUBSTRATE (Ch. 4) — lower three layers
Agent Harness · Governance · Platform"]:3

  P_SR --"subagent thread"--> D2H

  classDef bandlabel fill:#ffffff,stroke:#cfcfcf,color:#444,font-weight:bold
  classDef depthlabel fill:#eaeaea,stroke:#666,color:#000,font-weight:bold
  classDef lifecycle fill:#eef6ff,stroke:#3a6ea5,color:#000
  classDef persona   fill:#fff8e0,stroke:#a8862c,color:#000
  classDef context   fill:#eef9ee,stroke:#3a8a3a,color:#000
  classDef dim       fill:#f4f4f4,stroke:#bdbdbd,color:#6b6b6b,stroke-dasharray:3 3
  classDef gap       fill:#ffffff,stroke:#ffffff,color:#ffffff
  classDef arrowhint fill:#ffffff,stroke:#ffffff,color:#3a6ea5,font-style:italic
  classDef substrate fill:#e4e4ea,stroke:#7d7d86,color:#3a3a3a

  class D1H,D2H depthlabel
  class L0H,L2HA,L3HA,L2HB,L3HB bandlabel
  class REVIEW lifecycle
  class SPEC,PLAN,BUILD,TEST,REFINE,RELEASE dim
  class P_TL,P_BE,P_SR,P_AX,P_DE,P_PR,P_TM,P_CC persona
  class C_DIFF,C_CODE,C_POL,C_JIRA,C_RFC,C_FIG,C_FEED,C_AL context
  class sA,sB,sC,sD gap
  class GAP1,GAP2 gap
  class GAP0 arrowhint
  class SUB substrate

Figure 21.1: Maya’s PR #4711. The team’s Agentic SDLC sits at the top; the Review phase triggers the payments-review Skill, which dispatches Personas against Context. The same Skill→Persona→Context shape repeats at depth two when the Security Reviewer dispatches CVE Triage inside the same Review phase.

The figure opens one cell of Chapter 4’s five-layer picture. The Agentic SDLC band at the top is SDLC Phases, with Review highlighted as the cell Maya’s example occupies. Everything below the ▼ Review triggers arrow (the Skill, its Personas, its Context, and the depth-two repetition) is what Context & Capabilities looks like once a phase has fired. The substrate row at the foot is the rest of the stack at work: the version pins (v2.3.0, v1.4.2) are Governance & Distribution doing its job; the subagent thread edge from Security Reviewer to CVE Triage is the Agent Harness doing its; every file behind every band lives on the Platform.

The next snippet is how the cell is declared on disk. A Skill bundle’s manifest names the nested Skills it depends on the same way any other package names the packages it depends on:

# packages/payments-review/apm.yml
name: payments-review
version: 2.3.0
description: Multi-lens code review for the payments service.
author: payments-platform
dependencies:
  apm:
    - org/cve-triage#v1.4.2
    - org/perf-review#v0.9.1

The Personas live as .agent.md files inside the bundle; the nested Skills are real package edges in dependencies.apm, resolved by apm install and pinned in apm.lock.yaml like any other dependency. The graph that results has the same shape a code dependency graph has — a directed graph of well-typed primitives, resolved against a manifest, pinned by a lockfile, owned via CODEOWNERS. The same package manager that handles one level of composition handles seven. There is no separate orchestration product to procure, no separate runtime to operate, no separate review pipeline to staff. The substrate that carries one Skill carries the tree.
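To show why the result has the shape of a code dependency graph, here is a minimal resolver sketch. It is not how `apm install` is implemented: the in-memory `REGISTRY`, the `parse_ref` and `resolve` helpers, and the lock-entry fields (`version`, `sha256`) are all assumptions chosen for illustration, and the real lockfile schema may differ.

```python
import hashlib

# Hypothetical in-memory registry: ref -> (bundle bytes, direct dependency refs).
REGISTRY = {
    "org/payments-review#v2.3.0": (
        b"payments-review bundle",
        ["org/cve-triage#v1.4.2", "org/perf-review#v0.9.1"],
    ),
    "org/cve-triage#v1.4.2": (b"cve-triage bundle", []),
    "org/perf-review#v0.9.1": (b"perf-review bundle", []),
}


def parse_ref(ref: str) -> tuple[str, str]:
    """Split 'org/name#v1.2.3' into ('org/name', '1.2.3')."""
    package, _, tag = ref.partition("#")
    return package, tag.removeprefix("v")


def resolve(root: str) -> dict[str, dict[str, str]]:
    """Follow dependency edges from the root and pin every package reached."""
    lock: dict[str, dict[str, str]] = {}
    stack = [root]
    while stack:
        ref = stack.pop()
        package, version = parse_ref(ref)
        if package in lock:
            continue  # already pinned; a real resolver would check for conflicts
        content, deps = REGISTRY[ref]
        lock[package] = {
            "version": version,
            "sha256": hashlib.sha256(content).hexdigest(),  # content hash for the pin
        }
        stack.extend(deps)
    return lock


lock = resolve("org/payments-review#v2.3.0")
```

The loop does not know how deep the graph goes. Adding a dependency under `org/cve-triage` would add one more entry to `lock` without changing the resolver.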

That is what lets a small primitive vocabulary scale to enterprise composition. The Review Skill in Maya’s example, the CVE Triage Skill it nested, and the Compliance Check Skill that CVE Triage might nest in turn are not three different kinds of object. They are the same object at three different depths. The harness loads them the same way; the registry resolves them the same way; the lockfile pins them the same way; the pull request that changes one of them looks like the pull request that changes any of them.

The architect’s instinct to “treat it like the codebase” is right because the dependency graph is the codebase’s shape. The operational disciplines you already have (semver, lockfiles, content hashing, CODEOWNERS, deprecation windows) port across without modification. Recursion is the architecture’s only structural move; almost everything else this part has named is a consequence of it.

21.2 What makes the recursion governable

Recursion without a bound is a generator for unbounded compute and an unauditable trail. Two conventions, applied at every level, convert it from a hazard into an architectural property.

First, every Skill carries an eval as discipline. A Skill that cannot say what its output looks like and how to tell whether it is correct is not a Skill; it is a prompt that has been mistaken for one. The eval is the parent’s stop condition. Without it, the parent has to either re-dispatch (paying again) or trust without checking (paying later). With it, the parent reads the artefact and moves on. The eval lives alongside the bundle today — as test inputs, expected outputs, and a runner the bundle author maintains — not yet as a manifest field; the discipline is what matters, and the schema will catch up.
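A bundle-side eval of the kind described (test inputs, expected outputs, a runner) might look like the sketch below. Everything here is hypothetical: `EvalCase`, `run_eval`, and the toy `toy_triage` function stand in for whatever the bundle author actually maintains, and the verdict strings are invented for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    given: str     # test input handed to the Skill
    expected: str  # verdict the bundle author expects back


def run_eval(skill: Callable[[str], str], cases: list[EvalCase]) -> bool:
    """Return True only if the Skill reproduces every expected verdict."""
    return all(skill(case.given) == case.expected for case in cases)


def toy_triage(change: str) -> str:
    """Toy stand-in for a triage Skill: escalate when an advisory is mentioned."""
    return "REVIEW" if "advisory" in change else "ALLOW"


cases = [
    EvalCase("bump with no open issues", "ALLOW"),
    EvalCase("bump with open advisory", "REVIEW"),
]
passed = run_eval(toy_triage, cases)
```

The parent does not need to understand how the Skill reached its verdict; it only needs the boolean, which is what makes the eval usable as a stop condition.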

Second, every dispatch persists its plan. The subagent thread writes its working plan to a file the parent can read. When the Security Reviewer in Maya’s PR dispatched CVE Triage, the triage thread did not return only its verdict; it returned a verdict and a plan trace, persisted alongside the run. The parent reads the verdict; the auditor — six months later, with a regulator on the line — reads the plan.

Together, the two conventions bound the recursion in space and in time. The eval bounds it in space: each level returns one well-typed artefact. Plan persistence bounds it in time: each level is reconstructible after the fact. What you get back from a dispatch is not “an agent did something”; it is a Skill, of a known version, against a known Context, returning an artefact that passed a known eval, with the trace on disk.
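The two conventions can be combined in a single dispatch wrapper. The sketch below is an assumption-laden illustration, not a harness API: `dispatch`, its parameters, the JSON plan format, and the `ALLOW`/`DENY` verdict vocabulary are all invented for this example.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def dispatch(
    skill_ref: str,
    run: Callable[[], tuple[str, list[str]]],
    check: Callable[[str], bool],
    run_dir: Path,
) -> dict:
    """Run one Skill, persist its plan, and return only what the parent needs."""
    verdict, plan_steps = run()
    # Persist the plan first, so the trace exists even if the eval rejects the verdict.
    plan_path = run_dir / f"{skill_ref.replace('/', '-')}.plan.json"
    plan_path.write_text(json.dumps({"skill": skill_ref, "steps": plan_steps}))
    if not check(verdict):
        raise ValueError(f"{skill_ref} returned an artefact that failed its eval")
    return {"skill": skill_ref, "verdict": verdict, "plan": str(plan_path)}


run_dir = Path(tempfile.mkdtemp())
result = dispatch(
    "org/cve-triage#v1.4.2",
    run=lambda: (
        "ALLOW with pin to 7.4.1",
        ["read advisory feed", "check allow-list", "decide"],
    ),
    # Hypothetical eval: the verdict must start with a known keyword.
    check=lambda verdict: verdict.startswith(("ALLOW", "DENY")),
    run_dir=run_dir,
)
```

The parent receives one small dictionary (the bound in space), while the plan file stays on disk for later reading (the bound in time).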

A Skill that fails eval-and-plan is a prompt with ambitions.

21.3 A Panel, walked end to end

Walk Maya’s PR through the architecture step by step.

Trigger. Maya labels PR #4711 needs-review. A label-watcher in CI picks up the label, checks out the diff at the head commit, and invokes the harness against the Review Skill bundle resolved as org/payments-review#v2.3.0 in the project lockfile. Twelve files changed, including payments/charge.py (logic) and requirements.txt (stripe-python 7.2 → 7.4).
Load. The review Skill’s manifest declares six Persona dependencies and a Context block: the diff via git, the codebase as files, the linked Jira ticket via the Jira MCP server, the team’s security policies as files in the policy repo, the originating RFC fetched from the wiki, and any Figma frames linked from the PR description loaded as multi-modal artefacts.
Fan out. The harness — Copilot CLI, Claude Code, Cursor, or whichever runtime the team has set up — reads the compiled context, sees the Review Skill’s six Persona dependencies, and opens six subagent threads. There is no user invocation here: the harness infers the dispatch from the loaded Skill the way a build tool infers compile steps from a manifest. Each thread receives the diff, its own slice of context, and its own context budget. The lenses cannot contaminate each other because they cannot see each other.
Recursion fires. The Security Reviewer’s rubric covers code-level concerns in line, but treats third-party dependency changes as a class of change requiring its own Skill. On encountering the stripe-python bump, it dispatches org/cve-triage#v1.4.2 in a fresh thread. CVE Triage loads two Personas (Threat Modeller, Compliance Check) against its own Context (the package’s advisory feed and the org allow-list). It returns:

CVE Triage trace — stripe-python 7.2 → 7.4
Advisories in range: 1 (CVE-2024-XXXX, severity moderate, fixed in 7.4.1).
Allow-list status: package present at any version ≥ 7.0.
Verdict: ALLOW with pin to 7.4.1.
Plan persisted: runs/2025-03-04/4711/nested/cve-triage-stripe-python-7.4.md.

Verdicts return. Each of the six Personas writes a one-line verdict and a longer reasoning trace:
- Tech Lead: approve, modulo synthesis.
- Senior Backend Engineer: approve; flag charge.py:142 as a candidate for extraction in a follow-up.
- Security Reviewer: approve with pin to stripe-python==7.4.1; CVE-2024-XXXX trace attached.
- Accessibility Reviewer: not applicable; no UI surface touched.
- Documentation Editor: request changes; docs/api.md still references the 7.2 retry behaviour.
- Product Reviewer: approve; matches RFC-217 acceptance criteria.
Synthesise. A Tech Lead Persona acting as synthesiser receives the six verdicts and the CVE trace. It is structured to preserve dissent rather than average it: any “request changes” survives unless explicitly overridden by a higher-priority lens. The Documentation Editor’s request survives. The Security Reviewer’s pin survives. The synthesiser composes:

Verdict: REQUEST CHANGES. Two follow-ups before merge:
1. Pin stripe-python==7.4.1 (Security; CVE-2024-XXXX trace attached).
2. Update docs/api.md retry-behaviour note for the 7.4 surface (Docs).
Otherwise: approved against RFC-217. Recommend filing the charge.py:142 extraction as a separate ticket.
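The dissent-preserving rule in the Synthesise step can be sketched in a few lines. This is a minimal illustration under stated assumptions: the `Verdict` fields, the `condition` attribute for conditional approvals, and the `overridden` set standing in for “overridden by a higher-priority lens” are hypothetical, not the team’s actual synthesiser.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    persona: str
    decision: str        # "approve", "request changes", or "not applicable"
    condition: str = ""  # non-empty when an approval carries a required follow-up


def synthesise(verdicts: list[Verdict], overridden: frozenset = frozenset()):
    """Keep every request or condition unless its persona was explicitly overridden."""
    follow_ups = [
        v for v in verdicts
        if v.persona not in overridden
        and (v.decision == "request changes" or v.condition)
    ]
    overall = "REQUEST CHANGES" if follow_ups else "APPROVE"
    return overall, follow_ups


verdicts = [
    Verdict("Tech Lead", "approve"),
    Verdict("Senior Backend Engineer", "approve"),
    Verdict("Security Reviewer", "approve", "pin stripe-python==7.4.1"),
    Verdict("Accessibility Reviewer", "not applicable"),
    Verdict("Documentation Editor", "request changes", "update docs/api.md"),
    Verdict("Product Reviewer", "approve"),
]
overall, follow_ups = synthesise(verdicts)
```

Nothing is averaged or voted on: four approvals do not outweigh one request for changes, and a follow-up disappears only when its persona is named in `overridden`.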

Maya reads one comment, acts on it. She accepts the stripe-python==7.4.1 pin into the diff, opens DOCS-892 against docs/api.md, and re-labels the PR needs-review to fire a second pass. A typical persisted-trail layout looks like:

runs/2025-03-04/4711/
├── plan.md
├── personas/
│   ├── tech-lead.md
│   ├── senior-backend.md
│   ├── security-reviewer.md
│   ├── accessibility-reviewer.md
│   ├── doc-editor.md
│   └── product-reviewer.md
├── nested/
│   └── cve-triage-stripe-python-7.4.md
├── synthesis.md
└── lockfile-snapshot.apm.lock.yaml

Six months later, an auditor reading the lockfile and the persisted plans can reconstruct the entire decision tree without re-running anything.
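As a rough illustration of what “without re-running anything” means in practice, the sketch below recreates the layout above with empty files in a temporary directory and then indexes it using file reads only. The `LAYOUT` list mirrors the tree shown; the `reconstruct` helper and its return shape are assumptions for this example.

```python
import tempfile
from pathlib import Path

LAYOUT = [
    "plan.md",
    "personas/tech-lead.md",
    "personas/senior-backend.md",
    "personas/security-reviewer.md",
    "personas/accessibility-reviewer.md",
    "personas/doc-editor.md",
    "personas/product-reviewer.md",
    "nested/cve-triage-stripe-python-7.4.md",
    "synthesis.md",
    "lockfile-snapshot.apm.lock.yaml",
]


def reconstruct(run_dir: Path) -> dict[str, list[str]]:
    """Index a persisted run by role, reading files only (nothing is re-run)."""
    return {
        "personas": sorted(p.stem for p in (run_dir / "personas").glob("*.md")),
        "nested": sorted(p.stem for p in (run_dir / "nested").glob("*.md")),
        "top_level": sorted(p.name for p in run_dir.iterdir() if p.is_file()),
    }


# Recreate the layout with empty files in a temporary directory, then index it.
run_dir = Path(tempfile.mkdtemp()) / "4711"
for rel in LAYOUT:
    path = run_dir / rel
    path.parent.mkdir(parents=True, exist_ok=True)
    path.touch()

index = reconstruct(run_dir)
```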

The Tech Lead synthesiser never saw the threat-modelling thread; it saw the Security Reviewer’s verdict with the CVE trace attached as ground truth. Maya never saw any of the eight Persona threads; she saw one verdict.

The recursion is invisible from above and inspectable from below.

21.4 What this changes for the architect

Three implications worth surfacing.

MCP is one access mechanism, not the architecture. The Model Context Protocol — an open standard stewarded by LF Projects under the Linux Foundation — sits in the Context half of the triplet, alongside files, CLI invocations, fetched URLs, and multi-modal artefacts. It is genuinely useful: a Persona can read a Jira instance, an internal API, or a SharePoint corpus through an MCP server without knowing the transport details. But an MCP catalogue is not a substitute for a Skill registry, an eval discipline, or a lockfile. Treating MCP as the architecture rather than as one mechanism is the most common shape of mistake an architect new to this substrate makes; it is worth naming on first contact.

The patterns you already know apply. A Skill is a Module with a Facade — one entry point, one stable contract, swappable internals. A Persona is a Strategy — same input shape, different evaluation policy. A harness orchestrating a Panel is a Mediator with a Scatter-Gather distribution. The substrate is new; the patterns are not. Chapter 18 (Section 18.3) carries the full Rosetta and the four-layer reconciliation.
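The pattern mapping can be read directly off a few lines of code. The sketch below is a toy under stated assumptions: `ReviewSkill`, the two persona functions, and their string-matching policies are invented for illustration, and a thread pool stands in for the harness’s subagent threads.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Strategy: every Persona takes the same input (a diff) and applies its own policy.
Persona = Callable[[str], str]


def security(diff: str) -> str:
    return "flag dependency bump" if "requirements.txt" in diff else "approve"


def docs(diff: str) -> str:
    return "approve" if "docs/" in diff else "request changes"


class ReviewSkill:
    """Facade: one entry point that hides which Personas run and how."""

    def __init__(self, personas: dict[str, Persona]):
        self._personas = personas

    def review(self, diff: str) -> dict[str, str]:
        # Mediator with scatter-gather: Personas never call each other.
        with ThreadPoolExecutor() as pool:
            # Scatter: each Persona runs in its own worker and sees only the diff.
            futures = {
                name: pool.submit(fn, diff) for name, fn in self._personas.items()
            }
            # Gather: collect exactly one verdict per Persona.
            return {name: future.result() for name, future in futures.items()}


skill = ReviewSkill({"Security Reviewer": security, "Documentation Editor": docs})
verdicts = skill.review("payments/charge.py\nrequirements.txt")
```

Swapping a Persona means replacing one function in the dictionary; callers of `review` are unaffected, which is the Facade property the paragraph describes.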

The decisions worth making this week are about your registry and your CODEOWNERS, not your agent vendor. A team that has installed the supply chain can change its mind about harnesses, frameworks, and cloud providers without re-authoring its primitive set; a team that has not, cannot. Chapter 22 (Chapter 26) carries the five-day plan.

TL;DR — One rule, applied recursively

Composition is recursive. The same Skill–Persona–Context triplet repeats at every depth; the dependency graph has the same shape a code dependency graph has.
Eval and plan-persistence are what make it governable. The eval bounds the recursion in space; persisted plans bound it in time. A Skill missing either is a prompt with ambitions.
Each level reads the level below as one primitive. The synthesiser sees one verdict; the auditor reads the trail underneath. Invisible from above, inspectable from below.
MCP is one access mechanism, not the architecture. It lives in the Context half of the triplet alongside files, CLI invocations, and fetched URLs. Treating it as the architecture is the most common first-contact mistake.
Your existing instincts are the right instincts. Modules, strategies, mediators, lockfiles, CODEOWNERS — the substrate is new; the disciplines port across without modification.