flowchart LR
AUDIT["AUDIT"] --> PLAN["PLAN"]
PLAN --> WAVE["WAVE"]
WAVE --> VALIDATE["VALIDATE"]
VALIDATE -->|All waves done| SHIP["SHIP"]
VALIDATE -->|Next wave| WAVE
VALIDATE -->|Failure| ADAPT["ADAPT"]
ADAPT --> PLAN
13 The Execution Meta-Process
Here is what nobody tells you about using AI agents on a real codebase: the agent is the easy part. The hard part is knowing what to ask, in what order, and when to stop asking and start verifying.
Chapters 10 through 12 gave you the building blocks — PROSE constraints to design by (Chapter 10), context engineering to feed agents accurately (Chapter 11), and multi-agent orchestration to coordinate at scale (Chapter 12). This chapter puts them into motion. It describes the execution methodology that turns those building blocks into shipped code: how you move from “I need to change 75 files” to “the PR is merged with zero regressions” — and what happens at every decision point in between.
The methodology is a five-phase meta-process. It works regardless of which AI coding tool you use, because it operates at a level above any specific tool’s mechanics. The tool dispatches agents, runs tests, and manages files. You manage the process.
13.1 The Five Phases
Every large change follows these phases, in order:
- Audit — understand the codebase from the perspectives that matter for this change.
- Plan — define scope, decompose into dependency-ordered waves, assign agent teams.
- Wave — execute each wave as a batch of parallel agents, one file per agent.
- Validate — test after every wave, spot-check critical changes, commit.
- Ship — final verification and merge.
The ADAPT loop connects wave execution back to planning. When a wave fails (a test breaks, an agent gets stuck, a dependency was missed), you diagnose, adjust the plan, and re-execute. The loop is not a sign of failure. It is the mechanism that makes the process resilient.
Each phase has a specific purpose, a specific human decision, and a specific output. What follows is the specification for each.
13.1.1 Phase 1: Audit
Purpose. Build a multi-perspective understanding of the code you are about to change. The audit surfaces what you know, what you’ve forgotten, and what you never knew.
How it works. You dispatch expert agents, typically 2 to 4 running in parallel, each with a distinct audit lens. An architecture expert examines architectural patterns, coupling, and separation of concerns. A domain expert examines the specific subsystem (logging conventions, auth flows, API contracts). A security expert checks for vulnerabilities. Each agent produces ranked findings with severity levels, exact file-and-line citations, and remediation guidance.
The human decision. You review the audit reports and decide what matters. Not every finding warrants action. Some are informational. Some are follow-up items for a different PR. The audit gives you the information; the plan reflects your judgment about which findings to act on now.
Key rule. Audits are read-only. The agents explore the codebase. They do not modify it. This separation is important: it means you can dispatch audit agents freely, without worrying about partial changes or file conflicts.
Enabling capabilities. Background agents with parallel dispatch and session isolation. Each audit agent runs in its own context window with read-only access to the codebase, so findings are independent and agents cannot interfere with each other.
Output. A set of prioritized findings with citations. This becomes the input to planning.
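A finding like this can be represented as a small structured record, which makes the hand-off to planning mechanical. A minimal sketch — the field names and severity levels are illustrative, not any tool's actual schema:

```python
from dataclasses import dataclass
from enum import IntEnum

class Severity(IntEnum):
    """Lower value sorts first, so CRITICAL findings lead the report."""
    CRITICAL = 0
    HIGH = 1
    MODERATE = 2
    INFO = 3

@dataclass
class Finding:
    """One audit finding: severity, exact file-and-line citation, remediation."""
    severity: Severity
    file: str
    line: int
    summary: str
    remediation: str

def prioritize(findings: list[Finding]) -> list[Finding]:
    """Rank findings for the planning phase: most severe first."""
    return sorted(findings, key=lambda f: (f.severity, f.file, f.line))
```

Because audits are read-only, findings from parallel agents can simply be concatenated and re-ranked; no merge logic is needed.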
13.1.2 Phase 2: Plan
Purpose. Transform audit findings into an executable specification: what changes, in what order, by which agents, with what constraints.
How it works. You define the scope (what’s in, what’s out, what’s deferred), the agent teams (which personas own which concerns), and the wave structure (dependency-ordered batches of work).1 The plan includes principles — priority-ordered values that anchor every decision when trade-offs arise — and constraints — what must not change.
A well-structured plan looks like this:
Scope
Auth resolver deduplication, verbose coverage gaps,
CommandLogger migration, unicode cleanup.
Out of scope: New auth providers, CLI help text changes.
Teams
Architecture: python-architect leads.
Owns: type safety, separation of concerns, dead code.
Logging/UX: cli-logging-expert leads.
Owns: verbose coverage, CommandLogger, symbols.
Waves
Wave 0 (foundation): Protocol types, method moves — fully parallel.
Wave 1 (core): Verbose coverage — depends on Wave 0 APIs.
Wave 2 (migration): CommandLogger migration — depends on Wave 1 patterns.
Wave 3 (polish): Unicode cleanup — depends on Wave 2 completeness.
Principles (priority order)
1. SECURITY — no token leaks, no path traversal.
2. CORRECTNESS — tests pass, behavior preserved.
3. UX — world-class developer experience in every message.
4. KISS — simplest correct solution.
5. SHIP SPEED — favor shipping over perfection.
Constraints
Do NOT modify test infrastructure.
Do NOT change CLI command signatures.
Do NOT alter public API return types.

The human decision. You approve the plan. This is the highest-leverage moment in the entire process.2 A mediocre plan with perfect execution produces mediocre software. A great plan with imperfect execution produces great software — because the test gates catch the imperfections. Take your time here. Review the wave dependencies. Question whether the scope is right. Ask whether the wave structure accounts for the files that will change in multiple phases.
Key rule. No implementation starts until you approve the plan. This is the single most important gate.
Enabling capabilities. Background exploration agents to validate planning assumptions: mapping dependency graphs, tracing call chains, verifying that the wave structure accounts for shared files. The planning itself is human judgment; the agents accelerate the information gathering that informs it.
Output. An approved plan with scope, teams, waves, principles, and constraints.
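A plan like the one above can also be captured as data, so the orchestrator can check the strict wave-ordering rule before any agent is dispatched. A hypothetical sketch — the types and field names are assumptions, not a real tool's format:

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    files: frozenset[str]                      # files this task will edit
    depends_on: frozenset[str] = frozenset()   # names of prerequisite tasks

@dataclass
class Plan:
    scope: list[str]
    out_of_scope: list[str]
    principles: list[str]      # priority order, highest first
    constraints: list[str]     # "Do NOT ..." invariants
    waves: list[list[Task]]    # dependency-ordered batches

    def validate(self) -> None:
        """Enforce: no task in wave N may depend on a task in wave N+1."""
        seen: set[str] = set()
        for wave in self.waves:
            for task in wave:
                missing = task.depends_on - seen
                if missing:
                    raise ValueError(f"{task.name} depends on later/unknown tasks: {missing}")
            seen |= {t.name for t in wave}
```

Running `validate()` at approval time catches the most common planning error (a consumer scheduled before its foundation) while it is still cheap to fix.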
13.1.3 Phase 3: Wave Execution
Purpose. Execute the plan in dependency-ordered batches, with validation between each batch.
How it works. Each wave is a set of tasks with no unmet dependencies. The orchestrating tool dispatches parallel agents for each task in the wave, grouped so that no two agents edit the same file simultaneously. Each agent receives precise instructions: which files to change, what patterns to follow, what constraints to respect, and what verification to run before reporting completion. When all agents in a wave finish, the full test suite runs. If tests pass, the wave is committed. If tests fail, you triage.
The wave structure produces a clean commit history (one commit per wave) and guarantees that you can bisect regressions to a specific batch of changes.
The human decision. You approve each wave launch (or, if you trust the plan, authorize the full sequence). You triage any test failures. You intervene on escalation.
Key rule. Every wave ends with green tests and a commit. No exceptions.
Enabling capabilities. Parallel agent dispatch with session isolation: each agent works in a clean context window, receiving only the instructions and code relevant to its task. The orchestrating session coordinates dispatch, collects results, and triggers the test runner between waves. Integration with the project’s existing test suite (or CI pipeline) provides the gate.
Output. A commit per wave, with all tests passing.
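The wave cycle described above — parallel dispatch, test gate, single commit — can be sketched as an orchestration loop. Everything here is illustrative: `dispatch`, `gate`, and `commit` are injected callables standing in for a real tool's agent runner, test suite, and version control, not any specific product's API:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

def run_wave(n: int, tasks: list, dispatch: Callable, gate: Callable[[], bool],
             commit: Callable[[str], None]) -> None:
    """One wave: parallel agents, then the test gate, then exactly one commit."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(dispatch, tasks))   # one agent per task
    if not all(results):
        raise RuntimeError(f"wave {n}: an agent failed -- triage before continuing")
    if not gate():                                  # full test suite must be green
        raise RuntimeError(f"wave {n}: red tests -- wave is NOT committed")
    commit(f"Wave {n}: {len(tasks)} tasks")         # one commit per wave

def pytest_gate() -> bool:
    """Example gate for a Python project: exit code 0 means green."""
    return subprocess.run(["pytest", "-q"]).returncode == 0

def git_commit(message: str) -> None:
    subprocess.run(["git", "commit", "-am", message], check=True)
```

Note the asymmetry the key rule demands: a red gate raises and nothing is committed; there is no "commit anyway" path.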
13.1.4 Phase 4: Validate
Purpose. Confirm the final state before shipping.
How it works. The full test suite runs: unit tests, acceptance tests, and optionally integration or end-to-end tests. You spot-check critical changes: the files with the most complex modifications, the boundary conditions you specified in the plan, the areas where agents were most likely to make subtle errors.
The human decision. You decide whether the changes are ready to ship.
Enabling capabilities. The project’s CI pipeline or test runner — the same infrastructure your team already uses. No special tooling is needed for validation beyond what exists for normal development. The value comes from the process (test after every wave, spot-check critical files), not from additional tools.
Output. A validated changeset.
13.1.5 Phase 5: Ship
Purpose. Commit, push, and merge.
How it works. Update the changelog if it wasn’t updated during wave execution. Push the branch. If CI passes, merge. The commit history (one commit per wave, each with passing tests) makes the PR reviewable and bisectable.
Output. A merged PR.
13.2 Wave Decomposition
The wave structure is where planning becomes engineering. A poorly decomposed set of waves produces merge conflicts, stale context, and cascading failures. A well-decomposed set produces clean parallel execution with natural validation boundaries.
13.2.1 The Dependency Graph
Waves are ordered by dependency. Wave 0 contains tasks with no dependencies: foundational changes that other waves build on. Wave 1 contains tasks that depend on Wave 0 outputs. Wave 2 depends on Wave 1. The dependency is directional and strict: no task in wave N may depend on a task in wave N+1.
flowchart LR
W0["Wave 0"] -->|"No deps"| W1["Wave 1"]
W1 -->|"Wave 0 outputs"| W2["Wave 2"]
W2 -->|"Wave 1 stable"| W3["Wave 3"]
The most common dependency pattern is foundation-before-migration. Type definitions, protocol changes, and method signatures go in Wave 0. Code that uses those new interfaces goes in Wave 1+. If you put both in the same wave, agents will try to both define and consume new APIs simultaneously, and the consumer agents will work against a file state that doesn’t yet include the definitions.
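Deriving waves from a dependency graph is a topological leveling: a task's wave number is one more than the highest wave among its prerequisites, so foundations land in Wave 0 automatically. A minimal sketch, with the task names borrowed from the plan example for illustration:

```python
def assign_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves so every task lands after all its prerequisites.

    deps maps task -> set of tasks it depends on. Raises ValueError on cycles.
    """
    wave_of: dict[str, int] = {}

    def wave(task: str, seen: frozenset = frozenset()) -> int:
        if task in seen:
            raise ValueError(f"dependency cycle through {task}")
        if task not in wave_of:
            prereqs = deps.get(task, set())
            # No prerequisites -> wave 0; otherwise one past the deepest prereq.
            wave_of[task] = 1 + max((wave(p, seen | {task}) for p in prereqs), default=-1)
        return wave_of[task]

    for t in deps:
        wave(t)
    waves: list[list[str]] = [[] for _ in range(max(wave_of.values(), default=-1) + 1)]
    for t, w in sorted(wave_of.items()):
        waves[w].append(t)
    return waves
```

The foundation-before-migration pattern falls out directly: anything with no dependencies is Wave 0, and consumers are pushed to later waves however deep the chain runs.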
13.2.2 The One-File-One-Agent Rule
The one-file-one-agent rule from Chapter 12 shapes wave design more than any other constraint. Within a wave, no two agents may edit the same file. If two logically independent changes both touch the same file, they go in separate waves, or they’re assigned to a single agent that handles both changes in sequence.
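The rule can be enforced mechanically before dispatch: reject (or restructure) any wave in which two tasks touch the same file. A sketch of that pre-flight check:

```python
from collections import defaultdict

def file_conflicts(wave: list[tuple[str, set[str]]]) -> dict[str, list[str]]:
    """Map each file edited by more than one task to the offending task names.

    wave is a list of (task_name, files_it_edits) pairs.
    An empty result means the wave respects one-file-one-agent.
    """
    editors: dict[str, list[str]] = defaultdict(list)
    for name, files in wave:
        for f in files:
            editors[f].append(name)
    return {f: names for f, names in editors.items() if len(names) > 1}
```

When the check flags a file, the fix is the one the rule prescribes: move one task to a later wave, or merge both changes into a single agent's sequential assignment.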
13.2.3 Sizing Waves
The size of a wave affects execution time and risk. Smaller waves (2 to 4 agents) complete faster and are easier to debug when something goes wrong. Larger waves (6 to 10 agents) have higher throughput but are dominated by the slowest agent, and a single failure in a large wave means triaging more changes.
| Factor | Smaller waves (2-4 agents) | Larger waves (6-10 agents) |
|---|---|---|
| Execution time | 3-5 minutes | 8-12 minutes (slowest agent dominates) |
| Debug difficulty | Low — few changes to inspect | High — more changes interacting |
| Commit granularity | Fine — easy to bisect | Coarse — harder to isolate regressions |
| Overhead | Higher — more validation cycles | Lower — fewer cycles |
The decision heuristic: start with smaller waves. Combine tasks into larger waves only when they are genuinely independent (different files, different concerns, no shared state) and when the validation overhead of extra cycles outweighs the debugging advantage.
13.2.4 The Self-Sufficiency Test
Before finalizing a wave, apply this test to each task: can an agent complete this task without asking me a question? If the answer is no — because the task depends on an ambiguous design decision, because the scope is unclear, because the target file has undocumented conventions — the task isn’t ready. Either refine the instructions, split the task, or move it to a later wave where its dependencies are resolved.
Tasks that fail the self-sufficiency test are the primary source of mid-wave escalations. Catching them during planning eliminates interruptions during execution.
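The self-sufficiency test is ultimately a judgment call, but its mechanical preconditions can be encoded as a planning-time checklist. A hedged sketch — the fields are one possible way to externalize the question, not a standard schema:

```python
from dataclasses import dataclass

@dataclass
class TaskSpec:
    name: str
    files: list[str]           # explicit target files (empty = scope unclear)
    instructions: str          # what to change, which patterns to follow
    open_questions: list[str]  # anything the agent would have to ask about

def self_sufficient(task: TaskSpec) -> bool:
    """Can an agent complete this task without asking a question?"""
    return bool(task.files) and bool(task.instructions.strip()) and not task.open_questions

def not_ready(tasks: list[TaskSpec]) -> list[str]:
    """Names of tasks to refine, split, or defer to a later wave."""
    return [t.name for t in tasks if not self_sufficient(t)]
```

Anything surfaced by `not_ready` is a future mid-wave escalation caught at planning time, when it costs minutes instead of an interrupted wave.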
13.3 PR #394: The Worked Example
The meta-process is abstract until you see it execute. This section summarizes a real PR — an auth and logging architecture overhaul on APM, an open-source agent package manager (the author’s implementation of the distribution layer described in Chapters 9 and 10). The full execution log — including 5 escalation events, 8 plan iterations, and anti-pattern mappings — is in The APM Auth + Logging Overhaul case study in Part IV.
Scope. 5 cross-cutting concerns (auth resolver deduplication, verbose logging coverage gaps, CommandLogger migration, unicode symbol cleanup, and test coverage) touching 75 files across the entire source tree. The canonical metrics — files, lines, test counts, agent dispatches, escalations — are in the APM Overhaul case study. This section focuses on how the meta-process phases played out.
A note on timing. The ~90-minute wave execution time reflects an experienced practitioner working with mature instrumentation: battle-tested instruction files, established personas, and conventions already externalized into codebase instrumentation (Chapter 9). Your first time through will take roughly three times as long. The first run is an investment in infrastructure that makes every subsequent run faster.
13.3.1 Timeline
| Phase | Duration | Agents | Outcome |
|---|---|---|---|
| Audit | 3 min | 2 parallel (architecture + logging/UX) | Severity-ranked findings with file-line citations |
| Planning | 5 min | — | 8 iterations; all findings in scope; 2 teams defined |
| Wave 0 — Foundation | 5 min | 2 parallel | Resolver dedup + symbol definitions. Tests green. |
| Waves 1–2 — Core | 8 min | 5 parallel | Verbose logging + CommandLogger migration. Tests green. |
| Wave 2b — Recovery | 7 min | 2 replacement | install.py agent stalled (context exhaustion). Split + re-dispatch. |
| Wave 3 — Polish | 4 min | 1 | Unicode symbol cleanup. Tests green. |
| Ship | 2 min | — | Spot-check, full suite green, CI passed, merged. |
13.3.2 The Three Practitioner Roles in Action
Across wave execution, the human intervened exactly three times, each mapping to one of the three practitioner roles from Chapter 8:
- Architect (during planning): decided to include all severity levels in scope rather than deferring moderate findings. This required judgment about priorities, release timeline, and the cost of context-switching back later.
- Escalation Handler (during Wave 2): diagnosed an agent stall on install.py (58 call sites exhausted the context window), decided to split remaining work across two replacement agents rather than retrying.
- Reviewer (during Wave 2b): triaged a test failure caused by an ordering issue in the migration (a function call was migrated but its setup code wasn’t) and directed a targeted fix.
Three interventions, three roles, no overlap. These are the categories of human judgment that the meta-process surfaces, not eliminates. The case study documents two additional escalation events (a token type correction and a silent NameError) that were caught during checkpoint verification and mapped to anti-patterns in Chapter 14.
13.4 Checkpoint Discipline
A checkpoint is the pause between waves. It is the mechanism that makes the meta-process safe.
13.4.1 Why Test After Every Wave
The alternative (executing all waves and testing at the end) is faster in the best case and catastrophic in the worst case. If wave 3 introduces a regression, and you haven’t tested since wave 0, you don’t know whether the regression was introduced in wave 1, 2, or 3. You can’t bisect. You can’t revert a single wave. You’re debugging a composite changeset that spans the entire execution.
Testing after every wave makes each wave an independently verifiable unit. If wave 2 breaks a test, you know the regression was introduced in wave 2. You can inspect the wave 2 diff, identify the cause, fix it, and continue, without touching waves 0 or 1.
The cost is real: a full test suite run after every wave. In the PR #394 case, that was ~2,850 tests taking approximately 2 minutes per run, across 5 checkpoints (4 waves + final validation) — roughly 10 minutes of testing total. The alternative — debugging a 75-file composite changeset without bisection points — would have cost hours.
13.4.2 What Happens at a Checkpoint
Each checkpoint has four components:
Test gate. The full test suite runs. If any test fails, the wave is not committed. The failure is triaged: either fixed immediately (if the cause is obvious) or escalated to the human.
Spot-check. The human reviews a sample of changes. Focus on boundary conditions, pattern compliance, and scope discipline. Did the agent handle the edge case? Did it follow existing patterns or invent new ones? Did it change only what was specified?
Commit. Every wave gets its own commit with a descriptive message. This creates a clean, bisectable history.
Plan review. Optionally, the human reviews the remaining plan and adjusts if the current wave revealed something unexpected: a dependency that was missed, a task that should be split, a wave that should be reordered.
13.4.3 The ADAPT Loop
When a checkpoint fails (tests are red, an agent is stuck, a dependency was missed), the meta-process doesn’t stop. It adapts.
flowchart LR
D["DETECT<br/>Failure or stall"] --> DG["DIAGNOSE<br/>Root cause"]
DG --> A["ADJUST<br/>Modify plan"]
A --> E["EXECUTE<br/>Re-run wave"]
E -.->|"If new issue"| D
The ADAPT loop connects wave execution back to planning. It is not a fallback. It is a designed part of the process — the mechanism that handles the irreducible uncertainty of working with non-deterministic systems on complex codebases.
In the PR #394 case, the ADAPT loop fired once during wave execution: during Wave 2, when an agent stalled on install.py (58 call sites). The diagnosis was that the file was too large for a single agent session. The adjustment was to split the remaining work across two agents. The re-execution completed successfully. Total cost: 7 minutes. The case study documents additional escalation events caught during checkpoint verification.
The key discipline: adaptation is conservative. You add tasks, split tasks, reorder waves. You do not skip validation. You do not merge unvalidated work. The checkpoint discipline holds even — especially — when things go wrong.
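The detect/diagnose/adjust/execute cycle can be sketched as a bounded retry loop, with the adjustment hook as the point where conservative, human-guided plan changes (split, reorder, add tasks) are applied. All names here are illustrative, and the Wave 2 stall from the case study is used only as the test scenario:

```python
from typing import Callable

def execute_with_adapt(run: Callable[[list], None], tasks: list,
                       adjust: Callable[[list, Exception], list],
                       max_attempts: int = 3) -> None:
    """DETECT (exception) -> DIAGNOSE/ADJUST (human-guided hook) -> EXECUTE again.

    Adaptation is conservative: adjust may split or reorder tasks,
    but the validation inside run() is never skipped.
    """
    for attempt in range(max_attempts):
        try:
            run(tasks)   # includes the test gate; red tests or stalls raise
            return
        except Exception as failure:
            tasks = adjust(tasks, failure)   # e.g. split a stalled task in two
    raise RuntimeError(f"wave still failing after {max_attempts} attempts; stop and replan")
```

The bound matters: if a wave keeps failing after several adjustments, the problem is the plan, and the loop hands control back to the planning phase rather than grinding on.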
13.5 Session Management
Session discipline follows the principles established in Chapter 12. In the meta-process context, two additional considerations apply:
Phase transitions are natural reset boundaries. Each wave is a logically complete unit of work. Starting a fresh session for each wave means every agent works with a clean context window, loaded with only the instructions and code relevant to its specific task. The orchestrating session is the exception: it runs long intentionally, accumulating task status, wave history, and decision rationale.
Parallel agents produce cleaner context, not just faster execution. The meta-process uses parallel isolated agents rather than a single agent executing tasks sequentially. This is a context quality decision as much as a throughput decision.
13.6 Adapting the Meta-Process
The PR #394 case (see case study for full metrics) involved multiple waves including one recovery wave. Not every change is that large. The meta-process scales in both directions.
13.6.1 Small Changes (fewer than 10 files)
For focused changes within a single concern (fixing a bug, adding a feature to one module, refactoring a small subsystem), the full wave structure is overhead. The meta-process compresses:
- Audit becomes a single expert agent reviewing the relevant files.
- Plan becomes a mental model: you know the scope, there’s one wave.
- Execution is a single wave with 1-2 agents.
- Validate and Ship are unchanged.
The checkpoint discipline still applies: test before committing. The planning discipline still applies: know what you’re changing before you change it. What changes is the formality, not the structure.
13.6.2 Large Changes (more than 100 files)
For changes that span a significant portion of the codebase (a framework migration, a cross-cutting security hardening, a major API version bump), the meta-process extends:
- Audit uses more expert agents (4-6), each covering a different subsystem or concern.
- Plan requires more waves (6-10), with careful dependency mapping and explicit scope boundaries for each.
- Execution may use a two-team structure with distinct agent personas: an architecture team for cross-cutting changes and a domain team for concern-specific changes.
- The ADAPT loop is more likely to fire, and the plan should anticipate it. Leave slack in the wave structure for recovery waves.
The scaling property to preserve: each wave remains independently verifiable. If a 200-file change is decomposed into 8 waves of 25 files each, each wave is still a self-contained, testable unit. The total complexity grows; the complexity of any single checkpoint does not.
13.7 What the Meta-Process Produces
When followed, the meta-process produces four things that manual development typically does not.
Bisectable history. Every wave is a separate commit with passing tests. If a bug surfaces after merge, you can bisect to the exact wave that introduced it. This is not possible with the typical “one giant commit per feature” or “squash everything” approaches.
Auditable decisions. The plan documents what was decided and why. The checkpoint records document what happened at each validation point. The escalation records document what required human judgment and what the judgment was. A reviewer reading the PR has a complete record of the decision chain, not just the final code.
Reproducible process. The meta-process is the same regardless of who executes it or which tool orchestrates it. A different developer, with the same codebase, the same context files, and the same plan, would produce substantially similar output. This is our hypothesis, not a tested claim. Controlled reproducibility experiments would strengthen it. The non-determinism of AI agents is bounded by the determinism of the process around them.
Proportional cost. The time spent scales with the change scope, not the full codebase size — though we have verified this only at the scale documented in the reference case study. The agent works on the files in the plan. The test suite validates the behavior. Nothing else matters.
The meta-process is the operational core of everything this book teaches. PROSE provides the constraints. Context engineering provides the information. Agent primitives encode the knowledge. The meta-process is how they combine into shipped code.
But following a process correctly is only half the challenge. The other half is recognizing when it’s going wrong — when an anti-pattern is forming, when a failure mode is emerging, when the process is producing output that looks correct and isn’t. The next chapter documents what those failures look like and how to prevent them.
This decomposition pattern echoes Conway’s Law: “Organizations which design systems are constrained to produce designs which are copies of the communication structures of these organizations.” Here, agent team boundaries mirror module boundaries by design, not accident.↩︎
Brooks makes the same argument in The Mythical Man-Month (1975): “Plan to throw one away; you will, anyhow.” The meta-process inverts this — invest disproportionately in the plan so you don’t have to throw the execution away.↩︎