5  Governance for AI-Assisted Delivery

An AI agent on your team just merged a pull request that touches your payment processing code. Who approved it? What data did the agent access during generation? Can your auditor trace the decision chain from business requirement to deployed change? If you cannot answer these questions today, your governance framework has a gap exactly where your AI investment is growing fastest.


5.1 The Governance Gap

Software governance has always assumed human actors at every decision point. Code review policies name individuals. Access controls map to employee identities. Audit trails trace decisions to people who can explain their reasoning in a meeting. Compliance frameworks (SOC 2, ISO 27001:2013 and its 2022 revision, PCI DSS) require demonstrating that authorized individuals made deliberate choices about what code runs in production.

AI agents break this assumption.

An agent that writes code, opens a pull request, responds to review feedback, and triggers a deployment pipeline is a participant in your software delivery lifecycle. It is not a tool in the way a linter or compiler is a tool; those produce the same output from the same input, every time. An agent interprets instructions, makes judgment calls about implementation, and can produce different output from the same input on different runs. Your governance framework almost certainly has no category for that.

The gap shows up in three places:

Audit trails end at the human-agent boundary. Your version control system records that a commit was authored by a developer. It does not record that the developer delegated the work to an agent, what instructions the agent received, what context it consumed, what alternative approaches it considered and rejected, or how much of the final code the developer actually reviewed versus rubber-stamped. The commit history tells a story that is technically accurate and functionally misleading.

Approval workflows assume reviewers understand the code. A human reviewer approving agent-generated code faces a fundamentally different task than reviewing human-written code. Human code reflects the author’s thought process; reviewers can follow the logic because it was produced by a mind that works like theirs. Agent code is the output of a process that optimizes for statistically likely token sequences, which makes it plausible but does not make it correct. It can be syntactically correct, pass all tests, and still contain subtle misunderstandings of intent that a human author would never produce. Review processes designed for human code are necessary but insufficient for agent code.

Security boundaries were designed for human threat models. Your data classification policies, network access rules, and secret management practices assume that the entity accessing sensitive systems is a human employee whose behavior is constrained by training, judgment, and legal accountability. An agent with access to your codebase, your CI/CD pipeline, and your cloud credentials operates under a different set of constraints: specifically, whatever constraints you explicitly configure. What you do not restrict, the agent will eventually touch.

None of this means agents are ungovernable. It means your existing governance framework needs extension, not replacement. The sections that follow provide the structure for that extension.


5.2 Governance Readiness Checklist

Governance for AI-assisted delivery spans six areas. Each exists on a maturity spectrum. The checklist below is designed for self-assessment: locate your organization on each row, then prioritize the gaps that carry the most risk in your context.

| # | Capability | None | Basic | Enterprise |
| --- | --- | --- | --- | --- |
| 1 | Audit trails | No record of which code was agent-generated. Commits attributed to the prompting developer with no distinction. | Agent contributions tagged in commit metadata or PR labels. Prompt history retained for a defined period. | Full provenance chain: instruction given, context consumed, output produced, human review decision, and rationale — all queryable and linked to compliance artifacts. |
| 2 | Agent access controls | Agents run with the developer’s full credentials. No distinction between human and agent access scope. | Agents operate under scoped tokens with reduced permissions. File system and network access restricted to declared boundaries. | Least-privilege agent identities with per-task credential issuance, automatic expiration, and separate audit logging for agent actions. |
| 3 | Approval workflows | Standard code review applies identically to human and agent code. No additional scrutiny for agent output. | Agent-generated PRs are flagged for enhanced review. Critical paths (auth, payments, data access) require human sign-off regardless of author. | Risk-tiered review: agent output touching sensitive systems routed through security-aware reviewers with checklist-based verification. Approval latency tracked as a metric. |
| 4 | Data boundary enforcement | No controls on what data agents can access during code generation. Proprietary code, secrets, and customer data may enter agent context. | Agents restricted from accessing production data and secrets. Code sent to external models reviewed against data classification policy. | Data loss prevention integrated into agent workflows. Context filters prevent classified data from entering model prompts. Residency requirements enforced per jurisdiction. |
| 5 | Cost controls | No visibility into agent-related compute or API spend. Costs absorbed into general cloud bills. | Per-team or per-project token budgets. Alerts on unusual consumption. Monthly cost reporting. | Real-time cost attribution per agent task. Automated circuit breakers on runaway sessions. Cost-per-feature tracking integrated into project planning. |
| 6 | Compliance reporting | Cannot demonstrate to an auditor how agent-generated code is governed. Compliance posture unknown. | Periodic manual reports on agent usage, access scope, and review rates. Policies documented but enforcement is process-dependent. | Automated compliance dashboards. Agent governance artifacts generated alongside code. Audit-ready evidence exportable on demand. Policy enforcement is systemic, not procedural. |
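At the “Basic” tier, commit metadata is the lightest-weight place to record agent provenance. Below is a minimal sketch in Python; the git-trailer names (Agent-Generated, Agent-Model, Agent-Prompt-Ref) are hypothetical, not a standard, so pick your own and enforce them with a commit-msg hook or CI check.

```python
# Sketch: recording agent provenance as git commit trailers ("Basic" tier).
# Trailer names below are illustrative assumptions, not an established standard.

AGENT_TRAILERS = ("Agent-Generated", "Agent-Model", "Agent-Prompt-Ref")

def add_provenance(commit_msg: str, model: str, prompt_ref: str) -> str:
    """Append agent-provenance trailers to a commit message."""
    trailers = [
        "Agent-Generated: true",
        f"Agent-Model: {model}",
        f"Agent-Prompt-Ref: {prompt_ref}",  # ID of the retained prompt log
    ]
    return commit_msg.rstrip() + "\n\n" + "\n".join(trailers) + "\n"

def parse_provenance(commit_msg: str) -> dict:
    """Extract agent trailers; an empty dict means untagged (assume human)."""
    found = {}
    for line in commit_msg.splitlines():
        key, sep, value = line.partition(": ")
        if sep and key in AGENT_TRAILERS:
            found[key] = value
    return found

msg = add_provenance("Fix rounding in invoice totals", "model-x", "prompt-log/4821")
assert parse_provenance(msg)["Agent-Generated"] == "true"
assert parse_provenance("Plain human commit") == {}
```

Because trailers live in the commit object itself, a standard `git log` query can later enumerate agent-tagged changes, which is the raw material for the audit-trail capability in row 1.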

Most organizations operating at Phase 3 (agentic coding, as described in Chapter 2) will find themselves in the “None” or “Basic” column for at least four of these six areas. That is expected. The purpose of the checklist is not to achieve “Enterprise” everywhere. It is to ensure you are not at “None” in any area that carries material risk for your business.

A note on the Enterprise column. The rightmost column describes a target state. Some of its requirements (full provenance chains, real-time cost attribution per agent task, automated compliance dashboards) exceed what current-generation tooling delivers out of the box. Treat the Enterprise column as directional, not immediate. When your compliance team asks “when do we get there,” the honest answer is: some capabilities are available now with custom integration; others depend on tooling maturity that, as of mid-2025, remains ahead of generally available products. Plan accordingly, and do not let the aspiration prevent progress on Basic.

Where to start. Audit trails and agent access controls are the two capabilities that unblock everything else. Without knowing what agents did and limiting what they can do, the other four capabilities have no foundation. If your assessment shows “None” in these areas, start here.

```mermaid
flowchart TD
    START["Assess 6<br/>governance capabilities"] --> Q1{"Any capability<br/>at 'None'?"}
    Q1 -->|Yes| FIX["Start here:<br/>Audit Trails +<br/>Access Controls"]
    Q1 -->|No| Q2{"All at<br/>'Basic' or above?"}
    Q2 -->|Yes| EXPAND["Safe to expand<br/>agent adoption"]
    Q2 -->|No| FIX
    FIX --> REASSESS["Reassess<br/>quarterly"]
    EXPAND --> MATURE["Invest toward<br/>'Enterprise' per<br/>regulatory scope"]
    REASSESS --> Q1
```

Figure 5.1: Governance capability quick-start assessment

The governance floor: no capability at “None” before expanding agent adoption. Start with audit trails and access controls — they unblock everything else.

5.2.1 Compliance Framework Mapping

A checklist without regulatory context is a conversation starter, not a decision tool. The matrix below maps each capability row to the compliance frameworks where it is critical. Use it to prioritize: find your regulatory scope in the columns, then focus on the rows marked as critical for that scope.

| # | Capability | SOC 2 | ISO 27001 | PCI DSS | HIPAA | EU AI Act |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Audit trails | Critical — CC8.1 (change management), CC7.2 (monitoring) | Critical — A.12.4 (logging and monitoring) | Critical — Req. 10 (track and monitor access) | Critical — §164.312(b) (audit controls) | Critical — Art. 12 (record-keeping) |
| 2 | Agent access controls | Critical — CC6.1, CC6.3 (logical access, least privilege) | Critical — A.9.2, A.9.4 (access management, access control) | Critical — Req. 7, Req. 8 (restrict access, identify users) | Critical — §164.312(a) (access control) | Relevant — Art. 14 (human oversight) |
| 3 | Approval workflows | Critical — CC8.1 (change management) | Relevant — A.14.2 (secure development) | Critical — Req. 6 (secure systems) | Relevant — §164.308(a)(5) (security awareness) | Critical — Art. 14 (human oversight of high-risk AI) |
| 4 | Data boundary enforcement | Critical — CC6.7 (data transmission), C1.1 (confidentiality) | Critical — A.13.2 (information transfer) | Critical — Req. 3, Req. 4 (protect stored data, encrypt transmission) | Critical — §164.312(e) (transmission security) | Relevant — Art. 10 (data governance) |
| 5 | Cost controls | Relevant — CC3.1 (risk assessment) | Relevant — A.12.1 (operational planning) | Not directly scoped | Not directly scoped | Not directly scoped |
| 6 | Compliance reporting | Critical — CC4.1 (monitoring activities) | Critical — A.18.2 (compliance review) | Critical — Req. 12 (security policy) | Critical — §164.308(a)(8) (evaluation) | Critical — Art. 13 (transparency) |

How to read this. If you are SOC 2-scoped, rows 1, 2, 3, 4, and 6 are critical — you will face audit findings if any of these are at “None.” If you handle payment data under PCI DSS, rows 1, 2, 3, and 4 are your floor. If you ship to EU markets and your product touches high-risk categories, the EU AI Act makes rows 1, 3, and 6 non-negotiable. Start where your regulatory exposure intersects with your lowest maturity.
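The prioritization rule in this paragraph is mechanical enough to encode. A sketch follows, with the “Critical” cells of the matrix transcribed into a lookup table keyed by the checklist row numbers; the function name is illustrative.

```python
# Sketch: prioritizing governance work from the Section 5.2.1 mapping.
# CRITICAL transcribes the "Critical" cells of the matrix, keyed by
# checklist row number (1=audit trails ... 6=compliance reporting).

CRITICAL = {
    "SOC 2": {1, 2, 3, 4, 6},
    "ISO 27001": {1, 2, 4, 6},
    "PCI DSS": {1, 2, 3, 4, 6},
    "HIPAA": {1, 2, 4, 6},
    "EU AI Act": {1, 3, 6},
}

def priority_rows(frameworks: list, maturity: dict) -> list:
    """Rows that are critical for your regulatory scope AND still at 'None'."""
    in_scope = set().union(*(CRITICAL[f] for f in frameworks))
    return sorted(row for row in in_scope if maturity.get(row) == "None")

maturity = {1: "Basic", 2: "None", 3: "None", 4: "Basic", 5: "None", 6: "Basic"}
assert priority_rows(["PCI DSS"], maturity) == [2, 3]
```

Note that row 5 (cost controls) never surfaces here for a PCI-only scope, matching the “Not directly scoped” cell: cost governance matters for the business, but it is not a regulatory gap.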


5.3 Risk Taxonomy

Agent-introduced risk falls into six categories. Each has specific mechanisms, concrete manifestations, and identifiable owners. The taxonomy is not theoretical — these are risks that organizations adopting agentic development are encountering now.

The consolidated table below captures all six risk categories with representative examples, mitigations, and owners. Three risks that introduce novel failure modes — quality degradation, knowledge atrophy, and supply chain integrity — are expanded in the sections that follow.

| Category | Risk | Example | Mitigation | Owner |
| --- | --- | --- | --- | --- |
| IP & data exposure | Proprietary code sent to external model | Agent context includes auth module source; developer uses cloud-hosted model without enterprise data agreement | Enforce enterprise-tier agreements with training opt-out. Deploy context filters. Maintain data classification policy covering agent workflows. | Security / Legal |
| | Training data reproduced in output | Agent generates a near-exact copy of a GPL-licensed implementation, merged without review | Integrate license-scanning tools into CI. Flag agent-generated code for IP review in sensitive components. | Legal / Engineering |
| Quality degradation | Plausible incorrectness | Agent implements a data pipeline that passes all tests but silently drops null values the business logic depends on | Require property-based or invariant tests for agent-generated code in critical paths. Review output against ADRs. | Engineering leads |
| | Convention drift | Fifty agent-generated files use three different error-handling patterns; none match the team standard | Encode conventions as structured context (instruction files, linters, architectural rules) that agents consume during generation. | Tech leads / Architects |
| Dependency & concentration | Model outage | Primary model provider has a 4-hour outage during a release sprint; team cannot complete agent-assisted tasks | Maintain fallback model configurations. Ensure critical workflows degrade gracefully to human-only execution. Test fallback quarterly. | Platform / Engineering |
| | Vendor lock-in | Organization has 2,000 tool-specific instruction files; switching tools requires rewriting all of them | Use portable, vendor-neutral formats for context artifacts. Separate content from format. | Architecture / Platform |
| Knowledge atrophy | Debugging skill loss | Junior engineers cannot diagnose a production issue because they never debugged code without agent assistance | Require regular unassisted development exercises. Pair juniors with agent output for review practice. | Engineering managers |
| | Architectural reasoning decay | Team cannot redesign a subsystem because no one has practiced trade-off decisions outside agent-provided constraints | Rotate architecture review responsibilities. Include constraint-design tasks in sprint work. | Architecture / CTO |
| Regulatory liability | Implicit compliance violation | Agent generates a logging module that captures user IP addresses and geolocation where this requires explicit consent | Define compliance constraints as explicit agent context for regulated code paths. Require compliance-aware review. | Legal / Security |
| | Accountability gap | Regulator asks who decided to store customer data in a specific format; the decision was made by an agent in a 50-file PR | Maintain decision logs for agent-generated code in regulated areas. Include “compliance-relevant choices” in PR review checklists. | Engineering leads / Legal |
| Supply chain & context integrity | Prompt injection via dependency | A transitive dependency README includes hidden instructions causing the agent to exfiltrate environment variables | Restrict agent context to vetted, first-party sources for sensitive operations. Apply context sanitization. | Security / Platform |
| | Compromised instruction files | Attacker subtly modifies an agent instruction file via PR, causing generated auth code to include a backdoor pattern | Apply code review and change-management controls to instruction files with the same rigor as production code. | Security / Engineering leads |

5.3.1 Quality degradation: the silent failure mode

The failure mode of a weak model is obvious: the code doesn’t work. The failure mode of a strong model with poor context is insidious: the code works, passes tests, and silently violates architectural invariants that no test covers.

Agent-generated code introduces quality risks that are fundamentally different from human-written bugs. Plausible incorrectness means agents produce code that reads well and compiles cleanly but misunderstands the intent — a function that returns correct results for all test cases but uses O(n²) complexity where O(n) was required, or a database query that produces correct output but bypasses the caching layer. Hallucinated dependencies means agents reference APIs or methods that don’t exist or have been deprecated; when the hallucination happens to compile, the failure is deferred to production. Convention drift means agents without access to your team’s conventions produce code that works but doesn’t belong — inconsistent error handling, non-standard logging, creative-but-wrong module structure. Each instance is minor. At scale, it degrades the codebase coherence that lets your team navigate and modify code confidently.
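The “silently drops null values” example from the taxonomy table shows why invariant tests are the recommended mitigation: example-based tests built on clean data will pass. A minimal sketch, with a deliberately buggy stand-in for a pipeline step:

```python
# Sketch: an invariant test that catches the "silently drops null values"
# failure mode. transform() is a deliberately buggy stand-in for a pipeline
# step; the invariant under test is cardinality preservation.

def transform(rows: list) -> list:
    # Bug: the comprehension's filter quietly drops rows with a None amount.
    return [{"id": r["id"], "amount": r["amount"] * 2}
            for r in rows if r["amount"] is not None]

def row_count_preserved(rows: list) -> bool:
    """Invariant: the pipeline step must not change the number of rows."""
    return len(transform(rows)) == len(rows)

clean = [{"id": 1, "amount": 10}, {"id": 2, "amount": 5}]
with_nulls = clean + [{"id": 3, "amount": None}]

assert row_count_preserved(clean)           # example-based tests on clean data pass
assert not row_count_preserved(with_nulls)  # the invariant exposes the silent drop
```

A property-based framework would generate inputs like `with_nulls` automatically; the point is that the check asserts a property of the data flow, not a specific expected output.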

5.3.2 Knowledge atrophy: the aviation parallel

This is the least discussed and most consequential long-term risk. When agents handle tasks that humans used to perform, humans get less practice at those tasks. Over months and years, the team’s collective ability to perform those tasks without agent assistance erodes.

Knowledge atrophy is not hypothetical. It follows patterns well-documented in aviation and financial analysis. Airline pilots who rely on autopilot for routine flying are measurably less proficient at manual flying — a fact the industry addresses with mandatory manual-flying requirements. Financial analysts who rely on automated models are less able to identify model failures, which is why regulatory frameworks require human understanding, not just human approval.

In software development, the specific atrophy risks are:

  • Debugging skills. If agents write the code and agents fix the bugs, junior engineers never develop the debugging intuition that comes from struggling with code they wrote themselves.
  • Architectural reasoning. If agents make implementation decisions within provided constraints, engineers get less practice reasoning about trade-offs outside those constraints, the kind of reasoning required when the constraints themselves need to change.
  • Review depth. If reviewers habitually approve agent-generated code that passes tests, the skill of deep code review (reading for intent, not just correctness) atrophies.

Knowledge atrophy does not produce failures in the short term. It produces an organization that cannot recover when agent assistance is unavailable, cannot evaluate whether agent output is correct in novel situations, and cannot train the next generation of engineers. The mitigation is not to avoid agents — it is to design deliberate practice into your development process, the way aviation designs manual-flying requirements into pilot training.

5.3.3 Supply chain and context integrity: the new attack surface

Your agents consume context — instruction files, documentation, configuration, code from dependencies — and that context is an attack surface. Supply chain risk for AI-assisted development extends beyond traditional dependency vulnerabilities into a new category: context poisoning.

Prompt injection via context means an agent that reads repository files, fetches documentation, or consumes dependency metadata can be influenced by adversarial content planted in those sources. A malicious instruction in a dependency’s README or a carefully crafted comment in imported code can alter agent behavior. This is not speculative — prompt injection is an active area of security research and a documented attack vector against LLM-integrated systems.

Compromised instruction files are especially dangerous because your agent instruction files are code that governs code. If an attacker gains write access (through a compromised dependency, a supply chain attack, or a malicious contribution), they can influence every line of agent-generated code without modifying a single source file.
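A content scan for hidden-Unicode payloads in instruction files is straightforward to prototype. The sketch below covers the attack classes named in this chapter (bidirectional overrides, tag characters, variation selectors); the range list is a starting point, not an exhaustive threat model.

```python
# Sketch: scanning instruction files for hidden-Unicode payloads before
# they reach agent-readable directories. Extend the ranges for your threat model.

SUSPICIOUS_RANGES = [
    (0x202A, 0x202E),    # bidirectional embedding/override controls
    (0x2066, 0x2069),    # bidirectional isolate controls
    (0xFE00, 0xFE0F),    # variation selectors
    (0xE0000, 0xE007F),  # tag characters (an invisible ASCII mirror)
    (0xE0100, 0xE01EF),  # variation selectors supplement
]

def find_hidden_unicode(text: str) -> list:
    """Return (offset, codepoint) pairs for suspicious characters."""
    hits = []
    for i, ch in enumerate(text):
        cp = ord(ch)
        if any(lo <= cp <= hi for lo, hi in SUSPICIOUS_RANGES):
            hits.append((i, f"U+{cp:04X}"))
    return hits

clean = "Always use the team's error-handling wrapper."
poisoned = "Always use the team's wrapper.\u202e hidden payload"
assert find_hidden_unicode(clean) == []
assert find_hidden_unicode(poisoned) != []
```

Wired into CI as a blocking check on the directories agents read, this turns “apply change-management controls to instruction files” from a policy statement into an enforced gate.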

Warning — Organizational Policy: The Permanent Governance Gap

One limitation of AI-assisted governance that no model improvement will resolve: organizational policy awareness lives nowhere in training data. An agent can enforce coding standards from a rules file. It can run 22 automated policy checks in CI. But it cannot know that your organization’s legal team requires review for any feature touching PII, or that a PR linking a personal asset from a corporate repository creates a compliance risk — unless that policy is explicitly encoded in the context layer. This is why governance primitives (Chapter 9) must include organizational policies, not just technical standards. The Growth Engine case study documents this finding in detail: fifteen agent personas across seven expert panels missed a compliance constraint that a human caught in seconds.


5.4 Regulatory Landscape

This section provides awareness of regulatory frameworks that intersect with AI-assisted software development. It is not legal advice. Specific requirements vary by jurisdiction, industry, and use case. Consult qualified legal counsel for your organization’s compliance obligations.

That said, ignorance is not a viable compliance strategy. The frameworks below are the ones most likely to affect engineering organizations using AI agents in production.

5.4.1 EU AI Act

The EU AI Act, which entered into force in August 2024 with phased enforcement through 2027, classifies AI systems by risk tier. Code-generating agents are not, by default, classified as high-risk, but the software they produce may be. If your agents generate code for systems that the Act classifies as high-risk (medical devices, critical infrastructure, safety components), the governance requirements for those systems extend to your development process, including how the code was generated.

Key requirements that affect AI-assisted development: transparency obligations (users must know when they are interacting with AI), record-keeping requirements (logs of AI system behavior), and human oversight provisions (meaningful human control over AI system outputs). Organizations shipping to EU markets should evaluate whether their agent-assisted development process can satisfy these requirements for the risk tier of their product.

5.4.2 SOC 2

SOC 2 audits evaluate controls related to security, availability, processing integrity, confidentiality, and privacy. If your organization undergoes SOC 2 audits, the auditor will eventually ask how AI-generated code changes are governed. The question is when, not whether.

The relevant controls span change management (how agent-generated changes are authorized and reviewed), access management (what systems and data agents can reach), and monitoring (how agent behavior is logged and reviewed). Organizations that cannot produce audit trails for agent-generated changes (who requested it, what the agent accessed, who approved the result) will face findings in their next audit cycle.

5.4.3 Data residency

Model API calls transmit code to infrastructure operated by the model provider. For organizations subject to data residency requirements, whether from regulation (GDPR, sector-specific rules) or contractual obligation, the location where agent context is processed matters. Most major providers offer regional deployment options at enterprise tiers. Verify that your agent tooling configuration routes data through compliant infrastructure, and document the verification.
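That verification can be automated as a CI-time check against an allow-list of regional endpoints. A sketch, with hypothetical hostnames standing in for your provider's actual regional endpoints:

```python
# Sketch: a CI-time data-residency check. The hostnames are hypothetical
# placeholders; substitute the regional endpoints named in your provider's
# enterprise agreement, and keep the check failing closed.

ALLOWED_ENDPOINTS = {
    "eu": {"api.eu.example-model.com"},
    "us": {"api.us.example-model.com"},
}

def endpoint_is_compliant(configured_host: str, required_region: str) -> bool:
    """Fail closed: unknown regions or unlisted hosts are non-compliant."""
    return configured_host in ALLOWED_ENDPOINTS.get(required_region, set())

assert endpoint_is_compliant("api.eu.example-model.com", "eu")
assert not endpoint_is_compliant("api.us.example-model.com", "eu")
```

Running this against your agent tooling configuration on every change gives you the documented, repeatable verification the paragraph above calls for.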

| Framework | Relevance to agent-assisted development | Key requirement | Recommended posture |
| --- | --- | --- | --- |
| EU AI Act | Software built by agents may inherit risk classification of the deployed system. | Transparency, record-keeping, human oversight for high-risk applications. | Map your products to risk tiers. Evaluate whether your agent governance satisfies the tier’s requirements. |
| SOC 2 | Auditors will ask about change management for agent-generated code. | Demonstrable controls for authorization, review, and monitoring of all code changes. | Extend existing change management controls to cover agent-generated changes explicitly. Build audit trail capability. |
| GDPR / Data residency | Agent context may be transmitted to model provider infrastructure in different jurisdictions. | Data processing must comply with residency and transfer requirements. | Verify model API routing. Use enterprise agreements with data processing addenda. Document compliance. |
| PCI DSS | Agents generating code that handles payment data must operate within PCI scope. | Restrict agent access to cardholder data environments. Log all agent interactions with payment systems. | Include agent access in your PCI scope assessment. Apply the same controls as human developer access. |
| HIPAA | Agents generating code for health data systems must comply with PHI protections. | Agent context must not include protected health information unless compliant safeguards are in place. | Exclude PHI from agent context. Use on-premises or BAA-covered model deployments for health data systems. |

5.5 Board Reporting Template

Leaders need to communicate AI agent adoption status to executive and board audiences. The template below provides a one-page format that covers the four areas boards ask about: what is happening, what it costs, what the risks are, and what decisions are needed.

A snapshot of current numbers alone is just a status email. A governance artifact shows where you are, where you are going, and whether you are on track. The template includes targets and trends for every metric row — without them, the board cannot distinguish progress from noise.

AI-Assisted Development — Quarterly Status

| Section | Metric | Current | Target | Trend |
| --- | --- | --- | --- | --- |
| Adoption | Developers using agent tools | e.g., 120 of 400 (30%) | 80% by Q4 | ↑ from 18% last quarter |
| | PRs with agent-generated code | e.g., 22% | 40% by Q4 | ↑ from 12% |
| | Phase maturity | e.g., Phase 2 (conversational) | Phase 3 (agentic) by year-end | Advanced from Phase 1 in Q1 |
| Value | Cycle time (agent-assisted vs. baseline) | e.g., −18% on eligible tasks | −25% | ↑ improving (was −11%) |
| | Deployment frequency | e.g., 3.2/week | 4/week | → flat |
| | Developer satisfaction (survey) | e.g., 7.4/10 | ≥7.5 | ↑ from 6.8 |
| Cost | Tool licensing | e.g., $42K/quarter | ≤$50K | → stable |
| | Model API / token spend | e.g., $28K/quarter | ≤$35K | ↑ from $19K (adoption growth) |
| | Total cost of ownership | e.g., $85K/quarter | ≤$100K | ↑ tracking to plan |
| Risk | Governance readiness (lowest capability) | e.g., Basic in 4/6 areas | Basic in 6/6 by Q3 | ↑ was None in 3/6 |
| | Open audit findings (agent-related) | e.g., 2 open | 0 | ↓ from 5 |
| | Agent-related incidents | e.g., 1 this quarter | 0 | → flat |
| | Data boundary compliance | e.g., Compliant | Maintain | → stable |
| | Insurance / liability coverage | e.g., E&O and cyber reviewed; agent clause pending | Agent-specific coverage confirmed | In progress |

Decisions needed: Budget approval for next quarter. Data classification policy update requiring board awareness. Vendor contract renewal. Risk acceptance for identified gaps.

The template is deliberately brief. Board reporting should communicate status and surface decisions, not educate the audience on how agents work. The trend column is the most important: it tells the board whether the investment is producing directional progress or whether intervention is needed.


5.6 From Rules to Runway

Governance has an image problem. Engineers associate it with bureaucracy: approval queues that slow delivery, compliance checklists that exist for auditors rather than developers. If you position AI governance as another layer of restriction, adoption will route around it.

The reframe is straightforward: governance enables velocity by establishing the trust boundaries within which teams can move fast. Consider the parallel to automated testing. Before comprehensive test suites became standard practice, every deployment required extensive manual verification. The “governance” (testing) slowed individual changes. But organizations with strong test suites deploy more frequently, not less, because each deployment carries lower risk and requires less manual scrutiny.

Agent governance works the same way. An organization with clear audit trails, scoped agent permissions, and risk-tiered review processes can give agents more autonomy in low-risk areas — because the controls exist to catch problems in high-risk ones. Without governance, every agent interaction carries ambiguous risk, which means cautious organizations restrict agent use broadly, and incautious organizations expose themselves to risks they cannot quantify.

The governance checklist in this chapter is not a ceiling. It is a floor. Build it, and you create the conditions for your teams to adopt agents aggressively where the risk is managed, rather than timidly everywhere because the risk is unknown.


5.7 Chapter Checklist

Use this as a starting point. Adapt the specifics to your organization’s risk profile, regulatory environment, and adoption stage.

  1. Conduct a governance readiness self-assessment using the six areas. Use the compliance framework mapping to prioritize based on your regulatory scope.
  2. Prioritize audit trails and agent access controls if you are currently at “None” in either.
  3. Classify your agent-introduced risks across all six taxonomy categories. Assign owners.
  4. Map your products to relevant regulatory frameworks. Evaluate gaps specific to agent-assisted development.
  5. Review your agent instruction files and context sources for supply chain integrity. Apply change-management controls.
  6. Establish a board reporting cadence. Use the template — with targets and trends — or adapt it to your existing format.
  7. Review your code review process. Verify it accounts for the specific failure modes of agent-generated code, including implicit compliance decisions.
  8. Document your data boundary policy for agent workflows. Verify enforcement is systemic, not procedural.
  9. Design deliberate practice into your development process to mitigate knowledge atrophy.
  10. Test your fallback. Verify your team can sustain delivery if agent assistance is unavailable for 48 hours.
  11. Confirm your E&O and cyber insurance policies address agent-generated code. Raise the question with your CFO before the board does.
  12. Schedule a quarterly governance review. Agent capabilities and regulatory requirements both move fast.
Tip — These Governance Principles Have Concrete Implementations

The governance framework above is not theoretical. The APM project implements each principle as CI-enforceable infrastructure:

  • Lock file audit trails pin every agent configuration to exact commit SHAs with full dependency provenance — producing SOC 2-ready evidence from standard git log queries.
  • Policy inheritance chains (Enterprise → Organization → Repository) ensure security baselines cascade automatically; child policies can only tighten constraints, never relax them.
  • CI enforcement gates run 22 automated checks (6 baseline + 16 organizational policy) and block deployments that violate policy — no human gatekeeper required.
  • Content scanning detects hidden Unicode attacks (bidirectional overrides, tag characters, variation selectors) before files reach agent-readable directories — addressing the prompt supply chain threat at the pre-deployment stage.

The pattern generalizes: governance primitives that can be expressed as CI checks should be. The ones that cannot — organizational policy, legal review triggers, risk classification — must be encoded as explicit context for agents and humans alike.
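The “tighten, never relax” inheritance rule reduces to a subset check when policies are modeled as allow-lists. A simplified sketch (real policies have more dimensions than a single permission set, and the function name is illustrative):

```python
# Sketch of the "child policies can only tighten" rule. Policies are
# simplified to permission allow-lists: a child is valid only if its
# allow-list is a subset of its parent's (it may remove, never add).

def tightens(parent_allow: set, child_allow: set) -> bool:
    """True if the child policy is at least as restrictive as the parent."""
    return child_allow <= parent_allow

org = {"read:repo", "open:pr", "run:ci"}
repo_ok = {"read:repo", "open:pr"}       # removed run:ci -- tightened, valid
repo_bad = {"read:repo", "push:main"}    # added push:main -- relaxed, invalid

assert tightens(org, repo_ok)
assert not tightens(org, repo_bad)
```

A CI gate that rejects any repository policy failing this check makes the inheritance chain self-enforcing: no reviewer has to remember that a child may not widen its scope.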


© 2025–2026 Daniel Meppiel · CC BY-NC-ND 4.0

Free to read and share with attribution.