26 What Comes Next

Everything in this book will be partially obsolete within eighteen months. The models will be better, the tools will be different, and capabilities we marked “directional” will be shipping. That is not a flaw in the book. It is the central argument. The methodology survives tool change. The primitives survive model change. The discipline survives everything.

This chapter applies the three-tier honesty framework (available now, emerging, directional) to the trajectory of the field itself. Where the evidence is strong, the predictions are specific. Where it is not, they are marked accordingly. And where the author is guessing, that is said plainly.


26.1 Near-Term: What Changes in the Next Twelve Months

Agent tool use becomes standard, not experimental. The shift from text generation to agents that execute — file operations, terminal commands, API calls, test runs — is underway but uneven.[1] Within a year, tool-using agents will be the default interaction mode. This makes Safety Boundaries more critical, not less. A model that generates bad code wastes review time. A model that executes bad commands corrupts state. Guardrails that felt conservative in a text-generation world become essential in a tool-execution world.

Multi-agent orchestration moves from research to practice. Teams today primarily use single-agent interactions. Multi-agent patterns — planning agents dispatching specialists, review agents evaluating output, agents collaborating through shared artifacts — exist in research and early tooling.[2] Within a year, they will ship in mainstream platforms. The orchestration disciplines in Chapters 10–12 — task decomposition, wave-based execution, escalation protocols — become operational necessities rather than advanced practices.

The agentic computing stack crystallizes through independent convergence. By mid-2025, at least three independent efforts arrived at the same layered architecture: manifest-based primitive distribution, framework-layer composition, and CI/CD-native execution. Anthropic’s plugin.json[3], GitHub’s Agentic Workflows, and open-source frameworks like Squad[4] and Spec-Kit[5] didn’t coordinate — they converged because the layers reflect real boundaries in the problem. Open-source tools already provide manifest-based dependency resolution and security scanning at the primitive layer, the same architecture as npm or pip applied to agent configuration rather than runtime code.

This is the pattern that produced HTTP → REST → Rails/Express → npm/pip → applications: each layer emerged when practitioners needed it, not when a standards body decreed it. Spec-Kit and Squad are to agentic development what Spring and React are to traditional computing — they make orchestration easier in one direction, constrain freedom in another, and consume primitives via package managers above the harness layer. The strategic signal: when independent implementations from different vendors converge on the same architecture, the architecture is real. Organizations investing in the primitive layer (Chapter 20) are building on the layer most likely to remain stable as the framework layer evolves.
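To ground the npm/pip analogy, here is a minimal sketch of what a manifest at the primitive layer might look like. Every field name, package name, and path below is a hypothetical illustration of the pattern, not the schema of plugin.json or any shipping tool.

```json
{
  "name": "acme-review-standards",
  "version": "1.2.0",
  "primitives": [
    "instructions/error-handling.instructions.md",
    "skills/code-review/SKILL.md"
  ],
  "dependencies": {
    "acme-org-baseline": "^2.0.0"
  }
}
```

A lockfile would then pin the resolved transitive closure (acme-org-baseline at an exact version, plus its own dependencies) so that the agent running in CI reads the same primitives on every run, exactly as a package-lock.json does for runtime code.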


26.2 Medium-Term: What Shifts Over One to Three Years

Agent governance becomes a first-class engineering discipline. Today, governance of agent output is handled through existing processes: pull requests, CI, manual approval. This works at current volumes. As output scales and multi-agent orchestration becomes common, dedicated governance infrastructure will emerge: audit trails for agent decisions, policy engines that enforce constraints at execution time rather than review time, cost controls that manage token spend across teams. The governance frameworks in Chapter 5 anticipate this, but the tooling barely exists. Within three years, agent governance platforms will be a category — the way CI/CD became a category over the past decade.

The boundary between “writing code” and “describing intent” blurs. As models improve at understanding architectural context and as context infrastructure matures, the human role shifts further toward specification and validation. The planning phase — defining what the system should do, what constraints it must respect, what trade-offs to accept — becomes proportionally more of the work. The execution phase becomes proportionally more automated. This does not eliminate engineering skill. It shifts where that skill applies: from implementation patterns to system design, constraint definition, and output evaluation. The practitioners who thrive will be the ones who treat specification as an engineering discipline, not a hand-wave before the “real work.”


26.3 Long-Term: Possibilities Over Three to Five Years

These predictions are directional. The author believes they describe where the field is heading. They are opinions, not forecasts.

Full SDLC-phase agent participation becomes achievable. The 5-layer landscape from Chapter 4 places SDLC phases as the topmost layer that consumes everything below it; the phases (Plan, Spec, Build, Review, Test, Deploy, Operate) describe agent participation across the full delivery cycle. Today, mature support exists primarily in Build and Review. Within five years, credible participation across all phases is plausible — not as autonomous replacements, but as capable participants handling routine work under human direction.

Context infrastructure becomes as foundational as CI/CD. Every serious engineering organization today has continuous integration and deployment. Context infrastructure (the context files, instruction hierarchies, and knowledge bases that make agents effective) will follow the same trajectory. Early movers treat it as competitive advantage. Eventually it becomes table stakes. Organizations without it will find agentic tools unreliable and conclude the technology “doesn’t work for us,” the same way organizations without CI concluded automated testing “doesn’t work at our scale.”

```mermaid
gantt
    title Three-Horizon Timeline
    dateFormat YYYY
    axisFormat %Y

    section Near-Term (0–12 mo)
    Tool-using agents standard              :active, n1, 2025, 2026
    Multi-agent orchestration ships         :active, n2, 2025, 2026

    section Medium-Term (1–3 yr)
    Agent governance as discipline          :m1, 2026, 2028
    Spec replaces implementation            :m2, 2026, 2028

    section Long-Term (3–5 yr)
    Full lifecycle agent participation      :l1, 2028, 2030
    Context infra foundational as CI/CD     :l2, 2028, 2030
```

Figure 26.1: Three-horizon technology timeline

26.4 What Will Not Change

These are the things the author is most confident about, precisely because they are structural rather than technological.

Context will remain finite and fragile. There will always be a limit to how much information an agent can effectively consider. The constraint that context must be structured, scoped, and curated is a property of the problem, not the current technology.

Output will remain probabilistic. Models will get better. They will not become deterministic. Reliability must be architected through constraints and validation, not assumed from model quality.
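Because output stays probabilistic, the reliable unit is not a single generation but a generate-validate loop: deterministic checks wrapped around a nondeterministic producer. A minimal sketch in Python, where generate_patch is a stand-in for any model call and validate is a stand-in for your real test suite and policy gates — all names here are illustrative.

```python
import random

def generate_patch(task, seed):
    """Stand-in for a model call: probabilistic, sometimes wrong."""
    random.seed(seed)
    return {"task": task, "passes_tests": random.random() > 0.4}

def validate(patch):
    """Stand-in for deterministic checks: tests, linters, policy gates."""
    return patch["passes_tests"]

def reliable_generate(task, max_attempts=5):
    """Architect reliability around the model, not inside it:
    retry until validation passes, then escalate to a human."""
    for attempt in range(max_attempts):
        patch = generate_patch(task, seed=attempt)
        if validate(patch):
            return patch
    return None  # validation never passed: escalate

result = reliable_generate("fix flaky login test")
```

The design point is that the loop, not the model, is where reliability lives: better models shrink max_attempts in practice, but they never remove the need for the validator.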

Explicit knowledge will remain more valuable than implicit knowledge. Agents will not read the minds of the team. Organizations that externalize their knowledge will outperform those that don’t.

Human judgment will remain the bottleneck and the differentiator. The scarce resource is the ability to define what should be built, evaluate whether it was built correctly, and decide what to do when it wasn’t.

Composition will remain necessary. No single agent will hold an entire large system in focus. The tools for composition will improve; the need for it will not diminish.

These five properties map directly to the PROSE constraints from Chapter 12. The constraints were not designed for today’s models; they were designed for the fundamental properties of human-AI collaboration.


26.5 Three-Tier Honesty Applied to This Chapter’s Own Claims

| Claim | Tier | Confidence |
| --- | --- | --- |
| Tool-using agents become the default interaction mode | Available now | High — shipping in multiple platforms |
| Multi-agent orchestration enters mainstream tooling | Available now | High — shipping in multiple tools |
| Agent governance becomes a distinct discipline | Emerging | Medium — need is clear, tooling is not |
| Specification replaces implementation as the core skill | Emerging | Medium — direction clear, timeline uncertain |
| Full lifecycle agent coverage becomes operational | Directional | Low-to-medium — plausible, not inevitable |
| Context infrastructure becomes as foundational as CI/CD | Directional | Medium — trajectory clear, timeline 5+ years |
| Agentic computing stack layers consolidate | Emerging | Medium — convergence visible, standardization not |
| The five core constraints hold | Structural | High — properties of the problem |

The reader should calibrate accordingly. Invest confidently in the “available now” tier. Prepare for the “emerging” tier. Be aware of the “directional” tier without betting the organization on specific timelines.


26.6 When NOT to Use Agentic Workflows

Not every task benefits from agent orchestration. Applying the methodology where it does not fit wastes time and produces worse outcomes than working manually. Recognize these scenarios early:

The task requires fewer than 50 lines of change. If you can hold the full scope in your head, the overhead of persona design, wave planning, and checkpoint discipline is not worth it. Just write the code.

The domain knowledge is entirely implicit. If the conventions, constraints, and trade-offs cannot be externalized into instruction files – because they depend on political context, unwritten relationships, or organizational history that resists documentation – agents will produce plausible but wrong output. Instrument the codebase first (Chapter 11), then apply agents.

The cost of failure is low and iteration is cheap. For throwaway scripts, prototyping, and exploratory work, a single agent prompt with no orchestration is faster and sufficient. The methodology exists for production-grade work where reliability matters.

The work is inherently sequential and creative. Naming things, choosing abstractions, defining API contracts – these are judgment-dense tasks where agent suggestions help but orchestrated composition adds nothing. Use agents as sounding boards, not as orchestrated fleets.

The platform fights automation. The Growth Engine case study documents three automated approaches to Kit form automation, each defeated by React’s virtual DOM. When the platform’s internals are undocumented and hostile to external manipulation, escalate to a human with a precise checklist rather than attempting a fourth approach.

The methodology’s value is recognizing which category a task falls into before committing to an approach.


26.7 Your First Week: What to Do Starting Monday

For leaders who read Parts I–II and practitioners who read Part III, here is the concrete version. Not principles. Actions.

26.7.1 Day 1: Audit One Module

Pick the module your team changes most frequently. Not the biggest module, the most-changed one. Run the methodology from Chapter 11: identify implicit knowledge, undocumented conventions, architectural decisions that exist only in your team’s memory. Write down what you find. You are not fixing anything today. You are measuring the gap between what an agent can see and what your team knows.

Deliverable: A list of 5–10 implicit conventions that an agent would violate on its first task in this module.

26.7.2 Day 2: Write Your First Three Primitives

Take the top three conventions from yesterday’s audit. Write each as an instruction file — one organizational standard, one architectural constraint, one domain-specific rule. Follow the format from Chapter 11, under the constraints from Chapter 12: scoped, testable, specific. Do not try to document everything. Three primitives that cover the most common mistakes are worth more than thirty that cover edge cases.
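As a concrete sketch of what one of these files might look like: one convention, the glob it applies to, and nothing else. The module path, rule wording, and helper names (`PaymentGatewayError`, `withRetry`) are invented for illustration; your Day 1 audit supplies the real ones.

```markdown
---
applyTo: "src/payments/**/*.ts"
---

# Error handling in the payments module

- Never swallow exceptions from the gateway client; wrap them in
  `PaymentGatewayError` and re-throw.
- All retries go through the shared `withRetry` helper; do not
  hand-roll retry loops.
- Log failures with the correlation ID, never with raw card data.
```

Note what the file does not contain: no background, no rationale essays, no edge cases. Scoped, testable, specific.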

Deliverable: Three instruction files, committed to your repository.

26.7.3 Day 3: Test Against a Real Task

Pick a task from your current sprint — something an agent would plausibly handle. Run it twice: once without your new context files, once with them. Compare the output. Did the context files prevent the mistakes you predicted? Did they cause new problems? Record the before-and-after. This is your first data point, not your conclusion.
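One low-ceremony way to run the comparison is to save both transcripts and count occurrences of the mistakes your Day 1 audit predicted. A minimal sketch, assuming you captured the two outputs yourself; the predicted-mistake strings and sample transcripts are placeholders for your own conventions.

```python
# Compare two agent transcripts against the conventions predicted on Day 1.
predicted_mistakes = [
    "console.log",    # convention: use the structured logger
    "SELECT *",       # convention: explicit column lists only
    "catch (e) {}",   # convention: never swallow exceptions
]

def count_violations(transcript: str) -> dict:
    """Count how often each predicted mistake appears in a transcript."""
    return {m: transcript.count(m) for m in predicted_mistakes}

# Stand-ins for the saved transcripts of the two runs.
before = "console.log('x'); SELECT * FROM users; console.log('y')"
after = "logger.info('x'); SELECT id, email FROM users;"

report = {"without_context": count_violations(before),
          "with_context": count_violations(after)}
```

The report is your first data point, not your conclusion: a single task proves nothing, but a violation count that drops to zero on the mistakes you predicted is exactly the kind of before-and-after evidence Day 5 needs.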

Deliverable: A before-and-after comparison with specific examples of what changed.

26.7.4 Day 4: Measure and Adjust

Review yesterday’s comparison honestly. Which files made a difference? Which were ignored or misinterpreted by the agent? Revise the ones that didn’t land. This is the calibration loop from Chapter 14: context files are not documentation, they are engineering artifacts that need testing and iteration like any other code.

Deliverable: Revised instruction files based on observed agent behavior.

26.7.5 Day 5: Share and Plan

Show your team the before-and-after. Not a presentation — a 15-minute demo at standup. Show the worst agent output without instrumentation and the improved output with it. Then plan: which modules get instrumented next? Who owns which instruction files? How do you keep them current as the code evolves?

Deliverable: A team agreement on next steps and ownership.

26.7.6 The Agent-Side Companion

The week described above is the human-side ramp. The agent side has a packaged companion built by this book’s author — danielmeppiel/genesis, open source, available to install in any AGENTS.md-compatible harness.

Author disclosure. Genesis is built by the author of this book. It is one of several emerging answers to the discipline these chapters argue for; it is not a prerequisite. The chapters stand without it. Read this section as a worked example, not an endorsement.

I wrote the book first; then I built the tool I wished existed when I started. The companion this part opened with is round one — one author’s first packaged answer to the discipline these chapters argue for, written by the same hand and put on disk in the form a harness already loads.[6] Round two is your turn. The patterns will harden, the catalogue will fill in, the rejected near-misses will outnumber the accepted ones; that is the field maturing, not a roadmap. Treat this part’s chapters as the floor, the companion as one floor-plan, and your own first packaged answer as the next entry in the catalogue.

This is not a dependency on the book’s part — every chapter stands without it — and it is not the only such companion the field will produce. It is a worked example of what the practice looks like packaged as a primitive set, written by the same hand. If you choose to load it, when a chapter names a pattern (Wave, Panel, Supervised Execution), grep the skill for the same name. The agent will surface the operational form. Use whichever name lands first in the conversation.

26.7.7 For Leaders, Additionally

If you lead the organization rather than the team, Day 1 is different. Start with the readiness assessment from Chapter 7. Identify one team with the right combination of codebase maturity, process discipline, and cultural openness. Fund a structured pilot — not “give everyone licenses and see what happens,” but the phased adoption from the transition plan. Protect the investment in context infrastructure. It has the highest long-term return and the lowest short-term visibility, which means it is the one most likely to be cut.


26.8 What the Author Probably Got Wrong

Intellectual honesty requires identifying where this book’s assumptions are most likely to age poorly.

The pace of capability improvement may outrun governance. This book assumes organizations will have time to build governance infrastructure before agent capabilities demand it. If capabilities improve faster than organizational maturity — the historical pattern for every technology shift — many organizations will face a period where agents can do more than the organization is prepared to govern.

The emphasis on human-in-the-loop may prove too conservative. For high-stakes production code, human review will hold. For internal tooling, prototyping, and throwaway infrastructure, fully autonomous workflows may become practical sooner than this book suggests. The “always review” stance is safer but may leave real efficiency on the table in contexts where the cost of failure is low.

The multi-agent orchestration model may evolve past human orchestrators. The patterns in this book assume a human planner dispatching specialist agents. Future orchestration may involve agents that plan their own decomposition, negotiate resources, and maintain persistent state across sessions. The compositional principles will likely still apply, but the human-as-orchestrator model this book centers may be a transitional pattern, not an enduring one.

The documentation burden may not pay for itself. This book asks teams to externalize knowledge that was previously implicit. That is real work with real ongoing maintenance cost. If the productivity gains from agentic development are modest — 15–20% rather than the 2–3x some claim — then the time spent creating and maintaining context infrastructure could consume most of the gains. The break-even calculation is less obviously favorable than the book implies, and the author has not seen enough longitudinal data to be certain it tips the right way.

And the uncomfortable one: the author may be overestimating the durability of human judgment as the differentiator. This book argues that human judgment is the bottleneck agents cannot replace — and builds its entire methodology around that assumption. But there is a motivated reasoning risk in any book that argues humans are indispensable, written by a human who wants that to be true. If models develop genuine architectural reasoning — not pattern matching on training data, but the ability to evaluate trade-offs, anticipate failure modes, and make design decisions that hold up under pressure — then the “human judgment” moat this book describes is not structural. It is temporal. The author believes it is structural. The author also acknowledges that this belief is load-bearing for the entire framework, which means it is exactly the kind of assumption that deserves the most scrutiny and the least certainty.


26.9 The Starter Shape

Chapter 4 introduces the Starter Shape with a single hedged paragraph: a typical mid-sized engineering organization, twelve to eighteen months into agentic adoption, converges on a small, stable bundle of primitives whose composition is recognisably the same across teams. The deep treatment of that shape lives here, as the practical ramp Part IV can leave the reader with.

The shape is a small bundle, not a comprehensive one. The early signal — and we name it as a signal, not a settled finding — is that organizations that mature past the experimentation phase converge on something close to:

  • A handful of skill bundles (typically three to seven) covering the highest-leverage recurring tasks — code review, refactoring, test authoring, documentation generation, incident triage. Not thirty skills covering every conceivable workflow.
  • A small set of scope-attached rule files (.instructions.md with applyTo globs) carrying the team’s load-bearing conventions — error handling, logging, security boundaries, cross-platform encoding rules. The set is small because each rule is reviewed every time it loads; the budget is attention, not disk.
  • One agent file per repository (AGENTS.md or its harness-specific equivalent) that names what the agent should know about this repository before it does anything — the build commands, the test commands, the directories it should not touch, the conventions that apply globally.
  • A lockfile and a manifest (Chapter 20’s package layer) that pins the bundle’s transitive closure so that the agent that runs in CI today is reading the same primitives as the agent that runs in CI six months from now.
  • A panel-ready review configuration that lets the team escalate any non-trivial PR to a multi-specialist Panel (Chapter 16) when the change crosses the trivial threshold.
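Laid out on disk, the whole bundle is small enough to see at a glance. A hypothetical repository layout, for illustration only — the directory names and file counts are one plausible arrangement, not a required structure:

```text
repo/
├── AGENTS.md                           # build/test commands, no-touch directories
├── .instructions/
│   ├── error-handling.instructions.md  # applyTo: "src/**/*.ts"
│   ├── logging.instructions.md
│   └── security.instructions.md
├── skills/
│   ├── code-review/SKILL.md
│   ├── refactoring/SKILL.md
│   └── test-authoring/SKILL.md
├── primitives.manifest.json            # the bundle's declared dependencies
└── primitives.lock                     # pinned transitive closure
```

Everything an agent loads is visible in one listing, which is what makes the first property below achievable.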

Three properties distinguish the starter shape from the ad-hoc shape teams typically begin with.

It is small enough to read. A new team member should be able to read the entire bundle in an afternoon and know what the agent will do on their first PR. If the bundle is too large to read, no one will read it, and the convention that nobody can recite is a convention the agent will violate undetected.

It is composed, not stacked. Each primitive has a single concern (the 3-concern triplet from Chapter 20) and depends explicitly on the primitives it builds on. The bundle has a dependency graph, not a flat list. When the team adds a new convention, it lands in the right primitive, not in a new one.

It is governed by the lockfile. Every release of every primitive is pinned. When a primitive ships an update, the lockfile diff appears in the next CI run, and the change goes through the same review the source diff goes through. Drift becomes visible; provenance becomes auditable.
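What a governed update looks like in practice: a primitive ships a new release, and the next CI run surfaces a lockfile diff that goes through normal code review. The entry format below is illustrative, not any real tool’s; the package names and hashes are invented.

```diff
 # primitives.lock
-acme-org-baseline      2.1.3  sha256:9f2ab4c1
+acme-org-baseline      2.2.0  sha256:41c0d7e8
 acme-review-standards  1.2.0  sha256:77b1e3a2
```

A reviewer approving this diff is approving a change to agent behavior with the same ceremony as a change to source, which is the whole point: drift becomes a reviewable event rather than a silent one.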

The pattern generalizes beyond software. Every domain that runs work as repeatable procedures with reviewers and reviewable artefacts is a candidate for the same shape, and the early signal across domains is consistent — though here we are explicitly in the working-hypothesis register, with limited multi-domain longitudinal evidence.

  • Legal review. A bundle of skills for clause analysis, precedent retrieval, redline generation. Domain Specialists are attorneys. The recursion bound matters more, not less, because the cost of a missed clause is higher than the cost of a missed PR comment.
  • Mergers and acquisitions diligence. Skills for data-room intake, financial-statement triangulation, contract-flag synthesis. The Panel pattern is native: legal, financial, and operational specialists run in parallel against the same artefact.
  • Financial close. Skills for reconciliation, journal-entry review, variance triage. The OPERATIONS concern (cost, drift, rate-limit) dominates because the close runs to a calendar deadline.
  • Marketing campaigns. Skills for brief expansion, channel-specific copy generation, brand-voice review. The Domain Specialist is the brand owner; the Agentic Workflow Engineer is increasingly a marketing operations role rather than a software engineer.
  • Book authoring. The case study on writing this handbook (Part IV) runs the same starter shape end-to-end: skills for outline design, draft generation, review panel, fact-reference checking. The forward-pointer is not rhetorical; the case is the existence proof.

The starter shape is a starting point, not a ceiling. Teams will adapt and extend it; some will abandon parts that do not fit their domain. What matters for the practitioner reading this chapter is that the shape exists, is recognisable, and is reachable in months, not years — provided the team treats primitives as code (Chapter 20), runs the load discipline (Chapter 13), and bounds the attention budget (Chapter 14).


26.10 The Closing Argument

Three beats close this part, the handbook, and the case the handbook makes.

One: the SDLC is one instance of a much wider pattern. This handbook spent twenty chapters inside the software development lifecycle because the SDLC is where the pattern is most legible — a community of practitioners with shared vocabulary, observable artefacts, reviewable diffs, and a culture of measuring its own outcomes. But the architecture in Chapter 4, the primitives in Chapter 20, the composition patterns in Chapter 18, and the staffing in Chapter 6 are not specific to software. They are specific to any domain whose work is procedure-shaped, reviewable, and operable under a recursion bound. The SDLC was the first to package this practice; it will not be the last. Legal, finance, M&A, marketing, clinical decision support, scientific writing — any domain where a Domain Specialist can name a procedure and an Agentic Workflow Engineer can encode it has the same architecture available. The handbook is software-shaped because the proof had to land somewhere; the pattern is general.

Two: this handbook is itself an existence proof. The case study in Part IV is not a meta-curiosity. It is the operational answer to “does this work?” Every chapter you have read was authored under the editorial-pipeline Skill described in that case study — Domain Specialist (the author) defining the WHAT, Agentic Workflow Engineer (the same author, in a different role) encoding the HOW into a Skill that dispatches a Panel of specialist reviewers, Agent Operations Specialist (again the same author, at scale-of-one) running the OPERATIONS. The handbook you are holding is the artefact that the methodology in the handbook produces. If the case had to be made by a tool the field had not yet built, this section would be aspirational. The case is made by a tool that exists, that produced this artefact, and that you can install on Monday morning.

Three: accept that you are early, and start anyway. The field is moving faster than any book can capture. The specific tools will change. The formats will evolve. The capabilities will exceed what is described here. Use the principles, not the specifics. Structure your context, scope your tasks, compose simple building blocks, enforce safety boundaries, and organize your knowledge hierarchically. These disciplines work regardless of which model runs underneath or which tool wraps around it. REST did not make HTTP better. It gave engineers constraints to reason about distributed systems. Twenty-five years later, the constraints still hold, even though every specific technology from that era has been replaced. The aspiration for the architectural constraints in this book is the same: durable reasoning tools for a field that will not stop changing. The methodology is the floor, not the ceiling. Build on it.

The Agentic Era has begun.


  1. GitHub Copilot’s “agent mode” (2025), Cursor’s agentic features, and similar integrations in VS Code, JetBrains, and other IDEs demonstrate this shift. The Model Context Protocol (MCP) standardizes how agents access external tools, accelerating adoption.

  2. OpenAI’s Swarm framework, Microsoft’s AutoGen, and LangGraph represent early multi-agent orchestration libraries. GitHub Copilot coding agent and similar CI-integrated agents mark the beginning of production multi-agent workflows.

  3. Anthropic, “Claude Code Plugins,” https://docs.anthropic.com/en/docs/claude-code/plugins

  4. Brady Gaster, “How Squad Runs Coordinated AI Agents Inside Your Repository,” GitHub Blog, March 2026. https://github.blog/ai-and-ml/github-copilot/how-squad-runs-coordinated-ai-agents-inside-your-repository/

  5. GitHub, “Spec Kit — Build High-Quality Software Faster,” https://github.com/github/spec-kit

  6. danielmeppiel/genesis, https://github.com/danielmeppiel/genesis/tree/613f2a64a4a193cacd45fd4439a093044ae3178d. Open-source Agent Skill package providing a composition substrate, design patterns, and runtime affordances across AGENTS.md-compatible harnesses (Copilot, Claude Code, Cursor, Codex, OpenCode). Install command and full cross-harness install context are at the Part III opener; this footnote is the citation, not the call to action.


© 2025-2026 Daniel Meppiel · CC BY-NC-ND 4.0

Free to read and share with attribution.