9  The Agentic Runtime Machine

Note: 30-second glossary, before the story

The story below uses three filename shapes that recur across every harness. If they are new, here is enough to read on without losing the thread. Chapter 10 catalogues all seven primitive types with full examples and design tests.

  • .instructions.md — scoped conventions. YAML frontmatter applyTo glob (e.g. applyTo: "src/api/**"). When an agent thread touches a matching file, the harness preloads the body. Closest analogue: a .editorconfig the AI reads.
  • SKILL.md — a reusable decision framework the agent itself decides to load when the task description matches the skill’s declared purpose. Activated on demand, not on every file touch.
  • .agent.md — a specialist persona. A separate sub-thread is routed to it through the harness’s delegation tool, inheriting its declared model, tools, and instructions.

For the rest of this chapter, treat these three as concrete instances of the broader category this chapter will name agent source code.
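Here is what the first shape looks like in practice — a minimal, illustrative .instructions.md. The frontmatter field is the one documented for Copilot; the body rule is the auth_v2 example used later in this chapter:

```markdown
---
applyTo: "src/api/**"
---

For code under src/api/, use the auth_v2 helper, never auth_v1.
```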

It is a Monday morning. A senior engineer — call her Maya — has spent the last two sprints instrumenting a service repository for agentic development under GitHub Copilot. Conventions live in .github/instructions/api.instructions.md with an applyTo: "src/api/**" frontmatter. A reusable code-review framework lives in .github/skills/code-review/SKILL.md. A specialist persona lives in .github/agents/security-auditor.agent.md. Agents bind reliably; reviews are sharper; her teammates have stopped relitigating naming conventions in pull requests. The instrumentation works.
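Laid out as a tree, the instrumentation in Maya's repository looks like this (agent-facing files only):

```text
.github/
├── instructions/
│   └── api.instructions.md        # applyTo: "src/api/**"
├── skills/
│   └── code-review/
│       └── SKILL.md
└── agents/
    └── security-auditor.agent.md
```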

This morning her team lead asks her to evaluate Claude Code on the same project. Maya clones the repo onto a fresh machine, installs the CLI, runs claude from the project root, and asks it to add an endpoint following the existing API conventions. Claude Code drafts the endpoint without consulting the convention file. The new code uses the deprecated authentication helper. None of the rules in api.instructions.md made it into the model’s context. The skill in .github/skills/code-review/ did not surface during the review. The security-auditor persona — the one her team has been calibrating for a month — is invisible.

Same files. Same model family. Different harness. Silence.

There is nothing wrong with Maya’s primitives, and nothing wrong with Claude Code. What has happened is that Maya has been writing programs in a language whose compiler she did not know she was using. Copilot’s loader recognizes .instructions.md files with an applyTo glob and reads them into context whenever a thread touches a matching path.[1] Claude Code’s loader does not. Claude Code reads a file called CLAUDE.md from the project root and then walks the directory tree for nested CLAUDE.md files, applying the closest one to whatever subtree the agent is working in.[2] The two harnesses agree on what a “scoped rule” means — text the model sees automatically when working in a particular part of the codebase — but they disagree, completely, on the file name, the folder, and the scope predicate. A developer who cannot name this disagreement experiences it as magic that comes and goes. A developer who can name it has a one-minute fix: write a CLAUDE.md at the repo root that says for API code, follow the rules in .github/instructions/api.instructions.md. If that CLAUDE.md uses the @path import directive, Claude Code inlines the instruction file’s contents into the model’s context at session start.

The point of this chapter is to give you the vocabulary. The runtime under your AI tools is not a single thing. It is the composition of four independently-replaceable parts, and most cross-vendor confusion is a mismatch on one of them. Once you can name the four parts, port stories like Maya’s stop being stories and become bug reports.


9.1 The four parts

Every agentic system you will ever touch is built from the same four parts. They are not branding labels; they are functional roles, and any working setup must fill all four.

flowchart TB
    Client["Client<br/>(terminal, IDE, workflow,<br/>scheduler, webhook receiver)"]
    Tools["Tools<br/>(classical CPU code<br/>orchestrated by the harness)"]
    Harness["Harness<br/>(the compiler that<br/>drives inference)"]
    Source["Agent Source Code<br/>(files loaded at<br/>session start)"]
    Model["Model<br/>(inference engine)"]

    Client <-->|"prompt / output"| Harness
    Tools <-->|"invoke / result"| Harness
    Harness <-->|"load"| Source
    Harness <-->|"inference I/O"| Model
Figure 9.1: The four parts of the agentic-system runtime machine

The model is the inference engine — GPT-5, Claude Sonnet, Gemini, whatever serves the inference endpoint. It takes text in and produces text out. By itself it has no tools, no memory of yesterday, and no awareness of your codebase. Most developers think of “the AI” as the model. Most failures attributed to the model are not the model’s fault.

The harness is the program that drives the model. It is what runs in your terminal when you type gh copilot, claude, cursor, codex, or opencode. The harness manages the conversation, calls tools on the model’s behalf, decides which files to load into context and when, and decides what to do with the model’s output. The harness is the part that varies the most across vendors, and it is the part developers most often confuse with the model.

Agent source code is the directory layout the harness consults to decide what context the model should see. It is not a passive store of documentation. It is executable configuration: a particular file at a particular path with a particular frontmatter shape causes the harness to inject text into the model’s context at a particular moment. Move the file, rename it, change its frontmatter, and the behavior changes — even though the model and the harness are unchanged. This is the layer Maya was unknowingly programming against.

The client is the process that decides when a session runs and what bootstrap context it carries. Clients can be interactive — a developer typing in a terminal (Claude Code, Copilot CLI), an IDE plugin forwarding a selection (VS Code, Cursor) — or programmatic — GitHub Agentic Workflows running over GitHub Actions, a cron daemon, a webhook receiver, a CI runner invoking the harness on every pull request. The client is orthogonal to the harness: a single orchestrator like GitHub Agentic Workflows can spawn sessions in any of several harnesses inside the same run.[3] The client is also the first layer that can rewrite what lower layers see — a system prompt injected at the client level reaches the harness before any agent source code does, and the harness may reshape it further before the model sees a single token. Each layer mutates the input for the layer below it.

These four parts are independent. The same primitives can run under Copilot or Claude Code; the same harness can drive any of several models; the same client can spawn either harness. When something behaves differently between two setups, the productive question is not “what is wrong with the AI?” but “which of the four parts changed?”

Note: Where the layers run

The four parts do not have to live on the same machine. Three canonical combinations cover most real deployments:

  • Full local. Client, harness, agent source code, and model all run on the developer’s laptop. Ollama or a local GGUF serves inference; the harness reads files from the local checkout. Nothing leaves the machine. Useful for air-gapped work, compliance-sensitive repos, or zero-cost experimentation.
  • Hybrid — local client, cloud model. The client, the harness, and the agent source code stay local; only inference calls cross the network. This is the default for most developers running Claude Code or Copilot CLI against a cloud-hosted model. Primitives remain local files; the model never persists them.
  • Full cloud. Client, harness, model, and agent source code all run in the cloud. GitHub’s Copilot Coding Agent is the canonical example: a pull-request event triggers a cloud-hosted harness that checks out the repo, loads primitives from the commit, drives inference, and pushes commits — all without touching a developer’s machine.

Layer-locality is a deployment decision, not an architectural one. The four-part model is the same in every case; only the network boundary moves.


9.2 Markdown that steers an LLM is code

Look at one of Maya’s instruction files. Thirty lines of markdown with a YAML frontmatter block declaring an applyTo glob. No braces, no semicolons, no compilation step a developer would recognize. It looks like documentation.

It is not documentation. It has every property of a code artifact:

  • It is parsed. The harness reads the frontmatter and treats the body as semi-structured input. Misformat the frontmatter and the file silently fails to bind.
  • It is linked. When it says follow the patterns in tests/test_auth_middleware.py — or when it uses @path (an import directive inside CLAUDE.md) to inline another file — that reference is a real edge in a real dependency graph.
  • It is loaded. At a defined moment in the session lifecycle, the harness reads the file’s contents and places them somewhere in the model’s context window. This load is observable; you can ask the harness’s verbose mode to print it.
  • It is executed. The text steers the next token the model emits. A poorly-worded instruction biases the output the way a buggy line of code biases the program. Never use _rich_info() in error handlers and avoid _rich_info() in error handlers are not the same instruction. The first prevents a regression. The second hedges enough that the agent will use it anyway.
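To make the first bullet concrete, here is a toy loader sketch — hypothetical code, not any harness’s real implementation — showing how a one-character slip in the frontmatter fence makes the rule silently fail to bind:

```python
import re

def parse_instruction_file(text: str):
    """Toy sketch of an instruction-file loader (illustrative only).

    Real harnesses ship their own parsers; the point is the failure
    mode: malformed frontmatter means the rule never binds.
    """
    m = re.match(r"^---\n(.*?)\n---\n(.*)\Z", text, re.DOTALL)
    if m is None:
        return None  # misformatted frontmatter fence: silent no-op
    frontmatter, body = m.group(1), m.group(2)
    glob = re.search(r'^applyTo:\s*"?([^"\n]+?)"?\s*$', frontmatter, re.MULTILINE)
    if glob is None:
        return None  # no applyTo glob: nothing scopes the rule
    return {"applyTo": glob.group(1), "body": body.strip()}

good = '---\napplyTo: "src/api/**"\n---\nUse the auth_v2 helper.'
bad = '--\napplyTo: "src/api/**"\n---\nUse the auth_v2 helper.'  # broken fence
```

Run against the two strings above, the well-formed file yields a binding scoped to `src/api/**`; the file with the broken fence yields nothing at all — no error, no rule.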

The implication is uncomfortable but freeing: every property you expect of code applies to these files. They have versions. They have tests (usually informal — does the agent behave correctly with this rule loaded?). They have regressions. They have linters and dependency closures. Treating them as documentation that occasionally affects the agent leaves you debugging through guesswork. Treating them as code lets you bring the full apparatus of code review, version control, and observability to bear.

This is not metaphor. Chapter 19 will treat primitives as code in operational detail — lint, test, lockfile, CI. That chapter is the practical consequence of the claim this one just made.


9.3 The harness is the compiler

The most useful single sentence in this chapter is: the harness is the compiler.

The model is a runtime. The agent source code is the source tree. The harness is the program that decides what compiles to what, in what order, at what visibility. Two harnesses given identical source produce different running programs because their compilers make different decisions. The decision space is small and enumerable, which is why portability is a problem you can solve rather than a fact you must accept.

Consider a single primitive — a scoped rule that says for code under src/api/, use the auth_v2 helper, never auth_v1 — and how two harnesses materialize it.

| Aspect | GitHub Copilot | Anthropic Claude Code |
|---|---|---|
| File name | api.instructions.md (any stem) | CLAUDE.md (exact name, conventional) |
| Folder | .github/instructions/ | src/api/CLAUDE.md (nested in subtree) |
| Scope predicate | applyTo: "src/api/**" glob in frontmatter | Implicit by directory hierarchy |
| Multiple files | Many .instructions.md files, glob-matched | One closest CLAUDE.md wins per subtree |
| External references | Plain markdown links | @path directive (inlines file) |
| User-scope variant | .copilot/instructions/ in user home | ~/.claude/CLAUDE.md in user home |
| Loaded when | A thread touches a matching path | A thread starts work in that subtree |

Both harnesses implement the same substrate concept — text the model sees automatically when working in a particular part of the codebase — but the syntax is incompatible. A primitive written for one is silent in the other. This is not a bug in either tool; it is the unavoidable consequence of two compilers with different opinions about source layout.

Cursor’s .cursor/rules/ directory and OpenAI Codex’s AGENTS.md convention round out the same picture from different angles: same substrate concept, different surface syntax.[4][5] OpenCode follows the AGENTS.md convention as well.[6] The agentskills.io standard — the SKILL.md entrypoint with a description-driven match — is the one place the major harnesses have substantially converged, which is why skills port more cleanly than scope-attached rules.[7]
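The incompatibility is mechanical enough to check mechanically. A hedged sketch — the paths follow the conventions cited in this chapter’s notes, the PROBES table is my own naming, and real layouts vary by harness version — that reports which harnesses would find any agent source at all in a checkout:

```python
from pathlib import Path

# Expected agent-source locations per harness (per the conventions
# cited in this chapter's notes; verify against your harness docs).
PROBES = {
    "Copilot instructions": ".github/instructions",
    "Copilot skills": ".github/skills",
    "Copilot agents": ".github/agents",
    "Claude Code root memory": "CLAUDE.md",
    "Claude Code skills": ".claude/skills",
    "Cursor rules": ".cursor/rules",
    "Codex / OpenCode rules": "AGENTS.md",
}

def audit(root: str) -> dict[str, bool]:
    """Report which harness conventions this checkout satisfies."""
    base = Path(root)
    return {name: (base / rel).exists() for name, rel in PROBES.items()}
```

Run against Maya’s repo, a probe like this would show every Copilot entry present and every Claude Code entry absent — her port story, as a boolean report.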

The cross-harness load order looks like this:

flowchart LR
    subgraph Copilot
        direction TB
        C1["1. User config\n.copilot/"]
        C2["2. Project base\n.github/copilot-instructions.md"]
        C3["3. Glob-match instructions\n.github/instructions/*.instructions.md\nfiltered by applyTo"]
        C4["4. Skills available\n.github/skills/*/SKILL.md"]
        C5["5. Agents available\n.github/agents/*.agent.md"]
        C1 --> C2 --> C3 --> C4 --> C5
    end
    subgraph ClaudeCode["Claude Code"]
        direction TB
        K1["1. User memory\n~/.claude/CLAUDE.md"]
        K2["2. Project root\n./CLAUDE.md"]
        K3["3. Nested memories\nsrc/.../CLAUDE.md\n@path imports inlined"]
        K4["4. Skills available\n.claude/skills/*/SKILL.md"]
        K5["5. Subagents available\n.claude/agents/*.md\nvia Task tool"]
        K1 --> K2 --> K3 --> K4 --> K5
    end
Figure 9.2: Cross-harness load order for the same project at session start

Notice that the steps line up — both harnesses do user-scope, then project-scope, then scoped rules, then skills, then specialist personas — but the files do not. A primitive set tuned for one harness is, at the file-naming layer, dead on arrival in the other. The fix is not heroic. It is mechanical: provide a thin shim file at the harness’s expected path that re-exports the canonical primitives. Maya’s project becomes Claude-Code-compatible the moment a CLAUDE.md at the repo root says, in plain markdown, for code under src/api/, follow @.github/instructions/api.instructions.md. The compiler now knows where the source is.
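A minimal version of that shim — file body illustrative, using the @path import syntax described in note 2:

```markdown
<!-- CLAUDE.md at the repo root: re-export the canonical primitives -->

For code under src/api/, follow the conventions in:

@.github/instructions/api.instructions.md
```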

Note: Probing your harness

If you are running Genesis as recommended in Chapter 8, you can probe the substrate model directly. Ask your agent: load Genesis, then list the six runtime-affordance primitives and tell me which file in this project realizes each one for our current harness. The skill walks its own per-harness adapter and prints the mapping. It is the fastest way to confirm what your harness will and will not see — and to spot the gaps Maya hit before they cost a sprint.[8]

The compiler analogy has a second consequence worth stating. When you choose a harness, you are choosing a compiler. The decision is not reversible by editing a config file. Switching harnesses is a port, with all the same costs and gotchas as porting a program from one language to another. Most projects pay this cost only once, and pay it deliberately. The vocabulary of this chapter is what makes the port a project plan rather than a mystery.


9.4 Inference is per-thread; the filesystem is shared

The single most consequential structural property of this whole machine is one sentence:

Inference is per-thread; the filesystem is shared.

When the harness spawns a session, it allocates a thread of inference — a context window, a token budget, a conversation log — that is private to that session. The thread is born empty and dies forgotten. Anything it learned during execution, any decision it made, any tool output it received, is gone the moment the session ends. There is no persistent memory inside the model. The session is amnesiac by construction.

The filesystem, by contrast, is the only thing that survives. Every primitive lives there. Every plan, every memento, every lockfile. When two threads run in parallel — a parent dispatching a child via Claude Code’s Task tool, or a developer running two gh copilot sessions in two terminals — they share zero context. They communicate, if at all, by writing to and reading from the filesystem. The filesystem is the shared memory of every multi-thread agentic system that has ever existed.

This is why the disciplines in Chapter 12 (load lifecycle) and Chapter 15 (multi-agent orchestration) center on the filesystem. When a long session loses the thread, the cure is plan-write-then-reload — the agent writes its plan to a file mid-session and re-reads the file at decision points. The plan survives the inference; the inference does not survive the plan. When a parent agent fans work out to children, the children inherit nothing from the parent’s context except what the parent wrote down. A worker that needs to know the architecture decision its parent just made cannot ask the parent; it must read a file the parent wrote.
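The handoff pattern reduces to a few lines. This sketch — hypothetical file and key names — shows the only channel a child thread has to its parent: a file the parent wrote before the child started:

```python
import json
from pathlib import Path

def parent_writes_plan(workdir: Path, decision: str) -> Path:
    """Parent thread persists its decision; its context dies with the session."""
    plan = workdir / "plan.json"  # hypothetical filename
    plan.write_text(json.dumps({"architecture_decision": decision}))
    return plan

def child_reads_plan(plan: Path) -> str:
    """Child thread starts amnesiac; the file is its only inheritance."""
    return json.loads(plan.read_text())["architecture_decision"]
```

Everything the child knows about the parent’s reasoning is whatever ended up in that file; nothing else crosses the thread boundary.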

If you accept one sentence from this chapter, accept that one. The amnesia of the model and the persistence of the filesystem are the load-bearing asymmetry. Every other discipline in this part of the book is a consequence of it.


9.5 What this chapter unlocks

You now have the vocabulary the rest of Part III is going to spend. A short forward map:

Chapter 10 treats agent source code as a thing you build, primitive by primitive. Now that you know the harness is the compiler and the agent source code is the source tree, the seven-primitive catalogue stops being a list of tools and starts looking like a small programming language.

Chapter 12 treats the load lifecycle in detail. You will learn what to look for in the harness’s verbose output, how to compute the transitive closure of files that load when a primitive activates, and why your skill is silent when a parent rule file overflows the context budget.[9]

Chapter 14 locates the deterministic seam — the line between what the harness deterministically does and what the model probabilistically guesses. The model is one of the four parts. It is the only one that hallucinates. Knowing where it sits is what lets you put the irreversible side effects somewhere else.

Chapter 15 uses the per-thread/shared-filesystem asymmetry to design multi-agent topologies that stay coordinated when they fan out.

Appendix A is the cross-harness reference. When you need to actually port Maya’s project, that is where the matrix lives.


Tip: TL;DR — Four parts, one machine
  1. Name the four parts. Model, harness, agent source code, client. When something behaves differently between two setups, ask which of the four changed first.
  2. Treat your markdown as code. It is parsed, linked, loaded, and executed. Version it, review it, lint it, and write it precisely enough that the next agent does not hedge.
  3. The harness is the compiler. Two harnesses given the same primitives produce different running programs. Cross-harness portability is a project plan, not a mystery.
  4. Inference is per-thread; the filesystem is shared. Threads are amnesiac; the filesystem is the only memory that persists. Every coordination pattern in the rest of this book follows from that asymmetry.[10]

  1. GitHub, “Adding custom instructions for GitHub Copilot,” https://docs.github.com/en/copilot/customizing-copilot/adding-custom-instructions-for-github-copilot. The applyTo frontmatter field accepts a glob and binds the instruction file to any thread whose work path matches. Files live in .github/instructions/ for project scope and .copilot/instructions/ for user scope.↩︎

  2. Anthropic, “Claude Code: Memory,” https://docs.claude.com/en/docs/claude-code/memory. Claude Code reads CLAUDE.md from the project root and then walks the directory tree, applying the closest CLAUDE.md in scope to work in that subtree. The body may import other markdown files via the @path directive, which inlines the referenced file’s contents into the model’s context at session start. There is no glob predicate; hierarchy is the only scope selector.↩︎

  3. This four-part decomposition borrows substrate vocabulary from the agent-side danielmeppiel/genesis skill, specifically assets/runtime-affordances/common.md, which enumerates six harness-agnostic primitive concepts and treats the client as orthogonal to the inference harness. Agent-side reference; substrate concept.↩︎

  4. Cursor, “Rules for AI,” https://docs.cursor.com/context/rules-for-ai. Cursor stores scoped rules in .cursor/rules/ as files with frontmatter declaring globs and rule type. Same substrate concept as Copilot’s instructions and Claude Code’s nested memories; incompatible surface syntax.↩︎

  5. OpenAI, “Codex CLI: AGENTS.md,” https://github.com/openai/codex. Codex follows the AGENTS.md convention — a single markdown file at the project root (and optionally nested) that the harness loads at session start. The convention has been adopted by several harnesses as a portable lingua franca for project-scope rules.↩︎

  6. OpenCode, https://opencode.ai/docs. OpenCode loads AGENTS.md from the project root using the same convention as Codex, which makes a single AGENTS.md the cheapest portable target across the AGENTS.md-aware harnesses.↩︎

  7. agentskills.io, the open registry standard for the SKILL.md entrypoint with description-driven activation, has been adopted in substantially compatible form by Copilot (.github/skills/<name>/SKILL.md), Claude Code (.claude/skills/<name>/SKILL.md), and several others. Skills port across harnesses more cleanly than scope-attached rules because the standard pinned the file name and the activation contract.↩︎

  8. Genesis ships per-harness adapters under skills/genesis/assets/runtime-affordances/per-harness/ — one each for Copilot, Claude Code, Cursor, Codex, and OpenCode — each mapping the six substrate concepts to that harness’s concrete file paths and frontmatter fields. Agent-side reference; substrate concept.↩︎

  9. The transitive-closure framing — the full graph of files that load when a primitive activates, including dependencies of dependencies — is borrowed from danielmeppiel/genesis, assets/composition-substrate.md. Agent-side reference; substrate concept.↩︎

  10. The four-part runtime machine this chapter just named is the substrate at which the design discipline of Part III stops being optional. The companion the part opener pointed at is one packaged answer to that discipline. Cross-reference; substrate framing.↩︎


© 2025-2026 Daniel Meppiel · CC BY-NC-ND 4.0. Free to read and share with attribution.