03. Release notes (minimum viable single skill)
Cold-load run: release notes from CHANGELOG
Operator prompt (verbatim):
“Draft release notes from CHANGELOG entries for the next version of an open-source library. The library follows Keep-a-Changelog and SemVer; the release notes should land as a GitHub Release on tag push.”
Regime: single-skill
Key pattern decisions: A9 SUPERVISED EXECUTION + S7 BRIDGE + S4 SCHEMA GATE. A1 PANEL considered and rejected (lens-count gate did not fire).
Output below is the verbatim response of a fresh agent session that loaded only SKILL.md and the prompt above. No prior context. No human cleanup. Execution stopped at the step-6 handoff packet per the skill’s hard rule.
I’ve cold-loaded the genesis skill (SKILL.md + primitives.md + design-patterns.md + architectural-patterns.md + refactor-patterns.md + composition-substrate.md + pattern-tradeoffs.md + mermaid-conventions.md). Executing the 8-step process now; per the skill’s hard rule, I stop at the step-6 handoff packet.
genesis run: release-notes-draft skill
Step 1 — Intent + scope
Capability (one paragraph). When the operator is about to cut a release, this skill reads the git history since the last tag, classifies each commit into FEATURES / FIXES / BREAKING CHANGES, flags any commit that touches risk-sensitive surfaces (auth code paths, database migrations, or net code removal), and writes a single human-editable artifact RELEASE_NOTES_DRAFT.md to the repository root. The operator then edits, decides, and tags. The skill DOES NOT tag the release, push refs, publish to a registry, or open a PR; the draft is the only side effect.
SRP check. No “and” connecting two distinct capabilities — collect/classify/flag/write are sub-steps of the single capability “produce a reviewable release-notes draft”. Single design.
Dispatch description (drafted; <= 1024 chars; imperative; intent-first; pushy on indirect triggers).
Use this skill when the user is preparing to cut a release and needs a reviewable draft of what shipped. It reads git commits since the last tag, groups them into features / fixes / breaking changes, flags commits that look risky (touch auth code, database migrations, or remove code), and writes RELEASE_NOTES_DRAFT.md so the user can edit before tagging. Activate on phrases like “cut a release”, “prepare release notes”, “draft release notes”, “what’s in this release”, “changelog for the next version”, “tag and ship”, “release prep”, “review changes since the last tag”, or any moment the user is about to bump a version, push a tag, or publish — even if they do not say “release notes” verbatim. Does NOT tag, push, or publish; produces a draft file only.
(~830 chars; under the 1024 cap.)
Invocation mode. BOTH (operator may forcibly invoke; dispatcher may match indirect triggers like “what’s shipping” or “tag and ship”).
Step 2 — Component diagram
Loaded: assets/primitives.md, assets/design-patterns.md, assets/architectural-patterns.md, assets/refactor-patterns.md, assets/mermaid-conventions.md.
Refactor pre-pass (R1/R2/R3/R4): no existing module graph in scope (greenfield skill). No R-trigger fires. Proceed.
flowchart LR
S[release-notes-draft<br/>SKILL]
RR[/risk-classifier-rules/<br/>RULE-shaped asset/]
T[(release-notes-template<br/>ASSET)]
GS[(git-history.sh<br/>ASSET / script)]
RS[(risk-scan.sh<br/>ASSET / script)]
WS[(write-draft.sh<br/>ASSET / script)]
S -- loads at plan step --> RR
S -- loads at compose step --> T
S -- invokes --> GS
S -- invokes --> RS
S -- invokes --> WS
classDef new stroke-dasharray: 5 5;
class S,RR,T,GS,RS,WS new;
All modules NEW. Marker legend per mermaid-conventions.md:
- S = MODULE ENTRYPOINT (SKILL).
- RR = RULE-shaped asset (a glob-list defining what counts as risky; SCOPE-ATTACHED RULE FILE primitive type, but bundled as an inline asset here — see step 3.5).
- T, GS, RS, WS = ASSETs (template + scripts).
Step 3 — Thread / sequence diagram + pattern selection
Pattern selection in tier order:
- Refactor pass: clean (no existing graph).
- Tier-3 architectural pattern: the design names CONSEQUENTIAL SIDE EFFECT (write RELEASE_NOTES_DRAFT.md) and several FACTS THAT MUST BE TRUE (last tag, commit list, file lists per commit, working-tree state). Per the architectural-pattern Selection Heuristic in architectural-patterns.md (“work names a consequential side effect or a fact that must be true … -> A9 SUPERVISED EXECUTION”), the shape is A9 SUPERVISED EXECUTION.
- Tier-2 decomposition: A9 composes B4 + S7 + S4 + (optional) B10. Add B8 ATTENTION ANCHOR (mandatory on any non-trivial work per step 3 heuristic), C1 LAZY ASSET for the rules/template/scripts (PROSE Progressive Disclosure), and B6 PROMPT TEMPLATE for the output shape (B6 is the GoF Template Method analogue for the structured RELEASE_NOTES_DRAFT.md skeleton).
- Tier-1 idioms: deferred to step 7b (codegen).
Lens count: 1 lens (release-notes drafter). No fan-out trigger (the rule “>=3 independent lenses” does not fit). Single-thread sequential with tool bridges is correct.
Sequence diagram (thread-level, with tool-call crossings):
sequenceDiagram
participant Op as Operator
participant Th as Thread (release-notes-draft skill loaded)
participant Sh as Terminal (S7 / preloaded route)
participant FS as File System (system of record)
participant Git as Git repo (system of record)
Op->>Th: trigger ("draft release notes")
Note over Th: load skill body; persist plan (B4); inject GOAL+constraints (B8)
Th->>Sh: precondition tool calls (S4)
Sh->>Git: git status --porcelain ; git rev-parse --abbrev-ref HEAD
Git-->>Sh: state
Sh-->>Th: result (clean/dirty; branch)
Th->>Sh: git-history.sh (GS)
Sh->>Git: git describe --tags --abbrev=0 ; git log LAST..HEAD --pretty=...
Git-->>Sh: structured commit list (JSON on stdout)
Sh-->>Th: commit list
Th->>Sh: risk-scan.sh (RS) over commit list + RR rules
Sh->>Git: git show --stat / git diff-tree per commit
Git-->>Sh: per-commit file deltas
Sh-->>Th: structured risk findings (JSON)
Note over Th: LLM-owned step: classify commits into features/fixes/breaking; compose prose; reload anchor (B8)
Th->>Sh: precondition (S4): test -f RELEASE_NOTES_DRAFT.md (overwrite gate)
Sh-->>Th: exists / not exists
alt file exists
Th->>Op: B10 HUMAN CHECKPOINT (overwrite? abort/rename?)
Op-->>Th: decision
end
Th->>Sh: write-draft.sh (WS) -- emits file
Sh->>FS: write RELEASE_NOTES_DRAFT.md
FS-->>Sh: ok
Sh-->>Th: write result
Th->>Sh: verifier (S4): re-read file; jq/grep schema check
Sh-->>Th: verify pass/fail
Note over Th: synthesize summary; one-writer interlock on RELEASE_NOTES_DRAFT.md
Th-->>Op: "draft written; N commits; M risk flags; review and tag"
A9-supervised view (lifted from mermaid-conventions.md tool-call convention; double-line ==> denotes tool result crossing back into LLM):
flowchart LR
G[(GOAL +<br/>plan,<br/>persisted<br/>B4)] --> P[LLM: plan step<br/>compose tool calls]
P --> PRE[(TOOL<br/>S4 preconditions<br/>git status, branch)]
PRE ==> COL[(TOOL<br/>S7 git-history.sh<br/>CLI / script)]
COL ==> RISK[(TOOL<br/>S7 risk-scan.sh<br/>CLI / script)]
RISK ==> COMP[LLM: classify +<br/>compose prose<br/>reload B8 anchor]
COMP --> OVR[(TOOL<br/>S4 overwrite check<br/>test -f)]
OVR --> CHK{B10 human<br/>checkpoint?<br/>only if exists}
CHK -->|approve / no file| WR[(TOOL<br/>S7 write-draft.sh<br/>CLI / script)]
CHK -->|reject| END1[abort]
WR ==> V[(TOOL<br/>verifier S4<br/>re-read + schema check)]
V ==> OK{verify pass?}
OK -->|yes| DONE[hand to operator]
OK -->|no| RETRY{retry budget?}
RETRY -->|yes| P
RETRY -->|no| END2[abort, escalate]
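The control flow in the flowchart above can be read as a bounded retry loop. The following is a minimal sketch of that reading, not the skill's implementation: the `Steps` container, its field names, and the default `retry_budget` are hypothetical, and aborting on a failed precondition is an assumption because the diagram draws no failure edge out of PRE.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Steps:
    """Hypothetical callables standing in for the tool calls and the LLM step."""
    preconditions: Callable[[], bool]      # S4: git status, branch
    collect: Callable[[], list]            # S7: git-history.sh
    scan: Callable[[list], list]           # S7: risk-scan.sh
    compose: Callable[[list, list], str]   # LLM: classify + compose prose
    draft_exists: Callable[[], bool]       # S4: test -f
    approve_overwrite: Callable[[], bool]  # B10: asked only if the draft exists
    write: Callable[[str], None]           # S7: write-draft.sh
    verify: Callable[[], bool]             # S4: re-read + schema check


def run_supervised(steps: Steps, retry_budget: int = 2) -> str:
    """Walk the flowchart once per attempt; return 'done', 'abort', or 'escalate'."""
    attempts = 0
    while True:
        if not steps.preconditions():
            return "abort"        # assumption: the diagram shows no PRE failure edge
        commits = steps.collect()
        findings = steps.scan(commits)
        body = steps.compose(commits, findings)
        if steps.draft_exists() and not steps.approve_overwrite():
            return "abort"        # CHK -->|reject| END1
        steps.write(body)
        if steps.verify():
            return "done"         # OK -->|yes| DONE
        if attempts >= retry_budget:
            return "escalate"     # RETRY -->|no| END2
        attempts += 1             # RETRY -->|yes| P
```

One consequence of looping back to the plan step is visible in the sketch: after a failed verification the draft written by that attempt already exists, so the overwrite checkpoint fires on the retry. The source does not say whether that is intended.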
Step 3.1 — Tradeoff check
Three slots had alternatives in tension; loaded assets/pattern-tradeoffs.md.
Tension 1: A9 SUPERVISED EXECUTION vs A2 PIPELINE. Both could shape a “collect -> classify -> flag -> emit” sequence. Cut by matrix #9 Execution doctrine, row 1 (SIDE EFFECT against a system of record -> Tool-delegated) and row 2 (FACT THAT MUST BE TRUE -> Tool-delegated). The design’s terminal step is a file write to a system of record (FS), and every input is a fact about another system of record (git). A2 PIPELINE describes ordered LLM stages; it does not name the LLM/CPU boundary. A9 does. Pick A9.
%% tradeoff: matrix #9 Execution doctrine -> rows 1 + 2 (side effect + fact-must-be-true) -> A9 over A2
Tension 2: where does risk classification run — LLM-asserted prose or tool-delegated path matching? Risk = “files matching glob **/auth/**, **/migrations/**, or commits with net-negative line counts”. This is path globbing + arithmetic, both deterministic. Cut by matrix #9 Execution doctrine row 2 again (FACT THAT MUST BE TRUE -> Tool-delegated) and row 3 (COMPOSITION, JUDGEMENT, LANGUAGE -> LLM-asserted). Split:
- Detection of which-files-match-which-rule -> tool (risk-scan.sh over a structured rule list).
- Human-readable rationale prose per flagged commit (e.g. “this commit removes 200 lines from src/auth/session.py — verify session-rotation behavior”) -> LLM-asserted, fed by the tool’s structured output.
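The tool-delegated half of Tension 2 is path globbing plus one comparison, which a few lines of code can show. This is a sketch under assumptions, written in Python for illustration rather than as the bundled shell script: the rule names, the `RULES` layout, and the `removal_threshold` parameter are hypothetical, and only the two globs and the net-negative test come from the text.

```python
from fnmatch import fnmatchcase

# Rule names and layout are illustrative; the globs are the ones quoted above.
RULES = {
    "auth": ["**/auth/**"],
    "migrations": ["**/migrations/**"],
}


def matched_rules(paths, net_lines, rules=RULES, removal_threshold=0):
    """Return the sorted names of the rules a commit trips.

    paths: repo-relative file paths touched by the commit.
    net_lines: lines added minus lines removed.
    """
    hits = set()
    for name, globs in rules.items():
        for path in paths:
            # fnmatch's "*" also crosses "/", so "**" behaves like "*" here.
            # Prefixing "/" lets "**/auth/**" match a top-level "auth/" as well.
            if any(fnmatchcase("/" + path, g) for g in globs):
                hits.add(name)
    if net_lines < removal_threshold:
        hits.add("code-removal")
    return sorted(hits)
```

For the example in the second bullet, `matched_rules(["src/auth/session.py"], -200)` reports both the `auth` rule and the removal rule. The rationale sentence itself stays with the LLM, which receives this structured result as input.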
%% tradeoff: matrix #9 Execution doctrine -> row 2 (fact) for detection; row 3 (composition) for rationale
Tension 3: B10 HUMAN CHECKPOINT — required or optional? The terminal write is RECOVERABLE (a draft file the operator immediately edits; not an irreversible release tag). Per A9 anti-pattern UNCHECKPOINTED IRRECOVERABLE, B10 is mandatory only for irreversible effects. Cut by matrix #2 Gate types, EXTERNAL x JUDGEMENT cell: B10 is reserved for “hard handover for irrecoverable steps”. The recoverable side here is the file write itself; the genuinely-irrecoverable case is silent overwrite of an existing draft the operator is mid-editing. So scope B10 narrowly: only fires when RELEASE_NOTES_DRAFT.md already exists (the overwrite would destroy work).
%% tradeoff: matrix #2 Gate types -> EXTERNAL/JUDGEMENT cell scopes B10 to overwrite-of-existing-draft only
Step 3.5 — Composition decision + dependency graph
Loaded assets/composition-substrate.md. Per-box decision:
| Box | Composition mode | Rationale |
|---|---|---|
| release-notes-draft (SKILL) | INLINE (it IS the module) | the module under design |
| risk-classifier-rules (RULE-shaped asset) | INLINE asset | first project; rule-of-three not satisfied; no independent release cadence yet |
| release-notes-template (ASSET) | INLINE asset | content unique to this skill’s output shape |
| git-history.sh (script) | INLINE asset under scripts/ | bundled per agentskills.io scripts convention; called by relative path |
| risk-scan.sh (script) | INLINE asset under scripts/ | same |
| write-draft.sh (script) | INLINE asset under scripts/ | same |
No EXTERNAL MODULE declared. This is the substantive output of step 3.5: every box is INLINE or LOCAL to the skill bundle. Therefore:
- “external modules required” list at step 6: EMPTY.
- DECLARATION MECHANISM per external module: N/A (no externals).
- PHANTOM DEPENDENCY risk at step 6 / step 8: none from external modules (the failure mode is structurally absent). Recorded explicitly per W6.3.
Rule of three not satisfied today; the rules / template / scripts could promote to a shared module later if a second project needs the same release-drafting shape (R3 EXTRACT trigger to re-evaluate at that point).
Dependency graph (step 3.5, flowchart LR):
flowchart LR
Self[release-notes-draft]
RR[risk-classifier-rules]
T[release-notes-template]
GS[git-history.sh]
RS[risk-scan.sh]
WS[write-draft.sh]
Sh[(preloaded TERMINAL<br/>S7 substrate)]
Git[(git CLI - installed)]
FS[(file system)]
Self -- INLINE --> RR
Self -- INLINE --> T
Self -- INLINE --> GS
Self -- INLINE --> RS
Self -- INLINE --> WS
Self -- runtime affordance --> Sh
GS -. shells out to .-> Git
RS -. shells out to .-> Git
WS -. writes to .-> FS
%% no EXTERNAL MODULE edges -- composition is INLINE-only at distribution surface
Step 4 — SoC pass (incl. W6 / W6.2)
Walked each box against the SoC checklist in SKILL.md step 4.
- Existing-module duplication? No comparable existing module assumed in scope; no overlap.
- Sibling overlap? Only one module; N/A.
- Dispatch collision? Description trigger nouns (“release notes”, “draft release notes”, “since last tag”, “tag and ship”) are not generic — collision with a generic skill is unlikely. Recheck at step 8 against the operator’s installed catalogue.
- R1 SPLIT triggers? None fire — single description, no conjunction, no multi-lens body, body fits budget, single change cadence.
- R2 FUSE / R3 EXTRACT / R4 INLINE? None — single new module.
- PREMATURE SPLIT? N/A.
W6 — CONSEQUENTIAL SIDE EFFECTS (must cross S7)
| # | Side effect | Substrate | S7 route | Gate(s) |
|---|---|---|---|---|
| 1 | Write RELEASE_NOTES_DRAFT.md to repo root | file system | preloaded terminal (write via script write-draft.sh) | S4 precondition (overwrite check) + B10 IF file already exists |
Single side effect. Crossing S7 explicitly. Anti-pattern HAND-ROLLED HALLUCINATION (“LLM emits the file content as text without invoking write”) is structurally avoided: the LLM produces the markdown body as a parameter passed to write-draft.sh, which performs the actual write (os.write or a shell > redirect).
W6.2 — FACTS THAT MUST BE TRUE (must cross S7)
| # | Fact | Substrate | S7 route | Anti-pattern guarded |
|---|---|---|---|---|
| 1 | Current branch | git CLI | preloaded terminal: git rev-parse --abbrev-ref HEAD | TOOLLESS ASSERTION |
| 2 | Working tree clean / dirty | git CLI | git status --porcelain | TOOLLESS ASSERTION |
| 3 | Most recent tag | git CLI | git describe --tags --abbrev=0 | STALE-CORPUS RELIANCE / recall |
| 4 | Commit list since last tag (sha, subject, body, author, date) | git CLI | git log LAST..HEAD --pretty=format:%H%x09%an%x09%ad%x09%s (in git-history.sh, structured stdout) | HAND-ROLLED HALLUCINATION |
| 5 | Files changed per commit | git CLI | git show --stat / git diff-tree --no-commit-id --name-only -r SHA (in risk-scan.sh) | TOOLLESS ASSERTION |
| 6 | Net lines added/removed per commit | git CLI | git show --shortstat (in risk-scan.sh) | TOOLLESS ASSERTION |
| 7 | Whether a commit’s file set matches the risk globs | jq/grep against rule list | script-local computation | LLM-asserted classification |
| 8 | RELEASE_NOTES_DRAFT.md exists at write time | file system | test -f RELEASE_NOTES_DRAFT.md | TOOLLESS PRECONDITION |
| 9 | Post-write file matches expected schema (has H1, has FEATURES/FIXES/BREAKING headings) | jq/grep over emitted file | verifier tool call | VERIFY-WITH-LLM-ONLY |
Every fact and side effect names its substrate and route. None left as LLM-asserted prose.
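Row 9 of the table, the post-write schema check, is simple enough to illustrate. The sketch below is an assumption-laden stand-in written in Python (the table names jq/grep as the substrate): the function name `verify_draft` is hypothetical, and treating the sections as level-2-or-deeper ATX headings is an assumption the source does not confirm.

```python
import re

REQUIRED_SECTIONS = ("FEATURES", "FIXES", "BREAKING")


def verify_draft(text: str) -> list:
    """Return a list of schema problems; an empty list means the draft passes."""
    problems = []
    lines = text.splitlines()
    if not any(re.match(r"^# \S", line) for line in lines):
        problems.append("missing H1")
    # Assumption: sections are ATX headings of level 2 or deeper.
    headings = [
        line.lstrip("#").strip().upper()
        for line in lines
        if re.match(r"^#{2,6} ", line)
    ]
    for section in REQUIRED_SECTIONS:
        if not any(h.startswith(section) for h in headings):
            problems.append(f"missing section: {section}")
    return problems
```

Returning the list of problems rather than a bare boolean gives the retry step something concrete to act on.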
S7 EXTENSION PATH chosen: route 1 (preloaded terminal) for all reads, route 2 (custom script) for the three bundled scripts/ because the operations have stable contracts worth naming and the structured-stdout discipline (JSON out, diagnostics on stderr, --help) is best authored in scripts rather than re-derived in skill prose every call. Route 3 (MCP) is overkill — single-skill use, no cross-harness type-schema benefit yet.
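The structured-stdout discipline for the history step can be sketched as follows. This is an illustration in Python, not the bundled git-history.sh: the function names are hypothetical, and the fallback to all of HEAD's history when no tag exists follows the third content eval in 6.10. The format string quoted in row 4 carries four fields and no commit body, so the sketch parses only those four.

```python
import json
import subprocess

LOG_FORMAT = "%H%x09%an%x09%ad%x09%s"  # sha, author, date, subject (tab-separated)


def parse_log(text: str) -> list:
    """Turn tab-separated `git log` lines into a list of dicts."""
    commits = []
    for line in text.splitlines():
        if not line:
            continue
        # Split on the first three tabs only, so a tab inside the subject survives.
        sha, author, date, subject = line.split("\t", 3)
        commits.append({"sha": sha, "author": author, "date": date, "subject": subject})
    return commits


def revision_range(last_tag):
    """LAST..HEAD when a tag exists; all of HEAD's history otherwise."""
    return f"{last_tag}..HEAD" if last_tag else "HEAD"


def last_tag(repo="."):
    """Most recent tag reachable from HEAD, or None if `git describe` fails."""
    proc = subprocess.run(
        ["git", "-C", repo, "describe", "--tags", "--abbrev=0"],
        capture_output=True, text=True,
    )
    return proc.stdout.strip() if proc.returncode == 0 else None


def history_json(repo="."):
    """Commit list since the last tag, as a JSON string meant for stdout."""
    rng = revision_range(last_tag(repo))
    proc = subprocess.run(
        ["git", "-C", repo, "--no-pager", "log", rng, f"--pretty=format:{LOG_FORMAT}"],
        capture_output=True, text=True, check=True,
    )
    return json.dumps(parse_log(proc.stdout))
```

Keeping `parse_log` free of subprocess calls means the parsing can be tested without a repository, while `check=True` makes a failing `git log` raise instead of yielding an empty list.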
Step 5 — Compliance check
Classic principles + PROSE + LLM truths:
| Axis | Status | Note |
|---|---|---|
| SRP | OK | one capability, one description |
| SoC (LLM/CPU boundary) | OK | facts + side effects all cross S7 (W6 / W6.2) |
| Progressive Disclosure | OK | template + rules + scripts loaded only when their step runs (C1) |
| Reduced Scope | OK | one thread, one persona-equivalent lens |
| Orchestrated Composition | OK | A9 SUPERVISED EXECUTION shape is explicit |
| Safety Boundaries | OK | S4 precondition (overwrite, dirty-tree warning) + bounded B10 |
| Explicit Hierarchy | OK | SKILL.md body links to references/ and scripts/ with explicit load triggers |
| Truth #1 (context finite) | OK | B4 + B8 |
| Truth #2 (context explicit) | OK | every fact tool-delegated; no PHANTOM DEPENDENCY (no externals) |
| Truth #3 (output probabilistic) | OK | S7 + S4 |
| Truth #4 (hallucination inherent) | OK | classification is LLM but factual basis is tool-supplied |
| Truth #5 (frozen pretraining) | N/A | no live external corpus |
| MODULE ENTRYPOINT spec — name regex + parent-dir match | DEFERRED TO STEP 8 | proposed name release-notes-draft (1-64 chars, [a-z0-9-], no hyphen edge cases) — must equal parent directory at codegen |
| MODULE ENTRYPOINT spec — body <= 500 lines AND <= 5000 tokens | DEFERRED TO STEP 8 | budget honored by design (procedure-style body + lazy assets) |
| MODULE ENTRYPOINT spec — description <= 1024 chars, imperative, intent-first, indirect-triggers named | OK at draft | re-validate emitted frontmatter at step 8 |
| ASCII-only output | OK | enforced |
Open findings: none at BLOCKER or HIGH. Two MEDIUMs:
- M1 (MEDIUM): script-side risk rules are inline today; if a second project adopts this skill, R3 EXTRACT to a shared rule module (with explicit dependency declaration to avoid PHANTOM DEPENDENCY) becomes the right move. Note in todos.
- M2 (MEDIUM): the LLM-composed rationale prose for risky commits is not deterministic; if downstream tooling expects a stable shape, add B6 PROMPT TEMPLATE constraint with an output-schema gate (S4) over the rationale block too. Track for v2.
No BLOCKER. Proceed to step 6.
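The mechanical rows of the compliance table (name pattern, parent-directory match, description cap, line budget, ASCII-only) lend themselves to a small check. The sketch below is illustrative only: `check_entrypoint` is a hypothetical name, and reading "no hyphen edge cases" as "no leading, trailing, or doubled hyphen" is an assumption. The 5000-token budget is left out because it depends on a tokenizer the source does not name.

```python
import re

# Assumption: "no hyphen edge cases" means no leading, trailing, or doubled hyphen.
NAME_RE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def check_entrypoint(name, parent_dir, description, body):
    """Return a list of violations of the limits named in the compliance table."""
    problems = []
    if not (1 <= len(name) <= 64 and NAME_RE.match(name)):
        problems.append("name: must be 1-64 chars of [a-z0-9-]")
    if name != parent_dir:
        problems.append("name: must equal parent directory")
    if len(description) > 1024:
        problems.append("description: over 1024 chars")
    if len(body.splitlines()) > 500:
        problems.append("body: over 500 lines")
    if not (description + body).isascii():
        problems.append("output: non-ASCII character found")
    return problems
```

An empty result for `check_entrypoint("release-notes-draft", "release-notes-draft", description, body)` would correspond to the rows marked DEFERRED TO STEP 8 passing, apart from the token budget.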
Step 6 — Handoff packet (PERSIST THIS)
DESIGN ENDS AT THE END OF THIS PACKET. Per truth #5 + substrate concept 6, this packet must be written to the runtime’s plan store before step 7b begins. If unsure of harness slot, write to plan.md in the session working area.
6.1 Diagrams
The component diagram (step 2), the sequence + A9-supervised diagrams (step 3), and the dependency graph (step 3.5) above are part of the packet.
6.2 Interface sketch — per module
release-notes-draft (MODULE ENTRYPOINT / SKILL)
- Trigger description: see step 1, drafted (~830 chars).
- Inputs: implicit (current git repo cwd) + optional operator hints (“since v1.2.0” override).
- Outputs: RELEASE_NOTES_DRAFT.md at repo root + a stdout summary line (“N commits; M risk flags”).
- Dependencies: ./assets/risk-classifier-rules (asset), ./assets/release-notes-template (asset), ./scripts/git-history.sh, ./scripts/risk-scan.sh, ./scripts/write-draft.sh.
- Invocation mode: BOTH.
risk-classifier-rules (inline RULE-shaped asset)
- Trigger description: N/A (asset, not a skill).
- Inputs: list of commit objects.
- Outputs: structured rule definition (globs for auth, migration paths; threshold for “removes code”). Consumed by risk-scan.sh.
- Dependencies: none.
release-notes-template (inline ASSET)
- Inputs: classified commit groups + risk findings.
- Outputs: rendered markdown (B6 PROMPT TEMPLATE skeleton with FEATURES / FIXES / BREAKING / RISKS sections).
- Dependencies: none.
git-history.sh (inline SCRIPT under scripts/)
- Contract: git-history.sh [--since-tag <tag>]. Stdout: JSON array of {sha, author, date, subject, body}. Stderr: diagnostics. Non-interactive. --help documented.
- Dependencies: git CLI installed in PATH.
risk-scan.sh (inline SCRIPT under scripts/)
- Contract: risk-scan.sh --commits <file.json> --rules <file.yml>. Stdout: JSON array of {sha, matched_rules: [...], net_lines: int}. Stderr: diagnostics. Non-interactive. --help documented.
- Dependencies: git, jq.
write-draft.sh (inline SCRIPT under scripts/)
- Contract: write-draft.sh --content <file.md> --target RELEASE_NOTES_DRAFT.md [--force]. Refuses to overwrite without --force. Exit code: 0 ok / 2 exists-no-force / 1 other error. Non-interactive. --help documented.
- Dependencies: POSIX shell.
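The write-draft.sh contract above (refuse to overwrite without --force; exit 0 / 2 / 1) can be illustrated with a short function. This is a sketch in Python rather than the POSIX shell the contract names, and it takes the content as a string where the contract takes a file path; the function and constant names are hypothetical.

```python
EXIT_OK, EXIT_ERROR, EXIT_EXISTS = 0, 1, 2


def write_draft(content: str, target: str, force: bool = False) -> int:
    """Write content to target; refuse to overwrite unless force is set."""
    try:
        # Mode "x" fails if the file exists, so the check and the create are one step.
        mode = "w" if force else "x"
        with open(target, mode, encoding="utf-8") as fh:
            fh.write(content)
        return EXIT_OK
    except FileExistsError:
        return EXIT_EXISTS   # 2: exists and no force
    except OSError:
        return EXIT_ERROR    # 1: any other I/O failure
```

Using exclusive-create mode instead of a separate existence test means a draft created between the overwrite gate and the write still yields exit code 2 rather than being clobbered.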
6.3 Module composition table
| Box | Mode | Why |
|---|---|---|
| release-notes-draft | self | the module |
| risk-classifier-rules | INLINE asset | unique to this skill; rule-of-three not met |
| release-notes-template | INLINE asset | unique output shape |
| git-history.sh | INLINE script (scripts/) | callable contract; not yet shared |
| risk-scan.sh | INLINE script (scripts/) | callable contract; not yet shared |
| write-draft.sh | INLINE script (scripts/) | callable contract; not yet shared |
6.4 External modules required
EMPTY LIST. No external module dependency. DECLARATION MECHANISM per external module: N/A. PHANTOM DEPENDENCY check at step 8: structurally absent (no externals to ship undeclared).
6.5 Declared target set
common-only. All required runtime affordances are in the common substrate: file read/write, preloaded terminal (universal), and a working git binary on PATH (an environmental, not harness, concern; document as a precondition in the SKILL.md body so the operator gets a clear failure if missing).
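The environmental precondition (git on PATH, plus jq for risk-scan.sh per 6.2) can be checked up front so the operator gets the clear failure the text asks for. A minimal sketch, assuming a hypothetical helper name and an injectable lookup for testing:

```python
import shutil


def missing_tools(required=("git",), which=shutil.which):
    """Return the required executables that are not found on PATH."""
    return [tool for tool in required if which(tool) is None]
```

A caller might run `missing_tools(("git", "jq"))` before any other step and stop with a message naming whatever comes back.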
6.6 Invocation mode
BOTH (FORCED + DISCOVERY). Discovery dispatch matters here — operators say “tag and ship” or “what’s in the next release” without naming the skill. The description drafted at step 1 is built for indirect triggers.
6.7 Tradeoff citations (carry forward from step 3.1)
- A9 over A2 -> pattern-tradeoffs.md matrix #9 Execution doctrine, rows 1 + 2.
- Risk detection tool-delegated; rationale LLM-asserted -> matrix #9, rows 2 + 3.
- B10 scoped to overwrite-of-existing-draft only -> matrix #2 Gate types, EXTERNAL x JUDGEMENT cell.
6.8 Open compliance findings
- M1 (MEDIUM): future R3 EXTRACT of risk rules; revisit when second consumer appears. Track.
- M2 (MEDIUM): rationale prose has no schema gate today; add B6 + S4 over rationale block in v2.
- No BLOCKERs, no HIGHs.
6.9 Todo list (one entry per module to draft, plus validation)
| id | title | status | depends_on |
|---|---|---|---|
| t01-skill-body | Draft SKILL.md body for release-notes-draft (procedure: precondition gates -> run history script -> run risk script -> LLM classify+compose -> overwrite gate (B10 if exists) -> write script -> verifier). Body must be <= 500 lines / 5000 tokens, ASCII-only, link references/ and scripts/ with explicit load triggers. | pending | — |
| t02-rules-asset | Author assets/risk-classifier-rules.yml: glob list for auth (e.g. **/auth/**, **/oauth/**, **/session*), migrations (**/migrations/**, **/alembic/**, **/*.sql), code-removal threshold (net_lines < -50). | pending | — |
| t03-template-asset | Author assets/release-notes-template.md: H1 with version-placeholder, sections FEATURES / FIXES / BREAKING CHANGES / RISKS-TO-REVIEW, footer with commit-count + tag-range. | pending | — |
| t04-history-script | Author scripts/git-history.sh: non-interactive, --help, JSON-on-stdout, diagnostics-on-stderr, exits non-zero on git failure. Pin git invocations with --no-pager. | pending | — |
| t05-risk-script | Author scripts/risk-scan.sh: consumes commits JSON + rules YAML, emits findings JSON. --help, non-interactive. | pending | t02, t04 |
| t06-write-script | Author scripts/write-draft.sh: refuses overwrite without --force; structured exit codes; --help. | pending | — |
| t07-frontmatter-validate | Validate frontmatter at codegen: name matches parent dir, lowercase + hyphens, <= 64 chars; description <= 1024 chars. | pending | t01 |
| t08-evals-content | Author evals/evals.json — content evals (see 6.10). | pending | t01-t06 |
| t09-evals-trigger | Author trigger evals (~20 queries, 60/40 train/val) — see 6.10. | pending | t01 |
| t10-portability-check | Run step 7a portability check; load only runtime-affordances/common.md; expect common-only to hold. | pending | t01 |
| t11-real-task-refinement | Run skill on a real repo (this one or another with a recent tag); capture trace; revise. | pending | t01-t10 |
| t12-step8-lint | Step 8 validation pass: budget, ASCII, schema, evals gate, no per-harness syntax. | pending | t01-t11 |
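
The depends_on column above defines a small dependency graph, and a topological sort is one way to confirm it has no cycle and to derive a workable order. The sketch below transcribes the column by hand, with ranges such as t01-t06 expanded; the short ids and the `execution_order` name are illustrative.

```python
from graphlib import TopologicalSorter

# depends_on column from the todo table, with ranges such as t01-t06 expanded.
DEPENDS_ON = {
    "t01": [], "t02": [], "t03": [], "t04": [], "t06": [],
    "t05": ["t02", "t04"],
    "t07": ["t01"],
    "t08": [f"t{i:02d}" for i in range(1, 7)],
    "t09": ["t01"],
    "t10": ["t01"],
    "t11": [f"t{i:02d}" for i in range(1, 11)],
    "t12": [f"t{i:02d}" for i in range(1, 12)],
}


def execution_order(graph=DEPENDS_ON):
    """Return one valid order; raises graphlib.CycleError if the table has a cycle."""
    return list(TopologicalSorter(graph).static_order())
```

Any order it returns ends with t11 followed by t12, since each of those depends on every earlier entry.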
6.10 Evals plan
Content evals (2-3, exercised with_skill vs without_skill):
- Repo with 12 commits since last tag, conventional-commit prefixes, one commit removes 240 lines from src/auth/session.py. Expected (with_skill): RELEASE_NOTES_DRAFT.md written; sections populated by prefix; the auth-removal commit appears under RISKS-TO-REVIEW with rationale citing src/auth/session.py and net-line delta. Without skill: agent likely emits prose summary only, no file written, risk surface missed or asserted from recall.
- Repo with non-conventional commits (“update stuff”, “fix bug”). Expected (with_skill): commits classified by best-effort heuristic (subject keywords, file paths) into FIXES/FEATURES; ambiguous bucket explicitly marked; no fabricated breaking changes. Without skill: high variance; possible HAND-ROLLED HALLUCINATION of categories.
- Repo with no tag yet (initial release). Expected (with_skill): script returns “no prior tag”; skill falls back to “all commits in current branch” with explicit note. Without skill: agent likely guesses or errors out.
If with_skill and without_skill produce indistinguishable outputs on these, the skill is not adding value — redesign or delete.
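For the first content eval, where "sections populated by prefix" is the expectation, an eval harness needs some oracle for which section each commit should land in. The following is a hypothetical oracle for that purpose only; in the design itself, classification remains the LLM-owned step. It follows the Conventional Commits header shape (type, optional scope, optional "!", then a colon) and the "BREAKING CHANGE:" footer; the `UNCLASSIFIED` label is an invented placeholder for the "ambiguous bucket".

```python
import re

# type, optional (scope), optional "!", then ": "
HEADER_RE = re.compile(r"^(?P<type>[a-z]+)(?:\([^)]*\))?(?P<bang>!)?: ")


def expected_bucket(subject: str, body: str = "") -> str:
    """Section a conventional commit is expected to appear under in the draft."""
    m = HEADER_RE.match(subject)
    if m is None:
        return "UNCLASSIFIED"  # not a conventional-commit header
    if m.group("bang") or "BREAKING CHANGE:" in body:
        return "BREAKING CHANGES"
    return {"feat": "FEATURES", "fix": "FIXES"}.get(m.group("type"), "UNCLASSIFIED")
```

Comparing these expected sections with the headings under which each commit actually appears in RELEASE_NOTES_DRAFT.md gives the with_skill run a checkable pass criterion.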
Trigger evals (~20 queries, 60/40 train/val):
Should-trigger (10): “draft release notes for v1.4”, “what’s in the next release”, “tag and ship”, “prepare release notes”, “summarize commits since last tag”, “I’m about to cut a release”, “release prep please”, “what changed since v2.0.0”, “build me a changelog for the next version”, “what should I put in the release notes”.
Should-NOT-trigger (10 near-misses): “show me the last commit message”, “what does this commit do”, “list my branches”, “rebase onto main”, “open a PR for this branch”, “explain what this PR changes”, “write a commit message for these changes”, “draft a CHANGELOG entry for this PR” (PR-level, not release-level), “what’s the diff between two branches”, “release the lock on this file” (homonym).
Train split: 6 + 6. Validation split: 4 + 4. Validation gate at step 8: trigger rate >= 0.5 on should-trigger; < 0.5 on should-NOT-trigger.
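The validation gate is two threshold comparisons over the validation split. A minimal sketch, assuming each query's outcome has been recorded as a boolean (True when the skill triggered); the function names are hypothetical.

```python
def trigger_rate(fired):
    """Fraction of queries on which the skill triggered."""
    return sum(1 for f in fired if f) / len(fired)


def passes_gate(should_fire, should_not_fire, threshold=0.5):
    """Gate from the text: rate >= threshold on positives, < threshold on negatives."""
    return (
        trigger_rate(should_fire) >= threshold
        and trigger_rate(should_not_fire) < threshold
    )
```

With the 4 + 4 validation split, three of four should-trigger hits and one of four near-miss hits passes, while two of four near-miss hits fails because 0.5 is not below the threshold.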
6.11 PHANTOM DEPENDENCY check (W6.3)
Externals declared at step 3.5: none. DECLARATION MECHANISM table: empty by construction. The failure mode (named in prose, undeclared at distribution surface) cannot occur for module dependencies because none are claimed. The only “external” the skill leans on is the git CLI on PATH; this is an ENVIRONMENTAL precondition, not a module dependency, and is handled by the precondition tool calls (rows 1-3 of W6.2 fail loudly if git is missing or the cwd is not a repo). Recorded so step 8 can confirm.
6.12 Stop
DESIGN ENDS HERE. The above is the artifact step 7+ consumes. The SKILL.md body, asset files, and scripts will be authored in step 7b by the caller / coder thread, AFTER reloading this packet, AFTER loading runtime-affordances/common.md, and following the canonical scripts/ + references/ + assets/ layout. No natural-language module bodies are drafted in this response; doing so would violate the genesis hard rule.