Hooks

A hook is a single Markdown file that subscribes to a lifecycle event and returns an allow / block / modify decision in deterministic Starlark.

If tools are what the agent can do, hooks are what the harness watches and rules on. They are the policy and observability plane of AI Harness — the layer where deny-lists, audits, redactions, retries, and rate limits live, all expressed as code that diffs cleanly in a pull request.

What a hook is

A hook artifact has three jobs:

  1. Subscribe to a lifecycle event: tool.pre, tool.post, turn.start, or turn.end.
  2. Inspect the event payload — the tool name and arguments, the model response, the structured result.
  3. Return a decision: allow(), block(reason), or modify(payload).

Hooks are loaded from .harness/hooks/*.md. Unlike tools, they are not addressable by the model. The model never sees a hook by name; it only ever sees the consequences (a tool call rejected, a result redacted, a turn re-prompted).

That is the point. A hook is a piece of harness policy that the agent cannot route around.

Anatomy of a hook

A complete, real hook from the governed-agent example:

---
event: tool.pre
priority: 10
when: payload["name"] == "run_command"
script: |
  def handle(event, payload):
      cmd = payload.get("args", {}).get("command", "")
      dangerous = [
          "rm -rf /",
          ":(){ :|:& };:",
          "mkfs",
          "dd if=",
          "shutdown",
      ]
      for d in dangerous:
          if d in cmd:
              metrics.incr("audit.policy.deny")
              return block("dangerous command pattern blocked: '" + d + "'")
      return allow()
---

# command_guard

Hard-blocks well-known destructive shell patterns. Pair with the systemd
unit (`deploy/systemd/harness.service`) for real isolation.

Four things to notice:

  • event: is the subscription. The harness considers this handler on every tool.pre event it fires; the when: gate then decides whether handle actually runs.
  • when: is a static gate. A Starlark expression evaluated against the payload before handle is called — it lets a hook scope itself to a single tool, model, or turn shape without paying the cost of running its body.
  • priority: resolves order. Lower numbers run first. The audit hook ships at priority 1, so it sees every call, including calls that a later-running policy hook such as command_guard (priority 10) goes on to block.
  • The decision shape is allow / block / modify. That three-way choice is the entire contract between hook and harness.
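Because the handler body is plain Starlark, it also parses as Python, so it can be exercised outside the harness. The sketch below is one way to do that, not part of AI Harness itself: the allow, block, and metrics stand-ins are hypothetical stubs whose return shapes are assumed purely for local testing.

```python
# Hypothetical stand-ins for harness built-ins; the real decision objects
# and metrics API may look different.
class _Metrics:
    def __init__(self):
        self.counters = {}

    def incr(self, name):
        self.counters[name] = self.counters.get(name, 0) + 1


metrics = _Metrics()


def allow():
    return {"decision": "allow"}


def block(reason):
    return {"decision": "block", "reason": reason}


# Handler logic copied from the command_guard hook shown above.
def handle(event, payload):
    cmd = payload.get("args", {}).get("command", "")
    dangerous = ["rm -rf /", ":(){ :|:& };:", "mkfs", "dd if=", "shutdown"]
    for d in dangerous:
        if d in cmd:
            metrics.incr("audit.policy.deny")
            return block("dangerous command pattern blocked: '" + d + "'")
    return allow()


safe = handle("tool.pre", {"name": "run_command", "args": {"command": "ls -la"}})
risky = handle("tool.pre", {"name": "run_command", "args": {"command": "sudo rm -rf / --no-preserve-root"}})

print(safe["decision"])   # allow
print(risky["decision"])  # block
```

Note that the match is a plain substring test, so a command such as rm -rf /tmp/build is blocked as well, because it contains the pattern rm -rf /.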

The four lifecycle events

The hook event catalog is intentionally small. Every event surface is a deterministic place where the harness needs a yes/no/rewrite answer.

Event      | When it fires                               | Typical use
turn.start | Before the model is called for a new turn   | Inject context, reject empty turns, stamp a trace ID
tool.pre   | After argument validation, before run(args) | Deny-list tools, scrub args, enforce path/network policy
tool.post  | After run(args) returns                     | Redact results, truncate, attach metrics, append audit lines
turn.end   | After the model produces its turn output    | Emit summaries, write transcripts, fire OTel spans

Each event is dispatched by the runtime unconditionally — there is no fast path that skips hooks. That uniformity is what lets a single hook enforce a policy across every tool the agent will ever call, including ones added later or generated by a self-augmenting workflow.
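As an illustration of the tool.post row, here is a minimal sketch of a redaction handler. The body is written so it is also valid Starlark, but everything specific in it is an assumption made for the example: the payload is assumed to carry the tool output as a dict under "result", SECRET_KEYS is a hypothetical list, and the allow and modify stubs only mimic the built-ins.

```python
# Hypothetical stand-ins for the harness decision constructors.
def allow():
    return {"decision": "allow"}


def modify(payload):
    return {"decision": "modify", "payload": payload}


# Hypothetical list of result fields that should never reach the model.
SECRET_KEYS = ["api_key", "token", "password"]


def handle(event, payload):
    # Assumption: a tool.post payload exposes the tool output as a dict
    # under "result".
    result = payload.get("result", {})
    redacted = {}
    changed = False
    for key, value in result.items():
        if key in SECRET_KEYS:
            redacted[key] = "[REDACTED]"
            changed = True
        else:
            redacted[key] = value
    if not changed:
        return allow()
    # Mirrors the modify({"args": new_args}) form used elsewhere in this page.
    return modify({"result": redacted})


clean = handle("tool.post", {"name": "fetch", "result": {"status": 200}})
leaky = handle("tool.post", {"name": "fetch", "result": {"status": 200, "token": "abc123"}})

print(clean["decision"])  # allow
print(leaky["payload"])   # {'result': {'status': 200, 'token': '[REDACTED]'}}
```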

Coming primitives. The event catalog is designed to grow. Two events in active spec — delegation.post_verify (#103) and agent.stop (#104) — extend the same allow/block/modify model to sub-agent verification and loop-exit decisions. The hook contract you learn here is the contract you keep.

The decision model

Every hook handler ends in one of three calls:

allow()                         # pass through unchanged
block("reason for the agent")   # reject; reason is surfaced as an error
modify({"args": new_args})      # rewrite the payload, then continue

The harness composes decisions across hooks deterministically:

  1. Hooks for an event run in priority order (low to high).
  2. The first block wins — the chain short-circuits and the rest are skipped.
  3. modify rewrites the payload in place for downstream hooks and the underlying operation.
  4. allow is a no-op pass.

There is no "after-the-fact override" and no implicit rule that lets a later hook silently undo an earlier block. The order is the rule.
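The composition rules can be modelled in a few lines of Python. This is a sketch of the behaviour described above, not the harness's actual implementation: run_chain, the decision dict shapes, and the example hooks are all hypothetical, and two details are assumptions (modify is merged into the payload key by key, and a chain with no block reports allow).

```python
# Hypothetical stand-ins for the harness decision constructors.
def allow():
    return {"decision": "allow"}


def block(reason):
    return {"decision": "block", "reason": reason}


def modify(payload):
    return {"decision": "modify", "payload": payload}


def run_chain(hooks, event, payload):
    """hooks is a list of (priority, handler) pairs; returns (decision, payload)."""
    current = dict(payload)
    for _priority, handler in sorted(hooks, key=lambda pair: pair[0]):
        decision = handler(event, current)
        if decision["decision"] == "block":
            return decision, current             # first block wins, rest skipped
        if decision["decision"] == "modify":
            current.update(decision["payload"])  # assumption: key-by-key merge
    return allow(), current


calls = []  # records which hooks actually ran


def audit(event, payload):
    calls.append("audit")
    return allow()


def trim_command(event, payload):
    calls.append("trim_command")
    args = dict(payload.get("args", {}))
    args["command"] = args.get("command", "").strip()
    return modify({"args": args})


def guard(event, payload):
    calls.append("guard")
    if "shutdown" in payload["args"]["command"]:
        return block("shutdown is not allowed")
    return allow()


def late(event, payload):
    calls.append("late")
    return allow()


hooks = [(20, late), (10, guard), (1, audit), (5, trim_command)]
decision, final = run_chain(
    hooks, "tool.pre", {"name": "run_command", "args": {"command": "  shutdown now "}}
)
print(decision["decision"], calls)  # block ['audit', 'trim_command', 'guard']
```

The hook registered at priority 20 never runs in this example, and guard sees the command already trimmed by the priority-5 hook.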

The Starlark sandbox (and the extra built-ins hooks get)

Hook scripts run in the same Starlark dialect as tools — no I/O at the language level, no imports, no mutable globals. Hooks pick up a small set of additional built-ins shaped around their job:

Built-in                     | Purpose
allow() / block() / modify() | Decision constructors
metrics.incr / metrics.set   | Counters and gauges visible to metrics.snapshot()
log / log.info / log.warn    | Structured logs that flow into turn.end payloads
cache.get / cache.set        | Per-run KV cache shared with tools
http.get / http.post         | Outbound HTTP, gated by the network allowlist

Hooks deliberately do not receive exec.run or fs.write. Policy code that can shell out is policy code an attacker can pivot through. If a hook genuinely needs to mutate state (rare), do it through a named tool the hook calls explicitly — the call goes back through the lifecycle and inherits all the same audit guarantees.
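To show how these built-ins combine, here is a sketch of a per-run rate limit built on the cache and metrics built-ins. The signatures are assumptions, since this page names the built-ins without documenting their arguments: cache.get(key) is assumed to return None for a missing key, cache.set(key, value) to store a value, and metrics.incr(name) to bump a counter. MAX_CALLS_PER_RUN and the metric name are hypothetical.

```python
# Hypothetical stand-ins for the harness built-ins (assumed signatures).
class _Cache:
    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


class _Metrics:
    def __init__(self):
        self.counters = {}

    def incr(self, name):
        self.counters[name] = self.counters.get(name, 0) + 1


cache = _Cache()
metrics = _Metrics()


def allow():
    return {"decision": "allow"}


def block(reason):
    return {"decision": "block", "reason": reason}


MAX_CALLS_PER_RUN = 3  # hypothetical limit, per tool name


def handle(event, payload):
    key = "calls:" + payload["name"]
    count = (cache.get(key) or 0) + 1
    cache.set(key, count)
    if count > MAX_CALLS_PER_RUN:
        metrics.incr("policy.rate_limit.deny")
        return block(payload["name"] + " exceeded " + str(MAX_CALLS_PER_RUN) + " calls in this run")
    return allow()


results = [handle("tool.pre", {"name": "web_search", "args": {}})["decision"] for _ in range(4)]
print(results)  # ['allow', 'allow', 'allow', 'block']
```

The counter lives in the cache rather than in a module-level variable because hook scripts have no mutable globals.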

Why hooks are files, not callbacks

A hook could in principle be a Go interface registered in init(). We reject that as the default path, for the same reasons we reject it for tools:

  • Reviewable. A diff like + .harness/hooks/command_guard.md shows the entire policy: subscription, scope, priority, decision logic — in one file.
  • Composable. Layer policies by adding files; remove them with git rm. There is no central registration table to keep in sync.
  • Portable. Hook artifacts move between repos, teams, and harness versions without a code change.
  • Governed. Because hooks are Markdown, other hooks can read them. A meta-policy hook can enforce that every new tool ships with a matching audit hook, or that no hook in .harness/hooks/ lacks a priority:.
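This page does not show how a hook reads other artifacts, so the sketch below expresses the "no hook lacks a priority:" rule as a standalone Python check, for example one that could run in CI. It is an illustration under stated assumptions: frontmatter_keys and hooks_missing_priority are hypothetical helpers, and the parser only recognises top-level keys in a frontmatter block delimited by --- lines.

```python
def frontmatter_keys(text):
    """Return the top-level keys of a '---' delimited frontmatter block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return set()
    keys = set()
    for line in lines[1:]:
        if line.strip() == "---":
            break
        # Top-level keys start in column 0; indented lines belong to a
        # multi-line value such as `script: |` and are skipped.
        if line and not line[0].isspace() and ":" in line:
            keys.add(line.split(":", 1)[0].strip())
    return keys


def hooks_missing_priority(files):
    """files maps a hook filename to its text; returns offenders, sorted."""
    return sorted(
        name for name, text in files.items()
        if "priority" not in frontmatter_keys(text)
    )


example = {
    "command_guard.md": (
        "---\nevent: tool.pre\npriority: 10\nscript: |\n"
        "  def handle(event, payload):\n      return allow()\n---\n# command_guard\n"
    ),
    "unranked.md": (
        "---\nevent: tool.post\nscript: |\n"
        "  def handle(event, payload):\n      return allow()\n---\n# unranked\n"
    ),
}
print(hooks_missing_priority(example))  # ['unranked.md']
```

To run it against a checkout, the same function can be fed the text of every file matched by pathlib.Path(".harness/hooks").glob("*.md").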

The Go-level hook API still exists for cases that genuinely need it (performance-critical paths, native integrations). It is the escape hatch, not the default.

Composing hooks: the policy stack

Real harnesses don't have a hook — they have a stack of hooks for each event. The governed-agent example ships seven, every one of which is independently reviewable:

.harness/hooks/
├── audit_tool_pre.md          # priority 1   — count + log every call
├── audit_tool_post.md         # priority 1   — count + log every result
├── command_guard.md           # priority 10  — deny dangerous shell patterns
├── path_guard.md              # priority 10  — jail filesystem writes
├── prefer_named_tools.md      # priority 20  — reject raw exec.run
├── meta_tool_guard.md         # priority 30  — block tools editing .harness/
└── completion_window_guard.md # priority 40  — cap output size per turn

Reading top to bottom, the policy reads like English: "audit everything, deny dangerous commands, jail the filesystem, only let the agent use named tools, don't let it edit the harness itself, cap completion size." Each line is a file. Each file is a Markdown artifact of roughly 30 lines. The whole governance posture is a git log.

Hooks versus middleware versus interceptors

People coming from Express, Rails, or gRPC ask where the line is. In AI Harness:

  • Hooks subscribe to agent lifecycle events, not HTTP requests. They see semantic payloads (tool name, arguments, results), not bytes.
  • Hooks return a decision, not a continuation. There is no next() call; the harness owns the chain and runs it deterministically.
  • Hooks are governed alongside tools. They live next to the capabilities they regulate, version with them, and ship with them.

That last point is the one that matters most. In a typical service, the middleware stack and the handler stack live in different repos, owned by different teams, deployed on different cadences. In AI Harness they live in the same .harness/ directory and ship in the same pull request. You cannot land a tool without landing the policy that guards it.

Hook execution lifecycle

For any event, the harness runs this sequence:

  1. Filter. Evaluate each hook's when: expression against the payload; drop the ones that don't match.
  2. Sort. Order surviving hooks by priority ascending.
  3. Dispatch. Call handle(event, payload) for each in order.
  4. Compose. Apply modify rewrites in place; short-circuit on the first block; treat allow as pass-through.
  5. Return. Hand the final decision and (possibly modified) payload back to the caller — the tool dispatcher, the turn loop, or whichever subsystem fired the event.

This pipeline is identical for every event. There is no privileged hook, no built-in policy that runs outside the chain, and no way for a tool or sub-agent to bypass it.
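The five steps can be sketched end to end in Python. This is a model of the described sequence, not the harness's code: fire, matches, the hook dict layout, and the decision shapes are hypothetical. Python's eval stands in for the Starlark evaluation of when: only because the example expressions are valid in both languages; it is not a real sandbox.

```python
# Hypothetical stand-ins for the harness decision constructors.
def allow():
    return {"decision": "allow"}


def block(reason):
    return {"decision": "block", "reason": reason}


def matches(when_expr, payload):
    # A hook without a when: gate matches every payload.
    if when_expr is None:
        return True
    return bool(eval(when_expr, {"__builtins__": {}}, {"payload": payload}))


def fire(hooks, event, payload):
    # 1. Filter: keep hooks subscribed to this event whose when: matches.
    active = [h for h in hooks if h["event"] == event and matches(h["when"], payload)]
    # 2. Sort: ascending priority.
    active.sort(key=lambda h: h["priority"])
    current = dict(payload)
    trace = []
    for hook in active:
        # 3. Dispatch.
        trace.append(hook["name"])
        decision = hook["handle"](event, current)
        # 4. Compose.
        if decision["decision"] == "block":
            return decision, current, trace
        if decision["decision"] == "modify":
            current.update(decision["payload"])  # assumption: key-by-key merge
    # 5. Return the final decision and payload to the caller.
    return allow(), current, trace


def always_allow(event, payload):
    return allow()


def deny_shutdown(event, payload):
    if "shutdown" in payload.get("args", {}).get("command", ""):
        return block("shutdown is not allowed")
    return allow()


HOOKS = [
    {"name": "command_guard", "event": "tool.pre", "priority": 10,
     "when": 'payload["name"] == "run_command"', "handle": deny_shutdown},
    {"name": "audit_tool_pre", "event": "tool.pre", "priority": 1,
     "when": None, "handle": always_allow},
    {"name": "audit_tool_post", "event": "tool.post", "priority": 1,
     "when": None, "handle": always_allow},
]

_, _, read_trace = fire(HOOKS, "tool.pre", {"name": "read_file", "args": {}})
verdict, _, run_trace = fire(
    HOOKS, "tool.pre", {"name": "run_command", "args": {"command": "shutdown -h now"}}
)
print(read_trace)                      # ['audit_tool_pre']
print(verdict["decision"], run_trace)  # block ['audit_tool_pre', 'command_guard']
```

Because filtering is listed before dispatch, this model evaluates every when: expression against the payload as it was fired, before any modify has been applied.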

Related reading

  • Delegation: how sub-agents inherit the same hook surface, and how the upcoming delegation.post_verify and agent.stop events extend it.
  • Governance & Policy: patterns for stacking hooks, allowlists, and tool wrappers into a deployable agent.
  • Guide: Writing a Hook walks through a hook from blank file to production review.
  • Reference: the full Hook Artifact Schema documents every supported frontmatter field and built-in.