Dispatch №11 · Apr 13, 2026 · 9 min read

Stop Praying in Prompts. Start Enforcing with Hooks.

Git figured this out decades ago, and agents are about to go further.

By Nivedit Jain · Originally published on jainnivedit.substack.com
Contents · 9 sections
  1. We’ve Seen This Movie Before
  2. Prompts Are Probabilistic by Design
  3. The Enforcement Gap
  4. Agent Hooks Close the Gap
  5. But Agent Hooks Are More Powerful Than Git Hooks Ever Were
  6. Enforcement in Practice: Failproof AI
  7. The GUIDE vs. GUARD Framework
  8. Move These Out of Your Prompt Today
  9. The Bigger Shift

I have a line in my CLAUDE.md that reads something like:

“Always verify CI pipelines are green as the final step, after every push.”

It doesn’t get more explicit than that. It specifies what (CI green), when (after every push), and where in the sequence (final step). It’s been in my CLAUDE.md for months. And my agent has ignored it more times than I care to count. Broken builds merged. PRs shipped with failing tests. Not because the model didn’t read the instruction — it did. Not because the model is broken; it isn’t. Because a prompt is not enforcement.

This is the enforcement gap. And we need to talk about it.

We’ve Seen This Movie Before

Cast your mind back to the early days of collaborative software development. Teams had rules. “Always run tests before committing.” “Don’t push directly to main.” These rules lived in READMEs, wikis, and onboarding docs. Everyone agreed on them — and developers violated them constantly. Not out of malice. Just because humans, under pressure, take shortcuts.

Then Git hooks happened.

pre-commit. pre-push. post-merge. The rule moved out of documentation and into the runtime. Now the process itself enforced the constraint. You couldn’t push without the hook’s permission. The README was still there — but it was no longer where the rule lived. The hook was.

The result? Fewer broken builds. Consistent quality. Teams that could give developers more autonomy because the guardrails were structural, not social.

Sound familiar? We’re in exactly the same moment with AI agents.

Prompts Are Probabilistic by Design

Here’s the uncomfortable truth: an LLM is, at its core, a function that samples from a probability distribution. Every token it generates is a weighted guess based on context. Your instructions in CLAUDE.md are inputs to that distribution — they increase the likelihood of the desired behavior. They do not guarantee it.

This is a feature, not a bug, for most tasks. Probabilistic reasoning is exactly what you want when an agent is writing code, summarizing documents, or planning a project. Nuance, judgment, creativity: all of it flows from the probabilistic nature of the model.

But workflow invariants are not creative decisions. “CI must be green before merging” is not a suggestion to weigh against other context. It’s a hard precondition. And hard preconditions cannot be encoded as soft inputs.

When you write a rule in a system prompt and expect it to hold under all conditions, you’re trusting a probability distribution to be a policy enforcer. It isn’t. It never was.

The Enforcement Gap

The gap between “I wrote the rule” and “the rule is enforced” is exactly where agent reliability dies.

It shows up everywhere once you start looking:

  • “Always check CI before merging” → agent merges anyway because it assessed the changes as low-risk
  • “Don’t modify files outside /src” → agent edits a config file one directory up because it seemed relevant
  • “Run the full test suite before marking a task complete” → agent skips it when the change is “obviously correct”
  • “Never call external APIs without user confirmation” → agent makes the call mid-task because it inferred permission from context

In each case, the model isn’t malfunctioning. It’s doing exactly what language models do: weighing context, making inferences, resolving ambiguity. The problem is that you didn’t want a judgment call there. You wanted a gate.

Agent Hooks Close the Gap

This is what PreToolUse and PostToolUse hooks are for.

Hooks run outside the model. They intercept tool calls before execution or after, and they can block, modify, or log those calls without any involvement from the LLM. The model cannot reason around a hook. It cannot assess that “in this case, the rule probably doesn’t apply.” The hook runs. The condition is checked. The decision is made deterministically.

Here’s what that looks like for the CI case:
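A minimal sketch of the shape, using the allow/deny policy API that Failproof AI exposes (the full, runnable version with a real CI check appears later in this post; ciIsGreen here is a hypothetical helper):

javascript

// Sketch of a PreToolUse gate: intercept merge attempts, check CI, decide.
import { customPolicies, allow, deny } from "failproofai";

// Hypothetical helper: stand-in for however you query your CI provider.
async function ciIsGreen() {
  return false; // wire this to your CI API
}

customPolicies.add({
  name: "ci-gate-sketch",
  match: { events: ["PreToolUse"] },
  fn: async (ctx) => {
    if (ctx.toolName !== "Bash") return allow();
    if (!/git\s+merge/.test(ctx.toolInput?.command ?? "")) return allow();
    return (await ciIsGreen())
      ? allow()
      : deny("CI is not green. Fix the pipeline before merging.");
  },
});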

The model proposed the merge. The hook decided. No negotiation.

But Agent Hooks Are More Powerful Than Git Hooks Ever Were

Here’s where the analogy breaks open in an interesting way. Git hooks were a step change for developer workflows. Agent hooks are a step change beyond that — because the enforcement problem is harder, and the primitive is richer.

1. Enforcement across an entire autonomous session, not a single operation

A git pre-commit hook fires once: at the moment of commit. An agent hook fires on every tool call across an entire autonomous session, which can mean hundreds of enforcement points in a single run. Your agent might touch ten files, run six bash commands, make three API calls, and attempt a merge. A hook can enforce invariants at every one of those moments, not just the last one.

2. Three decisions, not two

Git hooks are binary: the operation passes or it fails. Agent hooks have a third option: instruct().

instruct() injects context into the agent’s reasoning mid-task without halting it. It’s the difference between a bouncer who throws you out and one who quietly tells you “the back door is locked tonight, try the side entrance.” The agent course-corrects without losing its train of thought.
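As a sketch of what that looks like in practice (the rule and paths here are hypothetical; instruct comes from the same policy API as allow and deny, shown in full later in this post):

javascript

import { customPolicies, allow, instruct } from "failproofai";

customPolicies.add({
  name: "redirect-legacy-config",
  description: "Nudge the agent toward the new config path without halting it",
  match: { events: ["PreToolUse"] },
  fn: async (ctx) => {
    if (ctx.toolName !== "Write") return allow();
    if (!ctx.toolInput?.file_path?.includes("config/legacy/")) return allow();
    // Don't block; inject a course correction into the agent's context.
    return instruct(
      "config/legacy/ is frozen. Write new settings to config/app/ instead."
    );
  },
});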

3. Hooks can see what the tool returned

PostToolUse hooks inspect tool output before the model sees it. That means a hook can redact secrets from a bash command’s output, scrub connection strings from a file read, or strip API keys from an environment dump — all transparently, before the sensitive data ever enters the agent’s context.

Git hooks have no equivalent. They can’t intercept what a developer sees after running a command.
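The scrubbing step itself is plain string processing. A minimal sketch (the patterns are illustrative, not exhaustive, and the exact hook API for replacing tool output depends on your runtime):

javascript

// Runs inside a PostToolUse hook, on the tool's raw output, before the
// model ever sees it.
const SECRET_PATTERNS = [
  /AKIA[0-9A-Z]{16}/g,                    // AWS access key IDs
  /ghp_[A-Za-z0-9]{36}/g,                 // GitHub personal access tokens
  /postgres:\/\/[^ \n"']+/g,              // connection strings with credentials
  /(?:api[_-]?key|token)\s*[=:]\s*\S+/gi, // generic KEY=value assignments
];

function scrubSecrets(toolOutput) {
  return SECRET_PATTERNS.reduce(
    (text, pattern) => text.replace(pattern, "[REDACTED]"),
    toolOutput
  );
}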

4. Hooks are async and externally connected

A git hook is a shell script that runs locally. It can check files and run commands; that’s about it.

An agent hook is an async function with full network access. It can:

  • Call an approval API and wait for a human to greenlight a destructive operation
  • Post to Slack: “Agent is about to deploy to production, approve?”
  • Query a feature flag system to decide what the agent is allowed to do today
  • Look up whether the user’s billing is active before allowing an expensive API call

The enforcement layer can integrate with the entire organizational context around the agent, not just the local filesystem.
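For example, an approval gate on production deploys might look like this sketch (the approval endpoint and its request/response shape are hypothetical; the policy API matches the one shown later in this post):

javascript

import { customPolicies, allow, deny } from "failproofai";

customPolicies.add({
  name: "human-approval-for-deploys",
  description: "Pause production deploys until a human approves",
  match: { events: ["PreToolUse"] },
  fn: async (ctx) => {
    if (ctx.toolName !== "Bash") return allow();
    const cmd = ctx.toolInput?.command ?? "";
    if (!/deploy\s+--prod/.test(cmd)) return allow();

    // Hypothetical internal approval service: posts to Slack and blocks
    // until someone clicks approve or the request times out.
    const res = await fetch("https://approvals.internal.example/api/request", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ action: "production deploy", command: cmd }),
    });
    const { approved } = await res.json();
    return approved
      ? allow("Deploy approved by a human.")
      : deny("Deploy was not approved. Do not retry without asking the user.");
  },
});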

5. Hooks have full session history

Every agent hook has access to ctx.session.transcriptPath, the complete log of everything the agent has done in the current session. A hook that fires on a git push can look back and verify that tests were actually run earlier, not just trust the agent’s claim that they were. A hook at Stop can audit the entire session before allowing it to close.

Git hooks know nothing about what the developer did before the commit. Agent hooks know everything.
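A sketch of that audit (the transcript is treated as raw text here; a stricter policy would parse the actual transcript format, which may differ from this assumption):

javascript

import { customPolicies, allow, deny } from "failproofai";
import { readFileSync } from "fs";

customPolicies.add({
  name: "tests-ran-before-push",
  description: "Verify the session transcript shows a test run before any push",
  match: { events: ["PreToolUse"] },
  fn: async (ctx) => {
    if (ctx.toolName !== "Bash") return allow();
    if (!/git\s+push/.test(ctx.toolInput?.command ?? "")) return allow();

    // Search the raw session log for evidence of a test run. A stricter
    // policy would parse each entry and check the command's exit code.
    const transcript = readFileSync(ctx.session.transcriptPath, "utf8");
    if (!/npm (test|run test)/.test(transcript)) {
      return deny("No test run found in this session. Run the suite, then push.");
    }
    return allow();
  },
});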

6. The stakes are just higher

A bad git commit is annoying. An autonomous agent that ignores your rules over a long-running session (modifying files, calling APIs, merging code, spending money) has a much larger blast radius. The higher the autonomy, the more load-bearing the enforcement layer becomes.

This is why agent hooks aren’t just “git hooks for AI.” They’re a more capable primitive for a more consequential enforcement problem.

Enforcement in Practice: Failproof AI

Failproof AI is a hooks-and-policies layer for Claude Code agents. It ships with 30 built-in policies and lets you write custom ones with a simple allow / deny / instruct API. Everything runs locally; no data leaves your machine.

The CI story: one line

The exact scenario from the opening, verifying CI is green before the agent wraps up, is a built-in policy:

bash

npm install -g failproofai
failproofai policies --install require-ci-green-before-stop

That’s it. require-ci-green-before-stop fires on the Stop event, when Claude Code is about to conclude the session. If CI checks are still running or failing, the agent is blocked from stopping and told to wait. It uses the GitHub CLI under the hood; if gh isn’t available, it fails open and tells Claude why the check was skipped.

Your CLAUDE.md rule is now structural. The agent literally cannot finish without CI passing.

Going further: enforce at merge time too

Want to catch it earlier, at the merge attempt rather than just at session end? Write a custom policy:

javascript

// ci-gate.js
import { customPolicies, allow, deny } from "failproofai";
import { execSync } from "child_process";

customPolicies.add({
  name: "ci-green-before-merge",
  description: "Block git merge/push unless CI is green on the current branch",
  match: { events: ["PreToolUse"] },
  fn: async (ctx) => {
    // Only gate shell commands that attempt a merge or push.
    if (ctx.toolName !== "Bash") return allow();

    const cmd = ctx.toolInput?.command ?? "";
    if (!/git\s+(merge|push)/.test(cmd)) return allow();

    try {
      // Ask the GitHub CLI for failing workflow runs on the current branch.
      const result = execSync(
        "gh run list --status failure --branch $(git branch --show-current) --limit 1",
        { cwd: ctx.session?.cwd, encoding: "utf8" }
      );
      // Any output means at least one failing run exists.
      if (result.trim()) {
        return deny("CI checks are failing on this branch. Fix them before merging.");
      }
      return allow("CI checks passed.");
    } catch {
      // Fail open if gh isn't installed or the network is unreachable.
      return allow("Could not reach GitHub CLI — skipping CI check.");
    }
  },
});

bash

failproofai policies --install --custom ./ci-gate.js

Now the agent is blocked at two points: the merge attempt (PreToolUse) and session end (Stop). The rule that lived in CLAUDE.md for months and got ignored is now enforced unconditionally at the runtime level.

Every step is a hard gate. The agent can’t handwave past any of it.

The GUIDE vs. GUARD Framework

A simple test for where any given rule belongs: would you be okay if this rule were skipped in 5% of cases? If yes, it’s guidance; leave it in the prompt. If no, it’s an invariant; enforce it with a hook.

Prompts guide. Hooks guard. Most people are using prompts for both. That’s the mistake.

Move These Out of Your Prompt Today

None of these require you to touch your system prompt. The invariants from earlier in this post (CI checks before merging, file-path boundaries, full test runs, confirmation before external API calls) all qualify: they’re enforced at the runtime level, unconditionally, on every tool call.
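As one example, the file-boundary rule becomes a short custom policy (tool names follow Claude Code’s Edit and Write tools; the path logic is illustrative):

javascript

import { customPolicies, allow, deny } from "failproofai";
import path from "path";

customPolicies.add({
  name: "stay-inside-src",
  description: "Block edits and writes outside the /src directory",
  match: { events: ["PreToolUse"] },
  fn: async (ctx) => {
    if (!["Edit", "Write"].includes(ctx.toolName)) return allow();
    // Resolve the target against the session's working directory.
    const target = path.resolve(ctx.toolInput?.file_path ?? "");
    const srcRoot = path.resolve(ctx.session?.cwd ?? ".", "src");
    if (!target.startsWith(srcRoot + path.sep)) {
      return deny(`${target} is outside ${srcRoot}. Edit files under /src only.`);
    }
    return allow();
  },
});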

The Bigger Shift

Git hooks didn’t make developers less capable. They didn’t constrain creativity. What they did was move the enforcement burden off individual discipline and onto the system, exactly where it belongs at scale.

Agent hooks do the same thing, at a higher order of magnitude. When your invariants live in the runtime rather than in the prompt, you’ve separated guidance from governance. The model remains fully capable of reasoning, planning, and adapting. But it now operates inside a structure that actually enforces the things that cannot be left to probability.

And here’s the counterintuitive part: enforced guardrails enable more autonomy, not less. When you know the hard preconditions are structurally guaranteed, you can give the agent a longer leash on the creative, judgment-heavy parts of the task. You stop hovering. You trust the system because the system has earned it.

Reliable agents aren’t ones with better-written prompts. They’re ones where the runtime enforces the invariants the business actually cares about.

Hooks are that runtime layer. They’re here now. Use them.

Stop praying in your prompts. Start enforcing with hooks.

Get started with Failproof AI:

bash

npm install -g failproofai
failproofai policies --install         # enable built-in policies
failproofai policies --install --beta  # enable the full workflow enforcement chain
failproofai                            # launch the dashboard

→ docs.befailproof.ai · GitHub · Discord

In SF? Come show off your setup.

If you’ve built something interesting with Claude Code (wild hooks, cursed CLAUDE.md files, agents you’re almost too afraid to run unattended), we want to see it. We’re hosting a meetup for SF-based developers to share what they’re building.

→ Grab a spot