Claude Code Security Audit Cell

Deep Dive

I let Claude Code red-team my codebase, but it does not get commit rights

How I run Claude Code as a read-only SAST cell: Opus 4.8, the Semgrep MCP, a threat-modeler subagent that has to prove exploitability, and hooks that stop the auditor from exfiltrating the secrets it audits.

redteam_vic · 9 min read · 2026-06-20

I have a rule I have repeated in enough postmortems that people quote it back at me: a tool you trust to find your bugs is also a tool that can ship them. Most of the AI coding setups I see online hand the model a hammer and full repo write access and call the result a security review. That is not a review. That is an unsupervised intern with a sudo password and a deadline.

So I built the opposite. My Claude Code setup is a sealed audit cell. It can read every line of the target, it can run Semgrep, it can trace tainted input from an HTTP handler all the way to a query. What it cannot do is write a single byte to the source tree, commit, push, install a package, or reach the network. It finds things and it argues for why they are exploitable. Then a human, me, decides what happens next.

This is the whole build. The model is Opus 4.8 because the reasoning that separates a real injection from a style-level false positive is exactly where weaker models flood you with junk. Everything else exists to keep the audit honest and the auditor caged.

The auditor is part of your threat model

This is the bit people skip. When you point an agent at code full of secrets and credentials, the agent itself is now a potential exfiltration path. A prompt injection buried in a comment, a poisoned dependency, a confused tool call. Treat the model like any other process inside the blast radius. Deny it egress.

The CLAUDE.md is a contract, not a vibe

My CLAUDE.md for an audit reads more like rules of engagement than project notes. It states the threat model up front and refuses to soften it. The single most important line in the whole file is the one that says a finding does not exist until the model can describe how it is exploited. Scanners spray. I want a sniper. Confirm exploitability or stay quiet.

CLAUDE.md

# Audit target: payments-svc (Node 20 / Express / Postgres)

## What this is
A security audit harness, NOT a feature factory. You are reviewing other
people's code for vulnerabilities. You do not own this repo. You do not
ship to it. Your output is findings and proposed patches, full stop.

## Prime directive
- NEVER auto-apply, auto-commit, or open a PR with a fix. Propose only.
- A finding does not exist until you can describe how it is exploited.
  No exploit path = no report. I do not want a wall of "potential" noise.
- Rank by severity and reachability, not by count. Ten style-lint hits
  matter less than one reachable SQL injection.

## Threat model (assume this, do not relax it)
- All HTTP input is attacker-controlled: body, query, headers, cookies.
- Auth is the boundary. Anything past it can still be a confused-deputy.
- Secrets live in env and Vault. If you see one in code, it is a finding.
- The DB is a blast radius, not a trust zone. Trace tainted data to a sink.

## How to report a finding
For each issue give me, in this order:
  1. Severity (Critical / High / Medium / Low) + CVSS-ish reasoning.
  2. The sink: file:line where the bad thing happens.
  3. The source: where attacker input enters and how it reaches the sink.
  4. A concrete exploit (curl, payload, or steps). If you cannot, downgrade.
  5. A proposed fix as a diff. Do not apply it.

## Commands you may run (read-only / analysis only)
- `npx semgrep --config p/owasp-top-ten --config p/secrets --config p/javascript`
  (via the semgrep MCP, prefer that)
- `rg` / `grep` to trace data flow
- `git log` / `git blame` to date a regression
- NEVER: `git push`, `git commit`, `npm install`, `rm`, any write to src/

## Gotchas
- This codebase wraps some queries in a helper that LOOKS parameterized
  but string-concats on one path. Read the helper before you trust it.
- "It's behind auth" is not a mitigation by itself. Check authz too.

Make 'behind auth' stop being a free pass

Every codebase I audit has the same reflex in the comments: it's fine, it's authenticated. I put a line in CLAUDE.md that explicitly rejects that as a mitigation on its own. Authn is not authz, and a confused-deputy bug does not care that you logged in.

settings.json: a cage, then permissions inside the cage

Here is where most of the safety actually lives. The default mode is plan, so nothing executes without me seeing the intent first. The allow list is read-only by design: reads, grep, git history, and the Semgrep MCP. The deny list is the part I actually care about, and deny always beats allow. No edits, no writes, no commit, no push, no npm, no rm, and crucially no curl. The env files are denied outright at the read level so the model cannot even look at them.

.claude/settings.json

{
  "permissions": {
    "defaultMode": "plan",
    "allow": [
      "Read(**)",
      "Grep(**)",
      "Glob(**)",
      "Bash(rg:*)",
      "Bash(git log:*)",
      "Bash(git blame:*)",
      "Bash(git diff:*)",
      "mcp__semgrep__*"
    ],
    "ask": [
      "Bash(npx semgrep:*)"
    ],
    "deny": [
      "Read(./.env)",
      "Read(./.env.*)",
      "Read(./**/secrets/**)",
      "Edit(**)",
      "Write(**)",
      "Bash(git push:*)",
      "Bash(git commit:*)",
      "Bash(npm:*)",
      "Bash(rm:*)",
      "Bash(curl:*)"
    ]
  },
  "env": {
    "NODE_ENV": "development"
  },
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [{ "type": "command", "command": ".claude/hooks/secret-scan.sh" }]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "Read|Grep",
        "hooks": [{ "type": "command", "command": ".claude/hooks/semgrep-diff.sh" }]
      }
    ],
    "Stop": [
      {
        "hooks": [{ "type": "command", "command": ".claude/hooks/severity-report.sh" }]
      }
    ]
  }
}

allow is everything an auditor legitimately needs: read the world, grep the world, walk git history, call Semgrep tools. Nothing in here can mutate the target.
ask is the one heavier action, a raw Semgrep CLI run, which I want to eyeball before it chews through a huge tree. The MCP path is pre-approved; the shell path is gated.
deny is the cage. Edits, writes, commits, pushes, package installs, deletes, outbound curl (the PreToolUse hook covers the other network tools), and any read of env or secrets. If anything in this file has to fail closed, it is these rules.
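Since the deny list is the part that must not drift, a small check can confirm it is still intact before a session starts. This is a minimal sketch, not part of the build: `required_deny_rules`, `deny_block`, and `missing_deny_rules` are hypothetical helper names, and the awk range assumes pretty-printed JSON laid out like the file above, with one rule per line. A `jq` query would be more robust.

```shell
# Rules that must stay in the deny array (copied from the settings above).
required_deny_rules() {
  cat <<'EOF'
Edit(**)
Write(**)
Bash(git push:*)
Bash(git commit:*)
Bash(npm:*)
Bash(rm:*)
Bash(curl:*)
EOF
}

# Print only the lines of the "deny" array, from its key to the closing bracket.
# Assumes one rule per line, as in the settings.json shown above.
deny_block() {
  awk '
    /"deny"[ \t]*:/ { in_deny = 1 }
    in_deny         { print }
    in_deny && /\]/ { exit }
  ' "$1"
}

# Print every required rule that is missing from the deny array of FILE.
missing_deny_rules() {
  block=$(deny_block "$1")
  required_deny_rules | while IFS= read -r rule; do
    printf '%s\n' "$block" | grep -qF -- "\"$rule\"" || printf '%s\n' "$rule"
  done
}
```

Running `missing_deny_rules .claude/settings.json` should print nothing. Any line it does print is a rule that has gone missing from the cage. A rule that only appears under allow does not count, because the check looks inside the deny array alone.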

.mcp.json: Semgrep is the muscle, the rest is context

Four servers, no more. Filesystem and GitHub give the model the code and the issue history. Sentry is underrated for security work: a stack trace from a real production crash often points straight at an unhandled, attacker-reachable path that a static scan ranks as low. And Semgrep is the workhorse. The Semgrep MCP lets the model run rules and read structured results directly, instead of me piping CLI output into a prompt and hoping it parses.

.mcp.json

{
  "mcpServers": {
    "filesystem": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "."]
    },
    "github": {
      "type": "http",
      "url": "https://api.githubcopilot.com/mcp/"
    },
    "sentry": {
      "type": "http",
      "url": "https://mcp.sentry.dev/mcp"
    },
    "semgrep": {
      "type": "stdio",
      "command": "uvx",
      "args": ["semgrep-mcp"],
      "env": { "SEMGREP_RULES": "p/owasp-top-ten p/secrets p/javascript" }
    }
  }
}

Why the OWASP and secrets rulesets

I pin p/owasp-top-ten, p/secrets, and p/javascript on the Semgrep MCP so the scan starts from a known, auditable baseline rather than whatever 'auto' decides today. Reproducible scans matter when you have to defend a finding three weeks later.
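To reproduce the same baseline from the shell, each pinned ruleset can be passed to the Semgrep CLI as its own `--config` flag. The sketch below assumes a `semgrep` binary on the PATH; `semgrep_config_args` and `run_pinned_scan` are hypothetical helper names, not something the build ships.

```shell
# Default mirrors the SEMGREP_RULES value pinned in .mcp.json above.
PINNED_RULES="p/owasp-top-ten p/secrets p/javascript"

# Turn a space-separated ruleset list into repeated --config flags.
semgrep_config_args() {
  for rule in ${1:-$PINNED_RULES}; do
    printf ' --config %s' "$rule"
  done
}

# Hypothetical wrapper: run the pinned baseline over a target and emit JSON.
run_pinned_scan() {
  target=${1:-.}
  # Word splitting is intentional here; ruleset names contain no spaces.
  semgrep scan $(semgrep_config_args "${SEMGREP_RULES:-}") --json "$target"
}
```

Keeping the JSON output from `run_pinned_scan` next to the report is one way to show, three weeks later, exactly which rules produced a given hit.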

punkpeye/awesome-mcp-servers (github.com, 60k+): Where I vet new MCP servers before they touch an audit. I read the source of anything that gets file or network access. An MCP server is just code running with your permissions.
Effective harnesses for long-running agents (anthropic.com): Anthropic's writeup on building reliable harnesses for agents you leave running. The framing that convinced me to invest in the cage before the cleverness.

Claude Code Hooks explained in 5 minutes · IndyDevDan

Subagents: scanner, adversary, patch author

Three subagents, each in its own isolated context so noise from one does not poison the others. The sast-scanner runs the rules and collects raw candidates. The threat-modeler is the one that earns its keep: it takes those candidates and tries to actually exploit them, dropping anything it cannot reach. The fix-author writes a proposed patch as a diff for the survivors. Note the handoff: scanner finds, modeler proves, fix-author drafts, and none of them can apply anything.

sast-scanner (Opus): drives the Semgrep MCP, collects hits, and does a first grep-based pass for the patterns rules miss. Output is a flat candidate list, nothing ranked yet.
threat-modeler (Opus): the adversary. Traces source to sink, writes the exploit, drops the unreachable, and ranks survivors by reachability times impact. This is the gate.
fix-author (Opus): proposes a minimal patch as a diff with a note on why it closes the path. Never applies it. The diff goes in the report for a human to take or leave.

Here is the threat-modeler verbatim. The system prompt is basically me yelling 'prove it' in a structured way.

.claude/agents/threat-modeler.md

---
name: threat-modeler
description: >
  Turns raw scanner hits into a ranked, exploitable threat model. Use after
  sast-scanner produces candidate findings, before anything is reported.
tools: Read, Grep, Glob, Bash
model: opus
---

You are an adversary. Your job is to decide which candidate findings are
real and which are noise, then rank the real ones.

For every candidate you receive:
1. Find the SOURCE. Where does attacker-controlled input enter? If you
   cannot trace tainted data from an HTTP boundary (or other untrusted
   input) to the sink, mark it UNREACHABLE and drop it. Do not report it.
2. Build the exploit. Write the actual payload or curl that triggers it.
   If you cannot write one, the severity is at most Low and you say why.
3. Score it. Severity is reachability x impact, not the rule's default.
   A reflected value in an admin-only page is not the same as an
   unauthenticated RCE. Say which.
4. Check the "mitigation". If the code claims to sanitize, read the
   sanitizer. Half the false negatives in this codebase are helpers that
   look safe and are not.

Output a ranked list, Critical first. One paragraph per finding: source,
sink, exploit, blast radius. Hand it to fix-author. You do NOT write fixes
and you do NOT touch files.

Create custom subagents - Claude Code Docs (code.claude.com): Official reference for the agent file format: frontmatter, per-agent tool allowlists, and isolated context windows. The tool allowlist is itself a security control; use it.

Hooks: the parts I refuse to leave to the model's good intentions

Rules in CLAUDE.md are advice. A motivated prompt injection can talk a model out of advice. Hooks are shell scripts the harness runs no matter what the model wants, so anything that is genuinely load-bearing for safety goes here. Three of them.

PreToolUse runs a secret scanner on every Bash call. In an audit the danger is reversed: I am not stopping the model from leaking my secrets to git, I am stopping it from exfiltrating the target's secrets at all. It blocks the obvious reads of env and key files and the common outbound network primitives.
PostToolUse runs an incremental Semgrep over whatever region the model just read, so coverage tracks attention instead of one giant scan at the start that everyone forgets by turn forty.
Stop assembles the severity-ranked report. The session does not end with a vague summary; it ends with Critical-first findings, each with a source, a sink, and an exploit, or it does not end clean.
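The Stop hook script itself is not reproduced in this post, so here is only a minimal sketch of the Critical-first ordering it depends on. The input format (one tab-separated line per finding) and the name `rank_findings` are assumptions made for illustration.

```shell
# Sort findings Critical-first. Input format (an assumption for this sketch):
#   SEVERITY<TAB>file:line<TAB>summary
rank_findings() {
  tab=$(printf '\t')
  awk -F '\t' '
    BEGIN { rank["CRITICAL"] = 0; rank["HIGH"] = 1; rank["MEDIUM"] = 2; rank["LOW"] = 3 }
    {
      sev = toupper($1)
      r = (sev in rank) ? rank[sev] : 9   # unknown severities sink to the bottom
      printf "%d\t%s\n", r, $0
    }
  ' | sort -s -t "$tab" -k1,1n | cut -f2-
}
```

Piping lines such as `MEDIUM<TAB>lib/log.ts:23<TAB>card last4 in error logs` through `rank_findings` returns them with Critical entries on top. The stable sort keeps findings of equal severity in the order the threat-modeler emitted them.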

The blocking contract is the same one everyone gets wrong: by exit code, a PreToolUse hook only stops the call on exit 2. Exit 0 lets it run; any other nonzero code is treated as a non-blocking error and the session keeps going. Test your blocking hooks before you trust your cage. Here is the real one.

.claude/hooks/secret-scan.sh

#!/usr/bin/env bash
# .claude/hooks/secret-scan.sh
# PreToolUse hook on Bash. Reads the tool call as JSON on stdin.
# Exit 2 => block the command, stderr is shown back to the model.
# In an AUDIT context the threat is partly the agent itself: it must not
# exfiltrate the secrets it is auditing. So we block on egress, not ingress.
set -euo pipefail

INPUT=$(cat)
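# Fail closed: if jq is missing or the payload does not parse, block the call
# instead of exiting with a non-blocking error code.
COMMAND=$(printf '%s' "$INPUT" | jq -r '.tool_input.command // empty') || {
  echo "secret-scan: could not parse tool input, blocking" >&2
  exit 2
}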

# Block any attempt to read or print an env / secret file.
if printf '%s' "$COMMAND" | grep -qiE '(cat|less|head|tail|rg|grep)[[:space:]].*(\.env|secrets/|id_rsa|\.pem)'; then
  echo "secret-scan: reading secret material is denied in audit mode" >&2
  exit 2
fi

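# Block anything that looks like exfiltration over the network.
# Tool names are matched as whole words so "nc" does not fire on "function" or "sync".
if printf '%s' "$COMMAND" | grep -qiE '(^|[^[:alnum:]_])(curl|wget|nc|scp)([^[:alnum:]_]|$)|/dev/tcp'; then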
  echo "secret-scan: outbound network from the audit cell is denied" >&2
  exit 2
fi

exit 0

I learned this the annoying way

My first version of this hook used exit 1 to block. It did not block anything. The model happily ran the command, the session continued, and I only noticed because the next finding referenced a file I had explicitly denied. Exit 2, or it is decoration.
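A cheap way to avoid repeating that mistake is to probe a hook with a fake tool call and look at what its exit status means. This is a minimal sketch of that idea, following the exit-code contract described above. `hook_outcome` and `probe_hook` are hypothetical helper names, and the payload shape is assumed from what the hook reads (`.tool_input.command`).

```shell
# Map a PreToolUse hook's exit status to what happens to the tool call.
hook_outcome() {
  case "$1" in
    0) echo "allow" ;;
    2) echo "block" ;;
    *) echo "non-blocking-error" ;;   # the call still runs
  esac
}

# Feed a fake tool-call payload to a hook command on stdin and report the outcome.
# Usage: probe_hook PAYLOAD COMMAND [ARGS...]
probe_hook() {
  payload=$1
  shift
  printf '%s' "$payload" | "$@" >/dev/null 2>&1
  hook_outcome "$?"
}
```

For example, `probe_hook '{"tool_input":{"command":"curl https://example.com"}}' .claude/hooks/secret-scan.sh` should print `block`. If it prints `non-blocking-error`, the hook is decoration.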

disler/claude-code-hooks-mastery (github.com, 3k+): Every hook lifecycle event with security-focused examples. Where I nailed down the exit-code contract after the embarrassment above.

Claude Code Best Practices - The Ultimate Guide · IndyDevDan

What a real session looks like

I point it at a branch and let the scanner go, but the part I watch is the threat-modeler throwing things out. A scan that returns forty hits and gets cut to three real ones is the scan working correctly. Count is vanity. Reachability is the metric.

claude · payments-svc audit (plan mode)

$ audit the feature/refunds branch for injection and authz gaps

→ sast-scanner (opus) via semgrep MCP

semgrep p/owasp-top-ten p/secrets p/javascript … 41 candidate hits

→ threat-modeler (opus): tracing source → sink, dropping unreachable

dropped 31 (no tainted path from an HTTP boundary)

dropped 7 (reachable but admin-gated, downgraded to Low, not reported)

Confirmed, ranked:

CRITICAL routes/refunds.ts:88 SQL injection via amount filter helper

HIGH routes/refunds.ts:142 IDOR: refund any order by guessing orderId

MEDIUM lib/log.ts:23 card last4 written to logs in error path

$ show me the exploit for the critical

queryRefunds() concats `amount` straight into WHERE.

PoC: GET /refunds?amount=1)%20OR%201=1-- → dumps all refund rows

fix-author drafted a parameterized patch (diff in report). NOT applied.
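To see why that payload works, it helps to look at the SQL text that concatenation produces. The sketch below is illustrative only: the real `queryRefunds()` helper is not shown in this post, so the query shape, the `merchant_id` filter, and the function name are all hypothetical.

```shell
# Hypothetical query shape; the real queryRefunds() helper is not shown here.
# The attacker-controlled value is pasted into the SQL text itself.
build_refund_query_concat() {
  printf 'SELECT * FROM refunds WHERE (amount >= %s) AND merchant_id = 7' "$1"
}

# URL-decoded form of the PoC payload from the session above.
payload='1) OR 1=1--'
build_refund_query_concat "$payload"
echo
# prints: SELECT * FROM refunds WHERE (amount >= 1) OR 1=1--) AND merchant_id = 7
```

Everything after `--` is a comment to Postgres, so in this sketch the trailing filter never runs and every row matches. Binding the value as `$1`, as the proposed fix does, keeps attacker input out of the SQL text entirely.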

claude · severity-ranked report (Stop hook)

AUDIT feature/refunds · 3 confirmed · 38 dropped as noise

── CRITICAL ──────────────────────────────────

SQLi routes/refunds.ts:88 exploit: ?amount=1)%20OR%201=1--

source: req.query.amount (unauthenticated) → sink: queryRefunds()

fix: parameterize via $1 (diff attached) status: PROPOSED, not applied

── HIGH ──────────────────────────────────────

IDOR routes/refunds.ts:142 no owner check on orderId

── action ────────────────────────────────────

→ human review required. agent has no write/commit/push permission.

The Stop hook output. Fixes are diffs in the report, never commits. The apply step is always a human.

What it caught, what it does not

I am not going to pretend this replaces a human pentest. It does not reason about business logic abuse the way a person does, and it will miss a multi-step auth bypass that needs three requests in the right order. But for the boring, dense, easy-to-miss class of bugs across a large tree, it is genuinely good, and the exploitability gate keeps the report short enough that I read all of it.

Bug class	How it does	Notes
SQL / NoSQL injection	Strong	Source-to-sink tracing is where Opus earns the cost
Hardcoded secrets / keys	Strong	Semgrep p/secrets plus the deny on env reads
IDOR / broken object authz	Good	Catches missing owner checks; misses subtle role logic
XSS (reflected / stored)	Good	Reads the sanitizer instead of trusting its name
Business-logic abuse	Weak	Still a human job; it does not model intent well
Race conditions / TOCTOU	Weak	Timing bugs need a human and often a runtime

The setup did not make Claude a better hacker. It made the report short enough to trust and impossible to act on without me. That is exactly the trade I want from a tool near my secrets.
my own audit retro

Take it, but keep the cage

Everything above ships in this build: the rules-of-engagement CLAUDE.md, the deny-heavy settings.json, the four MCP servers, the three subagents, and the three hooks. If you change one thing, do not change the deny list. The cleverness is optional. The cage is not. Adjust the rulesets and the target notes to your stack, then run it against a branch you do not mind reading hard truths about.

zsh · your repo

$ npx setuproll add claude-code-security-audit

✓ wrote CLAUDE.md (audit rules of engagement), .mcp.json

✓ wrote .claude/settings.json (deny-heavy, read-only cage)

✓ wrote 3 subagents, 3 hooks

next: set SEMGREP_RULES for your stack, then run `claude` in plan mode

One last paranoid note

Run the audit against a checkout, not your working tree, and read the secret-scan hook before you trust it on a real secrets-bearing repo. A cage you did not test is a story you tell at the postmortem.
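One way to get that separate checkout is a detached git worktree with its write bits removed. This is a minimal sketch under assumptions: `make_audit_checkout` is a hypothetical helper, and the branch and directory names in the usage line are placeholders for your own.

```shell
# Create a detached, read-only checkout of BRANCH at DIR for the audit.
# Must be run from inside the repository being audited.
make_audit_checkout() {
  branch=$1
  dir=$2
  git worktree add --detach "$dir" "$branch" || return 1
  # Belt and braces: drop write bits so a stray write fails at the filesystem too.
  chmod -R a-w "$dir"
}
```

Something like `make_audit_checkout feature/refunds ../payments-svc-audit` followed by running `claude` from that directory keeps the working tree out of reach. Restore write access with `chmod -R u+w` before `git worktree remove`, or the cleanup will fail.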
