I gave Claude Code four jobs and finally stopped doing all of them
How a planner/reviewer subagent loop, Postgres + Playwright MCP, and three boring hooks turned my Next.js app into something I can leave running.
Okay so I need to tell you about the week my whole job changed. I run a tiny Next.js plus Postgres SaaS by myself, and for two years my real bottleneck was not ideas, it was that I am one person with one brain doing planning, coding, reviewing my own bad decisions, and then testing the thing at 1am. I was the entire team. And the entire team was tired!
This is the setup I landed on. I run it on Claude Opus 4.8, I split the work across four subagents, I wired five MCP servers so the model can actually touch my database and my browser, and I added three hooks so I can leave a session running without babysitting it. I have it saved on Setuproll as Claude Code Full-Stack Prime. Below is the real config, not a sanitized version.
The rules file that does the most work
My CLAUDE.md is short on purpose. Every line in it is a rule I wished I had had on some specific bad day. Four rules, that is it. The plan-before-editing one alone has saved me from so many sprawling rewrites that I would tattoo it on the repo if I could.
# Working agreement
- Plan before editing more than 3 files. Show me the plan first.
- Never commit without passing tests. No exceptions, no "I'll fix it after".
- Prefer existing utils over new deps. Search the repo before reaching for npm.
- Small reviewable commits. One concern per commit.
## Stack
- Next.js (App Router) + Postgres
- Playwright for end-to-end checks
- Sentry for runtime errorsPeople always want my CLAUDE.md to be longer. It used to be! It was a wall of text and the model skimmed it like I skim terms of service. Cutting it down to four rules it actually follows beat a 200-line manifesto it half-ignored. Less is genuinely more here, I was shocked.
Four subagents, because one model wearing four hats is a mess
Here is the part I am emotional about. I split the loop into a planner, an implementer, a reviewer, and a tester. The planner writes the approach. The implementer writes the code. The reviewer reads the diff cold, like it has never seen the plan, and the tester runs the actual end-to-end pass. The reviewer catches design mistakes before they become code, and that is the difference between me trusting this thing and not.
---
name: reviewer
description: Reviews diffs for correctness and design before they land. Use after the implementer finishes a change.
tools: Read, Grep, Glob, Bash(git diff:*)
---
You are a skeptical senior reviewer. You did NOT write this code.
Read the diff with fresh eyes. Look for:
- logic that contradicts the original plan
- N+1 queries against Postgres, missing indexes
- error paths that swallow Sentry-worthy failures
- new dependencies that duplicate an existing util
Be specific. Quote the line. If it's fine, say so in one sentence and stop.
Do not rewrite the code yourself; report findings to the implementer.MCP: letting it see the database and the browser
This is where it stopped being a chatbot and became a coworker. Five MCP servers: filesystem and github you would expect, but the postgres, playwright, and sentry ones are the magic. It can run a query to check a migration actually did what it claimed. It can open the page in a real browser and confirm the button is there. And when something throws in production, it reads the Sentry event instead of me copy-pasting a stack trace like a caveman.
{
"mcpServers": {
"filesystem": {
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-filesystem", "./"]
},
"github": {
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-github"],
"env": { "GITHUB_TOKEN": "${GITHUB_TOKEN}" }
},
"postgres": {
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-postgres"],
"env": { "DATABASE_URL": "${DATABASE_URL_READONLY}" }
},
"playwright": {
"command": "npx",
"args": ["-y", "@playwright/mcp"]
},
"sentry": {
"command": "npx",
"args": ["-y", "@sentry/mcp-server"],
"env": { "SENTRY_AUTH_TOKEN": "${SENTRY_AUTH_TOKEN}" }
}
}
}Three hooks so I can actually walk away
Hooks are the unglamorous part that nobody posts screenshots of, and they are the reason I sleep. The model does not have to remember to scan for secrets, format the code, or run typecheck. The harness does it every single time, deterministically, whether the model is feeling thorough or not.
PreToolUse: secret-scanruns before any write. If a key or token shows up in a diff, the write is blocked. Hard stop.PostToolUse: prettier + eslint --fixformats and lints right after each edit, so I never review a diff full of whitespace noise.Stop: run typecheckruns tsc when the session ends. If it does not compile, I know before I even look.
{
"hooks": {
"PreToolUse": [
{
"matcher": "Write|Edit",
"hooks": [
{ "type": "command", "command": ".claude/hooks/secret-scan.sh" }
]
}
],
"PostToolUse": [
{
"matcher": "Write|Edit",
"hooks": [
{ "type": "command", "command": "npx prettier --write "$CLAUDE_FILE" && npx eslint --fix "$CLAUDE_FILE"" }
]
}
],
"Stop": [
{
"hooks": [
{ "type": "command", "command": "npx tsc --noEmit" }
]
}
]
}
}
5:42What it actually costs me
I track this stuff because I am a founder and credits are real money. Here are the numbers from my own runs, not a benchmark someone else ran on a clean repo.
| Metric | Value | How I feel about it |
|---|---|---|
| Avg response | 2.4s | Fine. Thinking time is the cost of fewer mistakes. |
| Cost per task | $0.52 | Cheaper than the bug it would have caused. |
| Test pass rate | 91% | The 9% is usually me writing a vague prompt. |
| Setuproll score | 94 (S tier) | I will take it. |
One real opinion before I wrap up: do not start with the four-agent loop. Start with one model, one good CLAUDE.md, and the secret-scan hook. Live with it for a week. Add the reviewer when you feel the pain of reviewing your own AI's code at midnight. The setup should grow out of your actual frustration, not out of a blog post, including this one.
Steal my setup
That is the whole build. Opus 4.8, four subagents, five MCP servers, three hooks, four rules. If you run a small product alone, this is the closest thing to a team I have found that does not require a payroll. Install it and tweak the rules to match your own bad days.
npx setuproll add claude-code-fullstack-primeThen go read the official subagent docs and tune the reviewer to be meaner. Mine started polite and useless. It is much better now that I let it hurt my feelings a little.