Good Validation Enables Good Agents
Last week I started working on a small personal blog website, and I figured if agents these days are good enough to build the entire Cursor documentation website, they surely have to be good enough to write a simple MDX TypeScript blog.
The Experiment
I wanted to test the ideas from my previous post on RSVG in practice. The theory is straightforward: deterministic validation provides clearer learning signals than front-loaded instructions. It also enables scaling and autonomy. So I knew from the start that I wanted to have validation as a central focus.
For the blog I picked Astro, a framework I hadn’t used before. I’d be learning alongside the agents. Same starting point, same documentation to read.
The goal was simple: build a blog that both Claude Code and Cursor could contribute to without me constantly correcting the same mistakes. If an agent wrote an invalid component, I wanted that caught by a validator, not by me noticing it three commits later.
The Problem With Multiple Tools
One problem I was facing is that I currently use quite a few different coding tools side by side. Claude Code is my main agent. For UI/UX-related tasks I tend to shift to Cursor, as I love its browser integration and I still prefer a UI-based interface over a terminal when it comes to structuring a project. Finally, I lean towards Amp for research and planning tasks, and yes, sometimes I even edit files myself. Each of these workflows has a different hook mechanism that I needed to map to my RSVG system: Claude Code has PreToolUse, PostToolUse, and Stop hooks; Cursor has beforeShellExecution and afterFileEdit; git has pre-commit hooks (via lefthook); and GitHub Actions runs on PR.
When I add a new custom content rule validator, I have to remember to add it everywhere. That’s fragile.
What I wanted was a single source of truth for what constitutes valid code, with adapters that translate that into whatever format each tool expects.
Gates as Unified Checkpoints
The solution I landed on: use the gates as unified abstractions. A gate is a moment in the development lifecycle where certain rules must pass. The gate doesn’t care whether it’s being triggered by Claude Code or a git commit; it just runs the validators.
Gate flow in the development lifecycle
For now, I work with these 5 gates, each with a clear purpose:
| Gate | Trigger |
|---|---|
| on-pre-tool-use | Before shell commands |
| on-save | After file edit |
| on-stop | Agent finishes turn |
| on-commit | Git pre-commit |
| on-pr | GitHub Actions |
So again, these are the moments in the development cycle where I validate rules. To orchestrate the validation in a unified way, I need to decide which validations to run at which gate. For now, this is roughly the division of tasks:
| Gate | Trigger | What it checks |
|---|---|---|
| on-pre-tool-use | Before shell commands | Block unwanted terminal commands |
| on-save | After file edit | Prettier formatting |
| on-stop | Agent finishes turn | Content, structure, styling |
| on-commit | Git pre-commit | All rules + scoped test |
| on-pr | GitHub Actions | All rules + all tests + build |
If you only consider the mental model described in my previous post, you could argue: why not do all validation at the earliest gate possible? That would create the smallest feedback loop and steer the agent most effectively. There are multiple reasons not to want that, though.
First and foremost: validation comes at a price, and that price is time. A full test suite or typecheck can consume quite a bit of time, which would make iterating code far slower: after each small iteration, we’d have to wait for validation to run.
Secondly, by design, some errors are impractical to validate on file-level saves. If you refactor a component to use stricter types and, as a consequence, introduce a type error somewhere else, that shouldn’t block you from saving the refactor. It can still be an improvement to the code, even though it temporarily introduces a type error. You haven’t committed it anywhere; it’s just a local change.
None of these reasons are new, of course. They just show that, at each gate, we have to choose a sensible tradeoff between how much validation we run and how long it takes.
For now I settled roughly on the following setup:
| Gate | Trigger | Budget | What it checks |
|---|---|---|---|
| on-pre-tool-use | Before shell commands | instant | Block unwanted terminal commands |
| on-save | After file edit | < 1s | Prettier formatting (not blocking) |
| on-stop | Agent finishes turn | < ~10s | Content, structure, styling rules |
| on-commit | Git pre-commit | < ~15s | All rules + scoped test |
| on-pr | GitHub Actions | ~2min | All rules + all tests + build |
Of course, this is hugely dependent on the size and type of project you are working on. (This setup is not optimized for speed: in larger projects you’d parallelize validators more aggressively. Here I prioritized the agent feedback loop over raw performance.)
Rules as the Single Source of Truth
One thing I really find useful in general, but especially when working with agents, is precisely defining requirements and conventions for anything being built. Rules are a nice way of doing that, as they can be matched easily to a validator. This is basically what linters have been doing for their entire existence.
At the same time, I don’t want to document things in multiple places; keeping docs up to date is a task that’s easily overlooked and annoying to do. Therefore I made the rules part of the validation code itself.
The next decision I had to make was how to handle tool-based validation, for example a linting rule. If I were to spell those rules out one by one, I’d just be duplicating bookkeeping for things that are already clear from the config files of those tools. So I grouped rules into two types: tool-based rules and custom rules. I formulated each tool-based rule as an overarching one: “the configuration of tool X needs to be obeyed.”
I created a small rule schema with an ID, description, scope (glob patterns), type, and an explanation. The explanation captures why a specific rule exists, which helps whenever I run into a violation: it lets me decide whether to refactor the code, make an exception, or loosen the rule. This could have been a code comment, but making it part of the schema was easier to enforce consistently.
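Roughly, the schema looks like this (field types simplified for illustration; the real definitions live in the repo):

```ts
// Simplified sketch of the rule schema; the actual types in scripts/rules/index.ts may differ.
type RuleType = "tool" | "custom";

interface Rule {
  id: string;          // e.g. "CON-01"
  type: RuleType;      // tool-based (defer to a tool's config) or custom validator
  description: string; // what the rule requires
  scope: string[];     // glob patterns the rule applies to
  explanation: string; // why the rule exists, used when deciding on exceptions
}
```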
Every rule lives in scripts/rules/index.ts:
"CON-01": {
id: "CON-01",
type: "custom",
description: "Frontmatter must include title, description, and pubDate",
scope: ["src/content/**/*.mdx"],
explanation: "These three fields are required for proper indexing, rendering and visualizing."
} The Orchestrator
I then created a single orchestrator that would run these rules. The reason not to import rules directly in the gate files was that I preferred the gates to be small and clear. I wanted to be able to just see the definitions of which rules run at a gate. All performance optimization, IO logic, etc. made sense to separate from the core logic of what we check where. Besides that, whenever I build a new gate, it can utilize the same orchestrator.
```ts
const result = await orchestrate({
  rulePatterns: ["CON-*", "STR-*", "STY-*"],
});
```

As you may have guessed, the rulePatterns correspond with categories of rules. I ended up with 37 implemented rules across five categories:
- COD (Code): No browser APIs in Astro frontmatter, explicit return types, formatting
- CON (Content): Required frontmatter, SEO constraints, heading structure
- STR (Structure): Import boundaries, no cross-feature dependencies
- STY (Styling): Tailwind-only in components, no inline styles in pages
- GIT (Git/Deploy): Use bun, build must pass
Each category enforces different boundaries. The structure rules, for example, prevent src/lib from importing anything framework-specific, keeping utilities portable. The rules are very opinionated, and this clarity keeps things consistent, especially with agents involved.
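To give an idea of what a custom validator looks like, here’s a simplified sketch of an import check for that structure rule. The rule ID and the list of forbidden packages are illustrative; the real validator is more thorough:

```ts
import { readFile } from "node:fs/promises";

// Illustrative validator: files under src/lib must not import framework-specific modules.
const FORBIDDEN_IN_LIB = ["astro", "astro:content", "react"]; // assumed list

export async function validateLibImports(file: string) {
  const source = await readFile(file, "utf8");
  const violations: string[] = [];

  for (const match of source.matchAll(/from\s+["']([^"']+)["']/g)) {
    const specifier = match[1];
    if (FORBIDDEN_IN_LIB.some((pkg) => specifier === pkg || specifier.startsWith(`${pkg}/`))) {
      violations.push(specifier);
    }
  }

  return {
    ruleId: "STR-01", // illustrative ID
    passed: violations.length === 0,
    message:
      violations.length === 0
        ? "ok"
        : `Framework-specific import(s) in src/lib: ${violations.join(", ")}`,
    file,
  };
}
```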
Transformers for Each Hook System
Each gate script has a transformers.ts file that converts the gate output into the format each tool expects.
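The gate output the transformers receive looks roughly like this (reconstructed from how the transformers use it; the actual type definitions live in the repo):

```ts
// Approximate shapes, reconstructed from how the transformers below consume them.
interface GateOutput {
  passed: boolean;
  reason?: string; // explanation that is fed back to the agent when the gate fails
}

interface RuleResult {
  ruleId: string;
  passed: boolean;
  message: string;
  file?: string;
  line?: number;
}

interface GateOutputWithResults extends GateOutput {
  gate: string;           // e.g. "on-stop"
  duration: number;       // milliseconds
  violationCount: number;
  results: RuleResult[];
}
```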
For on-pre-tool-use, the transformer produces different shapes depending on who’s calling:
```ts
// Claude Code format
export function toClaudePreToolUse(output: GateOutput) {
if (output.passed) {
return {
hookSpecificOutput: {
hookEventName: "PreToolUse",
permissionDecision: "allow",
},
};
}
return {
hookSpecificOutput: {
hookEventName: "PreToolUse",
permissionDecision: "deny",
permissionDecisionReason: output.reason,
},
};
}
// Cursor format
export function toCursorBeforeShellExecution(output: GateOutput) {
if (output.passed) {
return { permission: "allow" };
}
return {
permission: "deny",
user_message: `Command blocked: ${output.reason}`,
agent_message: output.reason, // This feeds back to the agent
};
}
```

Same gate logic, different output shapes. When I fix a bug in the validation, it’s fixed for all tools at once.
It’s quicker this way too: changing a tool’s config file typically requires restarting the IDE or coding agent before it takes effect, while a change to this unified layer applies immediately.
The agent_message field in Cursor is a clear example of learning through the agent loop. Whatever you put there flows back into the agent’s context. If I block an npm install command, the agent learns why and (usually) switches to bun install on the next attempt.
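The check behind that example is tiny; something along these lines (a simplified sketch; the real check may parse commands differently):

```ts
// Illustrative on-pre-tool-use check: block npm in favor of bun.
export function checkPackageManager(command: string): GateOutput {
  if (/\bnpm\s+(install|ci|run)\b/.test(command)) {
    return {
      passed: false,
      reason: "This project uses bun. Run `bun install` / `bun run` instead of npm.",
    };
  }
  return { passed: true };
}
```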
The orchestrator:
- Filters rules by the provided patterns
- Runs all matching custom validators in parallel
- Collects results into a single output
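Stripped of IO and timing details, the core of the orchestrator looks something like this. In the real setup validators live separately from the rule definitions; for this sketch I pretend each custom rule carries its own validate() function, and the import path is made up:

```ts
import { rules } from "../rules"; // hypothetical import path for the rules map

// Simplified orchestrator; the real one also handles file discovery, timing, and errors.
export async function orchestrate({ rulePatterns }: { rulePatterns: string[] }) {
  // 1. Filter rules by the provided patterns (e.g. "CON-*" matches "CON-01").
  const regexes = rulePatterns.map((p) => new RegExp(`^${p.replace("*", ".*")}$`));
  const selected = Object.values(rules).filter(
    (rule) => rule.type === "custom" && regexes.some((re) => re.test(rule.id)),
  );

  // 2. Run all matching custom validators in parallel.
  const perRule = await Promise.all(selected.map((rule) => rule.validate()));

  // 3. Collect results into a single output.
  const results = perRule.flat();
  return { passed: results.every((r) => r.passed), results };
}
```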
Running validators in parallel matters for speed: we want to check as many rules as the machine can handle at once. Serial execution would be noticeably slower, and the more rules we can validate within a gate’s budget, the better the agent feedback loop.
Tool-based rules (prettier, eslint, tsc) aren’t run by the orchestrator; those are heavy processes that the gate scripts invoke directly. (There’s no point parallelizing bun run typecheck; it already uses all available cores.) I did find myself occasionally writing a quick custom check for a violation that a tool-based validation would also catch, purely to produce a feedback signal earlier in the development lifecycle and steer the agent in the right direction.
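Invoking those tools from a gate script is straightforward; a minimal sketch (the rule ID and script name are assumptions):

```ts
import { spawnSync } from "node:child_process";

// Simplified: run a tool-based check and map its exit code onto a rule result.
function runToolCheck(ruleId: string, cmd: string[]): RuleResult {
  const proc = spawnSync(cmd[0], cmd.slice(1), { encoding: "utf8" });
  return {
    ruleId,
    passed: proc.status === 0,
    message: proc.status === 0 ? "ok" : `${proc.stdout}${proc.stderr}`.trim(),
  };
}

// e.g. in the on-commit gate (ID and script name are illustrative):
const typecheck = runToolCheck("COD-02", ["bun", "run", "typecheck"]);
```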
What Actually Happens
Let me walk through a concrete scenario. An agent writes a new MDX post with this frontmatter:
```yaml
---
title: "Some Very Long Title That Definitely Exceeds Sixty Characters And Keeps Going"
description: "Short."
pubDate: 2025-01-05
---
```

Two problems: title exceeds 60 characters (CON-14), description is under 120 characters (CON-04).
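The validators behind those two rules are essentially just length checks on the parsed frontmatter; a simplified sketch (the naive regex-based parsing here is for illustration only):

```ts
// Simplified sketch of the CON-14 / CON-04 checks; real frontmatter parsing is more robust.
function validateFrontmatterLengths(file: string, source: string): RuleResult[] {
  const frontmatter = source.match(/^---\n([\s\S]*?)\n---/)?.[1] ?? "";
  const field = (name: string) =>
    frontmatter.match(new RegExp(`^${name}:\\s*"?(.+?)"?\\s*$`, "m"))?.[1] ?? "";

  const title = field("title");
  const description = field("description");

  return [
    {
      ruleId: "CON-14",
      passed: title.length <= 60,
      message: title.length <= 60 ? "ok" : `Title too long: ${title.length} chars (max 60)`,
      file,
    },
    {
      ruleId: "CON-04",
      passed: description.length >= 120,
      message:
        description.length >= 120
          ? "ok"
          : `Description too short: ${description.length} chars (min 120)`,
      file,
    },
  ];
}
```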
If the agent is using Claude Code, the Stop hook fires when it tries to finish. This hook triggers a bash command configured in .claude/settings.json:
```json
{
"hooks": {
"Stop": [
{
"hooks": [
{
"type": "command",
"command": "bun scripts/gates/on-stop/index.ts",
"timeout": 60
}
]
}
]
}
}
```

The gate script runs the orchestrator and formats the output:
```ts
export async function run(): Promise<GateOutputWithResults> {
const start = Date.now();
const result = await orchestrate({
rulePatterns: ["CON-*", "STR-*", "STY-*"],
});
const failures = result.results.filter((r) => !r.passed);
return {
gate: "on-stop",
passed: result.passed,
duration: Date.now() - start,
violationCount: failures.length,
results: result.results,
};
}
```

The transformer converts this into Claude’s expected format, blocking the stop if validation fails:
```ts
export function toClaudeStopHook(output: GateOutputWithResults) {
if (output.passed) {
return {};
}
const failures = output.results.filter((r) => !r.passed);
const reason = failures
.map((f) => `[${f.ruleId}] ${f.message}${f.file ? ` (${f.file}:${f.line ?? 1})` : ""}`)
.join("\n");
return {
decision: "block",
reason: `Validation errors found:\n\n${reason}\n\nPlease fix these issues before completing.`,
};
}
```

This results in a simple feedback message to the agent:
```
Validation errors found:

[CON-14] Title too long: 78 chars (max 60) (src/content/writings/some-post.mdx)
[CON-04] Description too short: 6 chars (min 120) (src/content/writings/some-post.mdx)

Please fix these issues before completing.
```
Claude can’t mark the task as done until it fixes these. It goes back, adjusts the frontmatter, tries again. The validators pass. Task completes.
This is the fail-fast approach in action. I didn’t tell Claude “remember to keep titles under 60 characters.” I let it write whatever it wanted, and a deterministic validator told it exactly what was wrong. The feedback is precise, not probabilistic.
Pushing Failure Left
The gate timeline isn’t arbitrary. Each gate is positioned to catch errors as early as possible while respecting its time budget. Building an agent harness largely comes down to orchestrating validation: each gate has its own needs, not only for the validation itself, but also for the mapping to each tool’s configuration and for conveying the most information in the fewest tokens.
Consider formatting. The on-save gate runs prettier immediately after every file edit. This mimics the behavior of many IDEs. After a file edit it doesn’t make sense to block the change anymore; in fact it’s impossible, because the change has already happened. Besides that, formatting isn’t something that should block a save, so this gate fails silently.
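In practice the gate amounts to something like the following sketch (the shape of the hook payload, and the field name for the edited file’s path, are assumptions; Claude Code and Cursor pass slightly different payloads):

```ts
// Simplified on-save gate: format the edited file, never block the edit.
import { spawnSync } from "node:child_process";

const raw = await Bun.stdin.text();                  // hook payload arrives on stdin
const input = raw ? JSON.parse(raw) : {};
const filePath: string | undefined = input.file_path ?? input.filePath; // field name is an assumption

if (filePath) {
  // Format in place; ignore failures so a formatting hiccup never blocks the save.
  spawnSync("bunx", ["prettier", "--write", filePath], { stdio: "ignore" });
}

process.exit(0); // always succeed: formatting is advisory at this gate
```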
To me, the on-stop gate is the most useful location to tighten the agent loop. It runs after file edits but before the agent returns its work to you, allowing the agent some autonomy as long as its final output aligns with the rules. Unfortunately, I couldn’t find a good Cursor equivalent for this hook.
The leftward push has limits. Type checking is expensive, especially because it’s not a job you can split easily. A modification to type strictness in one file can cause errors in an untouched file, so you quickly end up needing to typecheck all code. For now I only run it at commit time, not on every save.
There’s room for improvement with state-based validation: an early gate like on-save could kick off a time-consuming validation in the background, while the results would be presented to the agent on stop. This introduces the need to track whether the save being checked is the last one, since a subsequent save can fix an error from an earlier one, and we only want to feed back current issues. I left this optimization out of the first version.
What I’ve Learned
I only started this weekend, so this list will grow, but two things are already clear to me:
Hard boundaries compound. Whenever I introduce a hard boundary or constraint, I know all future agent requests will be slightly better handled. Each validator I add is a correction I’ll never have to make again. Over time, the accumulated rules create an increasingly capable agent environment.
Rule definition forces clarity. The act of writing a rule (with an ID, description, scope, and explanation) forces you to think precisely about what you actually want. “No inline styles in pages” sounds simple until you have to decide: what about CSS variables? What about dynamic values? Writing the rule makes implicit conventions explicit.
The Setup Cost
This wasn’t free to build. The validation infrastructure is around 1,500 lines of TypeScript across rules, validators, gates, and transformers. For a personal blog, that’s arguably over-engineered.
But the investment pays off in two ways. First, I’m not repeating corrections. Every correction I used to make by hand, like “use bun, not npm,” is now handled by a validator. Second, agent sessions are more autonomous. I can give Claude Code a task and trust it to produce output that is usable.
Closing Thoughts
The core idea is simple: good validation enables good agents. But “good validation” means more than just having tests. It means:
- Unified gates that work across all your tools
- Clear, precisely defined rules, even when defining them precisely is hard
- Left-pushed validation with clear, concise feedback
- Deterministic checks that enable scaling
The agents aren’t magic, but the feedback loop is quite powerful. It provides deterministic validation and precise feedback, frees up context window space, and allows for simpler prompting, a productivity boost I initially overlooked.
I’m still iterating on this. Some validators are too strict, some not strict enough, and many are still missing. The point isn’t to have a complete setup from the start. The point is to make the setup a little better each time you find yourself providing feedback to an agent, instead of just steering it in the right direction for that specific case.
The validation code is part of this website’s repository on GitHub.
- Jan Willem