Most developers using AI coding tools are not failing because the tools are bad. They’re failing because they’re treating AI like a smarter autocomplete, rather than a junior developer that needs project context, clear instructions, and enforceable guardrails.
Over the past year, I’ve shipped multiple full-stack applications using Claude Code as my primary development tool. At some point, I became curious about the quality difference between my pre-AI projects and my AI-assisted ones, so I asked Claude to audit both.
The result was uncomfortable but useful. The pre-AI code was more consistent: better naming, tighter types, fewer shortcuts, and stronger architectural continuity. The AI-assisted code shipped faster, but it drifted more. I found inconsistent patterns across files, any types sneaking in, security shortcuts I would not have made manually, and implementation choices that optimized for “make it work” instead of “make it right.”
That gap is what led me to build a governance system for AI-assisted development. Not a better prompt. Not a reusable template. A full system of agents, hooks, protocols, and project conventions that keeps AI-generated code aligned with the standards I actually want.
In this article, I’ll walk through what an AI coding governance system is, why prompts are not enough, and how to build one around the mistakes your AI tooling actually makes.
When AI-generated code disappoints, the first instinct is to write a better prompt. Add more detail. Be more specific. Include more constraints.
That works for isolated tasks, but it breaks down at project scale.
Every conversation with an AI coding tool starts with limited context. The prompt you wrote yesterday is gone. The pattern you corrected three sessions ago is forgotten. The security rule you explained last week is not automatically loaded.
A prompt is a one-shot instruction. A governance system is persistent infrastructure.
A governance system loads the same project rules into every session, blocks unsafe or inconsistent code before it lands, and helps file 200 follow the same standards as file one.
The difference becomes clear when you give the same prompt to the same model in two different environments.
Here’s the task:
Add a contact form with email validation and send the data to our API.
Without project rules, review steps, or commit-time checks, Claude Code might produce something like this:
```tsx
// app/components/ContactForm.tsx
import { useState } from "react";

export default function ContactForm() {
  const [formData, setFormData] = useState<any>({});
  const [status, setStatus] = useState("");

  const handleSubmit = async (e: any) => {
    e.preventDefault();
    try {
      const res = await fetch("/api/contact", {
        method: "POST",
        body: JSON.stringify(formData),
      });
      const data = await res.json();
      console.log("Response:", data);
      setStatus("sent");
    } catch (err) {
      console.log("Error:", err);
      setStatus("error");
    }
  };

  return (
    <form onSubmit={handleSubmit}>
      <input
        type="email"
        onChange={(e) => setFormData({ ...formData, email: e.target.value })}
      />
      <textarea
        onChange={(e) => setFormData({ ...formData, message: e.target.value })}
      />
      <button type="submit">Send</button>
    </form>
  );
}
```
The form works, but several issues shipped with it:
| Issue | Why it matters |
|---|---|
| `any` types | Weakens type safety and lets implementation errors pass unnoticed |
| `console.log` in production code | Leaks debugging behavior into production |
| Missing `Content-Type` header | Makes the request less explicit and can break backend parsing |
| No client-side validation before submission | Pushes avoidable errors to the API |
| No typed state or event handling | Makes the component harder to maintain |
| No loading state | Creates poor UX during submission |
| Different component convention | Drifts from the project’s existing style |
None of these mistakes are surprising. They are the kind of implementation drift that happens when an AI model is asked to complete a task without persistent project standards.
Now take the same prompt and run it inside a governed system.
Before writing code, the task is routed through a planner that reads the project conventions, checks the relevant files, and defines the scope. An executor builds against that plan. Pre-commit hooks run before anything is committed.
The result looks more like this:
```tsx
// app/components/contact-form.tsx
import { useState } from "react";
import type { FormEvent } from "react";

interface ContactFormData {
  email: string;
  message: string;
}

interface ContactFormProps {
  onSuccess?: () => void;
}

const ContactForm = ({ onSuccess }: ContactFormProps) => {
  const [formData, setFormData] = useState<ContactFormData>({
    email: "",
    message: "",
  });
  const [status, setStatus] = useState<
    "idle" | "submitting" | "sent" | "error"
  >("idle");

  const handleSubmit = async (e: FormEvent<HTMLFormElement>) => {
    e.preventDefault();

    const emailRegex = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
    if (!emailRegex.test(formData.email)) {
      setStatus("error");
      return;
    }

    setStatus("submitting");

    const response = await fetch("/api/contact", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(formData),
    });

    if (!response.ok) {
      setStatus("error");
      return;
    }

    setStatus("sent");
    onSuccess?.();
  };

  return (
    <form onSubmit={handleSubmit}>
      <input
        type="email"
        required
        value={formData.email}
        onChange={(e) =>
          setFormData((prev) => ({ ...prev, email: e.target.value }))
        }
      />
      <textarea
        required
        value={formData.message}
        onChange={(e) =>
          setFormData((prev) => ({ ...prev, message: e.target.value }))
        }
      />
      <button type="submit" disabled={status === "submitting"}>
        {status === "submitting" ? "Sending..." : "Send"}
      </button>
    </form>
  );
};

export default ContactForm;
```
The prompt did not change. The system did.
This version follows the project’s conventions: no any types, no production console.log, typed state, typed events, client-side validation, a clear loading state, and the correct component style.
That is the value of governance. It does not depend on the AI remembering your preferences. It gives the AI a system that makes those preferences unavoidable.
A governance system is a set of configuration files, project conventions, specialized agents, and automated checks that help AI coding tools produce consistent, high-quality code in a specific codebase.
Think of it like onboarding a new developer. You would not hand them a Jira ticket and say “go.” You would give them things like:

- Coding standards and naming conventions
- Architecture documentation and the patterns the codebase already follows
- Security expectations
- A review process before anything merges
AI needs the same context. The difference is that a human developer builds that context over weeks or months. An AI starts fresh every session, so you need to externalize that context into files it can read and checks it cannot skip.
In my system, that breaks down into four layers:
| Governance layer | Purpose | Best for |
|---|---|---|
| Convention files | Give the AI persistent project context | Style, architecture, security, and testing rules |
| Specialized agents | Split planning, implementation, and review into distinct roles | Complex or multi-step development work |
| Pre-commit hooks | Enforce rules when the AI ignores them | Type safety, secrets, logs, tests, and scope control |
| Protocols | Govern how agents make decisions | Ambiguous requirements, risk checks, and scope changes |
You do not need every layer on day one. Start with the lightest version that fixes your most common AI coding failures, then build from there.
The foundation of my system is a project-level instruction file. In Claude Code, that file is CLAUDE.md, a Markdown file at the root of the project that loads into every conversation automatically.
That automatic loading is important. It means the AI gets the same baseline rules every time, without you having to paste them into each prompt.
The file should stay lean. Every line costs tokens, so CLAUDE.md should act more like a routing table than a complete handbook.
```markdown
## Overview
Video analysis SaaS. React Router 7 framework mode, Supabase, Drizzle ORM.

## Core rules
- TypeScript strict, no `any`
- Const arrow functions for React components
- Max file length: 300 lines. Extract when approaching.
- Every API endpoint must verify auth and data ownership.

## Documentation
rules/architecture.md
rules/api-patterns.md
rules/security.md
```
This is enough to orient the AI without overloading the context window.
The detailed rules live in separate files that the AI can read when needed. Your architecture patterns, API examples, security checks, and testing conventions should not clog every interaction. They should be available on demand.
That distinction matters. If you put 300 lines of API rules, security guidance, testing standards, and component conventions into every prompt, you waste context on information that may not apply to the current task. Worse, you risk pushing more relevant details out of the model’s working context.
A good convention file should answer three questions quickly:

- What is this project, and what is it built with?
- Which rules are non-negotiable?
- Where do the detailed conventions live?
A convention file gives the AI project context. Specialized agents go a step further by giving different AI roles different responsibilities.
Instead of asking one general-purpose AI agent to plan, implement, review, and commit everything, you define narrower roles with explicit boundaries.
In my framework, I use five agents, but three are the most important:
| Agent | Role | Constraint |
|---|---|---|
| Planner | Defines the implementation plan before code is written | Cannot write code |
| Executor | Builds against the approved plan | Cannot change the plan |
| Security reviewer | Reviews for security issues after implementation | Runs automatically |
The planner acts like a senior architect. It reads the project conventions, checks which files already exist, and produces a scoped plan with specific files to modify and testable acceptance criteria. It plans backward from what must be true when the task is complete, rather than forward from a loose list of things to try.
The executor acts like a senior developer. It reads the plan, follows the relevant convention files, modifies only the files in scope, and commits atomically per task.
The security reviewer runs after each build phase. It checks for insecure direct object references, missing auth, unvalidated input, hardcoded secrets, and other common security problems. This agent exists because AI coding tools are optimized to make code work, not necessarily to make code safe.
The key is separation of concerns. The planner should not write code. The executor should not quietly expand the scope. The reviewer should run whether you remember to ask for a review or not.
You do not need five agents to start. Even one planning agent that forces the AI to reason before it writes code can significantly reduce implementation drift.
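One practical way to keep that separation tight is to be explicit about the artifact the planner hands to the executor. The sketch below shows what such a plan could contain; the interface and field names are my illustration, not the exact format the framework uses:

```typescript
// Illustrative shape for a planner's output. Field names are assumptions,
// not the exact plan format used in the framework described here.
interface ImplementationPlan {
  task: string;                 // what the executor must deliver
  filesInScope: string[];       // the only files the executor may modify
  conventionFiles: string[];    // rules the executor reads before writing code
  acceptanceCriteria: string[]; // testable statements that must be true when done
  risks: string[];              // failure modes the security reviewer should check
}

// Example plan for the contact form task from earlier in the article
const contactFormPlan: ImplementationPlan = {
  task: "Add a contact form with email validation that posts to /api/contact",
  filesInScope: ["app/components/contact-form.tsx"],
  conventionFiles: ["rules/patterns.md", "rules/security.md"],
  acceptanceCriteria: [
    "No any types and no console.log in production code",
    "Client-side email validation runs before the request is sent",
    "Submit button is disabled while the request is in flight",
  ],
  risks: ["Unvalidated input reaching the API", "Missing Content-Type header"],
};
```

An executor constrained to `filesInScope` and judged against `acceptanceCriteria` has far less room to drift than one handed the original one-line prompt.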
Conventions tell the AI what to do. Hooks block the commit when it does not listen.
Here is what my pre-commit hook checks on every commit:
```text
# Hardcoded secrets such as AWS keys, API tokens, and Stripe keys
#   -> BLOCKS the commit

# console.log in production code
#   -> WARNING

# TypeScript any type
#   -> WARNING

# .env files being committed
#   -> BLOCKS the commit

# Commit scope greater than 15 files
#   -> WARNING

# Test suite
#   -> BLOCKS the commit if tests fail
```
These checks catch the exact problems from the earlier contact form example: any types, production logs, missing test coverage, and unsafe files entering the repository.
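A hook like this does not need to be elaborate. Below is a minimal sketch of what a commit-time check could look like as a TypeScript script invoked from the Git pre-commit hook (for example via Husky); the file name, regex patterns, and thresholds are simplified illustrations rather than my exact hook:

```typescript
// scripts/pre-commit-check.ts -- a simplified sketch of commit-time checks.
// Patterns and thresholds are illustrative; the real hook also runs the test
// suite and blocks the commit if it fails.
import { execSync } from "node:child_process";
import { existsSync, readFileSync } from "node:fs";

const stagedFiles = execSync("git diff --cached --name-only", { encoding: "utf8" })
  .split("\n")
  .filter(Boolean);

const errors: string[] = [];
const warnings: string[] = [];

// .env files never belong in the repository
for (const file of stagedFiles) {
  if (/(^|\/)\.env(\..+)?$/.test(file)) {
    errors.push(`.env file staged: ${file}`);
  }
}

// Oversized commits get a warning, not a block
if (stagedFiles.length > 15) {
  warnings.push(`Commit touches ${stagedFiles.length} files (soft limit is 15)`);
}

// Scan staged TypeScript files for risky patterns
for (const file of stagedFiles.filter((f) => /\.(ts|tsx)$/.test(f) && existsSync(f))) {
  const source = readFileSync(file, "utf8");
  if (/\bconsole\.log\(/.test(source)) warnings.push(`console.log in ${file}`);
  if (/:\s*any\b/.test(source)) warnings.push(`any type in ${file}`);
  if (/AKIA[0-9A-Z]{16}|sk_live_[0-9a-zA-Z]+/.test(source)) {
    errors.push(`Possible hardcoded secret in ${file}`);
  }
}

warnings.forEach((w) => console.warn(`WARNING: ${w}`));
if (errors.length > 0) {
  errors.forEach((e) => console.error(`BLOCKED: ${e}`));
  process.exit(1); // a non-zero exit aborts the commit
}
```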
Hooks do not replace conventions. The goal is still for the AI to write correct code the first time because the conventions told it how. Hooks are the safety net for when it fails.
They also create a feedback loop. When the hook fails during the same session, the AI can inspect the failure, fix the code, and learn what the project actually enforces.
Protocols define how agents should behave when a task is unclear, risky, or out of scope. They govern decision-making, not code style.
The most useful protocol I’ve built is a confidence gate. Before any agent proceeds, it scores the task across four dimensions:
| Dimension | Question |
|---|---|
| Scope | Is it clear what to deliver? |
| Target | Do I understand the relevant part of the codebase? |
| Output | Can I define testable acceptance criteria? |
| Risk | Have I identified what could go wrong? |
If any dimension scores below 0.7, the agent stops and asks questions instead of guessing.
This single protocol eliminated the most expensive category of AI mistake in my workflow: confidently building the wrong thing.
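In my setup the gate lives in a protocol file the agents read, but the decision rule itself is small enough to express directly. The sketch below is illustrative; the dimension names mirror the table above, and 0.7 is the threshold just described:

```typescript
// A minimal sketch of the confidence-gate decision. The scoring itself is done
// by the agent; this only shows how the threshold turns scores into a decision.
interface ConfidenceScores {
  scope: number;  // Is it clear what to deliver?
  target: number; // Do I understand the relevant part of the codebase?
  output: number; // Can I define testable acceptance criteria?
  risk: number;   // Have I identified what could go wrong?
}

const CONFIDENCE_THRESHOLD = 0.7;

const shouldProceed = (scores: ConfidenceScores) => {
  const weakDimensions = Object.entries(scores)
    .filter(([, score]) => score < CONFIDENCE_THRESHOLD)
    .map(([dimension]) => dimension);
  // Any dimension below the threshold means: stop and ask instead of guessing.
  return { proceed: weakDimensions.length === 0, weakDimensions };
};

// Example: the task is clear, but the agent does not understand the target code.
shouldProceed({ scope: 0.9, target: 0.5, output: 0.8, risk: 0.75 });
// -> { proceed: false, weakDimensions: ["target"] }
```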
Another useful protocol is a deviation rule. It tells the AI what to do when reality does not match the plan:
| Situation | Action |
|---|---|
| Minor bug found | Fix it and log it |
| Missing import or type | Add it if obvious and log it |
| Architecture change needed | Stop and ask the human |
| Out-of-scope idea | Write it down for later, but do not build it |
Without rules like these, AI agents will make architectural decisions on your behalf or add features nobody asked for. Protocols keep them in their lane.
The best governance system is built from your own mistakes.
Do not copy someone else’s framework wholesale. Do not start with 50 rules because a blog post told you to. Start building with your AI coding tool, then pay attention to what goes wrong repeatedly.
The pattern is simple:
My security reviewer exists because I am weaker at backend security. It is not there because security reviews are generally a good idea, though they are. It is there because the AI was shipping insecure patterns I did not always catch manually.
If you are strong at security but weaker at CSS, your governance system should look different. You might need stronger UI conventions, component examples, accessibility checks, or visual regression tests.
My confidence gate exists because the AI kept building the wrong thing when requirements were ambiguous. If you always write extremely clear specs, that protocol may matter less for your workflow.
That is why I do not recommend starting with a prebuilt framework. Start with a lean CLAUDE.md, build for a week, and let friction guide the system. Within a few days, you will usually have conventions that cover most recurring issues.
You can build a governance system gradually. The goal is not to design the perfect system upfront; it is to turn repeated AI mistakes into reusable infrastructure.
Start with the smallest useful version of `CLAUDE.md`:
```markdown
## Overview
[Your stack in two to three lines]

## Core rules
[Five to 10 non-negotiable conventions]

## Rules
rules/architecture.md
rules/patterns.md
rules/security.md
```
Only include rules you actually care about. Good candidates are things you have already corrected more than once.
Each time the AI makes the same mistake more than once, add the correction to the relevant file:
| Repeated issue | Governance response |
|---|---|
| AI keeps using `any` for API responses | Add typed response patterns to `rules/patterns.md` |
| AI forgets auth checks on new endpoints | Add explicit examples to `rules/security.md` |
| Components keep getting too large | Add extraction rules to `rules/architecture.md` |
| AI mixes naming conventions | Add filename and component examples to `rules/patterns.md` |
This keeps the system grounded in actual work instead of theoretical best practices.
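To make the first row of that table concrete: the typed response pattern added to `rules/patterns.md` can be a snippet as small as this. The endpoint and field names are illustrative, reusing the contact form from earlier:

```typescript
// Example pattern for rules/patterns.md: type the API response instead of
// falling back to any. Endpoint and field names are illustrative.
interface ContactResponse {
  success: boolean;
  error?: string;
}

const submitContact = async (data: {
  email: string;
  message: string;
}): Promise<ContactResponse> => {
  const response = await fetch("/api/contact", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(data),
  });
  if (!response.ok) {
    return { success: false, error: `Request failed with status ${response.status}` };
  }
  return (await response.json()) as ContactResponse;
};
```

A short, copyable example like this does more work than a sentence of prose, because the AI can pattern-match against it the next time it touches an API call.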
Once you know which rules the AI keeps breaking despite the conventions, add enforcement.
Useful early hooks include:
- `.env` files
- `console.log` outside test files
- `any` in production TypeScript code

Start with warnings where false positives are likely. Use blocking checks for high-risk issues, such as secrets, broken tests, and unsafe environment files.
Specialized agents and protocols are most valuable once your conventions have stabilized.
Add them when you start noticing higher-level failure patterns:
| Failure pattern | Governance upgrade |
|---|---|
| AI starts building before understanding | Add a planner agent |
| AI changes too many files at once | Add commit-scope rules |
| AI makes architectural decisions silently | Add deviation protocols |
| AI misses security review | Add a security reviewer |
| AI implements vague requirements incorrectly | Add a confidence gate |
At this point, your system is no longer just a set of coding rules. It becomes a lightweight development process around AI-assisted work.
A good governance system compensates for two things: your gaps and the AI’s tendencies.
If you are a frontend developer who is weaker at backend security, your system should include strong security rules and automated checks. If you are a backend developer who struggles with UI consistency, your system should include component conventions, accessibility rules, and design-system examples.
You should also encode rules that counter common AI coding behaviors. AI models often:
- Reach for `any` to make TypeScript compile
- Leave `console.log` calls in production code
- Take security shortcuts to get code working
- Quietly expand scope beyond the original task

Do not encode things the AI already handles well. You probably do not need 30 rules about basic syntax. Focus on the gap between what the AI produces by default and what you would accept in a serious code review.
Governance systems can rot.
You add a rule after every bad experience, and six months later you have a bloated configuration that wastes tokens, contradicts itself, and makes the AI less effective.
I run a quarterly audit on every component of the system. For each rule, agent, hook, and protocol, I ask:

- Has it prevented a real problem since the last audit?
- Does it still match how I actually work?
- Is it worth the tokens and complexity it costs?
I also keep a list of ideas I considered and rejected, with the reason documented. For example, “meta-agents that create other agents” sounded appealing, but I rejected it because it added complexity without measurable value for solo work.
That decision is written down, so I do not re-debate the same idea every quarter.
The system should grow when you hit real problems and shrink when pieces stop earning their keep. Pruning is as important as building.
The uncomfortable truth is that a governance system does work that senior engineers usually do through code reviews, mentorship, and institutional knowledge.
When I compared my projects, the quality gap was not just about the AI model or prompt quality. It was about whether the AI had access to the same context that makes senior developers consistent: coding standards, architecture decisions, security patterns, naming conventions, and the “this is how we do things here” knowledge that takes months to absorb on a team.
A governance system externalizes that knowledge into files, automation, and review steps. The AI can read it every session instead of slowly learning it over time.
That has practical implications for several kinds of teams and workflows:
| Developer context | Governance benefit |
|---|---|
| Solo developers | Maintain consistency across larger codebases without a full review team |
| Engineering teams | Encode repeated review feedback into files and automation |
| AI-heavy workflows | Reduce drift as more code is generated by tools |
| Security-sensitive projects | Add checks where AI tools commonly take shortcuts |
The teams that figure this out first will not ship better software because AI is doing their job for them. They will ship better software because they have learned how to direct AI with the same rigor they apply to human development workflows.
AI-assisted development does not remove the need for engineering judgment. It makes that judgment more important. The difference is that the best judgment now gets encoded into the system around the AI, not repeated manually every time the model forgets.