On December 19, 2025, Cursor acquired Graphite for more than $290 million. CEO Michael Truell framed the move simply: code review is taking up a growing share of developer time as the time spent writing code keeps shrinking. The message is clear. AI coding tools have largely solved the speed of code generation. Now the industry is betting that review is the next constraint to break.
But that raises a more interesting question. Does AI-generated code actually require a different kind of review? The data suggests the bottleneck is real. A 2025 study by CodeRabbit found that AI-written code surfaces 1.7× more issues than human-written code, and nearly half of developers say debugging AI output takes longer than fixing code written by people. Still, those are aggregate numbers. They don’t tell you what actually changes when you’re staring at a pull request.
So I tested it. I built the same features twice: once by hand using a standard Python workflow, and once using Claude Code with natural language prompts. The point wasn’t to prove that AI code is worse. It was to see how review patterns shift, and whether the claimed bottleneck shows up in practice. What I found does confirm the review problem is real, just not in the ways I expected.
In this article, we’ll explore what actually changes during code review when AI enters the workflow, how review questions shift from correctness to necessity, and why faster code generation can still slow teams down if review practices don’t evolve with it.
I picked two tasks that developers actually do: building a REST API endpoint with validation, and refactoring error handling in existing code. Both approaches, manual coding and Claude Code working from natural language prompts, got the same requirements.
Claude Code is Anthropic’s command-line tool that takes natural language instructions and writes code autonomously. It reads your codebase, runs tests, and handles git operations. I gave both approaches the same specs. First, implement a user registration endpoint with JSON validation. Then, add error handling to a file-processing function. Each run started from a fresh directory with an identical context.
I measured the outcomes across a few dimensions: final line count, structural patterns, which edge cases were covered, and how much time a reviewer would realistically need to approve the pull request.
The REST API endpoint was an immediate surprise. My manual implementation came in at 29 lines of code. Claude Code’s version weighed in at 186. That’s 6.4× more code for the exact same requirements.
I expected bloat: verbose variable names, over-commenting, maybe some redundant logic. What I actually got was something else entirely. My human-written version did the basics. It checked that required fields existed, validated the email with a simple regex, and returned errors when something went wrong. You can see the full implementation on GitHub. Here’s the validation logic:
# Human: Direct validation in endpoint
if not email or not name:
    return jsonify({'error': 'email and name are required'}), 400
if not re.match(r'^[\w\.-]+@[\w\.-]+\.\w+$', email):
    return jsonify({'error': 'Invalid email format'}), 400
Claude Code pulled every validation step into its own function, complete with type hints, docstrings, and edge-case handling I probably would’ve added later, if at all. Email validation turned into a dedicated function, annotated with notes about RFC 5322 compliance. Name validation enforced length limits and character constraints. The full implementation is on GitHub.
On top of that, there was a third function whose only job was to orchestrate the validation flow, stitching those pieces together in a way that was explicit, readable, and hard to misuse.
# AI: Extracted validation with extensive checks
import re
from typing import Tuple

def validate_name(name: str) -> Tuple[bool, str]:
    """Validate user name with length and character constraints."""
    if len(name) < 2:
        return False, "Name must be at least 2 characters long"
    if len(name) > 100:
        return False, "Name must not exceed 100 characters"
    if not re.match(r"^[a-zA-Z\s'-]+$", name):
        return False, "Name contains invalid characters"
    return True, ""
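To make the orchestration point concrete, here's a hedged sketch of that third function. The names validate_registration_data and validate_email are assumptions for illustration; the actual names are in the GitHub repo.
# Sketch of the orchestration function (names are assumptions, not the repo's exact code)
from typing import Any, Dict, Tuple

def validate_registration_data(data: Dict[str, Any]) -> Tuple[bool, str]:
    """Run each field-level validator in turn and surface the first failure."""
    email = data.get("email", "")
    name = data.get("name", "")

    if not email or not name:
        return False, "email and name are required"

    is_valid, error = validate_email(email)  # assumed helper, analogous to validate_name
    if not is_valid:
        return False, error

    is_valid, error = validate_name(name)
    if not is_valid:
        return False, error

    return True, ""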
The difference isn’t just about volume. It’s about where the work shows up. My version optimizes for the happy path with some basic error handling. The AI version front-loads everything. It accounts for inputs I didn’t think about yet: names with apostrophes, excessive lengths, and odd characters.
Reviewing it took longer, around eight to twelve minutes versus maybe three for my code, but not because anything was wrong. The time went into a different set of questions. I wasn't asking what was missed. I was asking whether all of this was necessary right now.
That’s the shift I didn’t expect. More code, yes – but more importantly, different review questions.
The error-handling refactor made the pattern even harder to ignore. I started from a 16-line naive implementation that simply read JSON files. My manual refactor added 10 lines, a 62 percent increase. Claude Code added 272 lines. That’s a 1,700 percent jump.
My approach was straightforward. I wrapped the existing logic in a try–except block that handled three cases: FileNotFoundError, JSONDecodeError, and a final catch-all. Each one logged the issue and moved on to the next file.
# Human: Targeted exception handling
try:
    with open(path, 'r') as f:
        data = json.load(f)
    results.append({...})
except FileNotFoundError:
    logger.warning(f"File not found: {path}")
except json.JSONDecodeError as e:
    logger.error(f"Invalid JSON in {path}: {e}")
except Exception as e:
    # Final catch-all: log and move on to the next file
    logger.error(f"Unexpected error processing {path}: {e}")
Claude Code went the opposite direction. It split the logic into five separate functions. One validated that file paths actually exist. Another handled JSON loading, including encoding errors. A third extracted user data with per-record validation. You can see the full implementation on GitHub.
The main function became an orchestrator, coordinating all of this while tracking errors end-to-end. Along the way, it introduced a custom exception class with structured context, layered in input validation, added type hints everywhere, and logged each stage in detail.
# AI: Multi-layer error handling with custom exception
import json
from typing import Any, Dict, Optional

def load_json_file(file_path: str) -> Optional[Dict[str, Any]]:
    try:
        with open(file_path, 'r', encoding='utf-8') as f:
            data = json.load(f)
        if not isinstance(data, dict):
            # FileProcessingError is the custom exception class described above
            raise FileProcessingError(file_path, "Not a JSON object", None)
        return data
    except FileNotFoundError as e:
        raise FileProcessingError(file_path, "File not found", e)
    except PermissionError as e:
        raise FileProcessingError(file_path, "Permission denied", e)
    # ...five more exception types
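For reference, the custom exception class that threads through all of this looks roughly like the sketch below; the attribute names are my guesses based on the constructor calls above, not the exact code in the repo.
# Sketch of the custom exception (attribute names are assumptions)
from typing import Optional

class FileProcessingError(Exception):
    """Carries structured context: which file failed, why, and the underlying exception."""

    def __init__(self, file_path: str, reason: str, original: Optional[Exception]):
        self.file_path = file_path
        self.reason = reason
        self.original = original
        super().__init__(f"{reason}: {file_path}")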
The AI version handled edge cases I hadn’t even considered: encoding errors, permission issues, and malformed records. But while reviewing it, my questions shifted. I wasn’t asking what’s missing. I was asking things like: Do we really need a custom exception class? Is per-user value validation necessary here? Can the error tracking be simplified?
That’s the difference. My version would get feedback like: should we handle permission errors separately? The AI version would get a different kind of pushback: is this overkill?
Reviewing this took fifteen to twenty minutes, compared to four or five for my refactor. Not because the code was wrong, but because the review itself changed. I wasn’t checking correctness. I was judging necessity.
The AI is systematically defensive. I’m selectively defensive. Both approaches work. One just takes a lot more time to verify.
Here’s what reviewing AI code actually feels like: you’re not checking whether it works. You’re checking whether it’s over-engineered.
My API endpoint took a reviewer maybe three minutes. Claude Code’s 186-line version took eight to twelve. Not because anything was broken, but because the questions changed. Does the RFC 5322–style email validation break on emoji? Is a 100-character name limit going to cause internationalization issues later? You’re validating comprehensiveness, not correctness.
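That kind of question is easy to probe against the validate_name regex from earlier; this quick check is mine, not part of either implementation.
# Probing the AI version's name regex for internationalization gaps (my check, not the repo's)
import re

NAME_PATTERN = re.compile(r"^[a-zA-Z\s'-]+$")

print(bool(NAME_PATTERN.match("Mary O'Brien")))  # True: ASCII letters, space, apostrophe
print(bool(NAME_PATTERN.match("José García")))   # False: accented characters are rejected
print(bool(NAME_PATTERN.match("李小龙")))          # False: non-Latin scripts are rejected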
That lines up with what we’re seeing across the industry. A 2025 study found senior engineers spend an average of 4.3 minutes reviewing AI-generated suggestions, compared to 1.2 minutes for human-written code. At the same time, teams using AI heavily are shipping far more pull requests. Faros AI analyzed data from more than 10,000 developers and found a 98 percent increase in PR volume. The result: PR review time went up 91 percent, even though code generation itself got faster.
This shift hits senior engineers hardest. They’re the ones who’ve debugged production issues at 3 a.m. and know that passing tests doesn’t mean code survives reality. A survey by Qodo found that 68 percent of senior engineers report quality improvements from AI, but only 26 percent would ship AI-generated code without review. When Claude Code spits out five helper functions for error handling, a senior doesn’t skim. They evaluate each one, thinking through failure modes the model has never experienced.
Teams that adapt well change how review works. They use AI for first-pass checks and basic issue detection, then route the code to humans for judgment calls. That split actually works. AI catches the obvious problems. Seniors focus on the things that require experience, like security boundaries and architectural fit. But it only works if the process changes with it. Otherwise, you end up with burned-out seniors spending four minutes per AI suggestion while juniors merge code they don’t fully understand.
The Cursor–Graphite acquisition clicks into place once you see this play out in practice. When an AI can generate 6.4× more code for a simple API endpoint, or tack 272 lines onto a 16-line function, someone still has to read it. That someone becomes the bottleneck. And yes, the bottleneck is real.
Code review in the age of AI is a different job than code review in the age of humans. You’re not validating correctness. You’re judging necessity. Does this abstraction earn its keep? Is this edge case worth the complexity? Would we actually want to own this much defensive code six months from now?
The practical takeaway is straightforward. If your team adopts AI coding tools without restructuring how code review works, expect slower releases, not faster ones. Track review cycle time now, before AI reshapes it. Otherwise, you'll be left wondering why productivity dropped even though everyone is writing code faster, right around the time tools like Cursor and Graphite start to feel less like accelerators and more like mirrors.
