I benchmarked Claude Code and OpenCode on a heavy refactor: The reality of agentic CLI workflows

Every week, someone posts something like: “We are so cooked. AI has taken our jobs.” And every other week, someone else replies, maybe on X, that the same agent deleted their .env file, with the caption “lol.”

Both reactions are understandable. Neither is enough evidence.

To get a more useful answer, I benchmarked Claude Code and OpenCode on the same real refactor: migrating a deeply nested Next.js 15 and React 19 dashboard from prop drilling to Zustand. The goal was not to see which tool could write the prettiest code from a blank prompt. It was to see how each terminal coding agent handled a messy, multi-file migration with TypeScript, callback props, shared UI state, build verification, and enough component nesting to make the prop chain feel personal.

The result was not “AI replaces developers” or “AI agents are useless.” It was more practical: both tools completed the refactor, both hit the same environment issue, and one tool produced a cleaner first pass. More importantly, the mistakes showed what developers need to specify when using CLI agents for large refactors.

If you’re already experimenting with terminal agents, this benchmark pairs well with LogRocket’s guide to leveling up Claude Code workflows and its broader AI dev tool rankings.

What I benchmarked

The benchmark compared two agentic coding tools:

Tool	What it is	Why it matters for this test
Claude Code	Anthropic’s terminal-based agentic coding tool	Strong codebase navigation, conservative permissions, and tight integration with Claude models
OpenCode	An open source AI coding agent with terminal, IDE, and desktop surfaces	Provider flexibility, inspectability, and a terminal UI built around diffs and sessions

Claude Code is Anthropic’s official CLI coding agent. It can read and edit files, run shell commands, reason over a codebase, and use MCP-connected tools. It also defaults toward explicit approval for state-changing actions, which is useful when you want a coding assistant that does not silently rewrite half your repository.

OpenCode takes a different approach. It is open source, model-flexible, and designed to let developers connect different providers, including Claude, GPT, Gemini, and local models. LogRocket has covered this angle in more detail in its article on switching to OpenCode when security teams block proprietary coding tools.

For this benchmark, I used the same underlying model for both tools: Claude Opus 4.6. Anthropic positions Opus 4.6 as an agentic coding model with improved reliability on large codebases, so using it in both tools helped isolate the tool orchestration layer from the model layer.

That said, this is still a practical benchmark, not a scientific study. The results reflect one codebase, one refactor task, one machine, and one prompt. Treat the numbers as a useful field report, not a universal ranking.

The refactor task

The test app was a Next.js 15 dashboard using React 19, TypeScript, and Tailwind. The codebase was intentionally unpleasant, but not unrealistic: a project management dashboard with mock data, six levels of nested components, callback props traveling through components that did not use them, and shared UI state living too high in the tree.

The dashboard state lived in dashboard/page.tsx. From there, it flowed into DashboardHeader, Sidebar, ContentArea, ProjectList, ProjectCard, TaskRow, StatsFooter, and TaskModal.

Here was the component tree before the migration:

app/
├── layout.tsx
├── page.tsx
├── dashboard/
│   ├── page.tsx                   (owns all state)
│   └── components/
│       ├── DashboardHeader.tsx    (receives user, notifications, onMarkRead, sidebarCollapsed)
│       ├── Sidebar.tsx            (receives user, projects, selectedProjectId, onSelectProject,
│       │                           collapsed, onToggleCollapse, activeFilters, onFilterChange)
│       ├── ContentArea.tsx        (receives user, projects, tasks, selectedProjectId,
│       │                           activeFilters, onUpdateTask, onDeleteTask, onOpenTaskModal)
│       │   ├── ProjectList.tsx    (receives projects, selectedProjectId, tasks, user,
│       │   │                       activeFilters, onUpdateTask, onDeleteTask, onOpenTaskModal)
│       │   │   └── ProjectCard.tsx
│       │   │       └── TaskRow.tsx
│       │   └── StatsFooter.tsx    (receives projects, tasks, user)
│       └── TaskModal.tsx          (receives task, project, user, isOpen, onClose, onSave, projects)

ProjectCard alone received eight props, most of which it only passed down to TaskRow. That made the codebase a good candidate for a state management migration. Zustand was a reasonable choice because it lets React components consume state from a store without adding a provider wrapper, which makes it useful for cutting through deep prop chains. LogRocket’s Zustand adoption guide covers that tradeoff in more depth.

The refactor requirements were specific:

Create a Zustand store for user, project, and UI state
Replace prop drilling with direct store consumption
Remove dead props from every component interface
Keep TypeScript compiling with zero errors
Preserve existing behavior, including modal and task side effects

The prompt I used

I gave both tools the same prompt:

Refactor this Next.js 15 + React 19 app from prop drilling to Zustand.

Read the entire codebase first. Do not ask questions.

1. Create a Zustand store in src/store/ that consolidates user, project,
   and UI state currently being prop-drilled through the dashboard components.
2. Update every component that currently receives these props to consume
   from the store directly.
3. Remove all dead props and update TypeScript interfaces accordingly.
4. Run `npx tsc --noEmit` after every major change to verify type safety.
5. Run `npm run build` at the end to confirm the app compiles.

Maintain MISTAKES.md at the root. Log every error you encounter with:
file, what you did, the error, why it happened, how you fixed it.

The MISTAKES.md requirement turned out to be one of the most useful parts of the test. It forced both agents to leave a trail of wrong assumptions, failed commands, and recovery steps instead of silently patching over errors.

That matters because the value of a coding agent is not just whether it eventually passes the build. It is whether you can understand how it got there.

Claude Code’s attempt

Claude Code began by reading the codebase in dependency order. It opened page.tsx, traced imports, and worked through the child components. Its state audit was strong: it mapped shared state, identified where props were being passed through without being used, and separated true state mutations from callbacks that were only wiring.

That up-front analysis was exactly what I wanted. Claude Code did not immediately start writing files. It first established where the state lived and how it moved.

The Zustand store it created was clean and logically grouped:

// src/lib/store.ts
import { create } from 'zustand'
// User, Project, Task, and FilterState are the app's existing types (imports omitted in this excerpt).

interface DashboardState {
  // User state
  user: User | null
  setUser: (user: User) => void

  // Project state
  projects: Project[]
  setProjects: (projects: Project[]) => void
  updateTask: (taskId: string, updates: Partial<Task>) => void
  deleteTask: (taskId: string) => void

  // UI state
  sidebarCollapsed: boolean
  toggleSidebar: () => void
  activeFilters: FilterState
  setFilters: (filters: FilterState) => void
  activeModal: { type: 'task' | null; data?: Task }
  openTaskModal: (task: Task) => void
  closeModal: () => void
}

// The create<DashboardState>()(...) call that implements these actions is omitted from this excerpt.

This was the right shape for the migration. User state, project state, and UI state were grouped in a single dashboard store. The actions were explicit, and the modal state moved out of the prop chain.
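The excerpt above only shows the state's type. As a rough sketch of how those actions could be implemented, here is a version built with `createStore` from `zustand/vanilla`, which runs outside React and is easy to exercise in isolation. The `User`, `Project`, `Task`, and `FilterState` shapes are hypothetical, and I'm assuming tasks are nested inside their project, which the article does not specify. This is an illustration, not Claude Code's actual output.

```typescript
import { createStore } from 'zustand/vanilla'

// Hypothetical data shapes; the real app defines its own.
interface User { id: string; name: string }
interface Task { id: string; title: string; done: boolean }
interface Project { id: string; name: string; tasks: Task[] } // assumes nested tasks
interface FilterState { status: 'all' | 'open' | 'done' }

interface DashboardState {
  user: User | null
  setUser: (user: User) => void
  projects: Project[]
  setProjects: (projects: Project[]) => void
  updateTask: (taskId: string, updates: Partial<Task>) => void
  deleteTask: (taskId: string) => void
  sidebarCollapsed: boolean
  toggleSidebar: () => void
  activeFilters: FilterState
  setFilters: (filters: FilterState) => void
  activeModal: { type: 'task' | null; data?: Task }
  openTaskModal: (task: Task) => void
  closeModal: () => void
}

export const dashboardStore = createStore<DashboardState>()((set) => ({
  // User state
  user: null,
  setUser: (user) => set({ user }),

  // Project state: immutably replace or remove the matching task in every project
  projects: [],
  setProjects: (projects) => set({ projects }),
  updateTask: (taskId, updates) =>
    set((state) => ({
      projects: state.projects.map((p) => ({
        ...p,
        tasks: p.tasks.map((t) => (t.id === taskId ? { ...t, ...updates } : t)),
      })),
    })),
  deleteTask: (taskId) =>
    set((state) => ({
      projects: state.projects.map((p) => ({
        ...p,
        tasks: p.tasks.filter((t) => t.id !== taskId),
      })),
    })),

  // UI state
  sidebarCollapsed: false,
  toggleSidebar: () => set((state) => ({ sidebarCollapsed: !state.sidebarCollapsed })),
  activeFilters: { status: 'all' },
  setFilters: (activeFilters) => set({ activeFilters }),
  activeModal: { type: null },
  openTaskModal: (task) => set({ activeModal: { type: 'task', data: task } }),
  closeModal: () => set({ activeModal: { type: null } }),
}))
```

In components you would typically build the store with `create`, as in the excerpt, and read slices through the returned hook, or read this vanilla store with `useStore(dashboardStore, selector)` from `zustand`.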

But a good initial store is not the hard part of this refactor. The hard part is migrating every component without breaking parent-child contracts.

Mistake #1: Broken toolchain shims

Claude Code’s first failure was not actually a code failure:

$ npx tsc --noEmit
Cannot find module '../lib/tsc.js'

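After installing Zustand, npm regenerated the .bin shims inside node_modules. The regenerated tsc shim no longer resolved into the TypeScript package: its relative require of ../lib/tsc.js was resolved from node_modules/.bin, so Node looked for node_modules/lib/tsc.js instead of node_modules/typescript/lib/tsc.js.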
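The odd path in the error follows from how Node resolves a relative `require`: against the directory of the file that contains it. TypeScript's own `bin/tsc` entry script loads `../lib/tsc.js`. This sketch, which assumes the regenerated shim was a standalone copy sitting in `node_modules/.bin` rather than a symlink into the package, uses `node:path` to show where that relative path lands in each case:

```typescript
import * as path from 'node:path'

// Resolve a relative require the way Node does: against the directory of the
// file that contains the require call (after any symlink has been resolved).
function resolveRelativeRequire(callerFile: string, request: string): string {
  return path.posix.join(path.posix.dirname(callerFile), request)
}

// Healthy install: .bin/tsc is a symlink, so Node runs the package's real entry file.
const realEntry = 'node_modules/typescript/bin/tsc'
// Broken install (assumed): the shim is a standalone copy living in .bin itself.
const copiedShim = 'node_modules/.bin/tsc'

console.log(resolveRelativeRequire(realEntry, '../lib/tsc.js'))
// node_modules/typescript/lib/tsc.js
console.log(resolveRelativeRequire(copiedShim, '../lib/tsc.js'))
// node_modules/lib/tsc.js  (the missing file from the error)
```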

The same issue affected the Next.js build command. Claude Code recovered by calling the binaries directly:

node node_modules/typescript/lib/tsc.js --noEmit
node node_modules/next/dist/bin/next build

This is the kind of boring environment issue that does not show up in polished demos but shows up all the time in real projects. The important part is that Claude Code correctly diagnosed it as tooling breakage rather than trying to “fix” unrelated TypeScript code.

Mistake #2: Refactoring top-down instead of bottom-up

The first real coding mistake was architectural. Claude Code started by editing the root DashboardPage, removing props from parent calls before the children had been updated to read from the Zustand store.

TypeScript immediately objected:

Type '{}' is missing the following properties from type 'SidebarProps':
user, projects, selectedProjectId, onSelectProject...

This is a common migration trap. In a prop-drilling-to-store refactor, the root component is the wrong place to start. The root owns the state, but the leaf components are the ones that need to stop depending on props first.

Claude Code had to reverse direction and refactor bottom-up:

TaskRow
→ ProjectCard
→ ProjectList
→ ContentArea
→ DashboardPage

That strategy worked. Each child moved to the store first. Only then did the parent stop passing the now-dead props.

This is the most useful lesson from Claude Code’s run: for state migrations, free the leaves before pruning the trunk.
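“Leaves before trunk” is a post-order traversal: every child is visited before its parent. A minimal sketch, assuming the dashboard tree from earlier is hand-encoded as a parent-to-children map (my approximation of the diagram, not something either agent generated):

```typescript
// Parent -> children, hand-written approximation of the dashboard tree.
const componentTree: Record<string, string[]> = {
  DashboardPage: ['DashboardHeader', 'Sidebar', 'ContentArea', 'TaskModal'],
  ContentArea: ['ProjectList', 'StatsFooter'],
  ProjectList: ['ProjectCard'],
  ProjectCard: ['TaskRow'],
}

// Post-order traversal: a component is emitted only after all of its children,
// so each parent is migrated only once its children already read from the store.
function bottomUpOrder(root: string, tree: Record<string, string[]>): string[] {
  const order: string[] = []
  const visit = (node: string): void => {
    for (const child of tree[node] ?? []) visit(child)
    order.push(node)
  }
  visit(root)
  return order
}

console.log(bottomUpOrder('DashboardPage', componentTree).join(' -> '))
// DashboardHeader -> Sidebar -> TaskRow -> ProjectCard -> ProjectList -> StatsFooter -> ContentArea -> TaskModal -> DashboardPage
```

If a component is rendered by more than one parent, the same walk over a graph with a visited set still puts every child before its parents.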

Mistake #3: Incomplete prop cascade

The second code mistake followed from the first. Claude Code removed the user prop from ContentArea, but ContentArea was still passing user to StatsFooter, which had not been migrated yet.

TypeScript caught it:

Property 'user' is missing in type '{ projects: Project[]; tasks: Task[]; }'
but required in type 'StatsFooterProps'.

The fix was simple: migrate StatsFooter at the same time as the component passing data into it. But the failure is worth keeping because it shows why deeply nested component trees create hidden coupling. Props are not just values; they are contracts between files.
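Once a leaf like StatsFooter reads from the store, that contract moves out of its props and into selectors. A minimal sketch in plain TypeScript; the state shape and the selector names are hypothetical, since the article does not show what StatsFooter renders:

```typescript
// Hypothetical shapes; the article does not show StatsFooter's internals.
interface User { id: string; name: string }
interface Project { id: string; name: string }
interface Task { id: string; projectId: string; done: boolean }

// Only the slice of store state that StatsFooter reads.
interface FooterSlice {
  user: User | null
  projects: Project[]
  tasks: Task[]
}

// Before the migration, StatsFooterProps was { projects; tasks; user },
// and every ancestor had to keep supplying all three.
// After it, the component reads through selectors and needs no props.
// Hypothetical selectors; each returns a primitive, so a store hook using it
// re-renders only when that particular value changes.
const selectOwnerName = (s: FooterSlice): string => s.user?.name ?? 'Guest'
const selectProjectCount = (s: FooterSlice): number => s.projects.length
const selectCompletedCount = (s: FooterSlice): number =>
  s.tasks.filter((t) => t.done).length
```

In a Client Component, each selector would be passed to the store hook, for example `useDashboardStore(selectCompletedCount)`, where `useDashboardStore` is a hypothetical hook created with Zustand's `create`.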

Claude Code’s final result:

Metric	Claude Code result
Total time	14 minutes
Final TypeScript errors	0
Build status	Passed
Mistakes logged	4
Environment issues	2
Code issues	2

OpenCode’s attempt

OpenCode approached the same task differently. It started by mapping the file tree first:

find . -name "*.tsx" -not -path "*/node_modules/*"

Then it pulled type information before making changes. This gave it a broader view of the component graph before it touched the code.

Its state audit was similar to Claude Code’s: shared state, local state, callback props, and UI state were identified correctly. It created the same general Zustand store shape and did not add unnecessary middleware, providers, or architectural flourishes.

Then, somewhat annoyingly for anyone who enjoys drama, it mostly just worked.

Metric	OpenCode result
Total time	7 minutes
Final TypeScript errors	0
Build status	Passed
Mistakes logged	2
Environment issues	2
Code issues	0

Benchmark results

Here is the side-by-side version:

Dimension	Claude Code	OpenCode
Underlying model	Claude Opus 4.6	Claude Opus 4.6
Task completed	Yes	Yes
Final TypeScript status	0 errors	0 errors
Final build status	Passed	Passed
Time to completion	14 minutes	7 minutes
Environment mistakes	2	2
Code mistakes	2	0
Main failure mode	Refactoring order and incomplete cascade	Toolchain shims only
Strongest behavior	Thorough reasoning and explicit side-effect awareness	Cleaner first-pass execution

OpenCode had the cleaner run in this benchmark. It finished faster and did not introduce code-level migration mistakes.

Claude Code’s run was still valuable. Its errors were readable, recoverable, and instructive. It also explicitly called out one behavior that mattered: preserving the handleSaveTask side effect that closes the modal after saving. OpenCode preserved the same behavior in the store, but it did not explain that decision as clearly.

That distinction matters. Passing the build is necessary, but it is not the whole story. In real refactors, developers also need to know which behavior was preserved deliberately and which behavior survived by accident.
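In store terms, preserving that behavior means the save action itself closes the modal instead of relying on whichever component calls it. A minimal sketch with `zustand/vanilla`; the `saveTask` action and the state shape are hypothetical, since the article only names the original `handleSaveTask` handler:

```typescript
import { createStore } from 'zustand/vanilla'

interface Task { id: string; title: string }

interface TaskModalState {
  tasks: Task[]
  activeModal: { type: 'task' | null; data?: Task }
  openTaskModal: (task: Task) => void
  closeModal: () => void
  // Hypothetical replacement for the parent's handleSaveTask callback.
  saveTask: (task: Task) => void
}

export const taskStore = createStore<TaskModalState>()((set, get) => ({
  tasks: [],
  activeModal: { type: null },
  openTaskModal: (task) => set({ activeModal: { type: 'task', data: task } }),
  closeModal: () => set({ activeModal: { type: null } }),
  saveTask: (task) => {
    // Persist the edit...
    set((state) => ({
      tasks: state.tasks.map((t) => (t.id === task.id ? task : t)),
    }))
    // ...then keep the side effect the old callback carried: close the modal.
    get().closeModal()
  },
}))
```

Because the side effect lives in the action, no component can forget it when it stops receiving the callback as a prop.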

What this benchmark actually shows

The benchmark does not prove that OpenCode is always better than Claude Code. It shows that, for this specific refactor, OpenCode’s orchestration produced a cleaner first pass.

More broadly, it shows three things about CLI coding agents:

The model is not the whole tool. Both agents used the same model, but they behaved differently.
Refactor order matters. Agents need migration strategy, not just task instructions.
Verification must be part of the prompt. Running TypeScript and the build after changes prevented both tools from drifting.
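One way to make verification concrete is a fixed sequence of checks that runs one command at a time and stops at the first failure. A minimal Node sketch using `spawnSync` from `node:child_process`; the `Check` type, `runChecks` helper, and check list are hypothetical, and the commands reuse the direct-binary workaround from the benchmark so they do not depend on the `.bin` shims:

```typescript
import { spawnSync } from 'node:child_process'

// Hypothetical description of one verification step.
interface Check { name: string; command: string; args: string[] }

// Hypothetical helper: run checks sequentially (never in parallel) and stop at
// the first non-zero exit, so one error is fixed before moving on.
function runChecks(checks: Check[]): { passed: string[]; failed?: string } {
  const passed: string[] = []
  for (const check of checks) {
    const result = spawnSync(check.command, check.args, { stdio: 'inherit' })
    // status is null if the process was killed by a signal; treat that as failure too.
    if (result.status !== 0) return { passed, failed: check.name }
    passed.push(check.name)
  }
  return { passed }
}

// Direct entry points, matching the workaround used in the benchmark.
const projectChecks: Check[] = [
  { name: 'typecheck', command: process.execPath, args: ['node_modules/typescript/lib/tsc.js', '--noEmit'] },
  { name: 'build', command: process.execPath, args: ['node_modules/next/dist/bin/next', 'build'] },
]
```

Calling `runChecks(projectChecks)` after each change gives a single pass/fail signal plus the name of the first failing step.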

The environment issue is also important. Both tools hit the same broken npm shim problem. The broken shims did not care which agent was driving the terminal. Agentic coding still happens inside your actual project, with your actual dependency graph, your actual package manager state, and your actual weirdness.

That is why benchmark leaderboards are useful but incomplete. The better test is whether the agent can survive your repo.

How to prompt CLI agents for large refactors

The best prompts for CLI agents are not just task descriptions. They are operating procedures.

After running this benchmark, I would add four instructions to any large refactor prompt.

Enforce bottom-up refactoring

For prop-drilling migrations, do not let the agent start at the parent:

Refactor components bottom-up: start with the deepest leaf components
(TaskRow, StatsFooter), then work upward through ProjectCard,
ProjectList, ContentArea, and finally DashboardPage. Never modify
a parent component's props until ALL its children have been updated
to read from the store. This prevents cascade errors.

This one instruction would likely have prevented Claude Code’s main coding mistake.

Require explicit logging

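The MISTAKES.md requirement forced both agents to track failures, which is what made their runs auditable. But a failures-only log says little about a clean run like OpenCode's, so I would make the logging even stricter: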

After EVERY file you modify, immediately append to MISTAKES.md:
the file name, what you changed, and whether tsc passed or failed.
Log even if nothing went wrong. Write "No error" as the status.
This ensures the benchmark captures your full decision process,
not just failures.

A good MISTAKES.md file turns agent output into something you can audit.
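If you want log entries consistent enough to grep or diff across runs, it helps to fix their shape in advance. A sketch of one possible format, using the fields from the prompts above plus a tsc status; the `LogEntry` type, `formatEntry` helper, and markdown layout are my assumptions, not what either agent wrote:

```typescript
type Category = 'environment' | 'code'

// Hypothetical entry shape: one entry per modified file, clean or not.
interface LogEntry {
  file: string
  change: string
  tscStatus: 'passed' | 'failed'
  error?: string
  cause?: string
  fix?: string
  category?: Category
}

// Render one entry as a MISTAKES.md section; clean runs still get an entry.
function formatEntry(entry: LogEntry): string {
  const lines = [
    `### ${entry.file}`,
    `- Change: ${entry.change}`,
    `- tsc: ${entry.tscStatus}`,
  ]
  if (entry.tscStatus === 'failed') {
    lines.push(
      `- Error: ${entry.error ?? 'not recorded'}`,
      `- Cause: ${entry.cause ?? 'not recorded'}`,
      `- Fix: ${entry.fix ?? 'not recorded'}`,
      `- Category: ${entry.category ?? 'not recorded'}`,
    )
  } else {
    lines.push('- Status: No error')
  }
  return lines.join('\n') + '\n'
}

console.log(formatEntry({ file: 'TaskRow.tsx', change: 'Read tasks from store', tscStatus: 'passed' }))
```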

Prevent scope creep

Agents love being helpful. Sometimes helpful means “I added a provider layer, middleware, and a clever abstraction you did not ask for.” Do not allow that unless you want it:

Do NOT add middleware, utilities, or patterns that do not exist in
the current codebase unless I explicitly ask for them. The goal
is a 1:1 migration of the state layer, not an improvement pass.

For Zustand specifically, this matters. Zustand can support middleware and slices, but a migration away from prop drilling does not automatically require persist, devtools, or a provider wrapper.

Protect callback side effects

Callback props are dangerous during refactors because they often carry behavior that is not visible from the child component.

Before removing any callback prop, trace its usage in the parent
component. If the callback triggers side effects such as analytics,
notifications, logging, modal state, or navigation, preserve those
side effects in the refactored version. Log any side effects you find
in MISTAKES.md before proceeding.

This was the most important safeguard in the benchmark. A refactor can compile and still be wrong if it drops a side effect.
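The compiler cannot catch this class of bug, because a handler that drops the side effect has the same type as one that keeps it. A minimal illustration in plain TypeScript with hypothetical names, where only a behavioral check tells the two apart:

```typescript
// Hypothetical shapes and handlers for illustration.
interface Task { id: string; title: string }
interface UiState { tasks: Task[]; modalOpen: boolean }

type SaveHandler = (state: UiState, task: Task) => UiState

const replaceTask = (tasks: Task[], task: Task): Task[] =>
  tasks.map((t) => (t.id === task.id ? task : t))

// Original parent callback: save, then close the modal.
const handleSaveTask: SaveHandler = (state, task) => ({
  tasks: replaceTask(state.tasks, task),
  modalOpen: false,
})

// A "migrated" version that only kept the data update. It satisfies the same
// SaveHandler type, but the modal now stays open after saving.
const naiveSaveTask: SaveHandler = (state, task) => ({
  ...state,
  tasks: replaceTask(state.tasks, task),
})

// Behavioral check: run a handler on an open-modal state and inspect the result.
function closesModalAfterSave(save: SaveHandler): boolean {
  const before: UiState = { tasks: [{ id: 't1', title: 'Old' }], modalOpen: true }
  return save(before, { id: 't1', title: 'New' }).modalOpen === false
}

console.log(closesModalAfterSave(handleSaveTask)) // true
console.log(closesModalAfterSave(naiveSaveTask)) // false
```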

When to use Claude Code vs. OpenCode

Based on this benchmark and a few smaller follow-up tests, I would not treat these tools as interchangeable. They overlap, but they feel different in practice.

Use case	Better fit	Why
Conservative refactors in conventional Next.js projects	Claude Code	Strong codebase reasoning and conservative file/command permissions
Open source or compliance-sensitive tool evaluation	OpenCode	Source-auditable and provider-flexible
Read-only planning before a risky migration	OpenCode	Strong planning workflow and useful TUI for inspecting diffs
Execution where you want Claude-specific workflows	Claude Code	Tight integration with Claude models, slash commands, and Claude Code conventions
Testing multiple model/provider strategies	OpenCode	Supports broader provider flexibility
Learning from the agent’s mistakes	Either	Only if you force explicit logging and verification

The safest workflow may be using both: run OpenCode in planning mode to map the refactor, then use Claude Code or OpenCode for execution depending on the project’s security, model, and workflow constraints.

A reusable CLI agent prompt for state refactors

Here is the prompt I would use for future Next.js state management migrations:

You are performing a state management refactor on a Next.js 15 + React 19 + TypeScript codebase.

## Context
This app currently uses prop drilling for shared dashboard state. Your job is to migrate that shared state to Zustand. The codebase uses the App Router with a mix of Server and Client Components.

## Rules

1. Read the entire codebase before making any changes. Map out the component tree and identify every prop that represents shared state vs. component-specific props. Log this map in MISTAKES.md under "## State Audit".

2. Before installing dependencies, verify:
- The package manager by reading package.json and lock files
- The tsconfig.json path aliases
- The React version and Zustand compatibility
- The existing build and typecheck commands

3. Create the Zustand store first. Do not touch any components until the store file compiles cleanly.

4. Refactor components bottom-up:
- Start with the deepest leaf components
- Move upward one component at a time
- Do not remove a parent prop until all children that depended on it have been updated

5. After each component:
- Run `npx tsc --noEmit`
- If it fails, fix the error before moving on
- Log the error in MISTAKES.md with the file, change, error, cause, fix, and category

6. Before removing any callback prop:
- Open the parent component
- Check whether the callback triggers side effects
- Preserve side effects such as analytics, toasts, logging, modal state, or navigation
- Document what you found in MISTAKES.md

7. Do not:
- Add middleware unless requested
- Create provider wrappers unless Zustand is being used with an explicit per-request or dependency-injection pattern
- Convert Server Components to Client Components unless the component needs client-side interactivity
- Run commands in parallel

8. After all components are refactored:
- Run the project’s build command
- Run `npx tsc --noEmit` one final time
- Summarize the full refactor in MISTAKES.md under "## Summary"

Begin by reading the project structure and producing the State Audit.

This prompt is longer than the original, but that is the point. Large refactors fail when the agent has freedom in places where you actually need discipline.

Final takeaways

This benchmark changed how I think about terminal coding agents.

Claude Code was not bad because it made mistakes. The mistakes were understandable, recoverable, and useful. OpenCode was not magically perfect because it passed cleanly. It still depended on the same model, the same project environment, and the same verification loop.

The real lesson is that agentic CLI tools are not replacements for engineering judgment. They are accelerators for developers who can specify the migration strategy, identify the risk areas, and recognize when a passing build is not enough.

The developers who get the most out of these tools will not be the ones who refuse to use them, and they will not be the ones who trust them blindly. They will be the ones who know enough architecture to tell the agent where to start, where to stop, and what not to break.

Keep the agent. Keep the compiler. Keep MISTAKES.md.

That is where the useful work happens.
