Exploring Agent Browser: AI agents on the web

If you’ve tried using AI agents to automate browser tasks, you’ve likely hit the same wall.

You point an agent at a page, it pulls in the entire raw DOM, and suddenly you’re burning thousands of tokens just to locate something simple like a login button. Then the site updates, a class name changes, and your selector breaks.

Tools like Playwright were built for traditional test automation. They work well there, but they fall short for agents that need to understand page structure, make decisions, and adapt on the fly.

That’s where Agent Browser, built by Vercel Labs, takes a different approach. Instead of feeding the raw DOM into the agent, it works off the accessibility tree, strips out the noise, and returns a clean, compact set of element references.

So the agent doesn’t have to parse HTML at all. It just uses something like click @E_1 to interact with the page.

In this guide, we’ll walk through how Agent Browser works, how to get it running, and how to plug it into your AI coding workflow for both automation and exploratory testing.

Prerequisites

Before we get started, make sure you have the following:

Node.js 18+ (for running npm install) or macOS with Homebrew
An AI coding tool like Claude Code, Cursor, or GitHub Copilot
Basic familiarity with the command line

What is Agent Browser?

Agent Browser is a headless browser automation CLI built specifically for AI agents. Agent Browser takes a snapshot of a page’s accessibility tree and condenses all elements into deterministic references, stable identifiers like @E_1, @E_2, and so on.

Instead of this:

<div class="auth-form__container">
  <input id="login_field" class="form-control input-sm" type="text" ...>
</div>

Your agent receives this:

@E_1 - text input, label: "Username"
@E_2 - password input, label: "Password"
@E_3 - button, name: "Sign in"
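To build intuition for what this condensation step does, here is a toy Python model: it walks a simplified accessibility tree, keeps only interactive nodes, and numbers them in document order. The `Node` shape and the set of interactive roles are assumptions made for this sketch; it is not Agent Browser's actual implementation.

```python
from dataclasses import dataclass, field

# Roles this toy model treats as interactive (an assumption, not the CLI's list).
INTERACTIVE_ROLES = {"textbox", "button", "link", "checkbox", "combobox"}


@dataclass
class Node:
    role: str
    name: str = ""
    children: list["Node"] = field(default_factory=list)


def condense(root: Node) -> dict[str, Node]:
    """Assign @E_n refs to interactive nodes in depth-first document order."""
    refs: dict[str, Node] = {}

    def visit(node: Node) -> None:
        if node.role in INTERACTIVE_ROLES:
            refs[f"@E_{len(refs) + 1}"] = node
        for child in node.children:
            visit(child)

    visit(root)
    return refs


page = Node("document", children=[
    Node("generic", children=[  # stands in for the auth-form__container <div>
        Node("textbox", "Username"),
        Node("textbox", "Password"),
    ]),
    Node("button", "Sign in"),
])

for ref, node in condense(page).items():
    print(f'{ref} - {node.role}, name: "{node.name}"')
```

Because numbering depends only on tree order, the same page structure always yields the same refs, and renaming a CSS class has no effect because classes never enter the model.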

Although the Agent Browser CLI uses Playwright under the hood, it takes a different approach to the token burn problem by working with the accessibility tree instead of the raw DOM.

This strips away unnecessary noise and surfaces only what the agent actually needs.

Big reduction in AI token usage

One of the biggest advantages of Agent Browser CLI is how much it cuts down token usage compared to traditional approaches.

Approaches built on tools like Playwright typically send the entire page structure to the LLM. On a modern React or Next.js app, that can easily mean thousands of tokens for a single page, just to find a few elements.

Agent Browser takes a different route. According to the Vercel Labs team, it can reduce token usage by up to 97% per request.
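The percentage is simple arithmetic: reduction = 1 - snapshot_tokens / raw_tokens. A small Python helper makes the relationship explicit; the token counts in the usage line are hypothetical, chosen only to illustrate the calculation, not measurements.

```python
def token_reduction(raw_tokens: int, snapshot_tokens: int) -> float:
    """Fraction of tokens saved by sending a snapshot instead of the raw page."""
    if raw_tokens <= 0:
        raise ValueError("raw_tokens must be positive")
    return 1 - snapshot_tokens / raw_tokens


# Hypothetical counts: a 10,000-token page condensed to a 300-token snapshot.
print(f"{token_reduction(10_000, 300):.0%}")  # prints 97%
```

Put another way, a 97% reduction means the agent reads about 3 tokens for every 100 it would otherwise have consumed.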

What this really means:

Lower cost per automation run
Faster LLM responses
Cleaner context, so the agent stays focused on the task

Features of the Agent Browser CLI

Agent Browser comes with the following features:

Multi-session support

Agent Browser supports isolated browser sessions using the --session flag.

Each session maintains its own cookies, storage, and history.

This lets you run multiple browser sessions in parallel without them interfering with each other.

For example, you can open two sessions simultaneously like so:

agent-browser open https://example.com --session=userA
agent-browser open https://example.com --session=userB
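If you orchestrate sessions from a script rather than by hand, a minimal Python sketch might start both in parallel with `subprocess`. It only uses the `open` command and `--session` flag shown above; the `open_in_sessions` helper is a hypothetical name, and the `runner` parameter exists so the function can be exercised without the CLI installed.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence

# A runner takes argv and returns stdout; swap in a fake for tests.
Runner = Callable[[Sequence[str]], str]


def run_cli(argv: Sequence[str]) -> str:
    """Run one agent-browser command and return its stdout (raises on failure)."""
    result = subprocess.run(list(argv), capture_output=True, text=True, check=True)
    return result.stdout


def open_in_sessions(
    url: str, sessions: Sequence[str], runner: Runner = run_cli
) -> dict[str, str]:
    """Open the same URL in several isolated sessions concurrently."""
    commands = {name: ["agent-browser", "open", url, f"--session={name}"] for name in sessions}
    with ThreadPoolExecutor(max_workers=max(1, len(commands))) as pool:
        futures = {name: pool.submit(runner, argv) for name, argv in commands.items()}
        return {name: future.result() for name, future in futures.items()}


if __name__ == "__main__":
    for name, stdout in open_in_sessions("https://example.com", ["userA", "userB"]).items():
        print(name, stdout.strip())
```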

Headed mode

By default, Agent Browser runs headlessly.

If, in your development workflow, you need to see what’s happening in the browser while building or debugging, you can add the --headed flag:

agent-browser open https://example.com --headed

This will render the full browser UI.
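A common pattern is to stay headless in CI and switch to headed mode only while debugging locally. This Python sketch adds `--headed` when a hypothetical `AGENT_BROWSER_DEBUG` environment variable is set to `1`; that variable is our own convention for the script, not something the CLI reads.

```python
import os
from typing import Mapping


def open_command(url: str, env: Mapping[str, str] = os.environ) -> list[str]:
    """Build an `agent-browser open` command, adding --headed when debugging.

    AGENT_BROWSER_DEBUG is a hypothetical variable that only this helper reads;
    it is not an agent-browser setting.
    """
    argv = ["agent-browser", "open", url]
    if env.get("AGENT_BROWSER_DEBUG") == "1":
        argv.append("--headed")  # show the full browser UI while debugging
    return argv


print(open_command("https://example.com", {"AGENT_BROWSER_DEBUG": "1"}))
# prints ['agent-browser', 'open', 'https://example.com', '--headed']
```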

Getting the Agent Browser CLI up and running

Getting Agent Browser installed is straightforward. You can use npm, or Homebrew if you’re on macOS.

For a global installation via npm, run:

npm install -g agent-browser

Or, if you’re on macOS and prefer Homebrew:

brew install agent-browser

Once the CLI is installed, you need to download the headless browser it uses internally:

agent-browser install

If the installation was successful, you should see a screen similar to the one below:

Let’s run a quick test to verify everything is working. This command will open a page and print its title:

agent-browser open https://www.vercel.com

If you see the page title printed in the terminal, you’re good to go.
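To fail fast in a script or CI job, you could check that the binary is on your PATH before doing anything else. This Python sketch uses `shutil.which` and then runs the same smoke test as above; `check_install` is a hypothetical helper name, and the 60-second timeout is an arbitrary choice.

```python
import shutil
import subprocess


def check_install(url: str = "https://www.vercel.com", binary: str = "agent-browser") -> bool:
    """Return True if the CLI is on PATH and can open a page without error."""
    if shutil.which(binary) is None:
        print(f"{binary} not found on PATH; install it with npm or Homebrew first")
        return False
    try:
        result = subprocess.run(
            [binary, "open", url], capture_output=True, text=True, timeout=60
        )
    except subprocess.TimeoutExpired:
        return False
    print(result.stdout.strip())  # per the smoke test above, this should show the title
    return result.returncode == 0
```

Calling `check_install()` at the top of a CI job lets you exit early with a clear message instead of failing halfway through an automation run.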

A look at a typical Agent Browser workflow

Agent Browser breaks down browser automation into three deterministic steps:

Navigate
Snapshot
Interact

Let’s walk through each one.

Navigate

Start by opening the target URL. This initializes a headless session that stays active in the background.

agent-browser open https://github.com

This command sets the stage, creating an isolated context for the agent to work in.

Take a snapshot

Instead of raw HTML, the snapshot command returns a condensed view of the page’s elements:

agent-browser snapshot -i

We used the -i flag to indicate we only want to receive interactive elements.

Let’s check the output:

Instead of a list of HTML tags, we received a list of references to interactive elements on the page.

These identifiers are our bridge between the agent and the browser.

Interact

At this point, the agent doesn’t need to guess if the selector is #login_field or .input-block. It simply uses the reference ID.

Let’s tell the AI agent what to do on the site.

First, we can ask it to fill in a username:

agent-browser fill @E_1 "my-username"

We then ask it to fill in a password:

agent-browser fill @E_2 "my-password"

We then ask it to click the submit button:

agent-browser click @E_3

Using these references makes the agent’s logic robust.

Even if the underlying CSS classes change during a site update, the reference to the Sign in button remains stable within that session.
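Hard-coding @E_1 works for a walkthrough, but a script is sturdier if it reads the refs from the snapshot. The Python sketch below does that for the login example. It assumes the illustrative `@E_1 - text input, label: "Username"` line format shown earlier in this article (the real output format may differ, so adapt the regular expression), and it only calls the `open`, `snapshot -i`, `fill`, and `click` commands covered above. The `login` helper and the injectable `runner` are our own names.

```python
import re
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], str]

# Assumed snapshot line format (illustrative): @E_1 - text input, label: "Username"
REF_LINE = re.compile(r'^(@E_\d+)\s+-\s+[^,]+,\s+(?:label|name):\s+"([^"]*)"$')


def run_cli(argv: Sequence[str]) -> str:
    return subprocess.run(list(argv), capture_output=True, text=True, check=True).stdout


def refs_by_text(snapshot: str) -> dict[str, str]:
    """Map each element's label or name to its ref, e.g. {"Sign in": "@E_3"}."""
    refs: dict[str, str] = {}
    for line in snapshot.splitlines():
        match = REF_LINE.match(line.strip())
        if match:
            ref, text = match.groups()
            refs.setdefault(text, ref)  # keep the first element with this text
    return refs


def login(url: str, username: str, password: str, runner: Runner = run_cli) -> None:
    """Navigate, snapshot, then interact; raises KeyError if a label is missing."""
    runner(["agent-browser", "open", url])
    refs = refs_by_text(runner(["agent-browser", "snapshot", "-i"]))
    runner(["agent-browser", "fill", refs["Username"], username])
    runner(["agent-browser", "fill", refs["Password"], password])
    runner(["agent-browser", "click", refs["Sign in"]])
```

Usage would look like `login("https://example.com/login", "my-username", "my-password")`, where the URL is a placeholder. Looking refs up by label keeps the script working even if the snapshot numbers the elements differently.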

Semantic locators

Snapshots are great for discovering what’s on a page. But sometimes you already know what you’re looking for. That’s where semantic locators come in.

Instead of relying on CSS selectors like .btn-primary-v2 or div > input:nth-child(3), Agent Browser CLI lets you describe elements by their purpose.

For instance:

agent-browser find label "Email" fill "user@example.com"

Why this matters for AI

Since AI agents are good at understanding the intent of an element rather than its implementation details, this approach makes your browser automation self-healing.

For instance, as long as a button remains a submit button, your agent won’t care if the underlying CSS classes or IDs change.

For example, you could ask the Agent Browser CLI to fill a form like so:

agent-browser find label "Email address" fill "user@example.com"
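When you drive semantic locators from a script, pass the command as an argument list so a multi-word label like "Email address" stays a single argument. This Python sketch builds the `find label ... fill ...` form from the first semantic-locator example above; `find_by_label_fill` is a hypothetical helper name, and `shlex.join` is used only to show the equivalent, correctly quoted shell command.

```python
import shlex


def find_by_label_fill(label: str, value: str) -> list[str]:
    """Build `agent-browser find label <label> fill <value>` as an argv list.

    Passing a list (not a string) to subprocess keeps multi-word labels such as
    "Email address" as one argument, so no manual shell quoting is needed.
    """
    return ["agent-browser", "find", "label", label, "fill", value]


argv = find_by_label_fill("Email address", "user@example.com")
print(shlex.join(argv))
# prints agent-browser find label 'Email address' fill user@example.com
```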

Empowering your AI agent with skills

To make it easier to adopt Agent Browser in your workflow, it ships with a couple of skills that teach AI coding agents how to use it.

For this demo, we will use them for the following:

General browser automation
Exploratory testing

To follow along, install the skill for each use case:

For general browser automation, run this command:

npx skills add vercel-labs/agent-browser --skill agent-browser

This works with the following AI agents out of the box.

Amp
Antigravity
Cline
Codex
Cursor
Deep Agents
Firebender
Gemini CLI
GitHub Copilot
Kimi Code CLI
OpenCode
Warp

If you need it to work with other AI agents, you can just select additional agents before completing the installation.

You will see a similar screen to this:

For exploratory testing, install the dogfood skill:

npx skills add vercel-labs/agent-browser --skill dogfood

The dogfood skill methodically inspects a web application, identifies issues, and delivers a report that includes complete reproduction evidence for each issue.

In the next section, we will go into how to use this in your software development workflow.

Integration with your coding workflow

For this demonstration, I will be using GitHub Copilot.

Once the skill is active, you don’t need to write out Agent Browser commands manually. You can simply describe the task in plain language: “Open the login page and sign in as a test user.”

Behind the scenes, your agent will:

Execute agent-browser open <URL>.
Run agent-browser snapshot -i to “see” the page.
Identify the correct @ref.
Interact (click/type) to navigate.
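The loop above is simple enough to sketch. The Python below is a hypothetical harness, not how GitHub Copilot or the skill actually works internally: a `decide` callback stands in for the LLM, receives the latest snapshot text, and returns the arguments for the next agent-browser command, or None when the task is done. The injectable `runner` lets you test the loop without the CLI installed.

```python
import subprocess
from typing import Callable, Optional, Sequence

Runner = Callable[[Sequence[str]], str]
# Stand-in for the LLM: snapshot text in, next command's arguments (or None) out.
Decide = Callable[[str], Optional[list[str]]]


def run_cli(argv: Sequence[str]) -> str:
    return subprocess.run(list(argv), capture_output=True, text=True, check=True).stdout


def agent_loop(url: str, decide: Decide, runner: Runner = run_cli, max_steps: int = 10) -> int:
    """Open `url`, then alternate snapshot -> decide -> act; return actions taken."""
    runner(["agent-browser", "open", url])
    for step in range(max_steps):
        snapshot = runner(["agent-browser", "snapshot", "-i"])  # re-read the page
        action = decide(snapshot)
        if action is None:
            return step
        runner(["agent-browser", *action])  # e.g. ["click", "@E_3"]
    return max_steps
```

Taking a fresh snapshot after every action means each decision is based on the page as it currently is, and `max_steps` guards against an agent that never decides it is done.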

This allows your agent to verify its own work in real time.

For example, after deploying a feature, it can open the site and check if the new feature is there and functional. Let’s get down to it.

General browser automation

We will use the agent-browser skill we installed earlier for this. Go ahead and have it test the login flow of the app you’re building.

Here’s the prompt I gave GitHub Copilot:

Use agent browser and test out the app. Log in as a user and try adding things to the cart and checking out

Check out the result below:

It signed in as a demo user, added items to the cart, and confirmed that the cart totals were correct.

Finally, it noticed that checkout is not implemented.

Exploratory testing

I have a React project I created for this demonstration, with some intentionally planted bugs. Specifically, the Cart page order summary applies tax twice.

For this, we will use the dogfood skill we installed earlier.

Using the dogfood skill, I asked the agent to:

Use the dogfood skill and navigate to the Cart page order summary, and check if the order summary adds up correctly

The agent navigated to the Cart page, checked whether the order summary added up correctly, and used the dogfood skill to surface the bug.

It even created a report.md file documenting the order summary bug, along with the steps I can use to reproduce it.

Check the output below:

Agent Browser vs. Playwright CLI

A few months after Agent Browser launched, Playwright released its own CLI positioned for similar use cases. Like Agent Browser, it condenses the browser’s HTML into a more structured format that AI agents can parse, returning a YAML file with tagged elements for the agent to reference.

Here is how the Agent Browser CLI compares to similar tools.

Aspect	Agent Browser	Chrome DevTools MCP	Playwright
Built for	AI agents (automation)	AI agents (debugging)	Human developers
Token usage	Low	Moderate	High
Interaction model	Stable `@ref` identifiers	JSON-RPC	Complex CSS selectors

While Playwright remains the industry standard for traditional E2E testing, and Chrome DevTools MCP is excellent for live debugging and profiling, Agent Browser is the only one of the three built from the ground up specifically for AI agents.

When to use Agent Browser

Not every automation task calls for Agent Browser. Here’s a quick reference to help you decide:

Scenario	Use Agent Browser?	Why?
Complex React/Next.js Apps	Yes	Handles dynamic DOM changes and provides stable `@ref` references.
Simple Static Site Scraping	Maybe	A plain `fetch` plus `cheerio` could be faster.
Performance Profiling / Memory Leaks	No	Better to use Chrome DevTools MCP for deep performance metrics.
Cross-Browser Testing	Limited	Primarily focused on Chromium-based browsers.
AI agents that need to "see" the page	Yes	Large reduction in token usage compared to raw HTML.

Conclusion

The web was never built for AI agents. Agent Browser CLI closes that gap by translating user interfaces into something agents can actually work with.

It tackles the two biggest pain points in browser automation: token bloat and fragile selectors, all within a compact CLI.

In this guide, we walked through how to navigate pages, read structured snapshots, interact using stable references, and plug everything into an AI coding assistant using skills.

Whether you’re running automated regression tests, scraping pages, or letting an agent explore a UI, Agent Browser CLI gives your agents a reliable set of eyes on the web.
