
How AI systems actually browse the web

There are many distinct ways AI systems interact with websites. Where a system sits on that spectrum determines whether your site sees a visit, a server log entry, or nothing at all.

April 2026

AI search and AI agents

Marketers tend to use "AI search" and "AI agents" as separate categories. The distinction is useful but artificial, and it probably won't last.

The most common definition of an AI agent is simply an LLM that can call tools in an environment. By that definition, ChatGPT performing a web search already qualifies. Deep research, which autonomously browses hundreds of pages over minutes, almost certainly qualifies. Most people wouldn't call a standard ChatGPT conversation "using an agent," but the boundary is a matter of convention, not architecture.

What will drive adoption is utility. The most capable agents will be the ones built into the platforms people already use. From the user's perspective, the experience doesn't change: you type into a chat box. No new behavior is required. The shift from chat to agent happens on the backend, invisibly. We wrote about the implications of that convergence in who controls the timeline.

The useful dividing line is not "search vs agent" but how the system actually touches your website. Some systems never touch it at all. Some fetch raw HTML. Some operate full browsers. Some call APIs directly. Each approach produces a different footprint in your analytics, your server logs, and your conversion funnels. We tested each of the major systems to understand exactly what they do. The details below reflect what we observed.

Cached vs live

Consumer AI chat platforms in their default modes (ChatGPT, Claude, Gemini, Perplexity) serve responses from either the model's training data or a retrieval cache built by periodic crawling. In both cases, no request reaches your site at query time. The mechanics of how caching works and why it makes AI traffic invisible to analytics are covered in detail in why your organic traffic is down.

When we tested the consumer chat platforms directly, we confirmed that ChatGPT, Claude, and Perplexity all crawl and cache pages with their respective crawler user-agents (ChatGPT-User, ClaudeBot, PerplexityBot), but there is no guaranteed correspondence between a crawler visit and any specific user's chat session. The crawl and the query are decoupled.
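In server logs, these crawler visits are straightforward to flag. A minimal sketch, with a token list that covers only the crawlers named above (a production list would be longer and maintained):

```javascript
// Flag AI crawler visits by user-agent token. This list covers only the
// crawlers discussed here; it is not exhaustive.
const AI_CRAWLER_TOKENS = ["ChatGPT-User", "ClaudeBot", "PerplexityBot"];

function isAiCrawler(userAgent) {
  return AI_CRAWLER_TOKENS.some((token) => userAgent.includes(token));
}

console.log(isAiCrawler("Mozilla/5.0 (compatible; ClaudeBot/1.0)")); // true
console.log(isAiCrawler("Mozilla/5.0 (Windows NT 10.0) Chrome/124.0")); // false
```

Keep the decoupling in mind when reading the results: a flagged crawler visit tells you a page was crawled for the cache, not that any particular user asked about it.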

Codex is a notable case. When we ran Codex against live URLs, we observed no outbound network traffic at all. The one exception was when Codex called curl on a URL directly in Bash, but its default behavior during web searches and page retrieval used an internal fetch function with no outbound traffic. It appears to read pages entirely from cache. Your site never sees a request.

Even the distinction between cached and live access is not stable. Claude Code can spawn a sub-agent to do deep research that fetches pages live. Both Claude's and ChatGPT's deep research features primarily pull from a cache, but Claude Code in agent mode bypasses that cache entirely. The same underlying model can operate in either mode depending on the product it's embedded in and how the developer configured it.

Raw HTTP fetching

The simplest form of live interaction is HTTP fetching. The agent issues an HTTP request, receives raw HTML, converts it to markdown, and reads the markdown. No browser engine is involved. JavaScript does not execute. Cookies are not set. Analytics tags do not fire.

Claude Code and Claude Cowork (in its default mode) work this way when they access the web. The pipeline is the same in each case: fetch HTML, convert to markdown, read the markdown.
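A minimal sketch of that pipeline, with a crude tag-stripper standing in for the real HTML-to-markdown conversion (which is considerably more sophisticated in practice):

```javascript
// Raw-fetch pipeline sketch: HTTP request, then HTML reduced to plain
// text for the model. No browser engine runs, so no JavaScript executes,
// no cookies are set, and no analytics tags fire.
function htmlToText(html) {
  return html
    .replace(/<script[\s\S]*?<\/script>/gi, "") // scripts never run anyway
    .replace(/<style[\s\S]*?<\/style>/gi, "")
    .replace(/<[^>]+>/g, " ") // strip remaining tags
    .replace(/\s+/g, " ")
    .trim();
}

async function fetchPageAsText(url) {
  const res = await fetch(url); // global fetch, Node 18+
  return htmlToText(await res.text());
}

console.log(htmlToText("<h1>Pricing</h1><script>track()</script><p>Free tier</p>"));
// "Pricing Free tier"
```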

Whether you can identify these visits depends on how the agent was installed. In our testing, Claude Code installed natively passes a Claude-User user-agent string that identifies it clearly. Claude Code running inside Cursor or the Claude desktop app passes the same identifiable header. But Claude Code installed via Homebrew passes a generic axios user-agent. Axios is one of the most common HTTP client libraries in JavaScript, so the request blends in with a large volume of routine Node.js traffic rather than standing out as an AI agent. Same agent, same behavior, different header, depending entirely on the installation method.
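The difference shows up directly in log analysis. A sketch of the ambiguity (the exact user-agent strings and version numbers here are illustrative):

```javascript
// Classify raw-fetch traffic by User-Agent header. A "Claude-User" token
// identifies the agent; a bare axios UA is indistinguishable from the
// large volume of ordinary Node.js HTTP traffic.
function classifyFetcher(userAgent) {
  if (userAgent.includes("Claude-User")) return "identified AI agent";
  if (/^axios\/\d/.test(userAgent)) return "ambiguous"; // agent or any Node.js script
  return "unclassified";
}

console.log(classifyFetcher("axios/1.7.2")); // "ambiguous"
```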

Search APIs like Exa and Brave return URLs and snippets in response to a query, but they do not return full page content. The typical pattern is two steps: the agent queries a search API to find relevant URLs, then does a separate HTTP fetch for each page it wants to read in full. OpenClaw, for example, can be configured to use Brave or Exa for search, then fetch pages separately.

Browser-based agents

A large and growing category of agents operate real browsers. But "browser-based" is not one thing. These agents differ along two dimensions that matter for what your site sees: how the agent reads the page, and how the agent controls the page. The specifics vary more than most coverage suggests. We ran each of the major browser agents against instrumented test pages and observed significant differences.

Reading the page. There are two approaches, often combined.

The first is DOM and accessibility tree reading. These systems typically use both: the accessibility tree for layout and identifying interactive elements (buttons, inputs, links), and parsed DOM content for the actual text on the page, stripped of HTML clutter. The accessibility tree is the same data structure screen readers use. It is fast to parse, cheap in tokens, and reliable for standard page elements. The agent can identify "the button labeled Submit" or "the input field named email" by reference, without needing to interpret pixels or read the full HTML. Browser automation frameworks like Playwright and Puppeteer provide DOM and accessibility tree access; agents built on top of them, like Browser Use, use these capabilities alongside screenshots in a hybrid approach. Claude in Chrome similarly combines DOM reading with screenshots.
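An accessibility-tree snapshot is just nested roles and names, which makes "the button labeled Submit" a simple recursive lookup. A sketch, using a node shape ({ role, name, children }) that simplifies what automation frameworks actually return:

```javascript
// Find an interactive element in an accessibility-tree snapshot by role
// and accessible name -- no pixels, no raw HTML. The node shape is a
// simplified version of what frameworks like Playwright expose.
function findByRole(node, role, name) {
  if (node.role === role && node.name === name) return node;
  for (const child of node.children || []) {
    const found = findByRole(child, role, name);
    if (found) return found;
  }
  return null;
}

const snapshot = {
  role: "WebArea", name: "Checkout", children: [
    { role: "textbox", name: "email" },
    { role: "button", name: "Submit" },
  ],
};

console.log(findByRole(snapshot, "button", "Submit")); // { role: "button", name: "Submit" }
```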

The second is screenshot-based vision. The agent takes a screenshot of the rendered page and sends it to a vision-language model. The model interprets what is on screen: where buttons are, what text says, what the layout looks like. The advantage is generality. Vision works on any page, any application, any interface, with zero cooperation from the site. The cost is speed and expense: every action requires a full vision model inference.

In practice, most browser-based agents today use both, but the balance differs significantly. ChatGPT Agent Mode is almost purely pixel-based: it takes screenshots and generates coordinate-based mouse clicks, with minimal DOM reading. Both Manus and Claude in Chrome use a hybrid of screenshots and DOM reading. Perplexity Comet and ChatGPT Atlas are also hybrid, but their behavior depends on context (see below).

A note on Perplexity Comet and ChatGPT Atlas. Both products have two modes of operation that read pages differently. When you open the side panel and chat with the AI about the content of a page, both Comet and Atlas read strictly from the DOM. No screenshots. When you have them take actions on the page, both can read from the DOM and from screenshots, and can interact via DOM node references or pixel coordinates. Both appear to prefer DOM-based interaction when possible, with screenshots and coordinate-based clicks used as supplementary fallbacks.

Controlling the page. There are also two approaches.

Programmatic commands by element reference. "Click the button labeled Submit." "Type 'hello' into the email field." Browser automation frameworks like Playwright and Puppeteer offer this approach, and agents built on them (like Browser Use) use it as their primary interaction method. The agent identifies the target element by name, role, or selector and issues a precise command. Fast, deterministic, repeatable. Breaks if the element isn't in the accessibility tree or the page structure is non-standard. (Playwright and Puppeteer also support positioning the mouse by pixel coordinates, so they are not limited to element-reference interaction.)

Vision-based mouse and keyboard at pixel coordinates. "Click at (342, 718)." "Type this string." ChatGPT Agent Mode generates coordinate-based clicks from screenshots. Manus uses both: pixel-coordinate clicks and DOM-event clicks depending on context. Perplexity Comet and ChatGPT Atlas can also use coordinate-based clicks when needed, though both prefer DOM-based interaction when possible. The screenshot-interpret-act loop repeats: screenshot, interpret, act, screenshot. Works on anything visible on screen. Slow because each step is a full round trip through a vision model.
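The loop itself is simple; the cost is that every iteration pays for a full vision-model inference. A generic sketch, where takeScreenshot, interpret, and act are stand-ins for the real capture, vision-model, and input-synthesis steps:

```javascript
// Generic screenshot-interpret-act loop. Each iteration is a full round
// trip: capture pixels, ask a vision model for the next action, execute
// it. The three callbacks are stand-ins for real implementations.
function runAgentLoop(takeScreenshot, interpret, act, maxSteps = 10) {
  const actions = [];
  for (let step = 0; step < maxSteps; step++) {
    const screenshot = takeScreenshot();
    const action = interpret(screenshot); // e.g. { type: "click", x: 342, y: 718 }
    if (action.type === "done") break;
    act(action);
    actions.push(action);
  }
  return actions;
}
```

The same loop, pointed at a full desktop instead of a browser tab, is the structure of the computer-use agents described later in this piece.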

Where the browser runs also matters. ChatGPT Agent Mode runs its browser in an OpenAI-hosted Chromium sandbox. Manus runs in a Manus-hosted Chromium sandbox. Both are remote. Cookies persist but are not the user's cookies. When authentication is required, ChatGPT Agent Mode pauses and hands control to the human.

Claude in Chrome, Perplexity Comet, ChatGPT Atlas, and Project Mariner are fundamentally different from the sandboxed agents: they run as browser extensions inside the user's actual browser, with the user's real cookies, auth state, and extensions. From your site's perspective, a visit from any of these looks like a visit from the user themselves. Claude in Chrome does inject a small number of DOM elements when the agent is active (a visual border and stop button), which is detectable if you add a DOM mutation listener to your site, but at the network and header level it produces no identifiable signal.
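Such a mutation listener might look like the sketch below. The attribute it checks for is hypothetical: you would need to inspect the actually injected elements to find a stable signature. The matcher is split out as a pure function; the observer wiring only runs in a browser:

```javascript
// Matcher for agent-injected DOM nodes. The "data-agent-overlay"
// attribute is hypothetical -- inspect the real injected markup to find
// a stable signature before relying on this.
function looksAgentInjected(tagName, attributeNames) {
  return tagName === "DIV" && attributeNames.includes("data-agent-overlay");
}

// Browser-only wiring: watch for elements injected after page load.
if (typeof MutationObserver !== "undefined") {
  new MutationObserver((mutations) => {
    for (const m of mutations) {
      for (const node of m.addedNodes) {
        if (node.nodeType === 1 && looksAgentInjected(node.tagName, node.getAttributeNames())) {
          console.log("possible agent overlay detected");
        }
      }
    }
  }).observe(document.documentElement, { childList: true, subtree: true });
}
```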

Identifying agent traffic

Whether you can detect an agent visit depends entirely on which agent it is. The landscape is fragmented.

Some agents identify themselves clearly. ChatGPT Agent Mode and Manus both implement the RFC 9421 HTTP Message Signatures standard, passing a Signature-Agent header that identifies them (chatgpt.com and api.manus.im respectively) along with Signature-Input and Signature headers for cryptographic verification. Claude Code (when installed natively) passes an identifiable Claude-User user-agent string. Manus also injects identifiable DOM elements on page load, which provides a secondary detection signal.
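Checking for the Signature-Agent header is a reasonable first pass. The sketch below only inspects the declared identity; actual trust requires verifying the Signature and Signature-Input headers cryptographically per RFC 9421, which is omitted here, and the exact header serialization shown is an assumption:

```javascript
// First-pass check of a declared agent identity via Signature-Agent.
// This does NOT verify the signature -- it only reads the claim.
const KNOWN_AGENTS = ["chatgpt.com", "api.manus.im"];

function declaredAgent(headers) {
  const raw = headers["signature-agent"];
  if (!raw) return null;
  // Assumed serialization: a quoted origin, e.g. "https://chatgpt.com"
  const value = raw.replace(/^"|"$/g, "").replace(/^https?:\/\//, "");
  return KNOWN_AGENTS.includes(value) ? value : "unrecognized";
}

console.log(declaredAgent({ "signature-agent": '"https://chatgpt.com"' })); // "chatgpt.com"
```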

Other agents are effectively invisible. Perplexity Comet and ChatGPT Atlas both run bespoke Chromium browsers that produce headers identical to a standard user-driven Chrome instance. No identifiable agent headers. No injected DOM elements. The only path to detection is approximate, noisy analysis of behavioral signals like mouse movement patterns and click timing, which requires a lengthy interaction to produce even a rough estimate.

Claude in Chrome sends the exact same headers as the user's own browser. At the header level, it is indistinguishable from a human visit, though the DOM elements it injects when active provide a detection path for sites that monitor for them.

For raw HTTP fetchers, the picture is similarly mixed. Native Claude Code installations are identifiable. Homebrew installations are not. Codex produces no observable traffic at all.

The practical reality: a meaningful fraction of agent traffic visiting your site today is undetectable without purpose-built tooling. If understanding what's visiting your site matters to you, talk to us.

Full computer use

Screenshot-based browser agents are limited to the browser. Full computer-use agents extend the same approach to the entire operating system. Claude Computer Use in desktop mode can open applications, switch between windows, use keyboard shortcuts, and interact with native UIs. The browser is one application among many.

The same screenshot-interpret-act loop applies. The agent takes a screenshot of the full desktop, the vision model decides what to do, the agent executes the action, and the cycle repeats. The generality is maximum: anything a human can do at a computer, the agent can attempt. The reliability constraints are the same as screenshot-based browsing, compounded by the additional complexity of OS-level state management.

APIs, CLIs, and direct tool calling

Not all agent interaction involves web pages. A large category of agent activity operates through APIs, command-line tools, and MCP servers directly.

Examples include an agent calling a product's REST API directly, invoking its command-line tool, or discovering and calling operations exposed through an MCP server.

In all of these cases, the interaction never touches a UI. The service sees API calls in its API logs, not page visits in web analytics.

CLIs are a significant and underappreciated category. Developer-oriented tools often expose command-line interfaces that agents can invoke directly. Claude Code routinely installs packages, runs scripts, and calls CLI tools as part of multi-step workflows. None of that activity registers in web analytics.

WebMCP and structured web interaction

WebMCP represents a different model of agent-site interaction entirely. Instead of the agent reading or screenshotting a page and figuring out what to do, the website declares its available tools: search products, add to cart, check availability, submit a form. The agent discovers these declarations via a browser API and calls them directly. No screenshots. No DOM parsing. No guessing where to click. The interaction is deterministic, fast, and reliable.
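A site-side declaration might look like the sketch below. The API surface (navigator.modelContext.registerTool) follows the current explainer and is experimental; treat the exact names, and the example tool and inventory, as provisional illustrations:

```javascript
// Sketch of a WebMCP-style tool declaration. The API names are drawn
// from the experimental explainer and may change; the inventory and
// tool here are illustrative, not a real product's.
function checkAvailability(inventory, sku) {
  return { sku, inStock: (inventory[sku] || 0) > 0 };
}

const inventory = { "SKU-123": 4, "SKU-456": 0 };

if (typeof navigator !== "undefined" && navigator.modelContext) {
  navigator.modelContext.registerTool({
    name: "check_availability",
    description: "Check whether a product SKU is in stock",
    inputSchema: {
      type: "object",
      properties: { sku: { type: "string" } },
      required: ["sku"],
    },
    async execute({ sku }) {
      // The site controls exactly what the agent receives: no screenshots,
      // no DOM parsing, and the call is logged and auditable.
      return {
        content: [{ type: "text", text: JSON.stringify(checkAvailability(inventory, sku)) }],
      };
    },
  });
}
```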

Chrome 146 shipped an experimental flag to enable an early version of the core WebMCP API in March 2026. The spec is a W3C Draft Community Group Report, and it continues to evolve on a weekly basis through discussions on the webmachinelearning GitHub and Chromium commits. The most up-to-date information on the protocol often comes from the Chromium commit log itself rather than the spec document. We have an internal WebMCP agent implementation running on a stagehand fork.

The value of the protocol from the site's perspective is control and visibility. The site knows what the agent asked for, what parameters it passed, and what data it received. The interaction is logged, auditable, and controllable. The site can rate-limit, require authentication, version its tools, and deprecate endpoints gracefully.

For more on WebMCP, what it is, where it stands, and why it matters, see what is WebMCP.

The configurability problem

Most agent frameworks are not locked to one approach. OpenClaw can use Brave or Exa for search, drive headless Chromium for full page interaction, or navigate Google in a real browser. Claude Code can curl a URL, write and execute a Playwright script, spawn sub-agents that fetch pages, or connect to MCP servers. Browser Use can run headless, headed, or connect to an existing browser instance.

Categorizing agents by product name is misleading. The same framework can behave completely differently depending on how it is configured, and the configuration can change per task or even per step within a task. What matters for your site is not which agent is visiting, but what it is actually doing on a given interaction: fetching HTML, controlling a headless browser, screenshotting from a remote VM, or calling your API directly.

Auth, signup, and conversion flows

Agents currently cannot easily create accounts. ChatGPT Agent Mode pauses and hands the browser back to the human when authentication is required. Claude Code can call APIs and read documentation once a human has signed up, but the signup itself requires a human.

The practical consequence: any product with gated access (free trials, demo requests, account creation) has a mandatory human step in the middle of the agent workflow. A developer using Claude Code to evaluate a product with a free trial still has to sign up themselves. The agent handles everything before and after, but the conversion event itself requires a human hand. That break in the workflow is also a break in attribution.

Google's position

Most of the approaches described above bypass Google entirely. No search query, no Chrome browser, no ad impression. An agent that fetches HTML via Exa, or controls a browser via Playwright, or calls an API directly, completes its entire workflow without Google appearing anywhere in the chain. Google is not losing a click. It is absent from the session.

Google's response spans multiple parallel strategies: Project Mariner (a Chrome extension powered by Gemini), WebMCP (a browser API with Chrome as the only current implementation), AI Mode in Search (agent-style interaction inside Google's search product), and A2A (agent-to-agent routing through Google Cloud). All of these keep agents flowing through infrastructure Google controls.

WebMCP is worth specific attention. It is a browser API, not inherently a Google product. Other browsers could implement it. But Chrome is currently the only browser that has, and if WebMCP becomes the standard way agents interact with websites, every implementation flows through Chrome's API surface. Google defines the permissions, captures the telemetry, and controls the standard. Firefox has begun syncing WebMCP Web Platform Tests, suggesting active implementation work. Whether Firefox and other browsers ship full support, or whether alternative structured protocols emerge, will determine whether Google retains its position as the mediating layer between user intent and websites.

Why this matters

Every approach described above is a pathway through which people find you, learn about you, and evaluate your product. If your job involves marketing, growth, or revenue, this is increasingly how information flows, and adoption of every one of these mechanisms widens by the day.

The complexity is real. The same buyer might ask ChatGPT a question (served from cache, invisible), then use Claude Code to evaluate your API (raw HTTP fetch, visible in server logs if the installation passes identifiable headers), then have ChatGPT Agent Mode fill out your demo request form (screenshot-based browser from an OpenAI datacenter, identifiable via RFC 9421 signatures if you know to look for them). Each step uses a different mechanism, produces a different footprint, and creates a different attribution challenge.

The first problem to solve is knowing what's actually visiting your site and how. If any of this is relevant to you, talk to us.

← Bread · contact@aibread.com · OpenLens