Agent Browser is a headless browser automation CLI designed for AI agents. It pairs a fast native Rust CLI with a Node.js fallback and provides features suited to AI-driven web automation: accessibility tree snapshots, deterministic element references, and a JSON output mode.

Core Features

1. AI-Optimized Workflow

Snapshot with Refs: Get the accessibility tree with deterministic element references (@e1, @e2, etc.)
Ref-Based Actions: Interact with elements using refs taken from a snapshot, for reliable automation
JSON Output Mode: Machine-readable output for AI agent integration

2. Fast Rust CLI

Native Rust binary for fast command execution
Client-daemon architecture for persistent browser sessions
Automatic fallback to Node.js when the native binary is unavailable

3. Comprehensive Browser Control

Full navigation and interaction capabilities
Mouse, keyboard, and touch event simulation
Network interception and mocking
Cookie and storage management
Multi-tab and iframe support

4. Session Management

Isolated browser sessions for parallel automation
Persistent authentication state
Session-scoped cookies and storage
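
A minimal sketch of how isolated sessions might be driven in parallel from Python. It assumes the CLI accepts a global `--session <name>` flag placed before the command, which this page does not show, so check `agent-browser --help` before relying on it. The helper names `session_argv` and `open_in_sessions` are hypothetical, and `runner` is an injection point so the logic can be exercised without a browser.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def session_argv(session, *command):
    """Build argv for one command scoped to a named session.

    ASSUMPTION: `--session <name>` is a global flag placed before the command.
    """
    return ["agent-browser", "--session", session, *command]


def open_in_sessions(targets, runner=subprocess.run):
    """Open each URL in its own session concurrently.

    `targets` maps session name -> URL. Returns session name -> exit code.
    """
    def open_one(item):
        session, url = item
        result = runner(session_argv(session, "open", url))
        return session, result.returncode

    # One worker per session so the `open` calls do not wait on each other.
    with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
        return dict(pool.map(open_one, targets.items()))
```

Calling `open_in_sessions({"agent1": "example.com", "agent2": "example.org"})` would start two sessions side by side; the session names and URLs here are placeholders.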

5. Streaming & Preview

WebSocket-based browser viewport streaming
Live preview for "pair browsing" with AI agents
Real-time input event injection

6. Flexible Deployment

Custom browser executable support (e.g., @sparticuz/chromium for serverless)
CDP mode for connecting to existing browsers
Headed mode for debugging

Key Commands

Navigation & Interaction

agent-browser open <url>              # Navigate to URL
agent-browser click <sel>             # Click element
agent-browser fill <sel> <text>       # Fill input
agent-browser type <sel> <text>       # Type into element
agent-browser press <key>             # Press key
agent-browser hover <sel>             # Hover element
agent-browser scroll <dir> [px]       # Scroll page
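
When an agent builds these commands programmatically, passing arguments as a list avoids shell-quoting bugs in `<text>` values. The sketch below is illustrative only: `build_command`, `as_shell_line`, and the `REQUIRED_ARGS` table are hypothetical helpers, and the table simply mirrors the commands listed above.

```python
import shlex

# Commands from the list above, mapped to their number of required arguments.
REQUIRED_ARGS = {
    "open": 1, "click": 1, "fill": 2, "type": 2,
    "press": 1, "hover": 1, "scroll": 1,
}


def build_command(name, *args):
    """Return an argv list for one agent-browser command, validating arity."""
    if name not in REQUIRED_ARGS:
        raise ValueError(f"unknown command: {name}")
    if len(args) < REQUIRED_ARGS[name]:
        raise ValueError(f"{name} needs at least {REQUIRED_ARGS[name]} argument(s)")
    return ["agent-browser", name, *map(str, args)]


def as_shell_line(argv):
    """Render argv as a safely quoted shell line, e.g. for logging."""
    return shlex.join(argv)
```

Passing the list form to `subprocess.run` skips the shell entirely; `as_shell_line` is only needed when the command has to be shown or pasted into a terminal.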

AI-Optimized Workflow

agent-browser snapshot                # Get accessibility tree with refs
agent-browser snapshot -i             # Interactive elements only
agent-browser snapshot -c             # Compact mode
agent-browser click @e2               # Click by ref
agent-browser fill @e3 "text"         # Fill by ref
agent-browser get text @e1            # Get text by ref
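
A sketch of how an agent might pull refs out of the text snapshot. The line format used here (`- role "name" [ref=eN]`) is an assumption about the snapshot output, not something this page specifies; adjust the pattern to whatever `agent-browser snapshot` actually prints. `extract_refs` is a hypothetical helper name.

```python
import re

# ASSUMED line shape:  - button "Submit" [ref=e2]
REF_LINE = re.compile(
    r'-\s+(?P<role>\w+)(?:\s+"(?P<name>[^"]*)")?.*?\[ref=(?P<ref>e\d+)\]'
)


def extract_refs(snapshot_text):
    """Map '@eN' -> {'role': ..., 'name': ...} for every line carrying a ref."""
    refs = {}
    for line in snapshot_text.splitlines():
        match = REF_LINE.search(line)
        if match:
            refs["@" + match["ref"]] = {
                "role": match["role"],
                "name": match["name"] or "",  # unnamed elements get ""
            }
    return refs
```

The returned keys can be passed directly to ref-based commands such as `click` or `fill`.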

Information Retrieval

agent-browser get text <sel>          # Get text content
agent-browser get html <sel>          # Get innerHTML
agent-browser get value <sel>         # Get input value
agent-browser get title               # Get page title
agent-browser get url                 # Get current URL

State Checking

agent-browser is visible <sel>        # Check visibility
agent-browser is enabled <sel>        # Check if enabled
agent-browser is checked <sel>        # Check if checked

Advanced Features

agent-browser screenshot [path]       # Take screenshot
agent-browser pdf <path>              # Save as PDF
agent-browser eval <js>               # Run JavaScript
agent-browser network route <url>     # Intercept requests
agent-browser cookies                 # Manage cookies
agent-browser storage local           # Manage localStorage

Optimal AI Workflow

# 1. Navigate and get snapshot
agent-browser open example.com
agent-browser snapshot -i --json   # AI parses tree and refs

# 2. AI identifies target refs from snapshot
# 3. Execute actions using refs
agent-browser click @e2
agent-browser fill @e3 "input text"

# 4. Get new snapshot if page changed
agent-browser snapshot -i --json
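
The loop above could be driven from Python roughly as follows. This is a sketch under stated assumptions: the `--json` output is assumed to be a single JSON object with a `data.refs` mapping from ref id to `role` and `name`, and `--json` is assumed to be accepted after any command. Neither is documented on this page. `run_json`, `find_ref`, and `click_by_name` are hypothetical helper names, and the `runner` parameter lets the logic be exercised without a browser.

```python
import json
import subprocess


def run_json(args, runner=subprocess.run):
    """Run one agent-browser command with --json and return the parsed output."""
    result = runner(
        ["agent-browser", *args, "--json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def find_ref(refs, role, name):
    """Return '@eN' for the first ref matching role and name, else None.

    ASSUMPTION: refs looks like {"e2": {"role": "button", "name": "Submit"}}.
    """
    for ref_id, info in refs.items():
        if info.get("role") == role and info.get("name") == name:
            return "@" + ref_id
    return None


def click_by_name(url, role, name, runner=subprocess.run):
    """Open url, snapshot, click the matching element, then re-snapshot."""
    run_json(["open", url], runner)
    snapshot = run_json(["snapshot", "-i"], runner)
    refs = snapshot.get("data", {}).get("refs", {})  # assumed JSON shape
    target = find_ref(refs, role, name)
    if target is None:
        raise LookupError(f"no {role} named {name!r} in snapshot")
    run_json(["click", target], runner)
    # The page may have changed, so take a fresh snapshot before acting again.
    return run_json(["snapshot", "-i"], runner)
```

In a real agent, the `find_ref` step is where the model would choose a target from the snapshot; matching on role and name is only a stand-in for that decision.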

Key Capabilities

Deterministic Selection: Refs provide exact element targeting from snapshots
Fast Execution: Rust CLI with daemon architecture for speed
AI-Friendly Output: JSON mode for machine parsing by AI agents
Cross-Platform: macOS, Linux, Windows support
Serverless Ready: Custom executable support for lightweight deployments
Session Isolation: Multiple parallel browser instances
Live Streaming: WebSocket-based viewport streaming

Use Cases

AI agent web automation and testing
Automated UI testing and monitoring
Web scraping with AI guidance
Browser-based task automation
Serverless browser automation
AI-assisted debugging and exploration
Pair browsing with human oversight

Technical Details

Architecture: Rust CLI + Node.js daemon
Browser Engine: Chromium (via Playwright)
Platforms: macOS ARM64/x64, Linux ARM64/x64, Windows x64
Protocols: Chrome DevTools Protocol (CDP)
Streaming: WebSocket-based viewport streaming
Output: Human-readable or JSON format
