Agent Browser

Headless browser automation CLI for AI agents. Fast Rust CLI with Node.js fallback, designed for seamless integration with AI workflows.

Agent Browser is a headless browser automation CLI built for AI agents. It pairs a fast Rust CLI with a Node.js fallback and exposes features aimed at AI-driven web automation: accessibility tree snapshots, deterministic element references, and a JSON output mode.

Core Features

1. AI-Optimized Workflow

  • Snapshot with Refs: Get the accessibility tree with deterministic element references (@e1, @e2, etc.)
  • Ref-Based Actions: Interact with elements using refs taken from a snapshot, so actions target exactly the element the snapshot described
  • JSON Output Mode: Machine-readable output suited to AI agent integration

2. Fast Rust CLI

  • Native Rust binary for fast command execution
  • Client-daemon architecture for persistent browser sessions
  • Automatic fallback to Node.js when the native binary is unavailable

3. Comprehensive Browser Control

  • Full navigation and interaction capabilities
  • Mouse, keyboard, and touch event simulation
  • Network interception and mocking
  • Cookie and storage management
  • Multi-tab and iframe support

4. Session Management

  • Isolated browser sessions for parallel automation
  • Persistent authentication state
  • Session-scoped cookies and storage

5. Streaming & Preview

  • WebSocket-based browser viewport streaming
  • Live preview for "pair browsing" with AI agents
  • Real-time input event injection

6. Flexible Deployment

  • Custom browser executable support (e.g., @sparticuz/chromium for serverless)
  • CDP mode for connecting to existing browsers
  • Headed mode for debugging

Key Commands

agent-browser open <url>              # Navigate to URL
agent-browser click <sel>             # Click element
agent-browser fill <sel> <text>       # Fill input
agent-browser type <sel> <text>       # Type into element
agent-browser press <key>             # Press key
agent-browser hover <sel>             # Hover element
agent-browser scroll <dir> [px]       # Scroll page
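
As a rough illustration of how these commands chain together, the sketch below fills a login form and submits it. The function name `submit_login`, the CSS selectors, the `Enter` key name, and the `AGENT_BROWSER_BIN` override variable are assumptions made for this example; only the `open`, `fill`, and `press` subcommands come from the list above.

```shell
# AGENT_BROWSER_BIN is a hypothetical override so the sketch can be dry-run
# (for example with AGENT_BROWSER_BIN=echo); it defaults to the real CLI.
AGENT_BROWSER_BIN="${AGENT_BROWSER_BIN:-agent-browser}"

# submit_login URL USER PASSWORD
# Opens URL, fills two inputs, and presses a key to submit the form.
# "#username" and "#password" are placeholder selectors for a real page,
# and "Enter" is an assumed key name.
submit_login() {
  "$AGENT_BROWSER_BIN" open "$1" &&
  "$AGENT_BROWSER_BIN" fill "#username" "$2" &&
  "$AGENT_BROWSER_BIN" fill "#password" "$3" &&
  "$AGENT_BROWSER_BIN" press Enter
}
```

Calling `submit_login example.com alice secret` would run the four commands in order and stop at the first one that fails, since they are joined with `&&`.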

AI-Optimized Workflow

agent-browser snapshot                # Get accessibility tree with refs
agent-browser snapshot -i             # Interactive elements only
agent-browser snapshot -c             # Compact mode
agent-browser click @e2               # Click by ref
agent-browser fill @e3 "text"         # Fill by ref
agent-browser get text @e1            # Get text by ref

Information Retrieval

agent-browser get text <sel>          # Get text content
agent-browser get html <sel>          # Get innerHTML
agent-browser get value <sel>         # Get input value
agent-browser get title               # Get page title
agent-browser get url                 # Get current URL
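
The retrieval commands can be combined into small helpers. The sketch below prints the current page's title and URL on one tab-separated line. It assumes that `get title` and `get url` each print a plain value to standard output in the default (non-JSON) mode; the helper name `page_summary` and the `AGENT_BROWSER_BIN` variable are hypothetical.

```shell
# AGENT_BROWSER_BIN is a hypothetical override for dry runs; defaults to the real CLI.
AGENT_BROWSER_BIN="${AGENT_BROWSER_BIN:-agent-browser}"

# page_summary
# Prints "<title><TAB><url>" for the current page.
# ASSUMPTION: each `get` command writes only the requested value to stdout.
page_summary() {
  title=$("$AGENT_BROWSER_BIN" get title) || return 1
  url=$("$AGENT_BROWSER_BIN" get url) || return 1
  printf '%s\t%s\n' "$title" "$url"
}
```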

State Checking

agent-browser is visible <sel>        # Check visibility
agent-browser is enabled <sel>        # Check if enabled
agent-browser is checked <sel>        # Check if checked
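
State checks are most useful as guards before an action. The following sketch clicks a target only if it reports as both visible and enabled. The output format of the `is` commands is not stated here, so the code assumes each check prints the literal word `true` on success; the helper name `click_when_ready` and the `AGENT_BROWSER_BIN` variable are likewise hypothetical.

```shell
# AGENT_BROWSER_BIN is a hypothetical override for dry runs; defaults to the real CLI.
AGENT_BROWSER_BIN="${AGENT_BROWSER_BIN:-agent-browser}"

# click_when_ready TARGET
# Clicks TARGET (a selector or ref) only when it is visible and enabled.
# ASSUMPTION: `is visible` and `is enabled` print "true" when the check passes.
click_when_ready() {
  target="$1"
  visible=$("$AGENT_BROWSER_BIN" is visible "$target") || return 1
  enabled=$("$AGENT_BROWSER_BIN" is enabled "$target") || return 1
  if [ "$visible" = "true" ] && [ "$enabled" = "true" ]; then
    "$AGENT_BROWSER_BIN" click "$target"
  else
    # Report why the click was skipped and signal failure to the caller.
    echo "skipping $target: visible=$visible enabled=$enabled" >&2
    return 1
  fi
}
```

If the real output differs (for example, a JSON object or an exit code), only the two comparisons inside the `if` would need to change.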

Advanced Features

agent-browser screenshot [path]       # Take screenshot
agent-browser pdf <path>              # Save as PDF
agent-browser eval <js>               # Run JavaScript
agent-browser network route <url>     # Intercept requests
agent-browser cookies                 # Manage cookies
agent-browser storage local           # Manage localStorage
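
For monitoring or archiving, the capture commands can be wrapped in one step. This sketch opens a page and saves both a screenshot and a PDF into a directory. The helper name `capture_page`, the output file names, and the `AGENT_BROWSER_BIN` variable are assumptions for illustration; `open`, `screenshot [path]`, and `pdf <path>` are used as listed above.

```shell
# AGENT_BROWSER_BIN is a hypothetical override for dry runs; defaults to the real CLI.
AGENT_BROWSER_BIN="${AGENT_BROWSER_BIN:-agent-browser}"

# capture_page URL OUTDIR
# Creates OUTDIR if needed, opens URL, then writes page.png and page.pdf there.
capture_page() {
  mkdir -p "$2" &&
  "$AGENT_BROWSER_BIN" open "$1" &&
  "$AGENT_BROWSER_BIN" screenshot "$2/page.png" &&
  "$AGENT_BROWSER_BIN" pdf "$2/page.pdf"
}
```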

Optimal AI Workflow

# 1. Navigate and get snapshot
agent-browser open example.com
agent-browser snapshot -i --json   # AI parses tree and refs

# 2. AI identifies target refs from snapshot
# 3. Execute actions using refs
agent-browser click @e2
agent-browser fill @e3 "input text"

# 4. Get new snapshot if page changed
agent-browser snapshot -i --json
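
The four steps above can be folded into two small helpers that an agent loop might call. This is a minimal sketch, not part of the tool: `start_page`, `act_and_resnapshot`, and the `AGENT_BROWSER_BIN` variable are hypothetical names, and the JSON structure of the snapshot is left to the caller to parse.

```shell
# AGENT_BROWSER_BIN is a hypothetical override for dry runs; defaults to the real CLI.
AGENT_BROWSER_BIN="${AGENT_BROWSER_BIN:-agent-browser}"

# start_page URL
# Step 1: open the page and print a JSON snapshot of interactive elements.
start_page() {
  "$AGENT_BROWSER_BIN" open "$1" &&
  "$AGENT_BROWSER_BIN" snapshot -i --json
}

# act_and_resnapshot ACTION ARGS...
# Steps 3 and 4: run one ref-based action (e.g. click @e2), then take a
# fresh snapshot so later actions use refs from the updated page.
act_and_resnapshot() {
  "$AGENT_BROWSER_BIN" "$@" &&
  "$AGENT_BROWSER_BIN" snapshot -i --json
}
```

A caller would run `start_page example.com`, hand the JSON to the agent to choose a ref (step 2), and then call, for example, `act_and_resnapshot fill @e3 "input text"`.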

Key Capabilities

  • Deterministic Selection: Refs provide exact element targeting from snapshots
  • Fast Execution: Rust CLI with daemon architecture for speed
  • AI-Friendly Output: JSON mode for seamless AI integration
  • Cross-Platform: macOS, Linux, Windows support
  • Serverless Ready: Custom executable support for lightweight deployments
  • Session Isolation: Multiple parallel browser instances
  • Live Streaming: WebSocket-based viewport streaming

Use Cases

  • AI agent web automation and testing
  • Automated UI testing and monitoring
  • Web scraping with AI guidance
  • Browser-based task automation
  • Serverless browser automation
  • AI-assisted debugging and exploration
  • Pair browsing with human oversight

Technical Details

  • Architecture: Rust CLI + Node.js daemon
  • Browser Engine: Chromium (via Playwright)
  • Platforms: macOS ARM64/x64, Linux ARM64/x64, Windows x64
  • Protocols: Chrome DevTools Protocol (CDP)
  • Streaming: WebSocket-based viewport streaming
  • Output: Human-readable or JSON format
