Agent Browser is a headless browser automation CLI designed for AI agents. It pairs a native Rust CLI with a Node.js fallback and gives agents three things they need for web automation: accessibility tree snapshots, deterministic element references, and a JSON output mode.
Core Features
1. AI-Optimized Workflow
- Snapshot with Refs: Get the accessibility tree with deterministic element references (@e1, @e2, etc.)
- Ref-Based Actions: Interact with elements using refs taken from a snapshot, rather than hand-written selectors
- JSON Output Mode: Machine-readable output that an AI agent can parse directly
2. Fast Rust CLI
- Native Rust binary for fast command execution
- Client-daemon architecture for persistent browser sessions
- Automatic fallback to Node.js when the native binary is unavailable
3. Comprehensive Browser Control
- Full navigation and interaction capabilities
- Mouse, keyboard, and touch event simulation
- Network interception and mocking
- Cookie and storage management
- Multi-tab and iframe support
4. Session Management
- Isolated browser sessions for parallel automation
- Persistent authentication state
- Session-scoped cookies and storage
5. Streaming & Preview
- WebSocket-based browser viewport streaming
- Live preview for "pair browsing" with AI agents
- Real-time input event injection
6. Flexible Deployment
- Custom browser executable support (e.g., @sparticuz/chromium for serverless)
- CDP mode for connecting to existing browsers
- Headed mode for debugging
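As a sketch of how session isolation might be driven from a script, the Python helper below builds one command line per named session. It assumes a global `--session <name>` flag, which this page does not show, and the names `session_command` and `plan_parallel_opens` are hypothetical. The helper only constructs argument lists; it does not launch a browser.

```python
from typing import Dict, List


def session_command(session: str, args: List[str]) -> List[str]:
    """Prefix an agent-browser command with a session selector.

    Assumption: a global `--session <name>` flag picks the isolated session.
    """
    return ["agent-browser", "--session", session, *args]


def plan_parallel_opens(targets: Dict[str, str]) -> List[List[str]]:
    """Build one `open` command per session name, in insertion order."""
    return [session_command(name, ["open", url]) for name, url in targets.items()]
```

Each returned list can be passed to `subprocess.run` as-is, so two agents working on different sites would not share cookies or storage if the flag behaves as assumed.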
Key Commands
Navigation & Interaction
agent-browser open <url> # Navigate to URL
agent-browser click <sel> # Click element
agent-browser fill <sel> <text> # Fill input
agent-browser type <sel> <text> # Type into element
agent-browser press <key> # Press key
agent-browser hover <sel> # Hover element
agent-browser scroll <dir> [px] # Scroll page
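When an agent generates these commands as shell text, selectors such as `#email` and free-form input text need quoting. The following is a minimal sketch using Python's standard `shlex.join`; the helper name `shell_line` is hypothetical and the selector and text values are illustrative.

```python
import shlex


def shell_line(*args: str) -> str:
    """Join agent-browser arguments into one safely quoted shell command line."""
    return shlex.join(["agent-browser", *args])


if __name__ == "__main__":
    # The selector contains '#', and the text contains a space and a quote.
    print(shell_line("fill", "#email", "Ada O'Neil"))
```

Passing an argument list directly to `subprocess.run` avoids the shell entirely; quoting like this only matters when the command has to exist as a single string.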
AI-Optimized Workflow
agent-browser snapshot # Get accessibility tree with refs
agent-browser snapshot -i # Interactive elements only
agent-browser snapshot -c # Compact mode
agent-browser click @e2 # Click by ref
agent-browser fill @e3 "text" # Fill by ref
agent-browser get text @e1 # Get text by ref
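To act on a snapshot, an agent first has to pull the refs out of it. The sketch below assumes the human-readable snapshot marks each element with a `[ref=eN]` tag; that marker format and the sample text are assumptions for illustration, not output copied from the tool. The function name `index_refs` is hypothetical.

```python
import re
from typing import Dict

# Assumption: each element line in the snapshot text carries a marker like [ref=e1].
REF_PATTERN = re.compile(r"\[ref=(e\d+)\]")


def index_refs(snapshot_text: str) -> Dict[str, str]:
    """Map each ref (formatted as @eN) to the element description on its line."""
    refs: Dict[str, str] = {}
    for line in snapshot_text.splitlines():
        match = REF_PATTERN.search(line)
        if match:
            # Drop the marker, then trim the list dash and surrounding spaces.
            label = REF_PATTERN.sub("", line).strip(" -")
            refs["@" + match.group(1)] = label
    return refs


# Hypothetical snapshot text, for illustration only.
SAMPLE_SNAPSHOT = """\
- heading "Example Domain" [ref=e1]
- link "More information..." [ref=e2]
- textbox "Search" [ref=e3]
"""

if __name__ == "__main__":
    for ref, label in index_refs(SAMPLE_SNAPSHOT).items():
        print(ref, label)
```

The keys of the returned mapping are already in the `@eN` form that `click`, `fill`, and `get text` accept.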
Information Retrieval
agent-browser get text <sel> # Get text content
agent-browser get html <sel> # Get innerHTML
agent-browser get value <sel> # Get input value
agent-browser get title # Get page title
agent-browser get url # Get current URL
State Checking
agent-browser is visible <sel> # Check visibility
agent-browser is enabled <sel> # Check if enabled
agent-browser is checked <sel> # Check if checked
Advanced Features
agent-browser screenshot [path] # Take screenshot
agent-browser pdf <path> # Save as PDF
agent-browser eval <js> # Run JavaScript
agent-browser network route <url> # Intercept requests
agent-browser cookies # Manage cookies
agent-browser storage local # Manage localStorage
Optimal AI Workflow
# 1. Navigate and get snapshot
agent-browser open example.com
agent-browser snapshot -i --json # AI parses tree and refs
# 2. AI identifies target refs from snapshot
# 3. Execute actions using refs
agent-browser click @e2
agent-browser fill @e3 "input text"
# 4. Get new snapshot if page changed
agent-browser snapshot -i --json
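The four steps above can be sketched as a small Python driver. It uses only the commands shown on this page and treats the JSON snapshot as an opaque parsed value, since its schema is not documented here. The `choose_actions` callback is a hypothetical stand-in for the AI agent, and the `run` parameter exists so the loop can be exercised without a browser.

```python
import json
import subprocess
from typing import Any, Callable, List

Runner = Callable[[List[str]], str]


def cli_runner(args: List[str]) -> str:
    """Run one agent-browser command and return its stdout."""
    result = subprocess.run(
        ["agent-browser", *args], capture_output=True, text=True, check=True
    )
    return result.stdout


def run_workflow(
    url: str,
    choose_actions: Callable[[Any], List[List[str]]],
    run: Runner = cli_runner,
) -> Any:
    """Open a page, snapshot it, apply the chosen actions, and snapshot again."""
    # 1. Navigate and get a snapshot of interactive elements as JSON.
    run(["open", url])
    snapshot = json.loads(run(["snapshot", "-i", "--json"]))
    # 2. The agent (hypothetical callback) picks actions such as ["click", "@e2"].
    # 3. Execute each action using the refs it chose.
    for action in choose_actions(snapshot):
        run(action)
    # 4. Refs may be stale after the page changes, so take a fresh snapshot.
    return json.loads(run(["snapshot", "-i", "--json"]))
```

Re-snapshotting at the end mirrors step 4: refs come from a specific snapshot, so an agent should not reuse them after the page has changed.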
Key Capabilities
- Deterministic Selection: Refs from a snapshot identify exactly one element each
- Fast Execution: Rust CLI with a persistent daemon, so commands reuse a running browser
- AI-Friendly Output: JSON mode for machine-readable results
- Cross-Platform: macOS, Linux, Windows support
- Serverless Ready: Custom executable support for lightweight deployments
- Session Isolation: Multiple parallel browser instances
- Live Streaming: WebSocket-based viewport streaming
Use Cases
- AI agent web automation and testing
- Automated UI testing and monitoring
- Web scraping with AI guidance
- Browser-based task automation
- Serverless browser automation
- AI-assisted debugging and exploration
- Pair browsing with human oversight
Technical Details
- Architecture: Rust CLI + Node.js daemon
- Browser Engine: Chromium (via Playwright)
- Platforms: macOS ARM64/x64, Linux ARM64/x64, Windows x64
- Protocols: Chrome DevTools Protocol (CDP)
- Streaming: WebSocket-based viewport streaming
- Output: Human-readable or JSON format