Give any AI agent a real browser: persistent, stealthy, self-hosted.
Agent-OS is a production-grade stealth browser automation server that gives AI agents 199 browser tools: navigate, click, fill forms, extract data, handle CAPTCHAs, and more. It works with Claude, GPT-4, Codex, OpenClaw, and any agent that can send an HTTP request.
- Key Features
- Quick Start
- AI Platform Connectors
- Commands Reference
- Configuration
- Authentication
- Stealth and Anti-Detection
- Architecture Overview
- Production Deployment
- Project Structure
- Development
- Troubleshooting
- Tech Stack
- License
- Navigation: navigate, back, forward, reload, smart-navigate (auto HTTP/browser strategy)
- Interaction: click, double-click, right-click, hover, type, press, fill-form, drag-drop, scroll, select, upload, checkbox
- Smart Finder: smart-click, smart-find, smart-fill (find elements by visible text, no CSS selectors needed)
- Content Extraction: get-content, get-dom, screenshot, get-links, get-images, get-text, evaluate-js
- Page Analysis: page-summary, page-tables, page-seo, page-structured, page-emails, page-phones, page-accessibility
- Network Capture: network-start, network-stop, network-get, network-apis, network-export (HAR/JSON)
- Security Scanning: scan-xss, scan-sqli, scan-sensitive
- Workflows: workflow, workflow-save, workflow-template (multi-step automation with variables and error handling)
- Sessions: save-session, restore-session, save-creds, auto-login, get-cookies, set-cookie
- Tabs: tabs list/new/switch/close, add-extension
- Device Emulation: 11 presets from iPhone SE to 4K desktop
- Transcription: transcribe audio/video via Whisper
- Auto-Heal: self-healing selectors; if a CSS selector breaks, the element is found by nearby text automatically
- Auto-Retry: circuit breaker pattern with intelligent error classification and exponential backoff
- Smart Wait: 7 wait strategies, including element, network idle, JS condition, DOM stable, page load, and composed
- Session Recording: record browser actions, replay them, export as workflows
- Multi-Agent Hub: shared browser sessions, task queues, distributed locks, shared memory between agents
- Login Handoff: pause the AI, let a human log in, resume with cookies. The AI never sees passwords.
- Proxy Rotation: pool management, health checks, geo-targeting, 6 rotation strategies
- LLM Provider: built-in llm-complete, llm-summarize, llm-classify, llm-extract
- AI Content Extraction: structured data extraction with schema.org, forms, metadata
- Query Router: classifies queries (does this need a browser?) with 3-tier routing (rules, then LLM, then conservative)
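
The Auto-Retry item above names two standard techniques, a circuit breaker and exponential backoff. The following is a minimal sketch of how they usually fit together, not Agent-OS's actual implementation: `CircuitBreaker`, `backoff_delays`, `retry_with_backoff`, and every threshold shown are hypothetical names and values chosen for illustration.

```python
import random
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker refuses a call without trying it."""


class CircuitBreaker:
    """Opens after `max_failures` consecutive failures, then rejects calls
    until `reset_after` seconds have passed (hypothetical defaults)."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def allow(self):
        if self.opened_at is None:
            return True
        # Half-open: permit a trial call once the cool-down has elapsed.
        return self.clock() - self.opened_at >= self.reset_after

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()


def backoff_delays(attempts, base=0.5, factor=2.0, cap=30.0):
    """Upper bounds for the waits between attempts: base, base*factor, ..."""
    return [min(cap, base * factor ** i) for i in range(attempts - 1)]


def retry_with_backoff(func, breaker, attempts=4, sleep=time.sleep,
                       retryable=(TimeoutError, ConnectionError)):
    """Call `func` until it succeeds, retrying only on `retryable` errors."""
    delays = backoff_delays(attempts)
    for i in range(attempts):
        if not breaker.allow():
            raise CircuitOpenError("circuit open; skipping call")
        try:
            result = func()
        except retryable:
            breaker.record_failure()
            if i == attempts - 1:
                raise
            # Full jitter keeps many agents from retrying in lockstep.
            sleep(random.uniform(0, delays[i]))
        else:
            breaker.record_success()
            return result
```

Errors outside `retryable` propagate immediately, which is one simple form of error classification: only transient failures are worth waiting on.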
- navigator.webdriver removal (prototype-level, not defineProperty)
- CDP detection blocking (property filtering + intercept)
- WebGL/Canvas/Audio fingerprint spoofing (deterministic, per-session)
- TLS fingerprint bypass (Chrome 145/146 impersonation via curl_cffi)
- HTTP/2 fingerprint matching
- WebRTC IP leak blocking
- 40+ fingerprinting libraries blocked (FingerprintJS, ClientJS, BotD, CreepJS...)
- 15+ bot detection vendors blocked (DataDome, PerimeterX, Cloudflare, Akamai, Kasada...)
- Human-like mouse movements (Bezier curves), typing rhythms, scroll behavior
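
To make the last item concrete, here is a rough sketch of a Bezier-curve mouse path. It illustrates the general technique only and assumes nothing about Agent-OS's internals; `bezier_path` and its `spread` parameter are hypothetical.

```python
import random


def bezier_path(start, end, steps=20, spread=0.3, rng=None):
    """Return `steps + 1` (x, y) points along a cubic Bezier curve.

    The two inner control points are pushed sideways by a random amount,
    so repeated calls produce different curves between the same endpoints.
    """
    rng = rng or random.Random()
    (x0, y0), (x3, y3) = start, end
    dx, dy = x3 - x0, y3 - y0

    def control(t):
        # Point at fraction t along the straight line, offset perpendicular to it.
        offset = rng.uniform(-spread, spread)
        return (x0 + dx * t - dy * offset, y0 + dy * t + dx * offset)

    (x1, y1), (x2, y2) = control(1 / 3), control(2 / 3)
    points = []
    for i in range(steps + 1):
        t = i / steps
        u = 1 - t
        # Cubic Bernstein polynomial weights.
        b0, b1, b2, b3 = u**3, 3 * u**2 * t, 3 * u * t**2, t**3
        points.append((b0 * x0 + b1 * x1 + b2 * x2 + b3 * x3,
                       b0 * y0 + b1 * y1 + b2 * y2 + b3 * y3))
    return points
```

Each point could then be passed to a browser driver's mouse-move call, with a short randomized pause between points.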
| Connector | Tools | Use With | API Key? |
|---|---|---|---|
| MCP Passthrough | 199 | Claude Desktop, Claude Code, Codex, any MCP agent | No |
| MCP Server | 199 | Claude Desktop, Claude Code, Codex, any MCP agent | Optional |
| OpenAI | 199 | GPT-4, GPT-4o, any OpenAI-compatible API | Yes |
| Claude API | 199 | Claude API (tool-use format) | Yes |
| OpenClaw | 199 | OpenClaw agent framework | Optional |
| CLI (Bash) | 198 | Any language (Python, Node, Go, Rust...) | Token |
| HTTP REST | 198 | Direct API calls | Token |
curl -sSL https://raw.githubusercontent.com/factspark23-hash/Agent-OS/main/install.sh | bash

With options:
# Custom token
curl -sSL .../install.sh | bash -s -- --token my-secret-token
# Show browser window (debugging)
curl -sSL .../install.sh | bash -s -- --headed
# Custom port
curl -sSL .../install.sh | bash -s -- --port 9000
# Install only, don't start
curl -sSL .../install.sh | bash -s -- --no-start

# Start Agent-OS server
python3 main.py --agent-token "my-secret-token"
# In another terminal, start MCP wrapper
./run_mcp.sh --token "my-secret-token"

This starts both the Agent-OS server and the MCP passthrough wrapper. No API key is needed: Claude Desktop/GPT handles reasoning, Agent-OS handles execution. 87% token savings via SmartCompressor.
Claude Desktop config:
{
"mcpServers": {
"agent-os": {
"command": "python3",
"args": ["/absolute/path/to/Agent-OS/connectors/mcp_passthrough.py"],
"env": {
"AGENT_OS_URL": "http://localhost:8001",
"AGENT_OS_TOKEN": "my-secret-token"
}
}
}
}

See MCP_WRAPPER_README.md for full documentation.
curl -sSL https://raw.githubusercontent.com/factspark23-hash/Agent-OS/main/quickstart.sh | bash

This does everything install.sh does, plus:
- Auto-detects Claude Code, Codex, OpenClaw
- Configures MCP connections automatically
- Prints ready-to-use connection info
git clone https://github.com/factspark23-hash/Agent-OS.git
cd Agent-OS
export POSTGRES_PASSWORD="strong-db-password"
docker compose up -d
curl http://localhost:8001/health

Full stack: PostgreSQL + Redis + Agent-OS + Nginx reverse proxy.
git clone https://github.com/factspark23-hash/Agent-OS.git
cd Agent-OS
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt
python3 -m patchright install chromium
# Generate JWT secret
export JWT_SECRET_KEY=$(python3 -c 'import secrets; print(secrets.token_urlsafe(48))')
# Start server
python3 main.py --agent-token "your-token"

# Check health
curl http://localhost:8001/health
# Navigate
curl -X POST http://localhost:8001/command \
-H "Content-Type: application/json" \
-d '{"token":"your-token","command":"navigate","url":"https://github.com"}'
# Screenshot
curl -X POST http://localhost:8001/command \
-H "Content-Type: application/json" \
-d '{"token":"your-token","command":"screenshot"}'
# Click by text (no CSS selector)
curl -X POST http://localhost:8001/command \
-H "Content-Type: application/json" \
-d '{"token":"your-token","command":"smart-click","text":"Sign in"}'

Drop-in MCP server that works without any LLM API key. The MCP client's LLM (Claude, GPT-4) handles reasoning; Agent-OS handles execution.
# Start server + wrapper
./run_mcp.sh --token "my-secret-token"

Claude Desktop config:
{
"mcpServers": {
"agent-os": {
"command": "python3",
"args": ["/absolute/path/to/Agent-OS/connectors/mcp_passthrough.py"],
"env": {
"AGENT_OS_URL": "http://localhost:8001",
"AGENT_OS_TOKEN": "my-secret-token",
"AGENT_OS_COMPRESS": "aggressive"
}
}
}
}

Features:
- 199 tools: 192 browser + 7 LLM (built-in rule-based, no API key)
- SmartCompressor: 87% token savings on browser results
- Configurable compression: aggressive/normal/off
- Works standalone (LLM tools work without Agent-OS server)
- Helpful error messages when server is down
See MCP_WRAPPER_README.md for full docs.
Add to your config file:
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
Claude Code: ~/.claude/claude_desktop_config.json
{
"mcpServers": {
"agent-os": {
"command": "python3",
"args": ["/absolute/path/to/Agent-OS/connectors/mcp_server.py"],
"env": {
"AGENT_OS_URL": "http://localhost:8001",
"AGENT_OS_TOKEN": "your-token"
}
}
}
}

Restart Claude Desktop / Claude Code; the 199 browser tools then appear automatically.
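
Because the `args` entry must be an absolute path, it can be convenient to generate the config rather than type it. The helper below is a small sketch under that assumption; `mcp_config` is a hypothetical name, and the repository location passed to it is whatever path the clone actually lives at.

```python
import json
from pathlib import Path


def mcp_config(repo_dir, token, url="http://localhost:8001",
               script="connectors/mcp_server.py"):
    """Build the `mcpServers` entry shown above with an absolute script path."""
    script_path = (Path(repo_dir).expanduser() / script).resolve()
    return {
        "mcpServers": {
            "agent-os": {
                "command": "python3",
                "args": [str(script_path)],
                "env": {"AGENT_OS_URL": url, "AGENT_OS_TOKEN": token},
            }
        }
    }


if __name__ == "__main__":
    # Paste the printed JSON into the config file for your client.
    print(json.dumps(mcp_config("~/Agent-OS", "your-token"), indent=2))
```

Passing `script="connectors/mcp_passthrough.py"` yields the passthrough variant instead.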
from connectors.openai_connector import get_tools, call_tool
# Get tool definitions
tools = get_tools("openai") # 199 tools
# Use with OpenAI API
result = await call_tool("browser_navigate", {"url": "https://github.com"})
result = await call_tool("browser_screenshot", {})
result = await call_tool("browser_smart_click", {"text": "Sign In"})

from connectors.openai_connector import get_tools, call_tool
# Get tools in Claude format
tools = get_tools("claude")  # 199 tools

from connectors.openclaw_connector import get_manifest, execute_tool
manifest = get_manifest()  # 199 tools
result = await execute_tool("browser_navigate", {"url": "https://example.com"})

export AGENT_OS_TOKEN="your-token"
./connectors/agent-os-tool.sh navigate "https://github.com"
./connectors/agent-os-tool.sh screenshot
./connectors/agent-os-tool.sh smart-click "Sign In"
./connectors/agent-os-tool.sh status

Run without arguments to see all 198 commands.
# WebSocket port: 8000
# HTTP port: 8001
# Debug UI: 8002 (with --debug)
curl -X POST http://localhost:8001/command \
-H "Content-Type: application/json" \
-d '{"token":"your-token","command":"navigate","url":"https://example.com"}'

All 198 server commands, organized by category:
| Category | Commands | Count |
|---|---|---|
| Navigation | navigate, smart-navigate, back, forward, reload, route | 6 |
| Interaction | click, double-click, right-click, context-action, hover, type, press, fill-form, clear-input, select, upload, checkbox, drag-drop, drag-offset, scroll, wait, viewport | 17 |
| Smart Finder | smart-find, smart-find-all, smart-click, smart-fill | 4 |
| Content | get-content, get-dom, screenshot, get-links, get-images, get-text, get-attr, evaluate-js, console-logs | 9 |
| Page Analysis | page-summary, page-tables, page-seo, page-structured, page-emails, page-phones, page-accessibility, analyze, analyze-search | 9 |
| Network | network-start, network-stop, network-get, network-apis, network-detail, network-stats, network-export, network-clear | 8 |
| Security | scan-xss, scan-sqli, scan-sensitive | 3 |
| Workflows | workflow, workflow-save, workflow-template, workflow-list, workflow-status, workflow-json | 6 |
| Sessions | save-session, restore-session, list-sessions, delete-session, save-creds, auto-login, get-cookies, set-cookie | 8 |
| Tabs & Device | tabs, add-extension, emulate-device, list-devices | 4 |
| Proxy | set-proxy, get-proxy | 2 |
| Proxy Rotation | proxy-add/remove/list/check/check-all/rotate/stats/enable/disable/strategy/save/load/load-file/load-api/record/get | 16 |
| Smart Wait | smart-wait, smart-wait-element/network/js/dom/page/compose | 7 |
| Auto-Heal | heal-click/fill/hover/double-click/wait/selector/stats/clear/fingerprint/fingerprint-page | 10 |
| Auto-Retry | retry-navigate/click/fill/execute/api-call/stats/health/circuit-breakers/reset-circuit/reset-all-circuits | 10 |
| Recording | record-start/stop/pause/resume/status/list/delete/annotate | 8 |
| Replay | replay-play/stop/pause/resume/step/jump/position/events/load/export-workflow | 10 |
| Multi-Agent Hub | hub-register/unregister/agents/status/broadcast/handoff/heartbeat/lock/unlock/locks/events/audit | 12 |
| Hub Tasks | hub-task-create/claim/start/complete/fail/cancel/tasks | 7 |
| Hub Memory | hub-memory-set/get/list/delete | 4 |
| Login Handoff | login-handoff-start/status/complete/cancel/list/stats/history, detect-login-page | 8 |
| TLS HTTP | fetch, tls-get, tls-post, tls-stats | 4 |
| LLM | llm-complete/summarize/classify/extract/provider-set/token-usage/cache-clear | 7 |
| AI Content | ai-content, fill-job, structured-extract/format/schema/deduplicate | 6 |
| CAPTCHA | captcha-assess/preflight/health/monitor-start/monitor-stop/shutdown | 6 |
| Query Router | classify-query, needs-web, query-strategy, router-stats, nav-stats | 5 |
| Transcription | transcribe | 1 |
| Status | health | 1 |
| Total | | 198 |
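
Any command in the table can be sent to the HTTP endpoint as a JSON body whose `command` field names it. The sketch below builds such a request with only the standard library, mirroring the curl examples in this README (legacy token in the body, `X-API-Key` header, or `Authorization: Bearer` header). `build_command_request` and `send` are hypothetical helpers, and the assumption that the server replies with JSON is mine, not something this README states.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8001"  # default HTTP port used throughout this README


def build_command_request(command, params=None, *, token=None,
                          api_key=None, jwt=None, base_url=BASE_URL):
    """Build a POST /command request using one of the three credential types."""
    body = {"command": command, **(params or {})}
    headers = {"Content-Type": "application/json"}
    if jwt:
        headers["Authorization"] = f"Bearer {jwt}"
    elif api_key:
        headers["X-API-Key"] = api_key
    elif token:
        body["token"] = token  # legacy token travels in the JSON body
    else:
        raise ValueError("provide token, api_key, or jwt")
    return urllib.request.Request(
        f"{base_url}/command",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def send(request, timeout=30):
    """Send the request; assumes a running server that answers with JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `send(build_command_request("smart-click", {"text": "Sign in"}, token="your-token"))` corresponds to the smart-click curl call shown earlier.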
| Variable | Required | Default | Description |
|---|---|---|---|
| JWT_SECRET_KEY | Production | Auto-generated | JWT signing key. Set for persistent sessions. |
| POSTGRES_PASSWORD | Docker | - | PostgreSQL password. |
| AGENT_TOKEN | Optional | Auto-generated | Legacy auth token. |
| DATABASE_DSN | Optional | - | PostgreSQL connection string. |
| REDIS_URL | Optional | - | Redis URL for distributed rate limiting. |
| PROXY_URL | Optional | - | HTTP/SOCKS5 proxy URL. |
| SWARM_PROVIDER_API_KEY | Optional | - | API key for LLM-based query routing. |
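
A deployment script might check these variables before launching the server. The following is only an illustrative sketch of such a preflight check, based on the "Required" column above; `check_environment` is a hypothetical helper and is not part of Agent-OS.

```python
import os


def check_environment(env=None, *, docker=False, production=False):
    """Return human-readable warnings for variables the table marks as needed."""
    env = os.environ if env is None else env
    warnings = []
    if production and not env.get("JWT_SECRET_KEY"):
        warnings.append("JWT_SECRET_KEY unset: a key is auto-generated, "
                        "so sessions will not survive a restart")
    if docker and not env.get("POSTGRES_PASSWORD"):
        warnings.append("POSTGRES_PASSWORD unset: required for the Docker stack")
    if production and not env.get("REDIS_URL"):
        warnings.append("REDIS_URL unset: no distributed rate limiting")
    return warnings
```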
# A comment after a line-continuation backslash breaks the command,
# so the flags are kept in a bash array where each can carry a comment.
args=(
  --agent-token "my-token"                # Auth token
  --port 8000                             # WebSocket port (HTTP = port+1)
  --headed                                # Show browser window
  --max-ram 500                           # RAM limit in MB
  --proxy "http://proxy:8080"             # Proxy
  --device iphone_14                      # Device preset
  --persistent                            # Production mode (persistent Chromium)
  --database "postgresql+asyncpg://..."
  --redis "redis://localhost:6379/0"
  --json-logs                             # JSON structured logging
  --debug                                 # Debug UI (port+2)
  --swarm                                 # Enable query routing
  --create-tables                         # Create DB tables on startup
)
python3 main.py "${args[@]}"

| Preset | Type | Viewport |
|---|---|---|
| iphone_se | Mobile | 375×667 |
| iphone_14 | Mobile | 390×844 |
| iphone_14_pro_max | Mobile | 430×932 |
| galaxy_s23 | Mobile | 360×780 |
| pixel_8 | Mobile | 412×915 |
| ipad | Tablet | 768×1024 |
| ipad_pro | Tablet | 1024×1366 |
| desktop_1080 | Desktop | 1920×1080 |
| desktop_1440 | Desktop | 2560×1440 |
| desktop_4k | Desktop | 3840×2160 |
| Port | Service |
|---|---|
| 8000 | WebSocket (agent connections) |
| 8001 | HTTP REST API |
| 8002 | Debug UI (only with --debug) |
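
The three ports are tied together: the flag comments in this README give the HTTP port as the WebSocket port plus one and the debug UI as plus two. A tiny sketch of that relationship, useful when running on a non-default `--port` (`derive_ports` is a hypothetical helper):

```python
def derive_ports(ws_port=8000, debug=False):
    """Map the --port value to the ports the server is described as using."""
    ports = {"websocket": ws_port, "http": ws_port + 1}
    if debug:
        ports["debug_ui"] = ws_port + 2  # only present with --debug
    return ports
```

With `--port 9000`, for instance, REST calls would go to port 9001 rather than 8001.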
3-layer auth system, checked in order:
# Register
curl -X POST http://localhost:8001/auth/register \
-H "Content-Type: application/json" \
-d '{"email":"you@example.com","username":"admin","password":"StrongPass123!"}'
# Login
curl -X POST http://localhost:8001/auth/login \
-H "Content-Type: application/json" \
-d '{"username":"admin","password":"StrongPass123!"}'
# Use JWT
curl -X POST http://localhost:8001/command \
-H "Authorization: Bearer eyJhbGciOiJIUzI1NiIs..." \
-H "Content-Type: application/json" \
-d '{"command":"navigate","url":"https://example.com"}'

# Create key (via JWT)
curl -X POST http://localhost:8001/auth/api-keys \
-H "Authorization: Bearer YOUR_JWT" \
-H "Content-Type: application/json" \
-d '{"name":"my-key","scopes":["browser"]}'
# Use key (header)
curl -X POST http://localhost:8001/command \
-H "X-API-Key: aos_your_key" \
-H "Content-Type: application/json" \
-d '{"command":"navigate","url":"https://example.com"}'

python3 main.py --agent-token "my-dev-token"
# Use in requests
curl -X POST http://localhost:8001/command \
-H "Content-Type: application/json" \
-d '{"token":"my-dev-token","command":"navigate","url":"https://example.com"}'

Agent-OS defeats bot detection with a 4-layer defense system:
- Chrome TLS fingerprint (JA3/JA4) via curl_cffi
- HTTP/2 fingerprint matching (Chrome 145/146)
- Bot detection scripts blocked at network level
- Fake success responses for reCAPTCHA/hCaptcha
- Page.addScriptToEvaluateOnNewDocument injection
- User-Agent metadata spoofing
- Timezone and locale override
- navigator.webdriver removal (prototype-level)
- CDP property filtering
- Chrome object completeness (runtime, app, csi, loadTimes)
- WebGL/Canvas/Audio fingerprint spoofing
- WebRTC IP leak prevention
- Function toString masking
- Stack trace sanitization
- Bezier-curve mouse movements with micro-tremor
- Realistic typing rhythms (40-300ms per keystroke)
- Word pause simulation (200-600ms)
- Typo simulation with correction (3% rate)
- Natural scroll with micro-variance
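
The timing figures in the list above can be turned into a simple keystroke schedule. This is a sketch of the idea only, using uniform random delays; how Agent-OS actually distributes its delays is not stated here, and `typing_delays` is a hypothetical function.

```python
import random


def typing_delays(text, rng=None, key_ms=(40, 300), word_pause_ms=(200, 600),
                  typo_rate=0.03):
    """Yield (action, char, delay_ms) tuples for typing `text` like a person.

    Default ranges are the figures listed above; the uniform distribution
    is an assumption made for simplicity.
    """
    rng = rng or random.Random()
    for ch in text:
        if ch.isalpha() and rng.random() < typo_rate:
            wrong = rng.choice("abcdefghijklmnopqrstuvwxyz")
            # Mistype, then delete, then fall through to the intended key.
            yield ("type", wrong, rng.uniform(*key_ms))
            yield ("backspace", "", rng.uniform(*key_ms))
        yield ("type", ch, rng.uniform(*key_ms))
        if ch == " ":
            # Extra hesitation between words.
            yield ("pause", "", rng.uniform(*word_pause_ms))
```

A caller would wait `delay_ms` after performing each action before moving to the next one.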
DataDome, PerimeterX, Imperva, Akamai, Cloudflare Bot Management, Cloudflare Turnstile, Kasada, Shape Security, F5, Arkose Labs, ThreatMetrix, hCaptcha, reCAPTCHA
+------------------------------------------------------------+
| External Clients                                           |
| Claude Desktop | GPT-4 | Codex | CLI | HTTP/WS             |
+------------------------------------------------------------+
                              |
                              v
+------------------------------------------------------------+
| Connectors (199 tools each)                                |
| MCP | OpenAI | Claude | OpenClaw | CLI | REST+WebSocket    |
+------------------------------------------------------------+
                              |
                              v
+------------------------------------------------------------+
| Agent Server (aiohttp)                                     |
| Auth | Rate Limiter | Validator | Command Router           |
+------------------------------------------------------------+
                              |
        +---------------------+---------------------+
        v                     v                     v
+----------------+    +----------------+    +----------------+
| Browser        |    | Tools Layer    |    | Infrastructure |
| (Patchright    |    | Smart Finder   |    | PostgreSQL     |
| + Stealth)     |    | Workflows      |    | Redis          |
| 26+ vectors    |    | Auto-Heal      |    | JWT Auth       |
|                |    | Recording      |    | Logging        |
|                |    | LLM Provider   |    |                |
+----------------+    +----------------+    +----------------+
export JWT_SECRET_KEY=$(python3 -c 'import secrets; print(secrets.token_urlsafe(48))')
export POSTGRES_PASSWORD="strong-db-password"
docker compose --profile with-nginx up -d

- Set JWT_SECRET_KEY (without it, sessions don't survive restarts)
- Use JWT + API keys (legacy tokens are dev-only)
- Run behind firewall (use Nginx + SSL for public exposure)
- Enable Redis for distributed rate limiting
- Set up PostgreSQL for multi-instance deployments
- Monitor RAM (~500MB idle, ~800MB under load)
- Configure CORS for your domain
| Config | Concurrent Users | Memory |
|---|---|---|
| 1 instance × 50 contexts | 50 | ~800 MB |
| 3 instances × 50 contexts | 150 | ~2.4 GB |
| 5 instances × 50 contexts | 250 | ~4 GB |
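
The table scales linearly: each instance serves 50 contexts and uses roughly 800 MB. Under the assumption that this stays linear and that one concurrent user maps to one context, a rough sizing estimate can be computed as follows (`plan_capacity` is a hypothetical helper):

```python
import math

CONTEXTS_PER_INSTANCE = 50  # from the table above
MB_PER_INSTANCE = 800       # approximate memory per instance, from the table


def plan_capacity(concurrent_users):
    """Estimate instance count and total memory for a given user load."""
    instances = max(1, math.ceil(concurrent_users / CONTEXTS_PER_INSTANCE))
    return {"instances": instances, "memory_mb": instances * MB_PER_INSTANCE}
```

Treat the result as a starting point; real memory use depends on the pages being automated.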
Agent-OS/
├── main.py                          # Entry point
├── install.sh                       # One-command installer
├── quickstart.sh                    # Install + auto-connect
├── docker-compose.yml               # Full Docker stack
├── Dockerfile                       # Multi-stage build
├── requirements.txt                 # Python dependencies
├── alembic.ini                      # DB migrations
│
├── src/
│   ├── core/                        # Browser engine
│   │   ├── browser.py               # Main browser (Patchright/Chromium)
│   │   ├── http_client.py           # TLS HTTP client (curl_cffi)
│   │   ├── stealth.py               # Anti-detection JS
│   │   ├── cdp_stealth.py           # CDP-level stealth
│   │   ├── stealth_god.py           # GOD MODE (26+ vectors)
│   │   ├── tls_spoof.py             # TLS fingerprint spoofing
│   │   ├── tls_proxy.py             # TLS proxy
│   │   ├── smart_navigator.py       # Auto HTTP/browser strategy
│   │   ├── persistent_browser.py    # Production persistent Chromium
│   │   ├── llm_provider.py          # LLM integration
│   │   ├── config.py                # Configuration (YAML)
│   │   └── session.py               # Session management
│   │
│   ├── auth/                        # Authentication
│   │   ├── jwt_handler.py           # JWT (HS256)
│   │   ├── api_key_manager.py       # API keys (aos_ prefix)
│   │   ├── user_manager.py          # User registration
│   │   └── middleware.py            # 3-layer auth middleware
│   │
│   ├── security/                    # Stealth & Evasion
│   │   ├── evasion_engine.py        # Fingerprint generation
│   │   ├── captcha_bypass.py        # CAPTCHA detection
│   │   ├── captcha_solver.py        # CAPTCHA solving
│   │   ├── captcha_preempt.py       # CAPTCHA preemption
│   │   ├── cloudflare_bypass.py     # Cloudflare bypass
│   │   └── human_mimicry.py         # Human behavior simulation
│   │
│   ├── tools/                       # Feature engines
│   │   ├── smart_finder.py          # Find by visible text
│   │   ├── workflow.py              # Multi-step workflows
│   │   ├── network_capture.py       # Network request capture
│   │   ├── page_analyzer.py         # Page analysis
│   │   ├── form_filler.py           # Form filling
│   │   ├── auto_heal.py             # Self-healing selectors
│   │   ├── auto_retry.py            # Auto-retry + circuit breaker
│   │   ├── session_recording.py     # Record & replay
│   │   ├── multi_agent.py           # Multi-agent hub
│   │   ├── proxy_rotation.py        # Proxy pool management
│   │   ├── login_handoff.py         # Human-in-the-loop login
│   │   ├── ai_content.py            # AI content extraction
│   │   ├── web_query_router.py      # Query classification
│   │   └── transcriber.py           # Audio/video transcription
│   │
│   ├── agents/
│   │   └── server.py                # WebSocket + HTTP server (198 commands)
│   │
│   ├── agent_swarm/                 # Query routing system
│   │   ├── router/                  # 3-tier router
│   │   ├── agents/                  # Agent profiles
│   │   └── search/                  # Search backends
│   │
│   ├── infra/                       # Infrastructure
│   │   ├── database.py              # PostgreSQL (async)
│   │   ├── redis_client.py          # Redis
│   │   └── logging.py               # Structured logging
│   │
│   └── validation/
│       └── schemas.py               # Input validation (Pydantic v2)
│
├── connectors/                      # AI Platform Connectors
│   ├── _tool_registry.py            # 199 tool definitions (source of truth)
│   ├── mcp_server.py                # MCP (Claude/Codex)
│   ├── openai_connector.py          # OpenAI function-calling
│   ├── openclaw_connector.py        # OpenClaw
│   ├── agent-os-tool.sh             # CLI (198 commands)
│   └── mcp_config.json              # MCP config template
│
├── web/                             # React Web UI
│   ├── src/                         # React 18 + TypeScript + TailwindCSS
│   └── vite.config.ts               # Vite 6
│
└── tests/                           # Test suite
git clone https://github.com/factspark23-hash/Agent-OS.git
cd Agent-OS
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt
python3 -m patchright install chromium
python3 main.py --headed --debug --agent-token "dev-token"

python3 -m pytest tests/ -v

| Problem | Solution |
|---|---|
| Port in use | python3 main.py --port 9000 |
| Chromium not found | python3 -m patchright install chromium |
| JWT warning | export JWT_SECRET_KEY=$(python3 -c 'import secrets; print(secrets.token_urlsafe(48))') |
| Auth failed | Check token in startup logs or .env |
| Site detects bot | Try --device iphone_14 or add --proxy |
| High RAM | python3 main.py --max-ram 500 |
| DB connection error | Check docker compose logs postgres |
| Component | Technology |
|---|---|
| Browser | Patchright (stealth Playwright) + Chromium |
| HTTP Client | curl_cffi (Chrome TLS fingerprint) |
| Database | PostgreSQL (SQLAlchemy async) |
| Cache | Redis (with in-memory fallback) |
| Auth | JWT (HS256) + API keys + legacy tokens |
| Web UI | React 18 + Vite 6 + TypeScript + TailwindCSS |
| Validation | Pydantic v2 |
| Logging | structlog |
| Runtime | Python 3.10+ / asyncio |
MIT License: free for commercial and personal use.
