factspark23-hash/Agent-OS

Agent-OS

Give any AI agent a real browser: persistent, stealthy, self-hosted.

Agent-OS is a production-grade stealth browser automation server that gives AI agents 199 browser tools: navigate, click, fill forms, extract data, handle CAPTCHAs, and more. It works with Claude, GPT-4, Codex, OpenClaw, and any agent that can send an HTTP request.

License: MIT | Python 3.10+ | Docker Ready | 199 Tools | Version 3.2.0


Key Features

🌐 Browser Automation: 199 Tools

  • Navigation: navigate, back, forward, reload, smart-navigate (auto HTTP/browser strategy)
  • Interaction: click, double-click, right-click, hover, type, press, fill-form, drag-drop, scroll, select, upload, checkbox
  • Smart Finder: smart-click, smart-find, smart-fill. Find elements by visible text, no CSS selectors needed
  • Content Extraction: get-content, get-dom, screenshot, get-links, get-images, get-text, evaluate-js
  • Page Analysis: page-summary, page-tables, page-seo, page-structured, page-emails, page-phones, page-accessibility
  • Network Capture: network-start, network-stop, network-get, network-apis, network-export (HAR/JSON)
  • Security Scanning: scan-xss, scan-sqli, scan-sensitive
  • Workflows: workflow, workflow-save, workflow-template. Multi-step automation with variables and error handling
  • Sessions: save-session, restore-session, save-creds, auto-login, get-cookies, set-cookie
  • Tabs: tabs list/new/switch/close, add-extension
  • Device Emulation: 11 presets from iPhone SE to 4K desktop
  • Transcription: transcribe audio/video via Whisper

🔧 Advanced Engines

  • Auto-Heal: Self-healing selectors. If a CSS selector breaks, the element is found by nearby text automatically
  • Auto-Retry: Circuit breaker pattern with intelligent error classification and exponential backoff
  • Smart Wait: 7 wait strategies: element, network idle, JS condition, DOM stable, page load, composed
  • Session Recording: Record browser actions, replay them, export as workflows
  • Multi-Agent Hub: Shared browser sessions, task queues, distributed locks, shared memory between agents
  • Login Handoff: Pause the AI, let a human log in, resume with cookies. The AI never sees passwords.
  • Proxy Rotation: Pool management, health checks, geo-targeting, 6 rotation strategies
  • LLM Provider: Built-in llm-complete, llm-summarize, llm-classify, llm-extract
  • AI Content Extraction: Structured data extraction with schema.org, forms, metadata
  • Query Router: Classifies queries (does this need a browser?) with 3-tier routing (rules → LLM → conservative)
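
The Auto-Retry entry above names two general techniques, a circuit breaker and exponential backoff. The sketch below illustrates how they typically combine; it is not taken from Agent-OS's auto_retry.py, and every name in it (CircuitBreaker, backoff_delays, call_with_retry, the thresholds and delays) is a hypothetical choice made for illustration.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    """Counts consecutive failures and rejects calls while the circuit is open."""

    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def allow(self):
        if self.opened_at is None:
            return True
        # Half-open: let a trial call through once the cool-down has elapsed.
        return self.clock() - self.opened_at >= self.reset_after

    def record(self, ok):
        if ok:
            self.failures, self.opened_at = 0, None
        else:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()


def backoff_delays(retries, base=0.5, cap=8.0):
    """Exponential backoff schedule: base, 2*base, 4*base, ... capped at cap."""
    return [min(cap, base * 2 ** i) for i in range(retries)]


def call_with_retry(fn, breaker, retries=4,
                    retryable=(TimeoutError, ConnectionError), sleep=time.sleep):
    """Call fn(); retry transient errors with backoff while the breaker allows it."""
    for delay in backoff_delays(retries) + [None]:
        if not breaker.allow():
            raise CircuitOpenError("circuit open; skipping call")
        try:
            result = fn()
        except retryable:
            breaker.record(False)
            if delay is None:  # retries exhausted
                raise
            sleep(delay)
        else:
            breaker.record(True)
            return result
```

Only exception types listed in retryable are retried, which is a simple stand-in for the "error classification" the feature list mentions; anything else propagates immediately.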

🛡️ Stealth Engine: 26+ Anti-Detection Vectors

  • navigator.webdriver removal (prototype-level, not defineProperty)
  • CDP detection blocking (property filtering + intercept)
  • WebGL/Canvas/Audio fingerprint spoofing (deterministic, per-session)
  • TLS fingerprint bypass (Chrome 145/146 impersonation via curl_cffi)
  • HTTP/2 fingerprint matching
  • WebRTC IP leak blocking
  • 40+ fingerprinting libraries blocked (FingerprintJS, ClientJS, BotD, CreepJS...)
  • 15+ bot detection vendors blocked (DataDome, PerimeterX, Cloudflare, Akamai, Kasada...)
  • Human-like mouse movements (Bezier curves), typing rhythms, scroll behavior
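
To make the "Bezier curves" item concrete, here is a minimal sketch of a curved mouse path built from a cubic Bezier curve. It is an illustration of the general technique only; the function name bezier_path, the spread parameter, and the way control points are randomized are assumptions, not Agent-OS's human_mimicry.py.

```python
import random


def bezier_path(start, end, steps=20, spread=0.3, rng=None):
    """Return steps+1 (x, y) points along a cubic Bezier curve from start to end.

    Two control points are pushed sideways from the straight line by a random
    amount, so the pointer follows a gentle arc instead of a straight segment.
    """
    rng = rng or random.Random()
    (x0, y0), (x3, y3) = start, end
    dx, dy = x3 - x0, y3 - y0

    def control(t):
        # Point at fraction t of the line, offset along the perpendicular (-dy, dx).
        k = rng.uniform(-spread, spread)
        return (x0 + dx * t - dy * k, y0 + dy * t + dx * k)

    (x1, y1), (x2, y2) = control(1 / 3), control(2 / 3)
    points = []
    for i in range(steps + 1):
        t = i / steps
        # Bernstein weights of the cubic Bezier curve.
        a, b, c, d = (1 - t) ** 3, 3 * (1 - t) ** 2 * t, 3 * (1 - t) * t ** 2, t ** 3
        points.append((a * x0 + b * x1 + c * x2 + d * x3,
                       a * y0 + b * y1 + c * y2 + d * y3))
    return points
```

Each returned point could then be fed to a browser driver's mouse-move call with a short pause in between; passing a seeded random.Random makes a path reproducible.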

🔌 Connectors: All 199 Tools in Every Connector

| Connector | Tools | Use With | API Key? |
|---|---|---|---|
| MCP Passthrough ⭐ | 199 | Claude Desktop, Claude Code, Codex, any MCP agent | ❌ No |
| MCP Server | 199 | Claude Desktop, Claude Code, Codex, any MCP agent | Optional |
| OpenAI | 199 | GPT-4, GPT-4o, any OpenAI-compatible API | Yes |
| Claude API | 199 | Claude API (tool-use format) | Yes |
| OpenClaw | 199 | OpenClaw agent framework | Optional |
| CLI (Bash) | 198 | Any language (Python, Node, Go, Rust...) | Token |
| HTTP REST | 198 | Direct API calls | Token |

Quick Start

Option 1: One-Command Install (Recommended)

curl -sSL https://raw.githubusercontent.com/factspark23-hash/Agent-OS/main/install.sh | bash

With options:

# Custom token
curl -sSL .../install.sh | bash -s -- --token my-secret-token

# Show browser window (debugging)
curl -sSL .../install.sh | bash -s -- --headed

# Custom port
curl -sSL .../install.sh | bash -s -- --port 9000

# Install only, don't start
curl -sSL .../install.sh | bash -s -- --no-start

Option 2: MCP Passthrough (Zero API Key, Recommended for Claude/GPT)

# Start Agent-OS server
python3 main.py --agent-token "my-secret-token"

# In another terminal, start MCP wrapper
./run_mcp.sh --token "my-secret-token"

This starts both the Agent-OS server and the MCP passthrough wrapper. No API key is needed: Claude Desktop/GPT handles reasoning, and Agent-OS handles execution. SmartCompressor provides 87% token savings.

Claude Desktop config:

{
  "mcpServers": {
    "agent-os": {
      "command": "python3",
      "args": ["/absolute/path/to/Agent-OS/connectors/mcp_passthrough.py"],
      "env": {
        "AGENT_OS_URL": "http://localhost:8001",
        "AGENT_OS_TOKEN": "my-secret-token"
      }
    }
  }
}

See MCP_WRAPPER_README.md for full documentation.

Option 3: Quickstart (Auto-Connect Everything)

curl -sSL https://raw.githubusercontent.com/factspark23-hash/Agent-OS/main/quickstart.sh | bash

This does everything install.sh does, plus:

  • Auto-detects Claude Code, Codex, OpenClaw
  • Configures MCP connections automatically
  • Prints ready-to-use connection info

Option 4: Docker Compose

git clone https://github.com/factspark23-hash/Agent-OS.git
cd Agent-OS
export POSTGRES_PASSWORD="strong-db-password"
docker compose up -d
curl http://localhost:8001/health

Full stack: PostgreSQL + Redis + Agent-OS + Nginx reverse proxy.
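
After `docker compose up -d` the containers may need a moment before /health responds. The sketch below polls the endpoint until it answers; it assumes only that a healthy server returns HTTP 200 (the response body is not documented here), and the helper names http_status and wait_for_health are hypothetical.

```python
import time
import urllib.error
import urllib.request


def http_status(url, timeout=2.0):
    """Return the HTTP status code of a GET request, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code
    except (urllib.error.URLError, OSError):
        return None


def wait_for_health(url="http://localhost:8001/health", attempts=30, interval=1.0,
                    probe=http_status, sleep=time.sleep):
    """Poll the health endpoint until it answers 200; return True on success."""
    for attempt in range(attempts):
        if probe(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(interval)
    return False
```

Calling `wait_for_health()` in a deployment script before sending the first command avoids racing the server start; the probe and sleep parameters exist so the loop can be exercised without a network.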

Option 5: Manual Install

git clone https://github.com/factspark23-hash/Agent-OS.git
cd Agent-OS
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt
python3 -m patchright install chromium

# Generate JWT secret
export JWT_SECRET_KEY=$(python3 -c 'import secrets; print(secrets.token_urlsafe(48))')

# Start server
python3 main.py --agent-token "your-token"

First Commands

# Check health
curl http://localhost:8001/health

# Navigate
curl -X POST http://localhost:8001/command \
  -H "Content-Type: application/json" \
  -d '{"token":"your-token","command":"navigate","url":"https://github.com"}'

# Screenshot
curl -X POST http://localhost:8001/command \
  -H "Content-Type: application/json" \
  -d '{"token":"your-token","command":"screenshot"}'

# Click by text (no CSS selector)
curl -X POST http://localhost:8001/command \
  -H "Content-Type: application/json" \
  -d '{"token":"your-token","command":"smart-click","text":"Sign in"}'
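
The same calls can be made from Python with only the standard library. This is a minimal sketch mirroring the curl examples above; it assumes the server replies with a JSON body (the response format is not shown in this README), and build_command and send_command are hypothetical helper names rather than part of Agent-OS.

```python
import json
import urllib.request


def build_command(token, command, **params):
    """Build the JSON body used by the curl examples: token, command, parameters."""
    return {"token": token, "command": command, **params}


def send_command(body, base_url="http://localhost:8001", timeout=60):
    """POST a command body to /command and decode the reply (assumed to be JSON)."""
    request = urllib.request.Request(
        f"{base_url}/command",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (requires a running server):
# send_command(build_command("your-token", "navigate", url="https://github.com"))
# send_command(build_command("your-token", "smart-click", text="Sign in"))
```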

AI Platform Connectors

1. MCP Passthrough (Zero API Key) ⭐ Recommended

Drop-in MCP server that works without any LLM API key. The MCP client's LLM (Claude, GPT-4) handles reasoning; Agent-OS handles execution.

# Start server + wrapper
./run_mcp.sh --token "my-secret-token"

Claude Desktop config:

{
  "mcpServers": {
    "agent-os": {
      "command": "python3",
      "args": ["/absolute/path/to/Agent-OS/connectors/mcp_passthrough.py"],
      "env": {
        "AGENT_OS_URL": "http://localhost:8001",
        "AGENT_OS_TOKEN": "my-secret-token",
        "AGENT_OS_COMPRESS": "aggressive"
      }
    }
  }
}

Features:

  • 199 tools: 192 browser + 7 LLM (built-in rule-based, no API key)
  • SmartCompressor: 87% token savings on browser results
  • Configurable compression: aggressive / normal / off
  • Works standalone (LLM tools work without Agent-OS server)
  • Helpful error messages when server is down

See MCP_WRAPPER_README.md for full docs.

2. MCP Server (Original)

Add to your config file:

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
Claude Code: ~/.claude/claude_desktop_config.json

{
  "mcpServers": {
    "agent-os": {
      "command": "python3",
      "args": ["/absolute/path/to/Agent-OS/connectors/mcp_server.py"],
      "env": {
        "AGENT_OS_URL": "http://localhost:8001",
        "AGENT_OS_TOKEN": "your-token"
      }
    }
  }
}

Restart Claude Desktop / Claude Code and the 199 browser tools appear automatically.

3. OpenAI / GPT-4

import asyncio

from connectors.openai_connector import get_tools, call_tool

# Get tool definitions
tools = get_tools("openai")  # 199 tools

# Use with OpenAI API: call_tool is awaited, so run it inside an event loop
async def main():
    result = await call_tool("browser_navigate", {"url": "https://github.com"})
    result = await call_tool("browser_screenshot", {})
    result = await call_tool("browser_smart_click", {"text": "Sign In"})

asyncio.run(main())

4. Claude API (Tool-Use)

from connectors.openai_connector import get_tools, call_tool

# Get tools in Claude format
tools = get_tools("claude")  # 199 tools

5. OpenClaw

import asyncio
from connectors.openclaw_connector import get_manifest, execute_tool

manifest = get_manifest()  # 199 tools
# execute_tool is awaited, so run it inside an event loop
result = asyncio.run(execute_tool("browser_navigate", {"url": "https://example.com"}))

6. CLI / Bash / Any Language

export AGENT_OS_TOKEN="your-token"

./connectors/agent-os-tool.sh navigate "https://github.com"
./connectors/agent-os-tool.sh screenshot
./connectors/agent-os-tool.sh smart-click "Sign In"
./connectors/agent-os-tool.sh status

Run without arguments to see all 198 commands.

7. Direct REST API

# WebSocket port: 8000
# HTTP port: 8001
# Debug UI: 8002 (with --debug)

curl -X POST http://localhost:8001/command \
  -H "Content-Type: application/json" \
  -d '{"token":"your-token","command":"navigate","url":"https://example.com"}'

Commands Reference

All 198 server commands, organized by category:

| Category | Commands | Count |
|---|---|---|
| Navigation | navigate, smart-navigate, back, forward, reload, route | 6 |
| Interaction | click, double-click, right-click, context-action, hover, type, press, fill-form, clear-input, select, upload, checkbox, drag-drop, drag-offset, scroll, wait, viewport | 17 |
| Smart Finder | smart-find, smart-find-all, smart-click, smart-fill | 4 |
| Content | get-content, get-dom, screenshot, get-links, get-images, get-text, get-attr, evaluate-js, console-logs | 9 |
| Page Analysis | page-summary, page-tables, page-seo, page-structured, page-emails, page-phones, page-accessibility, analyze, analyze-search | 9 |
| Network | network-start, network-stop, network-get, network-apis, network-detail, network-stats, network-export, network-clear | 8 |
| Security | scan-xss, scan-sqli, scan-sensitive | 3 |
| Workflows | workflow, workflow-save, workflow-template, workflow-list, workflow-status, workflow-json | 6 |
| Sessions | save-session, restore-session, list-sessions, delete-session, save-creds, auto-login, get-cookies, set-cookie | 8 |
| Tabs & Device | tabs, add-extension, emulate-device, list-devices | 4 |
| Proxy | set-proxy, get-proxy | 2 |
| Proxy Rotation | proxy-add/remove/list/check/check-all/rotate/stats/enable/disable/strategy/save/load/load-file/load-api/record/get | 16 |
| Smart Wait | smart-wait, smart-wait-element/network/js/dom/page/compose | 7 |
| Auto-Heal | heal-click/fill/hover/double-click/wait/selector/stats/clear/fingerprint/fingerprint-page | 10 |
| Auto-Retry | retry-navigate/click/fill/execute/api-call/stats/health/circuit-breakers/reset-circuit/reset-all-circuits | 10 |
| Recording | record-start/stop/pause/resume/status/list/delete/annotate | 8 |
| Replay | replay-play/stop/pause/resume/step/jump/position/events/load/export-workflow | 10 |
| Multi-Agent Hub | hub-register/unregister/agents/status/broadcast/handoff/heartbeat/lock/unlock/locks/events/audit | 12 |
| Hub Tasks | hub-task-create/claim/start/complete/fail/cancel/tasks | 7 |
| Hub Memory | hub-memory-set/get/list/delete | 4 |
| Login Handoff | login-handoff-start/status/complete/cancel/list/stats/history, detect-login-page | 8 |
| TLS HTTP | fetch, tls-get, tls-post, tls-stats | 4 |
| LLM | llm-complete/summarize/classify/extract/provider-set/token-usage/cache-clear | 7 |
| AI Content | ai-content, fill-job, structured-extract/format/schema/deduplicate | 6 |
| CAPTCHA | captcha-assess/preflight/health/monitor-start/monitor-stop/shutdown | 6 |
| Query Router | classify-query, needs-web, query-strategy, router-stats, nav-stats | 5 |
| Transcription | transcribe | 1 |
| Status | health | 1 |
| Total | | 198 |

Configuration

Environment Variables

| Variable | Required | Default | Description |
|---|---|---|---|
| JWT_SECRET_KEY | Production | Auto-generated | JWT signing key. Set for persistent sessions. |
| POSTGRES_PASSWORD | Docker | - | PostgreSQL password. |
| AGENT_TOKEN | Optional | Auto-generated | Legacy auth token. |
| DATABASE_DSN | Optional | - | PostgreSQL connection string. |
| REDIS_URL | Optional | - | Redis URL for distributed rate limiting. |
| PROXY_URL | Optional | - | HTTP/SOCKS5 proxy URL. |
| SWARM_PROVIDER_API_KEY | Optional | - | API key for LLM-based query routing. |

CLI Arguments

python3 main.py \
  --agent-token "my-token" \
  --port 8000 \
  --headed \
  --max-ram 500 \
  --proxy "http://proxy:8080" \
  --device iphone_14 \
  --persistent \
  --database "postgresql+asyncpg://..." \
  --redis "redis://localhost:6379/0" \
  --json-logs \
  --debug \
  --swarm \
  --create-tables

# --agent-token    Auth token
# --port           WebSocket port (HTTP = port+1)
# --headed         Show browser window
# --max-ram        RAM limit in MB
# --proxy          Proxy
# --device         Device preset
# --persistent     Production mode (persistent Chromium)
# --json-logs      JSON structured logging
# --debug          Debug UI (port+2)
# --swarm          Enable query routing
# --create-tables  Create DB tables on startup

Device Presets

| Preset | Type | Viewport |
|---|---|---|
| iphone_se | Mobile | 375×667 |
| iphone_14 | Mobile | 390×844 |
| iphone_14_pro_max | Mobile | 430×932 |
| galaxy_s23 | Mobile | 360×780 |
| pixel_8 | Mobile | 412×915 |
| ipad | Tablet | 768×1024 |
| ipad_pro | Tablet | 1024×1366 |
| desktop_1080 | Desktop | 1920×1080 |
| desktop_1440 | Desktop | 2560×1440 |
| desktop_4k | Desktop | 3840×2160 |

Ports

| Port | Service |
|---|---|
| 8000 | WebSocket (agent connections) |
| 8001 | HTTP REST API |
| 8002 | Debug UI (only with --debug) |
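
The three ports follow from the single --port value: the CLI comments above state that HTTP is port+1 and the debug UI is port+2. A small sketch of that arithmetic, with derive_ports as a hypothetical helper name:

```python
def derive_ports(base_port=8000, debug=False):
    """Map the --port value to the ports the server listens on."""
    ports = {"websocket": base_port, "http": base_port + 1}
    if debug:
        # The debug UI is only served when --debug is passed.
        ports["debug_ui"] = base_port + 2
    return ports
```

For example, starting with `--port 9000 --debug` would put the REST API on 9001 and the debug UI on 9002, so client URLs need to change along with the flag.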

Authentication

3-layer auth system, checked in order:

Layer 1: JWT Tokens (Recommended)

# Register
curl -X POST http://localhost:8001/auth/register \
  -H "Content-Type: application/json" \
  -d '{"email":"you@example.com","username":"admin","password":"StrongPass123!"}'

# Login
curl -X POST http://localhost:8001/auth/login \
  -H "Content-Type: application/json" \
  -d '{"username":"admin","password":"StrongPass123!"}'

# Use JWT
curl -X POST http://localhost:8001/command \
  -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIs..." \
  -H "Content-Type: application/json" \
  -d '{"command":"navigate","url":"https://example.com"}'

Layer 2: API Keys

# Create key (via JWT)
curl -X POST http://localhost:8001/auth/api-keys \
  -H "Authorization: Bearer YOUR_JWT" \
  -H "Content-Type: application/json" \
  -d '{"name":"my-key","scopes":["browser"]}'

# Use key (header)
curl -X POST http://localhost:8001/command \
  -H "X-API-Key: aos_your_key" \
  -H "Content-Type: application/json" \
  -d '{"command":"navigate","url":"https://example.com"}'

Layer 3: Legacy Tokens (Development Only)

python3 main.py --agent-token "my-dev-token"

# Use in requests
curl -X POST http://localhost:8001/command \
  -H "Content-Type: application/json" \
  -d '{"token":"my-dev-token","command":"navigate","url":"https://example.com"}'
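
The three layers differ only in where the credential travels: an Authorization header for JWTs, an X-API-Key header for API keys, and a token field in the JSON body for legacy tokens. The sketch below builds the headers and body for each case exactly as the curl examples do; build_auth_request is a hypothetical helper, not an Agent-OS function.

```python
def build_auth_request(command_body, jwt=None, api_key=None, legacy_token=None):
    """Return (headers, body) for POST /command using exactly one credential."""
    supplied = [c for c in (jwt, api_key, legacy_token) if c is not None]
    if len(supplied) != 1:
        raise ValueError("supply exactly one of jwt, api_key, legacy_token")

    headers = {"Content-Type": "application/json"}
    body = dict(command_body)  # copy so the caller's dict is not modified
    if jwt is not None:
        headers["Authorization"] = f"Bearer {jwt}"  # Layer 1
    elif api_key is not None:
        headers["X-API-Key"] = api_key              # Layer 2
    else:
        body["token"] = legacy_token                # Layer 3 (development only)
    return headers, body
```

Keeping the credential out of the body for layers 1 and 2 means request payloads can be logged without leaking secrets, which is one practical reason to prefer them over legacy tokens.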

Stealth and Anti-Detection

Agent-OS defeats bot detection with a 4-layer defense system:

Layer 1: Network

  • Chrome TLS fingerprint (JA3/JA4) via curl_cffi
  • HTTP/2 fingerprint matching (Chrome 145/146)
  • Bot detection scripts blocked at network level
  • Fake success responses for reCAPTCHA/hCaptcha

Layer 2: CDP (Chrome DevTools Protocol)

  • Page.addScriptToEvaluateOnNewDocument injection
  • User-Agent metadata spoofing
  • Timezone and locale override

Layer 3: JavaScript (19 injection modules)

  • navigator.webdriver removal (prototype-level)
  • CDP property filtering
  • Chrome object completeness (runtime, app, csi, loadTimes)
  • WebGL/Canvas/Audio fingerprint spoofing
  • WebRTC IP leak prevention
  • Function toString masking
  • Stack trace sanitization

Layer 4: Behavior

  • Bezier-curve mouse movements with micro-tremor
  • Realistic typing rhythms (40-300ms per keystroke)
  • Word pause simulation (200-600ms)
  • Typo simulation with correction (3% rate)
  • Natural scroll with micro-variance
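
The typing figures above (40-300 ms per keystroke, 200-600 ms word pauses, a 3% typo rate) can be turned into a simple schedule. This is a rough sketch under those stated numbers only; the uniform distributions, the choice to add the word pause on spaces, and the names keystroke_delays and plan_keys are assumptions rather than the behavior of Agent-OS's human_mimicry.py.

```python
import random


def keystroke_delays(text, rng=None, key_ms=(40, 300), word_pause_ms=(200, 600)):
    """Return one delay in milliseconds per character of text.

    Every key gets a delay drawn from key_ms; a space adds an extra word pause.
    """
    rng = rng or random.Random()
    delays = []
    for ch in text:
        delay = rng.uniform(*key_ms)
        if ch == " ":
            delay += rng.uniform(*word_pause_ms)
        delays.append(delay)
    return delays


def plan_keys(text, rng=None, typo_rate=0.03):
    """Return the keys to press, occasionally inserting a wrong letter and a Backspace."""
    rng = rng or random.Random()
    keys = []
    for ch in text:
        if ch.isalpha() and rng.random() < typo_rate:
            keys += [rng.choice("abcdefghijklmnopqrstuvwxyz"), "Backspace"]
        keys.append(ch)
    return keys
```

Replaying plan_keys output always yields the original text, because every inserted typo is immediately followed by its correction.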

Blocked Vendors

DataDome, PerimeterX, Imperva, Akamai, Cloudflare Bot Management, Cloudflare Turnstile, Kasada, Shape Security, F5, Arkose Labs, ThreatMetrix, hCaptcha, reCAPTCHA


Architecture Overview

┌─────────────────────────────────────────────────────────────┐
│  External Clients                                           │
│  Claude Desktop │ GPT-4 │ Codex │ CLI │ HTTP/WS             │
└────────┬────────┴───┬───┴───┬───┴──┬──┴────┬────────────────┘
         │            │       │      │       │
         ▼            ▼       ▼      ▼       ▼
┌─────────────────────────────────────────────────────────────┐
│  Connectors (199 tools each)                                │
│  MCP │ OpenAI │ Claude │ OpenClaw │ CLI │ REST+WebSocket    │
└──────┴─┬──────┴───┬────┴─────┬────┴───┬─┴───────┬───────────┘
         │          │          │        │         │
         └──────────┴────┬─────┴────────┴─────────┘
                         ▼
┌─────────────────────────────────────────────────────────────┐
│  Agent Server (aiohttp)                                     │
│  Auth │ Rate Limiter │ Validator │ Command Router           │
└────────────────────────┬────────────────────────────────────┘
                         │
              ┌──────────┼──────────┐
              ▼          ▼          ▼
┌──────────────┐ ┌──────────────┐ ┌────────────────┐
│ Browser      │ │ Tools Layer  │ │ Infrastructure │
│ (Patchright  │ │ Smart Finder │ │ PostgreSQL     │
│  + Stealth)  │ │ Workflows    │ │ Redis          │
│ 26+ vectors  │ │ Auto-Heal    │ │ JWT Auth       │
│              │ │ Recording    │ │ Logging        │
│              │ │ LLM Provider │ │                │
└──────────────┘ └──────────────┘ └────────────────┘

Production Deployment

Docker Compose (Recommended)

export JWT_SECRET_KEY=$(python3 -c 'import secrets; print(secrets.token_urlsafe(48))')
export POSTGRES_PASSWORD="strong-db-password"
docker compose --profile with-nginx up -d

Production Checklist

  • Set JWT_SECRET_KEY (without it, sessions don't survive restarts)
  • Use JWT + API keys (legacy tokens are dev-only)
  • Run behind firewall (use Nginx + SSL for public exposure)
  • Enable Redis for distributed rate limiting
  • Set up PostgreSQL for multi-instance deployments
  • Monitor RAM (~500MB idle, ~800MB under load)
  • Configure CORS for your domain

Scaling

| Config | Concurrent Users | Memory |
|---|---|---|
| 1 instance × 50 contexts | 50 | ~800 MB |
| 3 instances × 50 contexts | 150 | ~2.4 GB |
| 5 instances × 50 contexts | 250 | ~4 GB |
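
The table scales linearly: each instance serves 50 browser contexts and uses roughly 800 MB under load. A small sketch of that arithmetic for sizing other deployments, assuming the linear relationship holds; estimate_capacity and instances_needed are hypothetical helper names.

```python
import math


def estimate_capacity(instances, contexts_per_instance=50, mb_per_instance=800):
    """Estimate concurrent users and memory for a number of instances."""
    return {
        "concurrent_users": instances * contexts_per_instance,
        "memory_mb": instances * mb_per_instance,
    }


def instances_needed(users, contexts_per_instance=50):
    """Smallest number of instances that can serve the given concurrent users."""
    return math.ceil(users / contexts_per_instance)
```

For instance, 120 concurrent users would need 3 instances, which the table puts at about 2.4 GB of RAM.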

Project Structure

Agent-OS/
├── main.py                          # Entry point
├── install.sh                       # One-command installer
├── quickstart.sh                    # Install + auto-connect
├── docker-compose.yml               # Full Docker stack
├── Dockerfile                       # Multi-stage build
├── requirements.txt                 # Python dependencies
├── alembic.ini                      # DB migrations
│
├── src/
│   ├── core/                        # Browser engine
│   │   ├── browser.py               #   Main browser (Patchright/Chromium)
│   │   ├── http_client.py           #   TLS HTTP client (curl_cffi)
│   │   ├── stealth.py               #   Anti-detection JS
│   │   ├── cdp_stealth.py           #   CDP-level stealth
│   │   ├── stealth_god.py           #   GOD MODE (26+ vectors)
│   │   ├── tls_spoof.py             #   TLS fingerprint spoofing
│   │   ├── tls_proxy.py             #   TLS proxy
│   │   ├── smart_navigator.py       #   Auto HTTP/browser strategy
│   │   ├── persistent_browser.py    #   Production persistent Chromium
│   │   ├── llm_provider.py          #   LLM integration
│   │   ├── config.py                #   Configuration (YAML)
│   │   └── session.py               #   Session management
│   │
│   ├── auth/                        # Authentication
│   │   ├── jwt_handler.py           #   JWT (HS256)
│   │   ├── api_key_manager.py       #   API keys (aos_ prefix)
│   │   ├── user_manager.py          #   User registration
│   │   └── middleware.py            #   3-layer auth middleware
│   │
│   ├── security/                    # Stealth & Evasion
│   │   ├── evasion_engine.py        #   Fingerprint generation
│   │   ├── captcha_bypass.py        #   CAPTCHA detection
│   │   ├── captcha_solver.py        #   CAPTCHA solving
│   │   ├── captcha_preempt.py       #   CAPTCHA preemption
│   │   ├── cloudflare_bypass.py     #   Cloudflare bypass
│   │   └── human_mimicry.py         #   Human behavior simulation
│   │
│   ├── tools/                       # Feature engines
│   │   ├── smart_finder.py          #   Find by visible text
│   │   ├── workflow.py              #   Multi-step workflows
│   │   ├── network_capture.py       #   Network request capture
│   │   ├── page_analyzer.py         #   Page analysis
│   │   ├── form_filler.py           #   Form filling
│   │   ├── auto_heal.py             #   Self-healing selectors
│   │   ├── auto_retry.py            #   Auto-retry + circuit breaker
│   │   ├── session_recording.py     #   Record & replay
│   │   ├── multi_agent.py           #   Multi-agent hub
│   │   ├── proxy_rotation.py        #   Proxy pool management
│   │   ├── login_handoff.py         #   Human-in-the-loop login
│   │   ├── ai_content.py            #   AI content extraction
│   │   ├── web_query_router.py      #   Query classification
│   │   └── transcriber.py           #   Audio/video transcription
│   │
│   ├── agents/
│   │   └── server.py                # WebSocket + HTTP server (198 commands)
│   │
│   ├── agent_swarm/                 # Query routing system
│   │   ├── router/                  #   3-tier router
│   │   ├── agents/                  #   Agent profiles
│   │   └── search/                  #   Search backends
│   │
│   ├── infra/                       # Infrastructure
│   │   ├── database.py              #   PostgreSQL (async)
│   │   ├── redis_client.py          #   Redis
│   │   └── logging.py               #   Structured logging
│   │
│   └── validation/
│       └── schemas.py               # Input validation (Pydantic v2)
│
├── connectors/                      # AI Platform Connectors
│   ├── _tool_registry.py            #   199 tool definitions (source of truth)
│   ├── mcp_server.py                #   MCP (Claude/Codex)
│   ├── openai_connector.py          #   OpenAI function-calling
│   ├── openclaw_connector.py        #   OpenClaw
│   ├── agent-os-tool.sh             #   CLI (198 commands)
│   └── mcp_config.json              #   MCP config template
│
├── web/                             # React Web UI
│   ├── src/                         #   React 18 + TypeScript + TailwindCSS
│   └── vite.config.ts               #   Vite 6
│
└── tests/                           # Test suite

Development

git clone https://github.com/factspark23-hash/Agent-OS.git
cd Agent-OS
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt
python3 -m patchright install chromium
python3 main.py --headed --debug --agent-token "dev-token"

Testing

python3 -m pytest tests/ -v

Troubleshooting

| Problem | Solution |
|---|---|
| Port in use | python3 main.py --port 9000 |
| Chromium not found | python3 -m patchright install chromium |
| JWT warning | export JWT_SECRET_KEY=$(python3 -c 'import secrets; print(secrets.token_urlsafe(48))') |
| Auth failed | Check token in startup logs or .env |
| Site detects bot | Try --device iphone_14 or add --proxy |
| High RAM | python3 main.py --max-ram 500 |
| DB connection error | Check docker compose logs postgres |

Tech Stack

| Component | Technology |
|---|---|
| Browser | Patchright (stealth Playwright) + Chromium |
| HTTP Client | curl_cffi (Chrome TLS fingerprint) |
| Database | PostgreSQL (SQLAlchemy async) |
| Cache | Redis (with in-memory fallback) |
| Auth | JWT (HS256) + API keys + legacy tokens |
| Web UI | React 18 + Vite 6 + TypeScript + TailwindCSS |
| Validation | Pydantic v2 |
| Logging | structlog |
| Runtime | Python 3.10+ / asyncio |

License

MIT License: free for commercial and personal use.