A voice-controlled, multi-agent AI system that sees, understands, plans, and acts on your Android device in real-time.
AURA is a production-grade Android UI automation backend. A user speaks a natural language command — AURA captures the screen, parses the UI tree, plans a series of atomic actions, executes gestures on the real device via Android Accessibility API, and speaks a natural response back.
Example: "Open Spotify and play my liked songs" → AURA opens the app, locates the Liked Songs button visually, taps it, starts playback, and says "Done — your liked songs are playing."
Built as a submission to the Gemini Live Agent Challenge, AURA integrates Google ADK, Gemini Live bidirectional audio+vision, and Cloud Run deployment.
- Architecture
- The 9 Agents
- LangGraph Orchestration
- Tri-Provider Model Architecture
- Perception Pipeline
- Reactive Step Generator
- Google Cloud Integration
- Android Companion App
- WebSocket Endpoints
- REST API
- Installation
- Configuration
- Running
- Cloud Run Deployment
- Safety & Policies
┌─────────────────────────────────────────────────────────────────────┐
│ USER SPEAKS COMMAND │
│ (Android companion app, mic button) │
└─────────────────────────┬───────────────────────────────────────────┘
│ WebSocket /ws/live or /ws/audio
▼
┌─────────────────────────────────────────────────────────────────────┐
│ FastAPI + LangGraph │
│ │
│ Audio → [STT: Groq Whisper] → transcript │
│ ↓ │
│ [Commander Agent] → IntentObject (action, target, confidence) │
│ ↓ │
│ [Safety: Llama Prompt Guard 2 86M] │
│ ↓ │
│ Smart Router ─────────────────────────────────────────────────┐ │
│ NO_UI actions → Coordinator (skip perception) │ │
│ Complex/multi-step → Coordinator │ │
│ UI actions → Perception → Coordinator │ │
│ │ │
│ ┌──────────────────────────────────────────────────────────┐ │ │
│ │ COORDINATOR (perceive→decide→act→verify) │ │ │
│ │ │ │ │
│ │ ┌─────────┐ ┌─────────┐ ┌───────┐ ┌──────────┐ │ │ │
│ │ │Perceiver│ → │ Planner │ → │ Actor │ → │Verifier │ │ │ │
│ │ │UI+Vision│ │ Phases │ │ Zero- │ │Post-state│ │ │ │
│ │ │ SoM │ │Reactive │ │ LLM │ │ capture │ │ │ │
│ │ └─────────┘ └─────────┘ └───────┘ └──────────┘ │ │ │
│ │ ↑ │ │ │ │
│ │ └────── 5-step Retry ───────┘ │ │ │
│ │ Ladder + Replan │ │ │
│ └──────────────────────────────────────────────────────────┘ │ │
│ ↓ │ │
│ [Responder Agent] → natural language reply │ │
│ ↓ │
│ [TTS: Edge-TTS en-US-AriaNeural] → WAV audio │
└─────────────────────────────────────────────────────────────────────┘
│
▼
Android device executes gesture
via Accessibility API (no root)
Request lifecycle:
- Audio arrives over WebSocket → STT transcription via Groq Whisper Large v3 Turbo
- Intent parsed by `CommanderAgent` (rule-based + LLM fallback)
- Safety screened by Llama Prompt Guard 2 86M
- Smart routing based on action type and complexity
- Coordinator drives 9 agents through the perceive→decide→act→verify loop
- Gesture executed via Android Accessibility API after OPA Rego policy check
- Response spoken via Edge-TTS (Microsoft, no API key required)
All agents are single-responsibility — located in agents/.
| Agent | File | Responsibility |
|---|---|---|
| Commander | `commander.py` | Parses voice transcript → structured IntentObject using rule-based classifier (fast) with LLM fallback |
| Planner | `planner_agent.py` | Decomposes goal into skeleton phases + ordered atomic Subgoal list (max 12 words each) |
| Perceiver | `perceiver_agent.py` | Captures screenshot + UI tree → ScreenState with Set-of-Marks annotations; never returns raw pixel coordinates |
| Actor | `actor_agent.py` | Zero-LLM deterministic gesture executor (tap, type, scroll, swipe, open_app, etc.) |
| Responder | `responder.py` | Generates natural TTS-ready conversational responses via LLM; strips markdown for clean speech |
| Verifier | `verifier_agent.py` | Waits for UI to settle post-gesture, captures post-action state, detects error dialogs |
| Validator | `validator.py` | Fast rule-based intent validation — no LLM calls, zero latency |
| ScreenVLM | `visual_locator.py` | Tri-layer visual perception: UI tree → YOLOv8 CV → VLM selection from numbered Set-of-Marks elements |
| Coordinator | `coordinator.py` | Orchestrates the other 8 agents through perceive→decide→act→verify with 5-step retry ladder and adaptive replanning |
The VLM never returns raw pixel coordinates. It only selects among numbered Set-of-Marks elements detected by YOLOv8. This invariant must never be broken.
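To make the invariant concrete, here is a minimal sketch of index-only selection (the `SomElement` and `parse_som_selection` names are illustrative, not the actual `visual_locator.py` API): the VLM reply is accepted only if it is a bare index into the YOLOv8 detections, and coordinates stay server-side.

```python
# Illustrative sketch only — not the actual visual_locator.py implementation.
# The VLM may only answer with a Set-of-Marks index, which is validated
# against the YOLOv8 detections before any tap is issued.
from dataclasses import dataclass

@dataclass
class SomElement:
    index: int                            # number drawn on the annotated screenshot
    bbox: tuple[int, int, int, int]       # kept server-side, never handed to the VLM as a target

def parse_som_selection(vlm_reply: str, elements: list[SomElement]) -> SomElement:
    """Accept only a bare element index; reject anything that looks like coordinates."""
    reply = vlm_reply.strip()
    if not reply.isdigit():
        raise ValueError(f"VLM must return an element index, got: {reply!r}")
    by_index = {e.index: e for e in elements}
    idx = int(reply)
    if idx not in by_index:
        raise ValueError(f"Index {idx} is not one of the {len(elements)} detected elements")
    return by_index[idx]                  # the Actor resolves this to a tap point internally
```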
File: aura_graph/graph.py
The graph is a StateGraph(TaskState) with 15 nodes, compiled via compile_aura_graph() at server startup.
__start__
│
├─ audio → [stt] → [parse_intent]
└─ text/streaming → [parse_intent]
│
┌───────────────┼───────────────────┐
│ │ │
[coordinator] [perception]→[coordinator] [speak]
│ │
└───────────────────────────────────┘
│
[speak] → END
Conditional routing (edges.py):
- `NO_UI` actions (open_app, scroll, system) → coordinator directly (skip perception)
- Multi-step commands (contains "and", "then") → coordinator
- Conversational intents → speak directly
- UI actions → perception → coordinator
- Low confidence / general_interaction → coordinator for full planning
Retry loop (within coordinator):
perceive → decide → act → verify
↑ │
└── 5-step retry ladder ───┘
1. SAME_ACTION (retry exact)
2. ALTERNATE_SELECTOR (different element)
3. SCROLL_AND_RETRY (scroll to find)
4. VISION_FALLBACK (VLM coordinate mode)
5. ABORT → replan (max 3 replans before giving up)
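A schematic sketch of the ladder as code (the `act`, `verify`, and `replan` callables stand in for the Actor, Verifier, and Planner; this is not the Coordinator's actual implementation):

```python
# Hedged sketch of the 5-step retry ladder described above — illustrative names only.
from enum import Enum, auto

class RetryStrategy(Enum):
    SAME_ACTION = auto()
    ALTERNATE_SELECTOR = auto()
    SCROLL_AND_RETRY = auto()
    VISION_FALLBACK = auto()
    ABORT = auto()

LADDER = list(RetryStrategy)
MAX_REPLANS = 3

async def execute_with_retries(subgoal, act, verify, replan) -> bool:
    """Escalate through the ladder; ABORT triggers a replan (max 3) before giving up."""
    replans = 0
    while replans <= MAX_REPLANS:
        for strategy in LADDER:
            if strategy is RetryStrategy.ABORT:
                break                              # ladder exhausted → replan
            await act(subgoal, strategy)           # execute the gesture with this strategy
            if await verify(subgoal):              # post-state capture confirms success
                return True
        replans += 1
        subgoal = await replan(subgoal)            # adaptive replanning
    return False
```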
State (aura_graph/state.py): TaskState TypedDict with ~40 fields, custom LangGraph reducers:
- `error_message` — accumulates (joins multiple errors with `;`)
- `status` — last-writer-wins
- `current_step` — takes maximum value (concurrent-safe)
- `end_time` — first-writer-wins (preserves actual completion time)
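In LangGraph, such per-field reducers are declared with `typing.Annotated`; a minimal sketch of how the four reducers above can be expressed (the real `aura_graph/state.py` has ~40 fields and its own implementations):

```python
# Sketch of custom LangGraph reducers — illustrative, not the actual state.py.
from typing import Annotated, Optional, TypedDict

def join_errors(current: str, update: str) -> str:
    # error_message accumulates: multiple errors are joined with ";"
    return f"{current};{update}" if current and update else (update or current)

def last_writer_wins(current, update):
    return update if update is not None else current

def take_maximum(current: int, update: int) -> int:
    # current_step keeps the highest value seen, safe under concurrent node updates
    return max(current or 0, update or 0)

def first_writer_wins(current, update):
    # end_time keeps the first value written (the actual completion time)
    return current if current is not None else update

class TaskState(TypedDict, total=False):
    error_message: Annotated[str, join_errors]
    status: Annotated[str, last_writer_wins]
    current_step: Annotated[int, take_maximum]
    end_time: Annotated[Optional[float], first_writer_wins]
    # ... ~36 more fields in the real aura_graph/state.py
```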
AURA uses a tri-provider strategy: Groq (primary, speed), Gemini (fallback, quality), NVIDIA NIM (optional, scale).
| Task | Provider | Model | Notes |
|---|---|---|---|
| Intent parsing | Groq | llama-3.1-8b-instant | 560 T/s, <300 tokens |
| Low-confidence intent | Groq | llama-3.3-70b-versatile | fallback |
| Planning / reasoning | Groq | meta-llama/llama-4-scout-17b-16e-instruct | 16 experts |
| Planning fallback | Gemini | gemini-2.5-flash | |
| Response generation | Groq | meta-llama/llama-4-scout-17b-16e-instruct | |
| Intent classification | OpenRouter | z-ai/glm-4.5-air:free | free tier |
| Intent classif. fallback | OpenRouter | meta-llama/llama-3.3-70b-instruct:free | |
| Intent classif. fallback 2 | Groq | llama-3.3-70b-versatile | |
| Safety screening | Groq | meta-llama/llama-prompt-guard-2-86m | specialized |
| ADK root agent | Google ADK | gemini-2.5-flash | |
| Gemini Live bidi | Gemini | gemini-2.0-flash-live-001 | |
| Task | Provider | Model |
|---|---|---|
| UI analysis / screen understanding | Groq | meta-llama/llama-4-scout-17b-16e-instruct |
| Visual element selection (SoM) | Groq | meta-llama/llama-4-scout-17b-16e-instruct |
| VLM fallback | Gemini | gemini-2.5-flash |
| Service | Provider | Model / Voice |
|---|---|---|
| Speech-to-Text | Groq | whisper-large-v3-turbo (16 kHz PCM mono) |
| Text-to-Speech | Edge-TTS (Microsoft) | en-US-AriaNeural (default, no API key) |
| Gemini Live voice | Gemini | Charon (configurable) |
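For reference, generating speech with the `edge-tts` package and the default voice takes only a few lines (generic library usage, not AURA's `services/tts.py`):

```python
# Minimal edge-tts example using the default AURA voice (en-US-AriaNeural).
# Generic library usage — not AURA's services/tts.py implementation.
import asyncio
import edge_tts

async def speak(text: str, out_path: str = "reply.mp3") -> None:
    communicate = edge_tts.Communicate(text, voice="en-US-AriaNeural")
    await communicate.save(out_path)   # no API key required

asyncio.run(speak("Done — your liked songs are playing."))
```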
Files: services/perception_controller.py, perception/omniparser_detector.py, perception/vlm_selector.py
Three-layer hybrid — each layer only runs if the previous is insufficient:
Layer 1: Android Accessibility UI Tree
→ Package name, activity, all interactive elements
→ Fast (~50 ms), but misses visual-only elements
Layer 2: YOLOv8 OmniParser (CV Detection)
→ Detects clickable elements in screenshot
→ Draws numbered Set-of-Marks boxes on image
→ Pixel-accurate without returning coordinates
Layer 3: VLM Selection (ScreenVLM)
→ Receives annotated screenshot with numbered elements
→ Returns index of target element (never raw coordinates)
→ Falls back to Gemini 2.5 Flash if Groq fails
Perception modalities (configurable via DEFAULT_PERCEPTION_MODALITY):
- `hybrid` — UI tree + vision (default, most reliable)
- `ui_tree` — fast path for settings-style apps
- `vision` — screenshot-only for canvas/WebView apps
- `auto` — controller selects based on app type
Caching: Screenshots cached for 2 s (configurable), invalidated after 1 gesture.
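A small sketch of that caching behaviour (illustrative, not the `perception_controller.py` code):

```python
# Sketch of the screenshot cache described above: 2 s TTL, invalidated after a gesture.
import time
from typing import Optional

class ScreenshotCache:
    def __init__(self, ttl_seconds: float = 2.0):
        self.ttl = ttl_seconds
        self._frame: Optional[bytes] = None
        self._taken_at: float = 0.0

    def get(self) -> Optional[bytes]:
        if self._frame and (time.monotonic() - self._taken_at) < self.ttl:
            return self._frame          # fresh enough: skip a device round-trip
        return None

    def put(self, frame: bytes) -> None:
        self._frame = frame
        self._taken_at = time.monotonic()

    def invalidate(self) -> None:
        # called right after a gesture, since the screen has likely changed
        self._frame = None
```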
File: services/reactive_step_generator.py
Instead of committing to a full upfront plan (which breaks when screens deviate), AURA generates one concrete action at a time grounded in the live screen.
async def generate_next_step(
goal, # what user wants to accomplish
screen_context, # current screen description
step_history, # what was done so far
screenshot_b64, # actual screenshot bytes
ui_hints, # UI tree labels
) -> Subgoal: # ONE next action

The planner creates skeleton phases (e.g., "Open Spotify", "Navigate to Liked Songs", "Start Playback"). For each phase, the reactive generator asks a VLM: "given the current screen and this phase goal, what is the single next UI action?" — grounding every decision in real screen state.
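Conceptually, driving one phase then looks like the sketch below; `perceive`, `act`, and `phase_is_done` are illustrative stand-ins for the Perceiver, Actor, and Verifier, and the `ScreenState` attribute names are assumed:

```python
# Hedged sketch of how the Coordinator could drive one phase step by step.
async def drive_phase(phase_goal: str, perceive, act, phase_is_done, max_steps: int = 10) -> bool:
    step_history: list = []
    for _ in range(max_steps):
        screen = await perceive()                       # fresh ScreenState + annotated screenshot
        if await phase_is_done(phase_goal, screen):     # Verifier-style completion check
            return True
        subgoal = await generate_next_step(
            goal=phase_goal,
            screen_context=screen.description,          # attribute names assumed for this sketch
            step_history=step_history,
            screenshot_b64=screen.screenshot_b64,
            ui_hints=screen.ui_hints,
        )
        await act(subgoal)                              # Actor executes exactly one gesture
        step_history.append(subgoal)
    return False                                        # phase did not complete within max_steps
```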
┌─────────────────────────────────────────────────────────┐
│ Google Cloud │
│ │
│ ┌──────────────────┐ ┌───────────────────────────┐ │
│ │ Cloud Run │ │ Cloud Storage │ │
│ │ aura-backend │───▶│ aura-execution-logs/ │ │
│ │ (this server) │ │ logs/{task_id}.html │ │
│ └────────┬─────────┘ └───────────────────────────┘ │
│ │ │
│ ┌────────▼─────────┐ │
│ │ Google ADK │ │
│ │ root_agent │ │
│ │ gemini-2.5-flash│ │
│ │ + FunctionTool │ │
│ │ execute_aura_ │ │
│ │ task() │ │
│ └────────┬─────────┘ │
│ │ │
│ ┌────────▼─────────┐ │
│ │ Gemini Live │ │
│ │ /ws/live │ │
│ │ gemini-2.0- │ │
│ │ flash-live-001 │ │
│ │ Bidi audio+vis │ │
│ └──────────────────┘ │
└─────────────────────────────────────────────────────────┘
root_agent = Agent(
name="AURA",
model="gemini-2.5-flash",
tools=[aura_tool], # FunctionTool wrapping execute_aura_task_from_text()
)

Lazy initialization — set_compiled_graph(app) must be called from main.py lifespan before any tool invocation. The tool returns success, response, steps_taken, and execution_log_url.
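A simplified sketch of that startup wiring (assuming `set_compiled_graph` receives the compiled LangGraph app; the real `main.py` lifespan does more):

```python
# Simplified sketch of the lazy-initialization wiring — not the full main.py.
from contextlib import asynccontextmanager
from fastapi import FastAPI

from aura_graph.graph import compile_aura_graph
from adk_agent import set_compiled_graph

@asynccontextmanager
async def lifespan(app: FastAPI):
    compiled = compile_aura_graph()      # compile the StateGraph once at startup
    set_compiled_graph(compiled)         # make it available to execute_aura_task_from_text()
    yield                                # server runs

app = FastAPI(lifespan=lifespan)
```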
Enabled when GEMINI_LIVE_ENABLED=true. Registers /ws/live in main.py.
Features:
- Full Voice Activity Detection (`prefix_padding_ms=160`, `silence_duration_ms=650`, high start/end sensitivity)
- Barge-in support (`START_OF_ACTIVITY_INTERRUPTS`)
- Thinking content filter — strips `**Bold**` reasoning headers from model output
- Transcript accumulation across sub-turns until `turn_complete`
- Non-blocking live request queue (audio + screenshot frames)
Message protocol (Android ↔ server):
| Direction | Type | Payload |
|---|---|---|
| Android → Server | `audio_chunk` | PCM 16 kHz mono int16, base64 |
| Android → Server | `screenshot` | JPEG base64 |
| Android → Server | `text_command` | plain text fallback |
| Server → Android | `audio_response` | PCM 24 kHz mono int16, base64 |
| Server → Android | `transcript` | incremental + final text |
| Server → Android | `task_progress` | "executing" or "idle" |
After every task, the HTML execution log is uploaded to Cloud Storage:
log_url = await upload_log_to_gcs_async(log_path, task_id)
# → gs://aura-execution-logs/logs/{task_id}.html (public URL)

Non-fatal: failures are logged as warnings only. Disabled by default (GCS_LOGS_ENABLED=false).
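A sketch of what such a non-fatal upload helper can look like with `google-cloud-storage` (illustrative; the real `gcs_log_uploader.py` may differ):

```python
# Sketch of a non-fatal async GCS upload — illustrative, not gcs_log_uploader.py.
import asyncio, logging
from google.cloud import storage

logger = logging.getLogger(__name__)

async def upload_log_to_gcs_async(log_path: str, task_id: str,
                                  bucket_name: str = "aura-execution-logs") -> str | None:
    def _upload() -> str:
        client = storage.Client()
        blob = client.bucket(bucket_name).blob(f"logs/{task_id}.html")
        blob.upload_from_filename(log_path, content_type="text/html")
        return f"https://storage.googleapis.com/{bucket_name}/logs/{task_id}.html"
    try:
        return await asyncio.to_thread(_upload)    # keep the event loop free
    except Exception as exc:                       # non-fatal: warn, never fail the task
        logger.warning("GCS log upload failed: %s", exc)
        return None
```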
gcloud run deploy aura-backend \
--source . \
--region us-central1 \
--allow-unauthenticated \
--memory 2Gi \
--cpu 2 \
--timeout 3600 \
--set-secrets="GOOGLE_API_KEY=...,GROQ_API_KEY=...,GEMINI_API_KEY=..."

The server reads $PORT from Cloud Run automatically via Pydantic Settings. YOLOv8 is pre-warmed at Docker build time to eliminate cold-start latency.
Located in UI/. Kotlin + Jetpack Compose.
Handles continuous voice capture and WebSocket communication:
companion object {
const val SAMPLE_RATE = 16000 // 16 kHz
const val AUDIO_FORMAT = PCM_16BIT // int16 mono
const val CHUNK_MS = 100 // 100 ms chunks
const val SCREENSHOT_INTERVAL_MS = 3000 // every 3 s
const val UI_TREE_INTERVAL_MS = 5000 // every 5 s
const val PING_INTERVAL_MS = 25_000 // keepalive
}

- Continuous listen mode — no push-to-talk button required
- Audio chunks encoded as base64 and sent as WebSocket frames
- Screenshots and UI tree sent on independent timers
- Auto-silence detection after 8 s of post-response inactivity
Manages conversation state as StateFlow:
- `ConversationState` — current phase, session ID, connection status
- `List<ConversationMessage>` — full message history
- `List<AgentOutput>` — per-agent status updates (for debug overlay)
| Endpoint | Direction | Purpose | Notes |
|---|---|---|---|
| `ws://host/ws/audio` | Bidi | Legacy voice command streaming | Android app dependency — path must not change |
| `ws://host/ws/device` | Bidi | Device screenshot + UI tree polling | Android app dependency — path must not change |
| `ws://host/ws/live` | Bidi | Gemini Live bidi audio+vision | Requires GEMINI_LIVE_ENABLED=true |
| `ws://host/api/v1/tasks/ws` | Server→Client | Live task event streaming | For demo dashboard |
| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Health check (legacy) |
| `/api/v1/health` | GET | Health check (versioned) |
| `/api/v1/graph/execute` | POST | Execute task from text |
| `/api/v1/device/screenshot` | GET | Live screenshot |
| `/demo` | GET | Judge dashboard (live screenshot, recent commands, GCS log links) |
| `/docs` | GET | OpenAPI docs (development only) |
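A hedged example of calling the execute endpoint from a script; the JSON field name (`text`) and the `X-API-Key` header are assumptions, not the documented request schema:

```python
# Illustrative client call — request/response field names are assumptions.
import requests

resp = requests.post(
    "http://localhost:8000/api/v1/graph/execute",
    json={"text": "Open Spotify and play my liked songs"},
    headers={"X-API-Key": "your-secret-key"},   # only if REQUIRE_API_KEY=true
    timeout=120,
)
resp.raise_for_status()
print(resp.json())
```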
- Python 3.11+
- Android device with USB debugging + Accessibility Service enabled
- `adb` in PATH
- Groq API key (required), Gemini API key (required), others optional
git clone <repo>
cd aura-live
# Install dependencies (note: filename has a space)
pip install -r "requirements copy.txt"
# Copy and fill in your API keys
cp .env.example .env

All settings flow through config/settings.py (Pydantic BaseSettings). Never read os.environ directly.
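A minimal sketch of that pattern with `pydantic-settings` (field names beyond those in the .env example below are illustrative):

```python
# Sketch of the Pydantic Settings pattern used by config/settings.py — illustrative.
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env", extra="ignore")

    groq_api_key: str
    gemini_api_key: str
    host: str = "0.0.0.0"
    port: int = 8000             # Cloud Run's injected $PORT is picked up automatically
    require_api_key: bool = True

settings = Settings()            # import this everywhere instead of reading os.environ
```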
Create a .env file:
# ── Required ──────────────────────────────────────
GROQ_API_KEY=gsk_...
GEMINI_API_KEY=AIza...
GOOGLE_API_KEY=AIza... # same key, needed by google-genai SDK
# ── Providers (defaults shown) ────────────────────
DEFAULT_LLM_PROVIDER=groq
DEFAULT_VLM_PROVIDER=groq
DEFAULT_STT_PROVIDER=groq
DEFAULT_TTS_PROVIDER=edge-tts
PLANNING_PROVIDER=groq
# ── Models (defaults shown) ───────────────────────
DEFAULT_LLM_MODEL=llama-3.1-8b-instant
DEFAULT_VLM_MODEL=meta-llama/llama-4-scout-17b-16e-instruct
PLANNING_MODEL=meta-llama/llama-4-scout-17b-16e-instruct
DEFAULT_STT_MODEL=whisper-large-v3-turbo
DEFAULT_TTS_MODEL=en-US-AriaNeural
# ── Perception ────────────────────────────────────
DEFAULT_PERCEPTION_MODALITY=hybrid # ui_tree | hybrid | vision | auto
PERCEPTION_CACHE_ENABLED=true
PERCEPTION_CACHE_TTL=2.0
# ── Google Cloud (for Gemini Live + GCS logs) ─────
GOOGLE_CLOUD_PROJECT=your-gcp-project
GOOGLE_CLOUD_REGION=us-central1
GCS_LOGS_BUCKET=aura-execution-logs
GCS_LOGS_ENABLED=false # set true to upload HTML logs
GEMINI_LIVE_ENABLED=false # set true to enable /ws/live
GEMINI_LIVE_MODEL=gemini-2.0-flash-live-001
GEMINI_LIVE_VOICE=Charon # Aoede | Charon | Fenrir | Kore | Puck | ...
# ── Optional providers ────────────────────────────
NVIDIA_API_KEY=...
OPENROUTER_API_KEY=...
# ── LangGraph limits ──────────────────────────────
GRAPH_RECURSION_LIMIT=100
GRAPH_TIMEOUT_SECONDS=120.0
# ── Server ────────────────────────────────────────
HOST=0.0.0.0
PORT=8000
ENVIRONMENT=development
LOG_LEVEL=DEBUG
RELOAD=true
# ── Security ──────────────────────────────────────
REQUIRE_API_KEY=true
DEVICE_API_KEY=your-secret-key

# Start the server
python main.py
# → http://0.0.0.0:8000
# → Docs: http://localhost:8000/docs
# → Health: GET http://localhost:8000/health
# → Demo dashboard: http://localhost:8000/demo

python scripts/test_commander_live.py # test intent parsing live
python scripts/test_sensitive_blocking.py # test OPA policy blocking
python scripts/view_ui_tree.py # inspect device UI tree
python scripts/dead_code_scanner.py # scan for unused code

pytest tests/
pytest tests/test_foo.py::test_bar # single test

# Build + deploy from source
gcloud run deploy aura-backend \
--source . \
--region us-central1 \
--allow-unauthenticated \
--memory 2Gi \
--cpu 2 \
--timeout 3600 \
--set-secrets="GOOGLE_API_KEY=projects/.../secrets/GOOGLE_API_KEY/versions/latest,GROQ_API_KEY=...,GEMINI_API_KEY=..."
# Verify
curl https://aura-backend-xxx-uc.a.run.app/health

The Dockerfile pre-warms YOLOv8 at build time so the first real VLM call has no model-load latency. The server reads $PORT automatically from Cloud Run's injected environment variable.
Dual-layer safety:
- Llama Prompt Guard 2 86M (`services/prompt_guard.py`) — screens every voice input before intent parsing. Blocks jailbreaks, prompt injections, and harmful commands. Fail-safe: allows on API error.
- OPA Rego Policies (`policies/`, `services/policy_engine.py`) — gates every single gesture execution. Policies check:
  - Action type (send message, make purchase, delete data require confirmation)
  - Device state (locked screen, accessibility disabled)
  - Target app context
Both layers are fail-safe (allow on error) so transient API failures don't block legitimate commands.
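The shared fail-safe pattern can be pictured as a thin wrapper around each check (sketch only, not the actual `prompt_guard.py` / `policy_engine.py` code):

```python
# Sketch of the fail-safe (allow-on-error) pattern shared by both safety layers.
import logging

logger = logging.getLogger(__name__)

async def fail_safe_allow(check, *args, **kwargs) -> bool:
    """Run a safety check; if the check itself errors, allow and log a warning."""
    try:
        return await check(*args, **kwargs)     # True = allowed, False = blocked
    except Exception as exc:
        logger.warning("Safety check failed open: %s", exc)
        return True                             # transient API failure must not block the user
```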
- VLM never returns pixel coordinates — only selects from numbered SoM elements
- 5-stage retry ladder runs before any replanning
- Every gesture passes through OPA policy check in `gesture_executor.py`
- All new actions must be registered in `config/action_types.py` `ACTION_REGISTRY`
- All service functions must be `async def`
- All API keys go through `config/settings.py` — never `os.environ` directly
- 9 agents stay single-responsibility — no merging or scope creep
- `/ws/audio` and `/ws/device` paths must not change (Android app dependency)
aura-live/
├── main.py # FastAPI app + lifespan
├── adk_agent.py # Google ADK root_agent (gemini-2.5-flash)
├── adk_streaming_server.py # Gemini Live /ws/live handler
├── gcs_log_uploader.py # Cloud Storage HTML log upload
├── Dockerfile # Cloud Run deployment
├── requirements copy.txt # Python dependencies
│
├── agents/ # The 9 single-responsibility agents
│ ├── commander.py # Intent parsing
│ ├── planner_agent.py # Goal decomposition
│ ├── perceiver_agent.py # Screen capture + SoM
│ ├── coordinator.py # Multi-agent orchestrator
│ ├── actor_agent.py # Zero-LLM gesture execution
│ ├── responder.py # Natural language responses
│ ├── validator.py # Rule-based validation
│ ├── verifier_agent.py # Post-action verification
│ └── visual_locator.py # ScreenVLM (SoM selection)
│
├── aura_graph/ # LangGraph state machine
│ ├── graph.py # Graph assembly + entry points
│ ├── state.py # TaskState TypedDict (~40 fields)
│ ├── edges.py # Conditional routing functions
│ ├── core_nodes.py # Node implementations
│ └── nodes/ # Specialized nodes
│ ├── perception_node.py
│ ├── coordinator_node.py
│ ├── validate_outcome_node.py
│ ├── retry_router_node.py
│ ├── decompose_goal_node.py
│ └── next_subgoal_node.py
│
├── config/
│ ├── settings.py # Pydantic Settings (all env vars)
│ └── action_types.py # ACTION_REGISTRY
│
├── services/
│ ├── perception_controller.py # Tri-layer perception orchestration
│ ├── reactive_step_generator.py # Per-screen action generation
│ ├── gesture_executor.py # Gesture execution + strategy selection
│ ├── llm.py # Unified LLM interface (Groq/Gemini/NVIDIA)
│ ├── vlm.py # Unified VLM interface
│ ├── stt.py # Groq Whisper STT
│ ├── tts.py # Edge-TTS (Microsoft)
│ ├── prompt_guard.py # Llama Prompt Guard 2 safety screening
│ ├── policy_engine.py # OPA Rego policy gateway
│ └── command_logger.py # HTML execution log builder
│
├── perception/
│ ├── perception_pipeline.py # YOLOv8 + SoM pipeline
│ ├── omniparser_detector.py # YOLOv8 UI element detection
│ └── vlm_selector.py # VLM-based element selection
│
├── api/
│ ├── demo.py # /demo judge dashboard
│ ├── graph.py # /api/v1/graph/execute
│ ├── health.py # /health endpoints
│ └── tasks.py # /api/v1/tasks/ws streaming
│
├── api_handlers/
│ └── websocket_router.py # /ws/audio and /ws/device handlers
│
├── policies/ # OPA Rego policy files
├── prompts/ # LLM prompt templates
└── UI/ # Android companion app (Kotlin)
└── app/src/main/java/com/aura/aura_ui/
├── conversation/ConversationViewModel.kt
└── voice/GeminiLiveController.kt
MIT — see LICENSE.
Built for the Gemini Live Agent Challenge · Powered by Google ADK, Gemini Live, Groq, LangGraph, and Android Accessibility API