tototozip/voice

ActivateYourVoice

A Next.js web app for live presentation coaching with three memory layers:

  • Long-term memory: project corpus (memory/long Markdown files).
  • Short-term memory: demo deck/project context (memory/short Markdown files) + active slide context.
  • Live memory: real-time transcript window for coaching cues.
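As an illustration of the live memory layer, here is a minimal sketch of a rolling transcript window. The names `TranscriptSegment` and `TranscriptWindow`, and the idea of trimming by segment end time, are assumptions made for this example; the repository's actual types are not shown in this README.

```typescript
// Hypothetical shape of one finalized transcript segment.
interface TranscriptSegment {
  text: string;
  endMs: number; // end time of the segment, in ms since session start
}

// Keeps only the segments that ended within the last `windowMs` milliseconds.
class TranscriptWindow {
  private segments: TranscriptSegment[] = [];
  private readonly windowMs: number;

  constructor(windowMs: number) {
    this.windowMs = windowMs;
  }

  push(segment: TranscriptSegment): void {
    this.segments.push(segment);
    const cutoff = segment.endMs - this.windowMs;
    // Drop everything that ended before the start of the window.
    this.segments = this.segments.filter((s) => s.endMs >= cutoff);
  }

  // Text that a cue generator could use as live context.
  text(): string {
    return this.segments.map((s) => s.text).join(" ");
  }
}
```

A short window keeps cue prompts small and recent; the right length likely depends on how quickly cues need to react.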

Current MVP Foundation

  • Rehearsal workspace with a glassmorphic assistant bubble.
  • Expandable assistant panel with fast navigation between Chat, Memory, and Deck.
  • Dual deck ingestion:
    • Google Slides URL import (Google Slides API when configured, fallback-safe).
    • PDF upload import (text extraction per page -> deck slide model).
  • Live speech-to-text (STT) through Speechmatics realtime, streamed from the browser microphone.
  • Real-time cue generation API with Backboard LLM (with heuristic fallback).
  • ElevenLabs TTS (preferred) with browser speech fallback for spoken Plant feedback.
  • Memory reference retrieval API (local memory-first, Backboard optional, fallback-safe).
  • Integration status endpoint for Speechmatics, Backboard, and Google.
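The Google Slides URL import has to recover a presentation ID from the pasted link before it can call the Slides API. A minimal sketch of that step follows; `extractPresentationId` is a hypothetical helper, not the repository's code, and it assumes the standard editor link format `https://docs.google.com/presentation/d/<id>/...`.

```typescript
// Hypothetical helper: pulls the presentation ID out of a Google Slides editor URL.
// Returns null when the URL does not look like a Slides link.
function extractPresentationId(url: string): string | null {
  const match = /^https:\/\/docs\.google\.com\/presentation\/d\/([a-zA-Z0-9_-]+)/.exec(url.trim());
  const id = match?.[1] ?? null;
  // Published-to-web links use /d/e/<token>; this sketch does not handle them.
  if (id === "e") return null;
  return id;
}
```

Returning `null` instead of throwing lets the caller fall back to PDF upload or show a validation message.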

Stack

  • Next.js 16 (App Router)
  • TypeScript
  • Tailwind CSS v4
  • Framer Motion
  • Zod

Run Locally

  1. Install dependencies:
npm install
  2. Install Poppler tools (needed for PDF import; the command below uses Homebrew on macOS):
brew install poppler
  3. Configure environment variables:
cp .env.example .env.local
  4. Start the app:
npm run dev
  5. Open http://localhost:3000

Environment Variables

  • BACKBOARD_API_KEY: your Backboard API key (set it in your local .env.local).
  • BACKBOARD_API_URL: Backboard base URL used for memory retrieval.
  • BACKBOARD_ASSISTANT_ID: optional existing assistant ID to reuse (if unset, one is created automatically).
  • BACKBOARD_LLM_PROVIDER: provider used by Backboard for cue generation (default openai).
  • BACKBOARD_LLM_MODEL: model used by Backboard for cue generation (default gpt-5.2).
  • BACKBOARD_MEMORY_MODE: Backboard memory mode for cue runs (default Readonly).
  • PROJECT_MEMORY_ROOT: local project memory root folder (default memory).
  • PROJECT_MEMORY_ACTIVE_PROJECT: optional active project id override for local memory selection.
  • SPEECHMATICS_API_KEY: Speechmatics API key for live transcription connectivity.
  • SPEECH_API_KEY: backward-compatible alias for Speechmatics key.
  • SPEECHMATICS_REALTIME_URL: Speechmatics realtime websocket URL.
  • SPEECHMATICS_REGION: Speechmatics region for realtime JWT minting (eu, usa, au).
  • SPEECHMATICS_TTS_URL: Speechmatics TTS base URL (default preview endpoint).
  • SPEECHMATICS_TTS_VOICE: Speechmatics TTS voice id (default sarah).
  • ELEVENLABS_API_KEY: optional preferred TTS provider key.
  • ELEVENLABS_VOICE_ID: ElevenLabs voice id (default pNInz6obpgDQGcFmaJgB).
  • ELEVENLABS_MODEL: ElevenLabs model id (default eleven_turbo_v2_5).
  • ELEVENLABS_WARM_SPEED: ElevenLabs native synthesis speed (default 0.9, lower is slower).
  • GOOGLE_API_KEY: Google API key for importing public Slides.
  • GOOGLE_OAUTH_ACCESS_TOKEN: OAuth token for importing private Slides.
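The documented defaults can be collected in one place. The sketch below is illustrative only: `CoachConfig` and `loadConfig` are hypothetical names, only a subset of the variables is covered, and giving SPEECHMATICS_API_KEY precedence over its SPEECH_API_KEY alias is an assumption.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical config shape covering a subset of the variables listed above.
interface CoachConfig {
  llmProvider: string;
  llmModel: string;
  memoryMode: string;
  memoryRoot: string;
  speechmaticsKey: string | undefined;
  elevenLabsVoiceId: string;
  elevenLabsModel: string;
  elevenLabsWarmSpeed: number;
}

function loadConfig(env: Env): CoachConfig {
  const speed = Number(env["ELEVENLABS_WARM_SPEED"] ?? "0.9");
  return {
    llmProvider: env["BACKBOARD_LLM_PROVIDER"] ?? "openai",
    llmModel: env["BACKBOARD_LLM_MODEL"] ?? "gpt-5.2",
    memoryMode: env["BACKBOARD_MEMORY_MODE"] ?? "Readonly",
    memoryRoot: env["PROJECT_MEMORY_ROOT"] ?? "memory",
    // Assumption: the primary key wins over the backward-compatible alias.
    speechmaticsKey: env["SPEECHMATICS_API_KEY"] ?? env["SPEECH_API_KEY"],
    elevenLabsVoiceId: env["ELEVENLABS_VOICE_ID"] ?? "pNInz6obpgDQGcFmaJgB",
    elevenLabsModel: env["ELEVENLABS_MODEL"] ?? "eleven_turbo_v2_5",
    // Fall back to the documented default when the value is not a number.
    elevenLabsWarmSpeed: Number.isFinite(speed) ? speed : 0.9,
  };
}
```

In a Next.js server route this could be called as `loadConfig(process.env)`. Note that `??` only covers unset variables; an empty string would pass through unchanged.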

API Endpoints (MVP)

  • POST /api/session/start
    • Starts a rehearsal session, returns session ID and a Backboard thread ID when available.
  • POST /api/coach/cue
    • Generates one coaching cue from transcript + active slide + memory depth.
  • GET /api/speechmatics/token
    • Mints a short-lived Speechmatics realtime token for the browser client.
  • POST /api/speechmatics/tts
    • Synthesizes feedback text with ElevenLabs (client-side browser fallback can be enabled if playback fails).
  • POST /api/memory/references
    • Retrieves memory references in order: first the active project's local memory (projects/<project>/short in shallow mode, plus projects/<project>/long in deep mode), then Backboard, then a fallback.
  • POST /api/deck/google/import
    • Parses Google Slides URL and imports deck context via Google Slides API.
  • POST /api/deck/pdf/import
    • Accepts multipart/form-data with file field and imports PDF pages as deck slides.
  • GET /api/integrations/status
    • Returns connection status for Speechmatics, Backboard, and Google.
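The local step of /api/memory/references depends on the requested memory depth. A minimal sketch of that directory selection, assuming a hypothetical `memoryDirs` helper and the `projects/<project>/short` and `projects/<project>/long` layout described above:

```typescript
type MemoryDepth = "shallow" | "deep";

// Hypothetical helper: lists the local folders to search for a given depth.
// `root` corresponds to PROJECT_MEMORY_ROOT (default "memory").
function memoryDirs(root: string, project: string, depth: MemoryDepth): string[] {
  const base = `${root}/projects/${project}`;
  const dirs = [`${base}/short`];
  // Deep mode widens retrieval to the long-term corpus as well.
  if (depth === "deep") dirs.push(`${base}/long`);
  return dirs;
}
```

Keeping short memory first in the list means shallow and deep runs agree on which references are checked first.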

Important Notes

  • Google Slides import needs GOOGLE_API_KEY (public deck) or GOOGLE_OAUTH_ACCESS_TOKEN (private deck).
  • PDF import extracts text content; image-only scanned PDFs may produce weak slide text.
  • Local memory retrieval is markdown-driven from the active project under memory/projects/<project>/short/*.md and memory/projects/<project>/long/*.md (legacy memory/short and memory/long are still supported).
  • memoryDepth: "shallow" uses short memory only (demo mode); memoryDepth: "deep" expands retrieval to short + long memory.
  • Rehearsal uses the active project's short memory and prefers colocated pairs like short/<deck>.pdf + short/<deck>.pdf.md.
  • Plant supports three live modes: off, listens, talks (plus Plant think as a deep-memory action).
  • App boots in wake-listening mode (mic stream active) and supports voice commands (Plant listen, Plant talk, Plant think, Plant remember, Plant off) even while Plant is off.
  • Plant think runs deep retrieval (short + long memory) with a visible loading step and a 15-second cap before fast fallback.
  • Plant remember updates short/project-advices-memory.md for the active project (add/remove recurring advice that survives reloads).
  • The /memory page lets you edit markdown files directly, including project-advices-memory.md.
  • Delivery signals include fillers, disfluency tags, and silence events (end-of-utterance).
  • Cue generation uses Backboard LLM when configured; deterministic heuristics remain as fallback.
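The voice commands listed above (Plant listen, Plant talk, Plant think, Plant remember, Plant off) could be detected in a transcript roughly as follows. This is a sketch under assumptions: `parsePlantCommand` is a hypothetical name, and matching the wake word followed directly by a command word is one possible rule, not necessarily the one the app uses.

```typescript
type PlantCommand = "listen" | "talk" | "think" | "remember" | "off";

const COMMANDS: readonly PlantCommand[] = ["listen", "talk", "think", "remember", "off"];

// Hypothetical parser: returns the first command that directly follows the word "plant".
function parsePlantCommand(utterance: string): PlantCommand | null {
  const words = utterance
    .toLowerCase()
    .replace(/[^a-z\s]/g, " ") // strip punctuation so "Plant, off." still matches
    .split(/\s+/)
    .filter(Boolean);
  for (let i = 0; i < words.length - 1; i++) {
    if (words[i] !== "plant") continue;
    const next = words[i + 1];
    const hit = COMMANDS.find((c) => c === next);
    if (hit) return hit;
  }
  return null;
}
```

Requiring the command word immediately after the wake word reduces false triggers when "plant" appears in ordinary speech.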

Next Build Steps

  1. Complete Google OAuth refresh flow and token management for private Slides.
  2. Replace the temporary ScriptProcessor capture path with AudioWorklet-based PCM capture.
  3. Tune Backboard model/provider defaults and prompt strategy for latency vs coaching depth.
  4. Add persistent storage for sessions, transcripts, and reports.

PRD
