A high-performance Retrieval-Augmented Generation (RAG) backend featuring semantic search, cross-encoder reranking, and live PDF/XML ingestion. Built with FastAPI and local vector search.
- Search-First Design - Collapsible sidebars: filters (left) | docs (center) | chat (right)
- Case Browsing - Browse all indexed cases with pagination
- Metadata Filters - Filter by court, judge, year range, case type
- Relevance Score - Color-coded percentage (green/yellow/red) showing document relevance
- Case Cards - Document display with citation, court, judge, year badges
- Full Case View - Modal with complete document text
- AI Chat - Right sidebar for AI answers (triggered by search or "Ask AI" per case)
- Snippet Highlighting - Relevant passages with keyword marks
- PDF & Text Upload - Auto-parse, chunk, embed, and index
- Keyboard Shortcuts - / search, s search, b browse, c chat, ? help
- Backend: FastAPI, Python 3.11+
- Vector DB: Zvec (local)
- Embeddings: BAAI/bge-small-en-v1.5 (384 dimensions)
- Reranking: cross-encoder/ms-marco-MiniLM-L-6-v2
- LLMs: Cerebras (primary), Gemini, OpenRouter (fallback)
- PDF Processing: PyMuPDF
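
The stack above pairs an embedding model with a cross-encoder reranker, which suggests a two-stage search: a cheap vector shortlist followed by a more careful rescoring. The following is a minimal, dependency-free sketch of that pattern, not the project's actual pipeline. The function names, the `candidates` parameter, and the `word_overlap` scorer (a toy stand-in for the cross-encoder) are all assumptions made for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

# (doc_id, text, embedding vector); a hypothetical in-memory document shape.
Doc = Tuple[str, str, Sequence[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def word_overlap(query: str, text: str) -> float:
    """Toy stand-in for a cross-encoder: fraction of query words found in text."""
    q = set(query.lower().split())
    return len(q & set(text.lower().split())) / len(q) if q else 0.0


def retrieve_then_rerank(
    query_vec: Sequence[float],
    query_text: str,
    docs: List[Doc],
    rerank_score: Callable[[str, str], float] = word_overlap,
    top_k: int = 5,
    candidates: int = 20,
) -> List[str]:
    """Return up to top_k doc ids: vector shortlist first, then rescoring."""
    # Stage 1: shortlist by embedding similarity (cheap, approximate).
    shortlist = sorted(docs, key=lambda d: cosine(query_vec, d[2]), reverse=True)
    shortlist = shortlist[:candidates]
    # Stage 2: rescore only the shortlist with the more expensive scorer.
    reranked = sorted(
        shortlist, key=lambda d: rerank_score(query_text, d[1]), reverse=True
    )
    return [doc_id for doc_id, _, _ in reranked[:top_k]]
```

In a real deployment the vectors would come from the embedding model and `rerank_score` would wrap the cross-encoder; only the shortlist is rescored, which keeps the expensive model off the full index.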
uv sync
cp .env.example .env

Add your API keys to .env (required: at least one of CEREBRAS_API_KEY, GEMINI_API_KEY, or OPENROUTER_API_KEY).

cd src
uv run python -m pdf_ingest --generate-test-pdfs
uv run python sync_knowledge_base.py

cd src
uv run uvicorn api_server:app --reload --port 8000

Visit http://localhost:8000/demo
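
Since at least one provider key is required, a quick pre-flight check can catch a missing key before the server starts. This is a hedged sketch rather than part of the project: `configured_providers` and `require_provider` are hypothetical helper names, and the code assumes the variables from .env have already been loaded into the process environment.

```python
import os
from typing import List, Mapping, Optional

# Provider name -> environment variable, as listed in the setup step.
PROVIDER_KEYS = {
    "cerebras": "CEREBRAS_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
}


def configured_providers(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the providers whose API key is present and non-blank."""
    env = os.environ if env is None else env
    return [name for name, var in PROVIDER_KEYS.items() if env.get(var, "").strip()]


def require_provider(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Raise a clear error if no provider key is configured."""
    providers = configured_providers(env)
    if not providers:
        raise RuntimeError(
            "Set at least one of: " + ", ".join(PROVIDER_KEYS.values())
        )
    return providers
```

Passing a plain dict instead of relying on `os.environ` makes the check easy to exercise without touching real credentials.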
[Filters ◀] | [Search + Results] | [Chat ▶]
- Left sidebar (collapsible): Filters for court, case type, year range
- Center: Search bar + case cards with relevance scores + snippets
- Right sidebar (collapsible): AI chat, opens automatically when searching
- Search - Semantic search with metadata filters
- Browse Cases - Grid view of all indexed cases
- Playground - Configure prompts and generation settings
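
The snippets shown on case cards carry keyword marks. One plausible way to produce them is sketched here; it is an illustration only, and the `highlight` function and its whole-word matching rule are assumptions, not the behavior of the project's snippet_extractor.py.

```python
import html
import re


def highlight(snippet: str, query: str) -> str:
    """Escape a snippet for HTML and wrap whole-word query terms in <mark>."""
    terms = sorted(set(re.findall(r"\w+", query.lower())), key=len, reverse=True)
    if not terms:
        return html.escape(snippet)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE
    )
    # With one capturing group, split() alternates plain text and matches,
    # so odd indices are the matched keywords.
    parts = pattern.split(snippet)
    out = []
    for i, part in enumerate(parts):
        safe = html.escape(part)  # escape each piece before adding markup
        out.append(f"<mark>{safe}</mark>" if i % 2 else safe)
    return "".join(out)
```

Escaping each piece separately, rather than escaping the full string first, avoids accidentally matching query terms inside HTML entities such as `&amp;`.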
| Key | Action |
|---|---|
| / | Focus search |
| s | Search tab |
| b | Browse tab |
| c | Toggle chat |
| ? | Show help |
| Esc | Close modal |
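
For scripted access, the same HTTP endpoints that the curl examples exercise can be called from Python. The sketch below only builds the requests using the standard library; the helper names `cases_url` and `search_request` are hypothetical, and the response format is not documented here, so it is left unparsed.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # matches the port used when starting uvicorn


def cases_url(limit: int = 10, offset: int = 0) -> str:
    """URL for one page of the case listing."""
    return f"{BASE_URL}/api/cases?" + urlencode({"limit": limit, "offset": offset})


def search_request(
    query: str, top_k: int = 5, filters: Optional[dict] = None
) -> Request:
    """Build a POST request for /api/search with a JSON body."""
    payload = {"query": query, "top_k": top_k}
    if filters:
        payload["filters"] = filters
    return Request(
        f"{BASE_URL}/api/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With the server running, `urllib.request.urlopen(search_request("contract damages", filters={"court": "High Court"}))` would send the request; paging through cases amounts to increasing `offset` by `limit` on each call.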
curl http://localhost:8000/api/filters

curl "http://localhost:8000/api/cases?limit=10&offset=0"

curl "http://localhost:8000/api/cases/contract_case_001.pdf"

curl -X POST http://localhost:8000/api/search \
  -H "Content-Type: application/json" \
  -d '{"query": "contract damages", "top_k": 5, "filters": {"court": "High Court"}}'

curl -N -X POST http://localhost:8000/api/chat \
  -H "Content-Type: application/json" \
  -d '{"question": "What did the court decide?"}'

curl -X POST http://localhost:8000/api/upload -F "file=@case.pdf"

curl http://localhost:8000/api/health

Set in .env:
| Variable | Default | Description |
|---|---|---|
| RAG_ENABLE_RERANKING | true | Enable cross-encoder reranking |
| RAG_TOP_K | 5 | Results to retrieve |
| LLM_PROVIDER | cerebras | cerebras, gemini, openrouter |
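
To make the defaults in the table concrete, here is a minimal sketch of reading these three variables with the same fallbacks. It is an assumption about how such settings might be parsed, not a copy of the project's config.py; `RagSettings`, `load_settings`, and the accepted truthy strings are illustrative choices.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

VALID_PROVIDERS = ("cerebras", "gemini", "openrouter")
TRUTHY = ("1", "true", "yes", "on")  # assumed spellings for "enabled"


@dataclass(frozen=True)
class RagSettings:
    enable_reranking: bool = True
    top_k: int = 5
    llm_provider: str = "cerebras"


def load_settings(env: Optional[Mapping[str, str]] = None) -> RagSettings:
    """Read settings from the environment, falling back to the table defaults."""
    env = os.environ if env is None else env
    provider = env.get("LLM_PROVIDER", "cerebras").strip().lower()
    if provider not in VALID_PROVIDERS:
        raise ValueError(f"LLM_PROVIDER must be one of {VALID_PROVIDERS}")
    return RagSettings(
        enable_reranking=env.get("RAG_ENABLE_RERANKING", "true").strip().lower()
        in TRUTHY,
        top_k=int(env.get("RAG_TOP_K", "5")),
        llm_provider=provider,
    )
```

Calling `load_settings({})` yields the documented defaults, which is a handy way to confirm that an empty .env still produces a working configuration.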
legal-rag/
├── src/
│ ├── api_server.py # FastAPI application
│ ├── rag_pipeline.py # RAG orchestration
│ ├── pdf_ingest.py # PDF + TXT parsing
│ ├── reranker.py # Cross-encoder reranking
│ ├── snippet_extractor.py # Snippet highlighting
│ ├── sync_knowledge_base.py # Batch ingestion
│ ├── config.py # Configuration
│ ├── llm_client.py # LLM client
│ ├── embedding_client.py # Embeddings
│ └── vector_store.py # Zvec backend
├── static/
│ └── demo.html # Demo interface
├── data/
│ └── legal-docs/ # PDF & TXT documents
├── docs/
│ └── plans/ # Implementation plans
├── pyproject.toml # UV package config
└── .env.example # Environment template
MIT
