Investigation document pipeline and FOIA automation for animal advocacy.
Ingests USDA regulatory data, automates FOIA requests, processes investigation documents with a cryptographic chain of custody, and provides an offline-capable field tool for investigators.
All investigation data is treated as potential legal evidence. A three-adversary security model (state surveillance, industry infiltration, AI model bias) is applied throughout.
- FOIA/RTI request generation — US federal, California, Texas, New York, India RTI
- USDA APHIS inspection data pipeline — PDF ingestion, repeat offender detection
- Document processing — OCR + AI summarization with cryptographic chain of custody
- Offline field server — encrypted SQLite, localhost-only, no telemetry
- Coalition API — tiered access controls (public / coalition / investigator)
- Investigation dashboard — Streamlit UI for violation search and FOIA tracking
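As an illustration of the repeat offender detection listed above, here is a minimal sketch. It assumes a "repeat offender" means a facility cited on at least two distinct inspection dates; the real rule in `analyzer.py` may differ. The `Violation` fields, the `repeat_offenders` helper, and the sample records are all hypothetical and made up for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Violation:
    facility_id: str      # hypothetical identifier, e.g. a license number
    inspection_date: str  # ISO date, YYYY-MM-DD
    category: str         # hypothetical violation label


def repeat_offenders(violations, min_inspections=2):
    """Map facility_id to its sorted citation dates, keeping only
    facilities cited on at least `min_inspections` distinct dates."""
    dates_by_facility = defaultdict(set)
    for v in violations:
        dates_by_facility[v.facility_id].add(v.inspection_date)
    return {
        facility_id: sorted(dates)
        for facility_id, dates in dates_by_facility.items()
        if len(dates) >= min_inspections
    }


# Made-up sample records, not real inspection data.
sample = [
    Violation("F-001", "2023-03-14", "veterinary care"),
    Violation("F-001", "2023-03-14", "housing"),
    Violation("F-001", "2024-01-09", "veterinary care"),
    Violation("F-002", "2023-06-02", "housing"),
]
offenders = repeat_offenders(sample)
```

Counting distinct inspection dates rather than raw violation rows keeps a single bad inspection with many citations from being flagged as a repeat pattern.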
pip install -e "."
# Generate a FOIA request to USDA APHIS
python -m src.foia.generator --agency USDA-APHIS --subject "AWA inspection records 2023-2024" --save
# Ingest USDA inspection PDFs (put PDFs in data/raw_pdfs/)
python -m src.regulatory.usda_pipeline --ingest
# Run the dashboard
streamlit run dashboard/app.py
# Start the API server
uvicorn src.api.server:app --reload --port 8000
# Start the offline field server
python -m src.offline.field_server --port 8080
src/
├── foia/ FOIA/RTI request generation and tracking
│ ├── generator.py Letter generation (US multi-agency + India RTI)
│ ├── dispatcher.py Overdue tracking and watchdog
│ └── templates/ Jinja2 legal letter templates
├── regulatory/ USDA APHIS inspection data ingestion
│ ├── usda_pipeline.py PDF parser + SQLite ingestion
│ ├── models.py Data model dataclasses
│ └── analyzer.py Repeat offender detection, violation alerts
├── documents/ Investigation document processing
│ ├── ingester.py OCR + AI summarization pipeline
│ ├── classifier.py Violation type classification
│ └── chain_of_custody.py Cryptographic audit trail
├── offline/
│ └── field_server.py Offline-first FastAPI server (encrypted SQLite)
└── api/
  └── server.py Coalition FastAPI server with access control tiers
dashboard/
└── app.py Streamlit investigation dashboard
docs/
├── security.md Encrypted storage, ag-gag, AI provider routing
└── jurisdiction-guide.md FOIA jurisdictions + ag-gag law by state
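The tree above lists `chain_of_custody.py` as a cryptographic audit trail. One common way to build such a trail is a hash chain, sketched below under stated assumptions: every name here (`CustodyEntry`, `append_entry`, `verify_chain`, the `GENESIS` placeholder) is hypothetical and not taken from the module itself. The `actor` field holds a pseudonymous role label rather than a person's name, in line with the guidance on what not to store.

```python
import hashlib
import json
from dataclasses import dataclass, replace

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


@dataclass(frozen=True)
class CustodyEntry:
    document_sha256: str
    action: str
    actor: str        # pseudonymous role label, never a real name
    timestamp: str    # ISO 8601, supplied by the caller
    prev_hash: str
    entry_hash: str


def _entry_hash(document_sha256, action, actor, timestamp, prev_hash):
    # Canonical JSON so the same fields always hash to the same digest.
    payload = json.dumps(
        {"document_sha256": document_sha256, "action": action,
         "actor": actor, "timestamp": timestamp, "prev_hash": prev_hash},
        sort_keys=True, separators=(",", ":"),
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_entry(chain, document_bytes, action, actor, timestamp):
    """Return a new chain with one entry linked to the previous entry's hash."""
    prev_hash = chain[-1].entry_hash if chain else GENESIS
    doc_hash = hashlib.sha256(document_bytes).hexdigest()
    digest = _entry_hash(doc_hash, action, actor, timestamp, prev_hash)
    entry = CustodyEntry(doc_hash, action, actor, timestamp, prev_hash, digest)
    return chain + [entry]


def verify_chain(chain):
    """Recompute every hash and link; False if any entry was altered or removed."""
    prev_hash = GENESIS
    for e in chain:
        expected = _entry_hash(e.document_sha256, e.action, e.actor,
                               e.timestamp, e.prev_hash)
        if e.prev_hash != prev_hash or e.entry_hash != expected:
            return False
        prev_hash = e.entry_hash
    return True


chain = append_entry([], b"scan-1", "ingested", "role:intake", "2024-01-01T00:00:00Z")
chain = append_entry(chain, b"scan-1", "ocr", "role:pipeline", "2024-01-01T00:05:00Z")
tampered = [replace(chain[0], action="deleted"), chain[1]]
```

Here `verify_chain(chain)` succeeds and `verify_chain(tampered)` fails. A plain hash chain only makes tampering evident if the latest hash is also recorded somewhere an attacker cannot rewrite; signing or external anchoring is outside this sketch.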
See docs/security.md for:
- Encrypted storage requirements (AES-256-GCM)
- What NOT to store (witness identities, investigator names)
- Offline mode device seizure preparation
- Ag-gag exposure by jurisdiction
- AI provider zero-retention requirements
- Coalition API key management
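For the public / coalition / investigator tiers that Coalition API keys map to, a minimal sketch of a tier check follows. It assumes the tiers are strictly ordered, so a higher tier sees everything a lower tier sees; that ordering, the `Tier` enum, the `min_tier` field, and the sample records are assumptions for illustration, not the actual rules in `server.py`.

```python
from enum import IntEnum


class Tier(IntEnum):
    # Assumed ordering: each tier includes the access of the tiers below it.
    PUBLIC = 0
    COALITION = 1
    INVESTIGATOR = 2


def can_access(caller: Tier, required: Tier) -> bool:
    """True if the caller's tier is at least the tier a record requires."""
    return caller >= required


def visible_records(records, caller: Tier):
    """Filter records down to those the caller's tier may read."""
    return [r for r in records if can_access(caller, r["min_tier"])]


# Made-up records; "min_tier" is a hypothetical field name.
records = [
    {"id": "summary-1", "min_tier": Tier.PUBLIC},
    {"id": "facility-notes-1", "min_tier": Tier.COALITION},
    {"id": "field-report-1", "min_tier": Tier.INVESTIGATOR},
]
coalition_view = [r["id"] for r in visible_records(records, Tier.COALITION)]
```

Filtering on the server side, before serialization, means a lower-tier client never receives fields it would have to be trusted to hide.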
This repo is the Investigation Operations bounded context. Data here does not flow to Public Campaigns or Coalition Coordination without explicit declassification. See the global CLAUDE.md for bounded context rules.
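The declassification rule above can be pictured as an export gate that refuses any record lacking an explicit approval. This is a sketch only: `Record`, `declassified_by`, `export`, and the target names are hypothetical, and the authoritative rules live in the global CLAUDE.md.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical names for the two downstream bounded contexts.
ALLOWED_TARGETS = {"public-campaigns", "coalition-coordination"}


class NotDeclassifiedError(Exception):
    """Raised when a record is exported without explicit declassification."""


@dataclass(frozen=True)
class Record:
    record_id: str
    body: str
    declassified_by: Optional[str] = None  # pseudonymous approver role, or None


def export(record: Record, target: str) -> dict:
    """Return an export payload, or raise if the record was never declassified."""
    if target not in ALLOWED_TARGETS:
        raise ValueError(f"unknown target context: {target}")
    if record.declassified_by is None:
        raise NotDeclassifiedError(record.record_id)
    return {"record_id": record.record_id, "body": record.body, "target": target}


approved = Record("r-2", "redacted summary", declassified_by="role:legal-review")
payload = export(approved, "public-campaigns")
```

Defaulting `declassified_by` to `None` makes every record closed unless someone takes an explicit action, which matches the "no flow without declassification" rule.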
Adapted from:
- AutomatedFOIA — FastAPI FOIA pipeline with AES-256 encryption
- FOIARTI-Request-Generator — Multi-jurisdiction FOIA/RTI generation
- USDA-AWA-Inspection-Pipeline — PDF parser and violation database
- project-nomad — Offline-first server patterns
- DocMind — AI document intelligence and RAG patterns