open-paws-intelligence

Investigation document pipeline and FOIA automation for animal advocacy.

Ingests USDA regulatory data, automates FOIA requests, processes investigation documents with a cryptographic chain of custody, and provides an offline-capable field tool for investigators.

All investigation data is treated as potential legal evidence. A three-adversary security model (state surveillance, industry infiltration, AI model bias) is applied throughout.
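
As a rough illustration of what a cryptographic chain of custody can look like, here is a minimal hash-chain sketch. It is not the code in src/documents/chain_of_custody.py; the names `CustodyEntry`, `append_entry`, and `verify_chain`, and the record fields, are all assumptions made for this example. Each entry commits to the document hash and to the previous entry, so editing any earlier record invalidates every later one.

```python
import hashlib
import json
from dataclasses import dataclass

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


@dataclass(frozen=True)
class CustodyEntry:
    doc_sha256: str   # SHA-256 of the document bytes
    action: str       # e.g. "ingested", "ocr" (hypothetical labels)
    actor_role: str   # a role, not a personal name
    timestamp: str    # caller-supplied ISO 8601 string
    prev_hash: str    # entry_hash of the preceding entry
    entry_hash: str   # digest over all fields above


def _digest(doc_sha256, action, actor_role, timestamp, prev_hash):
    # JSON-encode the fields so that field boundaries are unambiguous.
    payload = json.dumps(
        [doc_sha256, action, actor_role, timestamp, prev_hash],
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(chain, document, action, actor_role, timestamp):
    """Return a new chain with one more entry linked to the last one."""
    prev_hash = chain[-1].entry_hash if chain else GENESIS
    doc_sha256 = hashlib.sha256(document).hexdigest()
    entry_hash = _digest(doc_sha256, action, actor_role, timestamp, prev_hash)
    entry = CustodyEntry(
        doc_sha256, action, actor_role, timestamp, prev_hash, entry_hash
    )
    return chain + [entry]


def verify_chain(chain):
    """Check every link and every digest; False on any mismatch."""
    prev_hash = GENESIS
    for e in chain:
        if e.prev_hash != prev_hash:
            return False
        expected = _digest(
            e.doc_sha256, e.action, e.actor_role, e.timestamp, e.prev_hash
        )
        if e.entry_hash != expected:
            return False
        prev_hash = e.entry_hash
    return True
```

A plain hash chain only detects edits to earlier entries; it does not prove who wrote them. Signing each `entry_hash` with a key held outside the device would be one way to add that, at the cost of key management.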

Capabilities

  • FOIA/RTI request generation — US federal, California, Texas, New York, India RTI
  • USDA APHIS inspection data pipeline — PDF ingestion, repeat offender detection
  • Document processing — OCR + AI summarization with cryptographic chain of custody
  • Offline field server — encrypted SQLite, localhost-only, no telemetry
  • Coalition API — tiered access controls (public / coalition / investigator)
  • Investigation dashboard — Streamlit UI for violation search and FOIA tracking
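
To make "repeat offender detection" concrete, the sketch below flags facilities cited on more than one inspection date. The `Inspection` record, its field names, and the `min_inspections` threshold are assumptions for illustration; the actual rules in src/regulatory/analyzer.py may differ.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Inspection:
    facility_id: str      # hypothetical identifier for a licensee or site
    inspected_on: date
    violations: int       # number of citations recorded at this inspection


def repeat_offenders(inspections, min_inspections=2):
    """Return facility IDs cited on at least `min_inspections` distinct dates."""
    cited_dates = defaultdict(set)
    for insp in inspections:
        if insp.violations > 0:
            cited_dates[insp.facility_id].add(insp.inspected_on)
    return sorted(
        facility
        for facility, dates in cited_dates.items()
        if len(dates) >= min_inspections
    )
```

Counting distinct dates rather than rows keeps a report that was ingested twice from inflating a facility's record.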

Quick Start

pip install -e "."

# Generate a FOIA request to USDA APHIS
python -m src.foia.generator --agency USDA-APHIS --subject "AWA inspection records 2023-2024" --save

# Ingest USDA inspection PDFs (put PDFs in data/raw_pdfs/)
python -m src.regulatory.usda_pipeline --ingest

# Run the dashboard
streamlit run dashboard/app.py

# Start the API server
uvicorn src.api.server:app --reload --port 8000

# Start the offline field server
python -m src.offline.field_server --port 8080

Architecture

src/
├── foia/               FOIA/RTI request generation and tracking
│   ├── generator.py    Letter generation (US multi-agency + India RTI)
│   ├── dispatcher.py   Overdue tracking and watchdog
│   └── templates/      Jinja2 legal letter templates
├── regulatory/         USDA APHIS inspection data ingestion
│   ├── usda_pipeline.py PDF parser + SQLite ingestion
│   ├── models.py       Data model dataclasses
│   └── analyzer.py     Repeat offender detection, violation alerts
├── documents/          Investigation document processing
│   ├── ingester.py     OCR + AI summarization pipeline
│   ├── classifier.py   Violation type classification
│   └── chain_of_custody.py  Cryptographic audit trail
├── offline/
│   └── field_server.py Offline-first FastAPI server (encrypted SQLite)
└── api/
    └── server.py       Coalition FastAPI server with access control tiers
dashboard/
└── app.py              Streamlit investigation dashboard
docs/
├── security.md         Encrypted storage, ag-gag, AI provider routing
└── jurisdiction-guide.md  FOIA jurisdictions + ag-gag law by state
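
The access control tiers in api/server.py could be modeled as an ordered enum, as in the sketch below. This is a standalone illustration, not the server's code: the `Tier` enum, the `min_tier` record key, and `visible_records` are assumed names, and the real server may enforce tiers per route rather than per record.

```python
from enum import IntEnum


class Tier(IntEnum):
    # Ordered so that a higher tier sees everything a lower tier sees.
    PUBLIC = 0
    COALITION = 1
    INVESTIGATOR = 2


def visible_records(records, caller_tier):
    """Keep only records whose minimum tier the caller meets or exceeds."""
    return [r for r in records if caller_tier >= r["min_tier"]]
```

For example, a caller at `Tier.COALITION` would receive public and coalition records but nothing marked `Tier.INVESTIGATOR`.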

Security

See docs/security.md for:

  • Encrypted storage requirements (AES-256-GCM)
  • What NOT to store (witness identities, investigator names)
  • Offline mode device seizure preparation
  • Ag-gag exposure by jurisdiction
  • AI provider zero-retention requirements
  • Coalition API key management
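
One way to enforce the "what NOT to store" rule is to strip forbidden fields before a record reaches storage. The sketch below assumes records are nested dicts and lists, and the key names in `FORBIDDEN_KEYS` are hypothetical; docs/security.md remains the authority on what must be excluded.

```python
# Hypothetical field names; adjust to match the real data model.
FORBIDDEN_KEYS = frozenset(
    {"witness_name", "witness_contact", "investigator_name"}
)


def scrub(value):
    """Return a copy of `value` with forbidden keys removed at any depth."""
    if isinstance(value, dict):
        return {
            key: scrub(inner)
            for key, inner in value.items()
            if key not in FORBIDDEN_KEYS
        }
    if isinstance(value, list):
        return [scrub(item) for item in value]
    return value
```

Key-based scrubbing does not catch names embedded in free text such as notes or OCR output, so it complements manual review rather than replacing it.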

Bounded Context

This repo is the Investigation Operations bounded context. Data here does not flow to Public Campaigns or Coalition Coordination without explicit declassification. See the global CLAUDE.md for bounded context rules.
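
The declassification rule can be pictured as a gate at the export boundary. The sketch below is an assumption-laden illustration: `Finding`, `declassified_at`, `export_finding`, and the target names are invented here and do not come from this repo or from CLAUDE.md.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical identifiers for the two downstream bounded contexts.
ALLOWED_TARGETS = frozenset({"public-campaigns", "coalition-coordination"})


class NotDeclassified(PermissionError):
    """Raised when a record is exported without explicit declassification."""


@dataclass(frozen=True)
class Finding:
    summary: str
    declassified_at: Optional[str] = None  # set only by an explicit review step


def export_finding(finding, target):
    """Build an export payload, refusing anything not yet declassified."""
    if target not in ALLOWED_TARGETS:
        raise ValueError(f"unknown target context: {target}")
    if finding.declassified_at is None:
        raise NotDeclassified("finding has not been declassified")
    return {"target": target, "summary": finding.summary}
```

Defaulting `declassified_at` to `None` means a record is blocked unless someone has taken a deliberate action to release it.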

Sources

Adapted from:
