This system handles potential legal evidence. Every design decision must account for the three-adversary threat model: state surveillance, industry infiltration, and AI model bias. Data compromise here can endanger witnesses, investigators, and ongoing operations.
All investigation data (evidence documents, field notes, chain of custody logs) must be stored encrypted at rest using AES-256-GCM.
For the evidence database and custody log:
- Use an encrypted filesystem volume (LUKS on Linux, FileVault on macOS) for the entire `evidence/` directory.
- The SQLite evidence index (`evidence/index.db`) should additionally be encrypted using SQLCipher if the `pysqlcipher3` package is available.
- Set `FIELD_DB_KEY` in `.env` for the field server's local SQLite encryption.
For uploaded documents in transit:
- The AutomatedFOIA pattern (AES-256-GCM, 12-byte nonce prepended) is implemented in `src/api/server.py` for document uploads.
- Documents are processed in RAM using `io.BytesIO`, with no unencrypted temp files.
- On completion, decrypted bytes are explicitly zeroed from memory where possible.
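The nonce-prepended layout and in-memory handling described above can be sketched roughly as follows. This is an illustration, not the code in `src/api/server.py`: `split_envelope`, `wipe_stream`, and `process_upload` are hypothetical names, the 16-byte tag length is the AES-GCM default and is assumed here, and the `decrypt` callable stands in for a real AEAD implementation.

```python
import io
from typing import BinaryIO, Callable, Tuple, TypeVar

T = TypeVar("T")

NONCE_LEN = 12  # nonce size stated in the text
TAG_LEN = 16    # assumption: default AES-GCM authentication tag length


def split_envelope(blob: bytes) -> Tuple[bytes, bytes]:
    """Split an upload laid out as nonce || ciphertext-with-tag."""
    if len(blob) < NONCE_LEN + TAG_LEN:
        raise ValueError("upload too short to hold a nonce and a GCM tag")
    return blob[:NONCE_LEN], blob[NONCE_LEN:]


def wipe_stream(stream: io.BytesIO) -> None:
    """Best effort: overwrite the BytesIO buffer with zeros in place."""
    with stream.getbuffer() as view:
        view[:] = bytes(len(view))


def process_upload(
    blob: bytes,
    decrypt: Callable[[bytes, bytes], bytes],  # hypothetical AEAD hook
    handler: Callable[[BinaryIO], T],
) -> T:
    """Decrypt into RAM, hand the stream to `handler`, then wipe it."""
    nonce, ciphertext = split_envelope(blob)
    stream = io.BytesIO(decrypt(nonce, ciphertext))  # never touches disk
    try:
        return handler(stream)
    finally:
        wipe_stream(stream)
        stream.close()
```

If the `cryptography` package is in use, `decrypt` could be something like `lambda nonce, ct: AESGCM(key).decrypt(nonce, ct, None)`. Python gives no guarantee that other copies of the plaintext are gone (immutable `bytes` objects cannot be overwritten), which is why zeroing is only "where possible".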
Key management:
- Store `AES_SECRET_KEY` and `FIELD_DB_KEY` in `.env` and never commit them to git. The `.gitignore` excludes `.env.*`.
- Rotate keys after any suspected compromise.
- For production: use a secrets manager (Vault, AWS Secrets Manager), not `.env` files.
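A startup check along these lines can catch a missing or malformed key before any data is touched. It is a sketch under assumptions: the source does not say how `AES_SECRET_KEY` is encoded, so hex encoding is assumed here, and `load_key` is a hypothetical helper.

```python
import os
from typing import Mapping, Optional

KEY_BYTES = 32  # AES-256 requires a 256-bit key


def load_key(name: str, env: Optional[Mapping[str, str]] = None) -> bytes:
    """Read a hex-encoded key (assumed encoding) and validate its length."""
    env = os.environ if env is None else env
    raw = env.get(name)
    if not raw:
        raise RuntimeError(f"{name} is not set")
    try:
        key = bytes.fromhex(raw.strip())
    except ValueError:
        # Never echo the value: it may be a real secret with a typo in it.
        raise RuntimeError(f"{name} is not valid hex") from None
    if len(key) != KEY_BYTES:
        raise RuntimeError(f"{name} must decode to {KEY_BYTES} bytes")
    return key
```

The error messages name the variable but never include its value, so a failed start-up does not leak key material into logs.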
These data types must NEVER appear in plaintext logs, database fields, or anywhere that could be subpoenaed or seized:
- Witness identities — real names, contact information, locations
- Investigator identities — names, pseudonyms that link to real identities, device IDs
- Undercover operation details — facility entry methods, timing, cover stories
- Source recruitment or communication — any record of how a witness was contacted
Use pseudonymous IDs instead (e.g. OP-001, WIT-042). Maintain the mapping from
pseudonym to real identity ONLY in a separate, separately secured system that does
not sync with this pipeline.
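As a small illustration of the pseudonymous ID scheme, the helpers below format and validate IDs of the `OP-001` / `WIT-042` shape. The names `make_pseudonym` and `is_pseudonym` are hypothetical, and limiting the prefixes to `OP` and `WIT` is an assumption based only on the two examples above.

```python
import re

# Assumption: only the two prefixes shown in the text are in use.
PSEUDONYM_RE = re.compile(r"(OP|WIT)-\d{3,}")


def is_pseudonym(value: str) -> bool:
    """True only if the whole string is a well-formed pseudonymous ID."""
    return PSEUDONYM_RE.fullmatch(value) is not None


def make_pseudonym(prefix: str, number: int) -> str:
    """Format an ID such as WIT-042; reject unknown prefixes or bad numbers."""
    candidate = f"{prefix}-{number:03d}"
    if not is_pseudonym(candidate):
        raise ValueError("not a valid pseudonymous ID")
    return candidate
```

A check like `is_pseudonym` could guard any field that is written to the database or logs. The pseudonym-to-identity mapping itself stays outside this code entirely.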
Regulatory data (USDA APHIS inspection records) is public and carries no restriction. FOIA request tracking (agency, subject, dates) is also low-sensitivity.
The field server (src/offline/field_server.py) is designed for air-gapped operation:
- Binds to `127.0.0.1` by default. Never expose to the internet.
- Network access requires the explicit `--network` flag and is still localhost-local.
- `pysqlcipher3` encrypts the local SQLite database. If unavailable, a warning is logged. Do not use the unencrypted fallback for active investigations.
- Sync is always a deliberate operator action. There is no automatic background sync.
- Investigators should disable Wi-Fi and mobile data before running the field server in high-risk environments.
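The SQLCipher fallback rule above could be enforced in code rather than left to the operator. The sketch below is not taken from `src/offline/field_server.py`; `choose_db_backend`, the `active_investigation` flag, and the logger name are all hypothetical.

```python
import importlib.util
import logging
from typing import Optional

log = logging.getLogger("field_server")  # hypothetical logger name


def sqlcipher_available() -> bool:
    """True if the pysqlcipher3 package can be found for import."""
    return importlib.util.find_spec("pysqlcipher3") is not None


def choose_db_backend(active_investigation: bool,
                      available: Optional[bool] = None) -> str:
    """Pick a storage backend, refusing plaintext for active investigations."""
    if available is None:
        available = sqlcipher_available()
    if available:
        return "sqlcipher"
    log.warning("pysqlcipher3 not installed: local database would be unencrypted")
    if active_investigation:
        raise RuntimeError("refusing unencrypted fallback for an active investigation")
    return "sqlite-plaintext"
```

Failing hard here turns a warning that can be missed in a log into a condition the operator has to resolve before the server starts.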
Device seizure preparation:
- Enable full-disk encryption (LUKS/FileVault) on any device running this software.
- Configure remote wipe capability.
- Set an auto-lock timeout. Consider a panic-wipe trigger (e.g. multiple wrong PINs).
- Do not keep unencrypted backups.
- Destroy devices that cannot be wiped before seizure when legally permissible.
Ag-gag statutes criminalize undercover investigation of agricultural operations. Exposure risk varies by jurisdiction:
| Jurisdiction | Ag-Gag Law | Risk Level | Notes |
|---|---|---|---|
| Iowa | Iowa Code § 717A.3A | High | Criminal penalties for trespass + documentation |
| North Carolina | N.C. Gen. Stat. § 99A-2 | High | Civil liability for gaining employment under false pretenses |
| Kansas | K.S.A. § 47-1827 | High | Criminal for entering ag facilities to photograph |
| Alabama | Ala. Code § 2-15-110 | Medium | Recording without consent at commercial facilities |
| Montana | Mont. Code Ann. § 81-30-103 | Medium | Civil cause of action for facility operators |
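| Idaho | Struck down in part (9th Cir. 2018) | Low | Animal Legal Defense Fund v. Wasden |
| Utah | Utah Code § 76-6-112 | High | Criminal ag-gag; held unconstitutional by a federal district court in 2017 (Animal Legal Defense Fund v. Herbert), confirm current status with counsel |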
| Arkansas | Ark. Code Ann. § 2-5-101 | Medium | Agricultural operation recording |
Operational guidance:
- Before any investigation, consult legal counsel on current ag-gag exposure.
- Evidence from ag-gag jurisdictions may be inadmissible or may expose investigators to criminal prosecution. Handle separately and flag clearly.
- FOIA requests are protected speech and carry no ag-gag exposure risk.
- Public records from USDA APHIS are public information — no exposure.
See docs/jurisdiction-guide.md for full jurisdiction analysis.
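One way to "handle separately and flag clearly" is to attach a jurisdiction flag to each evidence record at ingestion. The sketch below copies the risk levels from the table above. `EvidenceFlag` and `flag_evidence` are hypothetical names, and the rule of segregating anything that is not rated Low is an assumption, not legal advice.

```python
from dataclasses import dataclass

# Risk levels copied from the jurisdiction table; counsel's view takes priority.
AG_GAG_RISK = {
    "Iowa": "High",
    "North Carolina": "High",
    "Kansas": "High",
    "Alabama": "Medium",
    "Montana": "Medium",
    "Idaho": "Low",
    "Utah": "High",
    "Arkansas": "Medium",
}


@dataclass(frozen=True)
class EvidenceFlag:
    jurisdiction: str
    risk: str        # "High", "Medium", "Low", or "Unknown"
    segregate: bool  # store apart from other evidence pending legal review


def flag_evidence(jurisdiction: str) -> EvidenceFlag:
    """Look up a jurisdiction; unlisted ones are treated conservatively."""
    name = jurisdiction.strip().title()
    risk = AG_GAG_RISK.get(name, "Unknown")
    return EvidenceFlag(name, risk, segregate=(risk != "Low"))
```

Jurisdictions missing from the table come back as `Unknown` and are segregated by default, so a gap in the table does not read as "no risk".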
Zero-retention requirement: Investigation documents (evidence, witness testimony, operational data) must NEVER be routed through AI providers that retain inputs.
Permitted providers for investigation data:
- Locally hosted models (Ollama, llama.cpp, LM Studio)
- Providers with verified zero-retention agreements (verify contractually, not just from marketing copy — see the provider's DPA)
Permitted for public regulatory data only:
- Google Gemini (verify zero-retention tier — standard API retains inputs)
- OpenAI (verify zero-retention tier — standard API retains inputs)
The ai_provider parameter in src/documents/ingester.py defaults to "auto".
Set GEMINI_API_KEY or OPENAI_API_KEY in .env only if you have verified
zero-retention agreements. For investigations, use ai_provider="local".
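The provider rule can also be checked at the call site so that a stray `"auto"` never routes investigation data to a remote API. This is a sketch only: `check_provider` and `VERIFIED_ZERO_RETENTION` are hypothetical, and the actual behaviour of `"auto"` in `src/documents/ingester.py` is not described here, so the guard simply rejects it for investigation data.

```python
from typing import FrozenSet

LOCAL = "local"

# Hypothetical allowlist: add a provider name only after its zero-retention
# terms have been verified contractually. Empty by default.
VERIFIED_ZERO_RETENTION: FrozenSet[str] = frozenset()


def check_provider(ai_provider: str, investigation_data: bool) -> str:
    """Return the provider if it may see this data, else raise."""
    if not investigation_data:
        return ai_provider  # public regulatory data: any configured provider
    if ai_provider == LOCAL or ai_provider in VERIFIED_ZERO_RETENTION:
        return ai_provider
    raise PermissionError(
        f"provider {ai_provider!r} is not cleared for investigation data"
    )
```

For example, `check_provider("local", investigation_data=True)` passes, while `check_provider("auto", investigation_data=True)` raises instead of silently choosing a remote provider.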
API keys are tiered: PUBLIC (no key), COALITION, INVESTIGATOR.
- PUBLIC endpoints serve only USDA APHIS public data.
- COALITION keys allow FOIA request generation and cross-org violation tracking.
- INVESTIGATOR keys allow document ingestion and evidence access.
- All keys are masked in logs (last 4 chars only) — never log full keys.
- Rotate COALITION and INVESTIGATOR keys if any partner organization is compromised.
- Consider IP allowlisting for INVESTIGATOR-tier endpoints.
Set keys in .env:
COALITION_API_KEYS=key1,key2,key3
INVESTIGATOR_API_KEYS=key4,key5
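A minimal sketch of how the comma-separated values might be parsed, matched to a tier, and masked for logging. The function names are hypothetical, and treating an unrecognised key as PUBLIC rather than rejecting the request is an assumption.

```python
import hmac
from typing import Iterable, List


def parse_keys(value: str) -> List[str]:
    """Split a comma-separated env value, dropping blanks and whitespace."""
    return [k.strip() for k in value.split(",") if k.strip()]


def mask_key(key: str) -> str:
    """Log-safe form: last 4 characters only; short keys are fully masked."""
    return "****" + key[-4:] if len(key) > 4 else "****"


def _matches(presented: str, candidates: Iterable[str]) -> bool:
    # compare_digest avoids leaking how many leading characters matched.
    found = False
    for candidate in candidates:
        if hmac.compare_digest(presented.encode(), candidate.encode()):
            found = True
    return found


def tier_for(presented: str, coalition: List[str], investigator: List[str]) -> str:
    """Map a presented key to INVESTIGATOR, COALITION, or PUBLIC."""
    if presented and _matches(presented, investigator):
        return "INVESTIGATOR"
    if presented and _matches(presented, coalition):
        return "COALITION"
    return "PUBLIC"  # assumption: unknown keys get public access only
```

For example, `tier_for("key4", parse_keys("key1,key2,key3"), parse_keys("key4,key5"))` returns `"INVESTIGATOR"`, and `mask_key` would be applied before the key appears in any log line.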
Before adding any new dependency, verify it exists and has legitimate maintainers. ~20% of AI-recommended packages are hallucinated. Run:
pip index versions <package-name>  # verify package exists on PyPI

Current dependencies are verified. New dependencies must be added deliberately: never accept AI suggestions for package names without checking PyPI directly.
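The same existence check can be scripted, for instance in a pre-commit hook. The sketch below queries PyPI's JSON endpoint; `package_exists` and its injectable `opener` parameter are hypothetical conveniences. Existence alone does not show that a package is legitimate, so maintainers and release history still need a manual look.

```python
import re
import urllib.error
import urllib.request


def normalize(name: str) -> str:
    """Normalize a project name as PyPI does (PEP 503)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def pypi_url(name: str) -> str:
    return f"https://pypi.org/pypi/{normalize(name)}/json"


def package_exists(name: str, opener=urllib.request.urlopen) -> bool:
    """True if PyPI knows the project, False on a 404; other errors propagate."""
    try:
        with opener(pypi_url(name), timeout=10) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise
```

Network failures and non-404 errors are re-raised on purpose, so an outage is never mistaken for "package does not exist".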