SecureAlphaAI

IP-safe, LLM-powered financial strategy analysis — built on Anthropic Claude.

Python 3.11+ · License: MIT

SecureAlphaAI lets quant firms and asset managers harness frontier AI for portfolio commentary and report generation without leaking proprietary alpha — the signals, strategy names, thresholds, and position sizes that constitute their intellectual property.


Table of Contents

  1. The Problem
  2. Theoretical Framework
  3. Architecture
  4. Defence Layers in Detail
  5. Frontend Dashboard
  6. API Reference
  7. Quick Start (Docker — recommended)
  8. Quick Start (Local Dev)
  9. Configuration
  10. Running the Tests
  11. Python Client SDK
  12. Threat Model
  13. Tech Stack

The Problem

Modern quantitative finance is built on proprietary alpha — carefully engineered signals, strategy parameters, and portfolio construction logic that represent years of research and millions in development cost. When firms began integrating LLMs into their workflows, an uncomfortable tension emerged:

LLMs need context to be useful. Context contains your IP. Sending your IP to an external API is a risk.

The four failure modes this creates:

| Failure Mode | What Happens | Consequence |
|---|---|---|
| Training data leakage | Your prompt is retained and used to fine-tune future models | Competitors' models learn your alpha |
| Prompt injection | A malicious payload in analyst notes causes the LLM to echo back sensitive values | Secrets exfiltrated through the model response |
| Inference attacks | High-precision numbers (e.g. a 6-decimal factor loading) uniquely identify your strategy | IP reconstructable even from "anonymised" outputs |
| Compliance exposure | Non-public portfolio positions and strategy parameters may constitute MNPI | Regulatory liability under SEC Rule 10b-5, MiFID II, etc. |

The naive solution — "just don't send sensitive data" — breaks the value proposition entirely. The correct solution is a sanitisation pipeline that strips the identifying information while preserving the analytical signal.

That is what SecureAlphaAI is.


Theoretical Framework

Information Theory: Why Precision Is the Attack Surface

Consider a conviction score reported as 0.8347561. The Shannon entropy of a 7-decimal floating-point value drawn from a continuous distribution approaches log₂(10⁷) ≈ 23 bits. That is enough entropy to uniquely identify the strategy from a universe of ~8 million candidates.

By contrast, a value reported as ~0.83 (2 decimal places) carries only log₂(100) ≈ 6.6 bits — insufficient to distinguish your strategy from hundreds of others running similar momentum overlays.

The Sanitiser's threshold rules are therefore not arbitrary. They are grounded in the observation that:

  • Public market data (prices, returns, volumes) uses at most 4 decimal places in standard data feeds.
  • Internal risk parameters, factor loadings, and signal scores are routinely computed to 6–8 decimal places.
  • Anything ≥ 6 decimal places is therefore almost certainly an internal parameter, not a public market datum.
Identifying entropy by decimal precision
─────────────────────────────────────────────────────────────────────────
Decimal places │ Example       │ Entropy  │ Risk
───────────────┼───────────────┼──────────┼─────────────────────────────
2              │ 0.83          │  6.6 bit │ Safe — ambiguous
3              │ 0.835         │  9.9 bit │ Caution — borderline
4              │ 0.8347        │ 13.3 bit │ Sensitive — internal range
5              │ 0.83475       │ 16.6 bit │ High risk
6+             │ 0.834756...   │ 20+ bit  │ BLOCKED — uniquely identifying
─────────────────────────────────────────────────────────────────────────
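The entropy column above can be reproduced with a few lines of Python; this is a sketch of the calculation, not project code:

```python
import math

def precision_entropy_bits(decimal_places: int) -> float:
    """Bits of identifying entropy in a value quoted to the given precision:
    a d-decimal fraction distinguishes 10**d equally likely values."""
    return math.log2(10 ** decimal_places)

for d in (2, 4, 7):
    print(f"{d} dp -> {precision_entropy_bits(d):.1f} bits")
# 2 dp -> 6.6 bits
# 4 dp -> 13.3 bits
# 7 dp -> 23.3 bits
```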

Fail-Closed Design Philosophy

The PromptGuard operates on a fail-closed principle borrowed from network security: when in doubt, deny.

This asymmetry is deliberate. In access control systems, false positives (legitimate traffic blocked) cause operational friction. False negatives (malicious traffic allowed) cause breaches. In IP protection, the cost asymmetry is even more extreme: a false positive means a developer gets a 422 and adds a new config entry; a false negative means proprietary signals leave the building.

Defence in Depth

No single control is sufficient. SecureAlphaAI layers three independent mechanisms:

  ┌─────────────────────────────────────────────────────────────────────┐
  │  LAYER 1 — Sanitiser (transform)                                    │
  │  Replace known-sensitive values with stable placeholders.           │
  │  Maintains reverse_map for internal audit. Never sent to LLM.       │
  └─────────────────────────────────────┬───────────────────────────────┘
                                        │ sanitised text
  ┌─────────────────────────────────────▼───────────────────────────────┐
  │  LAYER 2 — PromptGuard (detect + reject)                            │
  │  Regex-based inspection for residual sensitive patterns.            │
  │  Catches config gaps. Fail-closed: reject on any match.             │
  └─────────────────────────────────────┬───────────────────────────────┘
                                        │ approved context only
  ┌─────────────────────────────────────▼───────────────────────────────┐
  │  LAYER 3 — API Boundary (validate + limit)                          │
  │  Pydantic schema validation. String length limits on free-text.     │
  │  Prevents prompt injection via oversized payloads.                  │
  └─────────────────────────────────────────────────────────────────────┘

Even if an attacker crafts a payload that defeats one layer, the remaining two provide independent barriers.


Architecture

System Overview

  ┌──────────────────────────────────────────────────────────────────────┐
  │  Client (Browser / SDK / curl)                                       │
  └──────────────────────────────┬───────────────────────────────────────┘
                                 │ HTTPS
  ┌──────────────────────────────▼───────────────────────────────────────┐
  │  Next.js Frontend  (port 3000)                                       │
  │  Strategy list · Detail tabs · Compare · Report · Admin              │
  └──────────────────────────────┬───────────────────────────────────────┘
                                 │ REST + SSE
  ┌──────────────────────────────▼───────────────────────────────────────┐
  │  FastAPI Backend  (port 8000)                                        │
  │                                                                      │
  │   ┌──────────────────────────────────────────────────────────────┐   │
  │   │  Request Boundary                                            │   │
  │   │  Pydantic v2 validation · API-key auth · Rate limiting       │   │
  │   └──────────────────────┬───────────────────────────────────────┘   │
  │                          │ typed domain objects                      │
  │   ┌──────────────────────▼───────────────────────────────────────┐   │
  │   │  ContextBuilder                                              │   │
  │   │  ┌────────────────────────────────────────────────────────┐  │   │
  │   │  │  Sanitiser — replace protected values → placeholders   │  │   │
  │   │  └───────────────────┬────────────────────────────────────┘  │   │
  │   │  ┌───────────────────▼────────────────────────────────────┐  │   │
  │   │  │  PromptGuard — inspect, reject if residual IP found    │  │   │
  │   │  └───────────────────┬────────────────────────────────────┘  │   │
  │   └──────────────────────┼───────────────────────────────────────┘   │
  │                          │ approved context                          │
  │   ┌──────────────────────▼───────────────────────────────────────┐   │
  │   │  StrategyAnalyst / ReportGenerator                           │   │
  │   │  Prompt templates wrapping the safe context                  │   │
  │   └──────────────────────┬───────────────────────────────────────┘   │
  │                          │ formatted prompt                          │
  │   ┌──────────────────────▼───────────────────────────────────────┐   │
  │   │  LLMClient (AsyncAnthropic)                                  │   │
  │   │  claude-opus-4-6 · Prompt caching · Streaming SSE            │   │
  │   └──────────────────────┬───────────────────────────────────────┘   │
  └──────────────────────────┼───────────────────────────────────────────┘
                             │
  ┌──────────────────────────▼───────────────────────────────────────┐
  │  Anthropic API                                                   │
  │  Receives: sanitised context only. Never sees your raw IP.       │
  └──────────────────────────────────────────────────────────────────┘

  ┌─────────────────────────────────────────────────────────────────┐
  │  Supporting Services                                            │
  │  Redis ── arq job queue ── background report generation         │
  │  SQLite/Postgres ── strategy store, API keys, audit log         │
  │  Prometheus ── metrics endpoint (/metrics)                      │
  └─────────────────────────────────────────────────────────────────┘

Data Flow: Concrete Example

POST /analyse

1. Raw request:
   {
     "asset_returns": [{"ticker": "AAPL", "daily_return_pct": 2.3}],
     "strategy_signal": {
       "strategy_name": "AlphaV1",
       "conviction_score": 0.8347561,        ← internal precision
       "notes": "TRD-ABC12345 closed above target"
     }
   }

2. Pydantic validates types and field constraints. ✓

3. ContextBuilder formats raw text:
   "AAPL: +2.30% | Strategy: AlphaV1 | Conviction: 0.8347561 | TRD-ABC12345"

4. Sanitiser transforms:
   "ASSET_A: +2.30% | Strategy: STRATEGY_1 | Conviction: [REDACTED_THRESHOLD]
    | [REDACTED_ID]"
   reverse_map = {"ASSET_A": "AAPL", "STRATEGY_1": "AlphaV1"}
   (stored internally, never transmitted)

5. PromptGuard inspects sanitised text:
   ✓ No numbers with ≥ 6 decimal places
   ✓ No forbidden internal phrases
   ✓ No residual trade/order IDs
   ✓ No suspicious all-caps tickers
   → APPROVED

6. StrategyAnalyst wraps context in analysis prompt template.

7. LLMClient sends to Anthropic (claude-opus-4-6):
   - Stable system prompt (cached — ~90% token cost reduction)
   - Sanitised user prompt only

8. Response returned to caller.
   ✓ Your IP never left the system.

Defence Layers in Detail

Sanitiser (core/sanitiser.py)

A deterministic, reversible transformation layer. Every substitution is recorded in a reverse_map for internal audit; the map stays inside the system and is never sent to the LLM.

| Rule | Input Example | Output | Why |
|---|---|---|---|
| Ticker replacement | AAPL | ASSET_A | Identifies specific positions (MNPI) |
| Strategy replacement | AlphaV1 | STRATEGY_1 | Trade secret — identifies proprietary model |
| Dollar amounts | $1,234,567 | [REDACTED_AMOUNT] | Position size reveals NAV/exposure |
| Threshold redaction | 0.0347% | [REDACTED_THRESHOLD] | High-precision % = internal risk param |
| UUID redaction | 550e8400-… | [REDACTED_ID] | Internal system identifiers |
| Trade/order IDs | TRD-ABC12345 | [REDACTED_ID] | Links to specific internal orders |
| BIC/SWIFT codes | GSBEFRPP | [REDACTED_BIC] | Identifies counterparties |
| ISIN/CUSIP/SEDOL | US0378331005 | [REDACTED_ISIN] | Identifies specific securities globally |
| Dates (contextual) | 2024-03-15 | [REDACTED_DATE] | Can reveal entry/exit timing |

Stability guarantee: tickers are replaced in the order they appear in PROTECTED_TICKERS, so a given ticker always maps to the same placeholder (AAPL is always ASSET_A), consistent across requests and over time.
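That guarantee follows directly from enumerating the configured list. A minimal sketch, with the placeholder scheme assumed from the examples above (the real logic lives in core/sanitiser.py):

```python
import string

PROTECTED_TICKERS = ["AAPL", "MSFT", "NVDA"]  # normally read from the env var

def build_placeholder_map(tickers: list[str]) -> dict[str, str]:
    # Enumeration order is fixed by the config list, so the same ticker
    # receives the same placeholder on every request.
    return {t: f"ASSET_{string.ascii_uppercase[i]}" for i, t in enumerate(tickers)}

def replace_tickers(text: str, mapping: dict[str, str]) -> str:
    for raw, placeholder in mapping.items():
        text = text.replace(raw, placeholder)
    return text

mapping = build_placeholder_map(PROTECTED_TICKERS)
print(replace_tickers("AAPL up 2.3%, MSFT down 0.8%", mapping))
# -> ASSET_A up 2.3%, ASSET_B down 0.8%
```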

PromptGuard (core/prompt_guard.py)

A fail-closed inspection layer that runs after the Sanitiser. Its job is to catch anything the Sanitiser missed — because a config gap exists, or a new data field was added without sanitisation.

| Check | Rationale |
|---|---|
| Numbers with ≥ 6 decimal places | Only internal parameters reach this precision; public data uses ≤ 4 dp |
| Ratio expressions (3.14159:1) | Internal risk-ratio formatting is distinctive |
| Forbidden phrases (live PnL, strategy id:, internal signal) | Indicator of unsanitised internal system output |
| Trade/order/position IDs (TRD-, ORD-, POS-) | Systematic internal ID prefixes |
| Residual all-caps tokens | Flags tickers not listed in PROTECTED_TICKERS |

Custom rules can be added by passing extra_rules: list[GuardRule] to the constructor — each rule is a compiled regex + human-readable description.
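A rule in this shape is just a compiled regex plus a description. The sketch below shows two of the built-in checks and the fail-closed decision; the names mirror the README, but the actual fields in core/prompt_guard.py may differ:

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class GuardRule:
    pattern: re.Pattern
    description: str

DEFAULT_RULES = [
    GuardRule(re.compile(r"\d+\.\d{6,}"), "number with >= 6 decimal places"),
    GuardRule(re.compile(r"\b(?:TRD|ORD|POS)-[A-Z0-9]+"), "internal trade/order/position ID"),
]

def inspect(text: str, extra_rules: tuple[GuardRule, ...] = ()) -> list[str]:
    """Fail-closed: any matching rule is a violation; an empty list means approved."""
    return [r.description for r in (*DEFAULT_RULES, *extra_rules)
            if r.pattern.search(text)]

print(inspect("Conviction: 0.8347561"))   # -> ['number with >= 6 decimal places']
print(inspect("ASSET_A: +2.30%"))         # -> []
```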

ContextBuilder (core/context_builder.py)

The single, mandatory path from your internal domain model to the LLM. There is no other route — you cannot accidentally bypass sanitisation by formatting data yourself. It chains Sanitiser → PromptGuard and returns the approved string or raises ValueError.
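The chaining can be pictured as a small class that either returns the approved string or raises. This is a structural sketch with injected callables, not the actual API of core/context_builder.py:

```python
from typing import Callable

class ContextBuilder:
    """Single mandatory path to the LLM: sanitise, then inspect, else raise."""

    def __init__(self, sanitise: Callable[[str], str],
                 inspect: Callable[[str], list[str]]):
        self._sanitise = sanitise   # replaces protected values with placeholders
        self._inspect = inspect     # returns a list of residual-IP violations

    def build(self, raw_text: str) -> str:
        safe = self._sanitise(raw_text)
        violations = self._inspect(safe)
        if violations:              # fail-closed: any finding rejects the request
            raise ValueError(f"PromptGuard rejected context: {violations}")
        return safe

builder = ContextBuilder(
    sanitise=lambda s: s.replace("AAPL", "ASSET_A"),
    inspect=lambda s: ["unlisted ticker"] if "MSFT" in s else [],
)
print(builder.build("AAPL up 2.3%"))   # -> ASSET_A up 2.3%
```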

LLMClient (core/llm_client.py)

Thin wrapper around AsyncAnthropic with:

  • complete() — single-shot async completion
  • stream_complete() — async generator streaming text chunks via SSE

Prompt caching: The base system prompt is marked cache_control: {type: "ephemeral"}. Since it is stable across all calls, Anthropic caches it server-side — reducing token costs by up to 90% on repeated requests.
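With the anthropic Python SDK, this corresponds to passing the system prompt as a content block carrying cache_control. The sketch below only builds the request body; the model name comes from this README and the system prompt text is illustrative:

```python
def build_request(sanitised_context: str) -> dict:
    """Request body usable as AsyncAnthropic().messages.create(**build_request(...)).
    The system block is identical across calls, so marking it ephemeral lets
    Anthropic cache it server-side and charge reduced rates on cache hits."""
    return {
        "model": "claude-opus-4-6",
        "max_tokens": 1024,
        "system": [{
            "type": "text",
            "text": "You are a portfolio analyst. Never echo raw identifiers.",
            "cache_control": {"type": "ephemeral"},
        }],
        "messages": [{"role": "user", "content": sanitised_context}],
    }
```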


Frontend Dashboard

A full Next.js 15 dashboard ships alongside the API, providing a visual interface for all backend capabilities.

Pages

| Page | Path | What it does |
|---|---|---|
| Strategy List | /strategies | Browse all strategies, create new, delete |
| Strategy Detail | /strategies/[id] | Four tabs: Overview · Performance · Snapshots · Reports |
| Compare | /compare | Side-by-side multi-strategy comparison with chip picker |
| Admin — API Keys | /admin/keys | Create/revoke API keys with role assignment |
| Admin — Audit Log | /admin/audit | Full audit trail of all API actions |
| Admin — Guard | /admin/guard | Live PromptGuard rule inspection and testing |
| Settings | /settings | Configure API base URL and key for the session |

All pages are covered by Playwright end-to-end tests (frontend/e2e/).


API Reference

Full interactive docs available at http://localhost:8000/docs (Swagger UI) and http://localhost:8000/redoc (ReDoc) when the server is running.

Core Endpoints

| Method | Path | Description |
|---|---|---|
| POST | /analyse | Analyse a market snapshot + strategy signal |
| POST | /report | Generate a full narrative report (queued via arq) |
| GET | /report/{id} | Poll report job status |
| GET | /report/{id}/stream | Stream report output via SSE |

Strategy Management

| Method | Path | Description |
|---|---|---|
| POST | /strategies | Create a new strategy |
| GET | /strategies | List all strategies |
| GET | /strategies/{id} | Get a single strategy |
| DELETE | /strategies/{id} | Delete a strategy |
| POST | /strategies/{id}/snapshots | Record a performance snapshot |
| GET | /strategies/{id}/snapshots | Retrieve snapshot history |
| GET | /strategies/{id}/trend | Analyse performance trend |
| GET | /strategies/compare | Compare multiple strategies |

Admin & Governance

| Method | Path | Description |
|---|---|---|
| POST | /admin/keys | Create an API key |
| GET | /admin/keys | List all API keys |
| DELETE | /admin/keys/{id} | Revoke an API key |
| GET | /admin/audit | Query the audit log |
| GET | /admin/dsar/export/{user_id} | GDPR data export |
| DELETE | /admin/dsar/delete/{user_id} | GDPR right-to-erasure |

Observability

| Method | Path | Description |
|---|---|---|
| GET | /health | Liveness check |
| GET | /metrics | Prometheus metrics |

Quick Start (Docker — recommended)

Prerequisites: Docker and Docker Compose installed.

# 1. Clone the repo
git clone https://github.com/MrRobotop/SecureAlphaAI.git
cd SecureAlphaAI/secure-alpha-ai

# 2. Create your environment file
cp .env.example .env

Open .env and set your Anthropic API key:

ANTHROPIC_API_KEY=sk-ant-your-key-here

Get a key at console.anthropic.com.

# 3. Build and start all services
docker compose up --build

That starts four containers:

| Container | URL | Description |
|---|---|---|
| secure-alpha-frontend | http://localhost:3000 | Next.js dashboard |
| secure-alpha-ai | http://localhost:8000 | FastAPI backend |
| secure-alpha-arq-worker | | Background job worker |
| secure-alpha-redis | localhost:6379 | Job queue / cache |

# 4. Try it — analyse a strategy signal
curl -s -X POST http://localhost:8000/analyse \
  -H "Content-Type: application/json" \
  -d '{
    "timestamp": "2024-01-15T09:30:00Z",
    "volatility_regime": "high",
    "sector_rotation_signal": 0.32,
    "asset_returns": [
      {"ticker": "AAPL", "daily_return_pct": 2.3},
      {"ticker": "MSFT", "daily_return_pct": -0.8}
    ],
    "analyst_note": "Tech outperforming on earnings beat.",
    "strategy_signal": {
      "strategy_name": "AlphaV1",
      "signal_direction": "long",
      "conviction_score": 0.75,
      "target_tickers": ["AAPL"],
      "notes": "Momentum building post-earnings."
    },
    "analysis_question": "Is the long signal well-supported by market conditions?"
  }' | python -m json.tool

Quick Start (Local Dev)

# 1. Clone and enter
git clone https://github.com/MrRobotop/SecureAlphaAI.git
cd SecureAlphaAI/secure-alpha-ai

# 2. Python virtual environment
python -m venv .venv
source .venv/bin/activate        # Windows: .venv\Scripts\activate

# 3. Install backend dependencies
pip install -e ".[dev]"

# 4. Configure environment
cp .env.example .env
# Edit .env — set ANTHROPIC_API_KEY and your protected tickers/strategies

# 5. Run database migrations
alembic upgrade head

# 6. Start the API (auto-reload)
uvicorn api.routes:app --reload --host 0.0.0.0 --port 8000

For the frontend:

cd frontend
npm install
npm run dev     # http://localhost:3000

Configuration

All configuration is via environment variables. Copy .env.example to .env and fill in your values — the file is gitignored and will never be committed.

| Variable | Required | Default | Description |
|---|---|---|---|
| ANTHROPIC_API_KEY | Yes | | Your Anthropic API key (console.anthropic.com) |
| PROTECTED_TICKERS | No | AAPL,MSFT,NVDA,GOOGL,AMZN | Comma-separated ticker symbols to anonymise |
| PROTECTED_STRATEGY_NAMES | No | (empty) | Comma-separated strategy names to redact |
| PROTECTED_COUNTERPARTIES | No | (empty) | Comma-separated counterparty names to redact |
| REDIS_URL | No | redis://localhost:6379/0 | Redis connection string (needed for background reports) |
| STRATEGY_RETENTION_DAYS | No | 90 | Days before strategies are swept by the retention cleaner |
| APP_ENV | No | development | development or production |
| LOG_LEVEL | No | INFO | Python logging level |
| API_PORT | No | 8000 | Port to expose the API |
| FRONTEND_PORT | No | 3000 | Port to expose the frontend |

Extending the protection list: Add tickers and strategy names to PROTECTED_TICKERS and PROTECTED_STRATEGY_NAMES. No code changes required — both the Sanitiser and PromptGuard read these at startup.
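Because both layers read the same comma-separated variables, extending protection is a one-line .env change. Parsing such a variable is the usual split-and-strip; a sketch (the real loader in the codebase may differ):

```python
import os

def load_csv_env(name: str, default: str = "") -> list[str]:
    """Parse a comma-separated env var such as PROTECTED_TICKERS,
    ignoring blank entries and surrounding whitespace."""
    return [item.strip() for item in os.environ.get(name, default).split(",")
            if item.strip()]

os.environ["PROTECTED_TICKERS"] = "AAPL, MSFT,NVDA,"
print(load_csv_env("PROTECTED_TICKERS"))   # -> ['AAPL', 'MSFT', 'NVDA']
```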


Running the Tests

The test suite runs entirely offline — no Anthropic API key required.

# Install dev dependencies (if not already done)
pip install -e ".[dev]"

# Run all 292 tests
pytest

# Run with coverage report
pytest --cov=core --cov=api --cov=analysis --cov-report=term-missing

# Run the adversarial corpus (58 injection payloads)
pytest tests/test_adversarial.py -v

Test Coverage

| Module | Tests |
|---|---|
| core/sanitiser.py | Unit + property-based (Hypothesis) |
| core/prompt_guard.py | Unit + adversarial corpus (58 payloads) |
| core/context_builder.py | Integration — sanitiser + guard pipeline |
| core/output_validator.py | Unit — hallucination detection |
| api/routes.py | Integration — full HTTP round-trip (respx) |
| client/ | SDK unit tests |

The adversarial corpus covers:

  • Direct injection attempts (ignore previous instructions)
  • Unicode homoglyph substitution
  • Whitespace obfuscation around sensitive values
  • Nested JSON inside free-text fields
  • Base64-encoded payloads
  • Prompt continuation attacks
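One common defence against the homoglyph and width tricks in that corpus is Unicode normalisation before pattern matching. NFKC folds compatibility characters (for example fullwidth Latin) back to ASCII, though it does not catch every confusable; a sketch, not necessarily the project's exact approach:

```python
import unicodedata

def normalise(text: str) -> str:
    """Fold compatibility characters (fullwidth forms, ligatures, etc.)
    so guard regexes see canonical ASCII where possible."""
    return unicodedata.normalize("NFKC", text)

# Fullwidth 'TRD' folds back to ASCII, so the TRD- prefix rule can match it.
print(normalise("\uff34\uff32\uff24-ABC12345"))   # -> TRD-ABC12345
```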

Frontend E2E

cd frontend
npx playwright test         # headless
npx playwright test --ui    # interactive UI mode

Python Client SDK

A typed Python client ships in client/ for use in your own pipelines:

from secure_alpha_ai_client import SecureAlphaAIClient

client = SecureAlphaAIClient(
    base_url="http://localhost:8000",
    api_key="your-api-key",
)

# Analyse a signal
result = client.analyse(
    asset_returns=[{"ticker": "AAPL", "daily_return_pct": 2.3}],
    strategy_signal={
        "strategy_name": "AlphaV1",
        "signal_direction": "long",
        "conviction_score": 0.75,
    },
    analysis_question="Is the long signal supported by current conditions?",
)
print(result.analysis)

Async client available as AsyncSecureAlphaAIClient.

CLI (built on Typer):

pip install -e client/
secure-alpha analyse --ticker AAPL --strategy AlphaV1 --conviction 0.75
secure-alpha report --strategy-id <uuid> --style executive_summary

Threat Model

In Scope (mitigated)

  • Accidental inclusion of proprietary data in LLM prompts
  • Config gaps (unlisted tickers/strategies) caught by PromptGuard
  • Prompt injection via analyst notes (length limits + guard pattern matching)
  • Internal ID leakage (trade IDs, order IDs, position IDs)
  • High-precision parameter fingerprinting

Out of Scope (requires additional controls)

| Risk | Recommended Control |
|---|---|
| LLM provider data retention | Use Anthropic's zero data retention tier |
| Network-level exfiltration | TLS everywhere + egress filtering |
| Insider threat (config tampering) | Access-control .env / secrets manager |
| Model training data contamination | If data was ever public, it may already be in the model |
| Side-channel timing attacks | Not addressed at the application layer |

Tech Stack

| Layer | Technology |
|---|---|
| LLM | Anthropic Claude (claude-opus-4-6) via anthropic Python SDK |
| Backend | FastAPI + Pydantic v2 + Uvicorn |
| Database | SQLite (dev) / PostgreSQL (prod) via SQLAlchemy async + Alembic |
| Auth | API key auth with RBAC (admin / analyst / viewer) |
| Job Queue | arq + Redis — background report generation |
| Observability | Prometheus metrics, structured logging, request-ID tracing |
| Frontend | Next.js 15 (App Router) + Tailwind CSS + TanStack Query |
| Testing | pytest + Hypothesis (property-based) + respx + Playwright (E2E) |
| CI/CD | GitHub Actions — lint (ruff), type-check (mypy), tests, coverage |
| Containerisation | Docker multi-stage build + Docker Compose |
| IP Protection | Custom Sanitiser + PromptGuard pipeline (zero external deps) |

Built by Rishabh Patil — Quant Developer.
