Statistical evaluation harness that analyzes LLM token entropy and log-probabilities to detect silent model uncertainty during insecure code generation.

khuynh22/logit-sec-probe

logit-sec-probe

Features

  • A/B Testing Framework: Compare model behavior with and without safety system prompts
  • CWE-based Test Cases: Security test cases for Buffer Overflow (CWE-120), SQL Injection (CWE-89), and XSS (CWE-79)
  • Token-level Analysis: Entropy and probability tracking for each generated token
  • Risk Tagging: Automatic detection of risky keywords in generated code
  • Comparative Visualization: Multi-panel heatmaps for entropy comparison across configs

Quick Start: Google Colab

Run the interactive tutorial directly in your browser, with no installation required, via the Open In Colab badge in the repository README.

Installation

Option 1: Docker (Recommended)

No local Python environment is required; Docker is the only prerequisite:

# Build and run with docker-compose
docker-compose up --build

# Or build and run manually
docker build -t logit-sec-probe .
# An absolute host path for the bind mount also works on older Docker versions
docker run -v "$(pwd)/output:/app/output" logit-sec-probe

Output files will be saved to the ./output directory.

GPU Support

To enable GPU acceleration, uncomment the GPU section in docker-compose.yml and ensure you have the NVIDIA Container Toolkit installed.

Option 2: Local Installation

pip install -r requirements.txt

Usage

With Docker

docker-compose up

Local

Run the entropy analysis experiment:

python entropy_analysis.py

This script runs the A/B testing experiment:

  1. Loads CWE test cases from data/cwe_prompts.json
  2. For each CWE, generates code with two configurations:
    • Base: No system instruction (baseline)
    • Safety: Safety system prompt enabled
  3. Calculates entropy for each generated token
  4. Tags risky tokens based on CWE-specific keywords
  5. Saves results and generates comparative visualizations
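Step 3 can be illustrated with a minimal sketch of per-token entropy, assuming it is the Shannon entropy of the softmax distribution over the vocabulary at each generation step. The function names and the choice of nats as the unit are assumptions for illustration and are not taken from entropy_analysis.py.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities, subtracting the max for numerical stability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def token_entropy(logits):
    """Shannon entropy (in nats, an assumed unit) of the next-token distribution."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def selected_probability(logits, token_id):
    """Probability the model assigned to the token that was actually emitted."""
    return softmax(logits)[token_id]

# Toy 4-token vocabulary: a uniform distribution is maximally uncertain,
# while a sharply peaked one has entropy close to zero.
uniform = [0.0, 0.0, 0.0, 0.0]
peaked = [10.0, 0.0, 0.0, 0.0]
print(token_entropy(uniform))
print(token_entropy(peaked))
print(selected_probability(peaked, 0))
```

In a real run the logits would come from the model's output at each generated position. High entropy on a token means the model spread its probability mass over many candidates at that step.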

Output

  • output/experiment_results.csv: CSV file with all experiment data including:
    • Experiment_ID: CWE identifier
    • Config: Base or Safety configuration
    • Token_Pos: Position in generated sequence
    • Token_Text: Decoded token text
    • Entropy: Entropy of the next-token distribution at this position (higher means more uncertainty)
    • Probability: Probability the model assigned to the selected token
    • Is_Risky: Whether the token contains the risky keyword for its CWE
  • output/comparative_entropy.png: Multi-panel heatmap comparing entropy across configurations
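As a rough illustration of how the CSV could be inspected, the sketch below averages Entropy per (Experiment_ID, Config, Is_Risky) group using only the standard library. The column names come from the list above. The helper name, the assumed "True"/"False" encoding of Is_Risky, and the sample rows are made up for illustration and are not real experiment output.

```python
import csv
import io
from collections import defaultdict

def mean_entropy_by_group(csv_file):
    """Return {(Experiment_ID, Config, is_risky): mean Entropy} from an open CSV file."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(csv_file):
        # Assumption: Is_Risky is serialized as "True"/"False" (or "1"/"0").
        risky = row["Is_Risky"].strip().lower() in ("true", "1")
        key = (row["Experiment_ID"], row["Config"], risky)
        sums[key] += float(row["Entropy"])
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

# Made-up rows in the documented column layout, for illustration only.
sample = io.StringIO(
    "Experiment_ID,Config,Token_Pos,Token_Text,Entropy,Probability,Is_Risky\n"
    "CWE-120,Base,0,strcpy,0.50,0.80,True\n"
    "CWE-120,Base,1,(,0.10,0.95,False\n"
    "CWE-120,Safety,0,strcpy,1.50,0.40,True\n"
)
summary = mean_entropy_by_group(sample)
for key, value in sorted(summary.items()):
    print(key, round(value, 3))
```

To run it against real results, pass a file opened with `open("output/experiment_results.csv", newline="")` instead of the in-memory sample.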

Data

Test cases are defined in data/cwe_prompts.json:

| CWE ID  | Vulnerability   | Risky Keyword |
|---------|-----------------|---------------|
| CWE-120 | Buffer Overflow | strcpy        |
| CWE-89  | SQL Injection   | execute       |
| CWE-79  | XSS             | format        |
