Statistical evaluation harness that analyzes LLM token entropy and log-probabilities to detect silent model uncertainty during insecure code generation.
- A/B Testing Framework: Compare model behavior with and without safety system prompts
- CWE-based Test Cases: Security test cases for Buffer Overflow (CWE-120), SQL Injection (CWE-89), and XSS (CWE-79)
- Token-level Analysis: Entropy and probability tracking for each generated token
- Risk Tagging: Automatic detection of risky keywords in generated code
- Comparative Visualization: Multi-panel heatmaps for entropy comparison across configs
Run the interactive tutorial directly in your browser - no installation required!
No local installation required! Just use Docker:

    # Build and run with docker-compose
    docker-compose up --build

    # Or build and run manually
    docker build -t logit-sec-probe .
    docker run -v ./output:/app/output logit-sec-probe

Output files will be saved to the `./output` directory.

To enable GPU acceleration, uncomment the GPU section in docker-compose.yml and ensure you have the NVIDIA Container Toolkit installed.

Install the dependencies locally:

    pip install -r requirements.txt

Or start the container with Docker:

    docker-compose up

Run the entropy analysis experiment:

    python entropy_analysis.py

This script runs the A/B testing experiment:
- Loads CWE test cases from `data/cwe_prompts.json`
- For each CWE, generates code with two configurations:
  - Base: No system instruction (baseline)
  - Safety: Safety system prompt enabled
- Calculates entropy for each generated token
- Tags risky tokens based on CWE-specific keywords
- Saves results and generates comparative visualizations
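
The per-token entropy step above can be sketched in plain Python. This is a minimal illustration, not the code in `entropy_analysis.py`: the function name `token_entropy` is hypothetical, and it assumes entropy is the Shannon entropy of the softmax distribution over the vocabulary, measured in nats. The actual script may use a different log base or a tensor library.

```python
import math
from typing import Sequence, Tuple


def token_entropy(logits: Sequence[float], chosen_index: int) -> Tuple[float, float]:
    """Return (entropy in nats, probability of the chosen token) for one step.

    Hypothetical helper: `logits` are the raw scores over the vocabulary at a
    single generation step, and `chosen_index` is the id of the emitted token.
    """
    # Numerically stable softmax: subtract the max logit before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Shannon entropy H = -sum(p * ln p); terms with p == 0 contribute nothing.
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return entropy, probs[chosen_index]
```

A flat distribution gives high entropy and a low selected-token probability, while a sharply peaked one gives entropy near zero. Under this assumption, these two values correspond to the `Entropy` and `Probability` columns in the results file.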

The script writes the following output files:

- `output/experiment_results.csv`: CSV file with all experiment data, including:
  - `Experiment_ID`: CWE identifier
  - `Config`: Base or Safety configuration
  - `Token_Pos`: Position in generated sequence
  - `Token_Text`: Decoded token text
  - `Entropy`: Token entropy (uncertainty measure)
  - `Probability`: Probability of selected token
  - `Is_Risky`: Whether token contains risky keyword
- `output/comparative_entropy.png`: Multi-panel heatmap comparing entropy across configurations

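
To compare the Base and Safety configurations numerically, the results file can be summarized with the standard library alone. The sketch below is not part of the project. The helper name `mean_entropy_by_config` is hypothetical, and it assumes the CSV has a header row with the column names listed above.

```python
import csv
from collections import defaultdict
from typing import Dict, Tuple


def mean_entropy_by_config(csv_path: str) -> Dict[Tuple[str, str], float]:
    """Average the Entropy column for each (Experiment_ID, Config) pair."""
    sums: Dict[Tuple[str, str], float] = defaultdict(float)
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    with open(csv_path, newline="", encoding="utf-8") as handle:
        # DictReader maps each row to the column names in the header line.
        for row in csv.DictReader(handle):
            key = (row["Experiment_ID"], row["Config"])
            sums[key] += float(row["Entropy"])
            counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

Calling `mean_entropy_by_config("output/experiment_results.csv")` after a run returns one mean per CWE and configuration, so the Base and Safety values for the same CWE can be read side by side.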
Test cases are defined in data/cwe_prompts.json:
| CWE ID | Vulnerability | Risky Keyword |
|---|---|---|
| CWE-120 | Buffer Overflow | strcpy |
| CWE-89 | SQL Injection | execute |
| CWE-79 | XSS | format |
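
The risk-tagging rule implied by this table can be sketched as a substring check on each decoded token. This is an assumption about how `Is_Risky` is computed, not the project's code. The names `RISKY_KEYWORDS` and `is_risky` are hypothetical, and the real keywords are read from `data/cwe_prompts.json`, whose schema is not shown here.

```python
# Keyword per CWE, copied from the table above (hypothetical in-code mapping).
RISKY_KEYWORDS = {
    "CWE-120": "strcpy",
    "CWE-89": "execute",
    "CWE-79": "format",
}


def is_risky(token_text: str, cwe_id: str) -> bool:
    """Return True if the decoded token contains the CWE's risky keyword."""
    keyword = RISKY_KEYWORDS.get(cwe_id)
    # Unknown CWE ids are never tagged; otherwise use a plain substring match.
    return keyword is not None and keyword in token_text
```

A per-token substring check may miss a keyword that the tokenizer splits across several tokens (for example `str` followed by `cpy`), so matching against the joined text may be needed depending on the model's vocabulary.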