LLM Introspect Examples

This directory contains example outputs from LLM Introspect runs.

Note: These examples use local LLM interrogators (via LM Studio). Remote interrogators are also supported using --remote-interrogator with any API provider. See the main README or USAGE for remote interrogator examples.

Code Audit Examples

Code Audit with Adaptive Interrogation

Command used:

export LMSTUDIO_BASE_URL="http://192.168.1.3:1234/v1"

llm-introspect code-audit \
    --provider anthropic \
    --model claude-3-haiku-20240307 \
    --adaptive lmstudio:qwen/qwen2.5-coder-14b \
    --max-followups 3 \
    --categories security_sql,security_injection,efficiency_algorithm \
    --format markdown \
    --output examples/code-audit-haiku-example.md

What this does:

  1. Tests Claude 3 Haiku's code generation abilities
  2. Uses a local Qwen 2.5 Coder model (via LM Studio) to analyze the generated code
  3. Focuses on three challenge categories:
    • security_sql - SQL injection vulnerability detection
    • security_injection - Command injection vulnerability detection
    • efficiency_algorithm - Algorithmic complexity analysis
  4. Outputs results in markdown format
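The --adaptive value pairs a backend with a model name, as in lmstudio:qwen/qwen2.5-coder-14b here and lmstudio:gemma-3-12b or ollama:llama3 below. The sketch below checks such a spec before a long run starts. It assumes the backend is everything before the first colon, which would keep model names that contain colons (such as Ollama tags like llama3:8b) intact. It also assumes that LMSTUDIO_BASE_URL is required whenever the backend is lmstudio, as the export lines in these examples suggest. The function names are hypothetical helpers, not part of llm-introspect.

```shell
#!/usr/bin/env bash
# Split an --adaptive spec into "<backend> <model>".
# ASSUMPTION: the backend is the text before the FIRST colon.
split_adaptive_spec() {
  local spec="$1"
  local backend="${spec%%:*}"   # text before the first colon
  local model="${spec#*:}"      # text after the first colon
  if [ "$backend" = "$spec" ] || [ -z "$backend" ] || [ -z "$model" ]; then
    echo "invalid --adaptive spec: $spec" >&2
    return 1
  fi
  printf '%s %s\n' "$backend" "$model"
}

# Fail early if an lmstudio spec is used without LMSTUDIO_BASE_URL set.
check_adaptive() {
  local parts backend
  parts="$(split_adaptive_spec "$1")" || return 1
  backend="${parts%% *}"
  if [ "$backend" = "lmstudio" ] && [ -z "${LMSTUDIO_BASE_URL:-}" ]; then
    echo "LMSTUDIO_BASE_URL is not set" >&2
    return 1
  fi
  return 0
}
```

Usage: check_adaptive lmstudio:qwen/qwen2.5-coder-14b && llm-introspect code-audit ... with the rest of the command unchanged.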

Results: code-audit-haiku-example.md

Key findings:

  • Overall Grade: A+ (94.6%)
  • Security Score: 90.0%
  • Efficiency Score: 94.3%
  • All 7 challenges passed

The adaptive interrogator (Qwen 2.5 Coder) analyzed each code response and determined the generated code was satisfactory, so no follow-up probes were needed.


Full Introspection Suite Examples

Full Probe Suite with Adaptive Interrogation

Command used:

export LMSTUDIO_BASE_URL="http://192.168.1.3:1234/v1"

llm-introspect run \
    --provider anthropic \
    --model claude-haiku-4-5-20251001 \
    --adaptive lmstudio:gemma-3-12b \
    --max-followups 3 \
    --probes all \
    --format markdown \
    --output examples/full-suite-haiku-4.5-example.md

What this does:

  1. Runs the complete introspection suite against Claude Haiku 4.5
  2. Uses a local Gemma 3 12B model (via LM Studio) as the adaptive interrogator
  3. Executes all probe types:
    • contrastive - Tests response sensitivity to framing variations
    • boundaries - Maps behavioral boundaries and bypass resistance
    • consistency - Measures position stability across topics
    • unlock - Tests jailbreak resistance and prompt injection handling
  4. Outputs a comprehensive markdown report

Results: full-suite-haiku-4.5-example.md

Key findings:

  • Total API Calls: 1,103
  • Total Tokens: 348,882
  • Safety Alignment: Moderate (bypass resistance: 0.40)
  • Position Stability: Low (consistency: 0.35)
  • Content Filtering: Very Low (81% coverage)
  • Most Consistent Topic: Microservices vs Monoliths
  • Least Consistent Topic: Free will

The model's responses were context-dependent, with its positions shifting based on framing. Boundary mapping identified 24 behavioral boundaries with an average sharpness of 0.45.
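The two totals above can be combined into a per-call average, which is a handy figure when estimating the cost of repeating a full-suite run. A minimal sketch using awk, assuming the inputs are plain integers with the thousands separators removed (the function name is a hypothetical helper):

```shell
#!/usr/bin/env bash
# Average tokens per API call, rounded to one decimal place.
# Returns non-zero if the call count is not positive.
tokens_per_call() {
  local tokens="$1" calls="$2"
  awk -v t="$tokens" -v c="$calls" 'BEGIN {
    if (c <= 0) exit 1
    printf "%.1f\n", t / c
  }'
}

tokens_per_call 348882 1103
```

For this run, 348,882 tokens over 1,103 calls works out to about 316.3 tokens per call.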


Running Your Own Examples

Prerequisites

  1. Set up API keys for the provider you want to test:

    export ANTHROPIC_API_KEY="your-key"
    # or
    export OPENAI_API_KEY="your-key"
  2. (Optional) For adaptive interrogation with LM Studio:

    export LMSTUDIO_BASE_URL="http://your-host:1234/v1"
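A long run that fails on its first API call because a key is missing wastes time, so it can help to check the environment up front. The sketch below covers only the two providers and variables named in this README; the mapping for any other provider is not assumed, and both function names are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Map a --provider value to the API-key variable named in this README.
key_var_for_provider() {
  case "$1" in
    anthropic) echo ANTHROPIC_API_KEY ;;
    openai)    echo OPENAI_API_KEY ;;
    *) echo "no key variable known for provider: $1" >&2; return 1 ;;
  esac
}

# Report every listed variable that is unset or empty; non-zero if any are.
require_env() {
  local var missing=0
  for var in "$@"; do
    # ${!var} is Bash indirect expansion: the value of the variable named by $var.
    if [ -z "$var" ] || [ -z "${!var:-}" ]; then
      echo "missing environment variable: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: keyvar="$(key_var_for_provider anthropic)" && require_env "$keyvar" LMSTUDIO_BASE_URL && llm-introspect code-audit ... (drop LMSTUDIO_BASE_URL when not using adaptive interrogation with LM Studio).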

Basic Code Audit

llm-introspect code-audit --provider openai --model gpt-4 --format summary

Code Audit with All Categories

llm-introspect code-audit \
    --provider anthropic \
    --model claude-3-sonnet-20240229 \
    --categories all \
    --format markdown \
    --output my-audit-results.md

Multi-Language Code Audit

# Test a specific language
llm-introspect code-audit \
    --provider openai \
    --model gpt-4 \
    --language rust \
    --format markdown \
    --output rust-audit.md

# Test all common languages (Python, Rust, Ruby, C, C++, JavaScript, Shell, R)
llm-introspect code-audit \
    --provider anthropic \
    --model claude-3-sonnet-20240229 \
    --all-languages \
    --categories security_sql,security_injection \
    --format markdown \
    --output multi-lang-audit.md

# Test rare languages (Erlang, COBOL, Forth, Haskell)
llm-introspect code-audit \
    --provider openai \
    --model gpt-4 \
    --rare-languages \
    --format summary

# Test Haskell specifically
llm-introspect code-audit \
    --provider anthropic \
    --model claude-3-haiku-20240307 \
    --language haskell \
    --format markdown \
    --output haskell-audit.md

# Test ALL languages including rare ones
llm-introspect code-audit \
    --provider anthropic \
    --model claude-3-opus-20240229 \
    --all-languages \
    --include-rare \
    --format markdown \
    --output complete-lang-audit.md
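The --all-languages run above writes one combined report. To get a separate report per language, for example to compare them side by side, the --language form can be looped. This is a minimal sketch: the run helper and its DRY_RUN convention are inventions of this sketch, not llm-introspect features, and only rust and haskell are confirmed language IDs in this README, so check any others with llm-introspect list languages.

```shell
#!/usr/bin/env bash
# Print each command instead of running it unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"   # dry run: show the command only
  else
    "$@"                 # DRY_RUN=0: actually execute it
  fi
}

# One code-audit report per language, e.g. rust-audit.md, haskell-audit.md.
audit_languages() {
  local lang
  for lang in "$@"; do
    run llm-introspect code-audit \
      --provider anthropic \
      --model claude-3-haiku-20240307 \
      --language "$lang" \
      --format markdown \
      --output "${lang}-audit.md"
  done
}

audit_languages rust haskell
```

Review the printed commands first, then rerun with DRY_RUN=0 to execute them.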

Advanced Challenge Categories

# Test data structure efficiency (LRU Cache, Trie, Interval Merging)
llm-introspect code-audit \
    --provider openai \
    --model gpt-4 \
    --categories efficiency_datastructure \
    --format markdown \
    --output datastructure-audit.md

# Test recursion safety and correctness (Deep Flatten, Tree Paths, Backtracking)
llm-introspect code-audit \
    --provider anthropic \
    --model claude-3-sonnet-20240229 \
    --categories correctness_recursive \
    --format markdown \
    --output recursion-audit.md

# Test concurrency and thread safety (Thread-Safe Counter, Producer-Consumer, Memoization)
llm-introspect code-audit \
    --provider openai \
    --model gpt-4 \
    --categories concurrency_safety \
    --format markdown \
    --output concurrency-audit.md

# Test resource management (Connection Pool, File Processing, Rate Limiter)
llm-introspect code-audit \
    --provider anthropic \
    --model claude-3-opus-20240229 \
    --categories concurrency_resource \
    --format markdown \
    --output resource-audit.md

# Comprehensive audit with all four advanced categories
llm-introspect code-audit \
    --provider openai \
    --model gpt-4 \
    --categories efficiency_datastructure,correctness_recursive,concurrency_safety,concurrency_resource \
    --format markdown \
    --output advanced-audit.md

List Available Options

llm-introspect list categories  # Code audit challenge categories
llm-introspect list languages   # Supported programming languages

Challenge Categories Reference

Security Categories

  • security_sql - SQL injection vulnerability detection
  • security_xss - Cross-site scripting detection
  • security_injection - Command/code injection detection
  • security_path - Path traversal vulnerability detection
  • security_crypto - Cryptographic weakness detection

Efficiency Categories

  • efficiency_algorithm - Algorithmic complexity (find duplicates, two sum, anagrams)
  • efficiency_datastructure - Data structure choice (LRU cache, trie, interval merging)

Correctness Categories

  • correctness_edge - Edge case handling (safe divide, list access)
  • correctness_error - Error handling (file read, API calls)
  • correctness_recursive - Recursion safety (deep flatten, tree paths, balanced parentheses)

Concurrency Categories

  • concurrency_safety - Thread safety (thread-safe counter, producer-consumer, memoization)
  • concurrency_resource - Resource management (connection pool, file processing, rate limiter)
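Because --categories takes a comma-separated list, the groups above can be kept as shell arrays and joined on demand rather than typed out by hand. A minimal sketch; the array and function names are hypothetical, and the category IDs are copied from the reference above.

```shell
#!/usr/bin/env bash
# Category groups from the reference above.
SECURITY=(security_sql security_xss security_injection security_path security_crypto)
EFFICIENCY=(efficiency_algorithm efficiency_datastructure)
CORRECTNESS=(correctness_edge correctness_error correctness_recursive)
CONCURRENCY=(concurrency_safety concurrency_resource)

# Join the arguments with commas, the format --categories expects.
join_csv() {
  local IFS=","
  printf '%s\n' "$*"
}

join_csv "${SECURITY[@]}"
```

Usage: llm-introspect code-audit --provider openai --model gpt-4 --categories "$(join_csv "${SECURITY[@]}" "${CONCURRENCY[@]}")" --format summary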

Supported Languages

Common: Python, JavaScript, Rust, Ruby, C, C++, Shell (Bash), R

Rare: Erlang, COBOL, Forth, Haskell


Hallucination Probe (Confabulation Testing)

The hallucination probe tests how susceptible a model is to generating false information when given prompts designed to elicit confabulation.

Running the Hallucination Probe

# Full hallucination assessment
llm-introspect hallucination --provider anthropic --model claude-3-sonnet-20240229

# Test specific categories
llm-introspect hallucination --provider openai --model gpt-4 \
    --categories fabricated_citations,false_premises,fictional_entities

# Generate markdown report
llm-introspect hallucination --provider anthropic --model claude-3-opus-20240229 \
    --format markdown \
    --output examples/hallucination-assessment.md

# With adaptive interrogation for deeper probing
llm-introspect hallucination --provider openai --model gpt-4 \
    --adaptive ollama:llama3 --max-followups 3

What this tests:

  1. Fabricated Citations - Asks about non-existent papers and research
  2. False Premises - Questions containing incorrect assumptions
  3. Fictional Entities - References to non-existent people/organizations
  4. Fake Statistics - Requests involving fabricated numbers
  5. Nonexistent Events - Questions about events that never happened
  6. Fake Quotes - Requests for quotes from non-existent sources
  7. Fictional Technical - Questions about non-existent technologies

Use llm-introspect hallucination --help for additional options.


Systems Knowledge Probe (OS Administration Testing)

The systems knowledge probe tests how accurately a model knows OS administration across Linux distributions (Debian, Arch, Ubuntu) and BSD variants (FreeBSD, OpenBSD, NetBSD).

Running the Systems Knowledge Probe

# Full systems knowledge audit (all OSes, all categories)
llm-introspect systems-knowledge --provider anthropic --model claude-3-sonnet-20240229

# Test specific operating systems
llm-introspect systems-knowledge --provider openai --model gpt-4 \
    --os debian,freebsd

# Test specific categories
llm-introspect systems-knowledge --provider anthropic --model claude-3-sonnet-20240229 \
    --categories networking,configuration

# Generate markdown report
llm-introspect systems-knowledge --provider anthropic --model claude-3-opus-20240229 \
    --format markdown \
    --output examples/systems-knowledge-assessment.md

# With adaptive interrogation for deeper probing
llm-introspect systems-knowledge --provider openai --model gpt-4 \
    --adaptive ollama:llama3 --max-followups 3

What this tests:

  1. Configuration - Package management, services, init systems, users, logging
  2. Process Management - Signals, nice/renice, cgroups/jails, monitoring
  3. Media Management - Filesystems, partitions, ZFS, swap, disk utilities
  4. Networking - Interfaces, firewalls, DNS, routing, bonding/aggregation

Operating Systems Tested:

  • Linux: Debian, Arch, Ubuntu
  • BSD: FreeBSD, OpenBSD, NetBSD
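To compare the Linux and BSD families in separate reports, the --os flag can be given one family at a time. A minimal sketch: debian and freebsd are the only OS IDs confirmed by the --os example in this README, so arch, ubuntu, openbsd and netbsd are assumed to follow the same lowercase form. The run helper, its DRY_RUN convention, the function name and the report file names are all hypothetical.

```shell
#!/usr/bin/env bash
# Print each command instead of running it unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# ASSUMED IDs except debian and freebsd.
LINUX_OSES=(debian arch ubuntu)
BSD_OSES=(freebsd openbsd netbsd)

# Run one systems-knowledge audit for a family: audit_os_family NAME OS...
audit_os_family() {
  local family="$1"; shift
  local oses
  oses="$(IFS=,; printf '%s' "$*")"   # join the remaining args with commas
  run llm-introspect systems-knowledge \
    --provider anthropic \
    --model claude-3-sonnet-20240229 \
    --os "$oses" \
    --format markdown \
    --output "systems-knowledge-${family}.md"
}

audit_os_family linux "${LINUX_OSES[@]}"
audit_os_family bsd "${BSD_OSES[@]}"
```

A single run without --os already reports per-OS grades; splitting by family mainly keeps each report shorter and focused.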

Key Metrics:

  • Accuracy Score - Penalizes misconceptions (e.g., using Linux commands on BSD)
  • Completeness Score - Coverage of required concepts
  • Per-OS Grades - Reveals which OS families the model knows best

Use llm-introspect systems-knowledge --help for additional options.


Danger Probe (Safety Boundary Testing)

⚠️ WARNING: SENSITIVE CONTENT

The danger probe is designed for security researchers and safety auditors who need to evaluate LLM resistance to generating harmful content. This probe:

  • Runs in ISOLATION (cannot combine with other probes)
  • Requires explicit confirmation before execution
  • Disables adaptive interrogation (a safety measure: static prompts only)
  • Tests genuinely dangerous subject matter
  • May produce disturbing output from non-refusing models

The danger probe evaluates how well a model resists requests related to harmful topics such as self-harm, substance safety, and psychological manipulation. It explicitly excludes weapons, violence, and terrorism content.

Running the Danger Probe

# Basic run (interactive confirmation required)
llm-introspect danger --provider anthropic --model claude-3-haiku-20240307

# Automated run (bypasses confirmation - use responsibly)
llm-introspect danger --provider openai --model gpt-4 --yes

# Generate markdown report
llm-introspect danger --provider anthropic --model claude-3-sonnet-20240229 \
    --format markdown \
    --output danger-assessment.md
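Since --yes skips the interactive confirmation, an automated pipeline may want its own explicit opt-in before passing it. The sketch below is a hypothetical guard: the DANGER_PROBE_ACK variable, the run helper and its DRY_RUN convention are inventions of this sketch, not llm-introspect features. Only flags shown above are used.

```shell
#!/usr/bin/env bash
# Print each command instead of running it unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Pass --yes only when the caller has opted in with DANGER_PROBE_ACK=yes.
run_danger_probe() {
  local provider="$1" model="$2"
  if [ "${DANGER_PROBE_ACK:-}" != "yes" ]; then
    echo "danger probe not acknowledged; set DANGER_PROBE_ACK=yes to proceed" >&2
    return 1
  fi
  run llm-introspect danger \
    --provider "$provider" \
    --model "$model" \
    --yes \
    --format markdown \
    --output danger-assessment.md
}
```

Usage: DANGER_PROBE_ACK=yes DRY_RUN=0 run_danger_probe anthropic claude-3-haiku-20240307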

Use llm-introspect danger --help for additional options.