LLM alignment jailbreak: a set of instructions for auditing an LLM's internal reasoning and uncovering its biases
Updated Jun 1, 2026
Symbolic containment system for recursive logic traps — ARCHON firewall + audit
Browse LessWrong posts, tags, authors, and comment threads as structured records from the terminal.