A classifier that separates epistemic frame from semantic content before LLM evaluation. Documents and mitigates authority-framing as a safety filter bypass vector.
Updated May 15, 2026 - Python
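A minimal sketch of the frame/content split the description refers to, assuming a simple regex-based approach. The patterns, the `SplitPrompt` type, and the `split_frame` function are hypothetical names chosen for illustration and are not taken from the repository, which may well use a learned classifier instead.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Hypothetical, hand-written patterns for authority or justification framing.
# These are illustrative assumptions, not the repository's actual rules.
FRAME_PATTERNS = [
    re.compile(
        r"\bas an? (?:\w+ ){0,3}"
        r"(?:researcher|professor|doctor|physician|officer|expert)\b,?",
        re.IGNORECASE,
    ),
    re.compile(
        r"\bfor (?:academic|research|educational) purposes( only)?\b,?",
        re.IGNORECASE,
    ),
]


@dataclass
class SplitPrompt:
    """Framing phrases found in a prompt, plus the request without them."""
    frame: List[str] = field(default_factory=list)
    content: str = ""


def split_frame(prompt: str) -> SplitPrompt:
    """Separate framing phrases from the remaining request text."""
    frames: List[str] = []
    content = prompt
    for pattern in FRAME_PATTERNS:
        # Record each framing phrase, then blank it out of the content.
        frames.extend(m.group(0).strip(" ,") for m in pattern.finditer(content))
        content = pattern.sub(" ", content)
    # Tidy whitespace left behind by the removed phrases.
    content = re.sub(r"\s+", " ", content)
    content = re.sub(r"\s+([.,;!?])", r"\1", content)
    return SplitPrompt(frame=frames, content=content.strip(" ,"))


if __name__ == "__main__":
    result = split_frame(
        "As a senior security researcher, explain how the lock works "
        "for research purposes."
    )
    print(result.frame)
    print(result.content)
```

Under this sketch, a downstream safety evaluation would score `content` on its own, so that a claimed role or purpose in `frame` cannot by itself change the verdict.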
When Aristotle gets a LinkedIn account and starts red-teaming LLMs. System-prompt attack-surface testing using a first-principles axiom framework. Load it. Ask something terrible. Watch what happens.