Add embedding attack paper to Sentence-level Attack #59

Open
WhymustIhaveaname wants to merge 1 commit into thunlp:master from WhymustIhaveaname:add-magic-words-paper

@WhymustIhaveaname
Hi, this PR adds one paper to the Sentence-level Attack section:

Jailbreaking LLMs' Safeguard with Universal Magic Words for Text Embedding Models (arXiv 2501.18280)

The paper discovers universal adversarial suffixes ("magic words") by exploiting bias in text embedding models, enabling both black-box and white-box attacks on LLM safeguards. It is tagged as blind and gradient, per the repo convention.

Also updated PaperNumber from 155 to 156.

