3 changes: 2 additions & 1 deletion README.md
@@ -1,7 +1,7 @@
# Must-read Papers on Textual Adversarial Attack and Defense (TAAD)


![](https://img.shields.io/github/last-commit/thunlp/TAADpapers?color=blue) ![](https://img.shields.io/badge/PaperNumber-155-brightgreen) ![](https://img.shields.io/badge/PRs-Welcome-red)
![](https://img.shields.io/github/last-commit/thunlp/TAADpapers?color=blue) ![](https://img.shields.io/badge/PaperNumber-156-brightgreen) ![](https://img.shields.io/badge/PRs-Welcome-red)


This list is currently maintained by [Chenghao Yang](https://yangalan123.github.io/) at UChicago.
@@ -44,6 +44,7 @@ We thank all the great [contributors](#contributors) very much.
Each paper is attached to one or more of the following labels indicating how much information the **attack model** knows about the **victim model**: `gradient` (=`white`, all information), `score` (output decision and scores), `decision` (only output decision) and `blind` (nothing).

### 2.1 Sentence-level Attack
1. **Jailbreaking LLMs' Safeguard with Universal Magic Words for Text Embedding Models**. *Haoyu Liang, Youran Sun, Yunfeng Cai, Jun Zhu, Bo Zhang*. arXiv 2025. `blind` `gradient` [[pdf](https://arxiv.org/abs/2501.18280)]
1. **Using Adversarial Attacks to Reveal the Statistical Bias in Machine Reading Comprehension Models**. *Jieyu Lin, Jiajie Zou, Nai Ding*. ACL-IJCNLP 2021. `blind` [[pdf](https://aclanthology.org/2021.acl-short.43.pdf)]
1. **Grey-box Adversarial Attack And Defence For Sentiment Classification**. *Ying Xu, Xu Zhong, Antonio Jimeno Yepes, Jey Han Lau*. NAACL-HLT 2021. `gradient` [[pdf](https://aclanthology.org/2021.naacl-main.321.pdf)] [[code](https://github.com/ibm-aur-nlp/adv-def-text-dist)]
1. **Generating Syntactically Controlled Paraphrases without Using Annotated Parallel Pairs**. *Kuan-Hao Huang and Kai-Wei Chang*. EACL 2021. [[pdf](https://aclanthology.org/2021.eacl-main.88.pdf)] [[code](https://github.com/uclanlp/synpg)]