diff --git a/README.md b/README.md
index a80367b..37748c2 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,7 @@
 # Must-read Papers on Textual Adversarial Attack and Defense (TAAD)
 
 
-![](https://img.shields.io/github/last-commit/thunlp/TAADpapers?color=blue) ![](https://img.shields.io/badge/PaperNumber-155-brightgreen) ![](https://img.shields.io/badge/PRs-Welcome-red)
+![](https://img.shields.io/github/last-commit/thunlp/TAADpapers?color=blue) ![](https://img.shields.io/badge/PaperNumber-156-brightgreen) ![](https://img.shields.io/badge/PRs-Welcome-red)
 
 This list is currently maintained by [Chenghao Yang](https://yangalan123.github.io/) at UChicago.
 
@@ -44,6 +44,7 @@ We thank all the great [contributors](#contributors) very much.
 Each paper is attached to one or more following labels indicating how much information the **attack model** knows about the **victim model**: `gradient` (=`white`, all information), `score` (output decision and scores), `decision` (only output decision) and `blind` (nothing)
 
 ### 2.1 Sentence-level Attack
+1. **Jailbreaking LLMs' Safeguard with Universal Magic Words for Text Embedding Models**. *Haoyu Liang, Youran Sun, Yunfeng Cai, Jun Zhu, Bo Zhang*. arXiv 2025. `blind` `gradient` [[pdf](https://arxiv.org/abs/2501.18280)]
 1. **Using Adversarial Attacks to Reveal the Statistical Bias in Machine Reading Comprehension Models**. *Jieyu Lin, Jiajie Zou, Nai Ding*. ACL-IJCNLP 2021. `blind` [[pdf](https://aclanthology.org/2021.acl-short.43.pdf)]
 1. **Grey-box Adversarial Attack And Defence For Sentiment Classification**. *Ying Xu, Xu Zhong, Antonio Jimeno Yepes, Jey Han Lau*. NAACL-HLT 2021. `gradient` [[pdf](https://aclanthology.org/2021.naacl-main.321.pdf)] [[code](https://github.com/ibm-aur-nlp/adv-def-text-dist)]
 1. **Generating Syntactically Controlled Paraphrases without Using Annotated Parallel Pairs**. *Kuan-Hao Huang and Kai-Wei Chang*. EACL 2021. [[pdf](https://aclanthology.org/2021.eacl-main.88.pdf)] [[code](https://github.com/uclanlp/synpg)]