Pretraining data
zhezhaoa edited this page Aug 25, 2023 · 2 revisions
CLUECorpusSmall consists of news, web, wiki, and comment corpora. The original data and a detailed description can be found here.
| Corpus | Link |
|---|---|
| CLUECorpusSmall | https://share.weiyun.com/sC6PMhxx |
| CLUECorpusSmall (BERT format) | https://share.weiyun.com/9SPPGUOK |
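
This page does not describe the "BERT format" layout. The sketch below assumes the convention used by the original BERT pretraining scripts: one sentence per line, with a blank line separating documents. Check the downloaded file before relying on it. The helper name `iter_documents` and the sample lines are hypothetical, for illustration only.

```python
from typing import Iterable, Iterator, List


def iter_documents(lines: Iterable[str]) -> Iterator[List[str]]:
    """Group lines into documents, each a list of sentences.

    Assumption (not confirmed by this page): one sentence per line,
    and a blank line marks the boundary between two documents.
    """
    doc: List[str] = []
    for raw in lines:
        sentence = raw.strip()
        if sentence:
            # Non-empty line: one more sentence of the current document.
            doc.append(sentence)
        elif doc:
            # Blank line: close the current document, if it has content.
            yield doc
            doc = []
    if doc:
        # The last document may not be followed by a blank line.
        yield doc


# Made-up sample standing in for the first lines of a corpus file.
sample = ["doc1 sent1\n", "doc1 sent2\n", "\n", "doc2 sent1\n"]
docs = list(iter_documents(sample))
```

To apply this to a downloaded corpus, an open file handle such as `open(path, encoding="utf-8")` can be passed in place of `sample`; the file is then read line by line rather than loaded into memory at once.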
News Commentary v13 consists of English-Chinese parallel data and can be downloaded from here.
| Corpus | Link |
|---|---|
| news-Commentary-v13-en-zh | https://share.weiyun.com/PLMxw6ae |
| news-Commentary-v13-zh-en | https://share.weiyun.com/5rMwRhDi |
| news-Commentary-v13-en-zh_sampled | https://share.weiyun.com/1KTxq3Dc |
CIFAR100_nolabel consists of 50,000 images that can be used for unsupervised pre-training. It can be downloaded from here.
| Corpus | Link |
|---|---|
| CIFAR100_nolabel | https://share.weiyun.com/M2tA9P8p |