
Commit 3504d6f

docs: update README
1 parent 9c6f678 commit 3504d6f

2 files changed (+28 additions, -20 deletions)


README.md

Lines changed: 14 additions & 11 deletions
@@ -61,15 +61,15 @@ Furthermore, GraphGen incorporates multi-hop neighborhood sampling to capture co
 After data generation, you can use [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) and [xtuner](https://github.com/InternLM/xtuner) to finetune your LLMs.
 
 ## 📌 Latest Updates
-
+- **2026.01.15**: Added support for synthesizing single-choice, multiple-choice, and fill-in-the-blank question types, suitable for educational and evaluation scenarios.
 - **2025.12.26**: Added comprehensive knowledge graph evaluation metrics including accuracy assessment (entity/relation extraction quality), consistency assessment (conflict detection), and structural robustness assessment (noise ratio, connectivity, degree distribution).
 - **2025.12.16**: Added [rocksdb](https://github.com/facebook/rocksdb) for key-value storage backend and [kuzudb](https://github.com/kuzudb/kuzu) for graph database backend support.
-- **2025.12.16**: Added [vllm](https://github.com/vllm-project/vllm) for local inference backend support.
-- **2025.12.16**: Refactored the data generation pipeline using [ray](https://github.com/ray-project/ray) to improve the efficiency of distributed execution and resource management.
 
 <details>
 <summary>History</summary>
 
+- **2025.12.16**: Added [vllm](https://github.com/vllm-project/vllm) for local inference backend support.
+- **2025.12.16**: Refactored the data generation pipeline using [ray](https://github.com/ray-project/ray) to improve the efficiency of distributed execution and resource management.
 - **2025.12.1**: Added search support for [NCBI](https://www.ncbi.nlm.nih.gov/) and [RNAcentral](https://rnacentral.org/) databases, enabling extraction of DNA and RNA data from these bioinformatics databases.
 - **2025.10.30**: We support several new LLM clients and inference backends including [Ollama_client](https://github.com/open-sciencelab/GraphGen/blob/main/graphgen/models/llm/api/ollama_client.py), [http_client](https://github.com/open-sciencelab/GraphGen/blob/main/graphgen/models/llm/api/http_client.py), [HuggingFace Transformers](https://github.com/open-sciencelab/GraphGen/blob/main/graphgen/models/llm/local/hf_wrapper.py) and [SGLang](https://github.com/open-sciencelab/GraphGen/blob/main/graphgen/models/llm/local/sglang_wrapper.py).
 - **2025.10.23**: We support VQA(Visual Question Answering) data generation now. Run script: `bash scripts/generate/generate_vqa.sh`.
@@ -286,14 +286,17 @@ For any questions, please check [FAQ](https://github.com/open-sciencelab/GraphGe
 
 Pick the desired format and run the matching script:
 
-| Format | Script to run | Notes |
-| ------------ | ---------------------------------------------------------------------- | -------------------------------------------------------------------------- |
-| `cot` | `bash examples/generate/generate_cot_qa/generate_cot.sh` | Chain-of-Thought Q\&A pairs |
-| `atomic` | `bash examples/generate/generate_atomic_qa/generate_atomic.sh` | Atomic Q\&A pairs covering basic knowledge |
-| `aggregated` | `bash examples/generate/generate_aggregated_qa/generate_aggregated.sh` | Aggregated Q\&A pairs incorporating complex, integrated knowledge |
-| `multi-hop` | `examples/generate/generate_multi_hop_qa/generate_multi_hop.sh` | Multi-hop reasoning Q\&A pairs |
-| `vqa` | `bash examples/generate/generate_vqa/generate_vqa.sh` | Visual Question Answering pairs combining visual and textual understanding |
-
+| Format | Script to run | Notes |
+|-----------------|------------------------------------------------------------------------------|----------------------------------------------------------------------------|
+| `cot` | `bash examples/generate/generate_cot_qa/generate_cot.sh` | Chain-of-Thought Q\&A pairs |
+| `atomic` | `bash examples/generate/generate_atomic_qa/generate_atomic.sh` | Atomic Q\&A pairs covering basic knowledge |
+| `aggregated` | `bash examples/generate/generate_aggregated_qa/generate_aggregated.sh` | Aggregated Q\&A pairs incorporating complex, integrated knowledge |
+| `multi-hop` | `examples/generate/generate_multi_hop_qa/generate_multi_hop.sh` | Multi-hop reasoning Q\&A pairs |
+| `vqa` | `bash examples/generate/generate_vqa/generate_vqa.sh` | Visual Question Answering pairs combining visual and textual understanding |
+| `multi_choice` | `bash examples/generate/generate_multi_choice_qa/generate_multi_choice.sh` | Multiple-choice question-answer pairs |
+| `multi_answer` | `bash examples/generate/generate_multi_answer_qa/generate_multi_answer.sh` | Multiple-answer question-answer pairs |
+| `fill_in_blank` | `bash examples/generate/generate_fill_in_blank_qa/generate_fill_in_blank.sh` | Fill-in-the-blank question-answer pairs |
+
 
 4. Get the generated data
 ```bash

README_zh.md

Lines changed: 14 additions & 9 deletions
@@ -62,14 +62,16 @@ GraphGen first builds a fine-grained knowledge graph from the source text, then uses
 After data generation, you can use [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) and [xtuner](https://github.com/InternLM/xtuner) to fine-tune large language models.
 
 ## 📌 Latest Updates
+- **2026.01.15**: Added support for synthesizing single-choice, multiple-choice, and fill-in-the-blank question data, suitable for education and evaluation scenarios.
 - **2025.12.26**: Added knowledge graph evaluation metrics, including accuracy assessment (entity/relation extraction quality), consistency assessment (conflict detection), and structural robustness assessment (noise ratio, connectivity, degree distribution).
 - **2025.12.16**: Added support for [rocksdb](https://github.com/facebook/rocksdb) as a key-value storage backend and [kuzudb](https://github.com/kuzudb/kuzu) as a graph database backend.
-- **2025.12.16**: Added support for [vllm](https://github.com/vllm-project/vllm) as a local inference backend.
-- **2025.12.16**: Refactored the data generation pipeline with [ray](https://github.com/ray-project/ray), improving the efficiency of distributed execution and resource management.
+
 
 <details>
 <summary>Update History</summary>
 
+- **2025.12.16**: Added support for [vllm](https://github.com/vllm-project/vllm) as a local inference backend.
+- **2025.12.16**: Refactored the data generation pipeline with [ray](https://github.com/ray-project/ray), improving the efficiency of distributed execution and resource management.
 - **2025.12.1**: Added search support for the [NCBI](https://www.ncbi.nlm.nih.gov/) and [RNAcentral](https://rnacentral.org/) databases; DNA and RNA data can now be extracted from these bioinformatics databases.
 - **2025.10.30**: We support several new LLM clients and inference backends, including [Ollama_client]([Ollama_client](https://github.com/open-sciencelab/GraphGen/blob/main/graphgen/models/llm/api/ollama_client.py), [http_client](https://github.com/open-sciencelab/GraphGen/blob/main/graphgen/models/llm/api/http_client.py), [HuggingFace Transformers](https://github.com/open-sciencelab/GraphGen/blob/main/graphgen/models/llm/local/hf_wrapper.py) and [SGLang](https://github.com/open-sciencelab/GraphGen/blob/main/graphgen/models/llm/local/sglang_wrapper.py).
 - **2025.10.23**: We now support Visual Question Answering (VQA) data generation. Run script: `bash scripts/generate/generate_vqa.sh`.
@@ -283,13 +285,16 @@ GraphGen first builds a fine-grained knowledge graph from the source text, then uses
 
 Select the desired format and run the corresponding script:
 
-| Format | Script to run | Notes |
-| ------------ | ---------------------------------------------------------------------- | --------------- |
-| `cot` | `bash examples/generate/generate_cot_qa/generate_cot.sh` | Chain-of-Thought Q\&A pairs |
-| `atomic` | `bash examples/generate/generate_atomic_qa/generate_atomic.sh` | Atomic Q\&A pairs covering basic knowledge |
-| `aggregated` | `bash examples/generate/generate_aggregated_qa/generate_aggregated.sh` | Aggregated Q\&A pairs integrating complex knowledge |
-| `multi-hop` | `bash examples/generate/generate_multi_hop_qa/generate_multi_hop.sh` | Multi-hop reasoning Q\&A pairs |
-| `vqa` | `bash examples/generate/generate_vqa/generate_vqa.sh` | Visual question-answering pairs, combining visual and textual understanding |
+| Format | Script to run | Notes |
+|-----------------|------------------------------------------------------------------------------|-----------------|
+| `cot` | `bash examples/generate/generate_cot_qa/generate_cot.sh` | Chain-of-Thought Q\&A pairs |
+| `atomic` | `bash examples/generate/generate_atomic_qa/generate_atomic.sh` | Atomic Q\&A pairs covering basic knowledge |
+| `aggregated` | `bash examples/generate/generate_aggregated_qa/generate_aggregated.sh` | Aggregated Q\&A pairs integrating complex knowledge |
+| `multi_hop` | `bash examples/generate/generate_multi_hop_qa/generate_multi_hop.sh` | Multi-hop reasoning Q\&A pairs |
+| `vqa` | `bash examples/generate/generate_vqa/generate_vqa.sh` | Visual question-answering pairs, combining visual and textual understanding |
+| `multi_choice` | `bash examples/generate/generate_multi_choice_qa/generate_multi_choice.sh` | Single-choice question-answer pairs |
+| `multi_answer` | `bash examples/generate/generate_multi_answer_qa/generate_multi_answer.sh` | Multiple-choice question-answer pairs |
+| `fill_in_blank` | `bash examples/generate/generate_fill_in_blank_qa/generate_fill_in_blank.sh` | Fill-in-the-blank question-answer pairs |
 
 
 