Include response for C2C by minzh23 · Pull Request #6 · thu-nics/C2C

minzh23 · 2025-12-20T05:50:23Z

This PR extends the Rosetta model to support the include-response functionality.
It also adds utility scripts for generating LLM responses and for verifying consistency across generated outputs, facilitating data preparation and evaluation.

Copilot

Pull request overview

This PR extends the Rosetta model to support include-response functionality for Cache-to-Cache (C2C) knowledge transfer, enabling the model to apply teacher model knowledge during the response generation phase. Additionally, it introduces utility scripts for generating LLM responses and verifying consistency across model outputs, facilitating dataset preparation and model evaluation.

Key Changes:

- Added `include_response` flag to `RosettaModel` with a monkeypatch-based attention hook mechanism for runtime KV cache injection during response generation
- Introduced consistency checking scripts to measure alignment between Rosetta, SLM, and LLM predictions on label segments
- Updated dataset generation and training scripts to support the GSM8K dataset with new prompt templates
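
The monkeypatch-based hook mentioned in the first change can be pictured with a minimal sketch. The names below (`ToyAttention`, `patched_forward`, `injected_kv`) are hypothetical and are not taken from `rosetta/model/wrapper.py`. The sketch only illustrates the general pattern, assuming the hook works by temporarily swapping a forward method so extra key/value entries are prepended at call time and the original method is restored afterwards.

```python
from contextlib import contextmanager


class ToyAttention:
    """Stand-in for an attention module; it only reports what it attends over."""

    def forward(self, keys, values):
        return {"keys": list(keys), "values": list(values)}


@contextmanager
def patched_forward(module_cls, injected_kv):
    """Temporarily replace module_cls.forward so injected KV entries are prepended."""
    original = module_cls.forward
    extra_keys, extra_values = injected_kv

    def hooked(self, keys, values):
        # Prepend the injected entries, then defer to the original implementation.
        return original(self, extra_keys + list(keys), extra_values + list(values))

    module_cls.forward = hooked
    try:
        yield
    finally:
        # Always restore the original method, even if the body raises.
        module_cls.forward = original


attn = ToyAttention()
baseline = attn.forward(["k1"], ["v1"])
with patched_forward(ToyAttention, (["k_teacher"], ["v_teacher"])):
    patched = attn.forward(["k1"], ["v1"])
restored = attn.forward(["k1"], ["v1"])
```

Wrapping the swap in a context manager keeps the patch scoped to one call site, which is one way to avoid leaking the hook into unrelated forward passes.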

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 26 comments.

Show a summary per file

| File | Description |
| --- | --- |
| `rosetta/model/wrapper.py` | Added `include_response` parameter and implemented attention monkeypatching mechanism with hooks; refactored forward logic to handle last section differently when `include_response` is enabled |
| `rosetta/model/projector.py` | Initialized C2CProjector output layer weights and biases to zero for stable training initialization |
| `script/train/SFT_train.py` | Added `include_response` and `multi_source_fusion_mode` parameters to RosettaModel instantiation |
| `rosetta/utils/evaluate.py` | Added `include_response` parameter loading from config for evaluation |
| `rosetta/train/dataset_adapters.py` | Simplified prompt template to use raw questions without additional instructions; added unused extraction function |
| `recipe/train_recipe/include_response.json` | New training configuration for `include_response` experiments with Qwen3-0.6B and Qwen3-32B models |
| `script/dataset/create_gsm8k.py` | Simplified prompt template, fixed CSV filename from "Dolly" to "gsm8k", enabled thinking mode, and made split configurable |
| `script/dataset/run_generation.sh` | Updated to generate GSM8K dataset with Qwen3-32B model and adjusted generation parameters |
| `script/dataset/launch_server.sh` | Updated model path to Qwen3-32B for server deployment |
| `consistency_scripts/check_rosetta_consistency.py` | New script to compute label consistency between Rosetta, SLM, and LLM models on generated responses |
| `consistency_scripts/batch_check_consistency.py` | New script to batch-process consistency checks across multiple checkpoints |
| `consistency_scripts/plot_consistency.py` | New script to visualize consistency rates across training checkpoints |
| `consistency_scripts/rosetta_consistency_config.json` | Configuration file for consistency checking experiments |
| `bash/train/include_response.sh` | Training launch script for `include_response` experiments |
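
As a rough illustration of what "label consistency" between two models could mean, the sketch below computes the fraction of label-segment positions at which two prediction sequences agree. This is an assumption about the metric, not the logic of `check_rosetta_consistency.py`; the function name `consistency_rate` and the `-100` ignore marker are illustrative choices.

```python
IGNORE_INDEX = -100  # assumed marker for positions outside the label segment


def consistency_rate(preds_a, preds_b, labels):
    """Fraction of label positions where two prediction sequences agree."""
    if not (len(preds_a) == len(preds_b) == len(labels)):
        raise ValueError("all sequences must have the same length")
    # Only positions inside the label segment count toward the rate.
    positions = [i for i, lab in enumerate(labels) if lab != IGNORE_INDEX]
    if not positions:
        return 0.0
    agree = sum(1 for i in positions if preds_a[i] == preds_b[i])
    return agree / len(positions)


# Toy token ids: the first two positions are prompt tokens and are ignored.
labels = [-100, -100, 11, 12, 13, 14]
rosetta_preds = [5, 6, 11, 12, 99, 14]
llm_preds = [7, 8, 11, 12, 13, 14]
rate = consistency_rate(rosetta_preds, llm_preds, labels)
```

The same function could be applied pairwise (Rosetta vs. SLM, Rosetta vs. LLM, SLM vs. LLM) and tracked per checkpoint, which is the kind of series a plotting script would consume.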

Comments suppressed due to low confidence (1)

rosetta/train/dataset_adapters.py:867

Variable _extract_question is not used.

        def _extract_question(text: str) -> str:

fuvty · 2025-12-20T17:15:45Z

@minzh23 Thank you so much for the PR. Could you take a look at Copilot's review and fix the issues where the review makes sense? Otherwise, just notify me and I will do a careful human review and test, then merge the code.

minzh23 · 2025-12-21T04:31:26Z

@fuvty Thanks for the feedback! I went through the suggestions from Copilot and incorporated the ones that made sense. The updates are mainly comment/docstring polishing and removing some unused code, and they do not change any core logic or behavior.

Could you please take another careful look and review the latest revision manually?

Automated fix for model path references

github-actions · 2025-12-21T20:03:11Z

🔧 Model Path Fix Applied

Found and automatically fixed references to /share/public/public_models/Qwen3-*B paths in the following files:

.github/workflows/fix-qwen-model-paths.yml
recipe/train_recipe/include_response.json
script/consistency/rosetta_consistency_config.json
script/dataset/launch_server.sh
script/dataset/run_generation.sh

These have been changed to use the Qwen/Qwen3-*B format instead. The fix has been committed to this PR.
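
The substitution described in this comment can be approximated with a single regular expression. This is a sketch, not the contents of `.github/workflows/fix-qwen-model-paths.yml`; the function name `fix_model_paths` and the assumption that `*` stands for a numeric size such as `0.6` or `32` are illustrative.

```python
import re

# Matches the shared-storage prefix followed by a Qwen3 size tag, e.g. Qwen3-0.6B.
LOCAL_PATH = re.compile(r"/share/public/public_models/(Qwen3-[0-9.]+B)")


def fix_model_paths(text):
    """Rewrite local Qwen3 paths to the Qwen/Qwen3-*B form."""
    return LOCAL_PATH.sub(r"Qwen/\1", text)


before = '{"model": "/share/public/public_models/Qwen3-32B"}'
after = fix_model_paths(before)
```

Because the rewritten form no longer contains the local prefix, running the function a second time leaves the text unchanged.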

fuvty

Thank you @minzh23, I have finished my review and made a few modifications. The code looks good to me now, and I am merging it to main. In the meantime, I noticed that there are still undefined `load_aggregator` functions across files. Perhaps we should fix that as well, or remove the unused files entirely.

fuvty · 2025-12-21T20:04:49Z

Please refer to this on how to modify the paths

minzh23 added 4 commits December 20, 2025 13:43

[update] support include response for C2C

022800a

[minor] update data generation scripts

daee1ff

[update] add scripts for consistency check

2709679

[update] projector final layer zero init

fb14cb6

minzh23 requested a review from fuvty December 20, 2025 05:50

fuvty assigned fuvty and Copilot Dec 20, 2025

fuvty added the enhancement New feature or request label Dec 20, 2025

fuvty requested a review from Copilot December 20, 2025 15:06

fuvty unassigned Copilot Dec 20, 2025

Copilot started reviewing on behalf of fuvty December 20, 2025 15:06

Copilot AI reviewed Dec 20, 2025

[minor] polish comments

f5ef684

fuvty and others added 2 commits December 21, 2025 15:02

[update] move files and add auto path-check

57b9269

fix: Replace /share/public/public_models/Qwen3-*B with Qwen/Qwen3-*B

38a537e

Automated fix for model path references

fuvty approved these changes Dec 21, 2025

Comment thread .github/workflows/fix-qwen-model-paths.yml

Member

fuvty Dec 21, 2025

Please refer to this on how to modify the paths

fuvty merged commit 6971bef into main Dec 21, 2025

fuvty deleted the include_response branch December 21, 2025 20:12

fuvty merged 7 commits into main from include_response

4 participants
