
Include response for C2C #6

Merged

fuvty merged 7 commits into main from include_response on Dec 21, 2025

Conversation

Collaborator

@minzh23 minzh23 commented Dec 20, 2025

This PR extends the Rosetta model to support the include-response functionality.
It also adds utility scripts for generating LLM responses and for verifying consistency across generated outputs, facilitating data preparation and evaluation.

@minzh23 minzh23 requested a review from fuvty December 20, 2025 05:50
@fuvty fuvty added the enhancement (New feature or request) label Dec 20, 2025
@fuvty fuvty requested a review from Copilot December 20, 2025 15:06

Copilot AI left a comment


Pull request overview

This PR extends the Rosetta model to support include-response functionality for Cache-to-Cache (C2C) knowledge transfer, enabling the model to apply teacher model knowledge during the response generation phase. Additionally, it introduces utility scripts for generating LLM responses and verifying consistency across model outputs, facilitating dataset preparation and model evaluation.

Key Changes:

  • Added include_response flag to RosettaModel with monkeypatch-based attention hook mechanism for runtime KV cache injection during response generation
  • Introduced consistency checking scripts to measure alignment between Rosetta, SLM, and LLM predictions on label segments
  • Updated dataset generation and training scripts to support GSM8K dataset with new prompt templates
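The monkeypatch-based hook pattern mentioned above can be illustrated with a minimal, self-contained sketch. This is not the actual RosettaModel implementation; `DummyAttention`, `install_kv_injection`, and `restore` are hypothetical names used only to show the idea of swapping an attention module's `forward` at runtime so a teacher's KV cache can be injected, then restoring the original method afterwards.

```python
# Hypothetical sketch of a monkeypatch-based attention hook: replace a
# module's forward with a wrapper that injects a teacher KV cache, and
# return a function that undoes the patch.

class DummyAttention:
    """Stand-in for a real attention layer: records which KV it received."""
    def forward(self, hidden, past_kv=None):
        return {"hidden": hidden, "kv_used": past_kv}

def install_kv_injection(module, injected_kv):
    """Monkeypatch `module.forward` so calls without a KV cache use `injected_kv`."""
    original_forward = module.forward

    def patched_forward(hidden, past_kv=None):
        # Fall back to the injected (teacher) cache when none is supplied.
        merged = past_kv if past_kv is not None else injected_kv
        return original_forward(hidden, past_kv=merged)

    module.forward = patched_forward

    def restore():
        module.forward = original_forward
    return restore

attn = DummyAttention()
restore = install_kv_injection(attn, injected_kv="teacher_cache")
out = attn.forward("h")        # teacher KV is injected
restore()
out_plain = attn.forward("h")  # original behavior restored
```

The same shape applies when the "module" is a Hugging Face attention layer and the injected object is a real `past_key_values` structure; only the wrapper body grows.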

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 26 comments.

  • rosetta/model/wrapper.py: Added the include_response parameter and implemented the attention monkeypatching mechanism with hooks; refactored the forward logic to handle the last section differently when include_response is enabled
  • rosetta/model/projector.py: Initialized the C2CProjector output layer weights and biases to zero for stable training initialization
  • script/train/SFT_train.py: Added include_response and multi_source_fusion_mode parameters to the RosettaModel instantiation
  • rosetta/utils/evaluate.py: Added include_response parameter loading from the config for evaluation
  • rosetta/train/dataset_adapters.py: Simplified the prompt template to use raw questions without additional instructions; added an unused extraction function
  • recipe/train_recipe/include_response.json: New training configuration for include_response experiments with Qwen3-0.6B and Qwen3-32B models
  • script/dataset/create_gsm8k.py: Simplified the prompt template, fixed the CSV filename from "Dolly" to "gsm8k", enabled thinking mode, and made the split configurable
  • script/dataset/run_generation.sh: Updated to generate the GSM8K dataset with the Qwen3-32B model and adjusted generation parameters
  • script/dataset/launch_server.sh: Updated the model path to Qwen3-32B for server deployment
  • consistency_scripts/check_rosetta_consistency.py: New script to compute label consistency between Rosetta, SLM, and LLM models on generated responses
  • consistency_scripts/batch_check_consistency.py: New script to batch-process consistency checks across multiple checkpoints
  • consistency_scripts/plot_consistency.py: New script to visualize consistency rates across training checkpoints
  • consistency_scripts/rosetta_consistency_config.json: Configuration file for consistency-checking experiments
  • bash/train/include_response.sh: Training launch script for include_response experiments
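A label-consistency metric of the kind check_rosetta_consistency.py computes can be sketched as the fraction of label-segment positions on which two models' predictions agree. This is a minimal illustration, not the script's actual code; `label_consistency` and its arguments are hypothetical names.

```python
# Hypothetical sketch: agreement rate between two prediction sequences,
# counted only on positions flagged as part of the label segment.

def label_consistency(pred_a, pred_b, label_mask):
    """Fraction of masked (label) positions where pred_a and pred_b agree."""
    assert len(pred_a) == len(pred_b) == len(label_mask)
    label_positions = [i for i, m in enumerate(label_mask) if m]
    if not label_positions:
        return 0.0
    agree = sum(pred_a[i] == pred_b[i] for i in label_positions)
    return agree / len(label_positions)

rosetta_preds = [5, 7, 7, 2, 9]
slm_preds     = [5, 1, 7, 2, 4]
mask          = [False, True, True, True, True]  # skip the prompt token
print(label_consistency(rosetta_preds, slm_preds, mask))  # 0.5
```

Running the same comparison for Rosetta-vs-SLM and Rosetta-vs-LLM pairs per checkpoint gives the series that plot_consistency.py would visualize.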
Comments suppressed due to low confidence (1)

rosetta/train/dataset_adapters.py:867

  • Variable _extract_question is not used.
        def _extract_question(text: str) -> str:


Comment threads:

  • rosetta/model/wrapper.py
  • consistency_scripts/check_rosetta_consistency.py (outdated)
  • script/consistency/check_rosetta_consistency.py
  • consistency_scripts/check_rosetta_consistency.py (outdated)
  • consistency_scripts/plot_consistency.py
  • consistency_scripts/plot_consistency.py (outdated)
  • consistency_scripts/plot_consistency.py (outdated)
  • consistency_scripts/plot_consistency.py (outdated)
  • consistency_scripts/check_rosetta_consistency.py (outdated)
  • script/consistency/batch_check_consistency.py
Member

fuvty commented Dec 20, 2025

@minzh23 Thank you so much for the PR. Could you take a look at Copilot's review and fix the issues where its suggestions make sense? Otherwise, just notify me and I will do a careful human review and test, then merge the code.

Collaborator Author

minzh23 commented Dec 21, 2025

@fuvty Thanks for the feedback! I went through the suggestions from Copilot and incorporated the ones that made sense. The updates are mainly comment/docstring polishing and removing some unused code, and they do not change any core logic or behavior.

Could you please take another careful look and review the latest revision manually?

@github-actions
Contributor

🔧 Model Path Fix Applied

Found and automatically fixed references to /share/public/public_models/Qwen3-*B paths in the following files:

  • .github/workflows/fix-qwen-model-paths.yml
  • recipe/train_recipe/include_response.json
  • script/consistency/rosetta_consistency_config.json
  • script/dataset/launch_server.sh
  • script/dataset/run_generation.sh

These have been changed to use the Qwen/Qwen3-*B format instead. The fix has been committed to this PR.
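The rewrite the workflow applies can be sketched as a simple regular-expression substitution over file contents. This is an illustrative guess at the mechanism, not the actual workflow in .github/workflows/fix-qwen-model-paths.yml; `fix_qwen_paths` is a hypothetical name.

```python
# Hypothetical sketch: rewrite local Qwen3 model paths to Hugging Face
# "Qwen/Qwen3-*B" identifiers, as the bot comment describes.
import re

def fix_qwen_paths(text: str) -> str:
    """Replace /share/public/public_models/Qwen3-*B with Qwen/Qwen3-*B."""
    return re.sub(r"/share/public/public_models/(Qwen3-[\d.]+B)", r"Qwen/\1", text)

print(fix_qwen_paths('model_path: "/share/public/public_models/Qwen3-32B"'))
# model_path: "Qwen/Qwen3-32B"
```

The capture group keeps the size suffix, so both Qwen3-0.6B and Qwen3-32B map to their corresponding hub identifiers.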

Member

@fuvty fuvty left a comment


Thank you @minzh23, I have finished my review and made a few modifications. The code looks good to me now. I am merging it to main. In the meantime, I notice that we still have undefined load_aggregator functions across files. Perhaps we should also fix that, or remove the unused files entirely.

Member


Please refer to this on how to modify the paths.

@fuvty fuvty merged commit 6971bef into main Dec 21, 2025
@fuvty fuvty deleted the include_response branch December 21, 2025 20:12
