Pull request overview
This PR extends the Rosetta model to support include-response functionality for Cache-to-Cache (C2C) knowledge transfer, enabling the model to apply teacher model knowledge during the response generation phase. Additionally, it introduces utility scripts for generating LLM responses and verifying consistency across model outputs, facilitating dataset preparation and model evaluation.
Key Changes:
- Added `include_response` flag to RosettaModel with a monkeypatch-based attention hook mechanism for runtime KV cache injection during response generation
- Introduced consistency checking scripts to measure alignment between Rosetta, SLM, and LLM predictions on label segments
- Updated dataset generation and training scripts to support GSM8K dataset with new prompt templates
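
To make the monkeypatch-based hook idea concrete, here is a minimal sketch in plain Python. It is not the code in rosetta/model/wrapper.py: `ToyAttention`, `patch_attention`, and the list-based "cache" are hypothetical stand-ins, and the real implementation presumably operates on tensors and the model's actual attention modules.

```python
class ToyAttention:
    """Hypothetical stand-in for an attention module (not a real Rosetta class)."""

    def forward(self, query, key_cache, value_cache):
        # A real module would compute attention; here we only echo the inputs
        # so the effect of the patch is visible.
        return {"query": query, "keys": list(key_cache), "values": list(value_cache)}


def patch_attention(module, extra_keys, extra_values):
    """Replace module.forward with a wrapper that injects extra KV entries.

    Returns a callable that undoes the patch.
    """
    original_forward = module.forward  # bound method of the unpatched module

    def patched_forward(query, key_cache, value_cache):
        # Append the injected (e.g. teacher-derived) entries, then defer to
        # the original forward so its behavior is otherwise unchanged.
        return original_forward(
            query,
            list(key_cache) + list(extra_keys),
            list(value_cache) + list(extra_values),
        )

    module.forward = patched_forward  # instance-level monkeypatch

    def restore():
        # Deleting the instance attribute falls back to the class method.
        del module.forward

    return restore


attn = ToyAttention()
restore = patch_attention(attn, ["k_teacher"], ["v_teacher"])
patched_out = attn.forward("q", ["k0"], ["v0"])
restore()
restored_out = attn.forward("q", ["k0"], ["v0"])
```

Patching the instance rather than the class keeps the change local to one module and makes it easy to undo once generation finishes.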
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 26 comments.
| File | Description |
|---|---|
| rosetta/model/wrapper.py | Added include_response parameter and implemented attention monkeypatching mechanism with hooks; refactored forward logic to handle last section differently when include_response is enabled |
| rosetta/model/projector.py | Initialized C2CProjector output layer weights and biases to zero for stable training initialization |
| script/train/SFT_train.py | Added include_response and multi_source_fusion_mode parameters to RosettaModel instantiation |
| rosetta/utils/evaluate.py | Added include_response parameter loading from config for evaluation |
| rosetta/train/dataset_adapters.py | Simplified prompt template to use raw questions without additional instructions; added unused extraction function |
| recipe/train_recipe/include_response.json | New training configuration for include_response experiments with Qwen3-0.6B and Qwen3-32B models |
| script/dataset/create_gsm8k.py | Simplified prompt template, fixed CSV filename from "Dolly" to "gsm8k", enabled thinking mode, and made split configurable |
| script/dataset/run_generation.sh | Updated to generate GSM8K dataset with Qwen3-32B model and adjusted generation parameters |
| script/dataset/launch_server.sh | Updated model path to Qwen3-32B for server deployment |
| consistency_scripts/check_rosetta_consistency.py | New script to compute label consistency between Rosetta, SLM, and LLM models on generated responses |
| consistency_scripts/batch_check_consistency.py | New script to batch-process consistency checks across multiple checkpoints |
| consistency_scripts/plot_consistency.py | New script to visualize consistency rates across training checkpoints |
| consistency_scripts/rosetta_consistency_config.json | Configuration file for consistency checking experiments |
| bash/train/include_response.sh | Training launch script for include_response experiments |
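
For readers unfamiliar with the consistency metric mentioned in the table, the following is a hedged sketch of one plausible definition: the fraction of label positions where two models predict the same token. The function name `consistency_rate` and the toy token IDs are assumptions for illustration; the actual computation in consistency_scripts/check_rosetta_consistency.py may differ.

```python
def consistency_rate(predictions_a, predictions_b):
    """Fraction of positions where two prediction sequences agree.

    Hypothetical helper: assumes both sequences cover the same label segment
    and are aligned position by position.
    """
    if len(predictions_a) != len(predictions_b):
        raise ValueError("prediction sequences must have the same length")
    if not predictions_a:
        return 0.0  # convention chosen here for an empty label segment
    matches = sum(1 for a, b in zip(predictions_a, predictions_b) if a == b)
    return matches / len(predictions_a)


# Toy token IDs, not real model outputs.
rosetta_tokens = [11, 42, 7, 99]
llm_tokens = [11, 42, 8, 99]
slm_tokens = [11, 5, 8, 3]

rosetta_vs_llm = consistency_rate(rosetta_tokens, llm_tokens)
rosetta_vs_slm = consistency_rate(rosetta_tokens, slm_tokens)
```

Comparing the Rosetta output against both the LLM and the SLM in this way gives a rough indication of which model its predictions track more closely on a given checkpoint.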
Comments suppressed due to low confidence (1)
rosetta/train/dataset_adapters.py:867
- Variable _extract_question is not used.
def _extract_question(text: str) -> str:
@minzh23 Thank you so much for the PR. Could you take a look at Copilot's review and fix the issues where the review makes sense? Otherwise, just notify me and I will do a careful human review and test, then merge the code.
@fuvty Thanks for the feedback! I went through the suggestions from Copilot and incorporated the ones that made sense. The updates are mainly comment/docstring polishing and removing some unused code, and they do not change any core logic or behavior. Could you please take another careful look and review the latest revision manually?
Automated fix for model path references
🔧 Model Path Fix Applied Found and automatically fixed references to
These have been changed to use the |
fuvty
left a comment
Thank you @minzh23, I have finished my review and made a few modifications. The code looks good to me now, and I am merging it to main. In the meantime, I notice that we still have undefined load_aggregator functions across files. Perhaps we should also fix that, or remove the unused files entirely.
Please refer to this on how to modify the paths
This PR extends the Rosetta model to support the include-response functionality.
It also adds utility scripts for generating LLM responses and for verifying consistency across generated outputs, facilitating data preparation and evaluation.