Support mbridge distillation for any_model #904

danielkorzekwa wants to merge 136 commits into dkorzekwa/any_model from
Conversation
```python
@dataclass
class GenericHeterogeneousProvider(GPTModelProvider, HeterogeneousTransformerConfig):
```
UPDATE: the ModelProvider design is now being replaced by ModelConfig, which contains a Builder and a TransformerConfig, rather than trying to be both classes at the same time. (So far Mamba has changed; GPT is coming soon.)
https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/7cbcf4a4f3f76b1d37cb5395bf2220f8abc44877/src/megatron/bridge/models/mamba/mamba_builder.py#L86
Just a heads up that this will need to be refactored in the near future.
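For context, a minimal sketch of what that composition-based design looks like. All names here (`TransformerConfig`, `Builder`, `ModelConfig` and their fields) are hypothetical stand-ins for illustration, not the actual upstream Megatron-Bridge API:

```python
from dataclasses import dataclass, field


# Hypothetical stand-ins for Megatron-Bridge types; illustrative only.
@dataclass
class TransformerConfig:
    num_layers: int = 32
    hidden_size: int = 4096


@dataclass
class Builder:
    config: TransformerConfig

    def build(self):
        # Would construct the Megatron model from `self.config`.
        return f"model(layers={self.config.num_layers})"


# The new design: a ModelConfig *contains* a config and hands out a Builder,
# instead of inheriting from both (as GenericHeterogeneousProvider does today).
@dataclass
class ModelConfig:
    config: TransformerConfig = field(default_factory=TransformerConfig)

    def builder(self) -> Builder:
        return Builder(self.config)


mc = ModelConfig(TransformerConfig(num_layers=16))
print(mc.builder().build())  # model(layers=16)
```

Composition avoids the MRO entanglement of a class that is simultaneously a provider and a config, which is what makes the current dual-inheritance pattern fragile.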
Good to know. This will need fixes in the pruning and distillation examples as well. TODOs for when 26.04 is out.
```python
# Patch upstream module BEFORE importing distill() so isinstance checks work with our local DistillationProvider.
# This must happen before distill() is imported because distill.py imports DistillationProvider at module load time.
megatron.bridge.models.distillation_provider.DistillationProvider = DistillationProvider
```
This is the only significant difference from the existing script, right?
and also this:

```python
# Import heterogeneous bridges BEFORE AutoBridge.from_hf_pretrained() is called to ensure
# registration takes precedence. The @MegatronModelBridge.register_bridge decorator registers
# bridges when the module is imported. If both LlamaBridge and PuzzletronLlamaAnyModelBridge
# register for the same source (LlamaForCausalLM), the dispatch system uses the last registration.
#
# Note: Currently, bridges are also registered when distillation_provider is imported
# below (via mbridge/__init__.py), but this import will be needed once DistillationProvider
# is upstreamed to Megatron-Bridge and we no longer import from modelopt.torch.puzzletron.
import modelopt.torch.puzzletron.export.mbridge  # noqa: F401
```
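To illustrate why this import ordering matters, here is a self-contained sketch using in-memory modules (all module and class names are illustrative, not the real Megatron-Bridge layout): a consumer module that does `from ... import DistillationProvider` binds the name at load time, so the patch is only visible if it happens before that import.

```python
import sys
import types

# Fake "upstream" module holding the original class; injected into sys.modules
# so a normal `from ... import` statement can find it.
provider_mod = types.ModuleType("fake_distillation_provider")
exec("class DistillationProvider:\n    pass", provider_mod.__dict__)
sys.modules["fake_distillation_provider"] = provider_mod


# A local replacement class, deliberately NOT a subclass of the upstream one,
# analogous to the local DistillationProvider being patched in.
class LocalDistillationProvider:
    pass


# Patch BEFORE the consumer module is imported...
provider_mod.DistillationProvider = LocalDistillationProvider

# ...because the consumer binds the name once, at module load time.
distill_src = (
    "from fake_distillation_provider import DistillationProvider\n"
    "def is_provider(obj):\n"
    "    return isinstance(obj, DistillationProvider)\n"
)
distill_mod = types.ModuleType("fake_distill")
exec(distill_src, distill_mod.__dict__)

print(distill_mod.is_provider(LocalDistillationProvider()))  # True
```

Had the patch run after the consumer was imported, the `isinstance` check would still reference the original class and return False for local instances.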
## What does this PR do?

When running the script, I often see the tokenization stats (printed every `log_interval`) not showing up, or showing up very delayed. This PR uses `print(..., flush=True)` to fix that. It also updates the README to note that the tokenization example shown takes too long to run: split the data into multiple .jsonl files to run tokenization efficiently, and try a smaller dataset first to test the script.

## Testing

Split the Nemotron-pretraining-SFT-v1 dataset into multiple .jsonl splits, then tokenized them in parallel in separate Slurm jobs.

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
**Type of change:** New feature

**Overview:**
- Addition of the SpecBench dataset
- Addition of the NVIDIA SPEED-Bench dataset, preprocessing scripts, and a custom metrics aggregator
- Addition of an example of converting SpecBench Medusa to this framework
- Addition of initial TRT-LLM AutoDeploy speculative-decoding support
- Updates to all frameworks for better performance (overlap, async scheduling, etc.)

Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
## What does this PR do?

**Refactor:** Enhanced quantization configuration handling for transformer models through improved type validation, ensuring more robust processing of quantized model configurations.

Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
…om a student provider class, e.g., GenericHeterogeneousProvider (to serialize heterogeneous_layers_config_encoded_json Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
…ion/convert_checkpoints.py Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
…tion Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
…el checkpoints

- Add Knowledge Distillation recipe (recipe.py) with PP support and TP-friendly KD loss
- Add KDLoss (loss.py) for TP-aware KD on precomputed logits
- Add patch_automodel.py for loading heterogeneous checkpoints via AnyModel
- Add run.py entrypoint (pretrain | kd), kd.yaml and pretrain.yaml configs
- Add README with setup, config, and run instructions

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
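As a rough illustration of KD on precomputed logits, here is a single-process sketch of a forward-KL distillation loss. It is a simplification under stated assumptions: the actual KDLoss in this PR is TP-aware (each tensor-parallel rank holds a vocab shard and partial sums are reduced across ranks), which is omitted here, and the function names are illustrative.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kd_kl_loss(teacher_logits, student_logits, temperature=1.0):
    """Forward KL(teacher || student) for one token's logits.

    Single-process sketch only; the PR's KDLoss additionally handles
    tensor-parallel vocab sharding, which is omitted here.
    """
    t = softmax([x / temperature for x in teacher_logits])
    s = softmax([x / temperature for x in student_logits])
    return sum(ti * (math.log(ti) - math.log(si)) for ti, si in zip(t, s))


# Identical teacher/student logits give zero loss; mismatched logits give > 0.
print(kd_kl_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0
```

In training, this per-token loss is averaged over the (non-padded) tokens of a batch, usually with the teacher's probabilities treated as constants (no gradient through the teacher).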
…yers) with fewer layers than the template model (32 layers). This allows partial exports when some tensors are missing. Note: this is NOT needed when running on real compressed puzzletron student models, which have the same number of layers as the template model (some may be skipped via no_op in block_configs, but all layer tensors are still present in the checkpoint).
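A toy sketch of the tolerated-missing-tensor behavior described above: export whatever layer tensors the student checkpoint actually contains and record the rest as skipped, rather than raising. Tensor names and layer counts here are illustrative.

```python
# Template defines 32 layers; a toy/truncated student checkpoint has only 24.
template_layers = [f"layers.{i}.weight" for i in range(32)]
student_ckpt = {f"layers.{i}.weight": object() for i in range(24)}

exported, skipped = {}, []
for name in template_layers:
    if name in student_ckpt:
        exported[name] = student_ckpt[name]
    else:
        # Tolerated only for toy students with fewer layers; real compressed
        # puzzletron students keep all layer tensors (no_op layers included).
        skipped.append(name)

print(len(exported), len(skipped))  # 24 8
```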
…orial) Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
…ring to conftest.py Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
…mizer Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
## What does this PR do?

Adds gpt-oss-20b support for puzzle any-model pruning.

**Type of change:** new feature

**Overview:** Adds descriptor, converter, and YAML configuration files for expert removal. Introduces slight changes to conversion to account for the mxfp4-quantized checkpoint of gpt-oss.

Signed-off-by: mchochowski <mchochowski@nvidia.com>
Signed-off-by: jrausch <jrausch@nvidia.com>
Signed-off-by: chochowski <Marcin.Chochowski@gmail.com>
Co-authored-by: J Rausch <38429553+j-rausch@users.noreply.github.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Force-pushed from c1e8171 to 9fea797
A new MR will be created because rebasing caused unchanged files to incorrectly show as changed.
### What does this PR do?

Copy of #904 (to avoid showing many unchanged files due to rebase).

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>