Support mbridge distillation for any_model #904

danielkorzekwa wants to merge 136 commits into dkorzekwa/any_model from
Conversation
```python
@dataclass
class GenericHeterogeneousProvider(GPTModelProvider, HeterogeneousTransformerConfig):
```
UPDATE: the ModelProvider design is now being replaced by ModelConfig, which contains a Builder and a TransformerConfig, rather than trying to be both classes at the same time. (So far Mamba has changed; GPT is coming soon.)
https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/7cbcf4a4f3f76b1d37cb5395bf2220f8abc44877/src/megatron/bridge/models/mamba/mamba_builder.py#L86
Just a heads up that this will need to be refactored in the near future.
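For context, a minimal sketch of what that composition-based design looks like. All names here (`TransformerConfig`, `Builder`, `ModelConfig` and their fields) are hypothetical stand-ins for illustration, not the actual upstream Megatron-Bridge API:

```python
from dataclasses import dataclass, field


# Hypothetical stand-ins for Megatron-Bridge types; illustrative only.
@dataclass
class TransformerConfig:
    num_layers: int = 32
    hidden_size: int = 4096


@dataclass
class Builder:
    config: TransformerConfig

    def build(self):
        # Would construct the Megatron model from `self.config`.
        return f"model(layers={self.config.num_layers})"


# The new design: a ModelConfig *contains* a config and hands out a Builder,
# instead of inheriting from both (as GenericHeterogeneousProvider does today).
@dataclass
class ModelConfig:
    config: TransformerConfig = field(default_factory=TransformerConfig)

    def builder(self) -> Builder:
        return Builder(self.config)


mc = ModelConfig(TransformerConfig(num_layers=16))
print(mc.builder().build())  # model(layers=16)
```

Composition avoids the MRO entanglement of a class that is simultaneously a provider and a config, which is what makes the current dual-inheritance pattern fragile.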
Good to know. This will need fixes in the pruning and distillation examples as well. TODOs for when 26.04 is out.
```python
# Patch upstream module BEFORE importing distill() so isinstance checks work with our local DistillationProvider.
# This must happen before distill() is imported because distill.py imports DistillationProvider at module load time.
megatron.bridge.models.distillation_provider.DistillationProvider = DistillationProvider
```
This is the only significant difference from the existing script, right?
and also this:

```python
# Import heterogeneous bridges BEFORE AutoBridge.from_hf_pretrained() is called to ensure
# registration takes precedence. The @MegatronModelBridge.register_bridge decorator registers
# bridges when the module is imported. If both LlamaBridge and PuzzletronLlamaAnyModelBridge
# register for the same source (LlamaForCausalLM), the dispatch system uses the last registration.
#
# Note: Currently, bridges are also registered when distillation_provider is imported
# below (via mbridge/__init__.py), but this import will be needed once DistillationProvider
# is upstreamed to Megatron-Bridge and we no longer import from modelopt.torch.puzzletron.
import modelopt.torch.puzzletron.export.mbridge  # noqa: F401
```
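To illustrate why this import ordering matters, here is a self-contained sketch using in-memory modules (all module and class names are illustrative, not the real Megatron-Bridge layout): a consumer module that does `from ... import DistillationProvider` binds the name at load time, so the patch is only visible if it happens before that import.

```python
import sys
import types

# Fake "upstream" module holding the original class; injected into sys.modules
# so a normal `from ... import` statement can find it.
provider_mod = types.ModuleType("fake_distillation_provider")
exec("class DistillationProvider:\n    pass", provider_mod.__dict__)
sys.modules["fake_distillation_provider"] = provider_mod


# A local replacement class, deliberately NOT a subclass of the upstream one,
# analogous to the local DistillationProvider being patched in.
class LocalDistillationProvider:
    pass


# Patch BEFORE the consumer module is imported...
provider_mod.DistillationProvider = LocalDistillationProvider

# ...because the consumer binds the name once, at module load time.
distill_src = (
    "from fake_distillation_provider import DistillationProvider\n"
    "def is_provider(obj):\n"
    "    return isinstance(obj, DistillationProvider)\n"
)
distill_mod = types.ModuleType("fake_distill")
exec(distill_src, distill_mod.__dict__)

print(distill_mod.is_provider(LocalDistillationProvider()))  # True
```

Had the patch run after the consumer was imported, the `isinstance` check would still reference the original class and return False for local instances.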
## What does this PR do?

When running the script, I often see the tokenization stats (printed every `log_interval`) not showing up, or showing up very delayed. This PR uses `print(..., flush=True)` to fix that. It also updates the README to note that the tokenization example shown takes too long to run: split the data into multiple .jsonl files to run tokenization efficiently, and try a smaller dataset first to test the script.

## Testing

Split the Nemotron-pretraining-SFT-v1 dataset into multiple .jsonl splits, then tokenized them in parallel in separate Slurm jobs.

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
**Type of change:** New feature

**Overview:**
- Addition of the SpecBench dataset
- Addition of the NVIDIA SPEED-Bench dataset, preprocessing scripts, and a custom metrics aggregator
- Addition of an example of converting SpecBench Medusa to this framework
- Addition of initial TRT-LLM AutoDeploy speculative-decoding support
- Updates to all frameworks for better performance (overlap, async scheduling, etc.)

Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
## What does this PR do?

**Refactor:** Enhanced quantization configuration handling for transformer models through improved type validation, ensuring more robust processing of quantized model configurations.

Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
…om a student provider class, e.g., GenericHeterogeneousProvider (to serialize heterogeneous_layers_config_encoded_json Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
…ion/convert_checkpoints.py Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
…tion Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
…el checkpoints

- Add Knowledge Distillation recipe (recipe.py) with PP support and TP-friendly KD loss
- Add KDLoss (loss.py) for TP-aware KD on precomputed logits
- Add patch_automodel.py for loading heterogeneous checkpoints via AnyModel
- Add run.py entrypoint (pretrain | kd), kd.yaml and pretrain.yaml configs
- Add README with setup, config, and run instructions

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
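As a rough illustration of KD on precomputed logits, here is a single-process sketch of a forward-KL distillation loss. It is a simplification under stated assumptions: the actual KDLoss in this PR is TP-aware (each tensor-parallel rank holds a vocab shard and partial sums are reduced across ranks), which is omitted here, and the function names are illustrative.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kd_kl_loss(teacher_logits, student_logits, temperature=1.0):
    """Forward KL(teacher || student) for one token's logits.

    Single-process sketch only; the PR's KDLoss additionally handles
    tensor-parallel vocab sharding, which is omitted here.
    """
    t = softmax([x / temperature for x in teacher_logits])
    s = softmax([x / temperature for x in student_logits])
    return sum(ti * (math.log(ti) - math.log(si)) for ti, si in zip(t, s))


# Identical teacher/student logits give zero loss; mismatched logits give > 0.
print(kd_kl_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0
```

In training, this per-token loss is averaged over the (non-padded) tokens of a batch, usually with the teacher's probabilities treated as constants (no gradient through the teacher).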
…yers) with fewer layers than the template model (32 layers). This allows partial exports when some tensors are missing. Note: this is NOT needed when running on real compressed puzzletron student models, which have the same number of layers as the template model (some may be skipped via no_op in block_configs, but all layer tensors are still present in the checkpoint).
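A toy sketch of the tolerated-missing-tensor behavior described above: export whatever layer tensors the student checkpoint actually contains and record the rest as skipped, rather than raising. Tensor names and layer counts here are illustrative.

```python
# Template defines 32 layers; a toy/truncated student checkpoint has only 24.
template_layers = [f"layers.{i}.weight" for i in range(32)]
student_ckpt = {f"layers.{i}.weight": object() for i in range(24)}

exported, skipped = {}, []
for name in template_layers:
    if name in student_ckpt:
        exported[name] = student_ckpt[name]
    else:
        # Tolerated only for toy students with fewer layers; real compressed
        # puzzletron students keep all layer tensors (no_op layers included).
        skipped.append(name)

print(len(exported), len(skipped))  # 24 8
```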
…orial) Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
…ring to conftest.py Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
…mizer Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
## What does this PR do?

Adds gpt-oss-20b support for puzzle any-model pruning.

**Type of change:** new feature

**Overview:** Adds descriptor, converter, and YAML configuration files for expert removal. Introduces slight changes to conversion to account for the mxfp4-quantized checkpoint of gpt-oss.

Signed-off-by: mchochowski <mchochowski@nvidia.com>
Signed-off-by: jrausch <jrausch@nvidia.com>
Signed-off-by: chochowski <Marcin.Chochowski@gmail.com>
Co-authored-by: J Rausch <38429553+j-rausch@users.noreply.github.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Force-pushed from c1e8171 to 9fea797
A new MR will be created because rebasing caused unchanged files to incorrectly show as changed.
### What does this PR do?

Copy of #904 (to avoid showing many unchanged files due to rebase).

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>