
Support mbridge distillation for any_model #904

Open
danielkorzekwa wants to merge 25 commits into dkorzekwa/any_model from dkorzekwa/any_model_mbridge_distillation

Conversation

@danielkorzekwa

What does this PR do?

  • hf_to_mcore mbridge converter (examples for llama and qwen models)
  • distillation script

- Add distill_anymodel.py: Knowledge distillation script for AnyModel checkpoints
- Add import_anymodel_to_mbridge.py: Import script to convert HF AnyModel to MBridge format
- Update base.py: Simplify HeterogeneousBridgeMixin for AnyModel support
- Add __init__.py: Module initialization
- Add llama.py: Llama bridge implementation
- Add qwen3.py: Qwen3 bridge implementation
@danielkorzekwa danielkorzekwa requested review from a team as code owners February 18, 2026 18:26
@danielkorzekwa danielkorzekwa requested review from realAsma and removed request for a team February 18, 2026 18:26
@coderabbitai
Contributor

coderabbitai bot commented Feb 18, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.


@kevalmorabia97 kevalmorabia97 requested review from AAnoosheh and kevalmorabia97 and removed request for realAsma February 18, 2026 18:29
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
```python
    OmegaConf conversion tries to access per_block_parameters which may not
    be initialized when loading from YAML. Return empty list as fallback.
    """
    if name == "per_block_parameters":
```
Collaborator

Should this also check whether that attribute is unset, and only then return []; otherwise return whatever is set?
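A minimal sketch (hypothetical class name; only the attribute name comes from the snippet above) of why the unconditional fallback behaves as the reviewer suggests: `__getattr__` is invoked only when normal attribute lookup fails, so an explicitly set `per_block_parameters` is returned as-is and the `[]` fallback fires only when the attribute is genuinely unset.

```python
# Hypothetical sketch: only the attribute name comes from the PR snippet.
class PerBlockConfig:
    def __getattr__(self, name):
        # __getattr__ runs only when normal lookup fails, so a value set
        # on the instance bypasses this method entirely.
        if name == "per_block_parameters":
            return []  # fallback for OmegaConf probing an uninitialized attr
        raise AttributeError(name)

cfg = PerBlockConfig()
print(cfg.per_block_parameters)    # -> [] (fallback path)
cfg.per_block_parameters = ["block0"]
print(cfg.per_block_parameters)    # -> ['block0'] (normal lookup, no fallback)
```

So an extra "is it set?" check would be redundant here, since Python only routes missed lookups through `__getattr__`.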

Collaborator

Do you think at some point we should just upstream this into the Megatron-Bridge repo, since it's a standard Megatron feature with nothing to do with model optimization?

Author

There are pros and cons for both. Con: base.py is NAS-related, e.g., the no-op in block_configs. However, I think having more generic heterogeneous support in mbridge would be useful. I captured it in the TODOs.

```shell
export WORKSPACE=/path/to/your/project
```

1. **Clone Megatron-Bridge:**
Collaborator

Megatron-Bridge is already cloned in the container at /opt/Megatron-Bridge. Why don't we just do the following inside the container: `cd /opt/Megatron-Bridge && git checkout 960a718cb8989676b258e107d538642717e22e39`?

…ze() on DistillationProvider.provide()



```python
@dataclass
class GenericHeterogeneousProvider(GPTModelProvider, HeterogeneousTransformerConfig):
```
Contributor

UPDATE: the ModelProvider design is now being replaced by ModelConfig, which contains a Builder and a TransformerConfig rather than trying to be both classes at once. (So far Mamba has changed, with GPT incoming soon.)
https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/7cbcf4a4f3f76b1d37cb5395bf2220f8abc44877/src/megatron/bridge/models/mamba/mamba_builder.py#L86

Just a heads up that this will need to be refactored in the near future.

Collaborator

Good to know. This will need fixes in the pruning and distillation examples as well. TODOs for when 26.04 is out.
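For illustration, a self-contained sketch of the dual-inheritance dataclass pattern discussed above, using stand-in classes (the real GPTModelProvider and HeterogeneousTransformerConfig are far larger; names with a `Stub` suffix are hypothetical): one dataclass inherits from both a provider and a transformer config, so a single object carries build options and per-layer heterogeneous configuration.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for GPTModelProvider and HeterogeneousTransformerConfig.
@dataclass
class GPTModelProviderStub:
    num_layers: int = 2

@dataclass
class HeterogeneousConfigStub:
    per_block_parameters: list = field(default_factory=list)

# One dataclass inherits from both, mirroring GenericHeterogeneousProvider:
# dataclass field collection walks the MRO, so fields from both bases merge.
@dataclass
class GenericHeterogeneousProviderStub(GPTModelProviderStub, HeterogeneousConfigStub):
    pass

p = GenericHeterogeneousProviderStub(num_layers=4)
print(p.num_layers, p.per_block_parameters)  # -> 4 []
```

This "be both classes at once" shape is exactly what the ModelConfig refactor (Builder plus TransformerConfig by composition) moves away from.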


```python
# Patch upstream module BEFORE importing distill() so isinstance checks work with our local DistillationProvider
# This must happen before distill() is imported because distill.py imports DistillationProvider at module load time
megatron.bridge.models.distillation_provider.DistillationProvider = DistillationProvider
```
Contributor

This is the only significant difference from the existing script, right?
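A runnable sketch of the patch-before-import ordering constraint discussed above, using a hypothetical stand-in module (the real path is megatron.bridge.models.distillation_provider): a `from … import …` copies the binding at import time, so the patch must land in the upstream module before the consumer module loads.

```python
import sys
import types

# Hypothetical stand-in for megatron.bridge.models.distillation_provider.
upstream = types.ModuleType("upstream_provider")

class DistillationProvider:
    pass

upstream.DistillationProvider = DistillationProvider
sys.modules["upstream_provider"] = upstream

class LocalDistillationProvider(DistillationProvider):
    """Local subclass that must pass downstream isinstance checks."""

# Patch BEFORE any consumer runs `from upstream_provider import DistillationProvider`,
# because a from-import snapshots the binding at module load time.
upstream.DistillationProvider = LocalDistillationProvider

# Simulate the consumer module's load-time import (as distill.py would do).
consumer = types.ModuleType("consumer")
exec("from upstream_provider import DistillationProvider", consumer.__dict__)

print(consumer.DistillationProvider is LocalDistillationProvider)  # -> True
```

Reversing the order (importing the consumer first, patching second) would leave the consumer holding the original class, which is why the script patches before importing distill().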

@danielkorzekwa danielkorzekwa requested review from a team as code owners February 24, 2026 18:08
@copy-pr-bot

copy-pr-bot bot commented Feb 24, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.


…om a student provider class, e.g., GenericHeterogeneousProvider (to serialize heterogeneous_layers_config_encoded_json

…ion/convert_checkpoints.py

…tion

3 participants