Support mbridge distillation for any_model #904
danielkorzekwa wants to merge 25 commits into `dkorzekwa/any_model`
Conversation
- Add `distill_anymodel.py`: knowledge distillation script for AnyModel checkpoints
- Add `import_anymodel_to_mbridge.py`: import script to convert HF AnyModel to MBridge format
- Update `base.py`: simplify `HeterogeneousBridgeMixin` for AnyModel support
- Add `__init__.py`: module initialization
- Add `llama.py`: Llama bridge implementation
- Add `qwen3.py`: Qwen3 bridge implementation
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
```python
"""
OmegaConf conversion tries to access per_block_parameters, which may not
be initialized when loading from YAML. Return an empty list as a fallback.
"""
if name == "per_block_parameters":
```
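For context, Python's `__getattr__` is invoked only when normal attribute lookup fails, so a fallback like the one above returns `[]` only while the attribute has never been assigned; once it is set, the stored value wins. A minimal sketch of the pattern (class and field names are illustrative, not the real mbridge code):

```python
class HeterogeneousConfig:
    """Toy stand-in for a config reconstructed from YAML where
    per_block_parameters may not have been initialized yet."""

    def __getattr__(self, name):
        # Invoked only when normal attribute lookup fails, i.e. the
        # attribute was never assigned on the instance or class.
        if name == "per_block_parameters":
            return []  # safe fallback during OmegaConf conversion
        raise AttributeError(name)


cfg = HeterogeneousConfig()
print(cfg.per_block_parameters)  # [] -- fallback path
cfg.per_block_parameters = [{"hidden_size": 1024}]
print(cfg.per_block_parameters)  # stored value; __getattr__ is not called
```

Because `__getattr__` never fires for attributes that are actually set, the "return whatever is set" behavior comes for free.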
Should this also check whether that attribute is set, and only return `[]` when it isn't, otherwise returning whatever is set?
Do you think at some point we should just upstream this into the Megatron-Bridge repo, since it's a standard Megatron feature with nothing to do with model optimization?
There are pros and cons to both. The con: `base.py` is NAS-related, e.g., the no-op in `block_configs`. However, I think having more generic heterogeneous support in mbridge would be useful. I captured it in the TODOs.
```shell
export WORKSPACE=/path/to/your/project
```

1. **Clone Megatron-Bridge:**
Megatron-Bridge is already cloned in the container at `/opt/Megatron-Bridge`. Why don't we just do the following inside the container: `cd /opt/Megatron-Bridge && git checkout 960a718cb8989676b258e107d538642717e22e39`?
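The reviewer's suggestion could be folded into the setup step as a sketch like the following. The paths and the pinned commit come from this thread; the fallback location under `$WORKSPACE` is an assumption.

```shell
# Pinned Megatron-Bridge commit from the review thread.
PIN=960a718cb8989676b258e107d538642717e22e39

pick_mbridge_dir() {
  # $1: checkout baked into the container image
  # $2: fallback location under the user's workspace
  if [ -d "$1" ]; then
    printf '%s\n' "$1"
  else
    printf '%s\n' "$2"
  fi
}

MBRIDGE_DIR=$(pick_mbridge_dir /opt/Megatron-Bridge "${WORKSPACE:-/tmp}/Megatron-Bridge")
# Then: clone into "$MBRIDGE_DIR" if it is missing, and
# cd "$MBRIDGE_DIR" && git checkout "$PIN"
echo "$MBRIDGE_DIR"
```

This keeps the documented flow working outside the container while preferring the pre-cloned copy inside it.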
…ze() on DistillationProvider.provide() Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
```python
@dataclass
class GenericHeterogeneousProvider(GPTModelProvider, HeterogeneousTransformerConfig):
```
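The direction described in the comment below can be pictured as composition (a config that *has* a builder and a transformer config) instead of a provider that *is* both classes at once. A hypothetical illustration with stub classes, not the real Megatron-Bridge API:

```python
from dataclasses import dataclass, field


@dataclass
class TransformerConfigStub:
    """Stand-in for a TransformerConfig."""
    num_layers: int = 12
    per_block_parameters: list = field(default_factory=list)


class BuilderStub:
    """Stand-in for a Builder; real builders return a Megatron model."""
    def build(self, cfg: TransformerConfigStub) -> str:
        return f"model({cfg.num_layers} layers)"


@dataclass
class ModelConfigStub:
    """Sketch of the incoming design: ModelConfig composes a Builder
    and a TransformerConfig rather than inheriting from both."""
    config: TransformerConfigStub = field(default_factory=TransformerConfigStub)
    builder: BuilderStub = field(default_factory=BuilderStub)

    def provide(self):
        return self.builder.build(self.config)


mc = ModelConfigStub()
print(mc.provide())  # model(12 layers)
```

Composition makes it clearer which attributes belong to configuration versus construction, which is presumably why the upstream refactor is moving that way.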
UPDATE: the `ModelProvider` design is now being replaced by `ModelConfig`, which contains a `Builder` and a `TransformerConfig` rather than trying to be both classes at the same time. (So far Mamba has changed, but GPT is coming soon.)
https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/7cbcf4a4f3f76b1d37cb5395bf2220f8abc44877/src/megatron/bridge/models/mamba/mamba_builder.py#L86
Just a heads-up that this will need to be refactored in the near future.
Good to know. This will need fixes in the pruning and distillation examples as well. TODOs for when 26.04 is out.
```python
# Patch the upstream module BEFORE importing distill() so isinstance checks
# work with our local DistillationProvider. This must happen before distill()
# is imported because distill.py imports DistillationProvider at module load time.
megatron.bridge.models.distillation_provider.DistillationProvider = DistillationProvider
```
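The reason the patch must precede the import is that `from module import Name` binds the name once, at import time; patching afterwards leaves the consumer holding the old class. A self-contained sketch of the pattern with toy modules (module and class names are illustrative, not the real mbridge code):

```python
import sys
import types

# Toy stand-ins: "upstream_provider" plays the role of
# megatron.bridge.models.distillation_provider, and "consumer" plays the
# role of distill.py, which imports the name at module load time.
upstream = types.ModuleType("upstream_provider")


class UpstreamProvider:
    pass


upstream.DistillationProvider = UpstreamProvider
sys.modules["upstream_provider"] = upstream


class LocalProvider(UpstreamProvider):
    """Local subclass that isinstance checks should accept."""


# Patch BEFORE the consumer runs its import statement.
upstream.DistillationProvider = LocalProvider

consumer = types.ModuleType("consumer")
exec("from upstream_provider import DistillationProvider", consumer.__dict__)

# The consumer bound the patched class, so isinstance checks pass.
print(consumer.DistillationProvider is LocalProvider)  # True
print(isinstance(LocalProvider(), consumer.DistillationProvider))  # True
```

Had the patch come after the `exec` (i.e., after distill.py was imported), `consumer.DistillationProvider` would still point at `UpstreamProvider`.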
This is the only significant difference from the existing script, right?
…om a student provider class, e.g., GenericHeterogeneousProvider (to serialize heterogeneous_layers_config_encoded_json Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
…ion/convert_checkpoints.py Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
…tion Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>