
draft: support NemotronH model in HF path #943

Draft: Fridah-nv wants to merge 1 commit into main from fridah/super-ptq

Conversation

Contributor

@Fridah-nv Fridah-nv commented Feb 27, 2026

What does this PR do?

Type of change: ?

Overview: ?

Usage

We can use `NemotronHForCausalLM` with `MAMBA_MOE_NVFP4_CONSERVATIVE_CFG` or `MAMBA_MOE_NVFP4_AGGRESSIVE_CFG`.

# Add a code snippet demonstrating how to use this
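A minimal usage sketch (not part of the PR): `mtq.quantize` is ModelOpt's standard PTQ entry point, but whether the two configs are exposed under `modelopt.torch.quantization` is an assumption here, and the model and calibration loop are placeholders.

```python
def quantize_nemotron_h(model, calib_loop, aggressive: bool = False):
    """Hypothetical sketch: quantize a NemotronHForCausalLM with one of the
    two NVFP4 MoE configs from this PR. Imports are deferred so the sketch
    stays self-contained; the config attribute location is an assumption."""
    import modelopt.torch.quantization as mtq

    cfg = (
        mtq.MAMBA_MOE_NVFP4_AGGRESSIVE_CFG
        if aggressive
        else mtq.MAMBA_MOE_NVFP4_CONSERVATIVE_CFG
    )
    # forward_loop runs calibration forward passes over the model
    return mtq.quantize(model, cfg, forward_loop=calib_loop)
```

Pass `aggressive=True` to select the aggressive config instead of the conservative one.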

Testing

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes/No
  • Did you write any new necessary tests?: Yes/No
  • Did you add or update any necessary documentation?: Yes/No
  • Did you update Changelog?: Yes/No

Additional Information

Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>

copy-pr-bot bot commented Feb 27, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.



coderabbitai bot commented Feb 27, 2026

Review skipped (draft detected).


def _setup(self):
    pass

def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
Contributor Author

I'm just moving what we have in super_ptq.py here; we can discuss further if we want to align this with the other MoE quantization behavior in ModelOpt.


codecov bot commented Feb 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.04%. Comparing base (4eacb0d) to head (63dca16).
⚠️ Report is 7 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #943      +/-   ##
==========================================
- Coverage   72.18%   72.04%   -0.14%     
==========================================
  Files         207      207              
  Lines       22656    22718      +62     
==========================================
+ Hits        16355    16368      +13     
- Misses       6301     6350      +49     


def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
    return super().forward(hidden_states)

def layer_sync_moe_local_experts_amax(self):
Collaborator

@realAsma is this still necessary?

Contributor

This is helpful for advanced algorithms. This method only syncs the input quantizer amax; a correctly synced input quantizer amax is required for the MSE/GPTQ algorithms.

},
"algorithm": "max",
}

Collaborator

nit: remove

wildcards.add(name[:i] + "*")
for i in range(len(name)):
    if name[i] == ".":
        wildcards.add(name[:i] + "*")
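For context (not part of the diff), a self-contained sketch of what this loop computes: every dot-delimited prefix of a module name becomes a wildcard pattern, so any ancestor scope of the module can be matched later.

```python
def prefix_wildcards(name: str) -> set:
    """Collect a wildcard for every dot-delimited prefix of a module name.

    Mirrors the loop above; `name` is assumed to be a dotted module path.
    """
    wildcards = set()
    for i in range(len(name)):
        if name[i] == ".":
            wildcards.add(name[:i] + "*")
    return wildcards

# Example with a typical (hypothetical) MoE expert path:
print(sorted(prefix_wildcards("model.layers.0.mixer.experts.3.up_proj")))
```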
Collaborator

Do you know why we need to add this?

_prefix_wildcard_summarize_exclude_modules(
    exclude_modules, per_layer_config["quantized_layers"].keys()
)
summarized = _prefix_wildcard_summarize_exclude_modules(
Collaborator

Is this a no-op change?

def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
    return super().forward(hidden_states)

def layer_sync_moe_local_experts_amax(self):
Contributor

@Fridah-nv could you please check the implementation of the supported MoE layers,

class _QuantSparseMoe(QuantModule):

and see whether we can move this function into the `_QuantSparseMoe` base class?

    return output


class _QuantNemotronHMOE(QuantModule):
Contributor

Do we still need this? Does the following work instead?

def register_sparse_moe_on_the_fly(model):

Contributor

In my understanding, we don't need explicit registration for attention (it is caught by the attention AST patching). Is that correct?

amax_dict[name] = (
    amax_tensor if stored is None else torch.maximum(stored, amax_tensor)
)
for expert in self.experts:
Contributor

Can you iterate through the key/value pairs in `amax_dict` instead of through the entire expert modules?
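As an illustration of the suggested pattern (names here are hypothetical, and plain floats stand in for tensors), merging per-expert amax values keyed by quantizer name lets later code walk the dict directly instead of re-walking every expert:

```python
def merge_amax(amax_dict: dict, updates: dict) -> dict:
    """Merge amax values by quantizer name, keeping the running maximum per
    key. In the real code the values are tensors and torch.maximum replaces
    max()."""
    for name, amax in updates.items():
        stored = amax_dict.get(name)
        amax_dict[name] = amax if stored is None else max(stored, amax)
    return amax_dict

# Iterate the merged dict directly, rather than looping over self.experts:
merged = merge_amax({"up_proj": 1.0}, {"up_proj": 2.5, "down_proj": 0.5})
for name, amax in merged.items():
    pass  # e.g. write amax back to the quantizer matching `name`
```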

