[models]: add Gemma 4 E2B text support by Carlos-Marques · Pull Request #3 · exa-labs/neuronx-distributed-inference

Carlos-Marques · 2026-06-09T21:53:38Z

Description

Adds a Gemma 4 text-path model implementation registered as gemma4, using the upstream vLLM/HF Gemma4 text architecture as the reference and NxDI primitives for inference:

Gemma4InferenceConfig
  layer_types[] -> per-layer sliding/full attention configs
  full_attention -> global_head_dim, no sliding window
  sliding_attention -> head_dim, sliding_window
  hidden_size_per_layer_input > 0 -> packed PLE table + projection pipeline
  num_kv_shared_layers > 0 -> checkpoint K/V projection materialization for shared layers
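
The mapping above can be illustrated with a minimal sketch. It is not the PR's `Gemma4InferenceConfig`: the `LayerAttentionSpec` class, the `build_layer_specs` helper, and the numeric values in the usage note are hypothetical. The sketch also assumes that KV-shared layers are the last `num_kv_shared_layers` layers, in line with the "KV-shared tail layers" wording in this description.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LayerAttentionSpec:
    """Hypothetical per-layer attention settings (not the PR's actual class)."""

    layer_idx: int
    layer_type: str
    head_dim: int
    sliding_window: Optional[int]  # None means no sliding window (full attention)
    kv_shared: bool


def build_layer_specs(
    layer_types: List[str],
    head_dim: int,
    global_head_dim: int,
    sliding_window: int,
    num_kv_shared_layers: int = 0,
) -> List[LayerAttentionSpec]:
    """Expand layer_types[] into one spec per layer."""
    # Assumption: the shared layers are the tail of the stack.
    first_shared = len(layer_types) - num_kv_shared_layers
    specs = []
    for idx, layer_type in enumerate(layer_types):
        if layer_type == "full_attention":
            # Full attention uses global_head_dim and no sliding window.
            dim, window = global_head_dim, None
        elif layer_type == "sliding_attention":
            # Sliding attention uses head_dim and the configured window.
            dim, window = head_dim, sliding_window
        else:
            raise ValueError(f"unknown layer type at index {idx}: {layer_type!r}")
        specs.append(
            LayerAttentionSpec(idx, layer_type, dim, window, kv_shared=idx >= first_shared)
        )
    return specs
```

For example, `build_layer_specs(["sliding_attention", "sliding_attention", "full_attention"], head_dim=256, global_head_dim=512, sliding_window=512, num_kv_shared_layers=1)` yields two sliding-window specs followed by one full-attention spec that is marked as KV-shared. The sizes here are illustrative only and are not taken from the Gemma 4 E2B checkpoint.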

The implementation includes Gemma4 RMSNorm, PLE gates/projections, double-wide MLP sizing for KV-shared tail layers, final-logit softcapping in the LM head, and state-dict conversion for language_model.model.* / model.language_model.* checkpoints.
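
As a rough illustration of final-logit softcapping, the sketch below applies the usual `cap * tanh(logits / cap)` form from earlier Gemma releases. That this PR uses exactly this form is an assumption; the `softcap_logits` name is hypothetical, and plain Python floats stand in for tensors.

```python
import math
from typing import List, Optional


def softcap_logits(logits: List[float], cap: Optional[float]) -> List[float]:
    """Smoothly squash logits to stay within [-cap, cap].

    Assumed form: cap * tanh(x / cap). With cap=None, softcapping is
    treated as disabled and the logits are returned unchanged.
    """
    if cap is None:
        return list(logits)
    if cap <= 0:
        raise ValueError("cap must be positive")
    return [cap * math.tanh(x / cap) for x in logits]
```

Small logits pass through almost unchanged, since `tanh(x) ≈ x` near zero, while large ones saturate towards `±cap`. The function is monotonic, so the ordering of logits is preserved except where very large values saturate to the same capped value.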

Model Information

Model Name: Gemma 4 E2B / Gemma 4 E2B-it

Model Architecture: Decoder-only Gemma4 text transformer with mixed sliding/full attention

Purpose: Text generation

Checklist

Required Components

Accuracy Test (e.g. test/integration/test_model.py)
- Added CPU/unit coverage for Gemma4 config/state-dict behavior; Neuron compile accuracy test is not included in this PR.
README.md with usage/compatibility/testing sections
- Not included; this follows existing in-tree model structure rather than the contrib folder template.
Source Code (src/)
- src/neuronx_distributed_inference/models/gemma4/modeling_gemma4.py
- MODEL_TYPES["gemma4"] registration

Optional Components

Unit Tests (CPU or Neuron-based)
- test/unit/models/gemma4/test_modeling_gemma4.py

Folder Structure

This PR adds an in-tree model under:

src/neuronx_distributed_inference/models/gemma4/
test/unit/models/gemma4/

Testing

How did you test this change?

Local static/syntax checks only; the local environment does not have the repo test dependencies (torch, pytest) installed.

Test Results:

python -m py_compile src/neuronx_distributed_inference/models/gemma4/modeling_gemma4.py test/unit/models/gemma4/test_modeling_gemma4.py
ruff check --line-length=120 --ignore=F401,E203 src/neuronx_distributed_inference/models/gemma4/modeling_gemma4.py test/unit/models/gemma4/test_modeling_gemma4.py src/neuronx_distributed_inference/utils/constants.py
pre-commit run --files src/neuronx_distributed_inference/models/gemma4/__init__.py src/neuronx_distributed_inference/models/gemma4/modeling_gemma4.py src/neuronx_distributed_inference/utils/constants.py test/unit/models/gemma4/test_modeling_gemma4.py

All commands above passed locally.

Compatibility

Tested with:

Neuron SDK Version(s): Not run on Neuron hardware in this session
Instance Type(s): Not run on Neuron hardware in this session
PyTorch Version: Not available in local environment
Python Version: local py_compile interpreter

Additional Information

The implementation targets the Gemma4 text path only; multimodal audio/vision encoders are intentionally out of scope. Upstream reference analyzed: vLLM Gemma4ForCausalLM / Gemma4Model / Gemma4Attention and HF Gemma4Text* implementations.
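
Because only the text path is in scope, state-dict conversion has to pick out the language-model weights under the two checkpoint layouts named in the description and leave everything else behind. The following is a minimal sketch under stated assumptions, not the PR's conversion code: the `extract_text_state_dict` helper is hypothetical, and stripping the prefix outright is an illustrative choice, since the key naming NxDI expects after conversion is not stated here.

```python
from typing import Any, Dict

# The two text-model prefixes mentioned in the PR description.
TEXT_PREFIXES = ("model.language_model.", "language_model.model.")


def extract_text_state_dict(state_dict: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only text-path weights and strip their checkpoint prefix.

    Keys that match neither prefix (for example encoder weights) are dropped.
    """
    text_weights: Dict[str, Any] = {}
    for key, value in state_dict.items():
        for prefix in TEXT_PREFIXES:
            if key.startswith(prefix):
                text_weights[key[len(prefix):]] = value
                break
    return text_weights
```

Under this sketch, a key such as `model.language_model.layers.0.self_attn.q_proj.weight` becomes `layers.0.self_attn.q_proj.weight`, and a key under any other prefix is skipped. The example key names are illustrative and are not taken from a real checkpoint.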

Related Issues

N/A

vLLM Integration

This model/feature is intended for use with vLLM
Documentation includes vLLM registration instructions

By submitting this PR, I confirm that:

I have read and followed the contributing guidelines to the extent applicable for this repository snapshot
This is a community contribution and may have limited testing compared to officially-supported models
The code follows best practices and is well-documented
All required components listed above are included

Link to Devin session: https://app.devin.ai/sessions/8501976bc5364801a8d6d8b8b768a547
Requested by: @Carlos-Marques

Co-Authored-By: carlos <carlosmarques.personal@gmail.com>

devin-ai-integration · 2026-06-09T21:53:41Z

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.


feat(models): add Gemma 4 text support

6c9f09e

Co-Authored-By: carlos <carlosmarques.personal@gmail.com>

devin-ai-integration Bot assigned Carlos-Marques Jun 9, 2026

Carlos-Marques wants to merge 1 commit into main from devin/1781041202-gemma4-e2b-support
