
[models]: add Gemma 4 E2B text support #3

Open
Carlos-Marques wants to merge 1 commit into main from devin/1781041202-gemma4-e2b-support



@Carlos-Marques


Description

Adds a Gemma 4 text-path model implementation registered as gemma4, using the upstream vLLM/HF Gemma4 text architecture as the reference and NxDI primitives for inference:

Gemma4InferenceConfig
  layer_types[] -> per-layer sliding/full attention configs
  full_attention -> global_head_dim, no sliding window
  sliding_attention -> head_dim, sliding_window
  hidden_size_per_layer_input > 0 -> packed PLE table + projection pipeline
  num_kv_shared_layers > 0 -> checkpoint K/V projection materialization for shared layers
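The layer_types mapping above can be sketched in plain Python to show how one config expands into per-layer settings. This is a minimal, hypothetical illustration, not the PR's code. LayerAttentionSpec, build_layer_specs, and unpack_ple_row are made-up names. The sketch assumes three things: the KV-shared layers are the last num_kv_shared_layers entries, "double-wide" means 2x intermediate_size on those layers, and the packed PLE table stores each token's per-layer slices back to back in layer order.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class LayerAttentionSpec:
    """Hypothetical per-layer settings derived from layer_types (illustrative only)."""

    layer_type: str
    head_dim: int
    sliding_window: Optional[int]  # None means full (global) attention
    kv_shared: bool  # True for the KV-shared tail layers
    intermediate_size: int  # MLP width for this layer


def build_layer_specs(
    layer_types: Sequence[str],
    head_dim: int,
    global_head_dim: int,
    sliding_window: int,
    num_kv_shared_layers: int,
    intermediate_size: int,
) -> List[LayerAttentionSpec]:
    # Assumption: the KV-shared layers are the last num_kv_shared_layers layers.
    first_shared = len(layer_types) - num_kv_shared_layers
    specs: List[LayerAttentionSpec] = []
    for idx, layer_type in enumerate(layer_types):
        if layer_type == "full_attention":
            dim, window = global_head_dim, None
        elif layer_type == "sliding_attention":
            dim, window = head_dim, sliding_window
        else:
            raise ValueError(f"unknown layer type: {layer_type!r}")
        shared = num_kv_shared_layers > 0 and idx >= first_shared
        specs.append(
            LayerAttentionSpec(
                layer_type=layer_type,
                head_dim=dim,
                sliding_window=window,
                kv_shared=shared,
                # Assumption: "double-wide" means 2x intermediate_size on shared layers.
                intermediate_size=2 * intermediate_size if shared else intermediate_size,
            )
        )
    return specs


def unpack_ple_row(
    row: Sequence[float], num_layers: int, hidden_size_per_layer_input: int
) -> List[List[float]]:
    """Split one token's packed PLE row into one slice per decoder layer.

    Assumption: slices are stored back to back in layer order, so the packed
    width is num_layers * hidden_size_per_layer_input.
    """
    width = hidden_size_per_layer_input
    if len(row) != num_layers * width:
        raise ValueError(f"expected {num_layers * width} values, got {len(row)}")
    return [list(row[i * width : (i + 1) * width]) for i in range(num_layers)]
```

With the small illustrative values used in the checks (four layers, num_kv_shared_layers=2), only the full_attention layer switches to global_head_dim with no sliding window, and the last two layers come out KV-shared with a doubled MLP width. Real Gemma 4 E2B dimensions are not shown here.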

The implementation includes Gemma4 RMSNorm, PLE (per-layer embedding) gates/projections, double-wide MLP sizing for the KV-shared tail layers, final-logit softcapping in the LM head, and state-dict conversion for language_model.model.* / model.language_model.* checkpoints.
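Two of the pieces listed above, state-dict conversion and final-logit softcapping, can be illustrated with small framework-free helpers. This is a hedged sketch, not the PR's implementation. convert_text_state_dict and softcap_logits are hypothetical names. The sketch assumes the conversion simply strips either language-model prefix, so model.language_model.layers.0.mlp.up_proj.weight becomes layers.0.mlp.up_proj.weight. It also assumes that keys outside those prefixes, such as vision/audio encoder weights, are dropped because the multimodal path is out of scope. Softcapping uses the Gemma-style form cap * tanh(logit / cap).

```python
import math
from typing import Dict, Iterable, List, TypeVar

T = TypeVar("T")

# The two checkpoint layouts named in the PR description.
LANGUAGE_MODEL_PREFIXES = ("language_model.model.", "model.language_model.")


def convert_text_state_dict(state_dict: Dict[str, T]) -> Dict[str, T]:
    """Strip the language-model prefix from text-stack keys (hypothetical helper).

    Assumption: keys under neither prefix (e.g. vision/audio encoder weights)
    are dropped, since only the text path is in scope.
    """
    converted: Dict[str, T] = {}
    for key, value in state_dict.items():
        for prefix in LANGUAGE_MODEL_PREFIXES:
            if key.startswith(prefix):
                new_key = key[len(prefix):]
                # Guard against a checkpoint that mixes both layouts for the same weight.
                if new_key in converted:
                    raise KeyError(f"duplicate key after conversion: {new_key}")
                converted[new_key] = value
                break
    return converted


def softcap_logits(logits: Iterable[float], cap: float) -> List[float]:
    """Final-logit softcapping: cap * tanh(x / cap) keeps each logit within [-cap, cap]."""
    if cap <= 0:
        raise ValueError("cap must be positive")
    return [cap * math.tanh(x / cap) for x in logits]
```

In an NxDI model the same tanh softcap would be applied to the LM-head output tensor; the list version only shows the arithmetic. Small logits pass through almost unchanged, while very large ones saturate near the cap.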

Model Information

Model Name: Gemma 4 E2B / Gemma 4 E2B-it

Model Architecture: Decoder-only Gemma4 text transformer with mixed sliding/full attention

Purpose: Text generation

Checklist

Required Components

  • Accuracy Test (ex. test/integration/test_model.py)
    • Added CPU unit coverage for Gemma4 config and state-dict behavior; a Neuron compilation/accuracy test is not included in this PR.
  • README.md with usage/compatibility/testing sections
    • Not included; this follows the existing in-tree model structure rather than the contrib folder template.
  • Source Code (src/)
    • src/neuronx_distributed_inference/models/gemma4/modeling_gemma4.py
    • MODEL_TYPES["gemma4"] registration

Optional Components

  • Unit Tests (CPU or Neuron-based)
    • test/unit/models/gemma4/test_modeling_gemma4.py

Folder Structure

This PR adds an in-tree model under:

src/neuronx_distributed_inference/models/gemma4/
test/unit/models/gemma4/

Testing

How did you test this change?

Local static/syntax checks only; the local environment does not have the repo's test dependencies (torch, pytest) installed, so the unit tests were not run.

Test Results:

python -m py_compile src/neuronx_distributed_inference/models/gemma4/modeling_gemma4.py test/unit/models/gemma4/test_modeling_gemma4.py
ruff check --line-length=120 --ignore=F401,E203 src/neuronx_distributed_inference/models/gemma4/modeling_gemma4.py test/unit/models/gemma4/test_modeling_gemma4.py src/neuronx_distributed_inference/utils/constants.py
pre-commit run --files src/neuronx_distributed_inference/models/gemma4/__init__.py src/neuronx_distributed_inference/models/gemma4/modeling_gemma4.py src/neuronx_distributed_inference/utils/constants.py test/unit/models/gemma4/test_modeling_gemma4.py

All commands above passed locally.

Compatibility

Tested with:

  • Neuron SDK Version(s): Not run on Neuron hardware in this session
  • Instance Type(s): Not run on Neuron hardware in this session
  • PyTorch Version: Not available in local environment
  • Python Version: Not recorded (the local interpreter used for py_compile)

Additional Information

The implementation targets the Gemma4 text path only; multimodal audio/vision encoders are intentionally out of scope. Upstream references analyzed: vLLM Gemma4ForCausalLM, Gemma4Model, and Gemma4Attention, plus the HF Gemma4Text* implementations.

Related Issues

N/A

vLLM Integration

  • This model/feature is intended for use with vLLM
  • Documentation includes vLLM registration instructions

By submitting this PR, I confirm that:

  • I have read and followed the contributing guidelines to the extent applicable for this repository snapshot
  • This is a community contribution and may have limited testing compared to officially-supported models
  • The code follows best practices and is well-documented
  • All required components listed above are included

Link to Devin session: https://app.devin.ai/sessions/8501976bc5364801a8d6d8b8b768a547
Requested by: @Carlos-Marques

Co-Authored-By: carlos <carlosmarques.personal@gmail.com>
@devin-ai-integration


🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

