
Conversation


@aarushisingh04 commented Jan 7, 2026

What does this PR do?

This PR adds support for preferred_element_type and out_sharding to core NNX
Attention and RNN modules, aligning them with existing NNX layers such as
Linear and Conv.

The change improves API consistency and gives users fine-grained control over
numerical precision and distributed sharding at the module level, without
changing any default behavior.


Motivation

In NNX, low-level layers such as Linear and Conv already expose
preferred_element_type and out_sharding, but higher-level modules built on
top of them, such as MultiHeadAttention and the RNN cells, did not forward
or expose these parameters.

This made the following difficult:

  • Controlling numerical precision for attention and recurrent modules
  • Applying sharding constraints to module outputs in distributed settings
  • Writing consistent code when mixing Linear / Conv with Attention or RNNs

This PR closes that gap by plumbing the existing parameters through these
modules in a fully backward-compatible way.


Changes Made

1. Attention module enhancements (attention.py)

Updated:

  • dot_product_attention
  • dot_product_attention_weights
  • MultiHeadAttention

Changes:

  • Added preferred_element_type to function signatures and forwarded it to
    internal jnp.einsum calls (see the sketch after this list)
  • Added out_sharding to MultiHeadAttention.__call__ to allow sharding of the
    output projection
  • Updated docstrings to document the new parameters
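
To make the mechanism concrete: jnp.einsum already accepts
preferred_element_type (it is forwarded to the underlying dot_general), and
that is the hook these changes plumb through. A minimal standalone sketch,
not the PR diff itself; the shapes and einsum spec mirror the
[batch, length, num_heads, head_dim] layout used by NNX attention:

  import jax.numpy as jnp

  # bfloat16 activations; ask the contraction to accumulate and emit float32.
  query = jnp.ones((2, 8, 4, 16), dtype=jnp.bfloat16)
  key = jnp.ones((2, 8, 4, 16), dtype=jnp.bfloat16)

  attn_weights = jnp.einsum(
      '...qhd,...khd->...hqk',
      query,
      key,
      preferred_element_type=jnp.float32,
  )
  assert attn_weights.dtype == jnp.float32  # scores of shape (2, 4, 8, 8)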

2. RNN module enhancements (recurrent.py)

Updated:

  • LSTMCell
  • OptimizedLSTMCell
  • SimpleCell
  • GRUCell

Changes:

  • Added preferred_element_type to __init__ and forwarded it to internal
    Linear layers (usage sketch after this list)
  • Added out_sharding to __call__ for all cells
  • Standardized and completed docstrings (Args / Returns)
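
A rough usage sketch of the intent for the cells. This is hedged: the
parameter placement follows the description above and may differ from the
final diff; only the new preferred_element_type keyword is assumed here,
everything else is the standard NNX cell API:

  import jax.numpy as jnp
  from flax import nnx

  # Assumed per this PR: preferred_element_type is accepted at construction
  # and forwarded to the cell's internal nnx.Linear layers; out_sharding
  # would be accepted at call time, e.g. cell(carry, x, out_sharding=...).
  cell = nnx.LSTMCell(
      in_features=16,
      hidden_features=32,
      preferred_element_type=jnp.float32,  # new keyword from this PR
      rngs=nnx.Rngs(0),
  )
  x = jnp.ones((8, 16), dtype=jnp.bfloat16)
  carry = cell.initialize_carry(x.shape)
  carry, y = cell(carry, x)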

Backward Compatibility

  • No existing behavior is changed
  • All new parameters are optional
  • Default numerical precision and sharding remain unchanged unless explicitly
    specified by the user

Tests Added

A new targeted test suite was added:

  • tests/nnx/technical_support_test.py

This suite verifies:

  • preferred_element_type propagation for Attention and RNN modules
  • out_sharding exposure and forwarding in MultiHeadAttention and RNN cells

Specific tests include:

  • test_mha_preferred_element_type
  • test_mha_out_sharding_signature
  • test_lstm_preferred_element_type
  • test_gru_preferred_element_type
  • test_simple_cell_out_sharding_signature

All tests passed.
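
For illustration, a check of this kind might look roughly like the following.
This is a sketch, not the actual contents of
tests/nnx/technical_support_test.py; the preferred_element_type keyword on
dot_product_attention is exactly the new behavior under review here:

  import jax.numpy as jnp
  from flax import nnx

  def test_attention_preferred_element_type_sketch():
    # Shapes: [batch, length, num_heads, head_dim].
    q = jnp.ones((1, 4, 2, 8), dtype=jnp.bfloat16)
    k = jnp.ones((1, 4, 2, 8), dtype=jnp.bfloat16)
    v = jnp.ones((1, 4, 2, 8), dtype=jnp.bfloat16)
    out = nnx.dot_product_attention(
        q, k, v, preferred_element_type=jnp.float32
    )
    assert out.dtype == jnp.float32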


Regression Testing

Existing test suites were run to ensure no regressions:

  • tests/nnx/nn/attention_test.py
  • tests/nnx/nn/recurrent_test.py

All tests passed.


Verification Command

$env:PYTHONPATH="."; pytest -v `
  tests/nnx/technical_support_test.py `
  tests/nnx/nn/attention_test.py `
  tests/nnx/nn/recurrent_test.py

Checklist

  • This PR fixes a minor issue (e.g.: typo or small bug) or improves the docs (you can dismiss the other checks if that's the case).
  • This change is discussed in a Github issue/discussion (please add a link).
  • The documentation and docstrings adhere to the documentation guidelines.
  • This change includes necessary high-coverage tests. (No quality testing = no merge!)


google-cla bot commented Jan 7, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@aarushisingh04 (Author)

hey @samanklesaria
could you check this PR out whenever possible?

@samanklesaria (Collaborator)

I'm a bit skeptical of adding out_sharding to MultiHeadAttention. As I see it, out_sharding is used to resolve ambiguity when the explicitly sharded tensors you pass to an operation are sharded inconsistently along some dimension. The MultiHeadAttention layer creates ("outputs") many different tensors when called: Q, K, V, the attention weights, and the final output. It seems like the out_sharding argument you add to MultiHeadAttention is used only when projecting the final output, so it can't be used to resolve sharding inconsistencies in any of the intermediate steps. Rather than offering the user an opportunity to resolve inconsistencies in only one of the many operations implied by MultiHeadAttention, I think it would be cleaner to keep the current functionality and not support it for any of them. If the user wants the output of MultiHeadAttention sharded in a specific way, they can just reshard it manually afterwards.
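
For reference, the manual-resharding alternative suggested here could look
roughly like this (a sketch; the mesh, axis names, and layer sizes are
illustrative, not taken from the PR):

  import jax
  import jax.numpy as jnp
  import numpy as np
  from flax import nnx
  from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

  mesh = Mesh(np.array(jax.devices()), axis_names=('data',))
  mha = nnx.MultiHeadAttention(
      num_heads=4, in_features=16, decode=False, rngs=nnx.Rngs(0)
  )

  @nnx.jit
  def forward(model, x):
    y = model(x)
    # Constrain only the final output's sharding, outside the layer itself.
    return jax.lax.with_sharding_constraint(
        y, NamedSharding(mesh, P('data', None, None))
    )

  out = forward(mha, jnp.ones((8, 10, 16)))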

@aarushisingh04 (Author)

@samanklesaria

thanks for the feedback, I've made the following updates:

  1. attention.py keeps preferred_element_type; out_sharding was not added
  2. recurrent.py has a duplicate import removed (it keeps both
     preferred_element_type and out_sharding)
  3. the tests were moved to attention_test.py and recurrent_test.py

@aarushisingh04 aarushisingh04 changed the title feat(nnx): add preferred_element_type and out_sharding support to Attention and RNN modules feat(nnx): add preferred_element_type to attention.py and recurrent.py, out_sharding to recurrent.py Jan 13, 2026

@samanklesaria (Collaborator) left a comment


Looks good to me.

@aarushisingh04 (Author)

hey @samanklesaria, just checking in: is this okay to merge? thanks!

@samanklesaria (Collaborator)

@aarushisingh04 It's in the merge queue. But PRs have to go through a separate testing process on the Google side before they get merged in, so it can take a bit.

Code excerpt under review (attention.py):

  )
  if len(query_shape) > 4:
    out = jnp.reshape(out, query_shape)
  return out

@chapman20j (Collaborator) commented Jan 22, 2026


Can you incorporate preferred_element_type here as well?
