Fix/pallas tests shared memory#581

Open
AratiGanesh wants to merge 1 commit into rocm-jaxlib-v0.8.0 from fix/pallas-tests-shared-memory

Conversation

@AratiGanesh
Motivation

Pallas GPU tests (gpu_ops_test.py and ops_test.py) were crashing with RESOURCE_EXHAUSTED errors on devices with limited shared memory (e.g. MI250). This PR estimates the required shared memory and ensures that tests either adjust their configuration to fit the available memory or are skipped when they cannot run.

Technical Details

  1. gpu_ops_test.py - Uses logic similar to https://github.com/ROCm/jax/pull/559/files.
    Implements a shared-memory estimation function for fused attention. Tests now automatically reduce block sizes (128x128 → 32x32) when the original configuration exceeds device limits. The backward pass also scales its gradient blocks proportionally to the forward pass.

  2. ops_test.py - Estimates the shared memory and skips the test if the matrices cannot fit.
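The adjust-or-skip logic above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the function names, the tile accounting, and the 64 KiB LDS-per-workgroup limit assumed for MI250 are all illustrative assumptions.

```python
# Hypothetical sketch of the shared-memory check described in the PR.
# The 64 KiB limit and all helper names are assumptions for illustration.

SHARED_MEMORY_LIMIT = 64 * 1024  # assumed LDS available per workgroup on MI250


def estimate_attention_shared_memory(block_q, block_k, head_dim, dtype_bytes=2):
    """Rough estimate for a fused-attention kernel: Q, K, V tiles
    plus the block_q x block_k score matrix, in bytes."""
    q_tile = block_q * head_dim * dtype_bytes
    k_tile = block_k * head_dim * dtype_bytes
    v_tile = block_k * head_dim * dtype_bytes
    scores = block_q * block_k * dtype_bytes
    return q_tile + k_tile + v_tile + scores


def adjust_blocks(block_q, block_k, head_dim, limit=SHARED_MEMORY_LIMIT):
    """Halve the block sizes (e.g. 128x128 -> 64x64 -> 32x32) until the
    estimate fits; return None to signal that the test should be skipped."""
    while block_q >= 32 and block_k >= 32:
        if estimate_attention_shared_memory(block_q, block_k, head_dim) <= limit:
            return block_q, block_k
        block_q //= 2
        block_k //= 2
    return None  # does not fit even at 32x32: caller calls self.skipTest(...)
```

With `head_dim=64` and fp16 tiles, a 128x128 block estimates to 80 KiB and is halved to 64x64 (32 KiB), which fits the assumed limit; a config that still overflows at 32x32 returns `None` and the test is skipped.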

Test Plan

Rerun:
pytest tests/pallas/gpu_ops_test.py -v
pytest jax/tests/pallas/gpu_ops_test.py -v

Test Result

All tests pass.

Submission Checklist

@AratiGanesh AratiGanesh requested a review from a team as a code owner December 15, 2025 15:39
@AratiGanesh AratiGanesh changed the base branch from main to rocm-jaxlib-v0.8.0 December 15, 2025 15:43
@AratiGanesh AratiGanesh requested review from mminutoli and removed request for a team December 15, 2025 15:46
@mminutoli mminutoli left a comment

These are really minor comments, with the intent of making this easier to review from upstream. Please fix them, and let's also create a PR for upstream jax.

Can we take this out of this PR?

I'd like this to be an "atomic" change fixing only one issue that we can send to upstream jax.

block_k_orig = original_blocks["block_k"]
dtype = jnp.float16

adjusted_q, adjusted_k = self._adjust_mha_params_for_shared_memory(

I am OK with this as is, but I am wondering if they'll want this done only for GPUs.

@AratiGanesh AratiGanesh force-pushed the fix/pallas-tests-shared-memory branch 2 times, most recently from 9f43140 to 7b0b014 Compare December 29, 2025 22:09
…ally reduce the block sizes until it fits the memory.

- Fix `ops_test.py`: Add accurate shared memory estimation code and skip test if memory is exceeded
@AratiGanesh AratiGanesh force-pushed the fix/pallas-tests-shared-memory branch from 7b0b014 to cff5d16 Compare December 29, 2025 22:16