Skip to content

Fix vector size check for strided tensors with unit strides on non-axis dims#1312

Merged
louisfd merged 1 commit intotracel-ai:mainfrom
antimora:fix-vectorize-strided-unfold
Apr 24, 2026
Merged

Fix vector size check for strided tensors with unit strides on non-axis dims#1312
louisfd merged 1 commit intotracel-ai:mainfrom
antimora:fix-vectorize-strided-unfold

Conversation

@antimora
Copy link
Copy Markdown
Collaborator

Summary

try_tensor_vector_size_parallel picked next_stride by filtering strides > 1 and taking the min. The > 1 filter was meant to exclude the axis's own unit stride, but it also silently excluded legitimate non-axis dims with stride 1, letting invalid vector sizes slip through.

Changed to filter by i != axis && s != 0: skip the axis and broadcast dims, keep everything else (including unit-strided non-axis dims).

Reproducer

Burn's STFT test failed because unfold(dim, n_fft=4, step=1) on a [1, 12] contiguous tensor produces shape [1, 9, 4] with strides [12, 1, 1]. Both the frame dim and the sample dim have stride 1. When that view is materialized via copy_gpu_ref:

  1. tensor_vector_size_parallel returns vector_size = 4 (12 is divisible by 4; the problematic stride-1 on the frame dim is filtered out as it's not > 1).
  2. The copy kernel indexes the source via index_offset_contiguous_fastdivmod, which computes an element offset then divides by vector_size.
  3. For output vector 1 the source element offset is 1 (= frame_coord * stride[frame] = 1 * 1), which rounds down to source vector 0 after the divide. The kernel reads source vector 0 instead of the unaligned data starting at element 1, so output vector 1 gets the same data as vector 0.

Result: frames duplicated in n_fft-sized chunks — the first 4 output frames all contain the first 4 elements of the signal, the next 4 contain elements 4..8, etc. Visible as stft_istft_roundtrip_hann_window failing in Burn (tracel-ai/burn at rev d7bed26, any backend routed through cubecl).

Fix

The correct invariant: for every dim other than the axis, changing that coord must shift the source offset by a multiple of vector_size (or by zero, for broadcasts). The old filter s > 1 silently dropped stride-1 non-axis dims, violating this.

New filter: i != axis && s != 0. Unit-size dims with stride 1 are now included, which can cause false negatives (vectorization disabled) but never incorrect output - acceptable since the function's contract already allows being "overly strict."

Regression tests

Added unit tests in cubecl-core/src/lib.rs covering:

  • Contiguous tensor: max vector size accepted
  • Unfold with step=1: falls back to vec_size 1 (the bug above)
  • Unfold with step=2: accepts vec_size 2, rejects 4
  • Broadcast dim (stride 0): ignored, not counted as a blocker
  • Axis stride != 1: returns StrideMismatch

Each test also exercises the path that produced incorrect output before the fix; verified the step=1 test fails on the old code (returns 4 instead of 1) and passes on the new code.

Test plan

  • cargo test --lib tests:: in cubecl-core passes
  • Burn's stft_istft_roundtrip_hann_window passes with the original unfold path (no workaround needed)
  • All 21 Burn stft tests pass with the cubecl fix alone

…is dims

try_tensor_vector_size_parallel computed next_stride as the smallest stride
> 1, which skipped non-axis dims with stride 1 (produced e.g. by unfold
with step=1). That caused vectorized copies to return wrong data:
index_offset_contiguous_fastdivmod would compute an unaligned source
offset (stride 1 * frame_coord) and divide by the vector size, reading
the wrong source vector and duplicating frames.

Fix: use the smallest non-zero stride outside the axis. Broadcast dims
(stride 0) are still ignored; unit-size dims with stride 1 may now
disable vectorization (false negative, not incorrect output).

Reproduces as Burn's stft_istft_roundtrip_hann_window failure where
unfold(1, 4, 1) on a len-12 signal produced frames duplicated in
4-element chunks instead of sliding by one sample.
Copy link
Copy Markdown
Member

@louisfd louisfd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@louisfd louisfd merged commit 21c9705 into tracel-ai:main Apr 24, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants