Fix MPS crash on Apple Silicon with grouped-query attention#39

Merged
ssrajadh merged 1 commit into ssrajadh:master from skg-perch:fix/mps-gqa-attention-crash
Apr 12, 2026
Conversation

@skg-perch
Contributor

Summary

  • MPS SDPA crashes with "incompatible dimensions" when query and KV head counts differ in grouped-query attention (affects both Qwen3-VL 8B and 2B on Apple Silicon)
  • Falls back to attn_implementation="eager" on MPS to avoid the buggy SDPA kernel
  • Disables a spurious transformers >=5.3 assertion that incorrectly reports video feature/token count mismatches on MPS (both counts are identical but torch._check fails)
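The fallback described above can be sketched as a small backend check (the helper name and the commented model id are illustrative, not taken from this PR's diff):

```python
import torch

def pick_attn_implementation() -> str:
    """Choose the attention backend to pass to from_pretrained()."""
    # MPS SDPA crashes with "incompatible dimensions" when the query and
    # key/value head counts differ (grouped-query attention), so fall back
    # to the eager implementation on Apple Silicon.
    if torch.backends.mps.is_available():
        return "eager"
    return "sdpa"

# Illustrative usage (model id is an assumption):
# model = AutoModelForCausalLM.from_pretrained(
#     "Qwen/Qwen3-VL-2B-Instruct",
#     attn_implementation=pick_attn_implementation(),
# )
```

On non-MPS machines this leaves the faster SDPA path untouched.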

Test plan

  • All 150 existing tests pass
  • Tested end-to-end: indexed a 78-min video with --backend local --model qwen2b on Apple Silicon (MPS), ~5s/chunk
  • Search returns correct results after indexing

🤖 Generated with Claude Code

MPS SDPA crashes with "incompatible dimensions" when query and KV head
counts differ (GQA), which affects both Qwen3-VL 8B and 2B models.
Fall back to eager attention on MPS. Also disable a spurious
transformers >=5.3 assertion that incorrectly reports video feature/token
count mismatches on MPS.
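For context, this is roughly what the eager path does that the MPS SDPA kernel fails at: grouped-query attention shares each key/value head across several query heads, so the KV tensors must be repeated before the dot product. A minimal sketch in plain PyTorch (shapes and function name are illustrative):

```python
import torch

def eager_gqa_attention(q, k, v):
    """Eager grouped-query attention with explicit KV-head repetition.

    q: (batch, n_q_heads, seq, dim); k, v: (batch, n_kv_heads, seq, dim).
    The eager path repeats KV heads so shapes line up; the MPS SDPA
    kernel mishandles the mismatched head counts and raises
    "incompatible dimensions" instead.
    """
    n_rep = q.shape[1] // k.shape[1]
    k = k.repeat_interleave(n_rep, dim=1)
    v = v.repeat_interleave(n_rep, dim=1)
    scores = (q @ k.transpose(-2, -1)) / (q.shape[-1] ** 0.5)
    return torch.softmax(scores, dim=-1) @ v

# 8 query heads sharing 2 KV heads, as in grouped-query attention:
q = torch.randn(1, 8, 16, 64)
k = torch.randn(1, 2, 16, 64)
v = torch.randn(1, 2, 16, 64)
out = eager_gqa_attention(q, k, v)  # shape: (1, 8, 16, 64)
```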

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-authored-by: Sulaiman Ghori <sulaiman.ghori@outlook.com>
@ssrajadh ssrajadh merged commit 1346450 into ssrajadh:master Apr 12, 2026
9 checks passed