Fix MPS crash on Apple Silicon with grouped-query attention#39

Merged
ssrajadh merged 1 commit into ssrajadh:master from skg-perch:fix/mps-gqa-attention-crash
Apr 12, 2026
Conversation

@skg-perch
Contributor

Summary

  • MPS SDPA crashes with "incompatible dimensions" when query and KV head counts differ in grouped-query attention (affects both Qwen3-VL 8B and 2B on Apple Silicon)
  • Falls back to attn_implementation="eager" on MPS to avoid the buggy SDPA kernel
  • Disables a spurious transformers >=5.3 assertion that incorrectly reports video feature/token count mismatches on MPS (both counts are identical but torch._check fails)
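The fallback described above can be sketched as a small backend check (the helper name and the commented model id are illustrative, not taken from this PR's diff):

```python
import torch

def pick_attn_implementation() -> str:
    """Choose the attention backend to pass to from_pretrained()."""
    # MPS SDPA crashes with "incompatible dimensions" when the query and
    # key/value head counts differ (grouped-query attention), so fall back
    # to the eager implementation on Apple Silicon.
    if torch.backends.mps.is_available():
        return "eager"
    return "sdpa"

# Illustrative usage (model id is an assumption):
# model = AutoModelForCausalLM.from_pretrained(
#     "Qwen/Qwen3-VL-2B-Instruct",
#     attn_implementation=pick_attn_implementation(),
# )
```

On non-MPS machines this leaves the faster SDPA path untouched.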

Test plan

  • All 150 existing tests pass
  • Tested end-to-end: indexed a 78-min video with --backend local --model qwen2b on Apple Silicon (MPS), ~5s/chunk
  • Search returns correct results after indexing

🤖 Generated with Claude Code

MPS SDPA crashes with "incompatible dimensions" when query and KV head
counts differ (GQA), which affects both Qwen3-VL 8B and 2B models.
Fall back to eager attention on MPS. Also disable a spurious
transformers >=5.3 assertion that incorrectly reports video feature/token
count mismatches on MPS.
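For context, this is roughly what the eager path does that the MPS SDPA kernel fails at: grouped-query attention shares each key/value head across several query heads, so the KV tensors must be repeated before the dot product. A minimal sketch in plain PyTorch (shapes and function name are illustrative):

```python
import torch

def eager_gqa_attention(q, k, v):
    """Eager grouped-query attention with explicit KV-head repetition.

    q: (batch, n_q_heads, seq, dim); k, v: (batch, n_kv_heads, seq, dim).
    The eager path repeats KV heads so shapes line up; the MPS SDPA
    kernel mishandles the mismatched head counts and raises
    "incompatible dimensions" instead.
    """
    n_rep = q.shape[1] // k.shape[1]
    k = k.repeat_interleave(n_rep, dim=1)
    v = v.repeat_interleave(n_rep, dim=1)
    scores = (q @ k.transpose(-2, -1)) / (q.shape[-1] ** 0.5)
    return torch.softmax(scores, dim=-1) @ v

# 8 query heads sharing 2 KV heads, as in grouped-query attention:
q = torch.randn(1, 8, 16, 64)
k = torch.randn(1, 2, 16, 64)
v = torch.randn(1, 2, 16, 64)
out = eager_gqa_attention(q, k, v)  # shape: (1, 8, 16, 64)
```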

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-authored-by: Sulaiman Ghori <sulaiman.ghori@outlook.com>
@ssrajadh ssrajadh merged commit 1346450 into ssrajadh:master Apr 12, 2026
9 checks passed