[fix](Qwen3-30b-A3B): fix nonsense output by correcting RoPE write-back#31

Open
yyj6666667 wants to merge 1 commit into kvcache-ai:main from yyj6666667:fix/qwen3-30b

Conversation


@yyj6666667 yyj6666667 commented Apr 15, 2026

Motivation

  • The fused_rope_store kernel wrote rotated k only to k_cache, leaving the caller's k tensor unrotated. Prefill attention ran with rotated q but unrotated k on the first forward of each request, producing gibberish output on Qwen3-30B-A3B.

Modifications

  • In the fused_rope_store kernel, always store the rotated tensor back in place (q → q, k → k), in addition to writing
    rotated k into k_cache.

Tests

  • Verified on Qwen3-30B-A3B: prefill outputs are normal after the fix; the prior gibberish reproducer no longer triggers.
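The corrected write-back semantics can be sketched in plain Python with PyTorch (illustrative only; the actual path is the CUDA `fused_rope_store` kernel, and the function and helper names here are hypothetical):

```python
import torch

def rotate_half_rope(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    # Reference rotate-half RoPE over the last dimension: split the rope
    # dim into two halves and apply the standard cos/sin rotation.
    half = x.shape[-1] // 2
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x2 * cos + x1 * sin], dim=-1)

def fused_rope_store_sketch(q, k, k_cache, cos, sin, loc):
    # After the fix: rotated values are stored back into q and k themselves,
    # so prefill attention sees both sides rotated on the first forward...
    q.copy_(rotate_half_rope(q, cos, sin))
    k.copy_(rotate_half_rope(k, cos, sin))
    # ...and rotated k is additionally written into k_cache for decode.
    k_cache[loc].copy_(k)
```

Before the fix, only the `k_cache[loc]` write carried the rotated k, leaving the caller's `k` tensor unrotated.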

- Restore in-place storage for q/k and ensure k_cache is updated with the rotated values.

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request modifies the fused_rope_store_kernel to ensure that rotated values are consistently written back to the input tensor for both query and key inputs, while also updating the k_cache. The review feedback identifies a critical missing update in the Python wrapper's mutates_args for the k tensor, which is necessary for correct dependency tracking and graph capture. Additionally, there is a suggestion to reuse existing pointers for the RoPE dimension offsets to improve code readability and maintain consistency with other kernel implementations.

Comment on lines +207 to +209
      store_as<InputStorage>(input, input_vec_x, lane_id);
      const auto input_y_out = pointer::offset(input, (kRopeDim / 2) * sizeof(DType));
      store_as<InputStorage>(input_y_out, input_vec_y, lane_id);

medium

You can reuse the existing input_x and input_y pointers (defined at lines 185-186) instead of recalculating the offset for the second half of the RoPE dimension. This improves readability and maintains consistency with the fused_rope_kernel implementation in the same file.

      store_as<InputStorage>(input_x, input_vec_x, lane_id);
      store_as<InputStorage>(input_y, input_vec_y, lane_id);


Copilot AI left a comment


Pull request overview

Fixes incorrect RoPE application in the fused RoPE+KV-cache store path for Qwen3-30B-A3B by ensuring rotated keys are written back to the caller’s k tensor (in addition to k_cache), preventing prefill attention from mixing rotated q with unrotated k.

Changes:

  • Update fused_rope_store_kernel to always write rotated outputs back in-place to the input tensor for both q and k.
  • Preserve existing behavior of also writing rotated k into k_cache for decode.


Comment on lines +204 to +212
      // Always write rotated values back to `input` (q itself when
      // load_q==true, k itself when load_q==false). When processing k, also
      // write to k_cache for future decode.
      store_as<InputStorage>(input, input_vec_x, lane_id);
      const auto input_y_out = pointer::offset(input, (kRopeDim / 2) * sizeof(DType));
      store_as<InputStorage>(input_y_out, input_vec_y, lane_id);
      if (!load_q) {
        const auto k_out = pointer::offset(k_cache, loc * cache_stride_bytes, head_id * head_stride_bytes);
        store_as<InputStorage>(k_out, input_vec_x, lane_id);

Copilot AI Apr 15, 2026


This kernel now writes rotated values back into the k input buffer when load_q == false. The Python custom-op wrapper for apply_rope_inplace_with_kvcache currently does not declare k in mutates_args, which can lead to incorrect behavior under torch.compile/graph captures because the op will be treated as not mutating k. Please update the wrapper registration to include k as a mutated argument to match this new in-place writeback.

Comment on lines +230 to +235
      // Always write rotated value back to `input`; also to k_cache when processing k.
      store_as<InputStorage>(input, input_vec, lane_id);
      if (!load_q) {
        const auto k_out = pointer::offset(k_cache, loc * cache_stride_bytes, head_id * head_stride_bytes);
        store_as<InputStorage>(k_out, input_vec, lane_id);
      }

Copilot AI Apr 15, 2026


The existing fused RoPE+KV-cache tests validate q and k_cache, but they don’t assert that the in-place k tensor is rotated. Since this change makes k writeback part of the contract, please extend the test to also compare the post-call k against the reference rotated k to prevent regressions.
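The extra assertion the reviewer asks for could look like this (a sketch; `ref_rope` is a hypothetical rotate-half reference, and the real test's naming and tolerances may differ):

```python
import torch

def ref_rope(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    # Rotate-half reference used only for checking: the two halves of the
    # rope dim are rotated by the per-position cos/sin factors.
    half = x.shape[-1] // 2
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x2 * cos + x1 * sin], dim=-1)

def check_k_writeback(k_after, k_before, cos, sin):
    # New assertion: after the fused call, the caller's k tensor itself must
    # hold the rotated values, not just k_cache.
    torch.testing.assert_close(
        k_after, ref_rope(k_before, cos, sin), rtol=1e-3, atol=1e-3
    )
```

In the existing test this would run right after the fused op, with `k_before` cloned before the call.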

