feat[QVAC-20402]: add M-RoPE sliding context support#2438

Merged
jpgaribotti merged 38 commits into
tetherto:mainfrom
gianni-cor:cursor/llamacpp-direct-slide-repro-js
Jun 9, 2026

@gianni-cor gianni-cor commented Jun 4, 2026

🎯 What problem does this PR solve?

  • Adds sliding-context support for Qwen3.5/Qwen-VL style M-RoPE/iM-RoPE models in llm-llamacpp.
  • Multimodal turns can consume a different number of KV-cache cells than decoder positions, so the addon needs to track both before deciding when and how to slide.
  • Sliding needs to preserve protected prompt/tool prefixes, image-token recall, cache save/load behavior, and quantized K-cache operation under context pressure.
  • Depends on the Fabric M-RoPE/iM-RoPE K-shift support added in qvac-fabric-llm.cpp#146 ("feat: support M-RoPE K-shift").
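
To illustrate why the two counters diverge, here is a minimal sketch of a tracker that keeps logical positions and KV-cache cells apart. The class and method names (`ContextUsage`, `addText`, `addImage`, `needsSlide`) are hypothetical and not taken from the addon, and the image numbers are an assumed example (a 16x16 patch grid that fills 256 cells but advances the decoder position by only 16).

```javascript
// Hypothetical tracker: names are illustrative, not the addon's real API.
class ContextUsage {
  constructor (ctxSize) {
    this.ctxSize = ctxSize // total KV-cache cells available
    this.nPast = 0 // logical decoder positions consumed
    this.nCells = 0 // physical KV-cache cells occupied
  }

  // Text tokens: one cell and one position per token.
  addText (nTokens) {
    this.nPast += nTokens
    this.nCells += nTokens
  }

  // Image chunk: cells and positions are supplied separately because
  // they can differ for M-RoPE style models.
  addImage (nCells, nPositions) {
    this.nCells += nCells
    this.nPast += nPositions
  }

  // True when appending `incomingCells` more cells would exceed capacity.
  needsSlide (incomingCells) {
    return this.nCells + incomingCells > this.ctxSize
  }
}

const usage = new ContextUsage(4096)
usage.addText(100)
usage.addImage(256, 16) // assumed: 256 cells, 16 positions
usage.addText(50)
console.log(usage.nPast, usage.nCells) // 166 406
```

In this sketch the slide decision is made against `nCells`, since that is what exhausts the cache, while `nPast` is what the next decoded token would use as its position.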

📝 How does it solve it?

  • Introduces explicit context-usage tracking for multimodal contexts, separating logical decoder positions from physical KV-cache cell usage.
  • Extends context sliding with bounded full-wipe and tail-preserving fallback behavior while preserving the configured discard budget.
  • Adds a dedicated ContextSlideFailed path for native memory-operation failures instead of treating them as ordinary context overflow.
  • Stores and restores cache metadata for both logical positions and KV-cache usage so shifted multimodal sessions can be saved and loaded correctly.
  • Pins the local qvac-fabric overlay to the Fabric branch containing M-RoPE/iM-RoPE and quantized K-cache K-shift support.
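
As a rough illustration of sliding around a protected prefix with a discard budget, the sketch below plans which cell range to drop and how far the tail moves. It follows the conventional "keep a prefix, remove a window, shift the rest" pattern and is not the PR's actual algorithm; `planSlide` and its parameters (`nCells`, `nKeep`, `nDiscard`) are hypothetical names, and it works purely in cell units, so a multimodal session would need a separately derived position delta.

```javascript
// Hypothetical planner; parameter names are assumptions for illustration.
// nCells:   KV-cache cells currently in use
// nKeep:    protected prefix cells (e.g. system prompt, tool definitions)
// nDiscard: configured discard budget, in cells
function planSlide ({ nCells, nKeep, nDiscard }) {
  // Only cells after the protected prefix are eligible for removal.
  const slidable = Math.max(0, nCells - nKeep)
  // Never discard more than the budget or more than what is eligible.
  const discard = Math.min(nDiscard, slidable)
  return {
    removeFrom: nKeep, // first cell index to drop
    removeTo: nKeep + discard, // one past the last cell to drop
    shiftBy: 0 - discard, // delta applied to the surviving tail
    cellsAfter: nCells - discard // cells still occupied after the slide
  }
}

const plan = planSlide({ nCells: 4000, nKeep: 200, nDiscard: 1024 })
console.log(plan)
// { removeFrom: 200, removeTo: 1224, shiftBy: -1024, cellsAfter: 2976 }
```

When the prefix already covers everything in the cache, the plan discards nothing, which is the point where a caller would have to fall back to some other strategy.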

🧪 How was it tested?

  • npx standard test/integration/mrope-sliding-context.test.js
  • bare test/integration/mrope-sliding-context.test.js --exit
  • Added integration coverage for Qwen3.5 text sliding, tool-compaction pressure, multimodal image recall after sliding save/load, quantized K-cache sliding, and Llama RoPE baseline sliding.

🔌 API Changes

  • Adds ContextSlideFailed so callers can distinguish failed KV memory operations from context overflow.
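
A caller might branch on the two failure kinds roughly as follows. This is a sketch under assumptions: only the name `ContextSlideFailed` comes from this PR, while `ContextOverflow`, the use of error subclasses, and the recovery labels are hypothetical stand-ins for however the addon actually surfaces and handles these errors.

```javascript
// Hypothetical error classes; only the name ContextSlideFailed is from the PR.
class ContextOverflow extends Error {
  constructor (message) {
    super(message)
    this.name = 'ContextOverflow'
  }
}

class ContextSlideFailed extends Error {
  constructor (message) {
    super(message)
    this.name = 'ContextSlideFailed'
  }
}

// Map a failure to an illustrative recovery strategy.
function recoveryFor (err) {
  // A failed KV memory operation may leave the cache in an unknown state,
  // so this sketch treats it as a reason to start a fresh session.
  if (err instanceof ContextSlideFailed) return 'reset-session'
  // Plain overflow: the input simply does not fit, so shorten it.
  if (err instanceof ContextOverflow) return 'trim-input'
  // Anything else is not a context problem; let it propagate.
  return 'rethrow'
}

console.log(recoveryFor(new ContextSlideFailed('K-shift failed'))) // reset-session
console.log(recoveryFor(new ContextOverflow('prompt too long'))) // trim-input
```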

@gianni-cor gianni-cor requested review from a team as code owners June 4, 2026 09:24
@gianni-cor gianni-cor added the "verified" label (Authorize secrets / label-gate in PR workflows) Jun 4, 2026
gianni-cor and others added 20 commits June 9, 2026 11:35
Co-authored-by: Cursor <cursoragent@cursor.com>
github-actions Bot commented Jun 9, 2026

🧪 C++ Test Coverage Report

Coverage:

📊 Detailed Coverage
Filename                         Regions    Missed Regions     Cover   Functions  Missed Functions  Executed       Lines      Missed Lines     Cover    Branches   Missed Branches     Cover
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
NmtLazyInitializeBackend.cpp          99                20    79.80%          11                 1    90.91%         157                36    77.07%          66                21    68.18%
NmtLazyInitializeBackend.hpp           2                 0   100.00%           1                 0   100.00%           1                 0   100.00%           0                 0         -
TranslationModel.cpp                 296               168    43.24%          28                 8    71.43%         506               213    57.91%         181               122    32.60%
TranslationModel.hpp                   1                 0   100.00%           1                 0   100.00%           1                 0   100.00%           0                 0         -
nmt.cpp                               72                22    69.44%           9                 1    88.89%         137                28    79.56%          44                16    63.64%
nmt.hpp                               51                 4    92.16%          11                 2    81.82%          53                 4    92.45%          28                 0   100.00%
nmt_beam_search.cpp                  116                25    78.45%          10                 3    70.00%         254                32    87.40%          76                19    75.00%
nmt_graph_decoder.cpp                164                78    52.44%          15                 7    53.33%         540               161    70.19%         112                69    38.39%
nmt_graph_encoder.cpp                 54                13    75.93%           3                 0   100.00%         268                33    87.69%          37                16    56.76%
nmt_loader.cpp                       270                67    75.19%          14                 0   100.00%         774                97    87.47%         161                67    58.39%
nmt_state_backend.cpp                253                94    62.85%          21                 0   100.00%         489               128    73.82%         165                87    47.27%
nmt_tokenization.cpp                  88                21    76.14%           8                 0   100.00%         135                36    73.33%          61                26    57.38%
nmt_utils.cpp                        120                89    25.83%           8                 3    62.50%         180               134    25.56%          78                63    19.23%
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                               1586               601    62.11%         140                25    82.14%        3495               902    74.19%        1009               506    49.85%

@jpgaribotti

/review
