TurboQuant TQ4 KV cache compression for Qwen 3.5 MoE #472

Triggered via pull request April 4, 2026 23:54
Status Success
Total duration 1h 6m 58s
Artifacts 2

cuda-perf.yml

on: pull_request
set-parameters (12s)
Matrix: export-models
Matrix: benchmark-cuda
upload-benchmark-results (38s)

Annotations

4 warnings
set-parameters
Node.js 20 actions are deprecated. The following actions are running on Node.js 20 and may not work as expected: actions/checkout@v3, actions/setup-python@v4. Actions will be forced to run with Node.js 24 by default starting June 2nd, 2026. Node.js 20 will be removed from the runner on September 16th, 2026. Please check if updated versions of these actions are available that support Node.js 24. To opt into Node.js 24 now, set the FORCE_JAVASCRIPT_ACTIONS_TO_NODE24=true environment variable on the runner or in your workflow file. Once Node.js 24 becomes the default, you can temporarily opt out by setting ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION=true. For more information see: https://github.blog/changelog/2025-09-19-deprecation-of-node-20-on-github-actions-runners/
export-models (google/gemma-3-4b-it, quantized-int4-tile-packed, google_gemma-3-4b-it, 50) / linux-job
Node.js 20 actions are deprecated. The following actions are running on Node.js 20 and may not work as expected: ./test-infra/.github/actions/setup-ssh, actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683, actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02, nick-fields/retry@3e91a01664abd3c5cd539100d10d33b9c5b68482, pmeier/pytest-results-action@a2c1430e2bddadbad9f49a6f9b879f062c6b19b1. Actions will be forced to run with Node.js 24 by default starting June 2nd, 2026. Node.js 20 will be removed from the runner on September 16th, 2026. Please check if updated versions of these actions are available that support Node.js 24. To opt into Node.js 24 now, set the FORCE_JAVASCRIPT_ACTIONS_TO_NODE24=true environment variable on the runner or in your workflow file. Once Node.js 24 becomes the default, you can temporarily opt out by setting ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION=true. For more information see: https://github.blog/changelog/2025-09-19-deprecation-of-node-20-on-github-actions-runners/
benchmark-cuda (google/gemma-3-4b-it, quantized-int4-tile-packed, google_gemma-3-4b-it, 50) / linux-job
Node.js 20 actions are deprecated. The following actions are running on Node.js 20 and may not work as expected: ./test-infra/.github/actions/setup-ssh, actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683, actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093, actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02, nick-fields/retry@3e91a01664abd3c5cd539100d10d33b9c5b68482, pmeier/pytest-results-action@a2c1430e2bddadbad9f49a6f9b879f062c6b19b1. Actions will be forced to run with Node.js 24 by default starting June 2nd, 2026. Node.js 20 will be removed from the runner on September 16th, 2026. Please check if updated versions of these actions are available that support Node.js 24. To opt into Node.js 24 now, set the FORCE_JAVASCRIPT_ACTIONS_TO_NODE24=true environment variable on the runner or in your workflow file. Once Node.js 24 becomes the default, you can temporarily opt out by setting ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION=true. For more information see: https://github.blog/changelog/2025-09-19-deprecation-of-node-20-on-github-actions-runners/
upload-benchmark-results
Node.js 20 actions are deprecated. The following actions are running on Node.js 20 and may not work as expected: actions/checkout@v3, actions/download-artifact@v4, actions/setup-python@v4, astral-sh/setup-uv@f0ec1fc3b38f5e7cd731bb6ce540c5af426746bb, aws-actions/configure-aws-credentials@v4. Actions will be forced to run with Node.js 24 by default starting June 2nd, 2026. Node.js 20 will be removed from the runner on September 16th, 2026. Please check if updated versions of these actions are available that support Node.js 24. To opt into Node.js 24 now, set the FORCE_JAVASCRIPT_ACTIONS_TO_NODE24=true environment variable on the runner or in your workflow file. Once Node.js 24 becomes the default, you can temporarily opt out by setting ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION=true. For more information see: https://github.blog/changelog/2025-09-19-deprecation-of-node-20-on-github-actions-runners/
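The warnings above name an opt-in environment variable. As a minimal sketch, assuming it is set at the top level of a workflow file such as cuda-perf.yml (the placement and the surrounding keys are illustrative and not taken from this repository), the opt-in could look like:

```yaml
# Hypothetical excerpt of a workflow file; only the env entry comes from the warning text.
on: pull_request

env:
  # Opt JavaScript actions into Node.js 24 ahead of the default switch.
  FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: "true"
```

This only changes the runtime used for JavaScript actions. The warnings still recommend checking whether updated versions of the listed actions support Node.js 24.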

Artifacts

Produced during runtime
Name    Size    Digest
model-google_gemma-3-4b-it-quantized-int4-tile-packed    3.4 GB    sha256:9b257b9421f5ceb87dbc33a4a3ee2739c46aef29e12e7e4c180e041c92025cfe
results-google_gemma-3-4b-it-quantized-int4-tile-packed    1.69 KB    sha256:7244da145092cc85cb0d6e7c059682fc5ce51fea0805fb80904a65e4694ec4a3