Describe the bug
The GPU Memory Budget subsection of VRAM Estimation appears to be off by roughly two orders of magnitude (gpu_memory_utilization prints as 11200%, and the usable-memory and max-context figures are inflated to match). I didn't see anything obvious in the yaml file for the recipe that could cause the issue.
To Reproduce
Run sparkrun run @experimental/qwen3.5-397b-a17b-int4-autoround-2x-vllm
Diagnostics
{14:24}|spark@spark:~ ➭ sparkrun run @experimental/qwen3.5-397b-a17b-int4-autoround-2x-vllm
sparkrun v0.2.25
Runtime: vllm-distributed
Image: ghcr.io/spark-arena/dgx-vllm-eugr-nightly-tf5:latest
Model: Intel/Qwen3.5-397B-A17B-int4-AutoRound
Mode: cluster (2 nodes)
VRAM Estimation:
Model dtype: int4
KV cache dtype: bfloat16
Architecture: 60 layers, 2 KV heads, 256 head_dim
Model weights: 210.78 GB
KV cache: 30.00 GB (max_model_len=262,144)
Tensor parallel: 2
Per-GPU total: 120.39 GB
DGX Spark fit: YES
GPU Memory Budget:
gpu_memory_utilization: 11200%
Usable GPU memory: 13552.0 GB (121 GB x 11200%)
Available for KV: 13446.6 GB
Max context tokens: 234,996,514
Context multiplier: 896.4x (vs max_model_len=262,144)
Hosts: default cluster 'default'
Head: 127.0.0.1
Workers: 192.168.1.116
Local/Remote
- I am running sparkrun from the spark head node.
Additional context
N/A
Suggested Fix
Unsure
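One purely speculative guess, not a confirmed diagnosis: the printed figures are self-consistent with gpu_memory_utilization being treated as a raw percentage (112) where a fraction was expected, and then being run through percent formatting a second time. The sketch below uses hypothetical values chosen only to reproduce the numbers in the diagnostics above (121 GB per GPU, 210.78 GB weights across TP=2); none of these names come from the sparkrun source.

```python
# Hypothesis sketch: a fraction-vs-percent mixup in the budget math.
# All constants below are taken from the diagnostic output in this report;
# UTIL is a guessed mis-parsed value, not anything confirmed in sparkrun.
GPU_TOTAL_GB = 121            # per-GPU memory shown in the log
UTIL = 112                    # hypothetical: "112%" parsed as 112, not 1.12
WEIGHTS_PER_GPU_GB = 210.78 / 2  # model weights split across tensor parallel = 2

usable = GPU_TOTAL_GB * UTIL               # 13552.0 -- matches the log
available = usable - WEIGHTS_PER_GPU_GB    # 13446.61 -- matches the log

# Python's '%' format type multiplies by 100 again, giving the 11200%:
print(f"gpu_memory_utilization: {UTIL:.0%}")      # 11200%
print(f"Usable GPU memory: {usable:.1f} GB")      # 13552.0 GB
print(f"Available for KV: {available:.1f} GB")    # 13446.6 GB
```

If that guess is right, the fix would be normalizing the configured value to a 0-1 fraction once, before both the arithmetic and the display formatting.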