GPU memory budget calculation doesn't work with gpu-mem-util-gb mod #136

@amckenna

Description
Describe the bug
The GPU Memory Budget subsection of the VRAM Estimation output appears to be off by roughly two orders of magnitude when the gpu-mem-util-gb mod is used: gpu_memory_utilization is reported as 11200% and usable GPU memory as 13552 GB on a 121 GB machine. I didn't see anything obvious in the recipe's yaml file that could cause the issue.

To Reproduce
Run sparkrun run @experimental/qwen3.5-397b-a17b-int4-autoround-2x-vllm

Diagnostics

{14:24}|spark@spark:~ ➭ sparkrun run @experimental/qwen3.5-397b-a17b-int4-autoround-2x-vllm
sparkrun v0.2.25

Runtime:   vllm-distributed
Image:     ghcr.io/spark-arena/dgx-vllm-eugr-nightly-tf5:latest
Model:     Intel/Qwen3.5-397B-A17B-int4-AutoRound
Mode:      cluster (2 nodes)

VRAM Estimation:
  Model dtype:      int4
  KV cache dtype:   bfloat16
  Architecture:     60 layers, 2 KV heads, 256 head_dim
  Model weights:    210.78 GB
  KV cache:         30.00 GB (max_model_len=262,144)
  Tensor parallel:  2
  Per-GPU total:    120.39 GB
  DGX Spark fit:    YES

  GPU Memory Budget:
    gpu_memory_utilization: 11200%
    Usable GPU memory:     13552.0 GB (121 GB x 11200%)
    Available for KV:      13446.6 GB
    Max context tokens:    234,996,514
    Context multiplier:    896.4x (vs max_model_len=262,144)

Hosts:     default cluster 'default'
  Head:    127.0.0.1
  Workers: 192.168.1.116

Local/Remote

  • I am running sparkrun from the spark head node.

Additional context
N/A

Suggested Fix
Unsure
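
A guess at the cause, based only on the numbers above: 11200% corresponds to a raw value of 112, and 121 GB × 112 = 13552 GB, which matches the reported "Usable GPU memory". That pattern is consistent with the gpu-mem-util-gb mod storing the user-supplied GB value (e.g. 112) directly into gpu_memory_utilization, which expects a 0–1 fraction of total VRAM. A minimal sketch of the conversion that may be missing (function and parameter names are hypothetical, not from the sparkrun codebase):

```python
def gpu_memory_utilization_from_gb(requested_gb: float, total_gb: float = 121.0) -> float:
    """Convert a GB memory budget into the fractional utilization value.

    Suspected bug: the raw GB value (e.g. 112) is passed through unchanged,
    so it is later rendered as 11200% and multiplied into the budget again.
    """
    if not 0 < requested_gb <= total_gb:
        raise ValueError(f"requested {requested_gb} GB exceeds {total_gb} GB of VRAM")
    return requested_gb / total_gb

# 112 GB on a 121 GB DGX Spark -> ~0.926 (i.e. ~92.6%), not 11200%
print(gpu_memory_utilization_from_gb(112.0, 121.0))
```

With the fraction computed this way, "Usable GPU memory" would come out to 112 GB rather than 13552 GB, and the downstream KV/context figures would scale accordingly.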
