
Only 8 of 64 GPUs Are Fully Partitioned and Usable in Docker After CPX/NPS4 #100

@Bihan


Summary
After setting the compute partition to CPX and the memory partition to NPS4, only 8 of the 64 GPUs (indices 0, 8, 16, 24, 32, 40, 48, 56) report COMPUTE_PARTITION: CPX and MEMORY_PARTITION: NPS4. These 8 are also the only devices that can be attached to a container with Docker's --device option.

How to Reproduce
Run:

sudo amd-smi set --memory-partition NPS4
amd-smi static --partition

Actual Behavior
Only 8 GPUs (indices 0, 8, 16, 24, 32, 40, 48, 56) show the correct partitions. The rest show:

COMPUTE_PARTITION: N/A
MEMORY_PARTITION: N/A
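
To list the affected indices without reading the full log by eye, the output can be parsed with a short script. This is a minimal sketch: it assumes the text output of `amd-smi static --partition` has a `GPU: <index>` line followed by indented `COMPUTE_PARTITION:` and `MEMORY_PARTITION:` lines, which may differ between amd-smi versions. The function names `parse_partitions` and `unpartitioned` are hypothetical helpers, not part of any AMD tool.

```python
import re


def parse_partitions(text):
    """Map each GPU index to its reported partition fields.

    Assumed input layout (not verified against every amd-smi release):
        GPU: 0
            PARTITION:
                COMPUTE_PARTITION: CPX
                MEMORY_PARTITION: NPS4
    """
    gpus = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        gpu_match = re.match(r"^GPU:\s*(\d+)$", line)
        if gpu_match:
            current = int(gpu_match.group(1))
            gpus[current] = {}
            continue
        field = re.match(r"^(COMPUTE_PARTITION|MEMORY_PARTITION):\s*(\S+)$", line)
        if field and current is not None:
            gpus[current][field.group(1)] = field.group(2)
    return gpus


def unpartitioned(gpus, compute="CPX", memory="NPS4"):
    """Return sorted GPU indices that do not report the expected partition modes."""
    return sorted(
        idx
        for idx, fields in gpus.items()
        if fields.get("COMPUTE_PARTITION") != compute
        or fields.get("MEMORY_PARTITION") != memory
    )
```

Feeding it the captured output (for example the contents of logs.txt) should list the indices that report N/A, if the assumed layout matches.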

Actual Output: logs.txt

Expected Behavior
All 64 GPUs should show:

COMPUTE_PARTITION: CPX
MEMORY_PARTITION: NPS4
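
The figure of 64 follows from 8 physical MI300X devices, each exposing 8 partitions in CPX mode (one per XCD). The sketch below maps an index to a (physical GPU, partition) pair, assuming indices are enumerated contiguously per physical device. That ordering is an assumption, not something confirmed in this report, and `locate` is a hypothetical helper. Under that assumption, every working index is partition 0 of its physical GPU.

```python
PHYSICAL_GPUS = 8        # from System Info: 8 x AMD MI300X
PARTITIONS_PER_GPU = 8   # MI300X has 8 XCDs; CPX exposes each as its own device


def locate(logical_index):
    """Return (physical_gpu, partition_id), assuming contiguous enumeration."""
    return divmod(logical_index, PARTITIONS_PER_GPU)


# Indices reported as working in this issue.
working = [0, 8, 16, 24, 32, 40, 48, 56]

total_logical = PHYSICAL_GPUS * PARTITIONS_PER_GPU
working_locations = [locate(i) for i in working]
```

If the enumeration assumption holds, `working_locations` contains only entries whose partition id is 0, which suggests that only the first partition of each physical device is reporting its modes.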

System Info
Dell PowerEdge XE9680 (MI300X)
CPU: 2 x Intel Xeon Platinum 8462Y+: 32c @ 2.8 GHz
RAM: 2.0 TiB
NVMe: 124 TB
GPUs: 8 x AMD MI300X

Kernel: Linux 5.15.0-142-generic
ROCm version: 6.4.1
AMDSMI Tool: 25.4.2+aca1101
AMDSMI Library: 25.4.0
amdgpu version: 6.12.12
VBIOS: AMD MI300X_HW_SRIOV_CVS_1VF (Version: 022.040.003.043.000001, Date: 2025/02/18)
OS: Ubuntu 22.04.5 LTS
