Description
Summary
After setting the compute partition to CPX and the memory partition to NPS4, only 8 of the 64 GPUs (indices 0, 8, 16, 24, 32, 40, 48, 56) report valid COMPUTE_PARTITION: CPX and MEMORY_PARTITION: NPS4 values. These 8 are also the only devices that can be attached via Docker's --device option.
How to Reproduce
Run
sudo amd-smi set --memory-partition NPS4
amd-smi static --partition
Actual Behavior
Only 8 GPUs (indices 0, 8, 16, 24, 32, 40, 48, 56) show the correct partitions. The rest show:
COMPUTE_PARTITION: N/A
MEMORY_PARTITION: N/A
Actual Output: logs.txt
Expected Behavior
All 64 GPUs (8 physical MI300X, each split into 8 CPX partitions) should show:
COMPUTE_PARTITION: CPX
MEMORY_PARTITION: NPS4
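For anyone trying to confirm which indices are affected, here is a minimal Python sketch that scans the text output of amd-smi static --partition and lists the GPU indices whose partitions do not match the expected values. The output layout it parses ("GPU: <index>" lines followed by indented "COMPUTE_PARTITION:" and "MEMORY_PARTITION:" lines) is an assumption and may differ between amd-smi versions; the function names and the two-GPU sample string are hypothetical and are not taken from logs.txt.

```python
import re

def partition_report(text):
    """Map each GPU index to the partition fields found under it.

    Assumes a layout of 'GPU: <n>' lines followed by
    'COMPUTE_PARTITION: <x>' / 'MEMORY_PARTITION: <y>' lines.
    """
    report = {}
    gpu = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"GPU:\s*(\d+)$", line)
        if m:
            gpu = int(m.group(1))
            report[gpu] = {}
            continue
        m = re.match(r"(COMPUTE_PARTITION|MEMORY_PARTITION):\s*(\S+)", line)
        if m and gpu is not None:
            report[gpu][m.group(1)] = m.group(2)
    return report

def mismatched(report, compute="CPX", memory="NPS4"):
    """Return the sorted GPU indices that do not report the expected partitions."""
    return sorted(
        idx for idx, fields in report.items()
        if fields.get("COMPUTE_PARTITION") != compute
        or fields.get("MEMORY_PARTITION") != memory
    )

# Hypothetical two-GPU sample in the assumed layout (not real log output).
sample = """GPU: 0
    PARTITION:
        COMPUTE_PARTITION: CPX
        MEMORY_PARTITION: NPS4
GPU: 1
    PARTITION:
        COMPUTE_PARTITION: N/A
        MEMORY_PARTITION: N/A
"""

bad = mismatched(partition_report(sample))
print("GPUs with unexpected partitions:", bad)
```

To use it on a real system, the saved output of amd-smi static --partition can be read from a file and passed to partition_report in place of the sample string. On the machine described in this report, the expectation would be an empty list rather than 56 affected indices.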
System Info
Dell PowerEdge XE9680 (MI300X)
CPU: 2 x Intel Xeon Platinum 8462Y+: 32c @ 2.8 GHz
RAM: 2.0 TiB
NVMe: 124 TB
GPUs: 8 x AMD MI300X
Kernel: Linux 5.15.0-142-generic
ROCm version: 6.4.1
AMDSMI Tool: 25.4.2+aca1101
AMDSMI Library: 25.4.0
amdgpu version: 6.12.12
VBIOS: AMD MI300X_HW_SRIOV_CVS_1VF (Version: 022.040.003.043.000001, Date: 2025/02/18)
OS: Ubuntu 22.04.5 LTS