_get_amdgpu_kmd_version only working for DKMS amdgpu module? #6
Description
Hello everyone, and thanks for the work on the WheelNext initiative and this provider in particular. I know that this is a work in progress, so feel free to ignore the issue if it is too soon to provide feedback.
I played a bit with the variant provider (with https://github.com/traversaro/variantlib-exps) on two systems. The first is a system with an MI300X on which the amdgpu kernel driver was installed via DKMS (the image is https://marketplace.digitalocean.com/apps/pytorch-rocm7). On that system, the function detects the KMD version correctly, and the value is consistent with the one reported by rocm-smi --showdriverversion:
root@2-6-0---ROCm-7-0-gpu-mi300x1-192gb-devcloud-atl1:~/variantlib-exps# pixi run print-amd
✨ Pixi task (print-amd): python -c 'from amd_variant_provider.detect_rocm import get_system_info as amd_get_system_info; print(amd_get_system_info())' ⠁
{'kmd_version': KMDVersion(major=6, minor=14, patch=14), 'rocm_version': ROCmVersion(major=7, minor=0, patch=0), 'gfx_arch': ['gfx9', 'gfx942']}
root@2-6-0---ROCm-7-0-gpu-mi300x1-192gb-devcloud-atl1:~/variantlib-exps# rocm-smi --showdriverversion
============================ ROCm System Management Interface ============================
============================== Version of System Component ===============================
Driver version: 6.14.14
==========================================================================================
================================== End of ROCm SMI Log ===================================
I also ran _get_amdgpu_kmd_version on a different system, an AMD Ryzen™ AI Max+ 395. There, the amdgpu driver was built as part of the kernel itself, so the /sys/module/amdgpu/version file does not exist, even though the /sys/module/amdgpu/ folder does. In that case, _get_amdgpu_kmd_version returns None, which is not consistent with the value returned by rocm-smi --showdriverversion:
(rocm) root@c78c8dbcd428:/notebooks/variantlib-exps# pixi run print-amd
✨ Pixi task (print-amd): python -c 'from amd_variant_provider.detect_rocm import get_system_info as amd_get_system_info; print(amd_get_system_info())' ⠁
{'rocm_version': ROCmVersion(major=7, minor=0, patch=0), 'gfx_arch': ['gfx11', 'gfx1151'], 'kmd_version': None}
(rocm) root@c78c8dbcd428:/notebooks/variantlib-exps# rocm-smi --showdriverversion
============================ ROCm System Management Interface ============================
============================== Version of System Component ===============================
Driver version: 6.14.0-32-generic
==========================================================================================
================================== End of ROCm SMI Log ===================================
I am not an expert in the AMD world, so this may be expected, but when in doubt I preferred to report the inconsistency between _get_amdgpu_kmd_version() and rocm-smi --showdriverversion.
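
For illustration, here is a minimal sketch of one possible fallback. The value 6.14.0-32-generic printed by rocm-smi on the second system looks like a kernel release string (the kind `uname -r` prints), which suggests, though this is not verified here, that rocm-smi reports the kernel release when /sys/module/amdgpu/version is missing. The function name `read_kmd_version_with_fallback`, its parameters, and the tuple return type are hypothetical and are not part of amd_variant_provider; whether a kernel release should count as a KMD version at all is a decision for the maintainers.

```python
import platform
import re
from pathlib import Path
from typing import Optional, Tuple


def read_kmd_version_with_fallback(
    sysfs_module_dir="/sys/module/amdgpu",
    kernel_release: Optional[str] = None,
) -> Optional[Tuple[int, int, int]]:
    """Hypothetical helper: return (major, minor, patch) or None.

    Assumption: when the amdgpu driver ships with the kernel, the kernel
    release is an acceptable stand-in for the driver version.
    """
    module_dir = Path(sysfs_module_dir)
    if not module_dir.is_dir():
        # The amdgpu module is not loaded at all: nothing to report.
        return None

    version_file = module_dir / "version"
    if version_file.is_file():
        # DKMS-style install: the module exposes its own version string.
        raw = version_file.read_text().strip()
    else:
        # No version file: fall back to the running kernel release.
        raw = kernel_release if kernel_release is not None else platform.release()

    # Keep only the leading "X.Y.Z", dropping suffixes such as "-32-generic".
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", raw)
    if match is None:
        return None
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)
```

With this sketch, a /sys/module/amdgpu/version file containing 6.14.14 would yield (6, 14, 14), and a system without that file whose kernel release is 6.14.0-32-generic would yield (6, 14, 0) instead of None.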