Vulkan: Optimize Matmul parameters for AMD GPUs with Coopmat support #18749
Conversation
There are some small but consistent improvements on RDNA3 as well with this.
Thank you for testing it! I was hoping for more; I guess it is too different from the 8060S. There's probably some more tuning for RDNA3/4 dGPUs that can be done, but I don't have the hardware for that.
You should try this for mul_mat_id as well!
9070 XT
I tried, but didn't find a good parameter set yet. I'll keep trying. @characharm Is that Windows or Linux?
@0cc4m Windows. I rebooted between tests for accuracy. The numbers are stable.
Thank you for testing. I wish the drivers would behave more similarly. I'll disable the change on Windows.
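For what it's worth, a per-OS gate for this kind of tuning can be a one-line check. The sketch below is illustrative only and uses made-up names, not the actual identifiers in ggml-vulkan.cpp:

```cpp
// Hypothetical sketch: keep a large-tile AMD coopmat tuning path off on
// Windows, where the same tile sizes regressed in testing.
// device_info and use_large_amd_coopmat_tiles are illustrative names.
struct device_info {
    bool is_amd;          // vendor ID 0x1002
    bool has_khr_coopmat; // VK_KHR_cooperative_matrix is supported
};

static bool use_large_amd_coopmat_tiles(const device_info & dev) {
#if defined(_WIN32)
    // Windows AMD drivers showed regressions with the larger tiles,
    // so keep the previous defaults there.
    (void) dev;
    return false;
#else
    return dev.is_amd && dev.has_khr_coopmat;
#endif
}
```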
Can you also test a dense model, though? Those should be more affected than MoE.
@characharm Please check if #18763 restores your performance.
No, the performance is the same. I compared it with the CI build, so the problem is not in my build.
I think these tile sizes are also used for matmul id in some cases, so that could explain the effect on gpt-oss.
@characharm Sorry, I missed disabling the large tile size. Try again, please.
No, I didn't enable the large tile for mul_mat_id, so unless the check somewhere is wrong, it should not be used at all.
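To make the check being discussed concrete, here is a rough sketch of how the large tiles could be kept out of the mul_mat_id path. The enum and function names are made up, and the real selection logic in ggml-vulkan.cpp is more involved:

```cpp
// Illustrative only: choose a tile size class for a matmul dispatch and
// never pick the large AMD tuning for the indirect (mul_mat_id) path.
enum class tile_size { SMALL, MEDIUM, LARGE };

static tile_size pick_tile_size(bool amd_coopmat_tuning, bool is_mul_mat_id,
                                int m, int n) {
    // Small problems always take the small tiles.
    if (m <= 64 || n <= 64) {
        return tile_size::SMALL;
    }
    // The large AMD tiles are only enabled for the regular matmul path.
    if (amd_coopmat_tuning && !is_mul_mat_id) {
        return tile_size::LARGE;
    }
    return tile_size::MEDIUM;
}
```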
Not sure whether this helps, but Vulkan performance on the RX 7600 has been going down. I don't use such a big model on this GPU, but it's interesting that it has degraded.
build: 7d77f07 (7108)
build: c9ced49 (7710)
Can you test a model that actually fits into your GPU? That would likely give more usable data. |
Luckily, I had copied the results from an old build. Yeah, even for smaller models there has been degradation. Not sure whether the kernel upgrade played a role.
Kernel: 6.6 (don't remember the exact patch), build: dd5e8ca (6916)
Kernel: 6.17.9-200, build: c9ced49 (7710)
Can you add more information about your setup? What OS, what driver, what does your device info string say, etc.?
OS: Fedora 42
My guess would be that your driver is too old; for good Mesa coopmat performance you usually want 25.3 or higher. But I didn't want to cause an issue for older versions.
25.1.9 is the latest. No updates are available for Fedora 42. |
25.1.9, despite being the newest for Fedora 42, is not good enough. If upgrading to Fedora 43 is not an option for you at the moment, you could try a newer Mesa build from the che/mesa COPR repo. For example, a merge request providing a significant PP speed improvement was merged into the Mesa repo in August and has been available since release 25.2.x or 25.3.x (not sure which).
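If the tuning ever needs to be gated on the Mesa release, the driver version can be read from Vulkan directly. This is only a sketch (the helper name is made up, and it assumes a Vulkan 1.2+ SDK); as far as I know, RADV reports the Mesa release through driverVersion and a human-readable "Mesa x.y.z" string in driverInfo:

```cpp
// Sketch: query the driver ID and version so a tuning path could be limited
// to sufficiently new Mesa/RADV releases. Not the actual backend code.
#include <vulkan/vulkan.h>
#include <cstdio>

static bool radv_is_at_least(VkPhysicalDevice dev, uint32_t major, uint32_t minor) {
    VkPhysicalDeviceDriverProperties drv = {};
    drv.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_DRIVER_PROPERTIES;

    VkPhysicalDeviceProperties2 props = {};
    props.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2;
    props.pNext = &drv;
    vkGetPhysicalDeviceProperties2(dev, &props);

    if (drv.driverID != VK_DRIVER_ID_MESA_RADV) {
        return false;
    }
    // For Mesa drivers, driverVersion encodes the release number.
    const uint32_t v         = props.properties.driverVersion;
    const uint32_t drv_major = VK_API_VERSION_MAJOR(v);
    const uint32_t drv_minor = VK_API_VERSION_MINOR(v);
    printf("driver: %s %u.%u (%s)\n", drv.driverName, drv_major, drv_minor, drv.driverInfo);
    return drv_major > major || (drv_major == major && drv_minor >= minor);
}
```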
I tuned this on AMD Radeon 8060S, but a brief test also showed improvements on AMD RX 9060 XT.
ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
ggml_vulkan: 0 = AMD Radeon RX 9060 XT (RADV GFX1200) (radv) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat