
Conversation

@loci-dev

Mirrored from ggml-org/llama.cpp#18749

I tuned this on an AMD Radeon 8060S, but a brief test also showed improvements on an AMD RX 9060 XT.

ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat

| model | size | params | ngl | test | t/s (ROCm) | t/s (before) | t/s (after) | diff |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| llama 8B Q4_0 | 4.33 GiB | 8.03 B | 99 | pp512 | 815.22 ± 16.88 | 644.84 ± 4.49 | 875.24 ± 17.99 | +35.7% |
| llama 8B Q4_0 | 4.33 GiB | 8.03 B | 99 | pp512 @ d8192 | 321.66 ± 3.02 | 384.27 ± 0.29 | 463.72 ± 1.32 | +20.7% |
| llama 8B Q4_K - Small | 4.36 GiB | 8.03 B | 99 | pp512 | 995.81 ± 32.32 | 529.50 ± 13.07 | 793.51 ± 9.67 | +49.9% |
| llama 8B Q4_K - Small | 4.36 GiB | 8.03 B | 99 | pp512 @ d8192 | 352.08 ± 3.17 | 341.86 ± 1.96 | 435.55 ± 2.63 | +27.4% |
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | 99 | pp512 | 755.86 ± 21.11 | 422.37 ± 21.61 | 742.46 ± 15.49 | +75.8% |
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | 99 | pp512 @ d8192 | 317.44 ± 2.31 | 306.97 ± 0.36 | 419.24 ± 4.59 | +36.6% |

ggml_vulkan: 0 = AMD Radeon RX 9060 XT (RADV GFX1200) (radv) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat

| model | size | params | ngl | test | t/s (ROCm) | t/s (before) | t/s (after) | diff |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| llama 8B Q4_K - Small | 4.36 GiB | 8.03 B | 99 | pp512 | 648.41 ± 14.37 | 1437.13 ± 1.77 | 1902.23 ± 1.65 | +32.4% |
| llama 8B Q4_K - Small | 4.36 GiB | 8.03 B | 99 | pp512 @ d8192 | 410.01 ± 6.22 | 757.59 ± 2.71 | 841.76 ± 3.02 | +11.1% |
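The diff column appears to be the relative change of the Vulkan "after" throughput over the Vulkan "before" throughput (the ROCm column is shown for reference only). A minimal sketch of that calculation, using the first two Q4_0 rows above; the function and row names are illustrative, not part of llama-bench:

```python
# Sketch: deriving the "diff" column from the before/after t/s values.
def rel_change(before: float, after: float) -> float:
    """Relative throughput change in percent."""
    return (after - before) / before * 100.0

rows = [
    ("Q4_0 pp512",          644.84, 875.24),
    ("Q4_0 pp512 @ d8192",  384.27, 463.72),
]
for name, before, after in rows:
    print(f"{name}: {rel_change(before, after):+.1f}%")  # prints +35.7% and +20.7%
```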

@loci-review

loci-review bot commented Jan 11, 2026

Explore the complete analysis inside the Version Insights

Perfect! I've generated a summary report for your project. Here are the key findings:

Summary Report for llama.cpp PR #884

Performance Impact: ✅ MINIMAL

The analysis comparing the base version (1694ff01-eeb5-11f0-a055-c529586b3e1a) to the target version (b8594261-eec9-11f0-a055-c529586b3e1a) shows:

  • No significant response time changes (>2% threshold)
  • No significant throughput changes (>2% threshold)
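For illustration, a significance gate like the one the report describes could look as follows. The 2% threshold is taken from the report text; the function name and the flag-on-relative-change logic are assumptions, not Loci's actual implementation:

```python
# Sketch of a >2% significance gate, as described in the summary report.
THRESHOLD = 2.0  # percent; taken from the report's stated threshold

def is_significant(base: float, target: float, threshold: float = THRESHOLD) -> bool:
    """Flag a metric whose relative change versus the base exceeds the threshold."""
    return abs(target - base) / base * 100.0 > threshold

# A 1.3% response-time drift would not be flagged; a 3% drift would.
print(is_significant(100.0, 101.3))  # prints False
print(is_significant(100.0, 103.0))  # prints True
```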

Conclusion

This pull request is performance-neutral, meaning:

  • ✅ No performance regressions detected
  • ✅ Code changes maintain performance stability
  • ✅ Safe to proceed without performance concerns

The PR can move forward without requiring any performance optimization work.

@loci-dev force-pushed the main branch 25 times, most recently from e8cc137 to 080a3ae on January 14, 2026 at 19:08
@loci-dev force-pushed the main branch 28 times, most recently from 048ad94 to 6c1fde6 on February 3, 2026 at 13:32
@loci-dev force-pushed the main branch 2 times, most recently from 0cb533b to ef7afbe on February 13, 2026 at 02:17