
Conversation

@loci-dev

Mirrored from ggml-org/llama.cpp#18749

I tuned this on an AMD Radeon 8060S, but a brief test also showed improvements on an AMD RX 9060 XT.

ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat

| model | size | params | ngl | test | t/s (ROCm) | t/s (before) | t/s (after) | diff |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| llama 8B Q4_0 | 4.33 GiB | 8.03 B | 99 | pp512 | 815.22 ± 16.88 | 644.84 ± 4.49 | 875.24 ± 17.99 | +35.7% |
| llama 8B Q4_0 | 4.33 GiB | 8.03 B | 99 | pp512 @ d8192 | 321.66 ± 3.02 | 384.27 ± 0.29 | 463.72 ± 1.32 | +20.7% |
| llama 8B Q4_K - Small | 4.36 GiB | 8.03 B | 99 | pp512 | 995.81 ± 32.32 | 529.50 ± 13.07 | 793.51 ± 9.67 | +49.9% |
| llama 8B Q4_K - Small | 4.36 GiB | 8.03 B | 99 | pp512 @ d8192 | 352.08 ± 3.17 | 341.86 ± 1.96 | 435.55 ± 2.63 | +27.4% |
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | 99 | pp512 | 755.86 ± 21.11 | 422.37 ± 21.61 | 742.46 ± 15.49 | +75.8% |
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | 99 | pp512 @ d8192 | 317.44 ± 2.31 | 306.97 ± 0.36 | 419.24 ± 4.59 | +36.6% |

ggml_vulkan: 0 = AMD Radeon RX 9060 XT (RADV GFX1200) (radv) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat

| model | size | params | ngl | test | t/s (ROCm) | t/s (before) | t/s (after) | diff |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| llama 8B Q4_K - Small | 4.36 GiB | 8.03 B | 99 | pp512 | 648.41 ± 14.37 | 1437.13 ± 1.77 | 1902.23 ± 1.65 | +32.4% |
| llama 8B Q4_K - Small | 4.36 GiB | 8.03 B | 99 | pp512 @ d8192 | 410.01 ± 6.22 | 757.59 ± 2.71 | 841.76 ± 3.02 | +11.1% |
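The diff column appears to be the relative change of the Vulkan "after" throughput over the Vulkan "before" throughput (the ROCm column is shown for reference only). A minimal sketch of that calculation, using the first two Q4_0 rows above; the function and row names are illustrative, not part of llama-bench:

```python
# Sketch: deriving the "diff" column from the before/after t/s values.
def rel_change(before: float, after: float) -> float:
    """Relative throughput change in percent."""
    return (after - before) / before * 100.0

rows = [
    ("Q4_0 pp512",          644.84, 875.24),
    ("Q4_0 pp512 @ d8192",  384.27, 463.72),
]
for name, before, after in rows:
    print(f"{name}: {rel_change(before, after):+.1f}%")  # prints +35.7% and +20.7%
```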

@loci-review

loci-review bot commented Jan 11, 2026

Explore the complete analysis inside the Version Insights

Perfect! I've generated a summary report for your project. Here are the key findings:

Summary Report for llama.cpp PR #884

Performance Impact: ✅ MINIMAL

The analysis comparing the base version (1694ff01-eeb5-11f0-a055-c529586b3e1a) to the target version (b8594261-eec9-11f0-a055-c529586b3e1a) shows:

  • No significant response time changes (>2% threshold)
  • No significant throughput changes (>2% threshold)
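For illustration, a significance gate like the one the report describes could look as follows. The 2% threshold is taken from the report text; the function name and the flag-on-relative-change logic are assumptions, not Loci's actual implementation:

```python
# Sketch of a >2% significance gate, as described in the summary report.
THRESHOLD = 2.0  # percent; taken from the report's stated threshold

def is_significant(base: float, target: float, threshold: float = THRESHOLD) -> bool:
    """Flag a metric whose relative change versus the base exceeds the threshold."""
    return abs(target - base) / base * 100.0 > threshold

# A 1.3% response-time drift would not be flagged; a 3% drift would.
print(is_significant(100.0, 101.3))  # prints False
print(is_significant(100.0, 103.0))  # prints True
```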

Conclusion

This pull request is performance-neutral, meaning:

  • ✅ No performance regressions detected
  • ✅ Code changes maintain performance stability
  • ✅ Safe to proceed without performance concerns

The PR can move forward without requiring any performance optimization work.

@loci-dev force-pushed the main branch 25 times, most recently from e8cc137 to 080a3ae on January 14, 2026 at 19:08
@loci-dev force-pushed the main branch 28 times, most recently from 048ad94 to 6c1fde6 on February 3, 2026 at 13:32
@loci-dev force-pushed the main branch 2 times, most recently from 0cb533b to ef7afbe on February 13, 2026 at 02:17