UPSTREAM PR #19317: cleanup llama-quantize --help output#1158

Open
loci-dev wants to merge 5 commits into main from loci/pr-19317-llama-quantize-help-cleanup

Conversation

loci-dev commented Feb 8, 2026

Note

Source pull request: ggml-org/llama.cpp#19317

More pleasant formatting for the output of llama-quantize --help.

Before this PR:

[Screenshot: llama-quantize --help output before the change]

After this PR:

[Screenshot: llama-quantize --help output after the change] (this image was previously wrong; it has been updated to match the current code)
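
The screenshots don't render here, so below is a minimal C sketch of the kind of cleanup involved: a single run-on usage line versus one aligned printf call per option. The flags are modeled on real llama-quantize options, but the layout and wording are illustrative, not the PR's actual output.

    /* Hypothetical sketch of the help-text cleanup -- not the actual
     * llama-quantize diff; flags modeled on the real tool's options. */
    #include <stdio.h>

    static void usage_before(const char * executable) {
        /* everything crammed into one line, hard to scan */
        printf("usage: %s [--help] [--allow-requantize] [--leave-output-tensor] "
               "[--pure] model-f32.gguf [model-quant.gguf] type [nthreads]\n",
               executable);
    }

    static void usage_after(const char * executable) {
        printf("usage: %s [OPTIONS] model-f32.gguf [model-quant.gguf] type [nthreads]\n\n",
               executable);
        printf("options:\n");
        /* one printf per option; a fixed field width keeps descriptions aligned */
        printf("  %-24s %s\n", "--help",                "show this help and exit");
        printf("  %-24s %s\n", "--allow-requantize",    "allow requantizing already-quantized tensors");
        printf("  %-24s %s\n", "--leave-output-tensor", "leave the output tensor unquantized");
        printf("  %-24s %s\n", "--pure",                "disable k-quant mixtures, quantize all tensors to the same type");
    }

    int main(void) {
        usage_before("llama-quantize");
        printf("\n");
        usage_after("llama-quantize");
        return 0;
    }

Splitting the text into one call per line is what drives the printf count up (the 22 → 43 noted in the review below) while making both the source and the output far easier to maintain.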

loci-review bot commented Feb 8, 2026

Overview

Analysis of 115,630 functions across 47 commits reveals minimal performance impact. Only 10 functions were modified (0.009%), all within the llama-quantize offline utility. No performance-critical inference pathway was affected.

Power Consumption Changes:

  • build.bin.llama-quantize: -0.11% (-48.02 nJ)
  • build.bin.llama-cvector-generator: 0.00%
  • build.bin.libmtmd.so: -0.00%
  • build.bin.llama-tts: 0.00%
  • build.bin.libllama.so: 0.00%
  • build.bin.llama-tokenize: 0.00%
  • build.bin.llama-qwen2vl-cli: 0.00%
  • build.bin.llama-gguf-split: 0.00%
  • build.bin.llama-llava-cli: 0.00%
  • build.bin.llama-minicpmv-cli: 0.00%
  • build.bin.llama-gemma3-cli: 0.00%
  • build.bin.libggml.so: 0.00%
  • build.bin.libggml-cpu.so: 0.00%
  • build.bin.libggml-base.so: 0.00%
  • build.bin.llama-bench: 0.00%

Function Analysis

Significant Improvements (Compiler Optimizations):

  • std::vector<common_adapter_lora_info>::end(): Response time -52.67% (-90.68ns), throughput time -60.27% (-90.68ns)
  • __val_comp_iter token comparator: Response time -49.49% (-117.39ns), throughput time -57.91% (-117.39ns)
  • regex_traits::operator|: Response time -24.95% (-46.79ns), throughput time -29.75% (-46.79ns)
  • std::vector<unsigned long>::end(): Response time -18.06% (-17.96ns), throughput time -23.10% (-17.96ns)

These improvements occurred without source code changes, indicating toolchain optimization benefits.

Intentional Regression:

  • usage() help text function: Response time +23.09% (+258.97ns), throughput time +35.95% (+95.70ns). Commit e34fe51 roughly doubled the number of printf calls (22 → 43) to improve the help text formatting. Zero practical impact, since the function runs only when --help is requested (see the sanity check below).
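
A rough sanity check (my arithmetic, not part of the report): +258.97 ns spread over the 21 added printf calls comes to about 12 ns per call (258.97 / 21 ≈ 12.3), a plausible fixed cost for buffered stdio, and one paid only once per --help invocation.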

Minor Regression:

  • iterator::operator- for kv_override: Response time +27.30% (+30.87ns), throughput time +33.78% (+33.78ns). Caused by sanitizer instrumentation from build system refactoring (commit 423bee4), affecting only debug builds.

Other analyzed functions showed improvements under 7% with negligible absolute changes.

Additional Findings

Extensive GPU backend work (30+ commits across Metal, Vulkan, CUDA) delivered Flash Attention optimizations, kernel consolidation, and bug fixes without impacting analyzed functions. This confirms proper architectural separation between GPU compute kernels and CPU-side utilities. All modified functions are in non-critical paths; core inference libraries (libllama.so, libggml-cpu.so) show zero power consumption change, confirming stability of performance-critical components.

🔎 Full breakdown: Loci Inspector.
💬 Questions? Tag @loci-dev.

loci-dev force-pushed the main branch 3 times, most recently from ef7afbe to d4c3480 on February 14, 2026 at 02:16
