Conversation

@chrismuzyn

Replace hardcoded n_embd with n_embd_tensor in embedding extraction to handle cases where tensor dimensions differ from model parameters.

The issue was only triggered when the pooling type was changed. This fixes #18737.

I have tested this fix extensively with Qwen3-VL-Embedding-8b with pooling set to last. Of course, it previously didn't work at all, so there was nothing to compare against.

I also tested extensively with llama-embed-nemotron-8b with pooling set to mean, both with and without my change, and the returned embeddings matched 100%. (They matched when generated on the same backend, CPU vs. CPU and Vulkan vs. Vulkan; interestingly, they did NOT match CPU vs. Vulkan, but I believe that must already have been the case before this change. Unfortunately I somehow broke my ROCm install, so I was unable to retest HIP.)

Please let me know if there are additional tests I can run, or if I can provide my exact GGUFs. I will try to retest HIP if I can get it working again, but the issue was reproducible with the CPU backend alone.
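To illustrate the shape of the fix (a minimal self-contained sketch, not the actual llama.cpp code — `fake_tensor` and `extract_row` are hypothetical names standing in for a ggml tensor and the embedding-extraction path): sizing the copy from the tensor's own first dimension, rather than from the model-level `n_embd` hyperparameter, keeps the read in bounds when the output tensor is narrower or wider than `n_embd`, e.g. after pooling or a projection head.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Minimal stand-in for a ggml tensor: only the fields this sketch needs.
// (Hypothetical struct; the real ggml_tensor carries many more fields.)
struct fake_tensor {
    int64_t ne[4];              // dimensions; ne[0] is the embedding width
    std::vector<float> data;    // flat row-major data
};

// Copy one row of embeddings out of `t`. The copy length comes from the
// tensor itself (t.ne[0], i.e. "n_embd_tensor"), not from a hardcoded
// hparams.n_embd, so it stays correct when the two differ.
static std::vector<float> extract_row(const fake_tensor & t, int64_t row) {
    const int64_t n_embd_tensor = t.ne[0];
    std::vector<float> out(n_embd_tensor);
    for (int64_t i = 0; i < n_embd_tensor; ++i) {
        out[i] = t.data[row * n_embd_tensor + i];
    }
    return out;
}
```

With a hardcoded model-level width larger than `ne[0]`, the same loop would read past the end of `data` — the out-of-bounds access reported in the linked issue.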

@chrismuzyn chrismuzyn requested a review from ggerganov as a code owner January 10, 2026 22:15
@ngxson
Collaborator

ngxson commented Jan 10, 2026

We recently added n_embd_out for this exact purpose; it should be used instead.

Consider adding GGUF metadata for it. The original Qwen3-VL-Embedding model is also missing 1_Pooling, so I don't think it's actually ready to be used unless the Qwen team has fixed it (I already reached out to them, but got no response).
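The suggested approach can be sketched as follows (assumed names only — `hparams_sketch` and `embd_size` are illustrative, not llama.cpp's actual API): a dedicated output-width hyperparameter, read from GGUF metadata when present, that falls back to the hidden size otherwise.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of an output-embedding-width hyperparameter with a
// fallback, in the spirit of the n_embd_out mechanism mentioned above.
struct hparams_sketch {
    uint32_t n_embd     = 0; // hidden size of the model
    uint32_t n_embd_out = 0; // 0 means "no dedicated GGUF metadata key set"

    // Width to use when extracting embeddings: prefer the explicit output
    // width, otherwise fall back to the hidden size.
    uint32_t embd_size() const {
        return n_embd_out != 0 ? n_embd_out : n_embd;
    }
};
```

Centralizing the choice in one accessor means extraction code never hardcodes `n_embd`, and models whose pooled or projected output differs from the hidden size only need the metadata key set.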

@vincenthawke

vincenthawke commented Feb 11, 2026

we recently added n_embd_out for this exact purpose, it should be used instead.

I am guessing the Windows builds don't include that update; I don't see it after running llama-server --help.



Development

Successfully merging this pull request may close these issues.

Eval bug: Tensor Out of Bounds in ggml_backend_tensor_get_async using Multi Modal Embedding with Pooling
