Conversation

@chrismuzyn

Replace hardcoded n_embd with n_embd_tensor in embedding extraction to handle cases where tensor dimensions differ from model parameters.

The issue was only triggered when the pooling type was changed. This fixes #18737.

I have tested this fix extensively with Qwen3-VL-Embedding-8b with pooling set to last. Of course, it previously didn't work at all, so there was nothing to compare against.

I also tested extensively with llama-embed-nemotron-8b with pooling set to mean, both with and without my change, and the returned embeddings matched 100%. (They matched when generated on the same backend, CPU vs. CPU and Vulkan vs. Vulkan; interestingly, they did NOT match CPU vs. Vulkan, but I believe that must already have been the case before this change. Unfortunately I somehow broke my ROCm install, so I was unable to retest HIP.)

Please let me know if there are additional tests I can run, or if I can provide my exact GGUFs. I will try to retest HIP if I can get it working again, but the issue was reproducible with the CPU backend alone.
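To illustrate the shape of the fix (a minimal self-contained sketch, not the actual llama.cpp code — `fake_tensor` and `extract_row` are hypothetical names standing in for a ggml tensor and the embedding-extraction path): sizing the copy from the tensor's own first dimension, rather than from the model-level `n_embd` hyperparameter, keeps the read in bounds when the output tensor is narrower or wider than `n_embd`, e.g. after pooling or a projection head.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Minimal stand-in for a ggml tensor: only the fields this sketch needs.
// (Hypothetical struct; the real ggml_tensor carries many more fields.)
struct fake_tensor {
    int64_t ne[4];              // dimensions; ne[0] is the embedding width
    std::vector<float> data;    // flat row-major data
};

// Copy one row of embeddings out of `t`. The copy length comes from the
// tensor itself (t.ne[0], i.e. "n_embd_tensor"), not from a hardcoded
// hparams.n_embd, so it stays correct when the two differ.
static std::vector<float> extract_row(const fake_tensor & t, int64_t row) {
    const int64_t n_embd_tensor = t.ne[0];
    std::vector<float> out(n_embd_tensor);
    for (int64_t i = 0; i < n_embd_tensor; ++i) {
        out[i] = t.data[row * n_embd_tensor + i];
    }
    return out;
}
```

With a hardcoded model-level width larger than `ne[0]`, the same loop would read past the end of `data` — the out-of-bounds access reported in the linked issue.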

@chrismuzyn chrismuzyn requested a review from ggerganov as a code owner January 10, 2026 22:15
@ngxson
Collaborator

ngxson commented Jan 10, 2026

We recently added n_embd_out for this exact purpose; it should be used instead.

Consider adding GGUF metadata for it. The original Qwen3-VL-Embedding model is also missing 1_Pooling, so I don't think it's actually ready to be used unless the Qwen team has fixed it (I already reached out to them, but got no response).
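The suggested approach can be sketched as follows (assumed names only — `hparams_sketch` and `embd_size` are illustrative, not llama.cpp's actual API): a dedicated output-width hyperparameter, read from GGUF metadata when present, that falls back to the hidden size otherwise.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of an output-embedding-width hyperparameter with a
// fallback, in the spirit of the n_embd_out mechanism mentioned above.
struct hparams_sketch {
    uint32_t n_embd     = 0; // hidden size of the model
    uint32_t n_embd_out = 0; // 0 means "no dedicated GGUF metadata key set"

    // Width to use when extracting embeddings: prefer the explicit output
    // width, otherwise fall back to the hidden size.
    uint32_t embd_size() const {
        return n_embd_out != 0 ? n_embd_out : n_embd;
    }
};
```

Centralizing the choice in one accessor means extraction code never hardcodes `n_embd`, and models whose pooled or projected output differs from the hidden size only need the metadata key set.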

@vincenthawke

vincenthawke commented Feb 11, 2026

we recently added n_embd_out for this exact purpose, it should be used instead.

I am guessing the Windows builds don't include that update; I don't see it after running llama-server --help.



Development

Successfully merging this pull request may close these issues.

Eval bug: Tensor Out of Bounds in ggml_backend_tensor_get_async using Multi Modal Embedding with Pooling
