fix: use actual tensor embedding dimension instead of model parameter #18745
+8
−4
Replace the hardcoded `n_embd` with `n_embd_tensor` in the embedding extraction, to handle cases where the tensor dimension differs from the model parameter.
The issue was only triggered when the pooling type was changed. This fixes #18737.
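For context, here is a minimal self-contained sketch of the bug pattern and the fix. The identifiers (`fake_tensor`, `extract_embeddings`, `n_embd_param`) are illustrative stand-ins, not the exact llama.cpp symbols touched by this PR:

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Illustrative stand-in for a ggml-style tensor: ne[] holds the dimensions.
struct fake_tensor {
    int64_t ne[4];
    std::vector<float> data;
};

std::vector<float> extract_embeddings(const fake_tensor & t, int64_t n_tokens,
                                      int64_t n_embd_param /* hparams.n_embd */) {
    // BUG: trusting the model hyperparameter uses the wrong row stride
    // whenever t.ne[0] != n_embd_param (e.g. after changing the pooling type).
    // FIX: read the dimension from the tensor actually being copied.
    const int64_t n_embd_tensor = t.ne[0];
    (void) n_embd_param; // no longer used for the copy

    std::vector<float> out(n_tokens * n_embd_tensor);
    for (int64_t i = 0; i < n_tokens; ++i) {
        std::memcpy(out.data()    + i * n_embd_tensor,
                    t.data.data() + i * n_embd_tensor,
                    n_embd_tensor * sizeof(float));
    }
    return out;
}
```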
I have tested this fix extensively with Qwen3-VL-Embedding-8b with pooling set to `last`. Of course, it previously didn't work at all, so there was nothing to compare against. I therefore also tested extensively with llama-embed-nemotron-8b with pooling set to `mean`, both with and without my change. The returned embeddings matched 100%. They matched when generated on the same backend (CPU vs. CPU and Vulkan vs. Vulkan); interestingly, they did NOT match between CPU and Vulkan, but I believe that was already the case before this change. Unfortunately I somehow broke my ROCm install, so I was unable to retest HIP.

Please let me know if there are additional tests I can run, or if I should provide my exact GGUFs. I will try to retest HIP if I can get it working again, but the issue was reproducible with the CPU backend alone.
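For reference, the kind of invocation used for testing looks roughly like this (model path and prompt are placeholders; `--pooling` is the llama.cpp flag for selecting the pooling type):

```sh
# Reproduce with the CPU backend; use --pooling mean for the nemotron model.
./llama-embedding -m Qwen3-VL-Embedding-8b.gguf --pooling last -p "test sentence"
```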