Hex encoded tensors are field type sensitive #35611

@dainiusjocas

Describe the bug
When tensors are fed as plain floating-point numbers, the Vespa field's cell type can be either float or bfloat16: Vespa takes care of the conversion.

However, when tensors are fed as hex-encoded values (to reduce network utilization), a string that encodes 32-bit floats cannot be fed into a bfloat16 field. Feeding a bfloat16 field requires a bfloat16-specific encoding so that the value "fits" the field.
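To make the difference concrete, here is a minimal sketch of how the two encodings could be produced on the client side. The helper names `float32_hex` and `bfloat16_hex` are hypothetical and not part of pyvespa. The sketch assumes big-endian byte order and derives bfloat16 by truncating the float32 bit pattern to its upper 16 bits; rounding to nearest is another option and may differ in the last bit.

```python
import struct


def float32_hex(values):
    """Encode each value as big-endian IEEE 754 float32: 8 hex digits per cell."""
    return "".join(struct.pack(">f", v).hex().upper() for v in values)


def bfloat16_hex(values):
    """Encode each value as bfloat16: 4 hex digits per cell.

    Assumption: bfloat16 is taken as the upper 16 bits of the float32
    pattern (truncation, no rounding).
    """
    return "".join(struct.pack(">f", v)[:2].hex().upper() for v in values)


if __name__ == "__main__":
    print(float32_hex([1.23456]))   # candidate value for embedding_float
    print(bfloat16_hex([1.23456]))  # candidate value for embedding_bfloat16
```

Under these assumptions, the same number yields strings of different lengths for the two cell types, so one hex string cannot be reused across a float field and a bfloat16 field.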

To Reproduce
Notebook with reproduction in pyvespa:

from vespa.package import Field

# Two single-cell tensor fields that differ only in cell type.
fields = [
    Field(
        name="embedding_float",
        type="tensor<float>(d0[1])",
        indexing=["attribute", "summary"],
    ),
    Field(
        name="embedding_bfloat16",
        type="tensor<bfloat16>(d0[1])",
        indexing=["attribute", "summary"],
    ),
]

Feeding:

echo '{"id": "3", "fields": {"embedding_float": "3F9E0610", "embedding_bfloat16": "3F9E0610"}}' | vespa feed -

Results in an error:

Error when feeding document 3: {'Exception': 'Index 1 out of bounds for length 1', 'id': '3', 'message': 'Exception during feed_data_point'}

Expected behavior
Vespa could either:

  • return a more informative exception.
  • convert the number representation to fit the field type.
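As an illustration of the first option, a length check of roughly the following shape would yield a clearer message. This is only a sketch: `check_hex_tensor` and `HEX_DIGITS_PER_CELL` are hypothetical names, not Vespa or pyvespa APIs, and the digits-per-cell table is my understanding of the hex format for dense tensors rather than something confirmed in this issue.

```python
# Hex digits needed per cell for each cell type (assumed, see lead-in).
HEX_DIGITS_PER_CELL = {"int8": 2, "bfloat16": 4, "float": 8, "double": 16}


def check_hex_tensor(hex_value: str, cell_type: str, dense_size: int) -> None:
    """Raise ValueError if hex_value cannot fill a dense tensor of dense_size cells."""
    width = HEX_DIGITS_PER_CELL[cell_type]
    expected = width * dense_size
    if len(hex_value) != expected:
        raise ValueError(
            f"hex string has {len(hex_value)} digits, but tensor<{cell_type}> "
            f"with {dense_size} cell(s) needs {expected} ({width} per cell)"
        )
    # Also reject strings that contain non-hex characters.
    bytes.fromhex(hex_value)


if __name__ == "__main__":
    check_hex_tensor("3F9E0610", "float", 1)  # passes silently
    try:
        check_hex_tensor("3F9E0610", "bfloat16", 1)
    except ValueError as err:
        print(err)
```

With the value from the reproduction, the check reports that 8 digits were supplied where a one-cell bfloat16 tensor needs 4, which points at the encoding mismatch more directly than an index-out-of-bounds error does.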

Environment:

  • Infrastructure: Vespa Cloud, AWS, self-hosted

Vespa version
8.625.17

Additional context
SPANN type of deployment.
