feat: Add vector value type support to expression system #893

Open
AlexFilipImproving wants to merge 4 commits into valkey-io:main from Bit-Quill:feature/vector-value

@AlexFilipImproving
Collaborator

Overview

This PR adds native vector support to the Value type in valkey-search's expression system, so that vectors can be used in FT.AGGREGATE expressions. Redis, by comparison, treats vectors as opaque, output-only values.

Motivation

Currently, Redis FT.AGGREGATE has minimal vector operation support:

  • String functions on vectors return nil
  • Arithmetic operations fail with type errors
  • No element access or vector-scalar operations

With this PR, valkey-search diverges from Redis by supporting vector operations directly in expressions, which enables more powerful data transformations.

Key Features

1. Vector as First-Class Value Type

  • Added std::shared_ptr<std::vector<Value>> variant to the Value type
  • Copying a vector value copies only the shared pointer, not the elements
  • Support for nested vectors (vectors of vectors)
  • Copy-on-write semantics for safe modifications
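
To make the memory model above concrete, here is a minimal sketch of a recursive value type with shared vector storage and copy-on-write. It is not the code in this PR: the names `Value`, `VectorPtr`, `MutableVector` and `MakeVector` are hypothetical, and the real class in src/expr/value.h may choose different alternatives and accessors.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical stand-in for the expression system's Value type; the real
// class in src/expr/value.h may differ in names and alternatives.
struct Value;
using VectorPtr = std::shared_ptr<std::vector<Value>>;

struct Value {
  // Nil, number, string, or a shared (possibly nested) vector of Values.
  std::variant<std::monostate, double, std::string, VectorPtr> data;

  bool IsVector() const { return std::holds_alternative<VectorPtr>(data); }

  // Read-only view of the elements. Precondition: IsVector().
  const std::vector<Value>& AsVector() const {
    assert(IsVector());
    return *std::get<VectorPtr>(data);
  }

  // Copy-on-write: clone the element storage only when another Value still
  // shares it. The clone is shallow, so nested vectors stay shared until they
  // are modified themselves. use_count() is only a reliable test when no
  // other thread is copying the Value at the same time.
  std::vector<Value>& MutableVector() {
    assert(IsVector());
    auto& ptr = std::get<VectorPtr>(data);
    if (ptr.use_count() > 1) {
      ptr = std::make_shared<std::vector<Value>>(*ptr);
    }
    return *ptr;
  }
};

// Builds a vector Value from a list of elements.
inline Value MakeVector(std::vector<Value> elems) {
  return Value{std::make_shared<std::vector<Value>>(std::move(elems))};
}
```

In this sketch, `Value b = a;` copies one pointer regardless of vector length, and the first call to `b.MutableVector()` detaches `b` from `a` before any element is changed.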

2. Vector-Scalar Operations

Apply scalar functions element-wise to vectors:

FT.AGGREGATE products "*"
  GROUPBY 1 @category
    REDUCE TOLIST 1 @name AS names
  APPLY "lower(@names)" AS lowercase_names

Supported functions: lower, upper, strlen, floor, ceil, abs, log, sqrt, and more.
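
One way to picture the element-wise behaviour is a small helper that applies a scalar function either to a single value or to each element of a list. The sketch below assumes a simplified value that is a string or a flat list of strings; `StrValue`, `LowerScalar` and `MapElementwise` are hypothetical names and not part of this PR.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <variant>
#include <vector>

// Hypothetical simplified value: one string or a flat list of strings.
using StrValue = std::variant<std::string, std::vector<std::string>>;

// Scalar version of lower(): ASCII lower-casing of a single string.
inline std::string LowerScalar(std::string s) {
  std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  return s;
}

// Applies a scalar function to a scalar input, or to every element of a
// vector input, returning a result of the same shape.
template <typename Fn>
StrValue MapElementwise(const StrValue& in, Fn fn) {
  if (const auto* s = std::get_if<std::string>(&in)) {
    return fn(*s);
  }
  std::vector<std::string> out;
  for (const auto& elem : std::get<std::vector<std::string>>(in)) {
    out.push_back(fn(elem));
  }
  return out;
}
```

With this shape, `lower(@names)` on a TOLIST result corresponds to `MapElementwise(names, LowerScalar)`, while a scalar argument still takes the original scalar path.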

3. Vector Arithmetic with Broadcasting

Perform arithmetic operations with automatic scalar broadcasting:

# Vector + scalar
APPLY "@prices + 10" AS adjusted_prices

# Vector + vector (element-wise)
APPLY "@prices1 + @prices2" AS combined_prices

Supported operations: +, -, *, /, ^
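
The broadcasting rule can be sketched as follows, assuming a simplified numeric value that is either a double or a flat list of doubles. `Num` and `Broadcast` are hypothetical names, and exceptions are used here only for brevity; the PR does not say how errors are propagated internally.

```cpp
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>
#include <variant>
#include <vector>

// Hypothetical simplified numeric value: a scalar or a flat vector.
using Num = std::variant<double, std::vector<double>>;

// Applies a binary operation with scalar broadcasting:
//   scalar op scalar -> scalar
//   vector op scalar -> vector (the scalar is reused for every element)
//   vector op vector -> vector (element-wise; lengths must match)
inline Num Broadcast(const Num& a, const Num& b,
                     const std::function<double(double, double)>& op) {
  const auto* av = std::get_if<std::vector<double>>(&a);
  const auto* bv = std::get_if<std::vector<double>>(&b);
  if (!av && !bv) {
    return op(std::get<double>(a), std::get<double>(b));
  }
  if (av && bv && av->size() != bv->size()) {
    throw std::invalid_argument(
        "Length mismatch: vectors have lengths " + std::to_string(av->size()) +
        " and " + std::to_string(bv->size()));
  }
  const std::size_t n = av ? av->size() : bv->size();
  std::vector<double> out(n);
  for (std::size_t i = 0; i < n; ++i) {
    const double lhs = av ? (*av)[i] : std::get<double>(a);
    const double rhs = bv ? (*bv)[i] : std::get<double>(b);
    out[i] = op(lhs, rhs);
  }
  return out;
}
```

Under these assumptions, `@prices + 10` maps to `Broadcast(prices, Num{10.0}, std::plus<double>())`, and two vectors of different lengths produce the length-mismatch error rather than a truncated result.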

4. Vector-Specific Functions

  • vectorlen(@vec) - Get vector length
  • vectorat(@vec, index) - Access element at index
  • isvector(@val) - Check if value is a vector
  • makevector(...) - Create vector from elements
  • flatten(@vec, depth) - Flatten nested vectors
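
As an illustration of how `flatten` and `vectorat` might behave on nested values, here is a sketch over a hypothetical recursive `Value` holding either a double or a shared vector. It assumes that `depth` counts how many levels of nesting are removed and that indices are zero-based; neither detail is confirmed beyond the examples in this description.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical recursive value: a number or a shared vector of values.
struct Value;
using VectorPtr = std::shared_ptr<std::vector<Value>>;
struct Value {
  std::variant<double, VectorPtr> data;
};

inline Value Scalar(double d) { return Value{d}; }
inline Value Vec(std::vector<Value> elems) {
  return Value{std::make_shared<std::vector<Value>>(std::move(elems))};
}

// Appends v to out, splicing nested vectors while depth allows it.
inline void FlattenInto(const Value& v, int depth, std::vector<Value>& out) {
  const auto* vec = std::get_if<VectorPtr>(&v.data);
  if (!vec || depth < 0) {
    out.push_back(v);
    return;
  }
  for (const Value& elem : **vec) FlattenInto(elem, depth - 1, out);
}

// Removes up to `depth` levels of nesting; scalars are returned unchanged.
inline Value Flatten(const Value& v, int depth) {
  const auto* vec = std::get_if<VectorPtr>(&v.data);
  if (!vec) return v;
  std::vector<Value> out;
  for (const Value& elem : **vec) FlattenInto(elem, depth - 1, out);
  return Vec(std::move(out));
}

// Zero-based element access with a bounds check. Throws std::bad_variant_access
// if v is not a vector.
inline const Value& VectorAt(const Value& v, std::size_t index) {
  const auto& elems = *std::get<VectorPtr>(v.data);
  if (index >= elems.size()) {
    throw std::out_of_range("Index out of bounds: index " +
                            std::to_string(index) + ", vector length " +
                            std::to_string(elems.size()));
  }
  return elems[index];
}
```

For the nested value `[[1, [2, 3]], 4]`, this sketch yields `[1, [2, 3], 4]` at depth 1 and `[1, 2, 3, 4]` at depth 2.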

5. Comprehensive Error Handling

Clear, actionable error messages:

  • Type mismatch: "Type error: cannot add vector to string"
  • Length mismatch: "Length mismatch: vectors have lengths 3 and 5"
  • Element errors: "Element error at index 2: division by zero"
  • Index errors: "Index out of bounds: index 5, vector length 3"

Implementation Details

Architecture

  • Memory Model: shared_ptr<vector<Value>> for efficient copying and passing
  • Type Safety: Recursive Value structure supports heterogeneous and nested vectors
  • Performance: O(1) copy operations, O(n) element-wise operations
  • Compatibility: Fully backward compatible with existing scalar operations

Modified Components

  • src/expr/value.h / value.cc - Core Value type extension
  • src/expr/functions.cc - Vector operation support in all functions
  • src/expr/comparison.cc - Lexicographic vector comparison
  • src/expr/serialization.cc - RESP array serialization for vectors
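
For readers unfamiliar with RESP, the sketch below shows how a nested vector could map onto RESP arrays. It assumes every scalar is emitted as a bulk string; how the PR actually encodes numbers and nil is not stated here, and `Value`, `Str`, `Vec` and `ToResp` are hypothetical names.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical recursive value: a string or a shared vector of values.
struct Value;
using VectorPtr = std::shared_ptr<std::vector<Value>>;
struct Value {
  std::variant<std::string, VectorPtr> data;
};

inline Value Str(std::string s) { return Value{std::move(s)}; }
inline Value Vec(std::vector<Value> elems) {
  return Value{std::make_shared<std::vector<Value>>(std::move(elems))};
}

// Encodes a value in RESP: strings become bulk strings ("$<len>\r\n...\r\n"),
// vectors become arrays ("*<count>\r\n" followed by each encoded element).
inline std::string ToResp(const Value& v) {
  if (const auto* s = std::get_if<std::string>(&v.data)) {
    return "$" + std::to_string(s->size()) + "\r\n" + *s + "\r\n";
  }
  const auto& elems = *std::get<VectorPtr>(v.data);
  std::string out = "*" + std::to_string(elems.size()) + "\r\n";
  for (const Value& elem : elems) out += ToResp(elem);
  return out;
}
```

Because the encoder recurses, nested vectors become nested arrays with no special casing.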

Testing

  • Unit Tests: Comprehensive coverage of vector operations, edge cases, and error conditions
  • Integration Tests: FT.AGGREGATE with vector operations
  • Nested Vector Tests: Multi-level nesting validation
  • Comparison Tests: Lexicographic ordering verification
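
The lexicographic ordering that the comparison tests verify can be illustrated with the standard library. This sketch covers only flat numeric vectors and assumes a three-way result; `CompareVectors` is a hypothetical name, and how the PR orders vectors against scalars or mixed element types is not covered here.

```cpp
#include <algorithm>
#include <vector>

// Lexicographic three-way comparison: returns -1, 0, or 1.
// Elements are compared pairwise from the front; if one vector is a prefix
// of the other, the shorter one orders first.
inline int CompareVectors(const std::vector<double>& a,
                          const std::vector<double>& b) {
  if (std::lexicographical_compare(a.begin(), a.end(), b.begin(), b.end())) {
    return -1;
  }
  if (std::lexicographical_compare(b.begin(), b.end(), a.begin(), a.end())) {
    return 1;
  }
  return 0;
}
```

Under this ordering, `[1, 2]` sorts before both `[1, 3]` and `[1, 2, 0]`, and `[2]` sorts after `[1, 9]`.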

Examples

Element-wise String Operations

FT.AGGREGATE products "*"
  GROUPBY 1 @category
    REDUCE TOLIST 1 @name AS names
  APPLY "upper(@names)" AS uppercase_names

Vector Arithmetic

FT.AGGREGATE sales "*"
  GROUPBY 1 @product
    REDUCE TOLIST 1 @price AS prices
  APPLY "@prices * 1.1" AS prices_with_tax
  APPLY "@prices_with_tax - @prices" AS tax_amounts

Vector Metadata

FT.AGGREGATE data "*"
  GROUPBY 1 @id
    REDUCE TOLIST 1 @value AS values
  APPLY "vectorlen(@values)" AS count
  APPLY "vectorat(@values, 0)" AS first_value

Breaking Changes

None. This is a purely additive feature that maintains full backward compatibility with existing scalar operations.

Performance Considerations

  • Vector copies are O(1) due to shared_ptr semantics
  • Element-wise operations are O(n) where n is vector length
  • No performance degradation for existing scalar operations
  • Memory-efficient through shared ownership model

Related Issues

Addresses the need for better vector operation support in FT.AGGREGATE expressions, enabling more powerful data transformations than Redis currently provides.

This commit adds comprehensive vector value type support to the expression
evaluation system, enabling vectors to be used in FT.AGGREGATE operations.

Key changes:
- Implement AsVector() accessor method for Value class
- Add Vector variant to expr::Value with std::vector<double> storage
- Implement vector-specific functions (VECLEN, VECGET, VECSET, etc.)
- Add vector support to mathematical and string functions
- Implement vector serialization to RESP format
- Add comprehensive error handling for vector operations
- Add vector comparison operators and nested vector support
- Refactor Value::Nil to use std::string instead of const char*

Testing:
- Add comprehensive unit tests for vector operations
- Add nested vector tests
- Add vector comparison tests

This also includes upstream changes from main branch merged during
development, including text search optimizations, compatibility tests,
and various bug fixes.

Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
@allenss-amazon
Member

I think what you call a vector should be called an Array. In the context of vector search, a "vector" field data type should be a scalar value when seen in the expression language. So we're really introducing two new data types: a vector (a scalar consisting of some number of floating-point numbers) and an Array (a one-dimensional list of Values). The broadcast pattern applies to Arrays, in and out.

For vector types, we need new distance functions (the same set as the vector search functions).
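
To illustrate what such distance functions over a scalar vector type might look like, here is a sketch using the textbook definitions of inner product, Euclidean (L2) distance and cosine distance. The function names are hypothetical, and the exact conventions used by the vector search functions (for example, squared versus unsquared L2) are not stated in this thread.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Sum of element-wise products; both vectors must have the same dimension.
inline double InnerProduct(const std::vector<float>& a,
                           const std::vector<float>& b) {
  if (a.size() != b.size()) throw std::invalid_argument("dimension mismatch");
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    sum += static_cast<double>(a[i]) * static_cast<double>(b[i]);
  }
  return sum;
}

// Euclidean distance: square root of the summed squared differences.
inline double L2Distance(const std::vector<float>& a,
                         const std::vector<float>& b) {
  if (a.size() != b.size()) throw std::invalid_argument("dimension mismatch");
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    const double diff = static_cast<double>(a[i]) - static_cast<double>(b[i]);
    sum += diff * diff;
  }
  return std::sqrt(sum);
}

// Cosine distance: 1 - cos(angle between a and b). Undefined for zero-length
// vectors, so those are rejected here.
inline double CosineDistance(const std::vector<float>& a,
                             const std::vector<float>& b) {
  const double norm_a = std::sqrt(InnerProduct(a, a));
  const double norm_b = std::sqrt(InnerProduct(b, b));
  if (norm_a == 0.0 || norm_b == 0.0) {
    throw std::invalid_argument("cosine distance of a zero vector");
  }
  return 1.0 - InnerProduct(a, b) / (norm_a * norm_b);
}
```

Each function takes two vectors and returns a single number, which matches the suggestion that a vector is a scalar from the expression language's point of view while Arrays carry the broadcast behaviour.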

Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
