
Refactor FT.AGGREGATE reducers to process records in batch #816

Open
AlexFilipImproving wants to merge 7 commits into valkey-io:main from Bit-Quill:feature/vector-reducer-functions

@AlexFilipImproving
Collaborator

Changed the ProcessRecords interface to accept all of a group's records at once instead of processing them one at a time. This enables more efficient implementations, particularly for COUNT, which now simply returns the input size instead of incrementing a counter for each record.
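
A minimal sketch of why batching helps COUNT. It assumes a simplified Value type (a std::variant of double and string) standing in for expr::Value, and uses std::vector in place of absl::InlinedVector so it builds with only the standard library. The class names PerRecordCount and BatchCount and the GetResult accessor are hypothetical and are not taken from the PR:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-in for expr::Value: either a number or a string.
using Value = std::variant<double, std::string>;
// One record's reducer arguments. The PR uses absl::InlinedVector<expr::Value, 4>;
// std::vector is used here only to stay standard-library-only.
using RecordArgs = std::vector<Value>;

// Before (hypothetical shape): invoked once per record, bumping a counter each time.
class PerRecordCount {
 public:
  void ProcessRecord(const RecordArgs& /*values*/) { ++count_; }
  double GetResult() const { return static_cast<double>(count_); }

 private:
  std::size_t count_ = 0;
};

// After (hypothetical shape): receives the whole group at once, so COUNT
// only needs the number of records in the batch.
class BatchCount {
 public:
  void ProcessRecords(const std::vector<RecordArgs>& all_values) {
    count_ += all_values.size();
  }
  double GetResult() const { return static_cast<double>(count_); }

 private:
  std::size_t count_ = 0;
};
```

In this sketch, a group of n records costs n reducer calls with the per-record shape, while the batched COUNT does constant work per group once the values have been collected.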

Changes:

  • Updated ReducerInstance::ProcessRecords to accept const std::vector<absl::InlinedVector<expr::Value, 4>>& all_values
  • Modified GroupBy::Execute to collect all values per group before calling ProcessRecords once
  • Updated all reducer implementations (Count, Min, Max, Sum, Avg, Stddev, CountDistinct) to process batched records
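
The collect-then-process flow described in the list above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: Value and RecordArgs are simplified stand-ins for expr::Value and absl::InlinedVector<expr::Value, 4>, and Reducer, SumReducer, CountDistinctReducer, InputRecord, GetResult and GroupByExecute are hypothetical names. Skipping non-numeric arguments in SUM is also an assumption made for the sketch:

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <set>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-ins for expr::Value and one record's reducer arguments.
using Value = std::variant<double, std::string>;
using RecordArgs = std::vector<Value>;

// Simplified, hypothetical version of the batched reducer interface.
class Reducer {
 public:
  virtual ~Reducer() = default;
  virtual void ProcessRecords(const std::vector<RecordArgs>& all_values) = 0;
  virtual Value GetResult() const = 0;
};

// SUM: one pass over the batch; non-numeric first arguments are skipped (assumption).
class SumReducer : public Reducer {
 public:
  void ProcessRecords(const std::vector<RecordArgs>& all_values) override {
    for (const auto& args : all_values) {
      if (args.empty()) continue;
      if (const double* d = std::get_if<double>(&args[0])) sum_ += *d;
    }
  }
  Value GetResult() const override { return sum_; }

 private:
  double sum_ = 0;
};

// COUNT_DISTINCT: insert each record's first argument into a set, report its size.
class CountDistinctReducer : public Reducer {
 public:
  void ProcessRecords(const std::vector<RecordArgs>& all_values) override {
    for (const auto& args : all_values) {
      if (!args.empty()) seen_.insert(args[0]);
    }
  }
  Value GetResult() const override { return static_cast<double>(seen_.size()); }

 private:
  std::set<Value> seen_;
};

struct InputRecord {
  std::string group_key;  // value of the GROUPBY field for this record
  RecordArgs args;        // evaluated reducer arguments for this record
};

// Phase 1 collects every record's arguments per group; phase 2 hands each
// group's batch to a fresh reducer with a single ProcessRecords call.
std::map<std::string, Value> GroupByExecute(
    const std::vector<InputRecord>& records,
    const std::function<std::unique_ptr<Reducer>()>& make_reducer) {
  std::map<std::string, std::vector<RecordArgs>> groups;
  for (const auto& rec : records) groups[rec.group_key].push_back(rec.args);

  std::map<std::string, Value> results;
  for (const auto& [key, batch] : groups) {
    std::unique_ptr<Reducer> reducer = make_reducer();
    reducer->ProcessRecords(batch);
    results.emplace(key, reducer->GetResult());
  }
  return results;
}
```

In this sketch, holding each group's values until the group is complete costs memory proportional to the group size; that is the trade-off for letting a reducer see its whole input in one call.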

AlexFilipImproving force-pushed the feature/vector-reducer-functions branch from fd077f8 to cc9c9c2 on February 24, 2026 22:34
@AlexFilipImproving
Collaborator Author

I also cleaned up the sortby_parameter that was added in my previous PR.

@AlexFilipImproving
Collaborator Author

I made the requested changes, but some of the compatibility tests appear to be failing for reasons unrelated to this change. For example, these failures look related to the DIALECT 2 implementation:

>>>>>>>>>>> ERROR FAILURE <<<<<<<<<<<<<<
>>>>>> Starting Test test_bad_numeric_data[json-2] So Far: Correct:0 Wrong:1 <<<<<<<<<
CMD: b'ft.search json_idx1 @n1:[-inf inf] DIALECT 2'
CMD:['ft.search', 'json_idx1', '@n1:[-inf inf]', 'DIALECT', '2']
Mismatched sizes RL:2 VK:3
--RL--
{'__key': b'json:0', '$': b'{"n1":0,"n2":0,"n3":0,"t1":"","t2":"","t3":"","v1":[0,0,0],"e1":1,"e2":"two"}'}
{'__key': b'json:4', '$': b'{"n1":0,"n2":0,"n3":0,"t2":"","t3":"","v1":[4,4,4],"e1":1,"e2":"two"}'}
--VK:--
{'__key': b'json:0', '$': b'{"n1":0,"n2":0,"n3":0,"t1":"","t2":"","t3":"","v1":[0,0,0],"e1":1,"e2":"two"}'}
{'__key': b'json:4', '$': b'{"n1":0,"n2":0,"n3":0,"t2":"","t3":"","v1":[4,4,4],"e1":1,"e2":"two"}'}
{'__key': b'json:5', '$': b'{"n1":0,"n2":0,"n3":0,"t2":"","t3":"","v1":[5,5,5,5]}'}
>>>>>>>>>>> ERROR FAILURE <<<<<<<<<<<<<<
>>>>>> Starting Test test_bad_numeric_data[json-2] So Far: Correct:0 Wrong:2 <<<<<<<<<
CMD: b'ft.search json_idx1 @n1:[-inf inf] DIALECT 2'
CMD:['ft.search', 'json_idx1', '@n1:[-inf inf]', 'DIALECT', '2']
Mismatched sizes RL:2 VK:3
--RL--
{'__key': b'json:0', '$': b'{"n1":0,"n2":0,"n3":0,"t1":"","t2":"","t3":"","v1":[0,0,0],"e1":1,"e2":"two"}'}
{'__key': b'json:4', '$': b'{"n1":0,"n2":0,"n3":0,"t2":"","t3":"","v1":[4,4,4],"e1":1,"e2":"two"}'}
--VK:--
{'__key': b'json:0', '$': b'{"n1":0,"n2":0,"n3":0,"t1":"","t2":"","t3":"","v1":[0,0,0],"e1":1,"e2":"two"}'}
{'__key': b'json:4', '$': b'{"n1":0,"n2":0,"n3":0,"t2":"","t3":"","v1":[4,4,4],"e1":1,"e2":"two"}'}
{'__key': b'json:5', '$': b'{"n1":0,"n2":0,"n3":0,"t2":"","t3":"","v1":[5,5,5,5]}'}

@BCathcart
Collaborator

@boda26, would you mind taking a look at the failing compatibility test? I wondered whether my recent change caused it, but this looks like a basic numeric search, which should be unaffected.

@BCathcart
Collaborator

The failures are probably fixed by #838.

AlexFilipImproving force-pushed the feature/vector-reducer-functions branch from d51f13e to 2819935 on March 11, 2026 18:52
This commit squashes 59 commits containing:
- Refactored FT.AGGREGATE reducers to process records in batch
- Vector reducer function optimizations
- Text search performance improvements
- Compatibility test enhancements
- Multi-DB support for FT.SEARCH
- Various bug fixes and test coverage improvements
- Locking optimizations for text indexes
- Memory allocation improvements

Key changes:
- Changed ProcessRecords interface to accept all records at once for batch processing
- Count reducer is now O(1) per group (returns the batch size) instead of making O(n) per-record function calls
- Improved text index locking and performance
- Added comprehensive timeout coverage for search execution
- Enhanced compatibility testing framework
- Fixed lazy expiration crash in search commands
- Optimized stemming, tokenization, and rax mutations
- Added ingestion performance scenarios
- Better separation between data collection and processing phases

Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
AlexFilipImproving force-pushed the feature/vector-reducer-functions branch from 2819935 to 0117d89 on March 11, 2026 19:02
Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
AlexFilipImproving force-pushed the feature/vector-reducer-functions branch from 185d720 to 225c756 on March 11, 2026 20:05

3 participants