feat: add Random Sample reducer to FT.Aggregate groupby stage by rileydes-improving · Pull Request #949 · valkey-io/valkey-search

rileydes-improving · 2026-04-01T21:39:17Z

Summary

Adds a RANDOM_SAMPLE reducer to FT.AGGREGATE GROUPBY stage, enabling reservoir-sampled field values to be returned as RESP arrays from aggregate queries in RediSearch compatible behavior (Not testable as sampling is random)

Changes

RandomSample class in ft_aggregate_exec.cc implements Algorithm R reservoir sampling with nil-skipping, a 1000-element cap, and a thread_local std::mt19937; registered in GroupBy::reducerTable as RANDOM_SAMPLE with 2 arguments (@ field, sample_size).
ft.aggregate.json and ft.aggregate.md updated to include RANDOM_SAMPLE as a valid reducer.
Uses vector value type from feat: Add vector value type support to expression system #893

Testing

Unit: ft_aggregate_exec_test.cc covers empty group, single element, sample < / == / > group size, nil exclusion, string and mixed-type inputs, multiple independent reducers, and per-group isolation
Integration: test_non_vector.py covers basic sampling, size < / == / > group, sample size 0, multiple reducers, per-category GROUPBY, and numeric field values.
Compatibility tests not added

Example

FT.AGGREGATE products "@price:[1 1000]"
  LOAD 1 price  APPLY 1 AS all
  GROUPBY 1 @all
  REDUCE RANDOM_SAMPLE 2 @price 5 AS sample

Returns one group with sample as a RESP array of up to 5 randomly selected price values.

This commit squashes 59 commits containing: - Refactored FT.AGGREGATE reducers to process records in batch - Vector reducer function optimizations - Text search performance improvements - Compatibility test enhancements - Multi-DB support for FT.SEARCH - Various bug fixes and test coverage improvements - Locking optimizations for text indexes - Memory allocation improvements Key changes: - Changed ProcessRecords interface to accept all records at once for batch processing - Count reducer now O(1) instead of O(n) function calls - Improved text index locking and performance - Added comprehensive timeout coverage for search execution - Enhanced compatibility testing framework - Fixed lazy expiration crash in search commands - Optimized stemming, tokenization, and rax mutations - Added ingestion performance scenarios - Better separation between data collection and processing phases Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>

Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>

allenss-amazon · 2026-04-02T21:59:32Z

I didn't review this because it's largely a duplicate of 946. It's too hard to see the differences. Let's fixup 946 and then rebase this one on top of it.

Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>

Refactor GroupBy reducers from batch ProcessRecords to incremental ProcessRecord, allowing each reducer to process records as they arrive instead of collecting all values first. Reducers now own their parsing logic via BasicReducerParser, decoupling reducer construction from the aggregate parser. ReducerInfo becomes a factory function pointer and Reducer is now a virtual base class with MakeInstance. Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>

Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>

This commit adds comprehensive vector value type support to the expression evaluation system, enabling vectors to be used in FT.AGGREGATE operations. Key changes: - Implement AsVector() accessor method for Value class - Add Vector variant to expr::Value with std::vector<double> storage - Implement vector-specific functions (VECLEN, VECGET, VECSET, etc.) - Add vector support to mathematical and string functions - Implement vector serialization to RESP format - Add comprehensive error handling for vector operations - Add vector comparison operators and nested vector support - Refactor Value::Nil to use std::string instead of const char* Testing: - Add comprehensive unit tests for vector operations - Add nested vector tests - Add vector comparison tests This also includes upstream changes from main branch merged during development, including text search optimizations, compatibility tests, and various bug fixes. Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>

Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>

note: ProcessRecords cannot propogate errors which is why errors return nill - Add RANDOM_SAMPLE reducer to ft.aggregate command - Update command documentation with RANDOM_SAMPLE syntax and description - Implement RANDOM_SAMPLE parsing in ft_aggregate_parser.h - Add RANDOM_SAMPLE execution logic in ft_aggregate_exec.cc - Update ft.aggregate.json command schema with RANDOM_SAMPLE definition - Add comprehensive unit and integration tests for RANDOM_SAMPLE Signed-off-by: Riley Des <riley.desserre@improving.com>

Signed-off-by: Riley Des <riley.desserre@improving.com>

…ic once Signed-off-by: Riley Des <riley.desserre@improving.com>

Signed-off-by: Riley Des <riley.desserre@improving.com>

…nges - Move sample_size_ initialization from ProcessRecords to constructor - Change ProcessRecords to ProcessRecord for single-record processing - Extract RandomSampleReducer struct with custom parser for parse-time validation - Validate sample size at parse time instead of runtime for better error handling - Remove ReducerInstanceVector struct (no longer needed) - Register RandomSampleReducerParser in reducerTable for RANDOM_SAMPLE command - Simplifies runtime processing by pre-validating parameters during aggregation setup (matching Redis compatibility) Signed-off-by: Riley Des <riley.desserre@improving.com>

…ity tests - Add comprehensive error case tests for RANDOM_SAMPLE reducer in compatibility test suite - Include validation for wrong argument counts, non-numeric/non-integer sample sizes, negative values, and values exceeding the 1000 cap - Remove duplicate error handling tests from test_non_vector.py that are now covered by compatibility tests - Add TryParseStages helper method to ft_aggregate_exec_test.cc for testing parse-time error paths - Add RandomSampleParseErrorsTest to verify negative sample sizes and other invalid inputs are rejected at parse time - Consolidate error validation into compatibility layer to avoid redundant test coverage Signed-off-by: Riley Des <riley.desserre@improving.com>

Signed-off-by: Riley Des <riley.desserre@improving.com>

AlexFilipImproving added 9 commits March 11, 2026 15:02

Merge branch 'main' into feature/vector-reducer-functions

5bc847c

Merge main and regenerate tests

29ed9d8

Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>

Fix changes that git messed up in the merge

225c756

Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>

Merge branch 'main' into feature/vector-reducer-functions

8280227

Fix some integration tests

6f9d638

Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>

Merge branch 'main' into feature/vector-reducer-functions

32aeebd

Merge branch 'main' into feature/vector-reducer-functions

1841331

Merge branch 'main' into feature/vector-reducer-functions

1b72e51

Merge branch 'main' into feature/vector-reducer-functions

23a5618

rileydes-improving force-pushed the feature/random-sample-implementation branch from f3c20cd to 5aca69c Compare April 13, 2026 17:57

Remove test regeneration in generate.py

ff40960

Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>

rileydes-improving force-pushed the feature/random-sample-implementation branch from 5aca69c to 221ced0 Compare April 15, 2026 15:54

AlexFilipImproving and others added 16 commits April 17, 2026 17:25

Remove subprocess import

6cda697

Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>

Merge branch 'main' into feature/vector-reducer-functions

8451a99

Fix formatting issue

4cfd0a4

Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>

Formatting

743f18c

Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>

Change name from Vector to Array and modify related constructor in Value

8b6eed5

Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>

Format fix

5097428

Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>

test: pruning RANDOM_SAMPLE unit tests to only essential

c7ac819

Signed-off-by: Riley Des <riley.desserre@improving.com>

format: autoformat changes

1fa83d1

Signed-off-by: Riley Des <riley.desserre@improving.com>

refactor(ft_aggregate): running RandomSample initialization logic log…

bbe70f9

…ic once Signed-off-by: Riley Des <riley.desserre@improving.com>

refactor(ft_aggregate_exec_test): update Vector to Array naming

44b91ea

Signed-off-by: Riley Des <riley.desserre@improving.com>

style: autoformat changes

bae15fe

Signed-off-by: Riley Des <riley.desserre@improving.com>

rileydes-improving force-pushed the feature/random-sample-implementation branch from 221ced0 to bae15fe Compare April 23, 2026 17:28

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add Random Sample reducer to FT.Aggregate groupby stage#949

feat: add Random Sample reducer to FT.Aggregate groupby stage#949
rileydes-improving wants to merge 27 commits intovalkey-io:mainfrom
Bit-Quill:feature/random-sample-implementation

rileydes-improving commented Apr 1, 2026

Uh oh!

allenss-amazon commented Apr 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

rileydes-improving commented Apr 1, 2026

Summary

Changes

Testing

Example

Uh oh!

allenss-amazon commented Apr 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants