feat: add Random Sample reducer to FT.Aggregate groupby stage#949
Open
rileydes-improving wants to merge 27 commits intovalkey-io:mainfrom
Open
feat: add Random Sample reducer to FT.Aggregate groupby stage#949rileydes-improving wants to merge 27 commits intovalkey-io:mainfrom
rileydes-improving wants to merge 27 commits intovalkey-io:mainfrom
Conversation
This commit squashes 59 commits containing: - Refactored FT.AGGREGATE reducers to process records in batch - Vector reducer function optimizations - Text search performance improvements - Compatibility test enhancements - Multi-DB support for FT.SEARCH - Various bug fixes and test coverage improvements - Locking optimizations for text indexes - Memory allocation improvements Key changes: - Changed ProcessRecords interface to accept all records at once for batch processing - Count reducer now O(1) instead of O(n) function calls - Improved text index locking and performance - Added comprehensive timeout coverage for search execution - Enhanced compatibility testing framework - Fixed lazy expiration crash in search commands - Optimized stemming, tokenization, and rax mutations - Added ingestion performance scenarios - Better separation between data collection and processing phases Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Member
|
I didn't review this because it's largely a duplicate of 946. It's too hard to see the differences. Let's fixup 946 and then rebase this one on top of it. |
f3c20cd to
5aca69c
Compare
Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
5aca69c to
221ced0
Compare
Refactor GroupBy reducers from batch ProcessRecords to incremental ProcessRecord, allowing each reducer to process records as they arrive instead of collecting all values first. Reducers now own their parsing logic via BasicReducerParser, decoupling reducer construction from the aggregate parser. ReducerInfo becomes a factory function pointer and Reducer is now a virtual base class with MakeInstance. Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
This commit adds comprehensive vector value type support to the expression evaluation system, enabling vectors to be used in FT.AGGREGATE operations. Key changes: - Implement AsVector() accessor method for Value class - Add Vector variant to expr::Value with std::vector<double> storage - Implement vector-specific functions (VECLEN, VECGET, VECSET, etc.) - Add vector support to mathematical and string functions - Implement vector serialization to RESP format - Add comprehensive error handling for vector operations - Add vector comparison operators and nested vector support - Refactor Value::Nil to use std::string instead of const char* Testing: - Add comprehensive unit tests for vector operations - Add nested vector tests - Add vector comparison tests This also includes upstream changes from main branch merged during development, including text search optimizations, compatibility tests, and various bug fixes. Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
note: ProcessRecords cannot propogate errors which is why errors return nill - Add RANDOM_SAMPLE reducer to ft.aggregate command - Update command documentation with RANDOM_SAMPLE syntax and description - Implement RANDOM_SAMPLE parsing in ft_aggregate_parser.h - Add RANDOM_SAMPLE execution logic in ft_aggregate_exec.cc - Update ft.aggregate.json command schema with RANDOM_SAMPLE definition - Add comprehensive unit and integration tests for RANDOM_SAMPLE Signed-off-by: Riley Des <riley.desserre@improving.com>
Signed-off-by: Riley Des <riley.desserre@improving.com>
Signed-off-by: Riley Des <riley.desserre@improving.com>
…ic once Signed-off-by: Riley Des <riley.desserre@improving.com>
Signed-off-by: Riley Des <riley.desserre@improving.com>
…nges - Move sample_size_ initialization from ProcessRecords to constructor - Change ProcessRecords to ProcessRecord for single-record processing - Extract RandomSampleReducer struct with custom parser for parse-time validation - Validate sample size at parse time instead of runtime for better error handling - Remove ReducerInstanceVector struct (no longer needed) - Register RandomSampleReducerParser in reducerTable for RANDOM_SAMPLE command - Simplifies runtime processing by pre-validating parameters during aggregation setup (matching Redis compatibility) Signed-off-by: Riley Des <riley.desserre@improving.com>
…ity tests - Add comprehensive error case tests for RANDOM_SAMPLE reducer in compatibility test suite - Include validation for wrong argument counts, non-numeric/non-integer sample sizes, negative values, and values exceeding the 1000 cap - Remove duplicate error handling tests from test_non_vector.py that are now covered by compatibility tests - Add TryParseStages helper method to ft_aggregate_exec_test.cc for testing parse-time error paths - Add RandomSampleParseErrorsTest to verify negative sample sizes and other invalid inputs are rejected at parse time - Consolidate error validation into compatibility layer to avoid redundant test coverage Signed-off-by: Riley Des <riley.desserre@improving.com>
Signed-off-by: Riley Des <riley.desserre@improving.com>
221ced0 to
bae15fe
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds a RANDOM_SAMPLE reducer to FT.AGGREGATE GROUPBY stage, enabling reservoir-sampled field values to be returned as RESP arrays from aggregate queries in RediSearch compatible behavior (Not testable as sampling is random)
Changes
RandomSampleclass inft_aggregate_exec.ccimplements Algorithm R reservoir sampling with nil-skipping, a 1000-element cap, and a thread_local std::mt19937; registered in GroupBy::reducerTable as RANDOM_SAMPLE with 2 arguments (@ field, sample_size).ft.aggregate.jsonandft.aggregate.mdupdated to includeRANDOM_SAMPLEas a valid reducer.Testing
Example
Returns one group with sample as a RESP array of up to 5 randomly selected price values.