[DO NOT MERGE, ONLY FOR REVIEW] Feature/random sample implementation#5
Draft
rileydes-improving wants to merge 28 commits intofeature/vector-reducer-functionsfrom
Draft
Conversation
This commit adds comprehensive vector value type support to the expression evaluation system, enabling vectors to be used in FT.AGGREGATE operations. Key changes: - Implement AsVector() accessor method for Value class - Add Vector variant to expr::Value with std::vector<double> storage - Implement vector-specific functions (VECLEN, VECGET, VECSET, etc.) - Add vector support to mathematical and string functions - Implement vector serialization to RESP format - Add comprehensive error handling for vector operations - Add vector comparison operators and nested vector support - Refactor Value::Nil to use std::string instead of const char* Testing: - Add comprehensive unit tests for vector operations - Add nested vector tests - Add vector comparison tests This also includes upstream changes from main branch merged during development, including text search optimizations, compatibility tests, and various bug fixes. Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Changed the ProcessRecords interface to accept all records at once instead of processing them one at a time. This enables more efficient implementations, particularly for COUNT which now simply returns the input size rather than incrementing a counter for each record. Changes: - Updated ReducerInstance::ProcessRecords to accept const std::vector<absl::InlinedVector<expr::Value, 4>>& all_values - Modified GroupBy::Execute to collect all values per group before calling ProcessRecords once - Updated all reducer implementations (Count, Min, Max, Sum, Avg, Stddev, CountDistinct) to process batched records Benefits: - Count reducer is now O(1) instead of O(n) function calls - Better separation between data collection and processing phases - Enables future optimizations for batch-aware reducers Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Signed-off-by: Allen Samuels <allenss@amazon.com>
Signed-off-by: Allen Samuels <allenss@amazon.com>
Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Add AsVector() method that returns optional<shared_ptr<vector<Value>>> for safe type checking and conversion. Returns nullopt for non-vector values. Also add comprehensive tests for all vector accessor methods: - GetVector() returns correct shared_ptr - GetVectorElement() with bounds checking - AsVector() returns value for vectors, nullopt for scalars - Shared_ptr semantics verification - Nested vector access Task: 2.3 Add immutable accessor methods Spec: vector-value-type Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Add support for vector data type in the expression value system: - Add kVector type to Value::Type enum - Implement vector storage using std::vector<double> - Add vector construction, comparison, and type checking - Include comprehensive unit tests for vector operations Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Add element-wise vector support to FuncFloor, FuncCeil, FuncAbs, FuncLog, FuncLog2, FuncExp, and FuncSqrt using ApplyToElements helper. This enables mathematical operations on vectors in FT.AGGREGATE expressions, applying the function to each element while preserving vector structure. Completes task 5.2 from vector-value-type spec. Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Add comprehensive error handling for vector operations with clear, descriptive error messages: - Add error message generation functions: * MakeTypeMismatchError() for type incompatibilities * MakeLengthMismatchError() for vector length mismatches * MakeIndexOutOfBoundsError() for invalid indices * MakeElementError() for element-level errors with index tracking - Simplify vector operation helpers by accepting Value::Vector directly: * ApplyToElements() - catches and wraps element errors with index * ApplyWithScalar() - catches and wraps element errors with index * ApplyElementWise() - checks lengths and reports mismatches All error cases return Value(Nil(reason)) with descriptive messages that include context like element indices and actual vector lengths. Addresses requirements 3.4, 4.3, 5.1, 5.2, 5.3, 5.4 from the vector value type specification. Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Add unit tests covering all vector comparison scenarios: - Equal vectors (same elements, empty, single-element, strings, mixed types, nested) - Vectors differing in elements (first, middle, last, strings, negatives, nested) - Vectors differing in length (shorter vs longer, empty vs non-empty) - Vector vs scalar comparisons (returns UNORDERED for all scalar types) Tests validate lexicographic comparison, length-based comparison when prefixes match, and proper UNORDERED handling for vector-scalar cases. Validates Requirement 7.1 from vector-value-type spec. Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
- Add SerializeValueToResp() function for recursive RESP serialization - Vectors serialize as RESP arrays with nested element serialization - Add DeserializeValueFromResp() for future TOLIST/RDB support - Update ReplyWithValue() in ft_aggregate.cc to handle vector values - Add tests for vector serialization and deserialization - Supports nested vectors and mixed-type vectors Implements task 9 from vector-value-type spec (Requirements 2.1, 2.2, 2.4, 10.4) Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Add five new vector-specific functions to the Value type: - FuncVectorLen: returns the length of a vector - FuncVectorAt: accesses vector elements by index - FuncIsVector: checks if a value is a vector - FuncMakeVector: creates a vector from multiple values - FuncFlatten: flattens nested vectors to a specified depth These functions enable vector manipulation in FT.AGGREGATE expressions and provide essential vector operations for the expression system. Implements requirements 1.2, 6.1, 6.2, 11.1, 11.2 from the vector-value-type spec. Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Add tests for nested vector construction at 2 and 3 levels of nesting, including mixed depths and empty inner vectors. Add tests verifying that scalar functions (FuncLower, FuncFloor, FuncCeil) and arithmetic operations recursively apply to nested vector elements. Add tests for element access and FuncVectorLen on nested structures. These tests validate Requirements 11.1, 11.2, and 11.3 for nested vector support in the vector value type implementation. Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Replace const char* with std::string in Value::Nil to avoid potential dangling pointer issues. The Nil class now owns the error message string, making it safer when temporary strings are passed as error reasons. Changes: - Nil constructor now takes std::string by value - Internal storage changed from const char* to std::string - GetReason() returns c_str() for backward compatibility - Updated all call sites to pass std::string directly - Improved test assertions to check exact error messages Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Update FuncStrlen, FuncStartswith, and FuncContains to support vector operands with element-wise application and broadcasting semantics. - FuncStrlen: Apply to each element of a vector - FuncStartswith: Support vector-scalar, scalar-vector, and vector-vector - FuncContains: Support vector-scalar, scalar-vector, and vector-vector All functions now check IsVector() only once per value for efficiency. Mathematical functions (FuncFloor, FuncCeil, FuncAbs, FuncLog, FuncLog2, FuncExp, FuncSqrt) already had vector support implemented. This completes task 5 from the vector-value-type spec. Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
note: ProcessRecords cannot propogate errors which is why errors return nill - Add RANDOM_SAMPLE reducer to ft.aggregate command - Update command documentation with RANDOM_SAMPLE syntax and description - Implement RANDOM_SAMPLE parsing in ft_aggregate_parser.h - Add RANDOM_SAMPLE execution logic in ft_aggregate_exec.cc - Update ft.aggregate.json command schema with RANDOM_SAMPLE definition - Add comprehensive unit and integration tests for RANDOM_SAMPLE Signed-off-by: Riley Des <riley.desserre@improving.com>
Signed-off-by: Riley Des <riley.desserre@improving.com>
Signed-off-by: Riley Des <riley.desserre@improving.com>
…ving error checking flow, fixing off by one in reservoir sampling Signed-off-by: Riley Des <riley.desserre@improving.com>
Signed-off-by: Riley Des <riley.desserre@improving.com>
…re/random-sample-implementation
930e4b9 to
faf7078
Compare
Signed-off-by: Riley Des <riley.desserre@improving.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.