Skip to content

[DO NOT MERGE, ONLY FOR REVIEW] Feature/random sample implementation#5

Draft
rileydes-improving wants to merge 28 commits intofeature/vector-reducer-functionsfrom
feature/random-sample-implementation
Draft

[DO NOT MERGE, ONLY FOR REVIEW] Feature/random sample implementation#5
rileydes-improving wants to merge 28 commits intofeature/vector-reducer-functionsfrom
feature/random-sample-implementation

Conversation

@rileydes-improving
Copy link
Copy Markdown

No description provided.

This commit adds comprehensive vector value type support to the expression
evaluation system, enabling vectors to be used in FT.AGGREGATE operations.

Key changes:
- Implement AsVector() accessor method for Value class
- Add Vector variant to expr::Value with std::vector<double> storage
- Implement vector-specific functions (VECLEN, VECGET, VECSET, etc.)
- Add vector support to mathematical and string functions
- Implement vector serialization to RESP format
- Add comprehensive error handling for vector operations
- Add vector comparison operators and nested vector support
- Refactor Value::Nil to use std::string instead of const char*

Testing:
- Add comprehensive unit tests for vector operations
- Add nested vector tests
- Add vector comparison tests

This also includes upstream changes from main branch merged during
development, including text search optimizations, compatibility tests,
and various bug fixes.

Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
@rileydes-improving rileydes-improving changed the title Feature/random sample implementation [DO NOT MERGE, ONLY FOR REVIEW] Feature/random sample implementation Mar 16, 2026
AlexFilipImproving and others added 26 commits March 16, 2026 10:52
Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Changed the ProcessRecords interface to accept all records at once instead of processing them one at a time. This enables more efficient implementations, particularly for COUNT which now simply returns the input size rather than incrementing a counter for each record.

Changes:
- Updated ReducerInstance::ProcessRecords to accept const std::vector<absl::InlinedVector<expr::Value, 4>>& all_values
- Modified GroupBy::Execute to collect all values per group before calling ProcessRecords once
- Updated all reducer implementations (Count, Min, Max, Sum, Avg, Stddev, CountDistinct) to process batched records

Benefits:
- Count reducer is now O(1) instead of O(n) function calls
- Better separation between data collection and processing phases
- Enables future optimizations for batch-aware reducers

Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Signed-off-by: Allen Samuels <allenss@amazon.com>
Signed-off-by: Allen Samuels <allenss@amazon.com>
Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Add AsVector() method that returns optional<shared_ptr<vector<Value>>>
for safe type checking and conversion. Returns nullopt for non-vector
values.

Also add comprehensive tests for all vector accessor methods:
- GetVector() returns correct shared_ptr
- GetVectorElement() with bounds checking
- AsVector() returns value for vectors, nullopt for scalars
- Shared_ptr semantics verification
- Nested vector access

Task: 2.3 Add immutable accessor methods
Spec: vector-value-type
Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Add support for vector data type in the expression value system:
- Add kVector type to Value::Type enum
- Implement vector storage using std::vector<double>
- Add vector construction, comparison, and type checking
- Include comprehensive unit tests for vector operations

Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Add element-wise vector support to FuncFloor, FuncCeil, FuncAbs, FuncLog,
FuncLog2, FuncExp, and FuncSqrt using ApplyToElements helper.

This enables mathematical operations on vectors in FT.AGGREGATE expressions,
applying the function to each element while preserving vector structure.

Completes task 5.2 from vector-value-type spec.

Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Add comprehensive error handling for vector operations with clear,
descriptive error messages:

- Add error message generation functions:
  * MakeTypeMismatchError() for type incompatibilities
  * MakeLengthMismatchError() for vector length mismatches
  * MakeIndexOutOfBoundsError() for invalid indices
  * MakeElementError() for element-level errors with index tracking

- Simplify vector operation helpers by accepting Value::Vector directly:
  * ApplyToElements() - catches and wraps element errors with index
  * ApplyWithScalar() - catches and wraps element errors with index
  * ApplyElementWise() - checks lengths and reports mismatches

All error cases return Value(Nil(reason)) with descriptive messages
that include context like element indices and actual vector lengths.

Addresses requirements 3.4, 4.3, 5.1, 5.2, 5.3, 5.4 from the vector
value type specification.

Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Add unit tests covering all vector comparison scenarios:
- Equal vectors (same elements, empty, single-element, strings, mixed types, nested)
- Vectors differing in elements (first, middle, last, strings, negatives, nested)
- Vectors differing in length (shorter vs longer, empty vs non-empty)
- Vector vs scalar comparisons (returns UNORDERED for all scalar types)

Tests validate lexicographic comparison, length-based comparison when
prefixes match, and proper UNORDERED handling for vector-scalar cases.

Validates Requirement 7.1 from vector-value-type spec.

Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
- Add SerializeValueToResp() function for recursive RESP serialization
- Vectors serialize as RESP arrays with nested element serialization
- Add DeserializeValueFromResp() for future TOLIST/RDB support
- Update ReplyWithValue() in ft_aggregate.cc to handle vector values
- Add tests for vector serialization and deserialization
- Supports nested vectors and mixed-type vectors

Implements task 9 from vector-value-type spec (Requirements 2.1, 2.2, 2.4, 10.4)

Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Add five new vector-specific functions to the Value type:
- FuncVectorLen: returns the length of a vector
- FuncVectorAt: accesses vector elements by index
- FuncIsVector: checks if a value is a vector
- FuncMakeVector: creates a vector from multiple values
- FuncFlatten: flattens nested vectors to a specified depth

These functions enable vector manipulation in FT.AGGREGATE expressions
and provide essential vector operations for the expression system.

Implements requirements 1.2, 6.1, 6.2, 11.1, 11.2 from the vector-value-type spec.

Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Add tests for nested vector construction at 2 and 3 levels of nesting,
including mixed depths and empty inner vectors. Add tests verifying that
scalar functions (FuncLower, FuncFloor, FuncCeil) and arithmetic operations
recursively apply to nested vector elements. Add tests for element access
and FuncVectorLen on nested structures.

These tests validate Requirements 11.1, 11.2, and 11.3 for nested vector
support in the vector value type implementation.

Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Replace const char* with std::string in Value::Nil to avoid potential
dangling pointer issues. The Nil class now owns the error message string,
making it safer when temporary strings are passed as error reasons.

Changes:
- Nil constructor now takes std::string by value
- Internal storage changed from const char* to std::string
- GetReason() returns c_str() for backward compatibility
- Updated all call sites to pass std::string directly
- Improved test assertions to check exact error messages

Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Update FuncStrlen, FuncStartswith, and FuncContains to support vector
operands with element-wise application and broadcasting semantics.

- FuncStrlen: Apply to each element of a vector
- FuncStartswith: Support vector-scalar, scalar-vector, and vector-vector
- FuncContains: Support vector-scalar, scalar-vector, and vector-vector

All functions now check IsVector() only once per value for efficiency.
Mathematical functions (FuncFloor, FuncCeil, FuncAbs, FuncLog, FuncLog2,
FuncExp, FuncSqrt) already had vector support implemented.

This completes task 5 from the vector-value-type spec.

Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
note: ProcessRecords cannot propogate errors which is why errors return nill

- Add RANDOM_SAMPLE reducer to ft.aggregate command
- Update command documentation with RANDOM_SAMPLE syntax and description
- Implement RANDOM_SAMPLE parsing in ft_aggregate_parser.h
- Add RANDOM_SAMPLE execution logic in ft_aggregate_exec.cc
- Update ft.aggregate.json command schema with RANDOM_SAMPLE definition
- Add comprehensive unit and integration tests for RANDOM_SAMPLE

Signed-off-by: Riley Des <riley.desserre@improving.com>
Signed-off-by: Riley Des <riley.desserre@improving.com>
Signed-off-by: Riley Des <riley.desserre@improving.com>
…ving error checking flow, fixing off by one in reservoir sampling

Signed-off-by: Riley Des <riley.desserre@improving.com>
Signed-off-by: Riley Des <riley.desserre@improving.com>
@rileydes-improving rileydes-improving force-pushed the feature/random-sample-implementation branch from 930e4b9 to faf7078 Compare March 16, 2026 18:21
Signed-off-by: Riley Des <riley.desserre@improving.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants