[DO NOT MERGE, ONLY FOR REVIEW] Feature/quantile implementation by rileydes-improving · Pull Request #3 · Bit-Quill/valkey-search

rileydes-improving · 2026-03-05T14:23:32Z

No description provided.

Resolves: valkey-io#703 Encountered the following series of crashes during engine startup. 1. SIGSEGV in `string2ll_resolver` — IFUNC resolver called before TSAN runtime initializes ``` Program received signal SIGSEGV, Segmentation fault. 0x00007ffff6de0fb0 in ?? () #0 0x00007ffff6de0fb0 in ?? () #1 0x00000000004b89a7 in string2ll_resolver.lto_priv () #2 0x00007ffff7de4d94 in _dl_relocate_object () from /lib64/ld-linux-x86-64.so.2 #3 0x00007ffff7ddceaa in dl_main () from /lib64/ld-linux-x86-64.so.2 #4 0x00007ffff7df0b5f in _dl_sysdep_start () from /lib64/ld-linux-x86-64.so.2 #5 0x00007ffff7ddad2e in _dl_start () from /lib64/ld-linux-x86-64.so.2 #6 0x00007ffff7dd9ef8 in _start () from /lib64/ld-linux-x86-64.so.2 valkey-io#7 0x0000000000000003 in ?? () valkey-io#8 0x00007fffffffe04c in ?? () valkey-io#9 0x00007fffffffe0b6 in ?? () valkey-io#10 0x00007fffffffe0bd in ?? () valkey-io#11 0x0000000000000000 in ?? () ``` 2. TSAN abort on `RTLD_DEEPBIND` — module loader uses `RTLD_DEEPBIND` which is incompatible with TSAN ``` cat /workplace/baswanth/tsan/valkey-search/testing/integration/.build-release-tsan/output/6379_stdout.txt 2>/dev/null | tail -20 45443:M 25 Feb 2026 01:57:58.222 # WARNING Memory overcommit must be enabled! Without it, a background save or replication may fail under low memory condition. Being disabled, it can also cause failures without low memory condition, see jemalloc/jemalloc#1328. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect. 45443:M 25 Feb 2026 01:57:58.222 * oO0OoO0OoO0Oo Valkey is starting oO0OoO0OoO0Oo 45443:M 25 Feb 2026 01:57:58.222 * Valkey version=9.0.1, bits=64, commit=ab3c953b, modified=1, pid=45443, just started 45443:M 25 Feb 2026 01:57:58.222 * Configuration loaded 45443:M 25 Feb 2026 01:57:58.224 * monotonic clock: POSIX clock_gettime 45443:M 25 Feb 2026 01:57:58.226 * Running mode=cluster, port=6379. 45443:M 25 Feb 2026 01:57:58.226 * No cluster configuration found, I'm a13c403e8a798f9ad051c654ff540ce1b3833b9d ==45443==You are trying to dlopen a /workplace/baswanth/tsan/valkey-search/.build-release-tsan/valkey-json/build/src/libjson.so shared library with RTLD_DEEPBIND flag which is incompatible with sanitizer runtime (see google/sanitizers#611 for details). If you want to run /workplace/baswanth/tsan/valkey-search/.build-release-tsan/valkey-json/build/src/libjson.so library under sanitizers please remove RTLD_DEEPBIND from dlopen flags. ``` 3. `CHECK failed: sanitizer_deadlock_detector.h:67` — TSAN deadlock detector overflows during module load ``` 7826:M 25 Feb 2026 02:39:45.284 * Valkey version=9.0.1, bits=64, commit=ab3c953b, modified=1, pid=17826, just started 17826:M 25 Feb 2026 02:39:45.284 * Configuration loaded 17826:M 25 Feb 2026 02:39:45.286 * monotonic clock: POSIX clock_gettime 17826:M 25 Feb 2026 02:39:45.288 * Running mode=cluster, port=6379. 17826:M 25 Feb 2026 02:39:45.289 * No cluster configuration found, I'm 134edf06efe1a2e084ce9c94a43dab67e8376dfc 17826:M 25 Feb 2026 02:39:45.298 . <module> Setting shards to 524287 ThreadSanitizer: CHECK failed: sanitizer_deadlock_detector.h:67 "((n_all_locks_)) < (((sizeof(all_locks_with_contexts_)/sizeof((all_locks_with_contexts_)[0]))))" (0x40, 0x40) (tid=17826) ``` core changes that needs to be added. ``` cd /workplace/baswanth/tsan/valkey-search/.build-release-tsan/valkey-server && git diff src/util.c src/module.c src/config.h diff --git a/src/config.h b/src/config.h index 0c6b111a9..f80fa8cbc 100644 --- a/src/config.h +++ b/src/config.h @@ -172,6 +172,10 @@ #define VALKEY_NO_SANITIZE(sanitizer) #endif +#if defined(__SANITIZE_THREAD__) +#define VALKEY_THREAD_SANITIZER 1 +#endif + #if defined(__SANITIZE_ADDRESS__) /* GCC */ #define VALKEY_ADDRESS_SANITIZER 1 diff --git a/src/module.c b/src/module.c index 8d1bad4fe..0435499a8 100644 --- a/src/module.c +++ b/src/module.c @@ -12545,7 +12545,7 @@ int moduleLoad(const char *path, void **module_argv, int module_argc, int is_loa } int dlopen_flags = RTLD_NOW | RTLD_LOCAL; -#if (defined(__GLIBC__) || defined(__FreeBSD__)) && !defined(VALKEY_ADDRESS_SANITIZER) && __has_include(<dlfcn.h>) +#if (defined(__GLIBC__) || defined(__FreeBSD__)) && !defined(VALKEY_ADDRESS_SANITIZER) && !defined(VALKEY_THREAD_SANITIZER) && __has_include(<dlfcn.h>) /* RTLD_DEEPBIND, which is required for loading modules that contains the * same symbols, does not work with ASAN. Therefore, we exclude * RTLD_DEEPBIND when doing test builds with ASAN. diff --git a/src/util.c b/src/util.c index 0631736c8..cf5a073b1 100644 --- a/src/util.c +++ b/src/util.c @@ -622,7 +622,7 @@ static int string2llScalar(const char *s, size_t slen, long long *value) { } #if HAVE_IFUNC && HAVE_X86_SIMD -__attribute__((no_sanitize_address, used)) static int (*string2ll_resolver(void))(const char *, size_t, long long *) { +__attribute__((no_sanitize_address, no_sanitize("thread"), used)) static int (*string2ll_resolver(void))(const char *, size_t, long long *) { /* Ifunc resolvers run before ASan initialization and before CPU detection * is initialized, so disable ASan and init CPU detection here. */ __builtin_cpu_init(); ``` --------- Signed-off-by: Baswanth Vegunta <baswanth@amazon.com> Co-authored-by: Baswanth Vegunta <baswanth@amazon.com>

Changed the ProcessRecords interface to accept all records at once instead of processing them one at a time. This enables more efficient implementations, particularly for COUNT which now simply returns the input size rather than incrementing a counter for each record. Changes: - Updated ReducerInstance::ProcessRecords to accept const std::vector<absl::InlinedVector<expr::Value, 4>>& all_values - Modified GroupBy::Execute to collect all values per group before calling ProcessRecords once - Updated all reducer implementations (Count, Min, Max, Sum, Avg, Stddev, CountDistinct) to process batched records Benefits: - Count reducer is now O(1) instead of O(n) function calls - Better separation between data collection and processing phases - Enables future optimizations for batch-aware reducers Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>

Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>

Signed-off-by: Allen Samuels <allenss@amazon.com>

Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>

…imation - Add QUANTILE reducer function using Greenwald-Khanna streaming algorithm - Support quantile values from 0.0 to 1.0 with 1% error bound for memory efficiency - Update ft.aggregate.md documentation with QUANTILE reducer specification - Add QUANTILE to ft.aggregate.json command schema with 2 required arguments - Implement QUANTILE in ft_aggregate_exec.cc with streaming quantile calculation - Update ft_aggregate_parser.cc and ft_aggregate_parser.h to parse QUANTILE syntax - Add comprehensive test coverage in ft_aggregate_parser_test.cc and ft_aggregate_exec_test.cc - Add compatibility tests for basic usage, multiple quantiles, edge cases, and error conditions - Add integration tests validating QUANTILE with various quantile values and groupby operations Signed-off-by: Riley Des <riley.desserre@improving.com>

…oordinator / iscancelled cases (valkey-io#901) The PR does the following: Update to oom flags on search commands. We are marking search commands with the denyoom flags following what was done since 1.0. Updated Error Tracking / Propagation to ensure we do not always return the "timeout" message even we cancel a fanout operation. Updated the normal code path of search commands (commands.cc) to first attempt replying with an error message based on the status. If no error, then check if canceled and return the standard timeout error. Note: The logic for early cancellation when partial results are disabled is still retained in this PR. If there is a timeout when partial results are disabled, we cancel without waiting and return the timed out error message. The parts which are new are the following: We have added handling for the case (with partial results disabled) and one of the fanouts fails with OOM. We now have correct with error propagation for it It also adds handling for the case (with partial results enabled) where we have not timed out, and all nodes have failed. Since it tracks the first error message, we are able to reply with the error now --------- Signed-off-by: KarthikSubbarao <karthikrs2021@gmail.com>

Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>

…ey-io#904) Signed-off-by: Elias Tamraz <tamrazelias@gmail.com>

Removing some scenarios due to benchamrk workflow taking too long. Removed 12b , 12c, 12e -> PositionMap extreme test already covered in 12a. 12b, 12c, 12e test the PositionMap differently but not needed now. Removed 14a -> failing on CME Removed 14c, 14e -> Removing Baseline case with SUFFIXTRIE enabled and STOPWORDS 0 since same performance as the baseline case, so not adding any value. Changed 13b to have 500 clients from 1000. --------- Signed-off-by: Aksha Thakkar <thaakb@amazon.com> Signed-off-by: AkshaThakkar1812 <akshathakkar@gmail.com> Co-authored-by: Aksha Thakkar <thaakb@amazon.com>

…y-io#896) kDefaultMinPrefixLength is annotated as “configurable”, but it is currently hardcoded and cannot be changed at runtime. This PR adds a module config (search.tag-min-prefix-length) to control the minimum prefix length for TAG wildcard queries (e.g., @field:{prefix*}). issue : valkey-io#895 Signed-off-by: Su Ko <rhtn1128@gmail.com>

rileydes-improving changed the title ~~Feature/quantile implementation~~ [DO NOT MERGE, ONLY FOR REVIEW] Feature/quantile implementation Mar 9, 2026

rileydes-improving force-pushed the feature/quantile-implementation branch from 5a2adda to 5e863aa Compare March 11, 2026 16:00

AlexFilipImproving force-pushed the feature/vector-reducer-functions branch 2 times, most recently from 2819935 to 0117d89 Compare March 11, 2026 19:02

AlexFilipImproving and others added 8 commits March 12, 2026 08:26

Cleanup sortby_parameter

eb7bf9b

Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>

Merge maps

249ff29

Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>

Fix tests

33003d6

Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>

Put RequiresCompeleteResults back

3540b0a

Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>

Fix bug in compatibility test

1a797e0

Signed-off-by: Allen Samuels <allenss@amazon.com>

Fixes for compatibility tests

03f2897

Signed-off-by: Allen Samuels <allenss@amazon.com>

Revert tests

82808c7

Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>

rileydes-improving force-pushed the feature/quantile-implementation branch from 5e863aa to a194333 Compare March 12, 2026 17:31

rileydes-improving force-pushed the feature/quantile-implementation branch from a194333 to 09f81ba Compare March 13, 2026 13:50

KarthikSubbarao and others added 5 commits March 16, 2026 13:54

Fix some integration tests

570d28f

Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>

fix: prevent infinite drain loop in forked BGSave child process (valk…

007b88d

…ey-io#904) Signed-off-by: Elias Tamraz <tamrazelias@gmail.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[DO NOT MERGE, ONLY FOR REVIEW] Feature/quantile implementation#3

[DO NOT MERGE, ONLY FOR REVIEW] Feature/quantile implementation#3
rileydes-improving wants to merge 14 commits intofeature/vector-reducer-functionsfrom
feature/quantile-implementation

rileydes-improving commented Mar 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

Conversation

rileydes-improving commented Mar 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants