Skip to content

[DO NOT MERGE, ONLY FOR REVIEW] Feature/quantile implementation#3

Draft
rileydes-improving wants to merge 14 commits intofeature/vector-reducer-functionsfrom
feature/quantile-implementation
Draft

[DO NOT MERGE, ONLY FOR REVIEW] Feature/quantile implementation#3
rileydes-improving wants to merge 14 commits intofeature/vector-reducer-functionsfrom
feature/quantile-implementation

Conversation

@rileydes-improving
Copy link
Copy Markdown

No description provided.

@rileydes-improving rileydes-improving changed the title Feature/quantile implementation [DO NOT MERGE, ONLY FOR REVIEW] Feature/quantile implementation Mar 9, 2026
@rileydes-improving rileydes-improving force-pushed the feature/quantile-implementation branch from 5a2adda to 5e863aa Compare March 11, 2026 16:00
@AlexFilipImproving AlexFilipImproving force-pushed the feature/vector-reducer-functions branch 2 times, most recently from 2819935 to 0117d89 Compare March 11, 2026 19:02
AlexFilipImproving pushed a commit that referenced this pull request Mar 11, 2026
Resolves: valkey-io#703 

Encountered the following series of crashes during engine startup. 
1. SIGSEGV in `string2ll_resolver` — IFUNC resolver called before TSAN
runtime initializes
```
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff6de0fb0 in ?? ()
#0  0x00007ffff6de0fb0 in ?? ()
#1  0x00000000004b89a7 in string2ll_resolver.lto_priv ()
#2  0x00007ffff7de4d94 in _dl_relocate_object () from /lib64/ld-linux-x86-64.so.2
#3  0x00007ffff7ddceaa in dl_main () from /lib64/ld-linux-x86-64.so.2
#4  0x00007ffff7df0b5f in _dl_sysdep_start () from /lib64/ld-linux-x86-64.so.2
#5  0x00007ffff7ddad2e in _dl_start () from /lib64/ld-linux-x86-64.so.2
#6  0x00007ffff7dd9ef8 in _start () from /lib64/ld-linux-x86-64.so.2
valkey-io#7  0x0000000000000003 in ?? ()
valkey-io#8  0x00007fffffffe04c in ?? ()
valkey-io#9  0x00007fffffffe0b6 in ?? ()
valkey-io#10 0x00007fffffffe0bd in ?? ()
valkey-io#11 0x0000000000000000 in ?? ()
```
2. TSAN abort on `RTLD_DEEPBIND` — module loader uses `RTLD_DEEPBIND`
which is incompatible with TSAN
```
cat /workplace/baswanth/tsan/valkey-search/testing/integration/.build-release-tsan/output/6379_stdout.txt 2>/dev/null | tail -20
45443:M 25 Feb 2026 01:57:58.222 # WARNING Memory overcommit must be enabled! Without it, a background save or replication may fail under low memory condition. Being disabled, it can also cause failures without low memory condition, see jemalloc/jemalloc#1328. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
45443:M 25 Feb 2026 01:57:58.222 * oO0OoO0OoO0Oo Valkey is starting oO0OoO0OoO0Oo
45443:M 25 Feb 2026 01:57:58.222 * Valkey version=9.0.1, bits=64, commit=ab3c953b, modified=1, pid=45443, just started
45443:M 25 Feb 2026 01:57:58.222 * Configuration loaded
45443:M 25 Feb 2026 01:57:58.224 * monotonic clock: POSIX clock_gettime
45443:M 25 Feb 2026 01:57:58.226 * Running mode=cluster, port=6379.
45443:M 25 Feb 2026 01:57:58.226 * No cluster configuration found, I'm a13c403e8a798f9ad051c654ff540ce1b3833b9d
==45443==You are trying to dlopen a /workplace/baswanth/tsan/valkey-search/.build-release-tsan/valkey-json/build/src/libjson.so shared library with RTLD_DEEPBIND flag which is incompatible with sanitizer runtime (see google/sanitizers#611 for details). If you want to run /workplace/baswanth/tsan/valkey-search/.build-release-tsan/valkey-json/build/src/libjson.so library under sanitizers please remove RTLD_DEEPBIND from dlopen flags.
```
3. `CHECK failed: sanitizer_deadlock_detector.h:67` — TSAN deadlock
detector overflows during module load
```
7826:M 25 Feb 2026 02:39:45.284 * Valkey version=9.0.1, bits=64, commit=ab3c953b, modified=1, pid=17826, just started
17826:M 25 Feb 2026 02:39:45.284 * Configuration loaded
17826:M 25 Feb 2026 02:39:45.286 * monotonic clock: POSIX clock_gettime
17826:M 25 Feb 2026 02:39:45.288 * Running mode=cluster, port=6379.
17826:M 25 Feb 2026 02:39:45.289 * No cluster configuration found, I'm 134edf06efe1a2e084ce9c94a43dab67e8376dfc
17826:M 25 Feb 2026 02:39:45.298 . <module> Setting shards to 524287
ThreadSanitizer: CHECK failed: sanitizer_deadlock_detector.h:67 "((n_all_locks_)) < (((sizeof(all_locks_with_contexts_)/sizeof((all_locks_with_contexts_)[0]))))" (0x40, 0x40) (tid=17826)
```

core changes that needs to be added.
```
cd /workplace/baswanth/tsan/valkey-search/.build-release-tsan/valkey-server && git diff src/util.c src/module.c src/config.h
diff --git a/src/config.h b/src/config.h
index 0c6b111a9..f80fa8cbc 100644
--- a/src/config.h
+++ b/src/config.h
@@ -172,6 +172,10 @@
 #define VALKEY_NO_SANITIZE(sanitizer)
 #endif
 
+#if defined(__SANITIZE_THREAD__)
+#define VALKEY_THREAD_SANITIZER 1
+#endif
+
 #if defined(__SANITIZE_ADDRESS__)
 /* GCC */
 #define VALKEY_ADDRESS_SANITIZER 1
diff --git a/src/module.c b/src/module.c
index 8d1bad4fe..0435499a8 100644
--- a/src/module.c
+++ b/src/module.c
@@ -12545,7 +12545,7 @@ int moduleLoad(const char *path, void **module_argv, int module_argc, int is_loa
     }
 
     int dlopen_flags = RTLD_NOW | RTLD_LOCAL;
-#if (defined(__GLIBC__) || defined(__FreeBSD__)) && !defined(VALKEY_ADDRESS_SANITIZER) && __has_include(<dlfcn.h>)
+#if (defined(__GLIBC__) || defined(__FreeBSD__)) && !defined(VALKEY_ADDRESS_SANITIZER) && !defined(VALKEY_THREAD_SANITIZER) && __has_include(<dlfcn.h>)
     /* RTLD_DEEPBIND, which is required for loading modules that contains the
      * same symbols, does not work with ASAN. Therefore, we exclude
      * RTLD_DEEPBIND when doing test builds with ASAN.
diff --git a/src/util.c b/src/util.c
index 0631736c8..cf5a073b1 100644
--- a/src/util.c
+++ b/src/util.c
@@ -622,7 +622,7 @@ static int string2llScalar(const char *s, size_t slen, long long *value) {
 }
 
 #if HAVE_IFUNC && HAVE_X86_SIMD
-__attribute__((no_sanitize_address, used)) static int (*string2ll_resolver(void))(const char *, size_t, long long *) {
+__attribute__((no_sanitize_address, no_sanitize("thread"), used)) static int (*string2ll_resolver(void))(const char *, size_t, long long *) {
     /* Ifunc resolvers run before ASan initialization and before CPU detection
      * is initialized, so disable ASan and init CPU detection here. */
     __builtin_cpu_init();
 ```

---------

Signed-off-by: Baswanth Vegunta <baswanth@amazon.com>
Co-authored-by: Baswanth Vegunta <baswanth@amazon.com>
AlexFilipImproving and others added 8 commits March 12, 2026 08:26
Changed the ProcessRecords interface to accept all records at once instead of processing them one at a time. This enables more efficient implementations, particularly for COUNT which now simply returns the input size rather than incrementing a counter for each record.

Changes:
- Updated ReducerInstance::ProcessRecords to accept const std::vector<absl::InlinedVector<expr::Value, 4>>& all_values
- Modified GroupBy::Execute to collect all values per group before calling ProcessRecords once
- Updated all reducer implementations (Count, Min, Max, Sum, Avg, Stddev, CountDistinct) to process batched records

Benefits:
- Count reducer is now O(1) instead of O(n) function calls
- Better separation between data collection and processing phases
- Enables future optimizations for batch-aware reducers

Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Signed-off-by: Allen Samuels <allenss@amazon.com>
Signed-off-by: Allen Samuels <allenss@amazon.com>
Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
@rileydes-improving rileydes-improving force-pushed the feature/quantile-implementation branch from 5e863aa to a194333 Compare March 12, 2026 17:31
…imation

- Add QUANTILE reducer function using Greenwald-Khanna streaming algorithm
- Support quantile values from 0.0 to 1.0 with 1% error bound for memory efficiency
- Update ft.aggregate.md documentation with QUANTILE reducer specification
- Add QUANTILE to ft.aggregate.json command schema with 2 required arguments
- Implement QUANTILE in ft_aggregate_exec.cc with streaming quantile calculation
- Update ft_aggregate_parser.cc and ft_aggregate_parser.h to parse QUANTILE syntax
- Add comprehensive test coverage in ft_aggregate_parser_test.cc and ft_aggregate_exec_test.cc
- Add compatibility tests for basic usage, multiple quantiles, edge cases, and error conditions
- Add integration tests validating QUANTILE with various quantile values and groupby operations

Signed-off-by: Riley Des <riley.desserre@improving.com>
@rileydes-improving rileydes-improving force-pushed the feature/quantile-implementation branch from a194333 to 09f81ba Compare March 13, 2026 13:50
KarthikSubbarao and others added 5 commits March 16, 2026 13:54
…oordinator / iscancelled cases (valkey-io#901)

The PR does the following:

Update to oom flags on search commands. We are marking search commands
with the denyoom flags following what was done since 1.0.
Updated Error Tracking / Propagation to ensure we do not always return
the "timeout" message even we cancel a fanout operation.
Updated the normal code path of search commands (commands.cc) to first
attempt replying with an error message based on the status. If no error,
then check if canceled and return the standard timeout error.
Note: The logic for early cancellation when partial results are disabled
is still retained in this PR. If there is a timeout when partial results
are disabled, we cancel without waiting and return the timed out error
message.

The parts which are new are the following:
We have added handling for the case (with partial results disabled) and
one of the fanouts fails with OOM. We now have correct with error
propagation for it
It also adds handling for the case (with partial results enabled) where
we have not timed out, and all nodes have failed. Since it tracks the
first error message, we are able to reply with the error now

---------

Signed-off-by: KarthikSubbarao <karthikrs2021@gmail.com>
Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Removing some scenarios due to benchamrk workflow taking too long. 

Removed 12b , 12c, 12e -> PositionMap extreme test already covered in
12a. 12b, 12c, 12e test the PositionMap differently but not needed now.

Removed 14a -> failing on CME

Removed 14c, 14e -> Removing Baseline case with SUFFIXTRIE enabled and
STOPWORDS 0 since same performance as the baseline case, so not adding
any value.

Changed 13b to have 500 clients from 1000.

---------

Signed-off-by: Aksha Thakkar <thaakb@amazon.com>
Signed-off-by: AkshaThakkar1812 <akshathakkar@gmail.com>
Co-authored-by: Aksha Thakkar <thaakb@amazon.com>
…y-io#896)

kDefaultMinPrefixLength is annotated as “configurable”, but it is
currently hardcoded and cannot be changed at runtime.
This PR adds a module config (search.tag-min-prefix-length) to control
the minimum prefix length for TAG wildcard queries (e.g.,
@field:{prefix*}).

issue : valkey-io#895

Signed-off-by: Su Ko <rhtn1128@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants