Conversation
allenss-amazon
left a comment
There was a problem hiding this comment.
It's important to handle the case where both SORTBY and LIMIT are specified. In this scenario, rather than a full sort of the neighbors, a heap should be used to retain only the required number of entries.
There was a problem hiding this comment.
The structure of the main data path of this code is wrong.
- It's on the wrong thread and thus looking at potentially stale data. The data for the sort needs to come out of fetched data, not the indexes.
- In cluster mode it's only operating on the local data, not the post-merged global result.
It needs to be placed in the call chain of "SearchCommand::SendReply" after the current contents of the candidate keys has been fetched and is available. This will probably require a logic change in the "NOCONTENT" option (since it's suppressing fetching of per-key fields) as well as ensuring that the content fetch code includes the sortby field.
Perhaps some 1:1 time with me will help here.
src/query/search.cc
Outdated
| return results; | ||
| } | ||
|
|
||
| absl::StatusOr<std::deque<indexes::Neighbor>> ApplySorting( |
There was a problem hiding this comment.
I don't think this can fail. The only possible failure is if the sortby field isn't found, and that's probably a crash since you've already validated it in the command itself long before you get here....
@allenss-amazon I generally understand the idea but yes, I think a 1:1 would be a good idea. I'm looking at the code now and I think I need a bit of guidance on the implementation. How would you like the meeting set up? |
0f89d05 to
8e0de57
Compare
allenss-amazon
left a comment
There was a problem hiding this comment.
Looks pretty good. Two things. 1/ I'd like to see a heap-oriented sort for cases where LIMIT is in effect and 2/ need some data oriented actual search cases.
src/commands/ft_search.cc
Outdated
| double num_a, num_b; | ||
| if (!absl::SimpleAtod(val_a, &num_a)) return false; | ||
| if (!absl::SimpleAtod(val_b, &num_b)) return true; | ||
| result = num_a < num_b; | ||
| } else { | ||
| result = val_a < val_b; | ||
| } |
There was a problem hiding this comment.
We want to have these kinds of data comparison centralized so that all of the various ways to compare data get the same answers. I'm not familiar with out SimpleAtoD deals with infinity and NaN, etc. Please take a look at the files in the src/expr/* directory.
src/query/search.h
Outdated
| int k{0}; | ||
| std::optional<unsigned> ef; | ||
| LimitParameter limit; | ||
| SortByParameter sortby; |
There was a problem hiding this comment.
I believe this belongs in "struct SearchCommand", not here. This parameter is private to the search command, not something that's shared across general queries. (Sorry the command naming here is legacy and therefore confusing).
There was a problem hiding this comment.
I tried doing this and I'm not sure how this would fit into SearchCommand since it needs to be located by the parser and I would have to change the implementation of ConstructSortByParser to take a SearchCommand &, which would propagate to all other parsers in ft.search.
It would also be inconsistent with other parsers, like ft.aggregate.
There was a problem hiding this comment.
Legacy issue. Originally, there wasn't a difference between SearchParameters and SearchCommand. When this object hierarchy was introduced it looks like the parsing logic for ft.search didn't get the update. Please update it, it shouldn't require much more than changing the parameters definitions on a few functions in the dedicate ft.search parsing logic.
| .sortby_parameters_str = "SORTBY attribute_identifier_2", | ||
| .sortby_field = "attribute_identifier_2", | ||
| .sortby_order = query::SortOrder::kAscending, | ||
| .sortby_enabled = true, |
There was a problem hiding this comment.
A parser test for a field NOT in the schema might be useful
There was a problem hiding this comment.
I added a test where the attribute is not found and found that I actually wasn't testing for this case so that's in here too now.
| "embedding", | ||
| "$embedding", | ||
| "SORTBY", | ||
| "vector", |
There was a problem hiding this comment.
This needs some data oriented tests. In theory, the compatibility framework that's under integration/compatibility would be the best place to do this. It used to work for ft.search commands, but I see that those tests have been commented out, which leads me to think that it's not working for those commands. But fixing it should be relatively easy.
There was a problem hiding this comment.
I think the latest version should work with ft.search commands. I fixed some bugs and I've been using it.
There was a problem hiding this comment.
This would be the place to test numeric conversion errors.
021dec9 to
d758c86
Compare
7349606 to
bff42e1
Compare
allenss-amazon
left a comment
There was a problem hiding this comment.
There seem to be several changes in the PR that are either just formatting or perhaps some temporary changes for debugging purposes. Let's get those cleaned up. Otherwise, the basic core functionality looks correct to me.
Also, I'd definitely want to see some test cases in the compatibility test suite
| data_set = do_answer(answers[i], data_set) | ||
| data_set = do_answer(self.server.get_new_client(), answers[i], data_set) | ||
|
|
||
| print(f"Correct answers: {correct_answers} out of {len(answers)}") |
There was a problem hiding this comment.
Are changes here intentional? If so, can you explain?
There was a problem hiding this comment.
This is the part of compatibility tests that is commented out. I am looking into writing those numeric tests that you asked for in this file, but I have been having some trouble with the build environment for a few days.
There was a problem hiding this comment.
I reverted this change in the latest commit
src/commands/ft_search.cc
Outdated
| // Support non-vector queries | ||
| if (IsNonVectorQuery()) { | ||
| query::ProcessNonVectorNeighborsForReply( | ||
| ctx, index_schema->GetAttributeDataType(), neighbors, *this); | ||
| ApplySorting(neighbors, *this); | ||
| SerializeNonVectorNeighbors(ctx, search_result, *this); | ||
| return; | ||
| } |
There was a problem hiding this comment.
Why restrict this to non-vector?
There was a problem hiding this comment.
I didn't catch this when resolving a merge conflict. It was supposed to be a single added line to the if statement above. I'll go over the entire change again to make sure nothing else slipped through.
src/commands/ft_search_parser.cc
Outdated
| << "Error parsing vector similarity parameters: "; | ||
| } | ||
|
|
||
| if (auto searchCommand = dynamic_cast<SearchCommand *>(¶meters); |
There was a problem hiding this comment.
Anytime I see a dynamic_cast, I wonder if this isn't a band-aid on some incorrect object structure. Here, it looks like the preparse and postparse functions should be virtual, that would eliminate the need for the dynamic cast.
src/commands/ft_search_parser.h
Outdated
| struct SortByParameter { | ||
| std::string field; | ||
| SortOrder order{SortOrder::kAscending}; | ||
| bool enabled{false}; |
There was a problem hiding this comment.
Rather than an explicit enabled flag, perhaps this should be std::optional ?
| .k = 0, | ||
| .ef = 0, | ||
| .score_as = "", | ||
| .vector_query = false, | ||
| .sortby_parameters_str = "SORTBY attribute_identifier_1 DESC", | ||
| .sortby_field = "attribute_identifier_1", | ||
| .sortby_order = SortOrder::kDescending, | ||
| .sortby_enabled = true, | ||
| }, |
There was a problem hiding this comment.
Let's tweak some of the keywords with one or more lower case letters to ensure that we're getting the case-invariant matching correct.
| "embedding", | ||
| "$embedding", | ||
| "SORTBY", | ||
| "vector", |
There was a problem hiding this comment.
This would be the place to test numeric conversion errors.
edb271a to
cd56346
Compare
3a35723 to
645bb26
Compare
2694c9b to
c2c2491
Compare
Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
8defc22 to
62f13e1
Compare
Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
…valkey-search into feature/sortby-implementation
Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
4967cc2 to
20106f2
Compare
Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Implementation of `SORTBY` on `FT.SEARCH` Please check the conversation in Bit-Quill#1 first --------- Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
|
Closing since this is now merged into the official repo valkey-io#618 |
Implementation of `SORTBY` on `FT.SEARCH` Please check the conversation in Bit-Quill#1 first --------- Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Resolves: valkey-io#703 Encountered the following series of crashes during engine startup. 1. SIGSEGV in `string2ll_resolver` — IFUNC resolver called before TSAN runtime initializes ``` Program received signal SIGSEGV, Segmentation fault. 0x00007ffff6de0fb0 in ?? () #0 0x00007ffff6de0fb0 in ?? () #1 0x00000000004b89a7 in string2ll_resolver.lto_priv () #2 0x00007ffff7de4d94 in _dl_relocate_object () from /lib64/ld-linux-x86-64.so.2 #3 0x00007ffff7ddceaa in dl_main () from /lib64/ld-linux-x86-64.so.2 #4 0x00007ffff7df0b5f in _dl_sysdep_start () from /lib64/ld-linux-x86-64.so.2 #5 0x00007ffff7ddad2e in _dl_start () from /lib64/ld-linux-x86-64.so.2 #6 0x00007ffff7dd9ef8 in _start () from /lib64/ld-linux-x86-64.so.2 valkey-io#7 0x0000000000000003 in ?? () valkey-io#8 0x00007fffffffe04c in ?? () valkey-io#9 0x00007fffffffe0b6 in ?? () valkey-io#10 0x00007fffffffe0bd in ?? () valkey-io#11 0x0000000000000000 in ?? () ``` 2. TSAN abort on `RTLD_DEEPBIND` — module loader uses `RTLD_DEEPBIND` which is incompatible with TSAN ``` cat /workplace/baswanth/tsan/valkey-search/testing/integration/.build-release-tsan/output/6379_stdout.txt 2>/dev/null | tail -20 45443:M 25 Feb 2026 01:57:58.222 # WARNING Memory overcommit must be enabled! Without it, a background save or replication may fail under low memory condition. Being disabled, it can also cause failures without low memory condition, see jemalloc/jemalloc#1328. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect. 45443:M 25 Feb 2026 01:57:58.222 * oO0OoO0OoO0Oo Valkey is starting oO0OoO0OoO0Oo 45443:M 25 Feb 2026 01:57:58.222 * Valkey version=9.0.1, bits=64, commit=ab3c953b, modified=1, pid=45443, just started 45443:M 25 Feb 2026 01:57:58.222 * Configuration loaded 45443:M 25 Feb 2026 01:57:58.224 * monotonic clock: POSIX clock_gettime 45443:M 25 Feb 2026 01:57:58.226 * Running mode=cluster, port=6379. 45443:M 25 Feb 2026 01:57:58.226 * No cluster configuration found, I'm a13c403e8a798f9ad051c654ff540ce1b3833b9d ==45443==You are trying to dlopen a /workplace/baswanth/tsan/valkey-search/.build-release-tsan/valkey-json/build/src/libjson.so shared library with RTLD_DEEPBIND flag which is incompatible with sanitizer runtime (see google/sanitizers#611 for details). If you want to run /workplace/baswanth/tsan/valkey-search/.build-release-tsan/valkey-json/build/src/libjson.so library under sanitizers please remove RTLD_DEEPBIND from dlopen flags. ``` 3. `CHECK failed: sanitizer_deadlock_detector.h:67` — TSAN deadlock detector overflows during module load ``` 7826:M 25 Feb 2026 02:39:45.284 * Valkey version=9.0.1, bits=64, commit=ab3c953b, modified=1, pid=17826, just started 17826:M 25 Feb 2026 02:39:45.284 * Configuration loaded 17826:M 25 Feb 2026 02:39:45.286 * monotonic clock: POSIX clock_gettime 17826:M 25 Feb 2026 02:39:45.288 * Running mode=cluster, port=6379. 17826:M 25 Feb 2026 02:39:45.289 * No cluster configuration found, I'm 134edf06efe1a2e084ce9c94a43dab67e8376dfc 17826:M 25 Feb 2026 02:39:45.298 . <module> Setting shards to 524287 ThreadSanitizer: CHECK failed: sanitizer_deadlock_detector.h:67 "((n_all_locks_)) < (((sizeof(all_locks_with_contexts_)/sizeof((all_locks_with_contexts_)[0]))))" (0x40, 0x40) (tid=17826) ``` core changes that needs to be added. ``` cd /workplace/baswanth/tsan/valkey-search/.build-release-tsan/valkey-server && git diff src/util.c src/module.c src/config.h diff --git a/src/config.h b/src/config.h index 0c6b111a9..f80fa8cbc 100644 --- a/src/config.h +++ b/src/config.h @@ -172,6 +172,10 @@ #define VALKEY_NO_SANITIZE(sanitizer) #endif +#if defined(__SANITIZE_THREAD__) +#define VALKEY_THREAD_SANITIZER 1 +#endif + #if defined(__SANITIZE_ADDRESS__) /* GCC */ #define VALKEY_ADDRESS_SANITIZER 1 diff --git a/src/module.c b/src/module.c index 8d1bad4fe..0435499a8 100644 --- a/src/module.c +++ b/src/module.c @@ -12545,7 +12545,7 @@ int moduleLoad(const char *path, void **module_argv, int module_argc, int is_loa } int dlopen_flags = RTLD_NOW | RTLD_LOCAL; -#if (defined(__GLIBC__) || defined(__FreeBSD__)) && !defined(VALKEY_ADDRESS_SANITIZER) && __has_include(<dlfcn.h>) +#if (defined(__GLIBC__) || defined(__FreeBSD__)) && !defined(VALKEY_ADDRESS_SANITIZER) && !defined(VALKEY_THREAD_SANITIZER) && __has_include(<dlfcn.h>) /* RTLD_DEEPBIND, which is required for loading modules that contains the * same symbols, does not work with ASAN. Therefore, we exclude * RTLD_DEEPBIND when doing test builds with ASAN. diff --git a/src/util.c b/src/util.c index 0631736c8..cf5a073b1 100644 --- a/src/util.c +++ b/src/util.c @@ -622,7 +622,7 @@ static int string2llScalar(const char *s, size_t slen, long long *value) { } #if HAVE_IFUNC && HAVE_X86_SIMD -__attribute__((no_sanitize_address, used)) static int (*string2ll_resolver(void))(const char *, size_t, long long *) { +__attribute__((no_sanitize_address, no_sanitize("thread"), used)) static int (*string2ll_resolver(void))(const char *, size_t, long long *) { /* Ifunc resolvers run before ASan initialization and before CPU detection * is initialized, so disable ASan and init CPU detection here. */ __builtin_cpu_init(); ``` --------- Signed-off-by: Baswanth Vegunta <baswanth@amazon.com> Co-authored-by: Baswanth Vegunta <baswanth@amazon.com>
Resolves: valkey-io#703 Encountered the following series of crashes during engine startup. 1. SIGSEGV in `string2ll_resolver` — IFUNC resolver called before TSAN runtime initializes ``` Program received signal SIGSEGV, Segmentation fault. 0x00007ffff6de0fb0 in ?? () #0 0x00007ffff6de0fb0 in ?? () #1 0x00000000004b89a7 in string2ll_resolver.lto_priv () #2 0x00007ffff7de4d94 in _dl_relocate_object () from /lib64/ld-linux-x86-64.so.2 #3 0x00007ffff7ddceaa in dl_main () from /lib64/ld-linux-x86-64.so.2 #4 0x00007ffff7df0b5f in _dl_sysdep_start () from /lib64/ld-linux-x86-64.so.2 #5 0x00007ffff7ddad2e in _dl_start () from /lib64/ld-linux-x86-64.so.2 #6 0x00007ffff7dd9ef8 in _start () from /lib64/ld-linux-x86-64.so.2 valkey-io#7 0x0000000000000003 in ?? () valkey-io#8 0x00007fffffffe04c in ?? () valkey-io#9 0x00007fffffffe0b6 in ?? () valkey-io#10 0x00007fffffffe0bd in ?? () valkey-io#11 0x0000000000000000 in ?? () ``` 2. TSAN abort on `RTLD_DEEPBIND` — module loader uses `RTLD_DEEPBIND` which is incompatible with TSAN ``` cat /workplace/baswanth/tsan/valkey-search/testing/integration/.build-release-tsan/output/6379_stdout.txt 2>/dev/null | tail -20 45443:M 25 Feb 2026 01:57:58.222 # WARNING Memory overcommit must be enabled! Without it, a background save or replication may fail under low memory condition. Being disabled, it can also cause failures without low memory condition, see jemalloc/jemalloc#1328. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect. 45443:M 25 Feb 2026 01:57:58.222 * oO0OoO0OoO0Oo Valkey is starting oO0OoO0OoO0Oo 45443:M 25 Feb 2026 01:57:58.222 * Valkey version=9.0.1, bits=64, commit=ab3c953b, modified=1, pid=45443, just started 45443:M 25 Feb 2026 01:57:58.222 * Configuration loaded 45443:M 25 Feb 2026 01:57:58.224 * monotonic clock: POSIX clock_gettime 45443:M 25 Feb 2026 01:57:58.226 * Running mode=cluster, port=6379. 45443:M 25 Feb 2026 01:57:58.226 * No cluster configuration found, I'm a13c403e8a798f9ad051c654ff540ce1b3833b9d ==45443==You are trying to dlopen a /workplace/baswanth/tsan/valkey-search/.build-release-tsan/valkey-json/build/src/libjson.so shared library with RTLD_DEEPBIND flag which is incompatible with sanitizer runtime (see google/sanitizers#611 for details). If you want to run /workplace/baswanth/tsan/valkey-search/.build-release-tsan/valkey-json/build/src/libjson.so library under sanitizers please remove RTLD_DEEPBIND from dlopen flags. ``` 3. `CHECK failed: sanitizer_deadlock_detector.h:67` — TSAN deadlock detector overflows during module load ``` 7826:M 25 Feb 2026 02:39:45.284 * Valkey version=9.0.1, bits=64, commit=ab3c953b, modified=1, pid=17826, just started 17826:M 25 Feb 2026 02:39:45.284 * Configuration loaded 17826:M 25 Feb 2026 02:39:45.286 * monotonic clock: POSIX clock_gettime 17826:M 25 Feb 2026 02:39:45.288 * Running mode=cluster, port=6379. 17826:M 25 Feb 2026 02:39:45.289 * No cluster configuration found, I'm 134edf06efe1a2e084ce9c94a43dab67e8376dfc 17826:M 25 Feb 2026 02:39:45.298 . <module> Setting shards to 524287 ThreadSanitizer: CHECK failed: sanitizer_deadlock_detector.h:67 "((n_all_locks_)) < (((sizeof(all_locks_with_contexts_)/sizeof((all_locks_with_contexts_)[0]))))" (0x40, 0x40) (tid=17826) ``` core changes that needs to be added. ``` cd /workplace/baswanth/tsan/valkey-search/.build-release-tsan/valkey-server && git diff src/util.c src/module.c src/config.h diff --git a/src/config.h b/src/config.h index 0c6b111a9..f80fa8cbc 100644 --- a/src/config.h +++ b/src/config.h @@ -172,6 +172,10 @@ #define VALKEY_NO_SANITIZE(sanitizer) #endif +#if defined(__SANITIZE_THREAD__) +#define VALKEY_THREAD_SANITIZER 1 +#endif + #if defined(__SANITIZE_ADDRESS__) /* GCC */ #define VALKEY_ADDRESS_SANITIZER 1 diff --git a/src/module.c b/src/module.c index 8d1bad4fe..0435499a8 100644 --- a/src/module.c +++ b/src/module.c @@ -12545,7 +12545,7 @@ int moduleLoad(const char *path, void **module_argv, int module_argc, int is_loa } int dlopen_flags = RTLD_NOW | RTLD_LOCAL; -#if (defined(__GLIBC__) || defined(__FreeBSD__)) && !defined(VALKEY_ADDRESS_SANITIZER) && __has_include(<dlfcn.h>) +#if (defined(__GLIBC__) || defined(__FreeBSD__)) && !defined(VALKEY_ADDRESS_SANITIZER) && !defined(VALKEY_THREAD_SANITIZER) && __has_include(<dlfcn.h>) /* RTLD_DEEPBIND, which is required for loading modules that contains the * same symbols, does not work with ASAN. Therefore, we exclude * RTLD_DEEPBIND when doing test builds with ASAN. diff --git a/src/util.c b/src/util.c index 0631736c8..cf5a073b1 100644 --- a/src/util.c +++ b/src/util.c @@ -622,7 +622,7 @@ static int string2llScalar(const char *s, size_t slen, long long *value) { } #if HAVE_IFUNC && HAVE_X86_SIMD -__attribute__((no_sanitize_address, used)) static int (*string2ll_resolver(void))(const char *, size_t, long long *) { +__attribute__((no_sanitize_address, no_sanitize("thread"), used)) static int (*string2ll_resolver(void))(const char *, size_t, long long *) { /* Ifunc resolvers run before ASan initialization and before CPU detection * is initialized, so disable ASan and init CPU detection here. */ __builtin_cpu_init(); ``` --------- Signed-off-by: Baswanth Vegunta <baswanth@amazon.com> Co-authored-by: Baswanth Vegunta <baswanth@amazon.com>
No description provided.