Fix normalize json wildcard string-array values for TAG indexing#924

Open
bandalgomsu wants to merge 1 commit into valkey-io:main from bandalgomsu:gh-907
Conversation

@bandalgomsu
Contributor

Fix normalize wildcard string-array values for TAG indexing

For JSON wildcard paths, JSON.GET returns a JSON array string (e.g. ["Seoul","New York"]).
The previous normalization incorrectly transformed this into Seoul","New York instead of a TAG-splittable format, causing lookup mismatches for values such as New York.

Input:  ["Seoul","New York"]
Before: Seoul","New York
After:  Seoul,New York

Issue: #907
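The intended behavior can be sketched in Python (a minimal illustration only; the actual change is in the C++ normalization routine, and the `normalize_wildcard_tags` name and `,` separator are assumptions):

```python
import json

def normalize_wildcard_tags(record: str) -> str:
    """Hypothetical sketch: parse the JSON array string that a wildcard
    path returns and join its elements with the TAG separator
    (assumed here to be ',')."""
    try:
        values = json.loads(record)
    except json.JSONDecodeError:
        return record  # not JSON; leave the value untouched
    if isinstance(values, list):
        # '["Seoul","New York"]' becomes 'Seoul,New York', which the
        # TAG indexer can split on the separator without mangling
        # multi-word values like "New York".
        return ",".join(str(v) for v in values)
    return record
```

Parsing with a real JSON parser, rather than stripping the first and last characters, is what keeps the inner quotes from leaking into the normalized value.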

@bandalgomsu bandalgomsu changed the title Fix normalize wildcard string-array values for TAG indexing Fix normalize json wildcard string-array values for TAG indexing Mar 23, 2026
@bandalgomsu bandalgomsu force-pushed the gh-907 branch 5 times, most recently from 3f315e7 to b7ef702 Compare March 24, 2026 21:39

assert client.execute_command(
    "JSON.SET", "user:1", "$",
    '{"address":[{"city":"Seoul"},{"city":"New York"}]}'
)
Member

Let's add a few more data values, especially the harder cases, like an empty string and some duplicate values.
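Those harder cases might look like this (hypothetical test documents, assuming the same `client` fixture as the snippet above; only the JSON payloads are shown here):

```python
import json

# Hypothetical extra documents covering the edge cases mentioned:
# an empty string and duplicate values under the wildcard path.
hard_cases = [
    '{"address":[{"city":""},{"city":"Seoul"}]}',            # empty string
    '{"address":[{"city":"Seoul"},{"city":"Seoul"}]}',       # duplicates
    '{"address":[{"city":"New York"},{"city":"New York"}]}', # duplicate multi-word
]

# Each document must be valid JSON before it is fed to JSON.SET.
for doc in hard_cases:
    json.loads(doc)
```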

return absl::NotFoundError("Invalid record");
}
if (absl::ConsumePrefix(&record, "[")) {
absl::ConsumeSuffix(&record, "]");
Member

This routine is shared with the other data types: numeric, string and vector. Either we need additional testing to make sure that these changes don't cause a problem for those data types OR perhaps we should pass in the data type that it's being used for -- so that the conversions are handled on a per-type basis.

Contributor Author

On second thought, to correctly handle string/TAG types, I think we should implement a JSON array parser that iterates through the input.

If the data type is known in advance, we can apply a fast path (like the previous method) for numeric/vector types.

Alternatively, we can detect the type by checking the first character (e.g., '"' for string arrays) and choose the appropriate path.

What do you think?
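The first-character heuristic proposed here could be sketched as follows (illustrative only; the function name is invented and the real routine is C++):

```python
import json

def normalize(record: str) -> str:
    """Hypothetical sketch of first-character type detection."""
    # Fast path: not an array, nothing to normalize.
    if not record.startswith("["):
        return record
    inner = record[1:-1].lstrip()
    if inner.startswith('"'):
        # String array: needs real JSON parsing so that embedded
        # separators and quotes (e.g. in "New York") survive intact.
        return ",".join(json.loads(record))
    # Numeric/vector array: stripping the brackets is safe because
    # the elements contain no quotes or embedded separators.
    return inner
```

The heuristic avoids full JSON parsing on the numeric/vector fast path while still handling quoted string arrays correctly.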

Member

The original GCP code only supported HASH, JSON was added later. That's why this routine has a single string output, because the ingestion engine is built around the idea of one attribute -> one string's worth of data.

IMO, we should either pass into this routine the data type OR break this up into virtual functions. In either case, there's type specific parsing behavior.

- For Text, only a single string should be supported. I believe that some special character handling may be required.
- For Numeric, only a single string or a single number should be supported. (Arguably, an array of numbers could be supported in the future.)
- For Tags, only a single string or an array of strings should be supported. In the array case, it should reformat the data into the same format that a HASH value of multiple tags would have -- this may require accessing the index definition to get the separator character.
- For Vector, only an array should be supported.
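Passing the data type into the routine, as suggested above, might dispatch like this sketch (the enum, function name, and separator handling are assumptions for illustration, not the project's actual API):

```python
import json
from enum import Enum

class IndexType(Enum):
    TEXT = "text"
    NUMERIC = "numeric"
    TAG = "tag"
    VECTOR = "vector"

def normalize_for_type(record: str, index_type: IndexType,
                       tag_separator: str = ",") -> str:
    """Hypothetical per-type normalization dispatch."""
    if index_type is IndexType.TAG and record.startswith("["):
        # Array of strings: re-join with the index's separator so the
        # result matches what a HASH value of multiple tags looks like.
        return tag_separator.join(json.loads(record))
    if index_type is IndexType.VECTOR:
        # Vectors are always arrays; pass through for the vector parser.
        return record
    # TEXT / NUMERIC: only a single value is supported.
    return record
```

Making the conversion per-type keeps the string/TAG fix from changing behavior for the numeric and vector paths that share the routine.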

Signed-off-by: Su Ko <rhtn1128@gmail.com>
