Fix normalize json wildcard string-array values for TAG indexing#924

Open
bandalgomsu wants to merge 1 commit into valkey-io:main from bandalgomsu:gh-907
Conversation

@bandalgomsu
Contributor

Fix normalize wildcard string-array values for TAG indexing

For JSON wildcard paths, JSON.GET returns a JSON array string (e.g. ["Seoul","New York"]).
The previous normalization incorrectly transformed this into Seoul","New York instead of a TAG-splittable format, causing lookup mismatches for values such as New York.

Input:  ["Seoul","New York"]
Before: Seoul","New York
After:  Seoul,New York

Issue: #907
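The intended behavior can be sketched in Python (a minimal illustration only; the actual change is in the C++ normalization routine, and the `normalize_wildcard_tags` name and `,` separator are assumptions):

```python
import json

def normalize_wildcard_tags(record: str) -> str:
    """Hypothetical sketch: parse the JSON array string that a wildcard
    path returns and join its elements with the TAG separator
    (assumed here to be ',')."""
    try:
        values = json.loads(record)
    except json.JSONDecodeError:
        return record  # not JSON; leave the value untouched
    if isinstance(values, list):
        # '["Seoul","New York"]' becomes 'Seoul,New York', which the
        # TAG indexer can split on the separator without mangling
        # multi-word values like "New York".
        return ",".join(str(v) for v in values)
    return record
```

Parsing with a real JSON parser, rather than stripping the first and last characters, is what keeps the inner quotes from leaking into the normalized value.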

@bandalgomsu bandalgomsu changed the title Fix normalize wildcard string-array values for TAG indexing Fix normalize json wildcard string-array values for TAG indexing Mar 23, 2026
@bandalgomsu bandalgomsu force-pushed the gh-907 branch 5 times, most recently from 3f315e7 to b7ef702 Compare March 24, 2026 21:39

assert client.execute_command(
    "JSON.SET", "user:1", "$",
    '{"address":[{"city":"Seoul"},{"city":"New York"}]}'
)
Member

Let's add a few more data values, especially the harder cases, like an empty string and some duplicate values.
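Those harder cases might look like this (hypothetical test documents, assuming the same `client` fixture as the snippet above; only the JSON payloads are shown here):

```python
import json

# Hypothetical extra documents covering the edge cases mentioned:
# an empty string and duplicate values under the wildcard path.
hard_cases = [
    '{"address":[{"city":""},{"city":"Seoul"}]}',            # empty string
    '{"address":[{"city":"Seoul"},{"city":"Seoul"}]}',       # duplicates
    '{"address":[{"city":"New York"},{"city":"New York"}]}', # duplicate multi-word
]

# Each document must be valid JSON before it is fed to JSON.SET.
for doc in hard_cases:
    json.loads(doc)
```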

return absl::NotFoundError("Invalid record");
}
if (absl::ConsumePrefix(&record, "[")) {
absl::ConsumeSuffix(&record, "]");
Member

This routine is shared with the other data types: numeric, string and vector. Either we need additional testing to make sure that these changes don't cause a problem for those data types OR perhaps we should pass in the data type that it's being used for -- so that the conversions are handled on a per-type basis.

Contributor Author

On second thought, to correctly handle string/TAG types, I think we should implement a JSON array parser that iterates through the input.

If the data type is known in advance, we can apply a fast path (like the previous method) for numeric/vector types.

Alternatively, we can detect the type by checking the first character (e.g., '"' for string arrays) and choose the appropriate path.

What do you think?
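The first-character heuristic proposed here could be sketched as follows (illustrative only; the function name is invented and the real routine is C++):

```python
import json

def normalize(record: str) -> str:
    """Hypothetical sketch of first-character type detection."""
    # Fast path: not an array, nothing to normalize.
    if not record.startswith("["):
        return record
    inner = record[1:-1].lstrip()
    if inner.startswith('"'):
        # String array: needs real JSON parsing so that embedded
        # separators and quotes (e.g. in "New York") survive intact.
        return ",".join(json.loads(record))
    # Numeric/vector array: stripping the brackets is safe because
    # the elements contain no quotes or embedded separators.
    return inner
```

The heuristic avoids full JSON parsing on the numeric/vector fast path while still handling quoted string arrays correctly.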

Member

The original GCP code only supported HASH, JSON was added later. That's why this routine has a single string output, because the ingestion engine is built around the idea of one attribute -> one string's worth of data.

IMO, we should either pass into this routine the data type OR break this up into virtual functions. In either case, there's type specific parsing behavior.

- For Text, only a single string should be supported. I believe that some special character handling may be required.
- For Numeric, only a single string or a single number should be supported. (Arguably, an array of numbers could be supported in the future.)
- For Tags, only a single string or an array of strings should be supported. In the array case, it should reformat the data into the same format that a HASH value of multiple tags would have -- this may require accessing the index definition to get the separator character.
- For Vector, only an array should be supported.
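Passing the data type into the routine, as suggested above, might dispatch like this sketch (the enum, function name, and separator handling are assumptions for illustration, not the project's actual API):

```python
import json
from enum import Enum

class IndexType(Enum):
    TEXT = "text"
    NUMERIC = "numeric"
    TAG = "tag"
    VECTOR = "vector"

def normalize_for_type(record: str, index_type: IndexType,
                       tag_separator: str = ",") -> str:
    """Hypothetical per-type normalization dispatch."""
    if index_type is IndexType.TAG and record.startswith("["):
        # Array of strings: re-join with the index's separator so the
        # result matches what a HASH value of multiple tags looks like.
        return tag_separator.join(json.loads(record))
    if index_type is IndexType.VECTOR:
        # Vectors are always arrays; pass through for the vector parser.
        return record
    # TEXT / NUMERIC: only a single value is supported.
    return record
```

Making the conversion per-type keeps the string/TAG fix from changing behavior for the numeric and vector paths that share the routine.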

Signed-off-by: Su Ko <rhtn1128@gmail.com>
