-
Notifications
You must be signed in to change notification settings - Fork 2k
[BUG]: properly escape _ and escape % rather than stripping for SQLite $contains filter
#4402
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…QLite `$contains` filter
Reviewer ChecklistPlease leverage this checklist to ensure your code review is thorough before approving Testing, Bugs, Errors, Logs, Documentation
System Compatibility
Quality
|
HammadB
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yeah. The python code explicitly handled this IIRC
I don't think it did unless PyPika had something built in.
|
Pretty sure this is making an API behavior change? It is currently covered by tests This could be a breaking change AFAICT and should be thought about + communicated. I think its worth thinking about if we want parity with distributed also. |
|
Proper Escaping of Special Characters in SQLite $contains Filters This PR fixes string filtering logic for SQLite collections by correctly escaping Key Changes: Affected Areas: This summary was automatically generated by @propel-code-bot |
Yes, it's technically a breaking change. We currently do not have parity between local and distributed in terms of handling |
…QLite `$contains` filter (chroma-core#4402) ## Description of changes I saw a `test_collection_sqlite` proptest failure in CI and was able to reproduce it locally by updating `rust/frontend/tests/test_collection.proptest-regressions`. SQLite has special handling for two characters in a `LIKE` query: `%` and `_`. We were previously stripping out `%` from user input and passing `_` through. However, because `_` acts as a wildcard, this can lead to unexpected results. After discussing with @Sicheng-Pan we decided to escape both characters, so users can provide `%` and `_` in their `$contains` query and it will work as expected (a character literal). ## Test plan _How are these changes tested?_ Added proptest case was previously failing. ## Documentation Changes _Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the [docs section](https://github.com/chroma-core/chroma/tree/main/docs/docs.trychroma.com)?_ n/a
…hroma-core#4775) ## Description of changes chroma-core#4402 caused `test_filtering.py` to start flaking because the test assertions around `_` handling were not updated. ## Test plan _How are these changes tested?_ - [x] Tests pass locally with `pytest` for python, `yarn test` for js, `cargo test` for rust
Description of changes
I saw a
test_collection_sqliteproptest failure in CI and was able to reproduce it locally by updatingrust/frontend/tests/test_collection.proptest-regressions.SQLite has special handling for two characters in a
LIKEquery:%and_. We were previously stripping out%from user input and passing_through. However, because_acts as a wildcard, this can lead to unexpected results.After discussing with @Sicheng-Pan we decided to escape both characters, so users can provide
%and_in their$containsquery and it will work as expected (a character literal).Test plan
How are these changes tested?
Added proptest case was previously failing.
Documentation Changes
Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?
n/a