Implementation for match_only_text field #11039

Merged
andrross merged 24 commits into opensearch-project:main from rishabhmaurya:rishma-match-only-field
Jan 3, 2024
@rishabhmaurya rishabhmaurya commented Oct 31, 2023

Description

  • Implements the match_only_text field type with index_options fixed to docs and norms disabled, to reduce storage.
  • Supports the following query types with added latency: Phrase, Prefix, MultiPhrase and MultiPhrasePrefix.
  • Supports all features of the text field type except those listed below.

Features not supported -

  • Interval queries
  • Span queries
  • Queries on index-time phrase and index prefix fields.
  • Aggregation queries; like the text field, this field type isn't meant for aggregation queries.
  • Positional queries when the _source field is disabled. (Note: instead of disallowing disabling of the _source field, I have decided to throw an exception when a positional query is executed, so that users who have no use case for positional queries can still disable _source to save space.)
  • Updating index_options to any value other than docs isn't allowed.
  • Scoring and boost may not work as expected, so this field type isn't meant to be used for relevancy purposes.
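
To illustrate why fixing index_options to docs saves space but removes positional information, here is a minimal, self-contained sketch. It is not OpenSearch or Lucene code: the class PostingsSketch, its tokenize helper and the map-based postings are hypothetical stand-ins chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

public class PostingsSketch {

    // Hypothetical stand-in for an analyzer: lowercase, split on non-alphanumerics.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Docs-only postings: term -> IDs of the documents that contain the term.
    static Map<String, SortedSet<Integer>> docsOnly(List<String> docs) {
        Map<String, SortedSet<Integer>> postings = new TreeMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            for (String term : tokenize(docs.get(docId))) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
        return postings;
    }

    // Positional postings: term -> document ID -> token positions of the term.
    static Map<String, Map<Integer, List<Integer>>> withPositions(List<String> docs) {
        Map<String, Map<Integer, List<Integer>>> postings = new TreeMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            List<String> tokens = tokenize(docs.get(docId));
            for (int pos = 0; pos < tokens.size(); pos++) {
                postings.computeIfAbsent(tokens.get(pos), k -> new TreeMap<>())
                        .computeIfAbsent(docId, k -> new ArrayList<>())
                        .add(pos);
            }
        }
        return postings;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("disk error", "error disk");
        System.out.println(docsOnly(docs));
        System.out.println(withPositions(docs));
    }
}
```

With docs-only postings, the two sample documents are indistinguishable because both terms map to the same set of document IDs. Only the positional variant records that the word order differs, which is the information a phrase query needs.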

Best used for -

  • In log analytics, where the goal is to search for log entries based on specific keywords or error codes or exception type in error message, the focus may be on retrieving relevant logs rather than analyzing the frequency or position of terms within each log entry.
  • In a job portal, users often search for jobs based on job titles, skills, or locations. The frequency and position of terms within job descriptions may be less significant for these searches.
  • In a news application, users might search for articles based on keywords in headlines or summaries. The focus is on finding relevant news articles rather than analyzing the frequency or position of terms within each article.

Migration from text field

The reindex API can be used to migrate an index from the text field to the match_only_text field and vice versa.

Implementation breakdown -

  1. New MatchOnlyTextFieldMapper, which extends TextFieldMapper with a few constraints and different defaults.
  2. New FieldType for MatchOnlyTextFieldMapper, which adds constraints specific to the match_only_text field and supports positional queries by reading the value of the match_only_text field from the _source field.
  3. SourceFieldMatchQuery - a query which accepts a delegate query, used to apply various filters that prune the result set, and a source filter query. It loads the _source for each hit returned by the delegate query, creates a single-doc Lucene MemoryIndex for each hit, and runs the source filter query against it. It uses ConstantScoreWeight to disable scoring.
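
The two-phase idea behind item 3 can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the SourceFieldMatchQuery implementation: it uses no Lucene classes, and a plain token-sequence comparison stands in for running the source filter query against a MemoryIndex. The class SourceVerificationSketch and all of its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

public class SourceVerificationSketch {

    // Hypothetical stand-in for an analyzer: lowercase, split on non-alphanumerics.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Phase 1 (the "delegate" role): use docs-only postings to find documents
    // that contain every term, regardless of order or position.
    static SortedSet<Integer> candidates(List<String> sources, List<String> terms) {
        Map<String, SortedSet<Integer>> postings = new HashMap<>();
        for (int docId = 0; docId < sources.size(); docId++) {
            for (String term : tokenize(sources.get(docId))) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
        SortedSet<Integer> result = null;
        for (String term : terms) {
            SortedSet<Integer> docs = postings.getOrDefault(term, Collections.emptySortedSet());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    // Phase 2 (the "source filter" role): re-tokenize the stored source of one
    // candidate and check that the phrase terms occur consecutively.
    static boolean containsPhrase(String source, List<String> phrase) {
        return !phrase.isEmpty()
                && Collections.indexOfSubList(tokenize(source), phrase) >= 0;
    }

    // Runs phase 2 only on the candidates produced by phase 1.
    static List<Integer> matchPhrase(List<String> sources, String phraseText) {
        List<String> phrase = tokenize(phraseText);
        List<Integer> hits = new ArrayList<>();
        for (int docId : candidates(sources, phrase)) {
            if (containsPhrase(sources.get(docId), phrase)) {
                hits.add(docId);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> sources = List.of(
                "connection timed out", "timed connection out", "disk full");
        System.out.println(matchPhrase(sources, "timed out"));
    }
}
```

In this sketch the first two sample documents both pass phase 1 for the phrase "timed out", but only the first survives phase 2. The per-candidate re-tokenization in phase 2 is also where the added latency mentioned in the description would come from.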

Testing done

  1. Unit tests for Mapper - assert the SourceFieldMatchQuery that MatchOnlyTextFieldMapper creates for an OpenSearch DSL query. They also validate various unsupported cases.
  2. Unit tests for SourceFieldMatchQuery - they create a Lucene index and check possible scenarios for SourceFieldMatchQuery, such as -
    1. docs matching both delegate and filter query
    2. docs matching only delegate query
    3. docs matching only filter query
    4. docs matching neither delegate nor filter query
    5. Expected behavior when source field is disabled
    6. Expected behavior when there is a missing field.
  3. Created clones of various integration tests that used the text field, replacing text with match_only_text, and overrode the expected behavior wherever necessary.

TODOs

  1. Support for interval queries?
  2. Add support for the match_only_text field type on dashboard
  3. Public documentation - [DOC] match_only_text handler documentation-website#5427
  4. Performance testing -
  5. Cache logic validation

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

#6836
opensearch-project/documentation-website#5427

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@rishabhmaurya rishabhmaurya force-pushed the rishma-match-only-field branch from e437959 to 0f5485c Compare October 31, 2023 22:07
github-actions bot commented Oct 31, 2023

Compatibility status:

Checks if related components are compatible with change f1fb443

Incompatible components

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/performance-analyzer.git]

github-actions bot commented Nov 1, 2023

Gradle Check (Jenkins) Run Completed with:

@github-actions

❌ Gradle check result for ccab297: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions

❌ Gradle check result for a9d1f6d: FAILURE

@github-actions

❌ Gradle check result for 05d96bd: FAILURE

@github-actions

❌ Gradle check result for 6fdb8da: FAILURE

@rishabhmaurya rishabhmaurya force-pushed the rishma-match-only-field branch from 6fdb8da to 2fdc207 Compare November 16, 2023 01:27
@github-actions

❌ Gradle check result for 2fdc207: FAILURE

@github-actions

❌ Gradle check result for e213f84: FAILURE

@github-actions

❌ Gradle check result for c20df4a: FAILURE

@github-actions

❌ Gradle check result for c6ce7f9: FAILURE

@github-actions

❌ Gradle check result for e2c1886: FAILURE

@rishabhmaurya rishabhmaurya force-pushed the rishma-match-only-field branch 2 times, most recently from f9fc10f to b765a36 Compare November 17, 2023 20:47
@github-actions

❌ Gradle check result for f9fc10f: FAILURE

@github-actions

❌ Gradle check result for b765a36: FAILURE

@github-actions

❌ Gradle check result for 40e205e: FAILURE

…teg tests

Signed-off-by: Rishabh Maurya <rishabhmaurya05@gmail.com>
@rishabhmaurya rishabhmaurya force-pushed the rishma-match-only-field branch from 2bdec4b to e879baf Compare January 2, 2024 20:00
github-actions bot commented Jan 2, 2024

✅ Gradle check result for e879baf: SUCCESS

codecov bot commented Jan 2, 2024

Codecov Report

Attention: 25 lines in your changes are missing coverage. Please review.

Comparison is base (63f4f13) 71.47% compared to head (f1fb443) 71.40%.
Report is 1 commits behind head on main.

Files | Patch % | Lines
.../opensearch/index/query/SourceFieldMatchQuery.java | 67.34% | 8 Missing and 8 partials ⚠️
...nsearch/index/mapper/MatchOnlyTextFieldMapper.java | 93.54% | 4 Missing and 4 partials ⚠️
...a/org/opensearch/index/search/MultiMatchQuery.java | 50.00% | 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #11039      +/-   ##
============================================
- Coverage     71.47%   71.40%   -0.07%     
+ Complexity    59253    59240      -13     
============================================
  Files          4907     4909       +2     
  Lines        278248   278426     +178     
  Branches      40428    40460      +32     
============================================
- Hits         198871   198817      -54     
- Misses        62850    63093     +243     
+ Partials      16527    16516      -11     


@andrross andrross left a comment

Looks good, just one changelog entry nitpick

Signed-off-by: Rishabh Maurya <rishabhmaurya05@gmail.com>
github-actions bot commented Jan 2, 2024

❕ Gradle check result for f1fb443: UNSTABLE

  • TEST FAILURES:
      1 org.opensearch.search.SearchWeightedRoutingIT.testMultiGetWithNetworkDisruption_FailOpenEnabled

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@andrross andrross added the backport 2.x Backport to 2.x branch label Jan 3, 2024
@andrross andrross merged commit 7b1c2c7 into opensearch-project:main Jan 3, 2024
opensearch-trigger-bot bot pushed a commit that referenced this pull request Jan 3, 2024
* Implementation for match_only_text field

Signed-off-by: Rishabh Maurya <rishabhmaurya05@gmail.com>

* Fix build failures

Signed-off-by: Rishabh Maurya <rishabhmaurya05@gmail.com>

* Fix bugs

Signed-off-by: Rishabh Maurya <rishabhmaurya05@gmail.com>

* Added mapper tests, stil failing on prefix and phrase tests

Signed-off-by: Rishabh Maurya <rishabhmaurya05@gmail.com>

* Disable index prefix and phrase mapper

Signed-off-by: Rishabh Maurya <rishabhmaurya05@gmail.com>

* Added unit tests for phrase and multiphrase query validation

Signed-off-by: Rishabh Maurya <rishabhmaurya05@gmail.com>

* Add unit tests for prefix and prefix phrase queries

Signed-off-by: Rishabh Maurya <rishabhmaurya05@gmail.com>

* Add a test to cover 3 word with synonym match phrase prefix query

Signed-off-by: Rishabh Maurya <rishabhmaurya05@gmail.com>

* Add unit test for SourceFieldMatchQuery

Signed-off-by: Rishabh Maurya <rishabhmaurya05@gmail.com>

* Added test for _source disabled case

Signed-off-by: Rishabh Maurya <rishabhmaurya05@gmail.com>

* Add unit test for missing field

Signed-off-by: Rishabh Maurya <rishabhmaurya05@gmail.com>

* more validation tests and changelog update

Signed-off-by: Rishabh Maurya <rishabhmaurya05@gmail.com>

* Added integration tests for match_only_text replicating text field integ tests

Signed-off-by: Rishabh Maurya <rishabhmaurya05@gmail.com>

* Added skip section in integ test to fix mixed cluster failures

Signed-off-by: Rishabh Maurya <rishabhmaurya05@gmail.com>

* remove unused import

Signed-off-by: Rishabh Maurya <rishabhmaurya05@gmail.com>

* Address PR comments

Signed-off-by: Rishabh Maurya <rishabhmaurya05@gmail.com>

* fix integ tests

Signed-off-by: Rishabh Maurya <rishabhmaurya05@gmail.com>

* Fix flaky test due to random indexwriter

Signed-off-by: Rishabh Maurya <rishabhmaurya05@gmail.com>

* pr comment: header modification

Signed-off-by: Rishabh Maurya <rishabhmaurya05@gmail.com>

* Address PR comments

Signed-off-by: Rishabh Maurya <rishabhmaurya05@gmail.com>

* addded change to the right section of CHANGELOG

Signed-off-by: Rishabh Maurya <rishabhmaurya05@gmail.com>

* overriding the textFieldType before every test

Signed-off-by: Rishabh Maurya <rishabhmaurya05@gmail.com>

* rename @before method

Signed-off-by: Rishabh Maurya <rishabhmaurya05@gmail.com>

* update changelog description

Signed-off-by: Rishabh Maurya <rishabhmaurya05@gmail.com>

---------

Signed-off-by: Rishabh Maurya <rishabhmaurya05@gmail.com>
(cherry picked from commit 7b1c2c7)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
andrross pushed a commit that referenced this pull request Jan 3, 2024
(cherry picked from commit 7b1c2c7)

Signed-off-by: Rishabh Maurya <rishabhmaurya05@gmail.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
rayshrey pushed a commit to rayshrey/OpenSearch that referenced this pull request Mar 18, 2024
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
Signed-off-by: Rishabh Maurya <rishabhmaurya05@gmail.com>
Signed-off-by: Shivansh Arora <hishiv@amazon.com>
Labels

backport 2.x Backport to 2.x branch
