
Fix bug "synonym_graph filter fails with word_delimiter_graph when using whitespace or classic tokenizer in synonym_analyzer" #19248

Merged
andrross merged 8 commits into opensearch-project:main from laminelam:feature/synonym_graph_bug_97 on Mar 19, 2026


@laminelam
Contributor

@laminelam laminelam commented Sep 7, 2025

This PR fixes the bug "synonym_graph filter fails with word_delimiter_graph when using whitespace or classic tokenizer in synonym_analyzer".

I investigated the issue, and there appear to be two causes:

  • When the analyzers are built, if one of them fails for any reason, an exception is thrown and the whole process stops, so the customSynonymAnalyzer never gets instantiated.
  • If the customSynonymAnalyzer depends on another analyzer that has not been built (and registered) yet, the process fails as well.

The current lookup:

analysisRegistry.getAnalyzer(synonymAnalyzerName);

is not enough, because it only searches the built-in and pre-built analyzers. Custom analyzers defined in the settings are not included.

The solution is two-fold:

  • Fail safe instead of fail fast when building the analyzers.
  • Build the analyzers that other analyzers depend on first.

Fail safe instead of fail fast when building the analyzers
Right now, if an analyzer fails for some reason, the whole build process fails with an exception. Failing safe means a broken analyzer no longer prevents the remaining analyzers from being built.
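A minimal Java sketch of the fail-safe idea, assuming each analyzer can be represented by a builder (a Supplier) keyed by its name. FailSafeBuild, buildAll and Result are hypothetical names, not OpenSearch APIs, and the PR's actual error handling may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch: none of these names come from OpenSearch.
public final class FailSafeBuild {

    /** Analyzers that built successfully, plus the error recorded for each one that did not. */
    public record Result<A>(Map<String, A> built, Map<String, Exception> failures) {}

    /**
     * Builds every analyzer in insertion order. A failure is recorded and the loop
     * continues, so one bad definition does not stop the others from being built.
     */
    public static <A> Result<A> buildAll(Map<String, Supplier<A>> builders) {
        Map<String, A> built = new LinkedHashMap<>();
        Map<String, Exception> failures = new LinkedHashMap<>();
        for (Map.Entry<String, Supplier<A>> e : builders.entrySet()) {
            try {
                built.put(e.getKey(), e.getValue().get());
            } catch (RuntimeException ex) {
                failures.put(e.getKey(), ex); // record and keep going (fail safe)
            }
        }
        return new Result<>(built, failures);
    }

    public static void main(String[] args) {
        Map<String, Supplier<String>> builders = new LinkedHashMap<>();
        builders.put("broken", () -> { throw new IllegalArgumentException("bad filter"); });
        builders.put("my_synonym_analyzer", () -> "built:my_synonym_analyzer");
        Result<String> r = buildAll(builders);
        // prints [my_synonym_analyzer] [broken]
        System.out.println(r.built().keySet() + " " + r.failures().keySet());
    }
}
```

Collecting failures instead of throwing keeps the other analyzers usable, but the recorded errors still need to be surfaced (logged or rethrown later) so that a broken definition is not silently ignored.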

Build the analyzers that others depend on first:
A custom analyzer with a synonym filter may depend on another analyzer (its synonym_analyzer) that has to be built first.
The PR adds logic to:

  • add an optional "order" attribute that defines the precedence order between analyzers
  • add 'analyzersBuiltSoFar' to getChainAwareTokenFilterFactory so that the already-built analyzers needed by the analyzer under construction are passed in
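The lookup change behind 'analyzersBuiltSoFar' can be sketched as follows, assuming the analyzers already built from the settings are kept in a map and that the registry lookup (standing in for analysisRegistry.getAnalyzer) returns null for unknown names. SynonymAnalyzerLookup and resolve are hypothetical; the real getChainAwareTokenFilterFactory signature is not reproduced here.

```java
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch; names and error text are illustrative, not OpenSearch's.
public final class SynonymAnalyzerLookup {

    /**
     * Resolves the analyzer named by a synonym filter's synonym_analyzer setting.
     * Custom analyzers from the settings are looked up in analyzersBuiltSoFar first;
     * the registry lookup (built-in and pre-built analyzers) is the fallback.
     */
    public static <A> A resolve(String name,
                                Map<String, A> analyzersBuiltSoFar,
                                Function<String, A> registryLookup) {
        A analyzer = analyzersBuiltSoFar.get(name);
        if (analyzer == null) {
            analyzer = registryLookup.apply(name);
        }
        if (analyzer == null) {
            throw new IllegalArgumentException("no analyzer named [" + name + "]");
        }
        return analyzer;
    }

    public static void main(String[] args) {
        Map<String, String> builtSoFar = Map.of("my_wdg_analyzer", "custom:my_wdg_analyzer");
        Function<String, String> registry = n -> n.equals("standard") ? "builtin:standard" : null;
        System.out.println(resolve("my_wdg_analyzer", builtSoFar, registry)); // custom:my_wdg_analyzer
        System.out.println(resolve("standard", builtSoFar, registry));        // builtin:standard
    }
}
```

This only works if the dependency has actually been built before the analyzer that needs it, which is why build order matters.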

Resolves #18037

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions github-actions bot added bug Something isn't working Indexing Indexing, Bulk Indexing and anything related to indexing labels Sep 7, 2025
@github-actions
Contributor

github-actions bot commented Sep 7, 2025

❌ Gradle check result for 5c8fbe6: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@laminelam
Contributor Author

@github-actions
Contributor

github-actions bot commented Sep 7, 2025

❌ Gradle check result for 267f48e: FAILURE


Contributor

@gaobinlong gaobinlong left a comment


The DCO check failed; please amend your commit with '-s' to include the sign-off info. A changelog entry is also needed.

@github-actions
Contributor

Persistent review updated to latest commit 1e2895a

@github-actions
Contributor

❌ Gradle check result for 1e2895a: FAILURE


@github-actions
Contributor

Persistent review updated to latest commit 739a656

@github-actions
Contributor

Persistent review updated to latest commit e73f993

@andrross andrross force-pushed the feature/synonym_graph_bug_97 branch 2 times, most recently from cebf2d0 to 3fcb384 on March 16, 2026 22:24
@github-actions
Contributor

Persistent review updated to latest commit 3fcb384

@github-actions
Contributor

❌ Gradle check result for 3fcb384: FAILURE



Lamine Idjeraoui and others added 8 commits March 18, 2026 16:13
…whitespace or classic tokenizer in synonym_analyzer" bug opensearch-project#18037

add 'analyzersBuiltSoFar' to getChainAwareTokenFilterFactory to build custom analyzers depending on other (already built) analyzers
The analyzers are built following the order of precedence specified in the settings

Signed-off-by: Lamine Idjeraoui <lidjeraoui@apple.com>
…rgs)

Signed-off-by: Lamine Idjeraoui <lidjeraoui@apple.com>
Signed-off-by: Lamine Idjeraoui <lidjeraoui@apple.com>
Signed-off-by: Lamine Idjeraoui <lidjeraoui@apple.com>
…g kahn's algorithm topological sort

Signed-off-by: Lamine Idjeraoui <lidjeraoui@apple.com>
Signed-off-by: Lamine Idjeraoui <lidjeraoui@apple.com>
Signed-off-by: Lamine Idjeraoui <lidjeraoui@apple.com>
Signed-off-by: Andrew Ross <andrross@amazon.com>
@andrross andrross force-pushed the feature/synonym_graph_bug_97 branch from 3fcb384 to 03518e0 on March 18, 2026 23:13
@github-actions
Contributor

Persistent review updated to latest commit 03518e0

@github-actions
Contributor

✅ Gradle check result for 03518e0: SUCCESS

@andrross andrross merged commit 9c29462 into opensearch-project:main Mar 19, 2026
33 of 44 checks passed
kkewwei pushed a commit to kkewwei/OpenSearch that referenced this pull request Mar 20, 2026
Fix "synonym_graph filter fails with word_delimiter_graph when using
whitespace or classic tokenizer in synonym_analyzer" bug. Use
automatic dependency detection using kahn's algorithm topological
sort.

Signed-off-by: Lamine Idjeraoui <lidjeraoui@apple.com>
Signed-off-by: Andrew Ross <andrross@amazon.com>
Co-authored-by: Lamine Idjeraoui <lidjeraoui@apple.com>
Co-authored-by: Andrew Ross <andrross@amazon.com>
Signed-off-by: kkewwei <kkewwei@163.com>
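The merged commit message mentions automatic dependency detection using Kahn's algorithm. Below is a generic Java sketch of that topological sort over analyzer names, assuming the dependencies (for example, an analyzer whose synonym filter names another analyzer as its synonym_analyzer) have already been extracted into a map. AnalyzerBuildOrder and order are hypothetical names; this is not the PR's code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of Kahn's algorithm applied to analyzer build order.
public final class AnalyzerBuildOrder {

    /**
     * Returns analyzer names so that each one appears after the analyzers it depends on.
     * dependsOn maps an analyzer to the analyzers it needs. Dependencies that are not keys
     * of the map are treated as already available (e.g. built-in) and ignored.
     * The relative order of independent analyzers is unspecified.
     */
    public static List<String> order(Map<String, Set<String>> dependsOn) {
        Map<String, Integer> inDegree = new HashMap<>();
        Map<String, List<String>> dependents = new HashMap<>();
        for (String name : dependsOn.keySet()) {
            inDegree.putIfAbsent(name, 0);
        }
        for (Map.Entry<String, Set<String>> e : dependsOn.entrySet()) {
            for (String dep : e.getValue()) {
                if (!dependsOn.containsKey(dep)) continue; // external, nothing to wait for
                inDegree.merge(e.getKey(), 1, Integer::sum);
                dependents.computeIfAbsent(dep, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        // Start with every analyzer that has no unbuilt dependencies.
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : inDegree.entrySet()) {
            if (e.getValue() == 0) ready.add(e.getKey());
        }
        List<String> result = new ArrayList<>();
        while (!ready.isEmpty()) {
            String next = ready.poll();
            result.add(next);
            // "Building" next unblocks one dependency of each analyzer that needs it.
            for (String d : dependents.getOrDefault(next, List.of())) {
                if (inDegree.merge(d, -1, Integer::sum) == 0) ready.add(d);
            }
        }
        if (result.size() != inDegree.size()) {
            throw new IllegalArgumentException("circular dependency between analyzers");
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> deps = Map.of(
            "with_synonyms", Set.of("wdg_analyzer"),
            "wdg_analyzer", Set.of("standard"));
        System.out.println(order(deps)); // [wdg_analyzer, with_synonyms]
    }
}
```

Analyzers that never reach an in-degree of zero are part of a cycle, so the sketch rejects them instead of building them in an arbitrary order.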

Labels

bug Something isn't working Indexing Indexing, Bulk Indexing and anything related to indexing


Development

Successfully merging this pull request may close these issues.

[BUG] synonym_graph filter fails with word_delimiter_graph when using whitespace or classic tokenizer in synonym_analyzer – similar to #16263

5 participants