
Upgrade lucene to version 10.2.1#17961

Merged
mch2 merged 14 commits into opensearch-project:main from expani:lucene_10_2_0_upgrade
May 19, 2025

Conversation

@expani
Contributor

expani commented Apr 16, 2025

Description

Upgrading to Lucene 10.2.1
https://lucene.apache.org/core/10_2_1/changes/Changes.html

Performance Testing Areas

  • Snapshot generation, and testing search-heavy workloads such as Big5 over multiple runs
  • Ensuring there are no new indexing regressions, such as the force-merge time regression seen with Lucene 10.1.0

@github-actions
Contributor

❌ Gradle check result for 5e74113: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Contributor

❌ Gradle check result for 93c4e0c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Contributor

❌ Gradle check result for 9a34fb8: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Contributor

❌ Gradle check result for 84276e4: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@harshavamsi
Contributor

@expani
Contributor Author

expani commented Apr 16, 2025

@harshavamsi I was wondering if you could merge the constant scorer change, with some context on why it helps. I can rebase once it's merged and fix the Lucene 10.2.0 upgrade issues here.

I want to focus on the test failures in this PR, like this one:

REPRODUCE WITH: ./gradlew ':plugins:analysis-icu:test' --tests "org.opensearch.index.analysis.IcuTokenizerFactoryTests.testIcuCustomizeRuleFile" -Dtests.seed=5147BC3990E890F4 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=kw -Dtests.timezone=Pacific/Gambier -Druntime.java=21

IcuTokenizerFactoryTests > testIcuCustomizeRuleFile FAILED
    java.lang.ExceptionInInitializerError
        at __randomizedtesting.SeedInfo.seed([5147BC3990E890F4:91E71A7FBEA22BE7]:0)
        at org.opensearch.index.analysis.AnalysisRegistry.buildMapping(AnalysisRegistry.java:541)
        at org.opensearch.index.analysis.AnalysisRegistry.buildTokenFilterFactories(AnalysisRegistry.java:338)
        at org.opensearch.index.analysis.AnalysisRegistry.build(AnalysisRegistry.java:241)
        at org.opensearch.test.OpenSearchTestCase.createTestAnalysis(OpenSearchTestCase.java:1767)
        at org.opensearch.test.OpenSearchTestCase.createTestAnalysis(OpenSearchTestCase.java:1755)
        at org.opensearch.index.analysis.IcuTokenizerFactoryTests.createTestAnalysis(IcuTokenizerFactoryTests.java:129)
        at org.opensearch.index.analysis.IcuTokenizerFactoryTests.testIcuCustomizeRuleFile(IcuTokenizerFactoryTests.java:67)

        Caused by:
        com.ibm.icu.util.ICUUncheckedIOException: java.io.IOException: ICU data file error: Header authentication failed, please check if you have a valid ICU data file; data format 4e726d32, format version 5.0.0.0
            at app//com.ibm.icu.impl.Normalizer2Impl.load(Normalizer2Impl.java:506)
            at app//com.ibm.icu.impl.Norm2AllModes$1.createInstance(Norm2AllModes.java:354)
            at app//com.ibm.icu.impl.Norm2AllModes$1.createInstance(Norm2AllModes.java:347)
            at app//com.ibm.icu.impl.SoftCache.getInstance(SoftCache.java:69)
            at app//com.ibm.icu.impl.Norm2AllModes.getInstance(Norm2AllModes.java:344)
            at app//com.ibm.icu.text.Normalizer2.getInstance(Normalizer2.java:219)
            at app//org.opensearch.index.analysis.IcuFoldingTokenFilterFactory.<clinit>(IcuFoldingTokenFilterFactory.java:57)
            ... 7 more

            Caused by:
            java.io.IOException: ICU data file error: Header authentication failed, please check if you have a valid ICU data file; data format 4e726d32, format version 5.0.0.0
                at com.ibm.icu.impl.ICUBinary.readHeader(ICUBinary.java:606)
                at com.ibm.icu.impl.ICUBinary.readHeaderAndDataVersion(ICUBinary.java:557)
                at com.ibm.icu.impl.Normalizer2Impl.load(Normalizer2Impl.java:453)
                ... 13 more
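The `data format 4e726d32` in this error is the ICU data file's four format bytes printed as hex. Read as ASCII they spell `Nrm2`, i.e. Normalizer2 data, which matches the `Normalizer2Impl.load` frames above. The header check rejects the file's format version 5.0.0.0, which suggests (the thread does not confirm this) that the normalization data picked up with the new Lucene was built by a newer ICU than the icu4j on the test classpath; the commit list later shows icu4j being upgraded alongside Lucene 10.2.0. A minimal, stdlib-only sketch for decoding such identifiers, using a hypothetical helper name `IcuDataFormat`:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of ICU or OpenSearch) for reading the hex
// "data format" value printed in ICU data-file errors.
final class IcuDataFormat {
    private IcuDataFormat() {}

    // Unpacks the int into its four bytes (ByteBuffer is big-endian by default)
    // and interprets them as ASCII, e.g. 0x4e726d32 -> "Nrm2".
    static String decode(int format) {
        byte[] bytes = ByteBuffer.allocate(4).putInt(format).array();
        return new String(bytes, StandardCharsets.US_ASCII);
    }
}
```

For example, `IcuDataFormat.decode(0x4e726d32)` returns `"Nrm2"`, pointing at a normalization data file rather than, say, a break-iterator rule file.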

@harshavamsi
Contributor


Sounds good, I'll add some context.

@github-actions
Contributor

❌ Gradle check result for 48d60bc: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@expani
Contributor Author

expani commented Apr 16, 2025

The failure is due to the known flaky test #15806 (seen with different seeds):

./gradlew ':server:internalClusterTest' --tests "org.opensearch.snapshots.DedicatedClusterSnapshotRestoreIT.testSnapshotWithStuckNode" -Dtests.seed=A77C55DF82AC524C -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=ann -Dtests.timezone=W-SU -Druntime.java=21

I tried the same command on current main and it fails there as well, so I don't think it's related to the Lucene 10.2.0 upgrade.

expani and others added 9 commits May 15, 2025 12:41
Signed-off-by: expani <anijainc@amazon.com>
Signed-off-by: Andrew Ross <andrross@amazon.com>
Signed-off-by: Andrew Ross <andrross@amazon.com>
Signed-off-by: expani <anijainc@amazon.com>
Signed-off-by: expani <anijainc@amazon.com>
Signed-off-by: expani <anijainc@amazon.com>
Signed-off-by: expani <anijainc@amazon.com>
…h current ConstantScoreSupplier

Signed-off-by: expani <anijainc@amazon.com>
Signed-off-by: expani <anijainc@amazon.com>
andrross force-pushed the lucene_10_2_0_upgrade branch from d7b39c5 to 9d8ce0b on May 15, 2025 19:42
@github-actions
Contributor

✅ Gradle check result for 9d8ce0b: SUCCESS

@andrross
Member

@expani What do you think? Should we merge this?

@getsaurabh02
Member

getsaurabh02 commented May 19, 2025

@andrross @expani @harshavamsi Can we please merge this and follow up on the regressions separately, with a plan for 3.1?

@asimmahmood1
Contributor

@andrross @expani is OOO; I'm OK to merge this. I can help track any of the performance regressions, with the goal of getting them fixed by the 3.1 release.

@andrross
Member

I've got some commits in this PR, so I'd like to get another maintainer review. @mch2 @msfroh Can one of you take a look?

mch2 merged commit 370cd8c into opensearch-project:main on May 19, 2025
29 of 30 checks passed
tandonks pushed a commit to tandonks/OpenSearch that referenced this pull request Jun 1, 2025
* Upgrade lucene to version 10.2.0

Signed-off-by: expani <anijainc@amazon.com>

* Removed usage of non public constructor for DocIdSetBuilder

Signed-off-by: expani <anijainc@amazon.com>

* Increment version and fixed another compilation error

Signed-off-by: expani <anijainc@amazon.com>

* Updating license sha for lucene 10.2.0

Signed-off-by: expani <anijainc@amazon.com>

* Upgraded icu4j in conjunction with Lucene 10.2.0

Signed-off-by: expani <anijainc@amazon.com>

* update sha for icu4j

Signed-off-by: expani <anijainc@amazon.com>

* Update to 10.2.1

Signed-off-by: Andrew Ross <andrross@amazon.com>

* Add changelog entry

Signed-off-by: Andrew Ross <andrross@amazon.com>

* Updated test based on Lucene-14561

Signed-off-by: expani <anijainc@amazon.com>

* Updated test based on Lucene-14561

Signed-off-by: expani <anijainc@amazon.com>

* Updated test based on Lucene-14561

Signed-off-by: expani <anijainc@amazon.com>

* Updated test based on Lucene-14561

Signed-off-by: expani <anijainc@amazon.com>

* Delegating nextDoc to advance as previous assumption doesn't hold with current ConstantScoreSupplier

Signed-off-by: expani <anijainc@amazon.com>

* Implemented cost function

Signed-off-by: expani <anijainc@amazon.com>

---------

Signed-off-by: expani <anijainc@amazon.com>
Signed-off-by: Andrew Ross <andrross@amazon.com>
Co-authored-by: Andrew Ross <andrross@amazon.com>
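The last two commits make an iterator's `nextDoc` delegate to `advance` and add a `cost` implementation. The OpenSearch code itself is not shown in this thread, so the following is only a rough, stdlib-only sketch of the pattern. It assumes Lucene's `DocIdSetIterator` contract: `docID()` is -1 before iteration, `advance(target)` moves to the first doc >= target, `NO_MORE_DOCS` is `Integer.MAX_VALUE`, and `cost()` estimates how many docs the iterator may return. The class name `SortedArrayDocIterator` is hypothetical and does not extend the real Lucene class.

```java
import java.util.Arrays;

// Hypothetical, simplified stand-in for Lucene's DocIdSetIterator contract;
// not the iterator changed in this PR.
final class SortedArrayDocIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] docs; // ascending doc IDs, no duplicates
    private int index = -1;   // position in docs of the current doc
    private int doc = -1;     // -1 until the iterator has been positioned

    SortedArrayDocIterator(int[] sortedDocs) {
        this.docs = sortedDocs.clone();
    }

    int docID() {
        return doc;
    }

    // nextDoc() delegates to advance(), so it stays correct no matter whether
    // the iterator was previously moved by nextDoc() or by advance().
    int nextDoc() {
        if (doc == NO_MORE_DOCS) {
            return NO_MORE_DOCS; // avoid overflowing doc + 1
        }
        return advance(doc + 1);
    }

    // Moves to the first doc >= target, or to NO_MORE_DOCS if there is none.
    int advance(int target) {
        if (doc == NO_MORE_DOCS) {
            return NO_MORE_DOCS;
        }
        int from = index + 1; // only search docs after the current one
        int pos = Arrays.binarySearch(docs, from, docs.length, target);
        if (pos < 0) {
            pos = -pos - 1; // insertion point: first element greater than target
        }
        index = pos;
        doc = pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        return doc;
    }

    // Upper bound on the number of docs this iterator can return.
    long cost() {
        return docs.length;
    }
}
```

Because `nextDoc()` is just `advance(docID() + 1)`, interleaved `advance` and `nextDoc` calls cannot leave the two methods out of sync. The trade-off is a binary search per step where a dedicated `nextDoc` could simply increment an index.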
neuenfeldttj added a commit to neuenfeldttj/OpenSearch that referenced this pull request Jun 26, 2025
Co-authored-by: Andrew Ross <andrross@amazon.com>
Signed-off-by: TJ Neuenfeldt <tjneu@amazon.com>
neuenfeldttj pushed a commit to neuenfeldttj/OpenSearch that referenced this pull request Jun 26, 2025
