[Pull-based Ingestion] Support segment replication for pull-based ingestion #17359

Merged
mch2 merged 4 commits into opensearch-project:main from varunbharadwaj:vb/segrep on Feb 27, 2025

@varunbharadwaj
Contributor

@varunbharadwaj varunbharadwaj commented Feb 14, 2025

Description

This PR is a follow-up to pull-based ingestion that adds support for segment replication with remote store. The primary shard ingests from the streaming source, and replica shards rely on segment replication.

This PR refactors IngestionEngine to inherit from InternalEngine, both to support replication and recovery and to avoid duplicate code. Supporting segRep and peer recovery requires IngestionEngine to include the required listeners, work with NRTReplicationEngine, track the latest index commits, and prevent snapshotted index deletion, among other changes. All of this is already available in InternalEngine and can be reused by IngestionEngine after this change.
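As a rough illustration of the design described above, here is a minimal, self-contained sketch. Every class and method name in it (`BaseEngineSketch`, `PullEngineSketch`, `ReplicaSketch`, `poll`, `syncFrom`) is hypothetical and stands in for the real engines; none of it is OpenSearch's actual API. It only shows the idea that the ingestion engine inherits shared commit tracking from a base engine, while a replica copies segments from the primary instead of reading the source.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for InternalEngine: owns shared state such as commit tracking. */
class BaseEngineSketch {
    private final List<String> segments = new ArrayList<>();
    private long commitGeneration = 0;

    /** Records a new segment and advances the commit generation. */
    protected void commitSegment(String segment) {
        segments.add(segment);
        commitGeneration++;
    }

    /** Returns a copy so callers cannot mutate engine state. */
    List<String> segments() {
        return new ArrayList<>(segments);
    }

    long commitGeneration() {
        return commitGeneration;
    }
}

/** Hypothetical stand-in for IngestionEngine: adds only the pull loop on top of the base. */
class PullEngineSketch extends BaseEngineSketch {
    private final Iterator<String> source;

    PullEngineSketch(Iterator<String> source) {
        this.source = source;
    }

    /** Pulls up to maxMessages from the source; in this toy model each message becomes one segment. */
    int poll(int maxMessages) {
        int pulled = 0;
        while (pulled < maxMessages && source.hasNext()) {
            commitSegment("segment-for-" + source.next());
            pulled++;
        }
        return pulled;
    }
}

/** Hypothetical replica: never reads the source, only copies segments from the primary. */
class ReplicaSketch {
    private List<String> segments = new ArrayList<>();
    private long generation = 0;

    /** Copies the primary's segments only when the primary has a newer commit generation. */
    void syncFrom(BaseEngineSketch primary) {
        if (primary.commitGeneration() > generation) {
            segments = primary.segments();
            generation = primary.commitGeneration();
        }
    }

    List<String> segments() {
        return segments;
    }
}

class SegrepSketch {
    public static void main(String[] args) {
        PullEngineSketch primary = new PullEngineSketch(List.of("m1", "m2", "m3").iterator());
        ReplicaSketch replica = new ReplicaSketch();
        primary.poll(2);           // primary ingests from the (in-memory) source
        replica.syncFrom(primary); // replica catches up by copying segments
        System.out.println(replica.segments());
    }
}
```

In this toy model the subclass contributes only the polling logic, which mirrors the stated goal of reusing what the base engine already provides rather than duplicating it.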

Integration tests are added to validate end-to-end pull-based ingestion with segment replication, peer recovery, replica promotion, and remote store.

Related Issues

Resolves #16929

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions github-actions bot added the enhancement and Indexing labels Feb 14, 2025
@github-actions
Contributor

❌ Gradle check result for 7a682ef: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Contributor

❌ Gradle check result for fae7e91: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Contributor

❌ Gradle check result for 22464ed: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Member

@andrross andrross left a comment

Looks like there are some build failures here. FYI you should be able to run ./gradlew precommit locally to find these before pushing your commit.

@github-actions
Contributor

✅ Gradle check result for f56f90f: SUCCESS

@codecov

codecov bot commented Feb 26, 2025

Codecov Report

Attention: Patch coverage is 83.01887% with 9 lines in your changes missing coverage. Please review.

Project coverage is 72.53%. Comparing base (0ffed5e) to head (c5d7445).
Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...opensearch/index/translog/NoOpTranslogManager.java | 73.33% | 3 Missing and 1 partial ⚠️ |
| ...a/org/opensearch/index/engine/IngestionEngine.java | 86.36% | 1 Missing and 2 partials ⚠️ |
| .../indices/pollingingest/IngestionEngineFactory.java | 0.00% | 2 Missing ⚠️ |

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #17359      +/-   ##
============================================
+ Coverage     72.42%   72.53%   +0.11%     
- Complexity    65611    65675      +64     
============================================
  Files          5304     5304              
  Lines        304743   304464     -279     
  Branches      44189    44145      -44     
============================================
+ Hits         220701   220840     +139     
+ Misses        65888    65495     -393     
+ Partials      18154    18129      -25     


@github-actions
Contributor

❌ Gradle check result for bb213a7: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

…ecovery

Signed-off-by: Varun Bharadwaj <varunbharadwaj1995@gmail.com>
Signed-off-by: Varun Bharadwaj <varunbharadwaj1995@gmail.com>
Signed-off-by: Varun Bharadwaj <varunbharadwaj1995@gmail.com>
Signed-off-by: Varun Bharadwaj <varunbharadwaj1995@gmail.com>
@github-actions
Contributor

✅ Gradle check result for c5d7445: SUCCESS

@yupeng9
Contributor

yupeng9 commented Feb 27, 2025

LGTM

Contributor

@msfroh msfroh left a comment

This looks pretty reasonable to me as a first step. In the long run, I think I'd like to extract an abstract base class for both InternalEngine and IngestionEngine, but I think that belongs in another PR, since this one's already big enough.

Thanks, @varunbharadwaj!

@varunbharadwaj
Contributor Author

> This looks pretty reasonable to me as a first step. In the long run, I think I'd like to extract an abstract base class for both InternalEngine and IngestionEngine, but I think that belongs in another PR, since this one's already big enough.
>
> Thanks, @varunbharadwaj!

Thanks for reviewing. Yeah, I will create a follow-up PR for the refactoring.

Member

@mch2 mch2 left a comment

minor comment, lgtm!

@mch2 mch2 merged commit 415abb9 into opensearch-project:main Feb 27, 2025
38 of 39 checks passed
@gbbafna
Contributor

gbbafna commented Mar 4, 2025

@varunbharadwaj, @yupeng9: Are we supporting only segrep with pull-based ingestion?

@varunbharadwaj
Contributor Author

> @varunbharadwaj, @yupeng9: Are we supporting only segrep with pull-based ingestion?

Yes, the plan is to start with segrep. We will probably explore multi-node ingestion in the future, but that is still under discussion.

vinaykpud pushed a commit to vinaykpud/OpenSearch that referenced this pull request Mar 18, 2025
…estion (opensearch-project#17359)

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>

Labels

enhancement, Indexing, skip-changelog

Development

Successfully merging this pull request may close these issues.

[Feature Request] A new IngestionEngine that can pull data from streaming sources.

7 participants