Use Lucene soft-deletes in peer recovery#30522
Conversation
We are targeting to replace translog by Lucene history in peer-recovery. In peer-recovery, the primary transfers a safe commit to replicas then sends subsequent operations after the local checkpoint of that safe commit. This requires the primary having all operations after the checkpoint of the current safe commit. This commit hardens the soft-deletes retention merge policy by taking safe-commit into account.
|
Pinging @elastic/es-distributed |
|
@elasticmachine test this please |
| assertEquals(DocWriteResponse.Result.DELETED, client().prepareDelete("index", "type", "2").get().getResult()); | ||
| if (IndexSettings.INDEX_SOFT_DELETES_SETTING.get(settings)) { | ||
| persistGlobalCheckpoint("index"); // Need to persist the global checkpoint for the soft-deletes retention MP. | ||
| forceMerge(); |
There was a problem hiding this comment.
why is this needed? can we randomly do it?
| if (primary.indexSettings().isSoftDeleteEnabled()) { | ||
| // Need to sync global checkpoint and create a new safe commit as the soft-deletes MergePolicy depends on it. | ||
| primary.flush(new FlushRequest()); | ||
| primary.forceMerge(new ForceMergeRequest().onlyExpungeDeletes(true)); |
There was a problem hiding this comment.
why is this needed? the force merge seems not necessary maybe do it randomly
There was a problem hiding this comment.
Yeah, I was wrong - a flush is enough. I pushed 9ae627c.
| private final LongSupplier globalCheckpointSupplier; | ||
| private int retentionLockCount; | ||
| private long checkpointOfSafeCommit; | ||
| private volatile long minRequiredSeqNoForRecovery; |
There was a problem hiding this comment.
just make all methods synchronized then we don't need to do the volatile thing here. the contention on these locks seems very low.
| */ | ||
| synchronized Releasable acquireRetentionLock() { | ||
| retentionLockCount++; | ||
| assert retentionLockCount > 0 : "Invalid number of retention locks [" + retentionLockCount + "]"; |
There was a problem hiding this comment.
if this is needed I think the assertion should happen before we mutate?
|
@simonw I've addressed your comments. Can you please take another look? Thank you! |
| // Package-level for testing | ||
| synchronized long getMinSeqNoToRetain() { | ||
| final long minSeqNoForQueryingChanges = globalCheckpointSupplier.getAsLong() + 1 - retentionOperations; | ||
| return Math.min(minRequiredSeqNoForRecovery, minSeqNoForQueryingChanges); |
There was a problem hiding this comment.
instead of adding the 1 on assignment which is done in 2 places should we do it here?
| // Deleting a doc causes its memory to be freed from the breaker | ||
| deleteDoc(primary, "_doc", "0"); | ||
| primary.sync(); // need to sync global checkpoint as the soft-deletes retention MergePolicy depends on it. | ||
| if (primary.indexSettings().isSoftDeleteEnabled()) { |
There was a problem hiding this comment.
not sure about this I think @bleskes needs to look at this
| persistGlobalCheckpoint("index"); // Need to persist the global checkpoint for the soft-deletes retention MP. | ||
| // Need to persist the global checkpoint and flush a new safe commit | ||
| // as the soft-deletes retention merge policy depends on the local-checkpoint of the safe commit. | ||
| persistGlobalCheckpoint("index"); |
There was a problem hiding this comment.
not sure about this I think @bleskes needs to look at this
There was a problem hiding this comment.
I don't follow the comment - @dnhatn can you please explain?
There was a problem hiding this comment.
I added a comment for this.
bleskes
left a comment
There was a problem hiding this comment.
I left some comments. I would also like to understand (and see a test) how this interact with ongoing merges that have started before the safe commit was updated and thus use old information (I think it's OK but we need to know for sure). This PR tries to guarantee that all operations above the commit's local checkpoint will be available in any future NRT reader - that may include the result of such a merge.
| private void updateMinRequiredSeqNoForRecovery() { | ||
| assert Thread.holdsLock(this) : Thread.currentThread().getName(); | ||
| if (retentionLockCount == 0) { | ||
| this.minRequiredSeqNoForRecovery = checkpointOfSafeCommit; |
There was a problem hiding this comment.
name this storeRecovery? I think it will help clarify that this is only used for store recovery.
| persistGlobalCheckpoint("index"); // Need to persist the global checkpoint for the soft-deletes retention MP. | ||
| // Need to persist the global checkpoint and flush a new safe commit | ||
| // as the soft-deletes retention merge policy depends on the local-checkpoint of the safe commit. | ||
| persistGlobalCheckpoint("index"); |
There was a problem hiding this comment.
I don't follow the comment - @dnhatn can you please explain?
| primary.sync(); // need to sync global checkpoint as the soft-deletes retention MergePolicy depends on it. | ||
| if (primary.indexSettings().isSoftDeleteEnabled()) { | ||
| // Need to persist the global checkpoint and flush a new safe commit | ||
| // as the soft-deletes retention merge policy depends on the local-checkpoint of the safe commit. |
There was a problem hiding this comment.
same here. Can you please explain more?
There was a problem hiding this comment.
I added a comment for this.
| * make sure that all operations after the local checkpoint of the safe commit are retained until the lock is released. | ||
| * This is a analogy to the translog's retention lock; see {@link Translog#acquireRetentionLock()} | ||
| */ | ||
| synchronized Releasable acquireRetentionLock() { |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
|
@bleskes and @s1monw I've included the peer-recovery usage in this PR so that we can have the whole picture of this change. I will continue working on tests. |
|
Looks great IMO |
|
@bleskes this is still waiting on you |
bleskes
left a comment
There was a problem hiding this comment.
This looks great. My only concern is that we should randomize the usage of soft deletes in our standard testing. We don't plan to turn it on by default for 6.x and we should make sure everything works without it as well as with.
|
|
||
| private SoftDeletesPolicy newSoftDeletesPolicy() throws IOException { | ||
| final Map<String, String> commitUserData = store.readLastCommittedSegmentsInfo().userData; | ||
| final long lastSeqNoSeenByMergePolicy; |
There was a problem hiding this comment.
nit: rename to minRetainedSeqNo ?
| @Override | ||
| public Closeable acquireTranslogRetentionLock() { | ||
| return getTranslog().acquireRetentionLock(); | ||
| public Translog.Snapshot newTranslogSnapshotBetween(long minSeqNo, long maxSeqNo) throws IOException { |
There was a problem hiding this comment.
I think we can remove this as a follow up, in favor of readHistoryOperations ?
There was a problem hiding this comment.
Yes, we will remove this method after making primary-replica resync use Lucene.
| store = createStore(); | ||
| engine = createEngine(config(indexSettings, store, createTempDir(), newMergePolicy(), null, null, globalCheckpoint::get)); | ||
| assertThat(engine.getMinRetainedSeqNo(), equalTo(0L)); | ||
| long lastSeqNoSeenByMP = engine.getMinRetainedSeqNo(); |
@bleskes Once this PR is in, I will make a follow-up to randomize the soft-deletes setting? Are you okay? |
Of course. |
|
@elasticmachine test this please |
This commit adds Lucene soft-deletes as another source for peer-recovery besides translog. Relates #29530
Today a file-based recovery will replay all existing translog operations from the primary on a replica so that that replica can have a full history in translog as the primary. However, with soft-deletes enabled, we should not do it because: 1. All operations before the local checkpoint of the safe commit exist in the commit already. 2. The number of operations before the local checkpoint may be considerable and requires a significant amount of time to replay on a replica. Relates elastic#30522 Relates elastic#29530
…es (#33190) Today a file-based recovery will replay all existing translog operations from the primary on a replica so that that replica can have a full history in translog as the primary. However, with soft-deletes enabled, we should not do it because: 1. All operations before the local checkpoint of the safe commit exist in the commit already. 2. The number of operations before the local checkpoint may be considerable and requires a significant amount of time to replay on a replica. Relates #30522 Relates #29530
…es (#33190) Today a file-based recovery will replay all existing translog operations from the primary on a replica so that that replica can have a full history in translog as the primary. However, with soft-deletes enabled, we should not do it because: 1. All operations before the local checkpoint of the safe commit exist in the commit already. 2. The number of operations before the local checkpoint may be considerable and requires a significant amount of time to replay on a replica. Relates #30522 Relates #29530
This PR integrates Lucene soft-deletes (LUCENE-8200) into Elasticsearch. Highlight works in this PR include: 1. Replace hard-deletes by soft-deletes in InternalEngine 2. Use _recovery_source if _source is disabled or modified (elastic#31106) 3. Soft-deletes retention policy based on the global checkpoint (elastic#30335) 4. Read operation history from Lucene instead of translog (elastic#30120) 5. Use Lucene history in peer-recovery (elastic#30522) These works have been done by the whole team; however, these individuals (lexical order) have significant contribution in coding and reviewing: Co-authored-by: Adrien Grand <jpountz@gmail.com> Co-authored-by: Boaz Leskes <b.leskes@gmail.com> Co-authored-by: Jason Tedor <jason@tedor.me> Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com> Co-authored-by: Nhat Nguyen <nhat.nguyen@elastic.co> Co-authored-by: Simon Willnauer <simonw@apache.org>
This PR integrates Lucene soft-deletes(LUCENE-8200) into Elasticsearch. Highlight works in this PR include: - Replace hard-deletes by soft-deletes in InternalEngine - Use _recovery_source if _source is disabled or modified (#31106) - Soft-deletes retention policy based on the global checkpoint (#30335) - Read operation history from Lucene instead of translog (#30120) - Use Lucene history in peer-recovery (#30522) Relates #30086 Closes #29530 --- These works have been done by the whole team; however, these individuals (lexical order) have significant contribution in coding and reviewing: Co-authored-by: Adrien Grand jpountz@gmail.com Co-authored-by: Boaz Leskes b.leskes@gmail.com Co-authored-by: Jason Tedor jason@tedor.me Co-authored-by: Martijn van Groningen martijn.v.groningen@gmail.com Co-authored-by: Nhat Nguyen nhat.nguyen@elastic.co Co-authored-by: Simon Willnauer simonw@apache.org
This PR integrates Lucene soft-deletes(LUCENE-8200) into Elasticsearch. Highlight works in this PR include: - Replace hard-deletes by soft-deletes in InternalEngine - Use _recovery_source if _source is disabled or modified (elastic#31106) - Soft-deletes retention policy based on the global checkpoint (elastic#30335) - Read operation history from Lucene instead of translog (elastic#30120) - Use Lucene history in peer-recovery (elastic#30522) Relates elastic#30086 Closes elastic#29530 --- These works have been done by the whole team; however, these individuals (lexical order) have significant contribution in coding and reviewing: Co-authored-by: Adrien Grand <jpountz@gmail.com> Co-authored-by: Boaz Leskes <b.leskes@gmail.com> Co-authored-by: Jason Tedor <jason@tedor.me> Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com> Co-authored-by: Nhat Nguyen <nhat.nguyen@elastic.co> Co-authored-by: Simon Willnauer <simonw@apache.org>
This PR integrates Lucene soft-deletes(LUCENE-8200) into Elasticsearch. Highlight works in this PR include: - Replace hard-deletes by soft-deletes in InternalEngine - Use _recovery_source if _source is disabled or modified (#31106) - Soft-deletes retention policy based on the global checkpoint (#30335) - Read operation history from Lucene instead of translog (#30120) - Use Lucene history in peer-recovery (#30522) Relates #30086 Closes #29530 --- These works have been done by the whole team; however, these individuals (lexical order) have significant contribution in coding and reviewing: Co-authored-by: Adrien Grand <jpountz@gmail.com> Co-authored-by: Boaz Leskes <b.leskes@gmail.com> Co-authored-by: Jason Tedor <jason@tedor.me> Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com> Co-authored-by: Nhat Nguyen <nhat.nguyen@elastic.co> Co-authored-by: Simon Willnauer <simonw@apache.org>
This commit adds Lucene soft-deletes as another source for peer-recovery besides translog.
Relates #29530