Introduce retention leases versioning#37951
Conversation
Because concurrent sync requests from a primary to its replicas could be in flight, it can be the case that an older retention leases collection arrives and is processed on the replica after a newer retention leases collection has arrived and been processed. Without a defense, in this case the replica would overwrite the newer retention leases with the older retention leases. This commit addresses this issue by introducing a versioning scheme to retention leases. This versioning scheme is used to resolve out-of-order processing on the replica. We persist this version into Lucene and restore it on recovery. The encoding of retention leases is starting to get a little ugly. We can consider addressing this in a follow-up.
|
Pinging @elastic/es-distributed |
dnhatn
left a comment
There was a problem hiding this comment.
Thanks @jasontedor for a quick response :). I left a comment to discuss.
| assert primaryMode == false; | ||
| this.retentionLeases.clear(); | ||
| this.retentionLeases.putAll(retentionLeases.stream().collect(Collectors.toMap(RetentionLease::id, Function.identity()))); | ||
| if (retentionLeases.version() > version) { |
There was a problem hiding this comment.
We might have an issue with the primary fail-over here. Suppose the primary is syncing two new leases requests L1 and L2, both of them are processed on replica-1, but the primary crashes before any of them arrives on the replica-2. If the replica-2 is promoted, then we add two new leases: L3 and L4. Unfortunately, replica-1 will ignore these two new requests, and its leases (L1 and L2) are different from the leases on the new primary (L3 and L4). I think we can use the primary-term with the leases version to have a total ordering pair <<term, version>>. What do you think?
There was a problem hiding this comment.
I wish we could find a simpler way, but I can't. +1 from me on the suggestion.
There was a problem hiding this comment.
I also looked for a simpler way, and @ywelsch did too. I could not find anything, and neither could @ywelsch. We synced earlier and agreed that this looks like the only way forward. We need to do a refactoring to the replication tracker and index shard to push the operation primary term down into the tracker, and then we will proceed with updating this PR.
There was a problem hiding this comment.
@dnhatn is going to work on adding the retention lease infrastructure to index level replication test case infrastructure so that we can test these scenarios.
There was a problem hiding this comment.
@dnhatn I have updated this PR to include the primary term here. I want to add some additional testing, but this PR is ready for another review round from you.
* master: (100 commits) Push primary term to replication tracker (elastic#38044) Introduce ability to minimize round-trips in CCS (elastic#37828) Don't Assert Ack on when Publish Timeout is 0 in Test (elastic#38077) Reduce object creation in Rounding class (elastic#38061) Treat put-mapping calls with `_doc` as a top-level key as typed calls. (elastic#38032) Fix test bug when testing the merging of mappings and templates. (elastic#38021) spelling: java script -- not JavaScript (elastic#37057) Enable SSL in reindex with security QA tests (elastic#37600) Disable BWC tests during backport (elastic#38074) SQL: Added SSL configuration options tests (elastic#37875) Minor fixes in the release notes script. (elastic#37967) Fix typo in docs. (elastic#38018) Update Lucene repo for 7.0.0-alpha2 (elastic#37985) Fix size of rolling-upgrade bootstrap config (elastic#38031) fix DateIndexNameProcessorTests offset pattern (elastic#38069) Speed up converting of temporal accessor to zoned date time (elastic#37915) Work around JDK8 timezone bug in tests (elastic#37968) Correct arg names when update mapping/settings from leader (elastic#38063) Introduce ssl settings to reindex from remote (elastic#37527) Mute testRetentionLeasesSyncOnExpiration ...
…ersion * elastic/master: Do not set up NodeAndClusterIdStateListener in test (elastic#38110) ML: better handle task state race condition (elastic#38040) Soft-deletes policy should always fetch latest leases (elastic#37940) Handle scheduler exceptions (elastic#38014) Minor logging improvements (elastic#38084) Fix Painless void return bug (elastic#38046) Update PutFollowAction serialization post-backport (elastic#37989) fix a few versionAdded values in ElasticsearchExceptions (elastic#37877) Reenable BWC tests after backport of elastic#37899 (elastic#38093) Mute failing test Mute failing test Fail start on obsolete indices documentation (elastic#37786) SQL: Implement FIRST/LAST aggregate functions (elastic#37936) Un-mute NoMasterNodeIT.testNoMasterActionsWriteMasterBlock remove unused parser fields in RemoteResponseParsers
dnhatn
left a comment
There was a problem hiding this comment.
I left some minor comments. This PR is really good already. Thanks @jasontedor.
server/src/main/java/org/elasticsearch/index/seqno/ReplicationTracker.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/seqno/RetentionLeases.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/seqno/RetentionLeases.java
Outdated
Show resolved
Hide resolved
…TrackerRetentionLeaseTests.java
|
@elasticmachine test this please |
|
@elasticmachine test this please |
* master: (36 commits) Ensure joda compatibility in custom date formats (elastic#38171) Do not compute cardinality if the `terms` execution mode does not use `global_ordinals` (elastic#38169) Do not set timeout for IndexRequests in GatewayIndexStateIT (elastic#38147) Zen2ify testMasterFailoverDuringIndexingWithMappingChanges (elastic#38178) SQL: [Docs] Add limitation for aggregate functions on scalars (elastic#38186) Add elasticsearch-node detach-cluster command (elastic#37979) Add tests for fractional epoch parsing (elastic#38162) Enable bw tests for elastic#37871 and elastic#38032. (elastic#38167) Clear send behavior rule in CloseWhileRelocatingShardsIT (elastic#38159) Fix testCorruptedIndex (elastic#38161) Add finalReduce flag to SearchRequest (elastic#38104) Forbid negative field boosts in analyzed queries (elastic#37930) Remove AtomiFieldData#getLegacyFieldValues (elastic#38087) Universal cluster bootstrap method for tests with autoMinMasterNodes=false (elastic#38038) Fix FullClusterRestartIT.testHistoryUUIDIsAdded (elastic#38098) Replace joda time in ingest-common module (elastic#38088) Fix eclipse config for ssl-config (elastic#38096) Don't load global ordinals with the `map` execution_hint (elastic#37833) Relax fault detector in some disruption tests (elastic#38101) Fix java time epoch date formatters (elastic#37829) ...
* master: Replace awaitBusy with assertBusy in atLeastDocsIndexed (elastic#38190) Adjust SearchRequest version checks (elastic#38181) AwaitsFix testClientSucceedsWithVerificationDisabled (elastic#38213) Zen2ify RareClusterStateIT (elastic#38184) ML: Fix error race condition on stop _all datafeeds and close _all jobs (elastic#38113) AwaitsFix PUT mapping with _doc on an index that has types (elastic#38204) Allow built-in monitoring_user role to call GET _xpack API (elastic#38060) Update geo_shape docs to include unsupported features (elastic#38138) [ML] Remove "8" prefixes from file structure finder timestamp formats (elastic#38016) Disable bwc tests while backporting elastic#38104 (elastic#38182) Enable TLSv1.3 by default for JDKs with support (elastic#38103) Fix _host based require filters (elastic#38173) RestoreService should update primary terms when restoring shards of existing indices (elastic#38177) Throw if two inner_hits have the same name (elastic#37645)
…ersion * elastic/master: SnapshotShardsService Simplifications (elastic#38025) Default include_type_name to false in the yml test harness. (elastic#38058) Disable bwc preparing to backport of#37977, elastic#37857 and elastic#37872 (elastic#38126) Adding ml_settings entry to HLRC and Docs for deprecation_info (elastic#38118)
* elastic/master: Introduce retention leases versioning (elastic#37951) Correctly disable tests for FIPS JVMs (elastic#38214)
* elastic/master: (54 commits) Introduce retention leases versioning (elastic#37951) Correctly disable tests for FIPS JVMs (elastic#38214) AwaitsFix testAbortedSnapshotDuringInitDoesNotStart (elastic#38227) Preserve ILM operation mode when creating new lifecycles (elastic#38134) Enable trace log in FollowerFailOverIT (elastic#38148) SnapshotShardsService Simplifications (elastic#38025) Default include_type_name to false in the yml test harness. (elastic#38058) Disable bwc preparing to backport of#37977, elastic#37857 and elastic#37872 (elastic#38126) Adding ml_settings entry to HLRC and Docs for deprecation_info (elastic#38118) Replace awaitBusy with assertBusy in atLeastDocsIndexed (elastic#38190) Adjust SearchRequest version checks (elastic#38181) AwaitsFix testClientSucceedsWithVerificationDisabled (elastic#38213) Zen2ify RareClusterStateIT (elastic#38184) ML: Fix error race condition on stop _all datafeeds and close _all jobs (elastic#38113) AwaitsFix PUT mapping with _doc on an index that has types (elastic#38204) Allow built-in monitoring_user role to call GET _xpack API (elastic#38060) Update geo_shape docs to include unsupported features (elastic#38138) [ML] Remove "8" prefixes from file structure finder timestamp formats (elastic#38016) Disable bwc tests while backporting elastic#38104 (elastic#38182) Enable TLSv1.3 by default for JDKs with support (elastic#38103) ...
We should increase primary term before renewing leases; otherwise, the term of the latest RetentionLeases will be lower than the current term. Relates #37951
If the innerLength is 0, the version won't be increased; then there will be two RetentionLeases with the same term and version, but their leases are different. Relates elastic#37951 Closes elastic#38245
Because concurrent sync requests from a primary to its replicas could be in flight, it can be the case that an older retention leases collection arrives and is processed on the replica after a newer retention leases collection has arrived and been processed. Without a defense, in this case the replica would overwrite the newer retention leases with the older retention leases. This commit addresses this issue by introducing a versioning scheme to retention leases. This versioning scheme is used to resolve out-of-order processing on the replica. We persist this version into Lucene and restore it on recovery. The encoding of retention leases is starting to get a little ugly. We can consider addressing this in a follow-up.
Relates #37165