Add tombstone document into Lucene for Noop#30226
Conversation
This commit adds a tombstone document into Lucene for every No-op. With this change, Lucene index is expected to have a complete history of operations like Translog. In fact, this guarantee is subjected to the soft-deletes retention merge policy.
|
Pinging @elastic/es-distributed |
|
/cc @martijnvg and @jasontedor |
bleskes
left a comment
There was a problem hiding this comment.
LGTM. Left some minor nits.
| public ParsedDocument createDeleteTombstoneDoc(String index, String type, String id) throws MapperParsingException { | ||
| final SourceToParse emptySource = SourceToParse.source(index, type, id, new BytesArray("{}"), XContentType.JSON); | ||
| return documentParser.parseDocument(emptySource, tombstoneMetadataFieldMappers); | ||
| final Collection<String> deleteFields = Arrays.asList(VersionFieldMapper.NAME, IdFieldMapper.NAME, TypeFieldMapper.NAME, |
There was a problem hiding this comment.
Why did you stop caching this?
| this.seqID.primaryTerm.setLongValue(primaryTerm); | ||
| } | ||
|
|
||
| ParsedDocument toTombstone() { |
There was a problem hiding this comment.
maybe call this "markAsSoftDeleted"? NoOps doc are not really tombstones.
There was a problem hiding this comment.
Yes, noop docs are not really tombstones but markAsSoftDeleted is not correct. Is it ok for us to call noop docs "noop tombstones"?
There was a problem hiding this comment.
same confusion. Please ignore. I'll come up with better name than tombstone, if I can, but don't let that stop you.
| return documentParser.parseDocument(emptySource, deleteFieldMappers).toTombstone(); | ||
| } | ||
|
|
||
| public ParsedDocument createNoopTombstoneDoc(String index) throws MapperParsingException { |
There was a problem hiding this comment.
maybe call this createNoopDoc ? it's not really a tombstone
| public final Field seqNo; | ||
| public final Field seqNoDocValue; | ||
| public final Field primaryTerm; | ||
| public final Field tombstoneField; |
There was a problem hiding this comment.
isn't this a softDeleteField?
There was a problem hiding this comment.
No, this is not a softDeletes field. This field is used by a delete-op and an noop only.
There was a problem hiding this comment.
doh. I got confused. Sorry.
| final long primaryTerm = readNumericDV(leaves.get(leafIndex), SeqNoFieldMapper.PRIMARY_TERM_NAME, segmentDocID); | ||
| final FieldsVisitor fields = new FieldsVisitor(true); | ||
| searcher.doc(docID, fields); | ||
| fields.postProcess(mapper); |
There was a problem hiding this comment.
do we really need this? if not, then we can avoid chaining in the mapper..
There was a problem hiding this comment.
Yeah, I can get docType explicitly and remove this call.
There was a problem hiding this comment.
@bleskes We have to call postProcess to extract doc type and doc id (via Uid); otherwise we have to pass docType into FieldsVisitor manually.
| /** | ||
| * Asserts the provided engine has a consistent document history between translog and Lucene index. | ||
| */ | ||
| public static void assertConsistentHistoryBetweenTranslogAndLuceneIndex(Engine engine, MapperService mapper) throws IOException { |
| public ParsedDocument createNoopTombstoneDoc(String index) throws MapperParsingException { | ||
| final String id = ""; // _id won't be used. | ||
| final SourceToParse emptySource = SourceToParse.source(index, type, id, new BytesArray("{}"), XContentType.JSON); | ||
| final Collection<String> noopFields = |
There was a problem hiding this comment.
maybe make noop fields a constant?
| final SourceToParse emptySource = SourceToParse.source(index, type, id, new BytesArray("{}"), XContentType.JSON); | ||
| final Collection<String> noopFields = | ||
| Arrays.asList(SeqNoFieldMapper.NAME, SeqNoFieldMapper.PRIMARY_TERM_NAME, SeqNoFieldMapper.TOMBSTONE_NAME); | ||
| final MetadataFieldMapper[] noopFieldMappers = Stream.of(mapping.metadataMappers) |
There was a problem hiding this comment.
oh even better lets create these at construction time?
| assert tombstone.docs().size() == 1 : "Tombstone should have a single doc [" + tombstone + "]"; | ||
| addStaleDocs(tombstone.docs(), indexWriter); | ||
| } catch (Exception ex) { | ||
| if (indexWriter.getTragicException() != null) { |
There was a problem hiding this comment.
you should call here if (maybeFailEngine("delete", ex) { throw ex; }
| final IndexSettings indexSettings = mapperService.getIndexSettings(); | ||
| this.mapping = mapping; | ||
| this.documentParser = new DocumentParser(indexSettings, mapperService.documentMapperParser(), this); | ||
| final Collection<String> tombstoneFields = |
| this.seqID.primaryTerm.setLongValue(primaryTerm); | ||
| } | ||
|
|
||
| ParsedDocument toTombstone() { |
| public static final String NAME = "_seq_no"; | ||
| public static final String CONTENT_TYPE = "_seq_no"; | ||
| public static final String PRIMARY_TERM_NAME = "_primary_term"; | ||
| public static final String TOMBSTONE_NAME = "_tombstone"; |
There was a problem hiding this comment.
should we use this as the ground truth in the engine as well? we have a constant in Lucene.java too no?
There was a problem hiding this comment.
We don't have the same constant in Lucene or Engine. I am not sure about your suggestion here. Can you elaborate it?
| } | ||
| @Override | ||
| public ParsedDocument newNoopTombstoneDoc() { | ||
| final RootObjectMapper.Builder rootMapper = new RootObjectMapper.Builder("__noop"); |
There was a problem hiding this comment.
do we really need to build this mapper every time we call newNoopTombstoneDoc
|
@simonw I've addressed your feedbacks. Can you please take another look? Thank you! |
This reverts commit 11c2d53.
|
@elasticmachine test this please |
|
@elasticmachine retest this please |
|
The last two builds were failed of the incorrect numDocs. We may need to get #30228 in before this PR. |
|
@elasticmachine retest this please |
|
@elasticmachine retest this please. |
Previously only index and delete operations are indexed into Lucene, therefore every segment should have both _id and _version terms as these operations contains both terms. However, this is no longer guaranteed after noop is also indexed into Lucene. A segment which contains only no-ops does not have either _id or _version. This change makes _id and _version terms optional in PerThreadIDVersionAndSeqNoLookup. Relates elastic#30226
Previously only index and delete operations are indexed into Lucene, therefore every segment should have both _id and _version terms as these operations contain both terms. However, this is no longer guaranteed after noop is also indexed into Lucene. A segment which contains only no-ops does not have neither _id or _version because a no-op does not contain these terms. This change adds a dummy version to no-ops and makes _id terms optional in PerThreadIDVersionAndSeqNoLookup. Relates #30226
This commit adds a tombstone document into Lucene for every No-op. With this change, Lucene index is expected to have a complete history of operations like Translog. In fact, this guarantee is subjected to the soft-deletes retention merge policy. Relates #29530
Previously only index and delete operations are indexed into Lucene, therefore every segment should have both _id and _version terms as these operations contain both terms. However, this is no longer guaranteed after noop is also indexed into Lucene. A segment which contains only no-ops does not have neither _id or _version because a no-op does not contain these terms. This change adds a dummy version to no-ops and makes _id terms optional in PerThreadIDVersionAndSeqNoLookup. Relates #30226
This commit adds a tombstone document into Lucene for every No-op. With
this change, Lucene index is expected to have a complete history of
operations like Translog. In fact, this guarantee is subjected to the
soft-deletes retention merge policy.
Relates #29530