-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add tombstone document into Lucene for Noop #30226
Conversation
This commit adds a tombstone document into Lucene for every No-op. With this change, Lucene index is expected to have a complete history of operations like Translog. In fact, this guarantee is subjected to the soft-deletes retention merge policy.
Pinging @elastic/es-distributed |
/cc @martijnvg and @jasontedor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Left some minor nits.
final SourceToParse emptySource = SourceToParse.source(index, type, id, new BytesArray("{}"), XContentType.JSON); | ||
return documentParser.parseDocument(emptySource, tombstoneMetadataFieldMappers); | ||
final Collection<String> deleteFields = Arrays.asList(VersionFieldMapper.NAME, IdFieldMapper.NAME, TypeFieldMapper.NAME, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you stop caching this?
@@ -83,6 +83,13 @@ public void updateSeqID(long sequenceNumber, long primaryTerm) { | |||
this.seqID.primaryTerm.setLongValue(primaryTerm); | |||
} | |||
|
|||
ParsedDocument toTombstone() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe call this "markAsSoftDeleted"? NoOps doc are not really tombstones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, noop docs are not really tombstones but markAsSoftDeleted
is not correct. Is it ok for us to call noop docs "noop tombstones"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same confusion. Please ignore. I'll come up with better name than tombstone, if I can, but don't let that stop you.
return documentParser.parseDocument(emptySource, deleteFieldMappers).toTombstone(); | ||
} | ||
|
||
public ParsedDocument createNoopTombstoneDoc(String index) throws MapperParsingException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe call this createNoopDoc ? it's not really a tombstone
@@ -69,26 +69,29 @@ | |||
public final Field seqNo; | |||
public final Field seqNoDocValue; | |||
public final Field primaryTerm; | |||
public final Field tombstoneField; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't this a softDeleteField?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this is not a softDeletes field. This field is used by a delete-op and an noop only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doh. I got confused. Sorry.
final long primaryTerm = readNumericDV(leaves.get(leafIndex), SeqNoFieldMapper.PRIMARY_TERM_NAME, segmentDocID); | ||
final FieldsVisitor fields = new FieldsVisitor(true); | ||
searcher.doc(docID, fields); | ||
fields.postProcess(mapper); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we really need this? if not, then we can avoid chaining in the mapper..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I can get docType explicitly and remove this call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bleskes We have to call postProcess
to extract doc type and doc id (via Uid); otherwise we have to pass docType into FieldsVisitor manually.
/** | ||
* Asserts the provided engine has a consistent document history between translog and Lucene index. | ||
*/ | ||
public static void assertConsistentHistoryBetweenTranslogAndLuceneIndex(Engine engine, MapperService mapper) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯
public ParsedDocument createNoopTombstoneDoc(String index) throws MapperParsingException { | ||
final String id = ""; // _id won't be used. | ||
final SourceToParse emptySource = SourceToParse.source(index, type, id, new BytesArray("{}"), XContentType.JSON); | ||
final Collection<String> noopFields = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe make noop fields a constant?
final SourceToParse emptySource = SourceToParse.source(index, type, id, new BytesArray("{}"), XContentType.JSON); | ||
final Collection<String> noopFields = | ||
Arrays.asList(SeqNoFieldMapper.NAME, SeqNoFieldMapper.PRIMARY_TERM_NAME, SeqNoFieldMapper.TOMBSTONE_NAME); | ||
final MetadataFieldMapper[] noopFieldMappers = Stream.of(mapping.metadataMappers) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh even better lets create these at construction time?
assert tombstone.docs().size() == 1 : "Tombstone should have a single doc [" + tombstone + "]"; | ||
addStaleDocs(tombstone.docs(), indexWriter); | ||
} catch (Exception ex) { | ||
if (indexWriter.getTragicException() != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you should call here if (maybeFailEngine("delete", ex) { throw ex; }
@@ -133,10 +132,6 @@ public DocumentMapper(MapperService mapperService, Mapping mapping) { | |||
final IndexSettings indexSettings = mapperService.getIndexSettings(); | |||
this.mapping = mapping; | |||
this.documentParser = new DocumentParser(indexSettings, mapperService.documentMapperParser(), this); | |||
final Collection<String> tombstoneFields = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm why is this no good?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I restored it.
@@ -83,6 +83,13 @@ public void updateSeqID(long sequenceNumber, long primaryTerm) { | |||
this.seqID.primaryTerm.setLongValue(primaryTerm); | |||
} | |||
|
|||
ParsedDocument toTombstone() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
java docs?
} | ||
} | ||
|
||
public static final String NAME = "_seq_no"; | ||
public static final String CONTENT_TYPE = "_seq_no"; | ||
public static final String PRIMARY_TERM_NAME = "_primary_term"; | ||
public static final String TOMBSTONE_NAME = "_tombstone"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we use this as the ground truth in the engine as well? we have a constant in Lucene.java too no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have the same constant in Lucene or Engine. I am not sure about your suggestion here. Can you elaborate it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nevermind I was confused
} | ||
@Override | ||
public ParsedDocument newNoopTombstoneDoc() { | ||
final RootObjectMapper.Builder rootMapper = new RootObjectMapper.Builder("__noop"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we really need to build this mapper every time we call newNoopTombstoneDoc
@simonw I've addressed your feedbacks. Can you please take another look? Thank you! |
This reverts commit 11c2d53.
@elasticmachine test this please |
@elasticmachine retest this please |
The last two builds were failed of the incorrect numDocs. We may need to get #30228 in before this PR.
|
@elasticmachine retest this please |
@elasticmachine retest this please. |
Previously only index and delete operations are indexed into Lucene, therefore every segment should have both _id and _version terms as these operations contains both terms. However, this is no longer guaranteed after noop is also indexed into Lucene. A segment which contains only no-ops does not have either _id or _version. This change makes _id and _version terms optional in PerThreadIDVersionAndSeqNoLookup. Relates elastic#30226
Previously only index and delete operations are indexed into Lucene, therefore every segment should have both _id and _version terms as these operations contain both terms. However, this is no longer guaranteed after noop is also indexed into Lucene. A segment which contains only no-ops does not have neither _id or _version because a no-op does not contain these terms. This change adds a dummy version to no-ops and makes _id terms optional in PerThreadIDVersionAndSeqNoLookup. Relates #30226
This commit adds a tombstone document into Lucene for every No-op. With this change, Lucene index is expected to have a complete history of operations like Translog. In fact, this guarantee is subjected to the soft-deletes retention merge policy. Relates #29530
Previously only index and delete operations are indexed into Lucene, therefore every segment should have both _id and _version terms as these operations contain both terms. However, this is no longer guaranteed after noop is also indexed into Lucene. A segment which contains only no-ops does not have neither _id or _version because a no-op does not contain these terms. This change adds a dummy version to no-ops and makes _id terms optional in PerThreadIDVersionAndSeqNoLookup. Relates #30226
This commit adds a tombstone document into Lucene for every No-op. With
this change, Lucene index is expected to have a complete history of
operations like Translog. In fact, this guarantee is subjected to the
soft-deletes retention merge policy.
Relates #29530