Add tombstone document into Lucene for Noop #30226

Merged
merged 9 commits into elastic:ccr from dnhatn:add-tombstone-noop
May 2, 2018

Conversation

dnhatn
Member

@dnhatn dnhatn commented Apr 28, 2018

This commit adds a tombstone document into Lucene for every no-op. With
this change, the Lucene index is expected to have a complete history of
operations, just like the translog. In fact, this guarantee is subject
to the soft-deletes retention merge policy.

Relates #29530
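To make the description concrete, here is a rough sketch of the no-op path (illustrative only, not the exact PR code): createNoopTombstoneDoc, updateSeqID, docs() and addStaleDocs appear in the diff snippets quoted in the review below, while the surrounding wiring and names such as noOpDocumentMapper, indexName and indexWriter are assumptions.

```java
// Illustrative sketch; see the diff snippets in the review for the real calls.
private void innerNoOp(final NoOp noOp) throws IOException {
    // Build a minimal Lucene document that carries only _seq_no, _primary_term and the
    // _tombstone marker field; it has no _source and an empty _id.
    final ParsedDocument tombstone = noOpDocumentMapper.createNoopTombstoneDoc(indexName);
    tombstone.updateSeqID(noOp.seqNo(), noOp.primaryTerm());
    assert tombstone.docs().size() == 1 : "Tombstone should have a single doc [" + tombstone + "]";
    // Added as a "stale" doc: it is soft-deleted immediately, and the soft-deletes
    // retention merge policy decides how long the history entry survives.
    addStaleDocs(tombstone.docs(), indexWriter);
}
```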

@dnhatn dnhatn added the >enhancement and :Distributed Indexing/Engine (Anything around managing Lucene and the Translog in an open shard) labels Apr 28, 2018
@dnhatn dnhatn requested review from s1monw and bleskes April 28, 2018 01:41
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@dnhatn
Member Author

dnhatn commented Apr 28, 2018

/cc @martijnvg and @jasontedor

Contributor

@bleskes bleskes left a comment

LGTM. Left some minor nits.

final SourceToParse emptySource = SourceToParse.source(index, type, id, new BytesArray("{}"), XContentType.JSON);
return documentParser.parseDocument(emptySource, tombstoneMetadataFieldMappers);
final Collection<String> deleteFields = Arrays.asList(VersionFieldMapper.NAME, IdFieldMapper.NAME, TypeFieldMapper.NAME,
Contributor

Why did you stop caching this?

@@ -83,6 +83,13 @@ public void updateSeqID(long sequenceNumber, long primaryTerm) {
this.seqID.primaryTerm.setLongValue(primaryTerm);
}

ParsedDocument toTombstone() {
Contributor

maybe call this "markAsSoftDeleted"? NoOps doc are not really tombstones.

Member Author

Yes, noop docs are not really tombstones, but markAsSoftDeleted is not correct either. Is it okay for us to call noop docs "noop tombstones"?

Contributor

Same confusion. Please ignore. I'll come up with a better name than tombstone if I can, but don't let that stop you.

return documentParser.parseDocument(emptySource, deleteFieldMappers).toTombstone();
}

public ParsedDocument createNoopTombstoneDoc(String index) throws MapperParsingException {
Contributor

Maybe call this createNoopDoc? It's not really a tombstone.

@@ -69,26 +69,29 @@
public final Field seqNo;
public final Field seqNoDocValue;
public final Field primaryTerm;
public final Field tombstoneField;
Contributor

isn't this a softDeleteField?

Member Author

No, this is not a soft-deletes field. This field is used by a delete-op and a noop only.
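For readers following the thread: a sketch of how the marker could be attached, assuming the shape suggested by the quoted fields (seqID, tombstoneField, SeqNoFieldMapper.TOMBSTONE_NAME); not necessarily the exact implementation.

```java
// Sketch: tag the single Lucene document of a delete-op or no-op with the _tombstone
// doc-values field, so history readers can tell it apart from regular index operations.
ParsedDocument toTombstone() {
    assert docs().size() == 1 : "tombstones should have a single doc [" + this + "]";
    this.seqID.tombstoneField.setLongValue(1); // assumes a numeric doc-values field
    rootDoc().add(this.seqID.tombstoneField);
    return this;
}
```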

Contributor

doh. I got confused. Sorry.

final long primaryTerm = readNumericDV(leaves.get(leafIndex), SeqNoFieldMapper.PRIMARY_TERM_NAME, segmentDocID);
final FieldsVisitor fields = new FieldsVisitor(true);
searcher.doc(docID, fields);
fields.postProcess(mapper);
Contributor

Do we really need this? If not, we can avoid chaining in the mapper.

Member Author

Yeah, I can get docType explicitly and remove this call.

Member Author

@bleskes We have to call postProcess to extract doc type and doc id (via Uid); otherwise we have to pass docType into FieldsVisitor manually.
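A small sketch of the lookup being discussed, based on the quoted snippet; the uid() accessors at the end are assumptions about how type and id come out after postProcess.

```java
// Collect the stored fields of one Lucene docID; postProcess(...) is what resolves the raw
// _uid/_id bytes into type and id strings, which is why the mapper has to be passed in.
final FieldsVisitor fields = new FieldsVisitor(true); // true: also load _source
searcher.doc(docID, fields);
fields.postProcess(mapper);
final String docType = fields.uid().type();
final String docId = fields.uid().id();
```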

/**
* Asserts the provided engine has a consistent document history between translog and Lucene index.
*/
public static void assertConsistentHistoryBetweenTranslogAndLuceneIndex(Engine engine, MapperService mapper) throws IOException {
Contributor

💯
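Conceptually the new assertion boils down to comparing the two histories by sequence number. A minimal sketch with hypothetical inputs (not the real test utility):

```java
// Every operation recorded in the translog must also be recoverable from the Lucene index
// and vice versa, now that no-ops are indexed as tombstone documents as well.
static void assertSameHistory(Set<Long> translogSeqNos, Set<Long> luceneSeqNos) {
    assert translogSeqNos.equals(luceneSeqNos)
        : "translog seq_nos " + translogSeqNos + " != lucene seq_nos " + luceneSeqNos;
}
```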

public ParsedDocument createNoopTombstoneDoc(String index) throws MapperParsingException {
final String id = ""; // _id won't be used.
final SourceToParse emptySource = SourceToParse.source(index, type, id, new BytesArray("{}"), XContentType.JSON);
final Collection<String> noopFields =
Contributor

Maybe make the noop fields a constant?
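What the nit amounts to, sketched (the constant name is an assumption):

```java
// Hoist the per-call field list into a shared constant so it is not rebuilt on every
// createNoopTombstoneDoc(...) call.
private static final Collection<String> NOOP_TOMBSTONE_FIELDS = Arrays.asList(
    SeqNoFieldMapper.NAME, SeqNoFieldMapper.PRIMARY_TERM_NAME, SeqNoFieldMapper.TOMBSTONE_NAME);
```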

final SourceToParse emptySource = SourceToParse.source(index, type, id, new BytesArray("{}"), XContentType.JSON);
final Collection<String> noopFields =
Arrays.asList(SeqNoFieldMapper.NAME, SeqNoFieldMapper.PRIMARY_TERM_NAME, SeqNoFieldMapper.TOMBSTONE_NAME);
final MetadataFieldMapper[] noopFieldMappers = Stream.of(mapping.metadataMappers)
Contributor

Oh, even better: let's create these at construction time?
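And the follow-up suggestion, sketched with assumed field names: filter the metadata mappers once in the DocumentMapper constructor instead of on every call.

```java
// Build the no-op tombstone metadata field mappers once, at construction time,
// reusing the NOOP_TOMBSTONE_FIELDS constant sketched above.
this.noopTombstoneMetadataFieldMappers = Stream.of(mapping.metadataMappers)
    .filter(field -> NOOP_TOMBSTONE_FIELDS.contains(field.name()))
    .toArray(MetadataFieldMapper[]::new);
```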

assert tombstone.docs().size() == 1 : "Tombstone should have a single doc [" + tombstone + "]";
addStaleDocs(tombstone.docs(), indexWriter);
} catch (Exception ex) {
if (indexWriter.getTragicException() != null) {
Contributor

You should call if (maybeFailEngine("delete", ex)) { throw ex; } here.
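The requested pattern, sketched with assumed wiring (the "noop" label and the surrounding try block are placeholders; the reviewer's snippet uses "delete"):

```java
try {
    addStaleDocs(tombstone.docs(), indexWriter);
} catch (Exception ex) {
    // Let the engine inspect the failure and fail itself on tragic IndexWriter
    // errors, then rethrow to the caller.
    maybeFailEngine("noop", ex);
    throw ex;
}
```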

@@ -133,10 +132,6 @@ public DocumentMapper(MapperService mapperService, Mapping mapping) {
final IndexSettings indexSettings = mapperService.getIndexSettings();
this.mapping = mapping;
this.documentParser = new DocumentParser(indexSettings, mapperService.documentMapperParser(), this);
final Collection<String> tombstoneFields =
Contributor

hmm why is this no good?

Member Author

I restored it.

@@ -83,6 +83,13 @@ public void updateSeqID(long sequenceNumber, long primaryTerm) {
this.seqID.primaryTerm.setLongValue(primaryTerm);
}

ParsedDocument toTombstone() {
Contributor

Javadocs?

}
}

public static final String NAME = "_seq_no";
public static final String CONTENT_TYPE = "_seq_no";
public static final String PRIMARY_TERM_NAME = "_primary_term";
public static final String TOMBSTONE_NAME = "_tombstone";
Contributor

Should we use this as the ground truth in the engine as well? We have a constant in Lucene.java too, no?

Member Author

We don't have the same constant in Lucene or Engine. I am not sure about your suggestion here. Can you elaborate?

Contributor

Never mind, I was confused.

}
@Override
public ParsedDocument newNoopTombstoneDoc() {
final RootObjectMapper.Builder rootMapper = new RootObjectMapper.Builder("__noop");
Contributor

Do we really need to build this mapper every time we call newNoopTombstoneDoc?

@dnhatn
Member Author

dnhatn commented Apr 30, 2018

@s1monw I've addressed your feedback. Can you please take another look? Thank you!

@dnhatn dnhatn requested a review from s1monw April 30, 2018 14:27
@dnhatn
Member Author

dnhatn commented Apr 30, 2018

@elasticmachine test this please

@dnhatn
Member Author

dnhatn commented May 1, 2018

@elasticmachine retest this please

@dnhatn
Member Author

dnhatn commented May 1, 2018

The last two builds failed because of an incorrect numDocs. We may need to get #30228 in before this PR.

FAILURE 42.5s J0 | MlDistributedFailureIT.testFailOver <<< FAILURES!
   > Throwable #1: java.lang.AssertionError: sync id is equal but number of docs does not match on node node_t2. expected 18 but got 19
   > Expected: <19L>
   >      but: was <18L>
org.elasticsearch.test.InternalTestCluster.assertSameSyncIdSameDocs(InternalTestCluster.java:1124)
FAILURE 22.5s J0 | MlDistributedFailureIT.testFullClusterRestart <<< FAILURES!
   > Throwable #1: java.lang.AssertionError: sync id is equal but number of docs does not match on node node_t1. expected 17 but got 16
   > Expected: <16L>
   >      but: was <17L>
org.elasticsearch.test.InternalTestCluster.assertSameSyncIdSameDocs(InternalTestCluster.java:1124)

@dnhatn
Member Author

dnhatn commented May 1, 2018

@elasticmachine retest this please

@dnhatn
Member Author

dnhatn commented May 1, 2018

@elasticmachine retest this please.

@dnhatn
Member Author

dnhatn commented May 2, 2018

Thanks @bleskes and @s1monw!

@dnhatn dnhatn merged commit d621fc7 into elastic:ccr May 2, 2018
@dnhatn dnhatn deleted the add-tombstone-noop branch May 2, 2018 13:08
dnhatn added a commit to dnhatn/elasticsearch that referenced this pull request May 5, 2018
Previously only index and delete operations were indexed into Lucene,
so every segment should have both _id and _version terms, as these
operations contain both. However, this is no longer guaranteed now that
no-ops are also indexed into Lucene: a segment which contains only
no-ops has neither _id nor _version.

This change makes _id and _version terms optional in
PerThreadIDVersionAndSeqNoLookup.

Relates elastic#30226
dnhatn added a commit that referenced this pull request May 8, 2018
Previously only index and delete operations were indexed into Lucene,
so every segment should have both _id and _version terms, as these
operations contain both. However, this is no longer guaranteed now that
no-ops are also indexed into Lucene: a segment which contains only
no-ops has neither _id nor _version, because a no-op does not contain
these terms.

This change adds a dummy version to no-ops and makes _id terms optional
in PerThreadIDVersionAndSeqNoLookup.

Relates #30226
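For context, a sketch of what "optional" could look like inside the per-segment lookup (an assumption about the shape, not the actual follow-up patch); it leans on the standard Lucene behaviour that LeafReader.terms(field) returns null when a segment has no postings for that field.

```java
// A segment that holds only no-op tombstones has no _id postings, so the lookup must
// tolerate a missing terms dictionary instead of asserting that it exists.
final Terms idTerms = reader.terms(IdFieldMapper.NAME);
if (idTerms == null) {
    this.termsEnum = null; // every _id lookup in this segment is simply a miss
} else {
    this.termsEnum = idTerms.iterator();
}
```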
dnhatn added a commit that referenced this pull request May 10, 2018
This commit adds a tombstone document into Lucene for every no-op.
With this change, the Lucene index is expected to have a complete
history of operations, just like the translog. In fact, this guarantee
is subject to the soft-deletes retention merge policy.

Relates #29530
dnhatn added a commit that referenced this pull request May 10, 2018
Previously only index and delete operations were indexed into Lucene,
so every segment should have both _id and _version terms, as these
operations contain both. However, this is no longer guaranteed now that
no-ops are also indexed into Lucene: a segment which contains only
no-ops has neither _id nor _version, because a no-op does not contain
these terms.

This change adds a dummy version to no-ops and makes _id terms optional
in PerThreadIDVersionAndSeqNoLookup.

Relates #30226