Add tombstone document into Lucene for Noop #30226

Merged
merged 9 commits into elastic:ccr from dnhatn:add-tombstone-noop
May 2, 2018

Conversation

dnhatn
Member

@dnhatn dnhatn commented Apr 28, 2018

This commit adds a tombstone document into Lucene for every no-op. With
this change, the Lucene index is expected to have a complete history of
operations, just like the translog. In fact, this guarantee is subject
to the soft-deletes retention merge policy.

Relates #29530
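To make the description concrete, here is a rough sketch of the no-op path (illustrative only, not the exact PR code): createNoopTombstoneDoc, updateSeqID, docs() and addStaleDocs appear in the diff snippets quoted in the review below, while the surrounding wiring and names such as noOpDocumentMapper, indexName and indexWriter are assumptions.

```java
// Illustrative sketch; see the diff snippets in the review for the real calls.
private void innerNoOp(final NoOp noOp) throws IOException {
    // Build a minimal Lucene document that carries only _seq_no, _primary_term and the
    // _tombstone marker field; it has no _source and an empty _id.
    final ParsedDocument tombstone = noOpDocumentMapper.createNoopTombstoneDoc(indexName);
    tombstone.updateSeqID(noOp.seqNo(), noOp.primaryTerm());
    assert tombstone.docs().size() == 1 : "Tombstone should have a single doc [" + tombstone + "]";
    // Added as a "stale" doc: it is soft-deleted immediately, and the soft-deletes
    // retention merge policy decides how long the history entry survives.
    addStaleDocs(tombstone.docs(), indexWriter);
}
```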

@dnhatn dnhatn added the >enhancement and :Distributed Indexing/Engine (Anything around managing Lucene and the Translog in an open shard) labels Apr 28, 2018
@dnhatn dnhatn requested review from s1monw and bleskes April 28, 2018 01:41
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@dnhatn
Member Author

dnhatn commented Apr 28, 2018

/cc @martijnvg and @jasontedor

Contributor

@bleskes bleskes left a comment

LGTM. Left some minor nits.

final SourceToParse emptySource = SourceToParse.source(index, type, id, new BytesArray("{}"), XContentType.JSON);
return documentParser.parseDocument(emptySource, tombstoneMetadataFieldMappers);
final Collection<String> deleteFields = Arrays.asList(VersionFieldMapper.NAME, IdFieldMapper.NAME, TypeFieldMapper.NAME,
Contributor

Why did you stop caching this?

@@ -83,6 +83,13 @@ public void updateSeqID(long sequenceNumber, long primaryTerm) {
this.seqID.primaryTerm.setLongValue(primaryTerm);
}

ParsedDocument toTombstone() {
Contributor

maybe call this "markAsSoftDeleted"? NoOps doc are not really tombstones.

Member Author

Yes, noop docs are not really tombstones, but markAsSoftDeleted is not correct either. Is it okay for us to call noop docs "noop tombstones"?

Contributor

Same confusion. Please ignore. I'll come up with a better name than tombstone if I can, but don't let that stop you.

return documentParser.parseDocument(emptySource, deleteFieldMappers).toTombstone();
}

public ParsedDocument createNoopTombstoneDoc(String index) throws MapperParsingException {
Contributor

Maybe call this createNoopDoc? It's not really a tombstone.

@@ -69,26 +69,29 @@
public final Field seqNo;
public final Field seqNoDocValue;
public final Field primaryTerm;
public final Field tombstoneField;
Contributor

isn't this a softDeleteField?

Member Author

No, this is not a soft-deletes field. This field is used by a delete-op and a noop only.
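For readers following the thread: a sketch of how the marker could be attached, assuming the shape suggested by the quoted fields (seqID, tombstoneField, SeqNoFieldMapper.TOMBSTONE_NAME); not necessarily the exact implementation.

```java
// Sketch: tag the single Lucene document of a delete-op or no-op with the _tombstone
// doc-values field, so history readers can tell it apart from regular index operations.
ParsedDocument toTombstone() {
    assert docs().size() == 1 : "tombstones should have a single doc [" + this + "]";
    this.seqID.tombstoneField.setLongValue(1); // assumes a numeric doc-values field
    rootDoc().add(this.seqID.tombstoneField);
    return this;
}
```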

Contributor

doh. I got confused. Sorry.

final long primaryTerm = readNumericDV(leaves.get(leafIndex), SeqNoFieldMapper.PRIMARY_TERM_NAME, segmentDocID);
final FieldsVisitor fields = new FieldsVisitor(true);
searcher.doc(docID, fields);
fields.postProcess(mapper);
Contributor

Do we really need this? If not, we can avoid chaining in the mapper.

Member Author

Yeah, I can get docType explicitly and remove this call.

Member Author

@bleskes We have to call postProcess to extract doc type and doc id (via Uid); otherwise we have to pass docType into FieldsVisitor manually.
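A small sketch of the lookup being discussed, based on the quoted snippet; the uid() accessors at the end are assumptions about how type and id come out after postProcess.

```java
// Collect the stored fields of one Lucene docID; postProcess(...) is what resolves the raw
// _uid/_id bytes into type and id strings, which is why the mapper has to be passed in.
final FieldsVisitor fields = new FieldsVisitor(true); // true: also load _source
searcher.doc(docID, fields);
fields.postProcess(mapper);
final String docType = fields.uid().type();
final String docId = fields.uid().id();
```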

/**
* Asserts the provided engine has a consistent document history between translog and Lucene index.
*/
public static void assertConsistentHistoryBetweenTranslogAndLuceneIndex(Engine engine, MapperService mapper) throws IOException {
Contributor

💯
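Conceptually the new assertion boils down to comparing the two histories by sequence number. A minimal sketch with hypothetical inputs (not the real test utility):

```java
// Every operation recorded in the translog must also be recoverable from the Lucene index
// and vice versa, now that no-ops are indexed as tombstone documents as well.
static void assertSameHistory(Set<Long> translogSeqNos, Set<Long> luceneSeqNos) {
    assert translogSeqNos.equals(luceneSeqNos)
        : "translog seq_nos " + translogSeqNos + " != lucene seq_nos " + luceneSeqNos;
}
```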

public ParsedDocument createNoopTombstoneDoc(String index) throws MapperParsingException {
final String id = ""; // _id won't be used.
final SourceToParse emptySource = SourceToParse.source(index, type, id, new BytesArray("{}"), XContentType.JSON);
final Collection<String> noopFields =
Contributor

Maybe make the noop fields a constant?
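What the nit amounts to, sketched (the constant name is an assumption):

```java
// Hoist the per-call field list into a shared constant so it is not rebuilt on every
// createNoopTombstoneDoc(...) call.
private static final Collection<String> NOOP_TOMBSTONE_FIELDS = Arrays.asList(
    SeqNoFieldMapper.NAME, SeqNoFieldMapper.PRIMARY_TERM_NAME, SeqNoFieldMapper.TOMBSTONE_NAME);
```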

final SourceToParse emptySource = SourceToParse.source(index, type, id, new BytesArray("{}"), XContentType.JSON);
final Collection<String> noopFields =
Arrays.asList(SeqNoFieldMapper.NAME, SeqNoFieldMapper.PRIMARY_TERM_NAME, SeqNoFieldMapper.TOMBSTONE_NAME);
final MetadataFieldMapper[] noopFieldMappers = Stream.of(mapping.metadataMappers)
Contributor

Oh, even better: let's create these at construction time?
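And the follow-up suggestion, sketched with assumed field names: filter the metadata mappers once in the DocumentMapper constructor instead of on every call.

```java
// Build the no-op tombstone metadata field mappers once, at construction time,
// reusing the NOOP_TOMBSTONE_FIELDS constant sketched above.
this.noopTombstoneMetadataFieldMappers = Stream.of(mapping.metadataMappers)
    .filter(field -> NOOP_TOMBSTONE_FIELDS.contains(field.name()))
    .toArray(MetadataFieldMapper[]::new);
```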

assert tombstone.docs().size() == 1 : "Tombstone should have a single doc [" + tombstone + "]";
addStaleDocs(tombstone.docs(), indexWriter);
} catch (Exception ex) {
if (indexWriter.getTragicException() != null) {
Contributor

You should call if (maybeFailEngine("delete", ex)) { throw ex; } here.
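The requested pattern, sketched with assumed wiring (the "noop" label and the surrounding try block are placeholders; the reviewer's snippet uses "delete"):

```java
try {
    addStaleDocs(tombstone.docs(), indexWriter);
} catch (Exception ex) {
    // Let the engine inspect the failure and fail itself on tragic IndexWriter
    // errors, then rethrow to the caller.
    maybeFailEngine("noop", ex);
    throw ex;
}
```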

@@ -133,10 +132,6 @@ public DocumentMapper(MapperService mapperService, Mapping mapping) {
final IndexSettings indexSettings = mapperService.getIndexSettings();
this.mapping = mapping;
this.documentParser = new DocumentParser(indexSettings, mapperService.documentMapperParser(), this);
final Collection<String> tombstoneFields =
Contributor

hmm why is this no good?

Member Author

I restored it.

@@ -83,6 +83,13 @@ public void updateSeqID(long sequenceNumber, long primaryTerm) {
this.seqID.primaryTerm.setLongValue(primaryTerm);
}

ParsedDocument toTombstone() {
Contributor

Javadocs?

}
}

public static final String NAME = "_seq_no";
public static final String CONTENT_TYPE = "_seq_no";
public static final String PRIMARY_TERM_NAME = "_primary_term";
public static final String TOMBSTONE_NAME = "_tombstone";
Contributor

Should we use this as the ground truth in the engine as well? We have a constant in Lucene.java too, no?

Member Author

We don't have the same constant in Lucene or Engine. I am not sure about your suggestion here. Can you elaborate?

Contributor

Never mind, I was confused.

}
@Override
public ParsedDocument newNoopTombstoneDoc() {
final RootObjectMapper.Builder rootMapper = new RootObjectMapper.Builder("__noop");
Contributor

Do we really need to build this mapper every time we call newNoopTombstoneDoc?

@dnhatn
Member Author

dnhatn commented Apr 30, 2018

@s1monw I've addressed your feedback. Can you please take another look? Thank you!

@dnhatn dnhatn requested a review from s1monw April 30, 2018 14:27
@dnhatn
Member Author

dnhatn commented Apr 30, 2018

@elasticmachine test this please

@dnhatn
Member Author

dnhatn commented May 1, 2018

@elasticmachine retest this please

@dnhatn
Member Author

dnhatn commented May 1, 2018

The last two builds failed because of an incorrect numDocs. We may need to get #30228 in before this PR.

FAILURE 42.5s J0 | MlDistributedFailureIT.testFailOver <<< FAILURES!
   > Throwable #1: java.lang.AssertionError: sync id is equal but number of docs does not match on node node_t2. expected 18 but got 19
   > Expected: <19L>
   >      but: was <18L>
org.elasticsearch.test.InternalTestCluster.assertSameSyncIdSameDocs(InternalTestCluster.java:1124)
FAILURE 22.5s J0 | MlDistributedFailureIT.testFullClusterRestart <<< FAILURES!
   > Throwable #1: java.lang.AssertionError: sync id is equal but number of docs does not match on node node_t1. expected 17 but got 16
   > Expected: <16L>
   >      but: was <17L>
org.elasticsearch.test.InternalTestCluster.assertSameSyncIdSameDocs(InternalTestCluster.java:1124)

@dnhatn
Member Author

dnhatn commented May 1, 2018

@elasticmachine retest this please

@dnhatn
Member Author

dnhatn commented May 1, 2018

@elasticmachine retest this please.

@dnhatn
Member Author

dnhatn commented May 2, 2018

Thanks @bleskes and @s1monw!

@dnhatn dnhatn merged commit d621fc7 into elastic:ccr May 2, 2018
@dnhatn dnhatn deleted the add-tombstone-noop branch May 2, 2018 13:08
dnhatn added a commit to dnhatn/elasticsearch that referenced this pull request May 5, 2018
Previously only index and delete operations were indexed into Lucene,
so every segment should have both _id and _version terms, as these
operations contain both. However, this is no longer guaranteed now that
no-ops are also indexed into Lucene: a segment which contains only
no-ops has neither _id nor _version.

This change makes _id and _version terms optional in
PerThreadIDVersionAndSeqNoLookup.

Relates elastic#30226
dnhatn added a commit that referenced this pull request May 8, 2018
Previously only index and delete operations were indexed into Lucene,
so every segment should have both _id and _version terms, as these
operations contain both. However, this is no longer guaranteed now that
no-ops are also indexed into Lucene: a segment which contains only
no-ops has neither _id nor _version, because a no-op does not contain
these terms.

This change adds a dummy version to no-ops and makes _id terms optional
in PerThreadIDVersionAndSeqNoLookup.

Relates #30226
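For context, a sketch of what "optional" could look like inside the per-segment lookup (an assumption about the shape, not the actual follow-up patch); it leans on the standard Lucene behaviour that LeafReader.terms(field) returns null when a segment has no postings for that field.

```java
// A segment that holds only no-op tombstones has no _id postings, so the lookup must
// tolerate a missing terms dictionary instead of asserting that it exists.
final Terms idTerms = reader.terms(IdFieldMapper.NAME);
if (idTerms == null) {
    this.termsEnum = null; // every _id lookup in this segment is simply a miss
} else {
    this.termsEnum = idTerms.iterator();
}
```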
dnhatn added a commit that referenced this pull request May 10, 2018
This commit adds a tombstone document into Lucene for every no-op.
With this change, the Lucene index is expected to have a complete
history of operations, just like the translog. In fact, this guarantee
is subject to the soft-deletes retention merge policy.

Relates #29530
dnhatn added a commit that referenced this pull request May 10, 2018
Previously only index and delete operations were indexed into Lucene,
so every segment should have both _id and _version terms, as these
operations contain both. However, this is no longer guaranteed now that
no-ops are also indexed into Lucene: a segment which contains only
no-ops has neither _id nor _version, because a no-op does not contain
these terms.

This change adds a dummy version to no-ops and makes _id terms optional
in PerThreadIDVersionAndSeqNoLookup.

Relates #30226