Ensure no uncommitted ops when open readonly engine #41317

dnhatn · 2019-04-17T18:58:24Z

Closing a follower index can turn the cluster red because the sanity check (i.e., max_seq_no equals the global checkpoint) does not hold for closed follower indices. The main purpose of this sanity check is to ensure that peer recovery of closed indices can safely skip phase 2 (i.e., not replay translog operations) without losing data. We can achieve the same goal by making sure that all existing operations are committed in the last index commit.

Relates #33888
See #38767 (comment)
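
To make the two conditions in the description concrete, here is a minimal sketch that contrasts the original sanity check with the proposed one. It is a simplified model, not code from this PR: the class `ClosedIndexChecks`, both method names, and the representation of the commit and the translog as collections of sequence numbers are assumptions made for illustration.

```java
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model; none of these names are Elasticsearch APIs.
final class ClosedIndexChecks {
    private ClosedIndexChecks() {}

    /** Original sanity check: the global checkpoint has caught up with max_seq_no. */
    static boolean maxSeqNoEqualsGlobalCheckpoint(long maxSeqNo, long globalCheckpoint) {
        return maxSeqNo == globalCheckpoint;
    }

    /**
     * Proposed check: every translog operation above the commit's local checkpoint
     * is already contained in the last index commit. Operations at or below the
     * local checkpoint are in the commit by definition, so they are not inspected.
     */
    static boolean allTranslogOpsCommitted(List<Long> translogSeqNos,
                                           Set<Long> committedSeqNos,
                                           long commitLocalCheckpoint) {
        for (long seqNo : translogSeqNos) {
            if (seqNo > commitLocalCheckpoint && !committedSeqNos.contains(seqNo)) {
                return false; // an uncommitted operation would be lost if phase 2 is skipped
            }
        }
        return true;
    }
}
```

In this model, a former follower whose commit holds sequence numbers 0, 1, 2, 4, 5 (a gap at 3) fails the first check whenever the global checkpoint is below 5, yet passes the second as long as the translog contains nothing beyond what is committed.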

elasticmachine · 2019-04-17T18:58:26Z

Pinging @elastic/es-distributed

s1monw

I left some comments

s1monw · 2019-04-25T13:17:05Z

server/src/main/java/org/elasticsearch/index/engine/ReadOnlyEngine.java

+     * so peer recovery of closed indices can skip phase 2 (i.e., not replaying translog operations) without losing data.
+     */
+    private void ensureNoUncommittedOperation(SeqNoStats seqNoStats, SegmentInfos segmentInfos) throws IOException {
+        // we can't enforce this check on an old index - should we prevent this engine as a recovery source?


What would that mean? Can we restrict this further, i.e., doesn't it only apply if we used to be a follower engine?

dnhatn · Apr 26, 2019

Yes, I pushed 8ba5ca5 to skip this check if max_seq_no equals the global checkpoint.

s1monw · 2019-04-25T13:18:29Z

server/src/main/java/org/elasticsearch/index/engine/ReadOnlyEngine.java

                this.indexCommit = Lucene.getIndexCommit(lastCommittedSegmentInfos, directory);
                reader = open(indexCommit);
                reader = wrapReader(reader, readerWrapperFunction);
                searcherManager = new SearcherManager(reader, searcherFactory);
+                if (seqNoStats == null) {


When does this happen, i.e., when is this null?

dnhatn · Apr 26, 2019

We should run this check only when this read-only engine is the only engine opened for its shard (i.e., when it is told to obtain the write lock).

s1monw · 2019-04-25T13:19:20Z

server/src/main/java/org/elasticsearch/index/engine/ReadOnlyEngine.java

+            localCheckpointTracker = createLocalCheckpointTracker(engineConfig, segmentInfos, logger,
+                () -> new Searcher("build_checkpoint_tracker", new IndexSearcher(reader), () -> {}), LocalCheckpointTracker::new);
+        }
+        try (Translog translog = new Translog(engineConfig.getTranslogConfig(), translogUUID, translogDeletionPolicy,


I am a bit afraid of what would happen if that translog is very large. We are reading the entire thing off disk, no? Also, isn't this expected to be empty or at least small?

dnhatn · Apr 26, 2019 (edited)

With 8ba5ca5, we now execute this check only if we were a follower before and have gaps in sequence numbers. Even in that situation, we would read only the last few translog generations, since we need only the operations after the local checkpoint of the index commit.
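
The gating and the bounded read described in this reply could be sketched roughly as follows. This is an illustrative model under stated assumptions, not the code pushed in 8ba5ca5: the class `UncommittedOpsCheckPlan`, the `Generation` type, and both method names are hypothetical, and each translog generation is assumed to know the range of sequence numbers it holds.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the gating and read range discussed above; not Elasticsearch code.
final class UncommittedOpsCheckPlan {
    private UncommittedOpsCheckPlan() {}

    /** A translog generation summarized by the range of sequence numbers it holds. */
    static final class Generation {
        final long id;
        final long minSeqNo;
        final long maxSeqNo;

        Generation(long id, long minSeqNo, long maxSeqNo) {
            this.id = id;
            this.minSeqNo = minSeqNo;
            this.maxSeqNo = maxSeqNo;
        }
    }

    /** The check is skipped when max_seq_no equals the global checkpoint. */
    static boolean shouldRunCheck(long maxSeqNo, long globalCheckpoint) {
        return maxSeqNo != globalCheckpoint;
    }

    /** Only generations that may hold operations above the commit's local checkpoint are read. */
    static List<Long> generationsToRead(List<Generation> generations, long commitLocalCheckpoint) {
        List<Long> ids = new ArrayList<>();
        for (Generation generation : generations) {
            if (generation.maxSeqNo > commitLocalCheckpoint) {
                ids.add(generation.id);
            }
        }
        return ids;
    }
}
```

With three generations covering sequence numbers 0 to 9, 10 to 19, and 20 to 25, and a commit whose local checkpoint is 17, only the second and third generations would be read in this model.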

dnhatn · 2019-04-26T02:24:02Z

@s1monw Thanks for looking. I've addressed your comments.

dnhatn · 2019-04-29T19:41:54Z

Discussed with @ywelsch on another channel; we prefer not to proceed with this change because it might not be sufficient for closed follower indices. We will explore another option using a read-only marker.

Ensure no uncommitted ops when open readonly engine

52dabdd

dnhatn added >enhancement, :Distributed Indexing/Distributed, v8.0.0, v7.2.0 labels Apr 17, 2019

dnhatn requested review from s1monw, ywelsch and henningandersen April 17, 2019 18:58

tlrx mentioned this pull request Apr 17, 2019

Replicate closed indices #33888

Closed

50 tasks

s1monw requested changes Apr 25, 2019

dnhatn added 3 commits April 25, 2019 13:24

Merge branch 'master' into no-uncommited-ops

b9f5f02

simonw feedback

8ba5ca5

unused imports

2e28eec

dnhatn requested a review from s1monw April 26, 2019 02:24

dnhatn closed this Apr 29, 2019

dnhatn deleted the no-uncommited-ops branch April 29, 2019 19:41

dnhatn mentioned this pull request May 23, 2019

Integrate closed replicated indices with closed follower indices #42442

Open

dnhatn removed v7.2.0 v8.0.0 labels Jun 19, 2019

