Specialize pre-closing checks for engine implementations #38702

tlrx · 2019-02-11T09:52:15Z

This pull request allows engine implementations to perform specialized sanity checks during the closing of index shards.

Co-authored-by: Martijn van Groningen <martijn.v.groningen@**.com>

elasticmachine · 2019-02-11T09:52:17Z

Pinging @elastic/es-distributed

x-pack/plugin/ccr/src/test/java/org/elasticsearch/xpack/ccr/CloseFollowerIndexIT.java

ywelsch

I've left three smaller comments on naming and structure; otherwise looking good.

...java/org/elasticsearch/action/admin/indices/close/TransportVerifyShardBeforeCloseAction.java

server/src/main/java/org/elasticsearch/index/engine/InternalEngine.java

tlrx · 2019-02-11T10:36:45Z

Thanks @ywelsch - I've applied your feedback.

ywelsch

LGTM

x-pack/plugin/ccr/src/test/java/org/elasticsearch/xpack/ccr/CloseFollowerIndexIT.java

tlrx · 2019-02-11T13:27:27Z

Thanks @ywelsch and @martijnvg

The Close Index API was refactored in 6.7.0 and now performs pre-closing sanity checks on shards before an index is closed: the maximum sequence number must be equal to the global checkpoint. While this is a strong requirement for regular shards, we identified the need to relax this check in the case of CCR following shards.

The following shards are not in charge of managing the max sequence number or global checkpoint, which are pulled from a leader shard. They also fetch and process batches of operations from the leader in an unordered way, potentially leaving gaps in the history of ops. If the following shard lags far behind, it's possible that the global checkpoint and max seq number never get in sync, preventing the following shard from being closed and a new PUT Follow action from being issued on this shard (which is our recommended way to resume/restart a CCR following).

This commit allows each Engine implementation to define the specific verification it must perform before closing the index. In order to allow following/frozen/closed shards to be closed regardless of the values of the max seq number and global checkpoint, the FollowingEngine and ReadOnlyEngine do not perform any check before the index is closed.

Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com>
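The idea in the commit message can be illustrated with a small, self-contained Java sketch. It is not the Elasticsearch code: the class names (`Engine`, `RegularEngine`, `FollowerEngine`), the method `verifyBeforeClose()` and the helper `canClose()` are hypothetical stand-ins chosen for illustration. It only shows the general shape, assuming a base engine whose default check requires the max sequence number to equal the global checkpoint, and a follower-style engine that overrides the check to do nothing.

```java
// Hypothetical, simplified model of per-engine pre-close checks.
// None of these types are the real Elasticsearch classes.
class PreCloseCheckSketch {

    /** Minimal stand-in for an engine that tracks sequence numbers. */
    static abstract class Engine {
        protected final long maxSeqNo;
        protected final long globalCheckpoint;

        Engine(long maxSeqNo, long globalCheckpoint) {
            this.maxSeqNo = maxSeqNo;
            this.globalCheckpoint = globalCheckpoint;
        }

        /** Default check: every operation must be covered by the global checkpoint. */
        void verifyBeforeClose() {
            if (maxSeqNo != globalCheckpoint) {
                throw new IllegalStateException("global checkpoint [" + globalCheckpoint
                    + "] does not match maximum sequence number [" + maxSeqNo + "]");
            }
        }
    }

    /** A regular shard engine keeps the strict default check. */
    static final class RegularEngine extends Engine {
        RegularEngine(long maxSeqNo, long globalCheckpoint) {
            super(maxSeqNo, globalCheckpoint);
        }
    }

    /** A follower-style engine relaxes the check so a lagging shard can still be closed. */
    static final class FollowerEngine extends Engine {
        FollowerEngine(long maxSeqNo, long globalCheckpoint) {
            super(maxSeqNo, globalCheckpoint);
        }

        @Override
        void verifyBeforeClose() {
            // Intentionally empty: no verification before closing.
        }
    }

    /** Returns true when the engine's own pre-close check passes. */
    static boolean canClose(Engine engine) {
        try {
            engine.verifyBeforeClose();
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(canClose(new RegularEngine(10, 10)));  // prints "true"
        System.out.println(canClose(new RegularEngine(12, 10)));  // prints "false"
        System.out.println(canClose(new FollowerEngine(12, 10))); // prints "true"
    }
}
```

Putting the check on the engine rather than in the close action keeps the caller unaware of which kind of shard it is dealing with; each engine type decides how strict it needs to be.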

…8727) This commit also contains #37426. Related #33888

Now that the test `CloseFollowerIndexIT` has been added in #38702, it needs to be adapted for replicated closed indices.

The test closes the follower index while it is lagging behind the leader index. When it is closed, no sanity checks are executed because it is a follower index (this is a consequence of #38702). But with replicated closed indices, the index is reinitialized as a closed index with a `NoOpEngine`, and such engines make strong assertions on the values of the maximum sequence number and the global checkpoint. Since the values do not match, the shards cannot be created and fail, and the cluster health turns RED.

This commit adapts the `CloseFollowerIndexIT` test so that it wraps the default `UncaughtExceptionHandler` with a handler that tolerates any exception thrown by `ReadOnlyEngine.assertMaxSeqNoEqualsToGlobalCheckpoint()`. Replacing the default uncaught exception handler requires specific permissions, so instead of creating another gradle project, the commit duplicates the `internalClusterTest` task to make it run without the security manager for this specific test only.

Relates to #33888
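The handler-wrapping approach described above can be sketched as follows. This is a standalone illustration, not the code from the test: the class `TolerantHandlerSketch`, the helpers `thrownBy()` and `tolerating()`, and the method `failingCheck()` are hypothetical, and matching on stack trace frames is an assumption about how "thrown by a given method" might be detected. Only `Thread.getDefaultUncaughtExceptionHandler()`, `Thread.setDefaultUncaughtExceptionHandler()` and `Throwable.getStackTrace()` are standard JDK calls.

```java
// Hypothetical sketch: wrap the default uncaught exception handler so that
// errors raised from one specific method are tolerated and all others are forwarded.
class TolerantHandlerSketch {

    /** Returns true if any stack frame of the throwable belongs to the given class and method. */
    static boolean thrownBy(Throwable error, String className, String methodName) {
        for (StackTraceElement frame : error.getStackTrace()) {
            if (frame.getClassName().equals(className) && frame.getMethodName().equals(methodName)) {
                return true;
            }
        }
        return false;
    }

    /** Builds a handler that swallows matching errors and delegates everything else. */
    static Thread.UncaughtExceptionHandler tolerating(Thread.UncaughtExceptionHandler delegate,
                                                      String className, String methodName) {
        return (thread, error) -> {
            if (thrownBy(error, className, methodName)) {
                return; // tolerated: expected failure from the named method
            }
            if (delegate != null) {
                delegate.uncaughtException(thread, error);
            }
        };
    }

    /** Stand-in for an assertion that is expected to fail in the scenario under test. */
    static void failingCheck() {
        throw new AssertionError("max seq no does not equal global checkpoint");
    }

    public static void main(String[] args) throws InterruptedException {
        Thread.UncaughtExceptionHandler previous = Thread.getDefaultUncaughtExceptionHandler();
        // With a security manager installed, this call needs a dedicated runtime permission.
        Thread.setDefaultUncaughtExceptionHandler(
            tolerating(previous, TolerantHandlerSketch.class.getName(), "failingCheck"));
        try {
            Thread worker = new Thread(TolerantHandlerSketch::failingCheck);
            worker.start();
            worker.join(); // the AssertionError is swallowed by the wrapping handler
        } finally {
            Thread.setDefaultUncaughtExceptionHandler(previous); // always restore the original
        }
    }
}
```

Restoring the previous handler in a `finally` block keeps the relaxed behaviour scoped to the one test that needs it.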

Specialize pre-closing checks for engine implementations

b9308b3

tlrx added the >enhancement, blocker, v7.0.0, :Distributed Indexing/Engine, v6.7.0, v8.0.0, and v7.2.0 labels Feb 11, 2019

tlrx requested a review from ywelsch February 11, 2019 09:52

tlrx commented Feb 11, 2019

x-pack/plugin/ccr/src/test/java/org/elasticsearch/xpack/ccr/CloseFollowerIndexIT.java

ywelsch suggested changes Feb 11, 2019

Apply feedback

680af38

tlrx requested a review from ywelsch February 11, 2019 10:36

ywelsch approved these changes Feb 11, 2019

x-pack/plugin/ccr/src/test/java/org/elasticsearch/xpack/ccr/CloseFollowerIndexIT.java

Refresh index2

b684a74

tlrx mentioned this pull request Feb 11, 2019

Allow indices to be closed without executing sanity checks #38609

Closed

tlrx merged commit 514a762 into elastic:master Feb 11, 2019

tlrx deleted the pre-close-checks branch February 11, 2019 13:27

tlrx added the backport pending label Feb 11, 2019

This was referenced Feb 11, 2019

Specialize pre-closing checks for engine implementations (#38702) #38722

Merged

Specialize pre-closing checks for engine implementations (#38702) #38723

Merged

tlrx mentioned this pull request Feb 11, 2019

Specialize pre-closing checks for engine implementations (#38702) #38727

Merged

tlrx removed the backport pending label Feb 12, 2019

tlrx mentioned this pull request Feb 12, 2019

Adapt CloseFollowerIndexIT for replicated closed indices #38767

Merged

jimczi added v7.0.0-beta1 and removed v7.0.0 labels Feb 12, 2019

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
