
Close Index API should force a flush if a sync is needed #37961

Merged
merged 1 commit into elastic:master from force-flush-if-needed on Jan 29, 2019

Conversation

tlrx (Member)
@tlrx tlrx commented Jan 29, 2019

While working on #33888 I stumbled upon a trivial test case in which a no-op delete operation is executed on an empty index, and this index is later closed and reopened as frozen, causing the assertion added in #37426 to trip:

java.lang.AssertionError: max seq. no. [-1] does not match [5]
	at __randomizedtesting.SeedInfo.seed([B5AAFD71AAC09533]:0)
	at org.elasticsearch.index.engine.ReadOnlyEngine.assertMaxSeqNoEqualsToGlobalCheckpoint(ReadOnlyEngine.java:141)
	at org.elasticsearch.index.engine.ReadOnlyEngine.<init>(ReadOnlyEngine.java:115)
	at org.elasticsearch.index.engine.FrozenEngine.<init>(FrozenEngine.java:75)
	at org.elasticsearch.index.shard.IndexShard.innerOpenEngineAndTranslog(IndexShard.java:1431)
	at org.elasticsearch.index.shard.IndexShard.openEngineAndRecoverFromTranslog(IndexShard.java:1384)
	at org.elasticsearch.index.shard.StoreRecovery.internalRecoverFromStore(StoreRecovery.java:424)
	at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromStore$0(StoreRecovery.java:95)
	at org.elasticsearch.index.shard.StoreRecovery.executeRecovery(StoreRecovery.java:302)
	at org.elasticsearch.index.shard.StoreRecovery.recoverFromStore(StoreRecovery.java:93)
	at org.elasticsearch.index.shard.IndexShard.recoverFromStore(IndexShard.java:1678)
	at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$6(IndexShard.java:2228)

The assertion trips because the translog is synced with max seq no=X and global checkpoint=Y, but that information is not flushed into the Lucene commit: no real operation has been executed, so the IndexWriter has no uncommitted changes.

The flushes executed by TransportVerifyShardBeforeCloseAction and IndexShard.close() won't persist the max seq no and global checkpoint in the Lucene commit, so when the ReadOnlyEngine is opened it reloads the max seq no from the last Lucene commit and detects a mismatch with the global checkpoint loaded from the translog.
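To make the mismatch concrete, here is a minimal, hypothetical sketch of the check that trips (names are modeled on, but not copied from, the Elasticsearch source): the max seq no comes from the last Lucene commit's user data, while the global checkpoint is loaded from the translog, and on an empty commit the max seq no falls back to -1.

```java
import java.util.Map;

public class MaxSeqNoCheck {
    static final long NO_OPS_PERFORMED = -1; // empty commit: no operations indexed yet

    // commitUserData stands in for the last Lucene commit's user data map;
    // globalCheckpointFromTranslog stands in for the checkpoint read from the translog.
    static void assertMaxSeqNoEqualsGlobalCheckpoint(Map<String, String> commitUserData,
                                                     long globalCheckpointFromTranslog) {
        long maxSeqNo = commitUserData.containsKey("max_seq_no")
                ? Long.parseLong(commitUserData.get("max_seq_no"))
                : NO_OPS_PERFORMED;
        if (maxSeqNo != globalCheckpointFromTranslog) {
            throw new AssertionError("max seq. no. [" + maxSeqNo
                    + "] does not match [" + globalCheckpointFromTranslog + "]");
        }
    }

    public static void main(String[] args) {
        // Empty index: the Lucene commit was never updated, but the translog
        // was synced with global checkpoint 5 -> the assertion trips.
        try {
            assertMaxSeqNoEqualsGlobalCheckpoint(Map.of(), 5L);
        } catch (AssertionError e) {
            System.out.println(e.getMessage()); // prints: max seq. no. [-1] does not match [5]
        }
    }
}
```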

This pull request changes the TransportVerifyShardBeforeCloseAction so that it forces a flush when Translog.isSyncNeeded() returns true, i.e. when the global checkpoint differs from the last synced global checkpoint; the forced flush produces a Lucene commit carrying the same user data. Alternatively, we could always force the flush.
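The change above can be sketched with simplified stand-ins (not the real Elasticsearch classes): before the shard is closed, the flush is forced whenever the translog's global checkpoint has advanced past the last synced one, so the Lucene commit picks up the current max seq no and global checkpoint even though the IndexWriter has no uncommitted changes.

```java
public class VerifyBeforeCloseSketch {

    // Minimal stand-in for the translog state that matters here.
    static class Translog {
        long globalCheckpoint;
        long lastSyncedGlobalCheckpoint;

        boolean isSyncNeeded() {
            return globalCheckpoint != lastSyncedGlobalCheckpoint;
        }
    }

    // Minimal stand-in for the engine: flush(force) commits Lucene with the
    // current global checkpoint in the commit user data.
    static class Engine {
        final Translog translog = new Translog();
        long committedMaxSeqNo = -1; // what the last Lucene commit recorded
        boolean flushed = false;

        void flush(boolean force) {
            if (force || hasUncommittedChanges()) {
                committedMaxSeqNo = translog.globalCheckpoint;
                translog.lastSyncedGlobalCheckpoint = translog.globalCheckpoint;
                flushed = true;
            }
        }

        boolean hasUncommittedChanges() {
            return false; // a no-op delete leaves the IndexWriter clean
        }
    }

    // The change described above: force the flush when a sync is needed.
    static void verifyShardBeforeClose(Engine engine) {
        engine.flush(/* force */ engine.translog.isSyncNeeded());
    }

    public static void main(String[] args) {
        Engine engine = new Engine();
        engine.translog.globalCheckpoint = 5;           // advanced by a no-op delete
        engine.translog.lastSyncedGlobalCheckpoint = -1; // commit never caught up
        verifyShardBeforeClose(engine);
        System.out.println(engine.committedMaxSeqNo); // prints 5
    }
}
```

Without the forced flush, `flush(false)` would be a no-op here (no uncommitted changes) and the commit would stay at -1, reproducing the assertion failure.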

@tlrx tlrx added the >bug, :Distributed Indexing/Distributed, v6.7.0, and v7.0.0 labels Jan 29, 2019
@tlrx tlrx requested a review from ywelsch January 29, 2019 10:14
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

Contributor

@ywelsch ywelsch left a comment


LGTM

@tlrx tlrx merged commit 460f10c into elastic:master Jan 29, 2019
@tlrx tlrx deleted the force-flush-if-needed branch January 29, 2019 12:16
@tlrx
Member Author

tlrx commented Jan 29, 2019

Thanks @ywelsch

tlrx added a commit that referenced this pull request Jan 29, 2019
This commit changes the TransportVerifyShardBeforeCloseAction so that it issues a
forced flush, forcing the translog and the Lucene commit to contain the same max seq
no and global checkpoint in case the translog contains operations that were not
written to the IndexWriter (like a delete that touches a non-existent doc). This way
the assertion added in #37426 won't trip.

Related to #33888
@tlrx tlrx mentioned this pull request Jan 30, 2019
50 tasks
tlrx added a commit that referenced this pull request Feb 6, 2019
This commit changes the `TransportVerifyShardBeforeCloseAction` so that it
always forces the flush of the shard. It seems that #37961 is not sufficient to
ensure that the translog and the Lucene commit share the exact same max
seq no and global checkpoint when one or more no-op operations have
been made.

The `BulkWithUpdatesIT.testThatMissingIndexDoesNotAbortFullBulkRequest`
and `FrozenIndexTests.testFreezeEmptyIndexWithTranslogOps` tests exercise
this trivial situation, and they both fail about 1 in 10 executions.

Relates to #33888
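Under the same simplified model as before, the follow-up amounts to dropping the `isSyncNeeded()` condition entirely: this is a hypothetical sketch, not the real Elasticsearch code.

```java
public class AlwaysFlushSketch {
    static boolean flushForced = false;

    // Stand-in for the engine flush; 'force' mirrors the real force flag.
    static void flush(boolean force) {
        flushForced = force;
    }

    // Follow-up change: always force the flush before closing the shard,
    // even when the IndexWriter has no uncommitted changes and the
    // translog does not report a sync as needed.
    static void verifyShardBeforeClose() {
        flush(true);
    }

    public static void main(String[] args) {
        verifyShardBeforeClose();
        System.out.println(flushForced); // prints true
    }
}
```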
tlrx added a commit to tlrx/elasticsearch that referenced this pull request Feb 6, 2019
…8401)

(Same commit message as above.)
tlrx added a commit that referenced this pull request Feb 6, 2019

(Same commit message as above.)
Labels
>bug, :Distributed Indexing/Distributed, v6.7.0, v7.0.0-beta1
4 participants