Copy checkpoint atomically when rolling generation #35407

DaveCTurner · 2018-11-09T09:29:06Z

Today when rolling a translog generation we copy the checkpoint from
translog.ckp to translog-nnnn.ckp using a simple Files.copy() followed by
appropriate fsync() calls. The copy operation is not atomic, so if we crash
at the wrong moment we can leave an incomplete checkpoint file on disk. In
practice the checkpoint is so small that it's either empty or fully written.
However, we do not correctly handle the case where it's empty when the node
restarts.

In contrast, in recoverFromFiles() we do copy the checkpoint atomically.
This commit extracts the atomic copy operation from recoverFromFiles() and
re-uses it in rollGeneration().
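
For readers unfamiliar with the pattern, here is a minimal sketch of an atomic copy: write to a temporary file in the target directory, fsync it, then rename it into place. It is not the code from this PR. The class name `AtomicCheckpointCopy`, the helper names and the temp-file prefix are hypothetical, and the sketch assumes the filesystem supports atomic renames within a directory.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; not the Elasticsearch Translog implementation.
public class AtomicCheckpointCopy {

    /** Flush a regular file's contents and metadata to stable storage. */
    static void fsyncFile(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.WRITE)) {
            channel.force(true);
        }
    }

    /** Best-effort fsync of a directory so that a rename inside it is durable. */
    static void fsyncDirectory(Path dir) {
        try (FileChannel channel = FileChannel.open(dir, StandardOpenOption.READ)) {
            channel.force(true);
        } catch (IOException e) {
            // Some platforms cannot open or sync a directory; ignored in this sketch.
        }
    }

    /**
     * Copy source to target so that, after a crash, target is either absent
     * or complete, never empty or partially written.
     */
    static void copyAtomically(Path source, Path target) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        // The temp file lives in the target directory so the final rename
        // stays on one filesystem.
        Path temp = Files.createTempFile(dir, "checkpoint-copy-", ".tmp");
        try {
            Files.copy(source, temp, StandardCopyOption.REPLACE_EXISTING);
            fsyncFile(temp);                                  // contents durable before rename
            Files.move(temp, target, StandardCopyOption.ATOMIC_MOVE);
            fsyncDirectory(dir);                              // make the rename itself durable
        } finally {
            // No-op if the move succeeded; cleans up the temp file if it did not.
            Files.deleteIfExists(temp);
        }
    }
}
```

A call such as `AtomicCheckpointCopy.copyAtomically(dir.resolve("translog.ckp"), dir.resolve("translog-42.ckp"))` (generation number chosen arbitrarily) would then leave either no `translog-42.ckp` or a complete one, whereas a crash partway through a plain `Files.copy()` can leave an empty file behind.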

elasticmachine · 2018-11-09T09:29:08Z

Pinging @elastic/es-distributed

DaveCTurner · 2018-11-09T09:30:04Z

This situation occurred in https://discuss.elastic.co/t/failed-shard-after-ooming-corrupt-index/155612/3.

I recognise there are no tests for this change yet, because I don't know a good way to simulate this situation. Any ideas?

s1monw · 2018-11-09T11:01:29Z

God, I don’t know how often I looked at this code and I missed that?! Code LGTM. Regarding testing, I think we can add a randomly throwing FS impl that corrupts files if they are not fsynced. We do this in some Lucene testing directories, but that’s a bigger change. I am OK with getting this in as is and starting the conversation on a follow-up issue.

bleskes

LGTM. In terms of testing, we have various tests that inject errors in TranslogTests. Did you check whether we already cover the case where files can be created, but we then run out of disk space and leave them empty?

DaveCTurner added the >bug, v7.0.0, :Distributed Indexing/Engine and v6.6.0 labels Nov 9, 2018

DaveCTurner requested a review from s1monw November 9, 2018 09:29

bleskes approved these changes Nov 9, 2018

s1monw approved these changes Nov 9, 2018

DaveCTurner added 3 commits November 21, 2018 09:56

Merge branch 'master' into 2018-11-09-copy-checkpoint-atomically

dc00c29

Add test for crash-resilience during a copy

1000605

Better test

7a8c8d3

DaveCTurner merged commit d01436d into elastic:master Nov 23, 2018

DaveCTurner deleted the 2018-11-09-copy-checkpoint-atomically branch November 27, 2018 15:30

colings86 added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
