-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Copy checkpoint atomically when rolling generation #35407
Copy checkpoint atomically when rolling generation #35407
Conversation
Today when rolling a transog generation we copy the checkpoint from `translog.ckp` to `translog-nnnn.ckp` using a simple `Files.copy()` followed by appropriate `fsync()` calls. The copy operation is not atomic, so if we crash at the wrong moment we can leave an incomplete checkpoint file on disk. In practice the checkpoint is so small that it's either empty or fully written. However, we do not correctly handle the case where it's empty when the node restarts. In contrast, in `recoverFromFiles()` we _do_ copy the checkpoint atomically. This commit extracts the atomic copy operation from `recoverFromFiles()` and re-uses it in `rollGeneration()`.
Pinging @elastic/es-distributed |
This situation occurred in https://discuss.elastic.co/t/failed-shard-after-ooming-corrupt-index/155612/3. I recognise there's no tests for this change yet, because I don't know a good way to simulate this situation. Any ideas? |
God I don’t know how often I looked at this code and I missed that?! Code LGTM, regarding testing I think we can add a randomly throwing FS impl that’s corrupting files if they are not fsynced. We do this in some lucene testing directories but that’s a bigger change. I am ok with getting this in as is and start the conversation on a follow up issue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. It terms of testing, we have various tests that inject errors in TranslogTests
. Did you check if we already cover the case where files can be created, we run into out of disk space and leave them empty?
Today when rolling a transog generation we copy the checkpoint from `translog.ckp` to `translog-nnnn.ckp` using a simple `Files.copy()` followed by appropriate `fsync()` calls. The copy operation is not atomic, so if we crash at the wrong moment we can leave an incomplete checkpoint file on disk. In practice the checkpoint is so small that it's either empty or fully written. However, we do not correctly handle the case where it's empty when the node restarts. In contrast, in `recoverFromFiles()` we _do_ copy the checkpoint atomically. This commit extracts the atomic copy operation from `recoverFromFiles()` and re-uses it in `rollGeneration()`.
Today when rolling a transog generation we copy the checkpoint from `translog.ckp` to `translog-nnnn.ckp` using a simple `Files.copy()` followed by appropriate `fsync()` calls. The copy operation is not atomic, so if we crash at the wrong moment we can leave an incomplete checkpoint file on disk. In practice the checkpoint is so small that it's either empty or fully written. However, we do not correctly handle the case where it's empty when the node restarts. In contrast, in `recoverFromFiles()` we _do_ copy the checkpoint atomically. This commit extracts the atomic copy operation from `recoverFromFiles()` and re-uses it in `rollGeneration()`.
Today when rolling a transog generation we copy the checkpoint from
translog.ckp
totranslog-nnnn.ckp
using a simpleFiles.copy()
followed byappropriate
fsync()
calls. The copy operation is not atomic, so if we crashat the wrong moment we can leave an incomplete checkpoint file on disk. In
practice the checkpoint is so small that it's either empty or fully written.
However, we do not correctly handle the case where it's empty when the node
restarts.
In contrast, in
recoverFromFiles()
we do copy the checkpoint atomically.This commit extracts the atomic copy operation from
recoverFromFiles()
andre-uses it in
rollGeneration()
.