random FileNotFoundExceptions when performing a snapshot #7344
CheckIndex reports no errors.
@OlegYch, can you double-check that the directory
Yep, just did: they are all there, and the restore worked except for the failed shards.
Can you send me the logs from the node where these shards failed?
The exceptions in the log all look like this (the full log is bloated with unrelated entries):
Suspiciously, only two out of six nodes have failures, and on those nodes it looks like all of the shards failed to snapshot.
If there is nothing else in the logs, the only scenario I can think of that would lead to this failure is that the NFS mount was somehow unavailable on these two nodes during the snapshot process and became available again afterwards, when you checked. Is this mount accessible to the user that Elasticsearch is running under?
Indeed, it fails on those nodes with "Permission denied".
The snapshot directory has to be writable from all data nodes that contain primary shards of the indices being snapshotted. So I think fixing the permission issue is the way to go.
Since this particular issue was caused by incorrect permission settings, I am going to close it. In general, we plan to address the confusion this issue caused by adding repository validation as part of #7096. That should simplify troubleshooting of issues like this.
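One way to catch this before running a snapshot is to check write access on every data node, as the user Elasticsearch runs under. The following is a minimal sketch under that assumption. It is not Elasticsearch's own repository validation. It roughly mimics that idea by creating a probe file in the shared directory, reading the file back, and deleting it. The function name, the probe file prefix, and the example path are hypothetical.

```python
import os
import sys
import tempfile


def check_repo_dir_writable(path):
    """Hypothetical helper: create, read back, and delete a probe file in `path`.

    Returns (True, "ok") on success, or (False, reason) on failure, e.g. when
    the directory is missing or the current user lacks write permission.
    """
    try:
        # mkstemp creates a uniquely named file and returns an open descriptor.
        fd, probe = tempfile.mkstemp(prefix=".snapshot-probe-", dir=path)
    except OSError as exc:
        return False, f"cannot create a file in {path}: {exc}"
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"probe")
        # Read the file back to confirm the data actually landed on the mount.
        with open(probe, "rb") as f:
            if f.read() != b"probe":
                return False, "probe file content mismatch"
    except OSError as exc:
        return False, f"I/O error on probe file {probe}: {exc}"
    finally:
        # Always try to clean up the probe file, even after a failure.
        try:
            os.remove(probe)
        except OSError:
            pass
    return True, "ok"


if __name__ == "__main__" and len(sys.argv) > 1:
    ok, reason = check_repo_dir_writable(sys.argv[1])
    print(f"{sys.argv[1]}: {reason}")
    sys.exit(0 if ok else 1)
```

Example usage on each node, assuming the service account is named `elasticsearch` and the share is mounted at a hypothetical `/mnt/es-backups`: `sudo -u elasticsearch python3 check_repo.py /mnt/es-backups`. Run the check as that user, not as root. Root may bypass local permission checks, and NFS may map root to a different identity (root squashing), so a root-run check can pass or fail for reasons that don't apply to the Elasticsearch process.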
I had a 1.2.2 cluster of 6 nodes, with an NFS folder on one of the servers shared between all of them.
When trying to take a snapshot, I'm getting exceptions like:
This only happens for some shards. Others are snapshotted fine, and I was able to restore them successfully.
After upgrading to 1.3.1, one of the indices no longer exhibited the problem, but the other one still did.
Since the upgrade I've created a few more indices, and now I'm getting the same errors for them: