random FileNotFoundExceptions when performing a snapshot #7344

Closed
OlegYch opened this issue Aug 19, 2014 · 9 comments

OlegYch commented Aug 19, 2014

i had a 1.2.2 cluster of 6 nodes, with an nfs folder on one of the servers shared between all of them.
when trying to take a snapshot i was getting exceptions like:

CreateSnapshotResponse[snapshotInfo=SnapshotInfo[name=2014-08-11-16-31-04,state=PARTIAL,reason=<null>,indices=Object[][{my_idx3,my_idx2}],
startTime=1407774870154,endTime=1407775114709,totalShards=17,successfulShards=14,
shardFailures=Object[][{
[my_idx2][4] failed, reason [IndexShardSnapshotFailedException[[my_idx2][4] Failed to perform snapshot (index files)]; nested: FileNotFoundException[/home/shared_dir/indices/my_idx2/4/__0 (No such file or directory)]; ],
[my_idx2][3] failed, reason [IndexShardSnapshotFailedException[[my_idx2][3] Failed to perform snapshot (index files)]; nested: FileNotFoundException[/home/shared_dir/indices/my_idx2/3/__0 (No such file or directory)]; ],
[my_idx3][0] failed, reason [IndexShardSnapshotFailedException[[my_idx3][0] Failed to perform snapshot (index files)]; nested: FileNotFoundException[/home/shared_dir/indices/my_idx3/0/__0 (No such file or directory)]; ]}]],
headers=<null>,remoteAddress=inet[masternode/redacted:9300]]

this only happens for some shards; the others are snapshotted fine and i was able to restore them successfully.
after the upgrade to 1.3.1 one of the indexes no longer exhibited the problem, but the other one continued to fail.
since the upgrade i've created a few more indexes, and now i'm getting the same errors for them:

CreateSnapshotResponse[snapshotInfo=SnapshotInfo[name=2014-08-19-00-57-20,state=PARTIAL,reason=<null>,indices=Object[][{my_idx3,my_idx_redacted_7,my_idx_redacted_7,my_idx_redacted_7,my_idx_redacted_7,my_idx_redacted_7,my_idx_redacted_7,my_idx_redacted_7}],startTime=1408410040861,endTime=1408410112552,totalShards=96,successfulShards=64,shardFailures=Object[][
{[my_idx_redacted_7][0] failed, reason [IndexShardSnapshotFailedException[[my_idx_redacted_7][0] Failed to perform snapshot (index files)]; nested: FileNotFoundException[/home/shared_dir/indices/my_idx_redacted_7/0/__0 (No such file or directory)]; ],
[my_idx_redacted_7][1] failed, reason [IndexShardSnapshotFailedException[[my_idx_redacted_7][1] Failed to perform snapshot (index files)]; nested: FileNotFoundException[/home/shared_dir/indices/my_idx_redacted_7/1/__0 (No such file or directory)]; ],
[my_idx_redacted_7][6] failed, reason [IndexShardSnapshotFailedException[[my_idx_redacted_7][6] Failed to perform snapshot (index files)]; nested: FileNotFoundException[/home/shared_dir/indices/my_idx_redacted_7/6/__0 (No such file or directory)]; ],
[my_idx_redacted_7][7] failed, reason [IndexShardSnapshotFailedException[[my_idx_redacted_7][7] Failed to perform snapshot (index files)]; nested: FileNotFoundException[/home/shared_dir/indices/my_idx_redacted_7/7/__0 (No such file or directory)]; ],
[my_idx_redacted_7][0] failed, reason [IndexShardSnapshotFailedException[[my_idx_redacted_7][0] Failed to perform snapshot (index files)]; nested: FileNotFoundException[/home/shared_dir/indices/my_idx_redacted_7/0/__0 (No such file or directory)]; ],
[my_idx_redacted_7][1] failed, reason [IndexShardSnapshotFailedException[[my_idx_redacted_7][1] Failed to perform snapshot (index files)]; nested: FileNotFoundException[/home/shared_dir/indices/my_idx_redacted_7/1/__0 (No such file or directory)]; ],
[my_idx_redacted_7][6] failed, reason [IndexShardSnapshotFailedException[[my_idx_redacted_7][6] Failed to perform snapshot (index files)]; nested: FileNotFoundException[/home/shared_dir/indices/my_idx_redacted_7/6/__0 (No such file or directory)]; ],
[my_idx_redacted_7][8] failed, reason [IndexShardSnapshotFailedException[[my_idx_redacted_7][8] Failed to perform snapshot (index files)]; nested: FileNotFoundException[/home/shared_dir/indices/my_idx_redacted_7/8/__0 (No such file or directory)]; ],
[my_idx_redacted_7][7] failed, reason [IndexShardSnapshotFailedException[[my_idx_redacted_7][7] Failed to perform snapshot (index files)]; nested: FileNotFoundException[/home/shared_dir/indices/my_idx_redacted_7/7/__0 (No such file or directory)]; ],
[my_idx_redacted_7][6] failed, reason [IndexShardSnapshotFailedException[[my_idx_redacted_7][6] Failed to perform snapshot (index files)]; nested: FileNotFoundException[/home/shared_dir/indices/my_idx_redacted_7/6/__0 (No such file or directory)]; ],
[my_idx_redacted_7][2] failed, reason [IndexShardSnapshotFailedException[[my_idx_redacted_7][2] Failed to perform snapshot (index files)]; nested: FileNotFoundException[/home/shared_dir/indices/my_idx_redacted_7/2/__2 (No such file or directory)]; ],
[my_idx_redacted_7][0] failed, reason [IndexShardSnapshotFailedException[[my_idx_redacted_7][0] Failed to perform snapshot (index files)]; nested: FileNotFoundException[/home/shared_dir/indices/my_idx_redacted_7/0/__0 (No such file or directory)]; ],
[my_idx_redacted_7][6] failed, reason [IndexShardSnapshotFailedException[[my_idx_redacted_7][6] Failed to perform snapshot (index files)]; nested: FileNotFoundException[/home/shared_dir/indices/my_idx_redacted_7/6/__0 (No such file or directory)]; ],
[my_idx_redacted_7][0] failed, reason [IndexShardSnapshotFailedException[[my_idx_redacted_7][0] Failed to perform snapshot (index files)]; nested: FileNotFoundException[/home/shared_dir/indices/my_idx_redacted_7/0/__0 (No such file or directory)]; ],
[my_idx_redacted_7][2] failed, reason [IndexShardSnapshotFailedException[[my_idx_redacted_7][2] Failed to perform snapshot (index files)]; nested: FileNotFoundException[/home/shared_dir/indices/my_idx_redacted_7/2/__0 (No such file or directory)]; ],
[my_idx_redacted_7][10] failed, reason [IndexShardSnapshotFailedException[[my_idx_redacted_7][10] Failed to perform snapshot (index files)]; nested: FileNotFoundException[/home/shared_dir/indices/my_idx_redacted_7/10/__2 (No such file or directory)]; ],
[my_idx_redacted_7][9] failed, reason [IndexShardSnapshotFailedException[[my_idx_redacted_7][9] Failed to perform snapshot (index files)]; nested: FileNotFoundException[/home/shared_dir/indices/my_idx_redacted_7/9/__0 (No such file or directory)]; ],
[my_idx_redacted_7][3] failed, reason [IndexShardSnapshotFailedException[[my_idx_redacted_7][3] Failed to perform snapshot (index files)]; nested: FileNotFoundException[/home/shared_dir/indices/my_idx_redacted_7/3/__1 (No such file or directory)]; ],
[my_idx_redacted_7][0] failed, reason [IndexShardSnapshotFailedException[[my_idx_redacted_7][0] Failed to perform snapshot (index files)]; nested: FileNotFoundException[/home/shared_dir/indices/my_idx_redacted_7/0/__0 (No such file or directory)]; ],
[my_idx_redacted_7][4] failed, reason [IndexShardSnapshotFailedException[[my_idx_redacted_7][4] Failed to perform snapshot (index files)]; nested: FileNotFoundException[/home/shared_dir/indices/my_idx_redacted_7/4/__2 (No such file or directory)]; ],
[my_idx_redacted_7][8] failed, reason [IndexShardSnapshotFailedException[[my_idx_redacted_7][8] Failed to perform snapshot (index files)]; nested: FileNotFoundException[/home/shared_dir/indices/my_idx_redacted_7/8/__0 (No such file or directory)]; ],
[my_idx_redacted_7][3] failed, reason [IndexShardSnapshotFailedException[[my_idx_redacted_7][3] Failed to perform snapshot (index files)]; nested: FileNotFoundException[/home/shared_dir/indices/my_idx_redacted_7/3/__1 (No such file or directory)]; ],
[my_idx_redacted_7][4] failed, reason [IndexShardSnapshotFailedException[[my_idx_redacted_7][4] Failed to perform snapshot (index files)]; nested: FileNotFoundException[/home/shared_dir/indices/my_idx_redacted_7/4/__3 (No such file or directory)]; ],
[my_idx_redacted_7][0] failed, reason [IndexShardSnapshotFailedException[[my_idx_redacted_7][0] Failed to perform snapshot (index files)]; nested: FileNotFoundException[/home/shared_dir/indices/my_idx_redacted_7/0/__2 (No such file or directory)]; ],
[my_idx_redacted_7][8] failed, reason [IndexShardSnapshotFailedException[[my_idx_redacted_7][8] Failed to perform snapshot (index files)]; nested: FileNotFoundException[/home/shared_dir/indices/my_idx_redacted_7/8/__1 (No such file or directory)]; ],
[my_idx_redacted_7][10] failed, reason [IndexShardSnapshotFailedException[[my_idx_redacted_7][10] Failed to perform snapshot (index files)]; nested: FileNotFoundException[/home/shared_dir/indices/my_idx_redacted_7/10/__0 (No such file or directory)]; ],
[my_idx3][7] failed, reason [IndexShardSnapshotFailedException[[my_idx3][7] Failed to perform snapshot (index files)]; nested: FileNotFoundException[/home/shared_dir/indices/my_idx3/7/__0 (No such file or directory)]; ],
[my_idx3][0] failed, reason [IndexShardSnapshotFailedException[[my_idx3][0] Failed to perform snapshot (index files)]; nested: FileNotFoundException[/home/shared_dir/indices/my_idx3/0/__0 (No such file or directory)]; ],
[my_idx_redacted_7][2] failed, reason [IndexShardSnapshotFailedException[[my_idx_redacted_7][2] Failed to perform snapshot (index files)]; nested: FileNotFoundException[/home/shared_dir/indices/my_idx_redacted_7/2/__0 (No such file or directory)]; ],
[my_idx_redacted_7][0] failed, reason [IndexShardSnapshotFailedException[[my_idx_redacted_7][0] Failed to perform snapshot (index files)]; nested: FileNotFoundException[/home/shared_dir/indices/my_idx_redacted_7/0/__0 (No such file or directory)]; ],
[my_idx_redacted_7][6] failed, reason [IndexShardSnapshotFailedException[[my_idx_redacted_7][6] Failed to perform snapshot (index files)]; nested: FileNotFoundException[/home/shared_dir/indices/my_idx_redacted_7/6/__0 (No such file or directory)]; ],
[my_idx_redacted_7][8] failed, reason [IndexShardSnapshotFailedException[[my_idx_redacted_7][8] Failed to perform snapshot (index files)]; nested: FileNotFoundException[/home/shared_dir/indices/my_idx_redacted_7/8/__0 (No such file or directory)]; ]}]],
headers=<null>,remoteAddress=inet[redacted/redacted:9300]]
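
The second response lists some shards more than once, which makes it hard to see how many distinct shards actually failed. A minimal sketch for tallying them, assuming the failure entries keep the textual shape pasted above; `summarize_failures` and `FAILURE_RE` are hypothetical names, not part of Elasticsearch:

```python
import re
from collections import Counter

# Matches one failure entry of the form pasted above, e.g.
#   [my_idx2][4] failed, reason [...FileNotFoundException[/path/__0 (No such file ...
# Assumption: the response text keeps exactly this layout.
FAILURE_RE = re.compile(
    r"\[(?P<index>[^\[\]]+)\]\[(?P<shard>\d+)\] failed, reason \["
    r".*?FileNotFoundException\[(?P<path>\S+) \("
)


def summarize_failures(response_text):
    """Count how often each (index, shard) pair is reported as failed."""
    counts = Counter()
    for match in FAILURE_RE.finditer(response_text):
        counts[(match.group("index"), int(match.group("shard")))] += 1
    return counts
```

Calling `summarize_failures` on the pasted response text returns a `Counter` keyed by `(index, shard)`, so `len(...)` gives the number of distinct failed shards and the values show how often each one was repeated.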

OlegYch (Author) commented Aug 19, 2014

CheckIndex reports no errors

imotov (Contributor) commented Aug 19, 2014

@OlegYch can you double check that the directory /home/shared_dir/indices/ exists and points to the same location on all nodes?

OlegYch (Author) commented Aug 19, 2014

yep, just did and they are all there, and restore worked except for the failed shards

imotov (Contributor) commented Aug 19, 2014

Can you send me logs from the node where these shards failed?

OlegYch (Author) commented Aug 19, 2014

the exceptions in the log are all like this (the full log is bloated with unrelated stuff):

[2014-08-19 22:57:34,967][WARN ][snapshots                ] [redacted] [[my_idx_redacted_7][10]] [prod:2014-08-19-22-55-04] failed to create snapshot
org.elasticsearch.index.snapshots.IndexShardSnapshotFailedException: [my_idx_redacted_7][10] Failed to perform snapshot (index files)
        at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository$SnapshotContext.snapshot(BlobStoreIndexShardRepository.java:489)
        at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository.snapshot(BlobStoreIndexShardRepository.java:131)
        at org.elasticsearch.index.snapshots.IndexShardSnapshotAndRestoreService.snapshot(IndexShardSnapshotAndRestoreService.java:86)
        at org.elasticsearch.snapshots.SnapshotsService$6.run(SnapshotsService.java:829)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
Caused by: java.io.FileNotFoundException: /home/shared_dir/indices/my_idx_redacted_7/10/__1 (No such file or directory)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:241)
        at org.elasticsearch.common.blobstore.fs.FsImmutableBlobContainer$1.run(FsImmutableBlobContainer.java:50)
        ... 3 more

suspiciously, only two out of 6 nodes have failures, and on those nodes it looks like all of the shards have failed to snapshot.
es and nfs have exactly the same configuration on all nodes in the cluster though, and overall the node configuration is pretty much identical.

imotov (Contributor) commented Aug 19, 2014

If there is nothing else in the logs, the only scenario that I can think of that would lead to this failure is if during the snapshot process this nfs mount was somehow unavailable on these two nodes and it became available afterwards when you checked. Is this mount accessible to the user that elasticsearch is running under?

OlegYch (Author) commented Aug 20, 2014

indeed

sudo -u elasticsearch touch /home/shared_dir/hello

fails on those nodes with Permission denied,
and 'id elasticsearch' reports different values even though ls -la reports that the directory is owned by the same user...
any idea how to make the snapshot under those circumstances?
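
One plausible reading of the symptom above, which the thread does not confirm: with traditional NFS setups access is checked against numeric uid/gid values, while `ls -la` resolves those numbers to names through each node's local user database, so the same user name can hide different numbers on different hosts. A minimal sketch that prints the numbers side by side; `uid_of` and `ownership_report` are hypothetical helper names:

```python
import os
import pwd
import stat


def uid_of(username):
    """Look up the numeric uid of a local user (raises KeyError if unknown)."""
    return pwd.getpwnam(username).pw_uid


def ownership_report(path, uid):
    """Compare a numeric uid with the numeric owner of `path`."""
    st = os.stat(path)
    return {
        "checked_uid": uid,
        "owner_uid": st.st_uid,
        "owner_gid": st.st_gid,
        "mode": stat.filemode(st.st_mode),  # e.g. 'drwxr-xr-x'
        "uid_matches": uid == st.st_uid,
    }
```

Running `ownership_report("/home/shared_dir", uid_of("elasticsearch"))` on every node and comparing the results would show whether the failing nodes disagree about the numeric owner.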

imotov (Contributor) commented Aug 20, 2014

The snapshot directory has to be writable from all data nodes that contain primary shards of the indices that are getting snapshotted. So, I think fixing the permission issue would be the way to go.
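
The writability requirement can be checked by hand before taking a snapshot. A minimal sketch, assuming it is run on each data node as the user Elasticsearch runs under; `probe_repository` is a hypothetical helper, not an Elasticsearch API, and it only mirrors the manual `touch` test from earlier in the thread:

```python
import os
import uuid


def probe_repository(path):
    """Create, read back and delete a small file under `path`.

    Returns (True, None) on success, or (False, message) on failure.
    """
    probe = os.path.join(path, "probe-" + uuid.uuid4().hex)
    payload = b"snapshot-repository-probe"
    try:
        with open(probe, "wb") as handle:
            handle.write(payload)
        with open(probe, "rb") as handle:
            data = handle.read()
        os.remove(probe)
    except OSError as exc:
        # Covers a missing directory, denied permissions, stale mounts, etc.
        return False, f"{type(exc).__name__}: {exc}"
    if data != payload:
        return False, "read back different content"
    return True, None
```

A call such as `probe_repository("/home/shared_dir")` that fails on only some nodes points at those nodes' mount or permission setup rather than at the indices themselves.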

imotov self-assigned this Aug 20, 2014

imotov (Contributor) commented Aug 20, 2014

Since this particular issue was caused by incorrect permission settings, I am going to close it. In general, we are planning to address the confusion that this issue caused by adding repository validation as part of #7096. It should simplify troubleshooting of issues like this.

imotov closed this as completed Aug 20, 2014