After splitting or shrinking an index on Windows 10, the operation fails with high probability because a file access denied exception is thrown during shard relocation. It is easiest to reproduce with operations that involve 100+ shards.
Also:
- I'm running ES as an administrator
- Antivirus software is turned off
- The indexing service is turned off
- BitLocker is suspended
- I have plenty of free disk space
- Path lengths are less than 200 characters
- Splitting to 20 and shrinking to 10 works without problems. Failures start at 200/20.
JVM version (java -version):
java version "10.0.1" 2018-04-17
Java(TM) SE Runtime Environment 18.3 (build 10.0.1+10)
Java HotSpot(TM) 64-Bit Server VM 18.3 (build 10.0.1+10, mixed mode)
It was also tested on Java 9, still failing.
OS version (uname -a if on a Unix-like system):
Windows 10
Version 1709
Build 16299.431
Description of the problem including expected versus actual behavior:
After the AccessDeniedException is thrown, the node tries to relocate the shard again and the exception is thrown again, in an infinite loop.
Expected: no exception; shards are relocated successfully.
It looks like the problem occurs during relocation, but I could not reproduce it by creating an index with many shards and just changing the replica count.
Steps to reproduce:
Replace elasticsearch.yml with the file provided in the resources directory of the cloned repository.
Make 5 copies of ES (it fails with 5 nodes, but 3 should be enough).
Run those ES instances; they will form a cluster.
Edit the main class: change the node string to any node name from the previously created cluster.
Application execution ends on Linux, which means the bug is not reproducible there.
Expect the following error on one of your nodes:
[2018-05-30T14:06:24,894][WARN ][o.e.i.c.IndicesClusterStateService] [mocny-node] [[testn][56]] marking and sending shard failed due to [failed recovery]
org.elasticsearch.indices.recovery.RecoveryFailedException: [testn][56]: Recovery failed from {CSKrE5m}{CSKrE5mRQNuCY3uAe7kaXw}{FYRNyfarT-u901GidtbL5A}{127.0.0.1}{127.0.0.1:9302} into {mocny-node}{Rll4AqOKTk2edEAgygrkgA}{9L6cE39HSEKmEuitsXD0QA}{127.0.0.1}{127.0.0.1:9304}
    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.doRecovery(PeerRecoveryTargetService.java:288) [elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.access$900(PeerRecoveryTargetService.java:81) [elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$RecoveryRunner.doRun(PeerRecoveryTargetService.java:635) [elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:672) [elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.2.4.jar:6.2.4]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
    at java.lang.Thread.run(Thread.java:844) [?:?]
Caused by: org.elasticsearch.transport.RemoteTransportException: [CSKrE5m][127.0.0.1:9302][internal:index/shard/recovery/start_recovery]
Caused by: org.elasticsearch.index.engine.RecoveryEngineException: Phase[1] phase1 failed
    at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:175) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:98) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$000(PeerRecoverySourceService.java:50) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:107) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:104) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1555) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:672) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.2.4.jar:6.2.4]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
    at java.lang.Thread.run(Thread.java:844) ~[?:?]
Caused by: org.elasticsearch.indices.recovery.RecoverFilesRecoveryException: Failed to transfer [5] files with total size of [329.6kb]
    at org.elasticsearch.indices.recovery.RecoverySourceHandler.phase1(RecoverySourceHandler.java:419) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:173) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:98) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$000(PeerRecoverySourceService.java:50) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:107) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:104) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1555) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:672) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.2.4.jar:6.2.4]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
    at java.lang.Thread.run(Thread.java:844) ~[?:?]
Caused by: org.elasticsearch.transport.RemoteTransportException: [mocny-node][127.0.0.1:9304][internal:index/shard/recovery/clean_files]
Caused by: java.nio.file.AccessDeniedException: C:\Users\tomasz_mielczarski\Downloads\todelete\resources\instance0\data\nodes\0\indices\ON8U4tNwQem0t3yWzX6hZA\56\index\recovery.5ZIwTzGEQOWfTSkbEPhV6Q._1.cfs -> C:\Users\tomasz_mielczarski\Downloads\todelete\resources\instance0\data\nodes\0\indices\ON8U4tNwQem0t3yWzX6hZA\56\index\_1.cfs
    at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:89) ~[?:?]
    at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:103) ~[?:?]
    at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:298) ~[?:?]
    at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:288) ~[?:?]
    at java.nio.file.Files.move(Files.java:1413) ~[?:?]
    at org.apache.lucene.store.FSDirectory.rename(FSDirectory.java:297) ~[lucene-core-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:43]
    at org.apache.lucene.store.FilterDirectory.rename(FilterDirectory.java:88) ~[lucene-core-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:43]
    at org.elasticsearch.index.store.Store.renameTempFilesSafe(Store.java:335) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.indices.recovery.RecoveryTarget.renameAllTempFiles(RecoveryTarget.java:188) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.indices.recovery.RecoveryTarget.cleanFiles(RecoveryTarget.java:441) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$CleanFilesRequestHandler.messageReceived(PeerRecoveryTargetService.java:565) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$CleanFilesRequestHandler.messageReceived(PeerRecoveryTargetService.java:559) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1555) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:672) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.2.4.jar:6.2.4]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
    at java.lang.Thread.run(Thread.java:844) ~[?:?]
Thanks for the report @mleczo and for the effort you have put into helping us to reproduce this.
I think we also see this occur in our test suite, as discussed in #33857. The issue is that Elasticsearch expects it can hard-link an open file, then remove it, and then replace it with a different file, but this is not permitted on Windows.
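For anyone who wants to see the file-system sequence in isolation, here is a minimal sketch in Python rather than in Elasticsearch or Lucene code. It hard-links a file while it is open, removes the original name, and then renames a new file into its place. The file names and the function `replace_linked_open_file` are made up for illustration, and Python opens files with different sharing flags than the JVM does, so treat this as an analogy for the pattern and not as a reproduction of the bug.

```python
import os
import tempfile


def replace_linked_open_file() -> str:
    """Hard-link an open file, remove it, then rename a new file into its place.

    Returns "ok" if the platform allowed the whole sequence and "denied" if any
    step raised PermissionError. All file names here are hypothetical.
    """
    with tempfile.TemporaryDirectory() as workdir:
        original = os.path.join(workdir, "_1.cfs")
        link = os.path.join(workdir, "_1.cfs.link")
        incoming = os.path.join(workdir, "recovery.tmp._1.cfs")

        with open(original, "wb") as f:
            f.write(b"old segment")
        with open(incoming, "wb") as f:
            f.write(b"new segment")

        handle = open(original, "rb")  # keep the file open, like a live reader
        try:
            os.link(original, link)         # 1. hard-link the open file
            os.remove(original)             # 2. remove the original name
            os.replace(incoming, original)  # 3. move the new file into place
            return "ok"
        except PermissionError:
            return "denied"
        finally:
            handle.close()  # close before the temporary directory is cleaned up


if __name__ == "__main__":
    print(replace_linked_open_file())
```

On POSIX systems this sequence is allowed while the file is open. On Windows I would expect one of the steps to be refused, which is the same class of failure as the AccessDeniedException in the report.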
A possible workaround is to try disabling rebalancing before the shrink and enabling it again only after all the shards of the target index are allocated and the source index has been deleted.
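A rough sketch of that workaround, assuming the cluster settings API is reachable over HTTP: the helper below builds a `PUT _cluster/settings` request that sets `cluster.routing.rebalance.enable` to `none` or `all`. The function name `rebalance_request` and the address `http://127.0.0.1:9200` are assumptions for illustration; the report only shows transport ports, so adjust the URL to your own node.

```python
import json
import urllib.request


def rebalance_request(base_url: str, enabled: bool) -> urllib.request.Request:
    """Build (but do not send) a cluster settings request toggling rebalancing."""
    body = {
        "transient": {
            # "none" disables shard rebalancing, "all" restores the default
            "cluster.routing.rebalance.enable": "all" if enabled else "none"
        }
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    # Hypothetical address; replace with the HTTP endpoint of one of your nodes.
    request = rebalance_request("http://127.0.0.1:9200", enabled=False)
    print(request.get_method(), request.full_url, request.data.decode("utf-8"))
```

To apply it, pass the request to `urllib.request.urlopen` before the shrink, then send the `enabled=True` variant once all shards of the target index are allocated and the source index has been deleted.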
This issue is not reproducible on Linux.
Elasticsearch version:
Version: 6.2.4, Build: ccec39f/2018-04-12T20:37:28.497551Z, JVM: 10.0.1
Plugins installed: []