[Windows 10] file.AccessDeniedException on split or shrink index #30962

Closed
mleczo opened this issue May 30, 2018 · 2 comments
Labels
>bug, :Data Management/Indices APIs (APIs to create and manage indices and templates)

Comments


mleczo commented May 30, 2018

Describe the bug:

After splitting or shrinking an index on Windows 10, there is a high probability of failure caused by a file access denied exception during shard relocation. It is easiest to reproduce with operations that involve 100+ shards.
Also:
- I'm running ES as an administrator
- antivirus software is turned off
- the indexing service is turned off
- BitLocker is suspended
- I have plenty of free disk space
- path lengths are less than 200 characters
- splitting to 20 and shrinking to 10 works without problems; failures start at 200/20

This issue is not reproducible on Linux.
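For reference, here is a rough sketch of the kind of shrink request involved. This is only illustrative: the index names, shard counts, node name, and localhost endpoint below are assumptions and are not taken from the reproduction project linked under the steps further down.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class ShrinkSketch {

    // Minimal helper: send a JSON request to a local node and print the response code.
    static void request(String method, String path, String json) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL("http://localhost:9200" + path).openConnection();
        conn.setRequestMethod(method);
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(json.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println(method + " " + path + " -> HTTP " + conn.getResponseCode());
        conn.disconnect();
    }

    public static void main(String[] args) throws IOException {
        // Prepare the source index for shrinking: block writes and collocate a copy
        // of every shard on one node ("node-0" is a placeholder name).
        request("PUT", "/source_index/_settings",
                "{\"settings\":{\"index.blocks.write\":true,"
                        + "\"index.routing.allocation.require._name\":\"node-0\"}}");

        // Shrink 200 shards down to 20; the AccessDeniedException shows up during
        // the recoveries that follow this call.
        request("POST", "/source_index/_shrink/target_index",
                "{\"settings\":{\"index.number_of_shards\":20,\"index.number_of_replicas\":0}}");
    }
}
```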

Elasticsearch version:
Version: 6.2.4, Build: ccec39f/2018-04-12T20:37:28.497551Z, JVM: 10.0.1
Plugins installed: []

JVM version (java -version):
java version "10.0.1" 2018-04-17
Java(TM) SE Runtime Environment 18.3 (build 10.0.1+10)
Java HotSpot(TM) 64-Bit Server VM 18.3 (build 10.0.1+10, mixed mode)

It was also tested on Java 9 and still fails.

OS version (uname -a if on a Unix-like system):
Windows 10
Version 1709
Build 16299.431

Description of the problem including expected versus actual behavior:
After the AccessDeniedException is thrown, the node tries to relocate the shard again and the exception is thrown again, resulting in an infinite retry loop.

Expected: no exception, shards are relocated successfully.

The problem appears to occur during relocation, but I could not reproduce it by simply having many shards in an index and changing the replica count.

Steps to reproduce:

  1. Clone this repository: https://github.com/mleczo/EsBugReproduction
  2. Download and unzip ES 6.2.4.
  3. Replace elasticsearch.yml with the file provided in the resources directory of the cloned repository.
  4. Make 5 copies of ES (it fails on 5 nodes, but 3 should do it too).
  5. Run those ES instances; they will form a cluster.
  6. Edit the main class: change the node string to any node name from the previously created cluster.

On Linux the application runs to completion, which means the bug is not reproducible there.
Expect the following error on one of your nodes:
[2018-05-30T14:06:24,894][WARN ][o.e.i.c.IndicesClusterStateService] [mocny-node] [[testn][56]] marking and sending shard failed due to [failed recovery] org.elasticsearch.indices.recovery.RecoveryFailedException: [testn][56]: Recovery failed from {CSKrE5m}{CSKrE5mRQNuCY3uAe7kaXw}{FYRNyfarT-u901GidtbL5A}{127.0.0.1}{127.0.0.1:9302} into {mocny-node}{Rll4AqOKTk2edEAgygrkgA}{9L6cE39HSEKmEuitsXD0QA}{127.0.0.1}{127.0.0.1:9304}
    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.doRecovery(PeerRecoveryTargetService.java:288) [elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.access$900(PeerRecoveryTargetService.java:81) [elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$RecoveryRunner.doRun(PeerRecoveryTargetService.java:635) [elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:672) [elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.2.4.jar:6.2.4]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
    at java.lang.Thread.run(Thread.java:844) [?:?]
Caused by: org.elasticsearch.transport.RemoteTransportException: [CSKrE5m][127.0.0.1:9302][internal:index/shard/recovery/start_recovery]
Caused by: org.elasticsearch.index.engine.RecoveryEngineException: Phase[1] phase1 failed
    at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:175) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:98) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$000(PeerRecoverySourceService.java:50) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:107) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:104) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1555) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:672) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.2.4.jar:6.2.4]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
    at java.lang.Thread.run(Thread.java:844) ~[?:?]
Caused by: org.elasticsearch.indices.recovery.RecoverFilesRecoveryException: Failed to transfer [5] files with total size of [329.6kb]
    at org.elasticsearch.indices.recovery.RecoverySourceHandler.phase1(RecoverySourceHandler.java:419) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:173) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:98) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$000(PeerRecoverySourceService.java:50) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:107) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:104) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1555) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:672) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.2.4.jar:6.2.4]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
    at java.lang.Thread.run(Thread.java:844) ~[?:?]
Caused by: org.elasticsearch.transport.RemoteTransportException: [mocny-node][127.0.0.1:9304][internal:index/shard/recovery/clean_files]
Caused by: java.nio.file.AccessDeniedException: C:\Users\tomasz_mielczarski\Downloads\todelete\resources\instance0\data\nodes\0\indices\ON8U4tNwQem0t3yWzX6hZA\56\index\recovery.5ZIwTzGEQOWfTSkbEPhV6Q._1.cfs -> C:\Users\tomasz_mielczarski\Downloads\todelete\resources\instance0\data\nodes\0\indices\ON8U4tNwQem0t3yWzX6hZA\56\index\_1.cfs
    at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:89) ~[?:?]
    at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:103) ~[?:?]
    at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:298) ~[?:?]
    at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:288) ~[?:?]
    at java.nio.file.Files.move(Files.java:1413) ~[?:?]
    at org.apache.lucene.store.FSDirectory.rename(FSDirectory.java:297) ~[lucene-core-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:43]
    at org.apache.lucene.store.FilterDirectory.rename(FilterDirectory.java:88) ~[lucene-core-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:43]
    at org.elasticsearch.index.store.Store.renameTempFilesSafe(Store.java:335) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.indices.recovery.RecoveryTarget.renameAllTempFiles(RecoveryTarget.java:188) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.indices.recovery.RecoveryTarget.cleanFiles(RecoveryTarget.java:441) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$CleanFilesRequestHandler.messageReceived(PeerRecoveryTargetService.java:565) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$CleanFilesRequestHandler.messageReceived(PeerRecoveryTargetService.java:559) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1555) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:672) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.2.4.jar:6.2.4]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
    at java.lang.Thread.run(Thread.java:844) ~[?:?]

colings86 added the >bug and :Data Management/Indices APIs labels on May 30, 2018
@elasticmachine (Collaborator)

Pinging @elastic/es-core-infra

@DaveCTurner (Contributor)

Thanks for the report @mleczo and for the effort you have put into helping us to reproduce this.

I think we also see this occur in our test suite, as discussed in #33857. The issue is that Elasticsearch expects it can hard-link an open file, then remove it, and then replace it with a different file, but this is not permitted on Windows.
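To illustrate, here is a minimal sketch of that sequence using plain java.nio. The paths and file contents are made up, and exactly which step is rejected can depend on how the open handle was created, but on Windows the final replace is typically where an AccessDeniedException like the one above appears.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class HardLinkReplaceSketch {
    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("hardlink-sketch");
        Path source = dir.resolve("source_segment.cfs");    // stands in for a source-index file
        Path linked = dir.resolve("target_segment.cfs");    // hard link created for the shrunken copy
        Path recovered = dir.resolve("recovery.tmp.cfs");   // stands in for a temp file written by recovery

        Files.write(source, new byte[]{1, 2, 3});
        Files.write(recovered, new byte[]{4, 5, 6});

        try (InputStream held = Files.newInputStream(source)) { // keep a handle open, as an index reader would
            Files.createLink(linked, source);                   // 1. hard-link the open file
            Files.delete(source);                               // 2. remove the original name
            // 3. replace the hard link with a different file; on Windows this rename
            //    is typically rejected with java.nio.file.AccessDeniedException
            Files.move(recovered, linked, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```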

A possible workaround is to try disabling rebalancing before the shrink and enabling it again only after all the shards of the target index are allocated and the source index has been deleted.
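As a rough sketch of that workaround (the choice of a transient setting, the host, and the port below are assumptions; adjust them for your cluster):

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class RebalanceToggleSketch {

    // Minimal helper: PUT a JSON body to the cluster settings endpoint of a local node.
    static void putClusterSettings(String json) throws IOException {
        HttpURLConnection conn = (HttpURLConnection)
                new URL("http://localhost:9200/_cluster/settings").openConnection();
        conn.setRequestMethod("PUT");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(json.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("PUT /_cluster/settings -> HTTP " + conn.getResponseCode());
        conn.disconnect();
    }

    public static void main(String[] args) throws IOException {
        // Disable shard rebalancing before starting the shrink or split.
        putClusterSettings("{\"transient\":{\"cluster.routing.rebalance.enable\":\"none\"}}");

        // ... perform the shrink/split, wait until every shard of the target index is
        // allocated and the source index has been deleted, then re-enable rebalancing:
        putClusterSettings("{\"transient\":{\"cluster.routing.rebalance.enable\":null}}");
    }
}
```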

I am closing this issue as a duplicate of #33857.
