[Windows 10] file.AccessDeniedException on split or shrink index #30962

Closed
mleczo opened this issue May 30, 2018 · 2 comments
Labels
>bug, :Data Management/Indices APIs (APIs to create and manage indices and templates)

Comments


mleczo commented May 30, 2018

Describe the bug:

After splitting or shrinking an index on Windows 10, there is a high probability of failure caused by a file access denied exception during shard relocation. It is easiest to reproduce with operations that involve 100+ shards.
Also:
- I'm running ES as an administrator
- antivirus software is turned off
- the indexing service is turned off
- BitLocker is suspended
- I have plenty of free disk space
- path lengths are less than 200 characters
- splitting to 20 and shrinking to 10 works without problems; failures start at 200/20

This issue is not reproducible on Linux.
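For reference, here is a rough sketch of the kind of shrink request involved. This is only illustrative: the index names, shard counts, node name, and localhost endpoint below are assumptions and are not taken from the reproduction project linked under the steps further down.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class ShrinkSketch {

    // Minimal helper: send a JSON request to a local node and print the response code.
    static void request(String method, String path, String json) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL("http://localhost:9200" + path).openConnection();
        conn.setRequestMethod(method);
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(json.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println(method + " " + path + " -> HTTP " + conn.getResponseCode());
        conn.disconnect();
    }

    public static void main(String[] args) throws IOException {
        // Prepare the source index for shrinking: block writes and collocate a copy
        // of every shard on one node ("node-0" is a placeholder name).
        request("PUT", "/source_index/_settings",
                "{\"settings\":{\"index.blocks.write\":true,"
                        + "\"index.routing.allocation.require._name\":\"node-0\"}}");

        // Shrink 200 shards down to 20; the AccessDeniedException shows up during
        // the recoveries that follow this call.
        request("POST", "/source_index/_shrink/target_index",
                "{\"settings\":{\"index.number_of_shards\":20,\"index.number_of_replicas\":0}}");
    }
}
```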

Elasticsearch version:
Version: 6.2.4, Build: ccec39f/2018-04-12T20:37:28.497551Z, JVM: 10.0.1
Plugins installed: []

JVM version (java -version):
java version "10.0.1" 2018-04-17
Java(TM) SE Runtime Environment 18.3 (build 10.0.1+10)
Java HotSpot(TM) 64-Bit Server VM 18.3 (build 10.0.1+10, mixed mode)

It was also tested on Java 9 and still fails.

OS version (uname -a if on a Unix-like system):
Windows 10
Version 1709
Build 16299.431

Description of the problem including expected versus actual behavior:
After the AccessDeniedException is thrown, the node tries to relocate the shard again and the exception is thrown again, resulting in an infinite retry loop.

Expected: no exception, shards are relocated successfully.

The problem appears to occur during relocation, but I could not reproduce it by simply having many shards in an index and changing the replica count.

Steps to reproduce:

  1. Clone this repository: https://github.com/mleczo/EsBugReproduction
  2. Download and unzip ES 6.2.4.
  3. Replace elasticsearch.yml with the file provided in the resources directory of the cloned repository.
  4. Make 5 copies of ES (it fails on 5 nodes, but 3 should do it too).
  5. Run those ES instances; they will form a cluster.
  6. Edit the main class: change the node string to any node name from the previously created cluster.

On Linux the application runs to completion, which means the bug is not reproducible there.
Expect the following error on one of your nodes:
[2018-05-30T14:06:24,894][WARN ][o.e.i.c.IndicesClusterStateService] [mocny-node] [[testn][56]] marking and sending shard failed due to [failed recovery] org.elasticsearch.indices.recovery.RecoveryFailedException: [testn][56]: Recovery failed from {CSKrE5m}{CSKrE5mRQNuCY3uAe7kaXw}{FYRNyfarT-u901GidtbL5A}{127.0.0.1}{127.0.0.1:9302} into {mocny-node}{Rll4AqOKTk2edEAgygrkgA}{9L6cE39HSEKmEuitsXD0QA}{127.0.0.1}{127.0.0.1:9304}
    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.doRecovery(PeerRecoveryTargetService.java:288) [elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.access$900(PeerRecoveryTargetService.java:81) [elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$RecoveryRunner.doRun(PeerRecoveryTargetService.java:635) [elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:672) [elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.2.4.jar:6.2.4]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
    at java.lang.Thread.run(Thread.java:844) [?:?]
Caused by: org.elasticsearch.transport.RemoteTransportException: [CSKrE5m][127.0.0.1:9302][internal:index/shard/recovery/start_recovery]
Caused by: org.elasticsearch.index.engine.RecoveryEngineException: Phase[1] phase1 failed
    at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:175) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:98) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$000(PeerRecoverySourceService.java:50) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:107) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:104) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1555) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:672) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.2.4.jar:6.2.4]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
    at java.lang.Thread.run(Thread.java:844) ~[?:?]
Caused by: org.elasticsearch.indices.recovery.RecoverFilesRecoveryException: Failed to transfer [5] files with total size of [329.6kb]
    at org.elasticsearch.indices.recovery.RecoverySourceHandler.phase1(RecoverySourceHandler.java:419) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:173) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:98) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$000(PeerRecoverySourceService.java:50) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:107) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:104) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1555) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:672) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.2.4.jar:6.2.4]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
    at java.lang.Thread.run(Thread.java:844) ~[?:?]
Caused by: org.elasticsearch.transport.RemoteTransportException: [mocny-node][127.0.0.1:9304][internal:index/shard/recovery/clean_files]
Caused by: java.nio.file.AccessDeniedException: C:\Users\tomasz_mielczarski\Downloads\todelete\resources\instance0\data\nodes\0\indices\ON8U4tNwQem0t3yWzX6hZA\56\index\recovery.5ZIwTzGEQOWfTSkbEPhV6Q._1.cfs -> C:\Users\tomasz_mielczarski\Downloads\todelete\resources\instance0\data\nodes\0\indices\ON8U4tNwQem0t3yWzX6hZA\56\index\_1.cfs
    at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:89) ~[?:?]
    at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:103) ~[?:?]
    at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:298) ~[?:?]
    at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:288) ~[?:?]
    at java.nio.file.Files.move(Files.java:1413) ~[?:?]
    at org.apache.lucene.store.FSDirectory.rename(FSDirectory.java:297) ~[lucene-core-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:43]
    at org.apache.lucene.store.FilterDirectory.rename(FilterDirectory.java:88) ~[lucene-core-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:43]
    at org.elasticsearch.index.store.Store.renameTempFilesSafe(Store.java:335) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.indices.recovery.RecoveryTarget.renameAllTempFiles(RecoveryTarget.java:188) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.indices.recovery.RecoveryTarget.cleanFiles(RecoveryTarget.java:441) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$CleanFilesRequestHandler.messageReceived(PeerRecoveryTargetService.java:565) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$CleanFilesRequestHandler.messageReceived(PeerRecoveryTargetService.java:559) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1555) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:672) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.2.4.jar:6.2.4]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
    at java.lang.Thread.run(Thread.java:844) ~[?:?]

colings86 added the >bug and :Data Management/Indices APIs labels on May 30, 2018
@elasticmachine (Collaborator)

Pinging @elastic/es-core-infra

@DaveCTurner (Contributor)

Thanks for the report @mleczo and for the effort you have put into helping us to reproduce this.

I think we also see this occur in our test suite, as discussed in #33857. The issue is that Elasticsearch expects it can hard-link an open file, then remove it, and then replace it with a different file, but this is not permitted on Windows.
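To illustrate, here is a minimal sketch of that sequence using plain java.nio. The paths and file contents are made up, and exactly which step is rejected can depend on how the open handle was created, but on Windows the final replace is typically where an AccessDeniedException like the one above appears.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class HardLinkReplaceSketch {
    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("hardlink-sketch");
        Path source = dir.resolve("source_segment.cfs");    // stands in for a source-index file
        Path linked = dir.resolve("target_segment.cfs");    // hard link created for the shrunken copy
        Path recovered = dir.resolve("recovery.tmp.cfs");   // stands in for a temp file written by recovery

        Files.write(source, new byte[]{1, 2, 3});
        Files.write(recovered, new byte[]{4, 5, 6});

        try (InputStream held = Files.newInputStream(source)) { // keep a handle open, as an index reader would
            Files.createLink(linked, source);                   // 1. hard-link the open file
            Files.delete(source);                               // 2. remove the original name
            // 3. replace the hard link with a different file; on Windows this rename
            //    is typically rejected with java.nio.file.AccessDeniedException
            Files.move(recovered, linked, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```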

A possible workaround is to try disabling rebalancing before the shrink and enabling it again only after all the shards of the target index are allocated and the source index has been deleted.
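As a rough sketch of that workaround (the choice of a transient setting, the host, and the port below are assumptions; adjust them for your cluster):

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class RebalanceToggleSketch {

    // Minimal helper: PUT a JSON body to the cluster settings endpoint of a local node.
    static void putClusterSettings(String json) throws IOException {
        HttpURLConnection conn = (HttpURLConnection)
                new URL("http://localhost:9200/_cluster/settings").openConnection();
        conn.setRequestMethod("PUT");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(json.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("PUT /_cluster/settings -> HTTP " + conn.getResponseCode());
        conn.disconnect();
    }

    public static void main(String[] args) throws IOException {
        // Disable shard rebalancing before starting the shrink or split.
        putClusterSettings("{\"transient\":{\"cluster.routing.rebalance.enable\":\"none\"}}");

        // ... perform the shrink/split, wait until every shard of the target index is
        // allocated and the source index has been deleted, then re-enable rebalancing:
        putClusterSettings("{\"transient\":{\"cluster.routing.rebalance.enable\":null}}");
    }
}
```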

I am closing this issue as a duplicate of #33857.
