Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ShrinkIndexIT.testCreateShrinkIndexToN fails on Windows #33857

Closed
alpar-t opened this issue Sep 19, 2018 · 30 comments
Closed

ShrinkIndexIT.testCreateShrinkIndexToN fails on Windows #33857

alpar-t opened this issue Sep 19, 2018 · 30 comments
Labels
:Distributed Indexing/Recovery Anything around constructing a new shard, either from a local or a remote source. >test-failure Triaged test failures from CI v5.6.12 v6.7.0 v7.0.0-beta1

Comments

@alpar-t
Copy link
Contributor

alpar-t commented Sep 19, 2018

5.6

FAILURE 33.4s J0 | ShrinkIndexIT.testCreateShrinkIndexToN <<< FAILURES!
   > Throwable #1: java.lang.AssertionError: timed out waiting for green state
   > 	at __randomizedtesting.SeedInfo.seed([FAAB0749B4C53E6F:EAEB4985116575FC]:0)
   > 	at org.elasticsearch.test.ESIntegTestCase.ensureColor(ESIntegTestCase.java:933)
   > 	at org.elasticsearch.test.ESIntegTestCase.ensureGreen(ESIntegTestCase.java:899)
   > 	at org.elasticsearch.test.ESIntegTestCase.ensureGreen(ESIntegTestCase.java:888)
   > 	at org.elasticsearch.action.admin.indices.create.ShrinkIndexIT.testCreateShrinkIndexToN(ShrinkIndexIT.java:115)
   > 	at java.lang.Thread.run(Thread.java:748)

and

 Caused by: java.nio.file.AccessDeniedException: C:\Users\jenkins\workspace\elastic+elasticsearch+5.6+multijob-windows-compatibility\core\build\testrun\integTest\J0\temp\org.elasticsearch.action.admin.indices.create.ShrinkIndexIT_FAAB0749B4C53E6F-001\tempDir-002\data\nodes\2\indices\dT7bmk_uQtqKf2wjfSn3Lg\0\index\recovery.AWXo49VofFhOtlDMUmb5._0.cfs -> C:\Users\jenkins\workspace\elastic+elasticsearch+5.6+multijob-windows-compatibility\core\build\testrun\integTest\J0\temp\org.elasticsearch.action.admin.indices.create.ShrinkIndexIT_FAAB0749B4C53E6F-001\tempDir-002\data\nodes\2\indices\dT7bmk_uQtqKf2wjfSn3Lg\0\index\_0.cfs
  1> 	at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:83) ~[?:?]
  1> 	at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97) ~[?:?]
  1> 	at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:301) ~[?:?]
  1> 	at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287) ~[?:?]
  1> 	at org.apache.lucene.mockfile.FilterFileSystemProvider.move(FilterFileSystemProvider.java:147) ~[lucene-test-framework-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:40]
  1> 	at org.apache.lucene.mockfile.FilterFileSystemProvider.move(FilterFileSystemProvider.java:147) ~[lucene-test-framework-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:40]
  1> 	at org.apache.lucene.mockfile.FilterFileSystemProvider.move(FilterFileSystemProvider.java:147) ~[lucene-test-framework-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:40]
  1> 	at org.apache.lucene.mockfile.FilterFileSystemProvider.move(FilterFileSystemProvider.java:147) ~[lucene-test-framework-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:40]
  1> 	at java.nio.file.Files.move(Files.java:1395) ~[?:1.8.0_181]
  1> 	at org.apache.lucene.store.FSDirectory.rename(FSDirectory.java:297) ~[lucene-core-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:39]
  1> 	at org.apache.lucene.store.FilterDirectory.rename(FilterDirectory.java:88) ~[lucene-core-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:39]
  1> 	at org.apache.lucene.store.FilterDirectory.rename(FilterDirectory.java:88) ~[lucene-core-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:39]
  1> 	at org.elasticsearch.index.store.Store.renameTempFilesSafe(Store.java:319) ~[main/:?]
  1> 	at org.elasticsearch.indices.recovery.RecoveryTarget.renameAllTempFiles(RecoveryTarget.java:181) ~[main/:?]
  1> 	at org.elasticsearch.indices.recovery.RecoveryTarget.cleanFiles(RecoveryTarget.java:406) ~[main/:?]
  1> 	at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$CleanFilesRequestHandler.messageReceived(PeerRecoveryTargetService.java:486) ~[main/:?]
  1> 	at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$CleanFilesRequestHandler.messageReceived(PeerRecoveryTargetService.java:480) ~[main/:?]
  1> 	at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:33) ~[main/:?]
  1> 	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69) ~[main/:?]
  1> 	at org.elasticsearch.transport.local.LocalTransport$2.doRun(LocalTransport.java:390) ~[main/:?]
  1> 	... 5 more

[...]

1> java.io.IOException: Could not remove the following files (in the order of attempts):
  1>    C:\Users\jenkins\workspace\elastic+elasticsearch+5.6+multijob-windows-compatibility\core\build\testrun\integTest\J0\temp\org.elasticsearch.action.admin.indices.create.ShrinkIndexIT_FAAB0749B4C53E6F-001\tempDir-002\data\nodes\2\indices\dT7bmk_uQtqKf2wjfSn3Lg\0\index\_0.cfs: java.nio.file.AccessDeniedException: C:\Users\jenkins\workspace\elastic+elasticsearch+5.6+multijob-windows-compatibility\core\build\testrun\integTest\J0\temp\org.elasticsearch.action.admin.indices.create.ShrinkIndexIT_FAAB0749B4C53E6F-001\tempDir-002\data\nodes\2\indices\dT7bmk_uQtqKf2wjfSn3Lg\0\index\_0.cfs
  1>    C:\Users\jenkins\workspace\elastic+elasticsearch+5.6+multijob-windows-compatibility\core\build\testrun\integTest\J0\temp\org.elasticsearch.action.admin.indices.create.ShrinkIndexIT_FAAB0749B4C53E6F-001\tempDir-002\data\nodes\2\indices\dT7bmk_uQtqKf2wjfSn3Lg\0\index\_1.cfs: java.nio.file.AccessDeniedException: C:\Users\jenkins\workspace\elastic+elasticsearch+5.6+multijob-windows-compatibility\core\build\testrun\integTest\J0\temp\org.elasticsearch.action.admin.indices.create.ShrinkIndexIT_FAAB0749B4C53E6F-001\tempDir-002\data\nodes\2\indices\dT7bmk_uQtqKf2wjfSn3Lg\0\index\_1.cfs
  1>    C:\Users\jenkins\workspace\elastic+elasticsearch+5.6+multijob-windows-compatibility\core\build\testrun\integTest\J0\temp\org.elasticsearch.action.admin.indices.create.ShrinkIndexIT_FAAB0749B4C53E6F-001\tempDir-002\data\nodes\2\indices\dT7bmk_uQtqKf2wjfSn3Lg\0\index: java.nio.file.DirectoryNotEmptyException: C:\Users\jenkins\workspace\elastic+elasticsearch+5.6+multijob-windows-compatibility\core\build\testrun\integTest\J0\temp\org.elasticsearch.action.admin.indices.create.ShrinkIndexIT_FAAB0749B4C53E6F-001\tempDir-002\data\nodes\2\indices\dT7bmk_uQtqKf2wjfSn3Lg\0\index
  2> REPRODUCE WITH: gradlew :core:integTest -Dtests.seed=FAAB0749B4C53E6F -Dtests.class=org.elasticsearch.action.admin.indices.create.ShrinkIndexIT -Dtests.method="testCreateShrinkIndexToN" -Dtests.security.manager=true -Dtests.locale=zh -Dtests.timezone=Etc/GMT+10
  1>    C:\Users\jenkins\workspace\elastic+elasticsearch+5.6+multijob-windows-compatibility\core\build\testrun\integTest\J0\temp\org.elasticsearch.action.admin.indices.create.ShrinkIndexIT_FAAB0749B4C53E6F-001\tempDir-002\data\nodes\2\indices\dT7bmk_uQtqKf2wjfSn3Lg\0: java.nio.file.DirectoryNotEmptyException: C:\Users\jenkins\workspace\elastic+elasticsearch+5.6+multijob-windows-compatibility\core\build\testrun\integTest\J0\temp\org.elasticsearch.action.admin.indices.create.ShrinkIndexIT_FAAB0749B4C53E6F-001\tempDir-002\data\nodes\2\indices\dT7bmk_uQtqKf2wjfSn3Lg\0
  1>    C:\Users\jenkins\workspace\elastic+elasticsearch+5.6+multijob-windows-compatibility\core\build\testrun\integTest\J0\temp\org.elasticsearch.action.admin.indices.create.ShrinkIndexIT_FAAB0749B4C53E6F-001\tempDir-002\data\nodes\2\indices\dT7bmk_uQtqKf2wjfSn3Lg: java.nio.file.DirectoryNotEmptyException: C:\Users\jenkins\workspace\elastic+elasticsearch+5.6+multijob-windows-compatibility\core\build\testrun\integTest\J0\temp\org.elasticsearch.action.admin.indices.create.ShrinkIndexIT_FAAB0749B4C53E6F-001\tempDir-002\data\nodes\2\indices\dT7bmk_uQtqKf2wjfSn3Lg
  1> 
  1> 	at org.apache.lucene.util.IOUtils.rm(IOUtils.java:329) ~[lucene-core-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:40]
  1> 	at org.elasticsearch.env.NodeEnvironment.deleteIndexDirectoryUnderLock(NodeEnvironment.java:558) ~[main/:?]
  1> 	at org.elasticsearch.env.NodeEnvironment.deleteIndexDirectorySafe(NodeEnvironment.java:542) ~[main/:?]
  1> 	at org.elasticsearch.indices.IndicesService.deleteIndexStoreIfDeletionAllowed(IndicesService.java:652) [main/:?]
  1> 	at org.elasticsearch.indices.IndicesService.deleteIndexStore(IndicesService.java:639) [main/:?]
  1> 	at org.elasticsearch.indices.IndicesService.deleteIndexStore(IndicesService.java:634) [main/:?]
  1> 	at org.elasticsearch.indices.IndicesService.deleteUnassignedIndex(IndicesService.java:602) [main/:?]
  1> 	at org.elasticsearch.indices.cluster.IndicesClusterStateService.deleteIndices(IndicesClusterStateService.java:263) [main/:?]
  1> 	at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:192) [main/:?]
  1> 	at org.elasticsearch.cluster.service.ClusterService.callClusterStateAppliers(ClusterService.java:814) [main/:?]
  1> 	at org.elasticsearch.cluster.service.ClusterService.publishAndApplyChanges(ClusterService.java:768) [main/:?]
  1> 	at org.elasticsearch.cluster.service.ClusterService.runTasks(ClusterService.java:587) [main/:?]
  1> 	at org.elasticsearch.cluster.service.ClusterService$ClusterServiceTaskBatcher.run(ClusterService.java:263) [main/:?]
  1> 	at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) [main/:?]
  1> 	at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) [main/:?]
  1> 	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:576) [main/:?]
  1> 	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:247) [main/:?]
  1> 	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:210) [main/:?]
  1> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
  1> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
  1> 	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]

It also happened on 6.4, 6.x and master

I was not able to reproduce it on a Windows 2012 VM.
There's no evidence of similar failures in the past, nor recent changes to the 5.6
that are obviously related.

@alpar-t alpar-t added :Delivery/Build Build or test infrastructure >test-failure Triaged test failures from CI v7.0.0 v6.5.0 v5.6.12 labels Sep 19, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra

@s1monw
Copy link
Contributor

s1monw commented Sep 19, 2018

@jasontedor @bleskes any ideas here? I looked but it's really puzzeling

@dnhatn dnhatn added :Distributed Indexing/Recovery Anything around constructing a new shard, either from a local or a remote source. and removed :Delivery/Build Build or test infrastructure labels Sep 20, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

@ywelsch
Copy link
Contributor

ywelsch commented Sep 20, 2018

Relates #30962

@tvernum
Copy link
Contributor

tvernum commented Sep 20, 2018

It's failed again.

I'm going to mute it on Windows unless someone says they need to capture more failure logs.

@ywelsch
Copy link
Contributor

ywelsch commented Sep 20, 2018

I've reenabled the test on master (db32781) and added DEBUG logging for org.elasticsearch.index.store which will possibly tell us whether the move failed because a previous delete in renameTempFilesSafe (where we catch exception and log as debug) failed.
For the person on test triage: If this test fails again on CI, please link the latest failure here.

@romseygeek
Copy link
Contributor

@ywelsch
Copy link
Contributor

ywelsch commented Sep 21, 2018

Unfortunately that was on 5.6 (which does not have the extended logging, and which we forgot to mute). I've muted the test on 5.6 for now (a6aa773).

@romseygeek
Copy link
Contributor

@romseygeek
Copy link
Contributor

PartitionedRoutingIT.testShrinking seems to have a similar issue:
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.x+multijob-windows-compatibility/16/console

@mayya-sharipova
Copy link
Contributor

Similar failures on 5.6, 6.x and master in the last 12 hours.

gradlew :core:integTest \
  -Dtests.seed=41732671CC43CCD8 \
  -Dtests.class=org.elasticsearch.routing.PartitionedRoutingIT \
  -Dtests.method="testShrinking" \
  -Dtests.security.manager=true \
  -Dtests.locale=ar-MA \
  -Dtests.timezone=Europe/Tirane

seems to be caused by AccessDeniedException

Caused by: java.nio.file.AccessDeniedException: C:\Users\jenkins\workspace\elastic+elasticsearch+master+multijob-windows-compatibility\server\build\testrun\integTest\J3\temp\org.elasticsearch.routing.PartitionedRoutingIT_22FF5B1F0934DA81-001\tempDir-002\data\nodes\0\indices\nCj_PB_1SvemMgYmkmpqNw\1\index\recovery.XyXJa1iLT_eQAR39YSXIKg._0.cfs -> C:\Users\jenkins\workspace\elastic+elasticsearch+master+multijob-windows-compatibility\server\build\testrun\integTest\J3\temp\org.elasticsearch.routing.PartitionedRoutingIT_22FF5B1F0934DA81-001\tempDir-002\data\nodes\0\indices\nCj_PB_1SvemMgYmkmpqNw\1\index\_0.cfs

@cbuescher
Copy link
Member

fyi. recent master change built was: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-windows-compatibility/19/console
Trying to get to the machine and take a look at the logs now.

@cbuescher
Copy link
Member

@droberts195
Copy link
Contributor

@jasontedor jasontedor assigned DaveCTurner and unassigned ywelsch Sep 27, 2018
@DaveCTurner
Copy link
Contributor

I spent some time trying to reproduce this by hand on one of the Windows CI workers. Although I never saw the test fail, I did see tests reporting the failures of recoveries for the reason quoted in the OP:

  1> org.elasticsearch.indices.recovery.RecoveryFailedException: [second_shrink][0]: Recovery failed from {node_sd2}{TO36W-4uR0GLz8RwgnmWXg}{vtq1HlXqTy-5ut_hiS2-TQ}{127.0.0.1}{127.0.0.1:52206} into {node_sd1}{VwtnN_EuS8aKva_oys857Q}{OCu18ksZRdKzsOChRaIFkw}{127.0.0.1}{127.0.0.1:52177}
  1>    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.doRecovery(PeerRecoveryTargetService.java:283) [main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.access$900(PeerRecoveryTargetService.java:81) [main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$RecoveryRunner.doRun(PeerRecoveryTargetService.java:628) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:723) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [main/:?]
  1>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135) [?:?]
  1>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
  1>    at java.lang.Thread.run(Thread.java:844) [?:?]
  1> Caused by: org.elasticsearch.transport.RemoteTransportException: [node_sd2][127.0.0.1:52206][internal:index/shard/recovery/start_recovery]
  1> Caused by: org.elasticsearch.index.engine.RecoveryEngineException: Phase[1] phase1 failed
  1>    at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:174) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:99) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$000(PeerRecoverySourceService.java:51) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:108) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:105) ~[main/:?]
  1>    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:63) ~[main/:?]
  1>    at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1448) ~[main/:?]
  1>    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:723) ~[main/:?]
  1>    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[main/:?]
  1>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135) ~[?:?]
  1>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
  1>    at java.lang.Thread.run(Thread.java:844) ~[?:?]
  1> Caused by: org.elasticsearch.indices.recovery.RecoverFilesRecoveryException: Failed to transfer [7] files with total size of [8kb]
  1>    at org.elasticsearch.indices.recovery.RecoverySourceHandler.phase1(RecoverySourceHandler.java:422) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:172) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:99) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$000(PeerRecoverySourceService.java:51) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:108) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:105) ~[main/:?]
  1>    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:63) ~[main/:?]
  1>    at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1448) ~[main/:?]
  1>    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:723) ~[main/:?]
  1>    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[main/:?]
  1>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135) ~[?:?]
  1>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
  1>    at java.lang.Thread.run(Thread.java:844) ~[?:?]
  1> Caused by: org.elasticsearch.transport.RemoteTransportException: [node_sd1][127.0.0.1:52177][internal:index/shard/recovery/clean_files]
  1> Caused by: java.nio.file.AccessDeniedException: C:\Users\jenkins\workspace\elastic+elasticsearch+master+multijob-windows-compatibility\server\build\testrun\integTest\J0\temp\org.elasticsearch.action.admin.indices.create.ShrinkIndexIT_3825ACCB7B6BF7DB-001\tempDir-002\data\nodes\1\indices\4EzGadc6Rpa77FYUbJVhvA\0\index\recovery.uKRsdtvIQbmCSaJo9IlMzA._0.cfs -> C:\Users\jenkins\workspace\elastic+elasticsearch+master+multijob-windows-compatibility\server\build\testrun\integTest\J0\temp\org.elasticsearch.action.admin.indices.create.ShrinkIndexIT_3825ACCB7B6BF7DB-001\tempDir-002\data\nodes\1\indices\4EzGadc6Rpa77FYUbJVhvA\0\index\_0.cfs
  1>    at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:89) ~[?:?]
  1>    at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:103) ~[?:?]
  1>    at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:298) ~[?:?]
  1>    at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:288) ~[?:?]
  1>    at org.apache.lucene.mockfile.FilterFileSystemProvider.move(FilterFileSystemProvider.java:147) ~[lucene-test-framework-8.0.0-snapshot-7d0a7782fa.jar:8.0.0-snapshot-7d0a7782fa 7d0a7782fa7e99250bccfb4d3e995485c3f0ca19 - jimczi - 2018-09-18 14:02:40]
  1>    at org.apache.lucene.mockfile.FilterFileSystemProvider.move(FilterFileSystemProvider.java:147) ~[lucene-test-framework-8.0.0-snapshot-7d0a7782fa.jar:8.0.0-snapshot-7d0a7782fa 7d0a7782fa7e99250bccfb4d3e995485c3f0ca19 - jimczi - 2018-09-18 14:02:40]
  1>    at java.nio.file.Files.move(Files.java:1413) ~[?:?]
  1>    at org.apache.lucene.store.FSDirectory.rename(FSDirectory.java:303) ~[lucene-core-8.0.0-snapshot-7d0a7782fa.jar:8.0.0-snapshot-7d0a7782fa 7d0a7782fa7e99250bccfb4d3e995485c3f0ca19 - jimczi - 2018-09-18 14:02:34]
  1>    at org.apache.lucene.store.MockDirectoryWrapper.rename(MockDirectoryWrapper.java:231) ~[lucene-test-framework-8.0.0-snapshot-7d0a7782fa.jar:8.0.0-snapshot-7d0a7782fa 7d0a7782fa7e99250bccfb4d3e995485c3f0ca19 - jimczi - 2018-09-18 14:02:34]
  1>    at org.apache.lucene.store.FilterDirectory.rename(FilterDirectory.java:89) ~[lucene-core-8.0.0-snapshot-7d0a7782fa.jar:8.0.0-snapshot-7d0a7782fa 7d0a7782fa7e99250bccfb4d3e995485c3f0ca19 - jimczi - 2018-09-18 14:02:34]
  1>    at org.apache.lucene.store.FilterDirectory.rename(FilterDirectory.java:89) ~[lucene-core-8.0.0-snapshot-7d0a7782fa.jar:8.0.0-snapshot-7d0a7782fa 7d0a7782fa7e99250bccfb4d3e995485c3f0ca19 - jimczi - 2018-09-18 14:02:34]
  1>    at org.elasticsearch.index.store.Store.renameTempFilesSafe(Store.java:335) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.RecoveryTarget.renameAllTempFiles(RecoveryTarget.java:188) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.RecoveryTarget.cleanFiles(RecoveryTarget.java:444) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$CleanFilesRequestHandler.messageReceived(PeerRecoveryTargetService.java:561) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$CleanFilesRequestHandler.messageReceived(PeerRecoveryTargetService.java:555) ~[main/:?]
  1>    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:63) ~[main/:?]
  1>    at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1448) ~[main/:?]
  1>    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:723) ~[main/:?]
  1>    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[main/:?]
  1>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135) ~[?:?]
  1>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
  1>    at java.lang.Thread.run(Thread.java:844) ~[?:?]

Here is a copy of the console output, including repeated test runs (of which the stack trace above is the last). This tarball also includes a Process Monitor log from this run, filtered to just the relevant paths: 33857.tar.gz

Interesting things to note from the process monitor log are:

  1. There's no other process involved here. I'm pretty sure that if this were Windows Defender then it'd also appear in this log.
  2. There seem to be repeated attempts to remove _0.cfs which result in CANNOT DELETE
  3. Finally we can see the rename results in ACCESS DENIED.

I do not yet have a good idea for a next step on this.

@DaveCTurner
Copy link
Contributor

On repeated runs I do see occasional failures as per the OP (maybe 20% of the time, I've not done the stats in detail yet). I tried writing a test case that repeatedly hits the filesystem directly and could not reproduce this AccessDeniedException, so decided to turn my attention to all the Lucene-provided filesystem machinery that's in play here to see if I could rule that out as a source. The filesystem provider in use in a failing test is

HandleLimitFS(LeakFS(ShuffleFS(DisableFsyncFS(sun.nio.fs.WindowsFileSystemProvider@2d1a5acc))))

I was suspicious about that DisableFsyncFS in there (the rest look benign) so I added @SuppressFileSystems(value = "DisableFsyncFS") to the ShrinkIndexIT fixture, but still got failures. I then tried @SuppressFileSystems(value = "*") and still it fails - the filesystem provider in this case is confirmed to be a plain sun.nio.fs.WindowsFileSystemProvider. The story continues...

@danielmitterdorfer
Copy link
Member

We have more test failures on Windows with the same cause:

5.6: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+5.6+multijob-windows-compatibility/27/console:

org.elasticsearch.routing.PartitionedRoutingIT#testShrinking()

Expand for stack trace

[2018-09-27T18:57:01,933][WARN ][o.e.c.a.s.ShardStateAction] [node_s0] [index_2][0] received shard failed for shard id [[index_2][0]], allocation id [0Jyeijx5QoeBLzJ85Ym4uw], primary term [0], message [failed recovery], failure [RecoveryFailedException[[index_2][0]: Recovery failed from {node_s1}{UFRWgSweS6uUDivYiTC1-w}{4v-Ak5QCQ62QpeiZyzwK8w}{local}{local[368]} into {node_s0}{Povm-pMfSAaqPh8OcDkjHg}{3UCCMKflQTOz9cF4-uejCA}{local}{local[366]}]; nested: RemoteTransportException[[node_s0][local[366]][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] phase1 failed]; nested: RecoverFilesRecoveryException[Failed to transfer [25] files with total size of [36.4kb]]; nested: RemoteTransportException[[node_s1][local[368]][internal:index/shard/recovery/clean_files]]; nested: AccessDeniedException[C:\Users\jenkins\workspace\elastic+elasticsearch+5.6+multijob-windows-compatibility\core\build\testrun\integTest\J0\temp\org.elasticsearch.routing.PartitionedRoutingIT_D736F1C7B3CED870-001\tempDir-002\data\nodes\0\indices\qvLxi615Saym8cpW_c-H4Q\0\index\recovery.AWYcZDdo0i__cHMcxiHN._0.cfs -> C:\Users\jenkins\workspace\elastic+elasticsearch+5.6+multijob-windows-compatibility\core\build\testrun\integTest\J0\temp\org.elasticsearch.routing.PartitionedRoutingIT_D736F1C7B3CED870-001\tempDir-002\data\nodes\0\indices\qvLxi615Saym8cpW_c-H4Q\0\index\_0.cfs]; ]
[...]
Caused by: java.nio.file.AccessDeniedException: C:\Users\jenkins\workspace\elastic+elasticsearch+5.6+multijob-windows-compatibility\core\build\testrun\integTest\J0\temp\org.elasticsearch.routing.PartitionedRoutingIT_D736F1C7B3CED870-001\tempDir-002\data\nodes\0\indices\qvLxi615Saym8cpW_c-H4Q\0\index\recovery.AWYcZDdo0i__cHMcxiHN._0.cfs -> C:\Users\jenkins\workspace\elastic+elasticsearch+5.6+multijob-windows-compatibility\core\build\testrun\integTest\J0\temp\org.elasticsearch.routing.PartitionedRoutingIT_D736F1C7B3CED870-001\tempDir-002\data\nodes\0\indices\qvLxi615Saym8cpW_c-H4Q\0\index\_0.cfs
  at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:83) ~[?:?]
  at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97) ~[?:?]
  at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:301) ~[?:?]
  at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287) ~[?:?]
  at org.apache.lucene.mockfile.FilterFileSystemProvider.move(FilterFileSystemProvider.java:147) ~[lucene-test-framework-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:40]
  at org.apache.lucene.mockfile.FilterFileSystemProvider.move(FilterFileSystemProvider.java:147) ~[lucene-test-framework-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:40]
  at java.nio.file.Files.move(Files.java:1395) ~[?:1.8.0_181]
  at org.apache.lucene.store.FSDirectory.rename(FSDirectory.java:297) ~[lucene-core-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:39]
  at org.apache.lucene.store.FilterDirectory.rename(FilterDirectory.java:88) ~[lucene-core-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:39]
  at org.apache.lucene.store.MockDirectoryWrapper.rename(MockDirectoryWrapper.java:229) ~[lucene-test-framework-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:39]
  at org.apache.lucene.store.FilterDirectory.rename(FilterDirectory.java:88) ~[lucene-core-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:39]
  at org.elasticsearch.index.store.Store.renameTempFilesSafe(Store.java:319) ~[main/:?]
  at org.elasticsearch.indices.recovery.RecoveryTarget.renameAllTempFiles(RecoveryTarget.java:181) ~[main/:?]
  at org.elasticsearch.indices.recovery.RecoveryTarget.cleanFiles(RecoveryTarget.java:406) ~[main/:?]
  at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$CleanFilesRequestHandler.messageReceived(PeerRecoveryTargetService.java:486) ~[main/:?]
  at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$CleanFilesRequestHandler.messageReceived(PeerRecoveryTargetService.java:480) ~[main/:?]
  at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:33) ~[main/:?]
  at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69) ~[main/:?]
  at org.elasticsearch.transport.local.LocalTransport$2.doRun(LocalTransport.java:390) ~[main/:?]

6.x: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.x+multijob-windows-compatibility/27/console

org.elasticsearch.routing.PartitionedRoutingIT#testShrinking()

Expand for stack trace

org.elasticsearch.indices.recovery.RecoveryFailedException: [index_4][1]: Recovery failed from {node_s0}{7C-fb4R9RK-WS4Jk1WYXuQ}{BijPdN92Q32zPDZ9t_ldiA}{127.0.0.1}{127.0.0.1:55085} into {node_s1}{RqyKMgp3QZ-gu0IVK_U0AA}{I5djfftGTtmtr6bQbOAUJA}{127.0.0.1}{127.0.0.1:55086}
  1> 	at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.doRecovery(PeerRecoveryTargetService.java:282) [main/:?]
  1> 	at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.access$900(PeerRecoveryTargetService.java:80) [main/:?]
  1> 	at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$RecoveryRunner.doRun(PeerRecoveryTargetService.java:624) [main/:?]
  1> 	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:723) [main/:?]
  1> 	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [main/:?]
  1> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
  1> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
  1> 	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
  1> Caused by: org.elasticsearch.transport.RemoteTransportException: [node_s0][127.0.0.1:55089][internal:index/shard/recovery/start_recovery]
  1> Caused by: org.elasticsearch.index.engine.RecoveryEngineException: Phase[1] phase1 failed
  1> 	at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:174) ~[main/:?]
  1> 	at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:98) ~[main/:?]
  1> 	at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$000(PeerRecoverySourceService.java:50) ~[main/:?]
  1> 	at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:107) ~[main/:?]
  1> 	at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:104) ~[main/:?]
  1> 	at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[main/:?]
  1> 	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[main/:?]
  1> 	at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1350) ~[main/:?]
  1> 	... 5 more
  1> Caused by: org.elasticsearch.indices.recovery.RecoverFilesRecoveryException: Failed to transfer [13] files with total size of [19.3kb]
  1> 	at org.elasticsearch.indices.recovery.RecoverySourceHandler.phase1(RecoverySourceHandler.java:424) ~[main/:?]
  1> 	at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:172) ~[main/:?]
  1> 	at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:98) ~[main/:?]
  1> 	at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$000(PeerRecoverySourceService.java:50) ~[main/:?]
  1> 	at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:107) ~[main/:?]
  1> 	at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:104) ~[main/:?]
  1> 	at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[main/:?]
  1> 	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[main/:?]
  1> 	at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1350) ~[main/:?]
  1> 	... 5 more
  1> Caused by: org.elasticsearch.transport.RemoteTransportException: [node_s1][127.0.0.1:55090][internal:index/shard/recovery/clean_files]
  1> Caused by: java.nio.file.AccessDeniedException: C:\Users\jenkins\workspace\elastic+elasticsearch+6.x+multijob-windows-compatibility\server\build\testrun\integTest\J2\temp\org.elasticsearch.routing.PartitionedRoutingIT_CAE88D19276CD2B2-001\tempDir-002\data\nodes\1\indices\H6TUqJpRTRedJr1z26jubw\1\index\recovery.fshzJJ1CQCelF3mD6b587Q._0.cfs -> C:\Users\jenkins\workspace\elastic+elasticsearch+6.x+multijob-windows-compatibility\server\build\testrun\integTest\J2\temp\org.elasticsearch.routing.PartitionedRoutingIT_CAE88D19276CD2B2-001\tempDir-002\data\nodes\1\indices\H6TUqJpRTRedJr1z26jubw\1\index\_0.cfs
  1> 	at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:83) ~[?:?]
  1> 	at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97) ~[?:?]
  1> 	at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:301) ~[?:?]
  1> 	at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287) ~[?:?]
  1> 	at org.apache.lucene.mockfile.FilterFileSystemProvider.move(FilterFileSystemProvider.java:147) ~[lucene-test-framework-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 2018-09-18 13:01:16]
  1> 	at java.nio.file.Files.move(Files.java:1395) ~[?:1.8.0_181]
  1> 	at org.apache.lucene.store.FSDirectory.rename(FSDirectory.java:303) ~[lucene-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 2018-09-18 13:01:13]
  1> 	at org.apache.lucene.store.MockDirectoryWrapper.rename(MockDirectoryWrapper.java:231) ~[lucene-test-framework-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 2018-09-18 13:01:13]
  1> 	at org.apache.lucene.store.FilterDirectory.rename(FilterDirectory.java:89) ~[lucene-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 2018-09-18 13:01:13]
  1> 	at org.apache.lucene.store.FilterDirectory.rename(FilterDirectory.java:89) ~[lucene-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 2018-09-18 13:01:13]
  1> 	at org.elasticsearch.index.store.Store.renameTempFilesSafe(Store.java:334) ~[main/:?]
  1> 	at org.elasticsearch.indices.recovery.RecoveryTarget.renameAllTempFiles(RecoveryTarget.java:188) ~[main/:?]
  1> 	at org.elasticsearch.indices.recovery.RecoveryTarget.cleanFiles(RecoveryTarget.java:452) ~[main/:?]
  1> 	at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$CleanFilesRequestHandler.messageReceived(PeerRecoveryTargetService.java:557) ~[main/:?]
  1> 	at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$CleanFilesRequestHandler.messageReceived(PeerRecoveryTargetService.java:551) ~[main/:?]
  1> 	at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[main/:?]
  1> 	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[main/:?]
  1> 	at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1350) ~[main/:?]

6.4: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.4+multijob-windows-compatibility/28/console

org.elasticsearch.routing.PartitionedRoutingIT#testShrinking()

Expand for stack trace

org.elasticsearch.indices.recovery.RecoveryFailedException: [index_4][3]: Recovery failed from {node_sd1}{O8l8bMuJTv2FfmOwRxXWgg}{1m7utzhqQjWidDUawrikfQ}{127.0.0.1}{127.0.0.1:63944} into {node_sd2}{IBcohRX6RB2qHr4S6wDBUQ}{Fa3f-veRRFOWmrrADglJ9A}{127.0.0.1}{127.0.0.1:63945}
  1>    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.doRecovery(PeerRecoveryTargetService.java:282) [main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.access$900(PeerRecoveryTargetService.java:80) [main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$RecoveryRunner.doRun(PeerRecoveryTargetService.java:623) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:723) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [main/:?]
  1>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
  1>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
  1>    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
  1> Caused by: org.elasticsearch.transport.RemoteTransportException: [node_sd1][127.0.0.1:63944][internal:index/shard/recovery/start_recovery]
  1> Caused by: org.elasticsearch.index.engine.RecoveryEngineException: Phase[1] phase1 failed
  1>    at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:174) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:98) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$000(PeerRecoverySourceService.java:50) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:107) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:104) ~[main/:?]
  1>    at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[main/:?]
  1>    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[main/:?]
  1>    at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1605) ~[main/:?]
  1>    ... 5 more
  1> Caused by: org.elasticsearch.indices.recovery.RecoverFilesRecoveryException: Failed to transfer [4] files with total size of [4.5kb]
  1>    at org.elasticsearch.indices.recovery.RecoverySourceHandler.phase1(RecoverySourceHandler.java:446) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:172) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:98) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$000(PeerRecoverySourceService.java:50) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:107) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:104) ~[main/:?]
  1>    at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[main/:?]
  1>    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[main/:?]
  1>    at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1605) ~[main/:?]
  1>    ... 5 more
  1> Caused by: org.elasticsearch.indices.recovery.RecoverFilesRecoveryException: Failed to transfer [4] files with total size of [4.5kb]
  1>    at org.elasticsearch.indices.recovery.RecoverySourceHandler.phase1(RecoverySourceHandler.java:446) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:172) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:98) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$000(PeerRecoverySourceService.java:50) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:107) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:104) ~[main/:?]
  1>    at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[main/:?]
  1>    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[main/:?]
  1>    at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1605) ~[main/:?]
  1>    ... 5 more
  1> Caused by: org.elasticsearch.transport.RemoteTransportException: [node_sd2][127.0.0.1:63945][internal:index/shard/recovery/clean_files]
  1> Caused by: java.nio.file.AccessDeniedException: C:\Users\jenkins\workspace\elastic+elasticsearch+6.4+multijob-windows-compatibility\server\build\testrun\integTest\J2\temp\org.elasticsearch.routing.PartitionedRoutingIT_1C4D221BAD62999F-001\tempDir-002\data\nodes\2\indices\SI6IoZJ5R_Wb4Nqx06oDmA\3\index\recovery.CB9P345NRmmuu-tPzwjksg._0.cfs -> C:\Users\jenkins\workspace\elastic+elasticsearch+6.4+multijob-windows-compatibility\server\build\testrun\integTest\J2\temp\org.elasticsearch.routing.PartitionedRoutingIT_1C4D221BAD62999F-001\tempDir-002\data\nodes\2\indices\SI6IoZJ5R_Wb4Nqx06oDmA\3\index\_0.cfs

org.elasticsearch.action.admin.indices.create.SplitIndexIT#testCreateSplitIndexToN():

Expand for stack trace

org.elasticsearch.indices.recovery.RecoveryFailedException: [second_split][0]: Recovery failed from {node_sd2}{hcRrVZwPQ7e8MTakDgz_GQ}{wfXHyRV6RPW0sKTXdamhdA}{127.0.0.1}{127.0.0.1:58177} into {node_sd3}{VlMAJBDSTemm3IJZgPciLA}{zPecWbBiSWaagZwl7Q1l6g}{127.0.0.1}{127.0.0.1:58171}
  1>    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.doRecovery(PeerRecoveryTargetService.java:282) [main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.access$900(PeerRecoveryTargetService.java:80) [main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$RecoveryRunner.doRun(PeerRecoveryTargetService.java:623) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:723) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [main/:?]
  1>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
  1>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
  1>    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
  1> Caused by: org.elasticsearch.transport.RemoteTransportException: [node_sd2][127.0.0.1:58177][internal:index/shard/recovery/start_recovery]
  1> Caused by: org.elasticsearch.index.engine.RecoveryEngineException: Phase[1] phase1 failed
  1>    at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:174) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:98) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$000(PeerRecoverySourceService.java:50) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:107) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:104) ~[main/:?]
  1>    at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[main/:?]
  1>    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[main/:?]
  1>    at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1605) ~[main/:?]
  1>    ... 5 more
  1> Caused by: org.elasticsearch.indices.recovery.RecoverFilesRecoveryException: Failed to transfer [5] files with total size of [5.6kb]
  1>    at org.elasticsearch.indices.recovery.RecoverySourceHandler.phase1(RecoverySourceHandler.java:446) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:172) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:98) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$000(PeerRecoverySourceService.java:50) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:107) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:104) ~[main/:?]
  1>    at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[main/:?]
  1>    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[main/:?]
  1>    at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1605) ~[main/:?]
  1>    ... 5 more
  1> Caused by: org.elasticsearch.transport.RemoteTransportException: [node_sd3][127.0.0.1:58171][internal:index/shard/recovery/clean_files]
  1> Caused by: java.nio.file.AccessDeniedException: C:\Users\jenkins\workspace\elastic+elasticsearch+6.4+multijob-windows-compatibility\server\build\testrun\integTest\J2\temp\org.elasticsearch.action.admin.indices.create.SplitIndexIT_1C4D221BAD62999F-001\tempDir-002\data\nodes\3\indices\xKk-zXSXQsGpeMmq5ysVpA\0\index\recovery._dztPVO1RBKeP7ZpRHIbiA._0.cfs -> C:\Users\jenkins\workspace\elastic+elasticsearch+6.4+multijob-windows-compatibility\server\build\testrun\integTest\J2\temp\org.elasticsearch.action.admin.indices.create.SplitIndexIT_1C4D221BAD62999F-001\tempDir-002\data\nodes\3\indices\xKk-zXSXQsGpeMmq5ysVpA\0\index\_0.cfs
  1>    at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:83) ~[?:?]
  1>    at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97) ~[?:?]
  1>    at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:301) ~[?:?]
  1>    at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287) ~[?:?]
  1>    at org.apache.lucene.mockfile.FilterFileSystemProvider.move(FilterFileSystemProvider.java:147) ~[lucene-test-framework-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:47]
  1>    at org.apache.lucene.mockfile.FilterFileSystemProvider.move(FilterFileSystemProvider.java:147) ~[lucene-test-framework-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:47]
  1>    at org.apache.lucene.mockfile.FilterFileSystemProvider.move(FilterFileSystemProvider.java:147) ~[lucene-test-framework-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:47]
  1>    at org.apache.lucene.mockfile.FilterFileSystemProvider.move(FilterFileSystemProvider.java:147) ~[lucene-test-framework-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:47]
  1>    at java.nio.file.Files.move(Files.java:1395) ~[?:1.8.0_181]
  1>    at org.apache.lucene.store.FSDirectory.rename(FSDirectory.java:303) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
  1>    at org.apache.lucene.store.FilterDirectory.rename(FilterDirectory.java:89) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
  1>    at org.apache.lucene.store.FilterDirectory.rename(FilterDirectory.java:89) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
  1>    at org.elasticsearch.index.store.Store.renameTempFilesSafe(Store.java:337) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.RecoveryTarget.renameAllTempFiles(RecoveryTarget.java:188) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.RecoveryTarget.cleanFiles(RecoveryTarget.java:439) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$CleanFilesRequestHandler.messageReceived(PeerRecoveryTargetService.java:556) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$CleanFilesRequestHandler.messageReceived(PeerRecoveryTargetService.java:550) ~[main/:?]
  1>    at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[main/:?]
  1>    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[main/:?]
  1>    at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1605) ~[main/:?]  

@DaveCTurner
Copy link
Contributor

Looking harder at the logs, what happens is:

  • we allocate two primary shards of second_shrink to one node
  • they recover from local shards (this is the shrink)
  • as soon as they start they are rebalanced to the other node
  • then we update the number of replicas to 1, allocating replicas on the first node

However we failed to delete the unassigned shards from the first node after the rebalance (at least twice, maybe it succeeds, we're still looking). For some reason when we try and allocate a shard on top of the old directory it then fails with the AccessDeniedException.

@DaveCTurner
Copy link
Contributor

Note that we're using MMapDirectory here. Setting index.store.type: simplefs yielded no failure in 10 iterations.

@DaveCTurner
Copy link
Contributor

I have tried to find a way to reproduce this with fewer moving parts, but so far this has failed. The following test passes.

/*
 * Licensed to Elasticsearch under one or more contributor
 * license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright
 * ownership. Elasticsearch licenses this file to you under
 * the Apache License, Version 2.0 (the "License"); you may
 * not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied.  See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */
package org.elasticsearch.action.admin.indices.create;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.store.IndexOutput;
import org.elasticsearch.Version;
import org.elasticsearch.cluster.metadata.IndexMetaData;
import org.elasticsearch.common.settings.Setting;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.IndexSettings;
import org.elasticsearch.index.shard.ShardId;
import org.elasticsearch.index.store.Store;
import org.elasticsearch.test.DummyShardLock;
import org.elasticsearch.test.ESTestCase;
import org.elasticsearch.test.junit.annotations.TestLogging;
import org.elasticsearch.threadpool.TestThreadPool;
import org.elasticsearch.threadpool.ThreadPool;

import java.io.IOException;
import java.nio.file.Path;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

import static org.elasticsearch.cluster.metadata.IndexMetaData.INDEX_NUMBER_OF_REPLICAS_SETTING;
import static org.elasticsearch.cluster.metadata.IndexMetaData.INDEX_NUMBER_OF_SHARDS_SETTING;
import static org.elasticsearch.cluster.metadata.IndexMetaData.SETTING_INDEX_VERSION_CREATED;
import static org.elasticsearch.cluster.metadata.IndexMetaData.SETTING_NUMBER_OF_SHARDS;
import static org.elasticsearch.index.store.Store.INDEX_STORE_STATS_REFRESH_INTERVAL_SETTING;

@TestLogging("org.elasticsearch:TRACE,index.store.deletes:TRACE")
public class WindowsFilesystemTests extends ESTestCase {
    public void testWindowsFilesystem() {
        ThreadPool threadPool = new TestThreadPool("test");
        try {
            final Path tempDir1 = createTempDir();
            final IOContext ioContext = new IOContext();
            final int attemptCount = 1000;
            final CountDownLatch countDownLatch = new CountDownLatch(attemptCount);

            try (Directory rawDirectory = FSDirectory.open(tempDir1)) {
                logger.info("rawDirectory = {}", rawDirectory);

                final ShardId shardId = new ShardId("foo", "bar", 0);
                final Store store = new Store(shardId,
                    new IndexSettings(IndexMetaData.builder(shardId.getIndexName())
                        .settings(Settings.builder()
                            .put(INDEX_NUMBER_OF_SHARDS_SETTING.getKey(), 1)
                            .put(INDEX_NUMBER_OF_REPLICAS_SETTING.getKey(), 0)
                            .put(SETTING_INDEX_VERSION_CREATED.getKey(), Version.CURRENT)
                            .put(INDEX_STORE_STATS_REFRESH_INTERVAL_SETTING.getKey(), TimeValue.ZERO)).build(), Settings.EMPTY),
                    rawDirectory, new DummyShardLock(shardId));

                final AtomicBoolean queryStatsThreadCancelled = new AtomicBoolean();
                final Thread queryStatsThread = new Thread(() -> {
                    while (queryStatsThreadCancelled.get() == false) {
                        try {
                            store.stats();
                        } catch (IOException e) {
                            throw new AssertionError("unexpected", e);
                        }
                    }
                });
                queryStatsThread.start();

                final Directory directory = store.directory();
                logger.info("directory = {}", directory);

                for (int i = 0; i < attemptCount; i++) {
                    for (int j = 1; j <= 2; j++) {
                        try (IndexOutput indexOutput = directory.createOutput("test-" + i + "-" + j + ".dat", ioContext)) {
                            int longCount = randomIntBetween(100, 10000);
                            for (int k = 0; k < longCount; k++) {
                                indexOutput.writeLong(randomLong());
                            }
                        }
                    }
                }

                for (int i = 0; i < attemptCount; i++) {
                    final String fileName1 = "test-" + i + "-1.dat";
                    final String fileName2 = "test-" + i + "-2.dat";

                    threadPool.generic().execute(() -> {
                        try {
                            directory.deleteFile(fileName1);
                            Thread.sleep(randomIntBetween(1, 50));
                        } catch (IOException | InterruptedException e) {
                            countDownLatch.countDown();
                            throw new AssertionError("unexpected", e);
                        }

                        threadPool.generic().execute(() -> {
                            try {
                                directory.rename(fileName2, fileName1);
                            } catch (IOException e) {
                                countDownLatch.countDown();
                                throw new AssertionError("unexpected", e);
                            }

                            threadPool.generic().execute(() -> {
                                try {
                                    directory.deleteFile(fileName1);
                                } catch (IOException e) {
                                    throw new AssertionError("unexpected", e);
                                } finally {
                                    countDownLatch.countDown();
                                }
                            });
                        });
                    });
                }

                countDownLatch.await();
                queryStatsThreadCancelled.set(true);
                queryStatsThread.join();
            }
        } catch (IOException | InterruptedException e) {
            throw new AssertionError("exception", e);
        } finally {
            threadPool.shutdown();
        }
    }
}

@DaveCTurner
Copy link
Contributor

DaveCTurner commented Sep 28, 2018

One interesting side-question is why this happens more on tests to do with shrinking. I think this is the answer:

// we only try once and then give up with a shrink index
.put("index.allocation.max_retries", 1);

In other cases, I guess an allocation that fails like this will retry (up to 5 times) and subsequent attempts will succeed.


Correction: subsequent attempts to allocate a copy of the shrunk index onto the node containing the source index will, I think, all fail, because of #33857 (comment) (unless either the source index is removed or closed, or else all the shared segments are merged away, neither of which seems very likely in this test).

@DaveCTurner
Copy link
Contributor

DaveCTurner commented Sep 28, 2018

To test this hypothesis, I tried writing a test that repeatedly allocates a shard to a node and then removes it, hoping to hit an allocation failure, with max retries set to 0:

@ClusterScope(scope = TEST, numDataNodes = 2)
@TestLogging("org.elasticsearch:TRACE,index.store.deletes:TRACE")
public class ReliableRecoveryIT extends ESIntegTestCase {
    public void testReliableRecovery() {
        prepareCreate("index").setSettings(Settings.builder().put(indexSettings())
            .put(INDEX_NUMBER_OF_SHARDS_SETTING.getKey(), 1)
            .put(INDEX_NUMBER_OF_REPLICAS_SETTING.getKey(), 1)
            .put(SETTING_ALLOCATION_MAX_RETRY.getKey(), 0))
            .get();

        ensureGreen("index");

        for (int i = 0; i < 20; i++) {
            internalCluster().client().prepareBulk()
                .add(new IndexRequest("index", "_doc", null).source("{}", XContentType.JSON))
                .add(new IndexRequest("index", "_doc", null).source("{}", XContentType.JSON))
                .add(new IndexRequest("index", "_doc", null).source("{}", XContentType.JSON))
                .add(new IndexRequest("index", "_doc", null).source("{}", XContentType.JSON))
                .add(new IndexRequest("index", "_doc", null).source("{}", XContentType.JSON))
                .setRefreshPolicy(IMMEDIATE)
                .get();
        }

        for (int i = 0; i < 10; i++) {
            logger.info("--> removing replica [{}]", i);
            internalCluster().client().admin().indices().prepareUpdateSettings("index").setSettings(
                Settings.builder().put(INDEX_NUMBER_OF_REPLICAS_SETTING.getKey(), 0)).get();
            logger.info("--> adding replica [{}]", i);
            internalCluster().client().admin().indices().prepareUpdateSettings("index").setSettings(
                Settings.builder().put(INDEX_NUMBER_OF_REPLICAS_SETTING.getKey(), 1)).get();
            logger.info("--> awaiting green [{}]", i);
            ensureGreen("index");
        }
    }
}

This passes. I'm mystified.

@bleskes
Copy link
Contributor

bleskes commented Sep 28, 2018

One interesting side-question is why this happens more on tests to do with shrinking. I think this is the answer:

Good catch. We should clean these up IMO.

@DaveCTurner
Copy link
Contributor

DaveCTurner commented Sep 28, 2018

Ok, I've got it 🎉

The following test passes on Windows (i.e. throws an AccessDeniedException on the rename). It seems that if you create a hard link to a file, and then delete the link, but the original file is still open, then you can't rename something else into the place where the hard link used to be.

    public void testWindowsFilesystem() throws IOException {
        final Path directoryPath = createTempDir();
        final IOContext ioContext = new IOContext();

        try (Directory directory = FSDirectory.open(directoryPath)) {
            logger.info("directory = {}", directory);

            final String fileName1 = "test-1.dat";
            final String fileName2 = "test-2.dat";
            final String fileName3 = "test-3.dat";

            Files.write(directoryPath.resolve(fileName1), new byte[]{0}); // must be nonempty
            Files.write(directoryPath.resolve(fileName2), new byte[0]); // empty file

            try (IndexInput ignored = directory.openInput(fileName1, ioContext)) {
                Files.createLink(directoryPath.resolve(fileName3), directoryPath.resolve(fileName1));
                directory.deleteFile(fileName3);
                expectThrows(AccessDeniedException.class, () -> directory.rename(fileName2, fileName3));
            }
        }
    }

More precisely, attempts to delete fileName3 while fileName1 is still open will fail with AccessDeniedException, but we suppress these inside FSDirectory.

This only happens when using MMapDirectory and not SimpleFSDirectory.

@mayya-sharipova
Copy link
Contributor

mayya-sharipova commented Oct 1, 2018

Still experiencing failure today on 6.4

Log: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.4+multijob-windows-compatibility/34/console

REPRODUCE WITH: gradlew :server:integTest
-Dtests.seed=544A239B8224F7C5
-Dtests.class=org.elasticsearch.routing.PartitionedRoutingIT
-Dtests.method="testShrinking"
-Dtests.security.manager=true
-Dtests.locale=en-CA
-Dtests.timezone=SystemV/YST9YDT

and 6.x
Log: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.x+multijob-windows-compatibility/35/console

07:44:40 FAILURE 33.3s J2 | PartitionedRoutingIT.testShrinking <<< FAILURES!
07:44:40 > Throwable #1: java.lang.AssertionError: timed out waiting for green state
07:44:40 > at __randomizedtesting.SeedInfo.seed([544A239B8224F7C5:AE69E6F0A5B2602E]:0)
07:44:40 > at org.elasticsearch.test.ESIntegTestCase.ensureColor(ESIntegTestCase.java:967)
07:44:40 > at org.elasticsearch.test.ESIntegTestCase.ensureGreen(ESIntegTestCase.java:923)
07:44:40 > at org.elasticsearch.test.ESIntegTestCase.ensureGreen(ESIntegTestCase.java:912)
07:44:40 > at org.elasticsearch.routing.PartitionedRoutingIT.testShrinking(PartitionedRoutingIT.java:96)
07:44:40 > at java.lang.Thread.run(Thread.java:748)

@mayya-sharipova
Copy link
Contributor

I have muted PartitionedRoutingIT:: testShrinking test on 6.4 and 6.x on Windows

@hub-cap
Copy link
Contributor

hub-cap commented Oct 5, 2018

@original-brownbear
Copy link
Member

master still failing similarly:

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-windows-compatibility/42/console
REPRODUCE WITH: gradlew :server:integTest \
  -Dtests.seed=7379D1534EE3AFC8 \
  -Dtests.class=org.elasticsearch.routing.PartitionedRoutingIT \
  -Dtests.method="testShrinking" \
  -Dtests.security.manager=true \
  -Dtests.locale=pl-PL \
  -Dtests.timezone=US/Arizona \
  -Dcompiler.java=11 \
  -Druntime.java=8

@DaveCTurner DaveCTurner removed their assignment Oct 15, 2018
@DaveCTurner
Copy link
Contributor

We've worked out what's going on here so I've unassigned myself from this issue. Similar failures on Windows should be muted until we work out what to do with it - no need to report them here.

@ywelsch
Copy link
Contributor

ywelsch commented Feb 1, 2019

This is a Windows limitation with MMapDirectory we cannot do anything about -> Closing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Indexing/Recovery Anything around constructing a new shard, either from a local or a remote source. >test-failure Triaged test failures from CI v5.6.12 v6.7.0 v7.0.0-beta1
Projects
None yet
Development

No branches or pull requests