ShrinkIndexIT.testCreateShrinkIndexToN fails on Windows #33857

alpar-t · 2018-09-19T12:51:54Z

FAILURE 33.4s J0 | ShrinkIndexIT.testCreateShrinkIndexToN <<< FAILURES!
   > Throwable #1: java.lang.AssertionError: timed out waiting for green state
   > 	at __randomizedtesting.SeedInfo.seed([FAAB0749B4C53E6F:EAEB4985116575FC]:0)
   > 	at org.elasticsearch.test.ESIntegTestCase.ensureColor(ESIntegTestCase.java:933)
   > 	at org.elasticsearch.test.ESIntegTestCase.ensureGreen(ESIntegTestCase.java:899)
   > 	at org.elasticsearch.test.ESIntegTestCase.ensureGreen(ESIntegTestCase.java:888)
   > 	at org.elasticsearch.action.admin.indices.create.ShrinkIndexIT.testCreateShrinkIndexToN(ShrinkIndexIT.java:115)
   > 	at java.lang.Thread.run(Thread.java:748)

and

 Caused by: java.nio.file.AccessDeniedException: C:\Users\jenkins\workspace\elastic+elasticsearch+5.6+multijob-windows-compatibility\core\build\testrun\integTest\J0\temp\org.elasticsearch.action.admin.indices.create.ShrinkIndexIT_FAAB0749B4C53E6F-001\tempDir-002\data\nodes\2\indices\dT7bmk_uQtqKf2wjfSn3Lg\0\index\recovery.AWXo49VofFhOtlDMUmb5._0.cfs -> C:\Users\jenkins\workspace\elastic+elasticsearch+5.6+multijob-windows-compatibility\core\build\testrun\integTest\J0\temp\org.elasticsearch.action.admin.indices.create.ShrinkIndexIT_FAAB0749B4C53E6F-001\tempDir-002\data\nodes\2\indices\dT7bmk_uQtqKf2wjfSn3Lg\0\index\_0.cfs
  1> 	at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:83) ~[?:?]
  1> 	at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97) ~[?:?]
  1> 	at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:301) ~[?:?]
  1> 	at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287) ~[?:?]
  1> 	at org.apache.lucene.mockfile.FilterFileSystemProvider.move(FilterFileSystemProvider.java:147) ~[lucene-test-framework-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:40]
  1> 	at org.apache.lucene.mockfile.FilterFileSystemProvider.move(FilterFileSystemProvider.java:147) ~[lucene-test-framework-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:40]
  1> 	at org.apache.lucene.mockfile.FilterFileSystemProvider.move(FilterFileSystemProvider.java:147) ~[lucene-test-framework-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:40]
  1> 	at org.apache.lucene.mockfile.FilterFileSystemProvider.move(FilterFileSystemProvider.java:147) ~[lucene-test-framework-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:40]
  1> 	at java.nio.file.Files.move(Files.java:1395) ~[?:1.8.0_181]
  1> 	at org.apache.lucene.store.FSDirectory.rename(FSDirectory.java:297) ~[lucene-core-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:39]
  1> 	at org.apache.lucene.store.FilterDirectory.rename(FilterDirectory.java:88) ~[lucene-core-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:39]
  1> 	at org.apache.lucene.store.FilterDirectory.rename(FilterDirectory.java:88) ~[lucene-core-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:39]
  1> 	at org.elasticsearch.index.store.Store.renameTempFilesSafe(Store.java:319) ~[main/:?]
  1> 	at org.elasticsearch.indices.recovery.RecoveryTarget.renameAllTempFiles(RecoveryTarget.java:181) ~[main/:?]
  1> 	at org.elasticsearch.indices.recovery.RecoveryTarget.cleanFiles(RecoveryTarget.java:406) ~[main/:?]
  1> 	at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$CleanFilesRequestHandler.messageReceived(PeerRecoveryTargetService.java:486) ~[main/:?]
  1> 	at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$CleanFilesRequestHandler.messageReceived(PeerRecoveryTargetService.java:480) ~[main/:?]
  1> 	at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:33) ~[main/:?]
  1> 	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69) ~[main/:?]
  1> 	at org.elasticsearch.transport.local.LocalTransport$2.doRun(LocalTransport.java:390) ~[main/:?]
  1> 	... 5 more

[...]

1> java.io.IOException: Could not remove the following files (in the order of attempts):
  1>    C:\Users\jenkins\workspace\elastic+elasticsearch+5.6+multijob-windows-compatibility\core\build\testrun\integTest\J0\temp\org.elasticsearch.action.admin.indices.create.ShrinkIndexIT_FAAB0749B4C53E6F-001\tempDir-002\data\nodes\2\indices\dT7bmk_uQtqKf2wjfSn3Lg\0\index\_0.cfs: java.nio.file.AccessDeniedException: C:\Users\jenkins\workspace\elastic+elasticsearch+5.6+multijob-windows-compatibility\core\build\testrun\integTest\J0\temp\org.elasticsearch.action.admin.indices.create.ShrinkIndexIT_FAAB0749B4C53E6F-001\tempDir-002\data\nodes\2\indices\dT7bmk_uQtqKf2wjfSn3Lg\0\index\_0.cfs
  1>    C:\Users\jenkins\workspace\elastic+elasticsearch+5.6+multijob-windows-compatibility\core\build\testrun\integTest\J0\temp\org.elasticsearch.action.admin.indices.create.ShrinkIndexIT_FAAB0749B4C53E6F-001\tempDir-002\data\nodes\2\indices\dT7bmk_uQtqKf2wjfSn3Lg\0\index\_1.cfs: java.nio.file.AccessDeniedException: C:\Users\jenkins\workspace\elastic+elasticsearch+5.6+multijob-windows-compatibility\core\build\testrun\integTest\J0\temp\org.elasticsearch.action.admin.indices.create.ShrinkIndexIT_FAAB0749B4C53E6F-001\tempDir-002\data\nodes\2\indices\dT7bmk_uQtqKf2wjfSn3Lg\0\index\_1.cfs
  1>    C:\Users\jenkins\workspace\elastic+elasticsearch+5.6+multijob-windows-compatibility\core\build\testrun\integTest\J0\temp\org.elasticsearch.action.admin.indices.create.ShrinkIndexIT_FAAB0749B4C53E6F-001\tempDir-002\data\nodes\2\indices\dT7bmk_uQtqKf2wjfSn3Lg\0\index: java.nio.file.DirectoryNotEmptyException: C:\Users\jenkins\workspace\elastic+elasticsearch+5.6+multijob-windows-compatibility\core\build\testrun\integTest\J0\temp\org.elasticsearch.action.admin.indices.create.ShrinkIndexIT_FAAB0749B4C53E6F-001\tempDir-002\data\nodes\2\indices\dT7bmk_uQtqKf2wjfSn3Lg\0\index
  2> REPRODUCE WITH: gradlew :core:integTest -Dtests.seed=FAAB0749B4C53E6F -Dtests.class=org.elasticsearch.action.admin.indices.create.ShrinkIndexIT -Dtests.method="testCreateShrinkIndexToN" -Dtests.security.manager=true -Dtests.locale=zh -Dtests.timezone=Etc/GMT+10
  1>    C:\Users\jenkins\workspace\elastic+elasticsearch+5.6+multijob-windows-compatibility\core\build\testrun\integTest\J0\temp\org.elasticsearch.action.admin.indices.create.ShrinkIndexIT_FAAB0749B4C53E6F-001\tempDir-002\data\nodes\2\indices\dT7bmk_uQtqKf2wjfSn3Lg\0: java.nio.file.DirectoryNotEmptyException: C:\Users\jenkins\workspace\elastic+elasticsearch+5.6+multijob-windows-compatibility\core\build\testrun\integTest\J0\temp\org.elasticsearch.action.admin.indices.create.ShrinkIndexIT_FAAB0749B4C53E6F-001\tempDir-002\data\nodes\2\indices\dT7bmk_uQtqKf2wjfSn3Lg\0
  1>    C:\Users\jenkins\workspace\elastic+elasticsearch+5.6+multijob-windows-compatibility\core\build\testrun\integTest\J0\temp\org.elasticsearch.action.admin.indices.create.ShrinkIndexIT_FAAB0749B4C53E6F-001\tempDir-002\data\nodes\2\indices\dT7bmk_uQtqKf2wjfSn3Lg: java.nio.file.DirectoryNotEmptyException: C:\Users\jenkins\workspace\elastic+elasticsearch+5.6+multijob-windows-compatibility\core\build\testrun\integTest\J0\temp\org.elasticsearch.action.admin.indices.create.ShrinkIndexIT_FAAB0749B4C53E6F-001\tempDir-002\data\nodes\2\indices\dT7bmk_uQtqKf2wjfSn3Lg
  1> 
  1> 	at org.apache.lucene.util.IOUtils.rm(IOUtils.java:329) ~[lucene-core-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:40]
  1> 	at org.elasticsearch.env.NodeEnvironment.deleteIndexDirectoryUnderLock(NodeEnvironment.java:558) ~[main/:?]
  1> 	at org.elasticsearch.env.NodeEnvironment.deleteIndexDirectorySafe(NodeEnvironment.java:542) ~[main/:?]
  1> 	at org.elasticsearch.indices.IndicesService.deleteIndexStoreIfDeletionAllowed(IndicesService.java:652) [main/:?]
  1> 	at org.elasticsearch.indices.IndicesService.deleteIndexStore(IndicesService.java:639) [main/:?]
  1> 	at org.elasticsearch.indices.IndicesService.deleteIndexStore(IndicesService.java:634) [main/:?]
  1> 	at org.elasticsearch.indices.IndicesService.deleteUnassignedIndex(IndicesService.java:602) [main/:?]
  1> 	at org.elasticsearch.indices.cluster.IndicesClusterStateService.deleteIndices(IndicesClusterStateService.java:263) [main/:?]
  1> 	at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:192) [main/:?]
  1> 	at org.elasticsearch.cluster.service.ClusterService.callClusterStateAppliers(ClusterService.java:814) [main/:?]
  1> 	at org.elasticsearch.cluster.service.ClusterService.publishAndApplyChanges(ClusterService.java:768) [main/:?]
  1> 	at org.elasticsearch.cluster.service.ClusterService.runTasks(ClusterService.java:587) [main/:?]
  1> 	at org.elasticsearch.cluster.service.ClusterService$ClusterServiceTaskBatcher.run(ClusterService.java:263) [main/:?]
  1> 	at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) [main/:?]
  1> 	at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) [main/:?]
  1> 	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:576) [main/:?]
  1> 	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:247) [main/:?]
  1> 	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:210) [main/:?]
  1> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
  1> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
  1> 	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]

It also happened on 6.4, 6.x and master

I was not able to reproduce it on a Windows 2012 VM.
There's no evidence of similar failures in the past, nor recent changes to the 5.6
that are obviously related.

The text was updated successfully, but these errors were encountered:

elasticmachine · 2018-09-19T12:51:56Z

Pinging @elastic/es-core-infra

s1monw · 2018-09-19T13:38:47Z

@jasontedor @bleskes any ideas here? I looked but it's really puzzeling

elasticmachine · 2018-09-20T01:32:29Z

Pinging @elastic/es-distributed

ywelsch · 2018-09-20T07:20:43Z

Relates #30962

tvernum · 2018-09-20T07:23:14Z

It's failed again.

I'm going to mute it on Windows unless someone says they need to capture more failure logs.

Relates: #33857

ywelsch · 2018-09-20T16:19:07Z

I've reenabled the test on master (db32781) and added DEBUG logging for org.elasticsearch.index.store which will possibly tell us whether the move failed because a previous delete in renameTempFilesSafe (where we catch exception and log as debug) failed.
For the person on test triage: If this test fails again on CI, please link the latest failure here.

romseygeek · 2018-09-21T11:09:40Z

Another failure today: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+5.6+multijob-windows-compatibility/14/console

Relates: #33857

ywelsch · 2018-09-21T11:36:48Z

Unfortunately that was on 5.6 (which does not have the extended logging, and which we forgot to mute). I've muted the test on 5.6 for now (a6aa773).

romseygeek · 2018-09-21T12:48:00Z

Now there's a master failure :)

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-windows-compatibility/14/console

romseygeek · 2018-09-22T09:10:16Z

PartitionedRoutingIT.testShrinking seems to have a similar issue:
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.x+multijob-windows-compatibility/16/console

mayya-sharipova · 2018-09-24T07:28:35Z

Similar failures on 5.6, 6.x and master in the last 12 hours.

gradlew :core:integTest \
  -Dtests.seed=41732671CC43CCD8 \
  -Dtests.class=org.elasticsearch.routing.PartitionedRoutingIT \
  -Dtests.method="testShrinking" \
  -Dtests.security.manager=true \
  -Dtests.locale=ar-MA \
  -Dtests.timezone=Europe/Tirane

seems to be caused by AccessDeniedException

Caused by: java.nio.file.AccessDeniedException: C:\Users\jenkins\workspace\elastic+elasticsearch+master+multijob-windows-compatibility\server\build\testrun\integTest\J3\temp\org.elasticsearch.routing.PartitionedRoutingIT_22FF5B1F0934DA81-001\tempDir-002\data\nodes\0\indices\nCj_PB_1SvemMgYmkmpqNw\1\index\recovery.XyXJa1iLT_eQAR39YSXIKg._0.cfs -> C:\Users\jenkins\workspace\elastic+elasticsearch+master+multijob-windows-compatibility\server\build\testrun\integTest\J3\temp\org.elasticsearch.routing.PartitionedRoutingIT_22FF5B1F0934DA81-001\tempDir-002\data\nodes\0\indices\nCj_PB_1SvemMgYmkmpqNw\1\index\_0.cfs

cbuescher · 2018-09-24T10:43:42Z

fyi. recent master change built was: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-windows-compatibility/19/console
Trying to get to the machine and take a look at the logs now.

cbuescher · 2018-09-24T15:03:26Z

Another recent one: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-windows-compatibility/18/console

droberts195 · 2018-09-27T08:23:35Z

I think this also affects SplitIndexIT.testSplitFromOneToN.

These two build logs contain the extra debugging info:

DaveCTurner · 2018-09-27T17:33:29Z

I spent some time trying to reproduce this by hand on one of the Windows CI workers. Although I never saw the test fail, I did see tests reporting the failures of recoveries for the reason quoted in the OP:

  1> org.elasticsearch.indices.recovery.RecoveryFailedException: [second_shrink][0]: Recovery failed from {node_sd2}{TO36W-4uR0GLz8RwgnmWXg}{vtq1HlXqTy-5ut_hiS2-TQ}{127.0.0.1}{127.0.0.1:52206} into {node_sd1}{VwtnN_EuS8aKva_oys857Q}{OCu18ksZRdKzsOChRaIFkw}{127.0.0.1}{127.0.0.1:52177}
  1>    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.doRecovery(PeerRecoveryTargetService.java:283) [main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.access$900(PeerRecoveryTargetService.java:81) [main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$RecoveryRunner.doRun(PeerRecoveryTargetService.java:628) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:723) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [main/:?]
  1>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135) [?:?]
  1>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
  1>    at java.lang.Thread.run(Thread.java:844) [?:?]
  1> Caused by: org.elasticsearch.transport.RemoteTransportException: [node_sd2][127.0.0.1:52206][internal:index/shard/recovery/start_recovery]
  1> Caused by: org.elasticsearch.index.engine.RecoveryEngineException: Phase[1] phase1 failed
  1>    at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:174) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:99) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$000(PeerRecoverySourceService.java:51) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:108) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:105) ~[main/:?]
  1>    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:63) ~[main/:?]
  1>    at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1448) ~[main/:?]
  1>    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:723) ~[main/:?]
  1>    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[main/:?]
  1>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135) ~[?:?]
  1>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
  1>    at java.lang.Thread.run(Thread.java:844) ~[?:?]
  1> Caused by: org.elasticsearch.indices.recovery.RecoverFilesRecoveryException: Failed to transfer [7] files with total size of [8kb]
  1>    at org.elasticsearch.indices.recovery.RecoverySourceHandler.phase1(RecoverySourceHandler.java:422) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:172) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:99) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$000(PeerRecoverySourceService.java:51) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:108) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:105) ~[main/:?]
  1>    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:63) ~[main/:?]
  1>    at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1448) ~[main/:?]
  1>    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:723) ~[main/:?]
  1>    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[main/:?]
  1>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135) ~[?:?]
  1>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
  1>    at java.lang.Thread.run(Thread.java:844) ~[?:?]
  1> Caused by: org.elasticsearch.transport.RemoteTransportException: [node_sd1][127.0.0.1:52177][internal:index/shard/recovery/clean_files]
  1> Caused by: java.nio.file.AccessDeniedException: C:\Users\jenkins\workspace\elastic+elasticsearch+master+multijob-windows-compatibility\server\build\testrun\integTest\J0\temp\org.elasticsearch.action.admin.indices.create.ShrinkIndexIT_3825ACCB7B6BF7DB-001\tempDir-002\data\nodes\1\indices\4EzGadc6Rpa77FYUbJVhvA\0\index\recovery.uKRsdtvIQbmCSaJo9IlMzA._0.cfs -> C:\Users\jenkins\workspace\elastic+elasticsearch+master+multijob-windows-compatibility\server\build\testrun\integTest\J0\temp\org.elasticsearch.action.admin.indices.create.ShrinkIndexIT_3825ACCB7B6BF7DB-001\tempDir-002\data\nodes\1\indices\4EzGadc6Rpa77FYUbJVhvA\0\index\_0.cfs
  1>    at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:89) ~[?:?]
  1>    at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:103) ~[?:?]
  1>    at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:298) ~[?:?]
  1>    at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:288) ~[?:?]
  1>    at org.apache.lucene.mockfile.FilterFileSystemProvider.move(FilterFileSystemProvider.java:147) ~[lucene-test-framework-8.0.0-snapshot-7d0a7782fa.jar:8.0.0-snapshot-7d0a7782fa 7d0a7782fa7e99250bccfb4d3e995485c3f0ca19 - jimczi - 2018-09-18 14:02:40]
  1>    at org.apache.lucene.mockfile.FilterFileSystemProvider.move(FilterFileSystemProvider.java:147) ~[lucene-test-framework-8.0.0-snapshot-7d0a7782fa.jar:8.0.0-snapshot-7d0a7782fa 7d0a7782fa7e99250bccfb4d3e995485c3f0ca19 - jimczi - 2018-09-18 14:02:40]
  1>    at java.nio.file.Files.move(Files.java:1413) ~[?:?]
  1>    at org.apache.lucene.store.FSDirectory.rename(FSDirectory.java:303) ~[lucene-core-8.0.0-snapshot-7d0a7782fa.jar:8.0.0-snapshot-7d0a7782fa 7d0a7782fa7e99250bccfb4d3e995485c3f0ca19 - jimczi - 2018-09-18 14:02:34]
  1>    at org.apache.lucene.store.MockDirectoryWrapper.rename(MockDirectoryWrapper.java:231) ~[lucene-test-framework-8.0.0-snapshot-7d0a7782fa.jar:8.0.0-snapshot-7d0a7782fa 7d0a7782fa7e99250bccfb4d3e995485c3f0ca19 - jimczi - 2018-09-18 14:02:34]
  1>    at org.apache.lucene.store.FilterDirectory.rename(FilterDirectory.java:89) ~[lucene-core-8.0.0-snapshot-7d0a7782fa.jar:8.0.0-snapshot-7d0a7782fa 7d0a7782fa7e99250bccfb4d3e995485c3f0ca19 - jimczi - 2018-09-18 14:02:34]
  1>    at org.apache.lucene.store.FilterDirectory.rename(FilterDirectory.java:89) ~[lucene-core-8.0.0-snapshot-7d0a7782fa.jar:8.0.0-snapshot-7d0a7782fa 7d0a7782fa7e99250bccfb4d3e995485c3f0ca19 - jimczi - 2018-09-18 14:02:34]
  1>    at org.elasticsearch.index.store.Store.renameTempFilesSafe(Store.java:335) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.RecoveryTarget.renameAllTempFiles(RecoveryTarget.java:188) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.RecoveryTarget.cleanFiles(RecoveryTarget.java:444) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$CleanFilesRequestHandler.messageReceived(PeerRecoveryTargetService.java:561) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$CleanFilesRequestHandler.messageReceived(PeerRecoveryTargetService.java:555) ~[main/:?]
  1>    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:63) ~[main/:?]
  1>    at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1448) ~[main/:?]
  1>    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:723) ~[main/:?]
  1>    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[main/:?]
  1>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135) ~[?:?]
  1>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
  1>    at java.lang.Thread.run(Thread.java:844) ~[?:?]

Here is a copy of the console output, including repeated test runs (of which the stack trace above is the last). This tarball also includes a Process Monitor log from this run, filtered to just the relevant paths: 33857.tar.gz

Interesting things to note from the process monitor log are:

There's no other process involved here. I'm pretty sure that if this were Windows Defender then it'd also appear in this log.
There seem to be repeated attempts to remove _0.cfs which result in CANNOT DELETE
Finally we can see the rename results in ACCESS DENIED.

I do not yet have a good idea for a next step on this.

DaveCTurner · 2018-09-28T08:03:30Z

On repeated runs I do see occasional failures as per the OP (maybe 20% of the time, I've not done the stats in detail yet). I tried writing a test case that repeatedly hits the filesystem directly and could not reproduce this AccessDeniedException, so decided to turn my attention to all the Lucene-provided filesystem machinery that's in play here to see if I could rule that out as a source. The filesystem provider in use in a failing test is

HandleLimitFS(LeakFS(ShuffleFS(DisableFsyncFS(sun.nio.fs.WindowsFileSystemProvider@2d1a5acc))))

I was suspicious about that DisableFsyncFS in there (the rest look benign) so I added @SuppressFileSystems(value = "DisableFsyncFS") to the ShrinkIndexIT fixture, but still got failures. I then tried @SuppressFileSystems(value = "*") and still it fails - the filesystem provider in this case is confirmed to be a plain sun.nio.fs.WindowsFileSystemProvider. The story continues...

danielmitterdorfer · 2018-09-28T08:35:59Z

We have more test failures on Windows with the same cause:

5.6: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+5.6+multijob-windows-compatibility/27/console:

org.elasticsearch.routing.PartitionedRoutingIT#testShrinking()

Expand for stack trace

[2018-09-27T18:57:01,933][WARN ][o.e.c.a.s.ShardStateAction] [node_s0] [index_2][0] received shard failed for shard id [[index_2][0]], allocation id [0Jyeijx5QoeBLzJ85Ym4uw], primary term [0], message [failed recovery], failure [RecoveryFailedException[[index_2][0]: Recovery failed from {node_s1}{UFRWgSweS6uUDivYiTC1-w}{4v-Ak5QCQ62QpeiZyzwK8w}{local}{local[368]} into {node_s0}{Povm-pMfSAaqPh8OcDkjHg}{3UCCMKflQTOz9cF4-uejCA}{local}{local[366]}]; nested: RemoteTransportException[[node_s0][local[366]][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] phase1 failed]; nested: RecoverFilesRecoveryException[Failed to transfer [25] files with total size of [36.4kb]]; nested: RemoteTransportException[[node_s1][local[368]][internal:index/shard/recovery/clean_files]]; nested: AccessDeniedException[C:\Users\jenkins\workspace\elastic+elasticsearch+5.6+multijob-windows-compatibility\core\build\testrun\integTest\J0\temp\org.elasticsearch.routing.PartitionedRoutingIT_D736F1C7B3CED870-001\tempDir-002\data\nodes\0\indices\qvLxi615Saym8cpW_c-H4Q\0\index\recovery.AWYcZDdo0i__cHMcxiHN._0.cfs -> C:\Users\jenkins\workspace\elastic+elasticsearch+5.6+multijob-windows-compatibility\core\build\testrun\integTest\J0\temp\org.elasticsearch.routing.PartitionedRoutingIT_D736F1C7B3CED870-001\tempDir-002\data\nodes\0\indices\qvLxi615Saym8cpW_c-H4Q\0\index\_0.cfs]; ]
[...]
Caused by: java.nio.file.AccessDeniedException: C:\Users\jenkins\workspace\elastic+elasticsearch+5.6+multijob-windows-compatibility\core\build\testrun\integTest\J0\temp\org.elasticsearch.routing.PartitionedRoutingIT_D736F1C7B3CED870-001\tempDir-002\data\nodes\0\indices\qvLxi615Saym8cpW_c-H4Q\0\index\recovery.AWYcZDdo0i__cHMcxiHN._0.cfs -> C:\Users\jenkins\workspace\elastic+elasticsearch+5.6+multijob-windows-compatibility\core\build\testrun\integTest\J0\temp\org.elasticsearch.routing.PartitionedRoutingIT_D736F1C7B3CED870-001\tempDir-002\data\nodes\0\indices\qvLxi615Saym8cpW_c-H4Q\0\index\_0.cfs
  at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:83) ~[?:?]
  at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97) ~[?:?]
  at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:301) ~[?:?]
  at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287) ~[?:?]
  at org.apache.lucene.mockfile.FilterFileSystemProvider.move(FilterFileSystemProvider.java:147) ~[lucene-test-framework-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:40]
  at org.apache.lucene.mockfile.FilterFileSystemProvider.move(FilterFileSystemProvider.java:147) ~[lucene-test-framework-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:40]
  at java.nio.file.Files.move(Files.java:1395) ~[?:1.8.0_181]
  at org.apache.lucene.store.FSDirectory.rename(FSDirectory.java:297) ~[lucene-core-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:39]
  at org.apache.lucene.store.FilterDirectory.rename(FilterDirectory.java:88) ~[lucene-core-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:39]
  at org.apache.lucene.store.MockDirectoryWrapper.rename(MockDirectoryWrapper.java:229) ~[lucene-test-framework-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:39]
  at org.apache.lucene.store.FilterDirectory.rename(FilterDirectory.java:88) ~[lucene-core-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:39]
  at org.elasticsearch.index.store.Store.renameTempFilesSafe(Store.java:319) ~[main/:?]
  at org.elasticsearch.indices.recovery.RecoveryTarget.renameAllTempFiles(RecoveryTarget.java:181) ~[main/:?]
  at org.elasticsearch.indices.recovery.RecoveryTarget.cleanFiles(RecoveryTarget.java:406) ~[main/:?]
  at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$CleanFilesRequestHandler.messageReceived(PeerRecoveryTargetService.java:486) ~[main/:?]
  at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$CleanFilesRequestHandler.messageReceived(PeerRecoveryTargetService.java:480) ~[main/:?]
  at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:33) ~[main/:?]
  at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69) ~[main/:?]
  at org.elasticsearch.transport.local.LocalTransport$2.doRun(LocalTransport.java:390) ~[main/:?]

6.x: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.x+multijob-windows-compatibility/27/console

org.elasticsearch.routing.PartitionedRoutingIT#testShrinking()

Expand for stack trace

org.elasticsearch.indices.recovery.RecoveryFailedException: [index_4][1]: Recovery failed from {node_s0}{7C-fb4R9RK-WS4Jk1WYXuQ}{BijPdN92Q32zPDZ9t_ldiA}{127.0.0.1}{127.0.0.1:55085} into {node_s1}{RqyKMgp3QZ-gu0IVK_U0AA}{I5djfftGTtmtr6bQbOAUJA}{127.0.0.1}{127.0.0.1:55086}
  1> 	at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.doRecovery(PeerRecoveryTargetService.java:282) [main/:?]
  1> 	at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.access$900(PeerRecoveryTargetService.java:80) [main/:?]
  1> 	at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$RecoveryRunner.doRun(PeerRecoveryTargetService.java:624) [main/:?]
  1> 	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:723) [main/:?]
  1> 	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [main/:?]
  1> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
  1> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
  1> 	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
  1> Caused by: org.elasticsearch.transport.RemoteTransportException: [node_s0][127.0.0.1:55089][internal:index/shard/recovery/start_recovery]
  1> Caused by: org.elasticsearch.index.engine.RecoveryEngineException: Phase[1] phase1 failed
  1> 	at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:174) ~[main/:?]
  1> 	at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:98) ~[main/:?]
  1> 	at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$000(PeerRecoverySourceService.java:50) ~[main/:?]
  1> 	at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:107) ~[main/:?]
  1> 	at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:104) ~[main/:?]
  1> 	at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[main/:?]
  1> 	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[main/:?]
  1> 	at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1350) ~[main/:?]
  1> 	... 5 more
  1> Caused by: org.elasticsearch.indices.recovery.RecoverFilesRecoveryException: Failed to transfer [13] files with total size of [19.3kb]
  1> 	at org.elasticsearch.indices.recovery.RecoverySourceHandler.phase1(RecoverySourceHandler.java:424) ~[main/:?]
  1> 	at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:172) ~[main/:?]
  1> 	at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:98) ~[main/:?]
  1> 	at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$000(PeerRecoverySourceService.java:50) ~[main/:?]
  1> 	at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:107) ~[main/:?]
  1> 	at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:104) ~[main/:?]
  1> 	at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[main/:?]
  1> 	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[main/:?]
  1> 	at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1350) ~[main/:?]
  1> 	... 5 more
  1> Caused by: org.elasticsearch.transport.RemoteTransportException: [node_s1][127.0.0.1:55090][internal:index/shard/recovery/clean_files]
  1> Caused by: java.nio.file.AccessDeniedException: C:\Users\jenkins\workspace\elastic+elasticsearch+6.x+multijob-windows-compatibility\server\build\testrun\integTest\J2\temp\org.elasticsearch.routing.PartitionedRoutingIT_CAE88D19276CD2B2-001\tempDir-002\data\nodes\1\indices\H6TUqJpRTRedJr1z26jubw\1\index\recovery.fshzJJ1CQCelF3mD6b587Q._0.cfs -> C:\Users\jenkins\workspace\elastic+elasticsearch+6.x+multijob-windows-compatibility\server\build\testrun\integTest\J2\temp\org.elasticsearch.routing.PartitionedRoutingIT_CAE88D19276CD2B2-001\tempDir-002\data\nodes\1\indices\H6TUqJpRTRedJr1z26jubw\1\index\_0.cfs
  1> 	at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:83) ~[?:?]
  1> 	at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97) ~[?:?]
  1> 	at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:301) ~[?:?]
  1> 	at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287) ~[?:?]
  1> 	at org.apache.lucene.mockfile.FilterFileSystemProvider.move(FilterFileSystemProvider.java:147) ~[lucene-test-framework-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 2018-09-18 13:01:16]
  1> 	at java.nio.file.Files.move(Files.java:1395) ~[?:1.8.0_181]
  1> 	at org.apache.lucene.store.FSDirectory.rename(FSDirectory.java:303) ~[lucene-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 2018-09-18 13:01:13]
  1> 	at org.apache.lucene.store.MockDirectoryWrapper.rename(MockDirectoryWrapper.java:231) ~[lucene-test-framework-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 2018-09-18 13:01:13]
  1> 	at org.apache.lucene.store.FilterDirectory.rename(FilterDirectory.java:89) ~[lucene-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 2018-09-18 13:01:13]
  1> 	at org.apache.lucene.store.FilterDirectory.rename(FilterDirectory.java:89) ~[lucene-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 2018-09-18 13:01:13]
  1> 	at org.elasticsearch.index.store.Store.renameTempFilesSafe(Store.java:334) ~[main/:?]
  1> 	at org.elasticsearch.indices.recovery.RecoveryTarget.renameAllTempFiles(RecoveryTarget.java:188) ~[main/:?]
  1> 	at org.elasticsearch.indices.recovery.RecoveryTarget.cleanFiles(RecoveryTarget.java:452) ~[main/:?]
  1> 	at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$CleanFilesRequestHandler.messageReceived(PeerRecoveryTargetService.java:557) ~[main/:?]
  1> 	at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$CleanFilesRequestHandler.messageReceived(PeerRecoveryTargetService.java:551) ~[main/:?]
  1> 	at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[main/:?]
  1> 	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[main/:?]
  1> 	at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1350) ~[main/:?]

6.4: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.4+multijob-windows-compatibility/28/console

org.elasticsearch.routing.PartitionedRoutingIT#testShrinking()

Expand for stack trace

org.elasticsearch.indices.recovery.RecoveryFailedException: [index_4][3]: Recovery failed from {node_sd1}{O8l8bMuJTv2FfmOwRxXWgg}{1m7utzhqQjWidDUawrikfQ}{127.0.0.1}{127.0.0.1:63944} into {node_sd2}{IBcohRX6RB2qHr4S6wDBUQ}{Fa3f-veRRFOWmrrADglJ9A}{127.0.0.1}{127.0.0.1:63945}
  1>    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.doRecovery(PeerRecoveryTargetService.java:282) [main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.access$900(PeerRecoveryTargetService.java:80) [main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$RecoveryRunner.doRun(PeerRecoveryTargetService.java:623) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:723) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [main/:?]
  1>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
  1>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
  1>    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
  1> Caused by: org.elasticsearch.transport.RemoteTransportException: [node_sd1][127.0.0.1:63944][internal:index/shard/recovery/start_recovery]
  1> Caused by: org.elasticsearch.index.engine.RecoveryEngineException: Phase[1] phase1 failed
  1>    at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:174) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:98) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$000(PeerRecoverySourceService.java:50) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:107) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:104) ~[main/:?]
  1>    at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[main/:?]
  1>    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[main/:?]
  1>    at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1605) ~[main/:?]
  1>    ... 5 more
  1> Caused by: org.elasticsearch.indices.recovery.RecoverFilesRecoveryException: Failed to transfer [4] files with total size of [4.5kb]
  1>    at org.elasticsearch.indices.recovery.RecoverySourceHandler.phase1(RecoverySourceHandler.java:446) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:172) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:98) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$000(PeerRecoverySourceService.java:50) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:107) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:104) ~[main/:?]
  1>    at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[main/:?]
  1>    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[main/:?]
  1>    at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1605) ~[main/:?]
  1>    ... 5 more
  1> Caused by: org.elasticsearch.indices.recovery.RecoverFilesRecoveryException: Failed to transfer [4] files with total size of [4.5kb]
  1>    at org.elasticsearch.indices.recovery.RecoverySourceHandler.phase1(RecoverySourceHandler.java:446) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:172) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:98) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$000(PeerRecoverySourceService.java:50) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:107) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:104) ~[main/:?]
  1>    at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[main/:?]
  1>    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[main/:?]
  1>    at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1605) ~[main/:?]
  1>    ... 5 more
  1> Caused by: org.elasticsearch.transport.RemoteTransportException: [node_sd2][127.0.0.1:63945][internal:index/shard/recovery/clean_files]
  1> Caused by: java.nio.file.AccessDeniedException: C:\Users\jenkins\workspace\elastic+elasticsearch+6.4+multijob-windows-compatibility\server\build\testrun\integTest\J2\temp\org.elasticsearch.routing.PartitionedRoutingIT_1C4D221BAD62999F-001\tempDir-002\data\nodes\2\indices\SI6IoZJ5R_Wb4Nqx06oDmA\3\index\recovery.CB9P345NRmmuu-tPzwjksg._0.cfs -> C:\Users\jenkins\workspace\elastic+elasticsearch+6.4+multijob-windows-compatibility\server\build\testrun\integTest\J2\temp\org.elasticsearch.routing.PartitionedRoutingIT_1C4D221BAD62999F-001\tempDir-002\data\nodes\2\indices\SI6IoZJ5R_Wb4Nqx06oDmA\3\index\_0.cfs

org.elasticsearch.action.admin.indices.create.SplitIndexIT#testCreateSplitIndexToN():

Expand for stack trace

org.elasticsearch.indices.recovery.RecoveryFailedException: [second_split][0]: Recovery failed from {node_sd2}{hcRrVZwPQ7e8MTakDgz_GQ}{wfXHyRV6RPW0sKTXdamhdA}{127.0.0.1}{127.0.0.1:58177} into {node_sd3}{VlMAJBDSTemm3IJZgPciLA}{zPecWbBiSWaagZwl7Q1l6g}{127.0.0.1}{127.0.0.1:58171}
  1>    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.doRecovery(PeerRecoveryTargetService.java:282) [main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.access$900(PeerRecoveryTargetService.java:80) [main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$RecoveryRunner.doRun(PeerRecoveryTargetService.java:623) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:723) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [main/:?]
  1>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
  1>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
  1>    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
  1> Caused by: org.elasticsearch.transport.RemoteTransportException: [node_sd2][127.0.0.1:58177][internal:index/shard/recovery/start_recovery]
  1> Caused by: org.elasticsearch.index.engine.RecoveryEngineException: Phase[1] phase1 failed
  1>    at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:174) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:98) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$000(PeerRecoverySourceService.java:50) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:107) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:104) ~[main/:?]
  1>    at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[main/:?]
  1>    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[main/:?]
  1>    at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1605) ~[main/:?]
  1>    ... 5 more
  1> Caused by: org.elasticsearch.indices.recovery.RecoverFilesRecoveryException: Failed to transfer [5] files with total size of [5.6kb]
  1>    at org.elasticsearch.indices.recovery.RecoverySourceHandler.phase1(RecoverySourceHandler.java:446) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:172) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:98) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$000(PeerRecoverySourceService.java:50) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:107) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:104) ~[main/:?]
  1>    at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[main/:?]
  1>    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[main/:?]
  1>    at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1605) ~[main/:?]
  1>    ... 5 more
  1> Caused by: org.elasticsearch.transport.RemoteTransportException: [node_sd3][127.0.0.1:58171][internal:index/shard/recovery/clean_files]
  1> Caused by: java.nio.file.AccessDeniedException: C:\Users\jenkins\workspace\elastic+elasticsearch+6.4+multijob-windows-compatibility\server\build\testrun\integTest\J2\temp\org.elasticsearch.action.admin.indices.create.SplitIndexIT_1C4D221BAD62999F-001\tempDir-002\data\nodes\3\indices\xKk-zXSXQsGpeMmq5ysVpA\0\index\recovery._dztPVO1RBKeP7ZpRHIbiA._0.cfs -> C:\Users\jenkins\workspace\elastic+elasticsearch+6.4+multijob-windows-compatibility\server\build\testrun\integTest\J2\temp\org.elasticsearch.action.admin.indices.create.SplitIndexIT_1C4D221BAD62999F-001\tempDir-002\data\nodes\3\indices\xKk-zXSXQsGpeMmq5ysVpA\0\index\_0.cfs
  1>    at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:83) ~[?:?]
  1>    at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97) ~[?:?]
  1>    at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:301) ~[?:?]
  1>    at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287) ~[?:?]
  1>    at org.apache.lucene.mockfile.FilterFileSystemProvider.move(FilterFileSystemProvider.java:147) ~[lucene-test-framework-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:47]
  1>    at org.apache.lucene.mockfile.FilterFileSystemProvider.move(FilterFileSystemProvider.java:147) ~[lucene-test-framework-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:47]
  1>    at org.apache.lucene.mockfile.FilterFileSystemProvider.move(FilterFileSystemProvider.java:147) ~[lucene-test-framework-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:47]
  1>    at org.apache.lucene.mockfile.FilterFileSystemProvider.move(FilterFileSystemProvider.java:147) ~[lucene-test-framework-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:47]
  1>    at java.nio.file.Files.move(Files.java:1395) ~[?:1.8.0_181]
  1>    at org.apache.lucene.store.FSDirectory.rename(FSDirectory.java:303) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
  1>    at org.apache.lucene.store.FilterDirectory.rename(FilterDirectory.java:89) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
  1>    at org.apache.lucene.store.FilterDirectory.rename(FilterDirectory.java:89) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
  1>    at org.elasticsearch.index.store.Store.renameTempFilesSafe(Store.java:337) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.RecoveryTarget.renameAllTempFiles(RecoveryTarget.java:188) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.RecoveryTarget.cleanFiles(RecoveryTarget.java:439) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$CleanFilesRequestHandler.messageReceived(PeerRecoveryTargetService.java:556) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$CleanFilesRequestHandler.messageReceived(PeerRecoveryTargetService.java:550) ~[main/:?]
  1>    at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[main/:?]
  1>    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[main/:?]
  1>    at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1605) ~[main/:?]

DaveCTurner · 2018-09-28T08:56:25Z

Looking harder at the logs, what happens is:

we allocate two primary shards of second_shrink to one node
they recover from local shards (this is the shrink)
as soon as they start they are rebalanced to the other node
then we update the number of replicas to 1, allocating replicas on the first node

However we failed to delete the unassigned shards from the first node after the rebalance (at least twice, maybe it succeeds, we're still looking). For some reason when we try and allocate a shard on top of the old directory it then fails with the AccessDeniedException.

DaveCTurner · 2018-09-28T09:39:22Z

Note that we're using MMapDirectory here. Setting index.store.type: simplefs yielded no failure in 10 iterations.

DaveCTurner · 2018-09-28T11:04:57Z

I have tried to find a way to reproduce this with fewer moving parts, but so far this has failed. The following test passes.

/*
 * Licensed to Elasticsearch under one or more contributor
 * license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright
 * ownership. Elasticsearch licenses this file to you under
 * the Apache License, Version 2.0 (the "License"); you may
 * not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied.  See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */
package org.elasticsearch.action.admin.indices.create;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.store.IndexOutput;
import org.elasticsearch.Version;
import org.elasticsearch.cluster.metadata.IndexMetaData;
import org.elasticsearch.common.settings.Setting;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.IndexSettings;
import org.elasticsearch.index.shard.ShardId;
import org.elasticsearch.index.store.Store;
import org.elasticsearch.test.DummyShardLock;
import org.elasticsearch.test.ESTestCase;
import org.elasticsearch.test.junit.annotations.TestLogging;
import org.elasticsearch.threadpool.TestThreadPool;
import org.elasticsearch.threadpool.ThreadPool;

import java.io.IOException;
import java.nio.file.Path;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

import static org.elasticsearch.cluster.metadata.IndexMetaData.INDEX_NUMBER_OF_REPLICAS_SETTING;
import static org.elasticsearch.cluster.metadata.IndexMetaData.INDEX_NUMBER_OF_SHARDS_SETTING;
import static org.elasticsearch.cluster.metadata.IndexMetaData.SETTING_INDEX_VERSION_CREATED;
import static org.elasticsearch.cluster.metadata.IndexMetaData.SETTING_NUMBER_OF_SHARDS;
import static org.elasticsearch.index.store.Store.INDEX_STORE_STATS_REFRESH_INTERVAL_SETTING;

@TestLogging("org.elasticsearch:TRACE,index.store.deletes:TRACE")
public class WindowsFilesystemTests extends ESTestCase {
    public void testWindowsFilesystem() {
        ThreadPool threadPool = new TestThreadPool("test");
        try {
            final Path tempDir1 = createTempDir();
            final IOContext ioContext = new IOContext();
            final int attemptCount = 1000;
            final CountDownLatch countDownLatch = new CountDownLatch(attemptCount);

            try (Directory rawDirectory = FSDirectory.open(tempDir1)) {
                logger.info("rawDirectory = {}", rawDirectory);

                final ShardId shardId = new ShardId("foo", "bar", 0);
                final Store store = new Store(shardId,
                    new IndexSettings(IndexMetaData.builder(shardId.getIndexName())
                        .settings(Settings.builder()
                            .put(INDEX_NUMBER_OF_SHARDS_SETTING.getKey(), 1)
                            .put(INDEX_NUMBER_OF_REPLICAS_SETTING.getKey(), 0)
                            .put(SETTING_INDEX_VERSION_CREATED.getKey(), Version.CURRENT)
                            .put(INDEX_STORE_STATS_REFRESH_INTERVAL_SETTING.getKey(), TimeValue.ZERO)).build(), Settings.EMPTY),
                    rawDirectory, new DummyShardLock(shardId));

                final AtomicBoolean queryStatsThreadCancelled = new AtomicBoolean();
                final Thread queryStatsThread = new Thread(() -> {
                    while (queryStatsThreadCancelled.get() == false) {
                        try {
                            store.stats();
                        } catch (IOException e) {
                            throw new AssertionError("unexpected", e);
                        }
                    }
                });
                queryStatsThread.start();

                final Directory directory = store.directory();
                logger.info("directory = {}", directory);

                for (int i = 0; i < attemptCount; i++) {
                    for (int j = 1; j <= 2; j++) {
                        try (IndexOutput indexOutput = directory.createOutput("test-" + i + "-" + j + ".dat", ioContext)) {
                            int longCount = randomIntBetween(100, 10000);
                            for (int k = 0; k < longCount; k++) {
                                indexOutput.writeLong(randomLong());
                            }
                        }
                    }
                }

                for (int i = 0; i < attemptCount; i++) {
                    final String fileName1 = "test-" + i + "-1.dat";
                    final String fileName2 = "test-" + i + "-2.dat";

                    threadPool.generic().execute(() -> {
                        try {
                            directory.deleteFile(fileName1);
                            Thread.sleep(randomIntBetween(1, 50));
                        } catch (IOException | InterruptedException e) {
                            countDownLatch.countDown();
                            throw new AssertionError("unexpected", e);
                        }

                        threadPool.generic().execute(() -> {
                            try {
                                directory.rename(fileName2, fileName1);
                            } catch (IOException e) {
                                countDownLatch.countDown();
                                throw new AssertionError("unexpected", e);
                            }

                            threadPool.generic().execute(() -> {
                                try {
                                    directory.deleteFile(fileName1);
                                } catch (IOException e) {
                                    throw new AssertionError("unexpected", e);
                                } finally {
                                    countDownLatch.countDown();
                                }
                            });
                        });
                    });
                }

                countDownLatch.await();
                queryStatsThreadCancelled.set(true);
                queryStatsThread.join();
            }
        } catch (IOException | InterruptedException e) {
            throw new AssertionError("exception", e);
        } finally {
            threadPool.shutdown();
        }
    }
}

DaveCTurner · 2018-09-28T11:05:45Z

One interesting side-question is why this happens more on tests to do with shrinking. I think this is the answer:

elasticsearch/server/src/main/java/org/elasticsearch/cluster/metadata/MetaDataCreateIndexService.java

Lines 726 to 727 in c4b8316

    
           // we only try once and then give up with a shrink index 
        
           .put("index.allocation.max_retries", 1);

In other cases, I guess an allocation that fails like this will retry (up to 5 times) and subsequent attempts will succeed.

Correction: subsequent attempts to allocate a copy of the shrunk index onto the node containing the source index will, I think, all fail, because of #33857 (comment) (unless either the source index is removed or closed, or else all the shared segments are merged away, neither of which seems very likely in this test).

DaveCTurner · 2018-09-28T12:08:28Z

To test this hypothesis, I tried writing a test that repeatedly allocates a shard to a node and then removes it, hoping to hit an allocation failure, with max retries set to 0:

@ClusterScope(scope = TEST, numDataNodes = 2)
@TestLogging("org.elasticsearch:TRACE,index.store.deletes:TRACE")
public class ReliableRecoveryIT extends ESIntegTestCase {
    public void testReliableRecovery() {
        prepareCreate("index").setSettings(Settings.builder().put(indexSettings())
            .put(INDEX_NUMBER_OF_SHARDS_SETTING.getKey(), 1)
            .put(INDEX_NUMBER_OF_REPLICAS_SETTING.getKey(), 1)
            .put(SETTING_ALLOCATION_MAX_RETRY.getKey(), 0))
            .get();

        ensureGreen("index");

        for (int i = 0; i < 20; i++) {
            internalCluster().client().prepareBulk()
                .add(new IndexRequest("index", "_doc", null).source("{}", XContentType.JSON))
                .add(new IndexRequest("index", "_doc", null).source("{}", XContentType.JSON))
                .add(new IndexRequest("index", "_doc", null).source("{}", XContentType.JSON))
                .add(new IndexRequest("index", "_doc", null).source("{}", XContentType.JSON))
                .add(new IndexRequest("index", "_doc", null).source("{}", XContentType.JSON))
                .setRefreshPolicy(IMMEDIATE)
                .get();
        }

        for (int i = 0; i < 10; i++) {
            logger.info("--> removing replica [{}]", i);
            internalCluster().client().admin().indices().prepareUpdateSettings("index").setSettings(
                Settings.builder().put(INDEX_NUMBER_OF_REPLICAS_SETTING.getKey(), 0)).get();
            logger.info("--> adding replica [{}]", i);
            internalCluster().client().admin().indices().prepareUpdateSettings("index").setSettings(
                Settings.builder().put(INDEX_NUMBER_OF_REPLICAS_SETTING.getKey(), 1)).get();
            logger.info("--> awaiting green [{}]", i);
            ensureGreen("index");
        }
    }
}

This passes. I'm mystified.

bleskes · 2018-09-28T12:10:39Z

One interesting side-question is why this happens more on tests to do with shrinking. I think this is the answer:

Good catch. We should clean these up IMO.

DaveCTurner · 2018-09-28T13:38:02Z

Ok, I've got it 🎉

The following test passes on Windows (i.e. throws an AccessDeniedException on the rename). It seems that if you create a hard link to a file, and then delete the link, but the original file is still open, then you can't rename something else into the place where the hard link used to be.

    public void testWindowsFilesystem() throws IOException {
        final Path directoryPath = createTempDir();
        final IOContext ioContext = new IOContext();

        try (Directory directory = FSDirectory.open(directoryPath)) {
            logger.info("directory = {}", directory);

            final String fileName1 = "test-1.dat";
            final String fileName2 = "test-2.dat";
            final String fileName3 = "test-3.dat";

            Files.write(directoryPath.resolve(fileName1), new byte[]{0}); // must be nonempty
            Files.write(directoryPath.resolve(fileName2), new byte[0]); // empty file

            try (IndexInput ignored = directory.openInput(fileName1, ioContext)) {
                Files.createLink(directoryPath.resolve(fileName3), directoryPath.resolve(fileName1));
                directory.deleteFile(fileName3);
                expectThrows(AccessDeniedException.class, () -> directory.rename(fileName2, fileName3));
            }
        }
    }

More precisely, attempts to delete fileName3 while fileName1 is still open will fail with AccessDeniedException, but we suppress these inside FSDirectory.

This only happens when using MMapDirectory and not SimpleFSDirectory.

mayya-sharipova · 2018-10-01T14:27:33Z

Still experiencing failure today on 6.4

Log: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.4+multijob-windows-compatibility/34/console

REPRODUCE WITH: gradlew :server:integTest
-Dtests.seed=544A239B8224F7C5
-Dtests.class=org.elasticsearch.routing.PartitionedRoutingIT
-Dtests.method="testShrinking"
-Dtests.security.manager=true
-Dtests.locale=en-CA
-Dtests.timezone=SystemV/YST9YDT

and 6.x
Log: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.x+multijob-windows-compatibility/35/console

07:44:40 FAILURE 33.3s J2 | PartitionedRoutingIT.testShrinking <<< FAILURES!
07:44:40 > Throwable #1: java.lang.AssertionError: timed out waiting for green state
07:44:40 > at __randomizedtesting.SeedInfo.seed([544A239B8224F7C5:AE69E6F0A5B2602E]:0)
07:44:40 > at org.elasticsearch.test.ESIntegTestCase.ensureColor(ESIntegTestCase.java:967)
07:44:40 > at org.elasticsearch.test.ESIntegTestCase.ensureGreen(ESIntegTestCase.java:923)
07:44:40 > at org.elasticsearch.test.ESIntegTestCase.ensureGreen(ESIntegTestCase.java:912)
07:44:40 > at org.elasticsearch.routing.PartitionedRoutingIT.testShrinking(PartitionedRoutingIT.java:96)
07:44:40 > at java.lang.Thread.run(Thread.java:748)

mayya-sharipova · 2018-10-01T21:26:19Z

I have muted PartitionedRoutingIT:: testShrinking test on 6.4 and 6.x on Windows

hub-cap · 2018-10-05T03:19:01Z

saw again in 5.6 via https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+5.6+multijob-windows-compatibility/41/consoleText

original-brownbear · 2018-10-05T14:55:28Z

master still failing similarly:

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-windows-compatibility/42/console

REPRODUCE WITH: gradlew :server:integTest \
  -Dtests.seed=7379D1534EE3AFC8 \
  -Dtests.class=org.elasticsearch.routing.PartitionedRoutingIT \
  -Dtests.method="testShrinking" \
  -Dtests.security.manager=true \
  -Dtests.locale=pl-PL \
  -Dtests.timezone=US/Arizona \
  -Dcompiler.java=11 \
  -Druntime.java=8

DaveCTurner · 2018-10-15T06:18:30Z

We've worked out what's going on here so I've unassigned myself from this issue. Similar failures on Windows should be muted until we work out what to do with it - no need to report them here.

Relates: #33857

ywelsch · 2019-02-01T08:38:10Z

This is a Windows limitation with MMapDirectory we cannot do anything about -> Closing

alpar-t added :Delivery/Build Build or test infrastructure >test-failure Triaged test failures from CI v7.0.0 v6.5.0 v5.6.12 labels Sep 19, 2018

dnhatn added :Distributed Indexing/Recovery Anything around constructing a new shard, either from a local or a remote source. and removed :Delivery/Build Build or test infrastructure labels Sep 20, 2018

tvernum added a commit that referenced this issue Sep 20, 2018

Mute ShrinkIndexIT.testCreateShrinkIndexToN on Windows

8d50c10

Relates: #33857

tvernum added a commit that referenced this issue Sep 20, 2018

Mute ShrinkIndexIT.testCreateShrinkIndexToN on Windows

919f0fe

Relates: #33857

tvernum added a commit that referenced this issue Sep 20, 2018

Mute ShrinkIndexIT.testCreateShrinkIndexToN on Windows

cb26ff4

Relates: #33857

jpountz assigned ywelsch Sep 20, 2018

ywelsch added a commit that referenced this issue Sep 21, 2018

Mute ShrinkIndexIT.testCreateShrinkIndexToN on Windows

a6aa773

Relates: #33857

s1monw mentioned this issue Sep 26, 2018

[CI] SplitIndexIT and ShrinkIndexIT failing on Windows #34080

Closed

jasontedor assigned DaveCTurner and unassigned ywelsch Sep 27, 2018

DaveCTurner removed their assignment Oct 15, 2018

DaveCTurner mentioned this issue Oct 15, 2018

[Windows 10] file.AccessDeniedException on split or shrink index #30962

Closed

colings86 added v6.6.0 and removed v6.5.0 labels Oct 25, 2018

kcm pushed a commit that referenced this issue Oct 30, 2018

Mute ShrinkIndexIT.testCreateShrinkIndexToN on Windows

2f2744e

Relates: #33857

jasontedor added v6.7.0 and removed v6.6.0 labels Dec 19, 2018

ywelsch closed this as completed Feb 1, 2019

colings86 added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019

pxsalehi mentioned this issue Feb 6, 2023

[CI] SplitIndexIT testCreateSplitIndexToN failing #92183

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ShrinkIndexIT.testCreateShrinkIndexToN fails on Windows #33857

ShrinkIndexIT.testCreateShrinkIndexToN fails on Windows #33857

alpar-t commented Sep 19, 2018

elasticmachine commented Sep 19, 2018

s1monw commented Sep 19, 2018

elasticmachine commented Sep 20, 2018

ywelsch commented Sep 20, 2018

tvernum commented Sep 20, 2018

ywelsch commented Sep 20, 2018

romseygeek commented Sep 21, 2018

ywelsch commented Sep 21, 2018

romseygeek commented Sep 21, 2018

romseygeek commented Sep 22, 2018

mayya-sharipova commented Sep 24, 2018

cbuescher commented Sep 24, 2018

cbuescher commented Sep 24, 2018

droberts195 commented Sep 27, 2018

DaveCTurner commented Sep 27, 2018

DaveCTurner commented Sep 28, 2018

danielmitterdorfer commented Sep 28, 2018

DaveCTurner commented Sep 28, 2018

DaveCTurner commented Sep 28, 2018

DaveCTurner commented Sep 28, 2018

DaveCTurner commented Sep 28, 2018 •

edited

Loading

DaveCTurner commented Sep 28, 2018 •

edited

Loading

bleskes commented Sep 28, 2018

DaveCTurner commented Sep 28, 2018 •

edited

Loading

mayya-sharipova commented Oct 1, 2018 •

edited

Loading

mayya-sharipova commented Oct 1, 2018

hub-cap commented Oct 5, 2018

original-brownbear commented Oct 5, 2018

DaveCTurner commented Oct 15, 2018

ywelsch commented Feb 1, 2019

ShrinkIndexIT.testCreateShrinkIndexToN fails on Windows #33857

ShrinkIndexIT.testCreateShrinkIndexToN fails on Windows #33857

Comments

alpar-t commented Sep 19, 2018

elasticmachine commented Sep 19, 2018

s1monw commented Sep 19, 2018

elasticmachine commented Sep 20, 2018

ywelsch commented Sep 20, 2018

tvernum commented Sep 20, 2018

ywelsch commented Sep 20, 2018

romseygeek commented Sep 21, 2018

ywelsch commented Sep 21, 2018

romseygeek commented Sep 21, 2018

romseygeek commented Sep 22, 2018

mayya-sharipova commented Sep 24, 2018

cbuescher commented Sep 24, 2018

cbuescher commented Sep 24, 2018

droberts195 commented Sep 27, 2018

DaveCTurner commented Sep 27, 2018

DaveCTurner commented Sep 28, 2018

danielmitterdorfer commented Sep 28, 2018

DaveCTurner commented Sep 28, 2018

DaveCTurner commented Sep 28, 2018

DaveCTurner commented Sep 28, 2018

DaveCTurner commented Sep 28, 2018 • edited Loading

DaveCTurner commented Sep 28, 2018 • edited Loading

bleskes commented Sep 28, 2018

DaveCTurner commented Sep 28, 2018 • edited Loading

mayya-sharipova commented Oct 1, 2018 • edited Loading

mayya-sharipova commented Oct 1, 2018

hub-cap commented Oct 5, 2018

original-brownbear commented Oct 5, 2018

DaveCTurner commented Oct 15, 2018

ywelsch commented Feb 1, 2019

DaveCTurner commented Sep 28, 2018 •

edited

Loading

DaveCTurner commented Sep 28, 2018 •

edited

Loading

DaveCTurner commented Sep 28, 2018 •

edited

Loading

mayya-sharipova commented Oct 1, 2018 •

edited

Loading