[CI] SplitIndexIT testCreateSplitIndexToN failing #92183

Closed
slobodanadamovic opened this issue Dec 7, 2022 · 13 comments · Fixed by #93517
Assignees
Labels
:Distributed Indexing/Store Issues around managing unopened Lucene indices. If it touches Store.java, this is a likely label. Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. >test-failure Triaged test failures from CI

Comments

@slobodanadamovic
Contributor

Build scan:
https://gradle-enterprise.elastic.co/s/bvb5puxxlbaou/tests/:server:internalClusterTest/org.elasticsearch.action.admin.indices.create.SplitIndexIT/testCreateSplitIndexToN

Reproduction line:

gradlew ':server:internalClusterTest' --tests "org.elasticsearch.action.admin.indices.create.SplitIndexIT.testCreateSplitIndexToN" -Dtests.seed=24641F2A0356AEBD -Dtests.locale=ar-MA -Dtests.timezone=SystemV/EST5 -Druntime.java=19

Applicable branches:
8.6

Reproduces locally?:
Didn't try

Failure history:
https://gradle-enterprise.elastic.co/scans/tests?tests.container=org.elasticsearch.action.admin.indices.create.SplitIndexIT&tests.test=testCreateSplitIndexToN

Failure excerpt:

java.lang.AssertionError: timed out waiting for green state

  at __randomizedtesting.SeedInfo.seed([24641F2A0356AEBD:92C8F9F3E9C1145A]:0)
  at org.junit.Assert.fail(Assert.java:88)
  at org.elasticsearch.test.ESIntegTestCase.ensureColor(ESIntegTestCase.java:966)
  at org.elasticsearch.test.ESIntegTestCase.ensureGreen(ESIntegTestCase.java:905)
  at org.elasticsearch.test.ESIntegTestCase.ensureGreen(ESIntegTestCase.java:894)
  at org.elasticsearch.action.admin.indices.create.SplitIndexIT.splitToN(SplitIndexIT.java:245)
  at org.elasticsearch.action.admin.indices.create.SplitIndexIT.testCreateSplitIndexToN(SplitIndexIT.java:78)
  at jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
  at java.lang.reflect.Method.invoke(Method.java:578)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
  at java.lang.Thread.run(Thread.java:1589)

@slobodanadamovic slobodanadamovic added :Data Management/Indices APIs APIs to create and manage indices and templates Team:Data Management Meta label for data/management team >test-failure Triaged test failures from CI labels Dec 7, 2022
@elasticsearchmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@slobodanadamovic
Contributor Author

slobodanadamovic commented Dec 7, 2022

[2022-12-07T03:09:51,640][INFO ][o.e.a.a.i.c.SplitIndexIT ] [testCreateSplitIndexToN] ensureGreen timed out, cluster state:

The test timed out, but it seems to me that these exceptions are relevant:

  1> [2022-12-07T03:09:14,738][WARN ][o.e.i.IndicesService     ] [node_s2] [target/zDozxK_RTPCd927QhZfX2g] failed to delete index
  1> java.io.IOException: could not remove the following files (in the order of attempts):
  1>    C:\Users\jenkins\workspace\platform-support\42\server\build\testrun\internalClusterTest\temp\org.elasticsearch.action.admin.indices.create.SplitIndexIT_24641F2A0356AEBD-001\tempDir-002\node_s2\indices\zDozxK_RTPCd927QhZfX2g\1\index\_2.cfs: java.nio.file.AccessDeniedException: C:\Users\jenkins\workspace\platform-support\42\server\build\testrun\internalClusterTest\temp\org.elasticsearch.action.admin.indices.create.SplitIndexIT_24641F2A0356AEBD-001\tempDir-002\node_s2\indices\zDozxK_RTPCd927QhZfX2g\1\index\_2.cfs
  1>    C:\Users\jenkins\workspace\platform-support\42\server\build\testrun\internalClusterTest\temp\org.elasticsearch.action.admin.indices.create.SplitIndexIT_24641F2A0356AEBD-001\tempDir-002\node_s2\indices\zDozxK_RTPCd927QhZfX2g\1\index\_1.cfs: java.nio.file.AccessDeniedException: C:\Users\jenkins\workspace\platform-support\42\server\build\testrun\internalClusterTest\temp\org.elasticsearch.action.admin.indices.create.SplitIndexIT_24641F2A0356AEBD-001\tempDir-002\node_s2\indices\zDozxK_RTPCd927QhZfX2g\1\index\_1.cfs
  1>    C:\Users\jenkins\workspace\platform-support\42\server\build\testrun\internalClusterTest\temp\org.elasticsearch.action.admin.indices.create.SplitIndexIT_24641F2A0356AEBD-001\tempDir-002\node_s2\indices\zDozxK_RTPCd927QhZfX2g\1\index\_0.cfs: java.nio.file.AccessDeniedException: C:\Users\jenkins\workspace\platform-support\42\server\build\testrun\internalClusterTest\temp\org.elasticsearch.action.admin.indices.create.SplitIndexIT_24641F2A0356AEBD-001\tempDir-002\node_s2\indices\zDozxK_RTPCd927QhZfX2g\1\index\_0.cfs
  1>    C:\Users\jenkins\workspace\platform-support\42\server\build\testrun\internalClusterTest\temp\org.elasticsearch.action.admin.indices.create.SplitIndexIT_24641F2A0356AEBD-001\tempDir-002\node_s2\indices\zDozxK_RTPCd927QhZfX2g\1\index: java.nio.file.DirectoryNotEmptyException: C:\Users\jenkins\workspace\platform-support\42\server\build\testrun\internalClusterTest\temp\org.elasticsearch.action.admin.indices.create.SplitIndexIT_24641F2A0356AEBD-001\tempDir-002\node_s2\indices\zDozxK_RTPCd927QhZfX2g\1\index
  1>    C:\Users\jenkins\workspace\platform-support\42\server\build\testrun\internalClusterTest\temp\org.elasticsearch.action.admin.indices.create.SplitIndexIT_24641F2A0356AEBD-001\tempDir-002\node_s2\indices\zDozxK_RTPCd927QhZfX2g\1: java.nio.file.DirectoryNotEmptyException: C:\Users\jenkins\workspace\platform-support\42\server\build\testrun\internalClusterTest\temp\org.elasticsearch.action.admin.indices.create.SplitIndexIT_24641F2A0356AEBD-001\tempDir-002\node_s2\indices\zDozxK_RTPCd927QhZfX2g\1
  1>    C:\Users\jenkins\workspace\platform-support\42\server\build\testrun\internalClusterTest\temp\org.elasticsearch.action.admin.indices.create.SplitIndexIT_24641F2A0356AEBD-001\tempDir-002\node_s2\indices\zDozxK_RTPCd927QhZfX2g\0\index\_2.cfs: java.nio.file.AccessDeniedException: C:\Users\jenkins\workspace\platform-support\42\server\build\testrun\internalClusterTest\temp\org.elasticsearch.action.admin.indices.create.SplitIndexIT_24641F2A0356AEBD-001\tempDir-002\node_s2\indices\zDozxK_RTPCd927QhZfX2g\0\index\_2.cfs
  1>    C:\Users\jenkins\workspace\platform-support\42\server\build\testrun\internalClusterTest\temp\org.elasticsearch.action.admin.indices.create.SplitIndexIT_24641F2A0356AEBD-001\tempDir-002\node_s2\indices\zDozxK_RTPCd927QhZfX2g\0\index\_1.cfs: java.nio.file.AccessDeniedException: C:\Users\jenkins\workspace\platform-support\42\server\build\testrun\internalClusterTest\temp\org.elasticsearch.action.admin.indices.create.SplitIndexIT_24641F2A0356AEBD-001\tempDir-002\node_s2\indices\zDozxK_RTPCd927QhZfX2g\0\index\_1.cfs
  1>    C:\Users\jenkins\workspace\platform-support\42\server\build\testrun\internalClusterTest\temp\org.elasticsearch.action.admin.indices.create.SplitIndexIT_24641F2A0356AEBD-001\tempDir-002\node_s2\indices\zDozxK_RTPCd927QhZfX2g\0\index\_0.cfs: java.nio.file.AccessDeniedException: C:\Users\jenkins\workspace\platform-support\42\server\build\testrun\internalClusterTest\temp\org.elasticsearch.action.admin.indices.create.SplitIndexIT_24641F2A0356AEBD-001\tempDir-002\node_s2\indices\zDozxK_RTPCd927QhZfX2g\0\index\_0.cfs
  1>    C:\Users\jenkins\workspace\platform-support\42\server\build\testrun\internalClusterTest\temp\org.elasticsearch.action.admin.indices.create.SplitIndexIT_24641F2A0356AEBD-001\tempDir-002\node_s2\indices\zDozxK_RTPCd927QhZfX2g\0\index: java.nio.file.DirectoryNotEmptyException: C:\Users\jenkins\workspace\platform-support\42\server\build\testrun\internalClusterTest\temp\org.elasticsearch.action.admin.indices.create.SplitIndexIT_24641F2A0356AEBD-001\tempDir-002\node_s2\indices\zDozxK_RTPCd927QhZfX2g\0\index
  1>    C:\Users\jenkins\workspace\platform-support\42\server\build\testrun\internalClusterTest\temp\org.elasticsearch.action.admin.indices.create.SplitIndexIT_24641F2A0356AEBD-001\tempDir-002\node_s2\indices\zDozxK_RTPCd927QhZfX2g\0: java.nio.file.DirectoryNotEmptyException: C:\Users\jenkins\workspace\platform-support\42\server\build\testrun\internalClusterTest\temp\org.elasticsearch.action.admin.indices.create.SplitIndexIT_24641F2A0356AEBD-001\tempDir-002\node_s2\indices\zDozxK_RTPCd927QhZfX2g\0
  1>    C:\Users\jenkins\workspace\platform-support\42\server\build\testrun\internalClusterTest\temp\org.elasticsearch.action.admin.indices.create.SplitIndexIT_24641F2A0356AEBD-001\tempDir-002\node_s2\indices\zDozxK_RTPCd927QhZfX2g: java.nio.file.DirectoryNotEmptyException: C:\Users\jenkins\workspace\platform-support\42\server\build\testrun\internalClusterTest\temp\org.elasticsearch.action.admin.indices.create.SplitIndexIT_24641F2A0356AEBD-001\tempDir-002\node_s2\indices\zDozxK_RTPCd927QhZfX2g
1> Caused by: java.nio.file.AccessDeniedException: C:\Users\jenkins\workspace\platform-support\42\server\build\testrun\internalClusterTest\temp\org.elasticsearch.action.admin.indices.create.SplitIndexIT_24641F2A0356AEBD-001\tempDir-002\node_s0\indices\l7teugWcQ_GXYt5PKOYV_w\7\index\recovery.Lfot0VCSR16-vueYBDdx1A._0.cfs -> C:\Users\jenkins\workspace\platform-support\42\server\build\testrun\internalClusterTest\temp\org.elasticsearch.action.admin.indices.create.SplitIndexIT_24641F2A0356AEBD-001\tempDir-002\node_s0\indices\l7teugWcQ_GXYt5PKOYV_w\7\index\_0.cfs
  1> 	at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:89)
  1> 	at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:103)
  1> 	at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:317)
  1> 	at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:293)
  1> 	at org.apache.lucene.tests.mockfile.FilterFileSystemProvider.move(FilterFileSystemProvider.java:144)
  1> 	at org.apache.lucene.tests.mockfile.FilterFileSystemProvider.move(FilterFileSystemProvider.java:144)
  1> 	at org.apache.lucene.tests.mockfile.FilterFileSystemProvider.move(FilterFileSystemProvider.java:144)
  1> 	at org.apache.lucene.tests.mockfile.FilterFileSystemProvider.move(FilterFileSystemProvider.java:144)
  1> 	at java.nio.file.Files.move(Files.java:1430)
  1> 	at org.apache.lucene.store.FSDirectory.rename(FSDirectory.java:272)
  1> 	at org.apache.lucene.store.FilterDirectory.rename(FilterDirectory.java:91)
  1> 	at org.apache.lucene.store.FilterDirectory.rename(FilterDirectory.java:91)
  1> 	at org.elasticsearch.index.store.Store.renameTempFilesSafe(Store.java:325)
  1> 	at org.elasticsearch.indices.recovery.MultiFileWriter.renameAllTempFiles(MultiFileWriter.java:223)
  1> 	at org.elasticsearch.indices.recovery.RecoveryTarget.lambda$cleanFiles$6(RecoveryTarget.java:494)
  1> 	at org.elasticsearch.action.ActionListener.completeWith(ActionListener.java:444)
  1> 	... 10 more

@masseyke
Member

Looking at the failure history, I only see this happening on Windows boxes. I'm not sure how this could be related to our test -- maybe something in the Windows build environment?

@masseyke
Member

I haven't gotten to the bottom of this yet, but thought I'd write up what I've learned so far:
I think I've sort of reproduced it. I say sort of because I don't have access to a Windows machine right now, and I think that you only get an AccessDeniedException when calling Files.delete() on a Windows box. On my Mac it doesn't complain. But I artificially slowed down closing the files (by adding a Thread.sleep locally in IOUtils.close()), and I'm able to reliably see it call Files.delete() before the file closes. The reason is that the code that closes the index runs asynchronously and the code that deletes shards does not wait for it.
The closing happens as a result of IndicesClusterStateService.applyClusterState. This runs first and usually the closing happens before the deleting, but the closing is done asynchronously.
The deleting happens as a result of IndicesStore.clusterChanged.
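
To make the ordering problem concrete, here is a minimal sketch of the race described above. It is a toy model only: the class and method names (CloseDeleteRaceSketch, run) are hypothetical and nothing here is Elasticsearch code. The asynchronous close is slowed down with a sleep, similar to the Thread.sleep experiment, and the delete step either waits for the close or does not.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical model of the race; none of these names come from Elasticsearch.
final class CloseDeleteRaceSketch {

    /** Records "close" and "delete" events in the order they actually happen. */
    static List<String> run(boolean deleteWaitsForClose) {
        List<String> events = new CopyOnWriteArrayList<>();
        ExecutorService closer = Executors.newSingleThreadExecutor();
        try {
            // Asynchronous close, artificially slowed down (assumed delay of 200 ms).
            CompletableFuture<Void> closed = CompletableFuture.runAsync(() -> {
                sleepQuietly(200);
                events.add("close");
            }, closer);

            if (deleteWaitsForClose) {
                closed.join(); // the delete path waits until the file has been closed
            }
            // In the logs above, this is the step that hits AccessDeniedException
            // when the file is still open.
            events.add("delete");

            closed.join(); // let the close finish before returning the event order
            return List.copyOf(events);
        } finally {
            closer.shutdown();
        }
    }

    private static void sleepQuietly(long millis) {
        try {
            TimeUnit.MILLISECONDS.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("delete does not wait: " + run(false));
        System.out.println("delete waits:         " + run(true));
    }
}
```

Without waiting, the delete is recorded before the close, which matches the ordering observed with the artificial delay. Waiting on the close future is only one possible way to order the two steps, not necessarily what a real fix would do.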

@masseyke
Member

Here is the code that leads to file closure (several async calls):

startRecovery:173, PeerRecoveryTargetService (org.elasticsearch.indices.recovery)
startRecovery:3040, IndexShard (org.elasticsearch.index.shard)
createShard:853, IndicesService (org.elasticsearch.indices)
createShard:175, IndicesService (org.elasticsearch.indices)
createShard:571, IndicesClusterStateService (org.elasticsearch.indices.cluster)
createOrUpdateShard:510, IndicesClusterStateService (org.elasticsearch.indices.cluster)
createIndicesAndUpdateShards:495, IndicesClusterStateService (org.elasticsearch.indices.cluster)
applyClusterState:226, IndicesClusterStateService (org.elasticsearch.indices.cluster)
callClusterStateAppliers:538, ClusterApplierService (org.elasticsearch.cluster.service)
callClusterStateAppliers:524, ClusterApplierService (org.elasticsearch.cluster.service)
applyChanges:497, ClusterApplierService (org.elasticsearch.cluster.service)
runTask:428, ClusterApplierService (org.elasticsearch.cluster.service)
run:154, ClusterApplierService$UpdateTask (org.elasticsearch.cluster.service)
run:891, ThreadContext$ContextPreservingRunnable (org.elasticsearch.common.util.concurrent)
runAndClean:257, PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable (org.elasticsearch.common.util.concurrent)
run:223, PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable (org.elasticsearch.common.util.concurrent)
runWorker:1136, ThreadPoolExecutor (java.util.concurrent)
run:635, ThreadPoolExecutor$Worker (java.util.concurrent)
run:833, Thread (java.lang)

recoverToTarget:164, RecoverySourceHandler (org.elasticsearch.indices.recovery)
recover:163, PeerRecoverySourceService (org.elasticsearch.indices.recovery)
messageReceived:182, PeerRecoverySourceService$StartRecoveryTransportRequestHandler (org.elasticsearch.indices.recovery)
messageReceived:179, PeerRecoverySourceService$StartRecoveryTransportRequestHandler (org.elasticsearch.indices.recovery)
processMessageReceived:71, RequestHandlerRegistry (org.elasticsearch.transport)
doRun:296, InboundHandler$1 (org.elasticsearch.transport)
doRun:958, ThreadContext$ContextPreservingAbstractRunnable (org.elasticsearch.common.util.concurrent)
run:26, AbstractRunnable (org.elasticsearch.common.util.concurrent)
runWorker:1136, ThreadPoolExecutor (java.util.concurrent)
run:635, ThreadPoolExecutor$Worker (java.util.concurrent)
run:833, Thread (java.lang)

close:94, IOUtils (org.elasticsearch.core) [2]
close:114, IOUtils (org.elasticsearch.core)
close:72, IOUtils (org.elasticsearch.core)
close:1418, RecoverySourceHandler$3 (org.elasticsearch.indices.recovery)
close:97, IOUtils (org.elasticsearch.core) [1]
close:146, IOUtils (org.elasticsearch.core)
lambda$recoverToTarget$16:391, RecoverySourceHandler (org.elasticsearch.indices.recovery)
accept:-1, RecoverySourceHandler$$Lambda$4965/0x00000008016badf8 (org.elasticsearch.indices.recovery)
onResponse:165, ActionListener$2 (org.elasticsearch.action)
notifyListenerDirectly:113, ListenableFuture (org.elasticsearch.common.util.concurrent)
done:100, ListenableFuture (org.elasticsearch.common.util.concurrent)
set:131, BaseFuture (org.elasticsearch.common.util.concurrent)
onResponse:139, ListenableFuture (org.elasticsearch.common.util.concurrent)
onResponse:56, StepListener (org.elasticsearch.action)
completeFinalizationListener:1282, RecoverySourceHandler (org.elasticsearch.indices.recovery)
lambda$finalizeRecovery$43:1266, RecoverySourceHandler (org.elasticsearch.indices.recovery)
accept:-1, RecoverySourceHandler$$Lambda$5088/0x00000008016df730 (org.elasticsearch.indices.recovery)
onResponse:165, ActionListener$2 (org.elasticsearch.action)
onResponse:442, ActionListener$RunBeforeActionListener (org.elasticsearch.action)
onResponse:797, IndexShard$2$1 (org.elasticsearch.index.shard)
onResponse:788, IndexShard$2$1 (org.elasticsearch.index.shard)
onResponse:130, ActionListener$MappedActionListener (org.elasticsearch.action)
handleResponse:43, ActionListenerResponseHandler (org.elasticsearch.action)
handleResponse:1362, TransportService$ContextRestoreResponseHandler (org.elasticsearch.transport)
doHandleResponse:400, InboundHandler (org.elasticsearch.transport)
doRun:357, InboundHandler$2 (org.elasticsearch.transport)
doRun:958, ThreadContext$ContextPreservingAbstractRunnable (org.elasticsearch.common.util.concurrent)
run:26, AbstractRunnable (org.elasticsearch.common.util.concurrent)
runWorker:1136, ThreadPoolExecutor (java.util.concurrent)
run:635, ThreadPoolExecutor$Worker (java.util.concurrent)
run:833, Thread (java.lang)

And here is the code that leads to file deletes (also several async calls):

deleteShardIfExistElseWhere:213, IndicesStore (org.elasticsearch.indices.store)
clusterChanged:170, IndicesStore (org.elasticsearch.indices.store)
callClusterStateListener:558, ClusterApplierService (org.elasticsearch.cluster.service)
callClusterStateListeners:544, ClusterApplierService (org.elasticsearch.cluster.service)
applyChanges:504, ClusterApplierService (org.elasticsearch.cluster.service)
runTask:428, ClusterApplierService (org.elasticsearch.cluster.service)
run:154, ClusterApplierService$UpdateTask (org.elasticsearch.cluster.service)
run:891, ThreadContext$ContextPreservingRunnable (org.elasticsearch.common.util.concurrent)
runAndClean:257, PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable (org.elasticsearch.common.util.concurrent)
run:223, PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable (org.elasticsearch.common.util.concurrent)
runWorker:1136, ThreadPoolExecutor (java.util.concurrent)
run:635, ThreadPoolExecutor$Worker (java.util.concurrent)
run:833, Thread (java.lang)

allNodesResponded:278, IndicesStore$ShardActiveResponseHandler (org.elasticsearch.indices.store)
handleResponse:265, IndicesStore$ShardActiveResponseHandler (org.elasticsearch.indices.store)
handleResponse:236, IndicesStore$ShardActiveResponseHandler (org.elasticsearch.indices.store)
handleResponse:1362, TransportService$ContextRestoreResponseHandler (org.elasticsearch.transport)
doHandleResponse:400, InboundHandler (org.elasticsearch.transport)
handleResponse:349, InboundHandler (org.elasticsearch.transport)
messageReceived:144, InboundHandler (org.elasticsearch.transport)
inboundMessage:97, InboundHandler (org.elasticsearch.transport)
inboundMessage:808, TcpTransport (org.elasticsearch.transport)
accept:-1, Netty4MessageInboundHandler$$Lambda$2874/0x0000000801413e00 (org.elasticsearch.transport.netty4)
forwardFragments:150, InboundPipeline (org.elasticsearch.transport)
doHandleBytes:121, InboundPipeline (org.elasticsearch.transport)
handleBytes:86, InboundPipeline (org.elasticsearch.transport)
channelRead:63, Netty4MessageInboundHandler (org.elasticsearch.transport.netty4)
invokeChannelRead:444, AbstractChannelHandlerContext (io.netty.channel)
invokeChannelRead:420, AbstractChannelHandlerContext (io.netty.channel)
fireChannelRead:412, AbstractChannelHandlerContext (io.netty.channel)
channelRead:280, LoggingHandler (io.netty.handler.logging)
invokeChannelRead:442, AbstractChannelHandlerContext (io.netty.channel)
invokeChannelRead:420, AbstractChannelHandlerContext (io.netty.channel)
fireChannelRead:412, AbstractChannelHandlerContext (io.netty.channel)
channelRead:103, MessageToMessageDecoder (io.netty.handler.codec)
invokeChannelRead:444, AbstractChannelHandlerContext (io.netty.channel)
invokeChannelRead:420, AbstractChannelHandlerContext (io.netty.channel)
fireChannelRead:412, AbstractChannelHandlerContext (io.netty.channel)
channelRead:1410, DefaultChannelPipeline$HeadContext (io.netty.channel)
invokeChannelRead:440, AbstractChannelHandlerContext (io.netty.channel)
invokeChannelRead:420, AbstractChannelHandlerContext (io.netty.channel)
fireChannelRead:919, DefaultChannelPipeline (io.netty.channel)
read:166, AbstractNioByteChannel$NioByteUnsafe (io.netty.channel.nio)
processSelectedKey:788, NioEventLoop (io.netty.channel.nio)
processSelectedKeysPlain:689, NioEventLoop (io.netty.channel.nio)
processSelectedKeys:652, NioEventLoop (io.netty.channel.nio)
run:562, NioEventLoop (io.netty.channel.nio)
run:997, SingleThreadEventExecutor$4 (io.netty.util.concurrent)
run:74, ThreadExecutorMap$2 (io.netty.util.internal)
run:833, Thread (java.lang)

visitFile:265, IOUtils$1 (org.elasticsearch.core)
visitFile:240, IOUtils$1 (org.elasticsearch.core)
walkFileTree:2812, Files (java.nio.file)
walkFileTree:2883, Files (java.nio.file)
rm:240, IOUtils (org.elasticsearch.core)
rm:224, IOUtils (org.elasticsearch.core)
deleteShardDirectoryUnderLock:718, NodeEnvironment (org.elasticsearch.env)
deleteShardDirectorySafe:667, NodeEnvironment (org.elasticsearch.env)
deleteShardStore:1080, IndicesService (org.elasticsearch.indices)
lambda$allNodesResponded$2:313, IndicesStore$ShardActiveResponseHandler (org.elasticsearch.indices.store)
accept:-1, IndicesStore$ShardActiveResponseHandler$$Lambda$5143/0x00000008016e6788 (org.elasticsearch.indices.store)
lambda$runOnApplierThread$0:298, ClusterApplierService (org.elasticsearch.cluster.service)
apply:-1, ClusterApplierService$$Lambda$5146/0x00000008016e6e28 (org.elasticsearch.cluster.service)
runTask:397, ClusterApplierService (org.elasticsearch.cluster.service)
run:154, ClusterApplierService$UpdateTask (org.elasticsearch.cluster.service)
run:891, ThreadContext$ContextPreservingRunnable (org.elasticsearch.common.util.concurrent)
runAndClean:257, PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable (org.elasticsearch.common.util.concurrent)
run:223, PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable (org.elasticsearch.common.util.concurrent)
runWorker:1136, ThreadPoolExecutor (java.util.concurrent)
run:635, ThreadPoolExecutor$Worker (java.util.concurrent)
run:833, Thread (java.lang)
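
The delete path above ends in a file-tree walk (IOUtils.rm calling Files.walkFileTree), and the earlier log lists the paths that could not be removed "in the order of attempts". The following is a rough, self-contained sketch of that style of recursive delete. It is not the IOUtils implementation; RecursiveDeleteSketch, deleteTree and createSampleTree are made-up names for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; not the Elasticsearch IOUtils code.
final class RecursiveDeleteSketch {

    /** Deletes root recursively and returns the paths that could not be removed, in attempt order. */
    static Map<Path, IOException> deleteTree(Path root) {
        Map<Path, IOException> failures = new LinkedHashMap<>();
        try {
            Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                    tryDelete(file, failures);
                    return FileVisitResult.CONTINUE;
                }

                @Override
                public FileVisitResult postVisitDirectory(Path dir, IOException exc) {
                    // If a file inside could not be deleted, this fails with DirectoryNotEmptyException.
                    tryDelete(dir, failures);
                    return FileVisitResult.CONTINUE;
                }

                @Override
                public FileVisitResult visitFileFailed(Path file, IOException exc) {
                    failures.put(file, exc);
                    return FileVisitResult.CONTINUE;
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return failures;
    }

    private static void tryDelete(Path path, Map<Path, IOException> failures) {
        try {
            Files.delete(path);
        } catch (IOException e) {
            failures.put(path, e); // keep going and report everything at the end
        }
    }

    /** Builds a small shard-like layout (0/index/_0.cfs and 1/index/_0.cfs) in a temp directory. */
    static Path createSampleTree() {
        try {
            Path root = Files.createTempDirectory("delete-sketch");
            for (String shard : new String[] {"0", "1"}) {
                Path index = Files.createDirectories(root.resolve(shard).resolve("index"));
                Files.write(index.resolve("_0.cfs"), new byte[] {1, 2, 3});
            }
            return root;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path root = createSampleTree();
        System.out.println("could not remove: " + deleteTree(root).keySet());
    }
}
```

With this shape, one undeletable file produces a chain of failures: the file itself, then every parent directory. That is consistent with the AccessDeniedException entries followed by DirectoryNotEmptyException entries in the log above.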

@masseyke
Member

Since this appears to be internal to the splitting logic, I'm turning it over to the distributed team. Let me know if you disagree.

@masseyke masseyke added :Distributed Indexing/Store Issues around managing unopened Lucene indices. If it touches Store.java, this is a likely label. Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. and removed :Data Management/Indices APIs APIs to create and manage indices and templates Team:Data Management Meta label for data/management team labels Jan 17, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@arteam arteam self-assigned this Jan 17, 2023
@csoulios
Contributor

csoulios commented Jan 26, 2023

Failed again today:
https://gradle-enterprise.elastic.co/s/nc57sqz4ofd3g/

Test failed on Windows again

@craigtaverner
Contributor

@pxsalehi
Member

pxsalehi commented Feb 1, 2023

I'll mute it.

@pxsalehi
Member

pxsalehi commented Feb 6, 2023

Thanks @masseyke. This all seems to point to a long-known issue with Windows:

@DaveCTurner seems to have spent some time getting to the bottom of this, among others.

Actually the test under this one, i.e. testSplitFromOneToN, does skip Windows. I'm just surprised that meanwhile this one kept working somehow and was not skipped. Although it does seem to have failed regularly on Windows. Considering these two tests share most of their code, I'd suggest we skip this test on Windows too.
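
For illustration, a platform skip of this kind can be sketched in plain Java as below. The names (WindowsSkipSketch, runUnlessWindows) are hypothetical. In a JUnit or Lucene-based suite the same effect would more likely come from an assumption such as assumeFalse on an OS constant, but that is an assumption here and not something shown in this thread.

```java
// Hypothetical sketch of skipping a test body on Windows; not taken from SplitIndexIT.
final class WindowsSkipSketch {

    /** Treats any os.name value starting with "Windows" as Windows. */
    static boolean isWindows(String osName) {
        return osName != null && osName.startsWith("Windows");
    }

    /** Checks the JVM's own os.name system property. */
    static boolean isWindows() {
        return isWindows(System.getProperty("os.name"));
    }

    /** Runs the body unless the given OS is Windows; returns true if the body ran. */
    static boolean runUnlessWindows(String osName, Runnable testBody) {
        if (isWindows(osName)) {
            return false; // skipped: open files may not be deletable on this platform
        }
        testBody.run();
        return true;
    }

    public static void main(String[] args) {
        boolean ran = runUnlessWindows(System.getProperty("os.name"),
            () -> System.out.println("running split test body"));
        System.out.println(ran ? "test ran" : "test skipped on Windows");
    }
}
```

Skipping hides the failure without fixing the underlying close/delete ordering, so it is a trade-off between CI noise and coverage on Windows.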

@pxsalehi
Member

pxsalehi commented Feb 6, 2023

For the record, the failures seem to have been mostly in the past 3-4 months, and on Windows, and the few I've checked all seem to have had the Caused by: java.nio.file.AccessDeniedException in their logs. Not sure why this wasn't happening before.

@kingherc
Contributor

kingherc commented Feb 9, 2023

Happened again on 8.6: https://gradle-enterprise.elastic.co/s/s5a3nqhkhauhg

The backport of the above fix to 8.6 is probably missing.

@kingherc kingherc reopened this Feb 9, 2023
pxsalehi added a commit to pxsalehi/elasticsearch that referenced this issue Feb 9, 2023
elasticsearchmachine pushed a commit that referenced this issue Feb 9, 2023
pxsalehi added a commit to pxsalehi/elasticsearch that referenced this issue Feb 9, 2023
8 participants