[CI] ShrinkIndexIT.testShrinkIndexPrimaryTerm fails #28153

cbuescher · 2018-01-09T14:06:45Z

The test ShrinkIndexIT.testShrinkIndexPrimaryTerm() failed today on CI:
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-darwin-compatibility/1526/console

The first this that looks like an error in the log is this:

13:29:05   1> [2018-01-09T23:27:23,148][WARN ][o.e.i.c.IndicesClusterStateService] [node_s1] [[target][0]] marking and sending shard failed due to [failed to create shard]
13:29:05   1> org.elasticsearch.index.IndexNotFoundException: no such index
13:29:05   1> 	at org.elasticsearch.index.shard.IndexShard.startRecovery(IndexShard.java:2075) ~[main/:?]
13:29:05   1> 	at org.elasticsearch.indices.IndicesService.createShard(IndicesService.java:516) ~[main/:?]
13:29:05   1> 	at org.elasticsearch.indices.IndicesService.createShard(IndicesService.java:143) ~[main/:?]
13:29:05   1> 	at org.elasticsearch.indices.cluster.IndicesClusterStateService.createShard(IndicesClusterStateService.java:552) ~[main/:?]
13:29:05   1> 	at org.elasticsearch.indices.cluster.IndicesClusterStateService.createOrUpdateShards(IndicesClusterStateService.java:529) ~[main/:?]
13:29:05   1> 	at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:231) ~[main/:?]
13:29:05   1> 	at org.elasticsearch.cluster.service.ClusterApplierService.lambda$callClusterStateAppliers$6(ClusterApplierService.java:498) ~[main/:?]
13:29:05   1> 	at java.lang.Iterable.forEach(Iterable.java:75) [?:1.8.0_141]
13:29:05   1> 	at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:495) [main/:?]
13:29:05   1> 	at org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:482) [main/:?]
13:29:05   1> 	at org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:432) [main/:?]
13:29:05   1> 	at org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:161) [main/:?]
13:29:05   1> 	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:566) [main/:?]
13:29:05   1> 	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:244) [main/:?]
13:29:05   1> 	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:207) [main/:?]
13:29:05   1> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_141]
13:29:05   1> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_141]
13:29:05   1> 	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_141]

The test seems to time out waiting for green state as a consequence:

13:29:07   1> [2018-01-09T23:29:04,372][INFO ][o.e.a.a.i.c.ShrinkIndexIT] [testShrinkIndexPrimaryTerm]: after test
13:29:07 FAILURE 90.4s J0 | ShrinkIndexIT.testShrinkIndexPrimaryTerm <<< FAILURES!
13:29:07    > Throwable #1: java.lang.AssertionError: timed out waiting for green state
13:29:07    > 	at __randomizedtesting.SeedInfo.seed([CD5C54E500E5EC92:E7D0617C1EBDF9DF]:0)
13:29:07    > 	at org.elasticsearch.test.ESIntegTestCase.ensureColor(ESIntegTestCase.java:947)
13:29:07    > 	at org.elasticsearch.test.ESIntegTestCase.ensureGreen(ESIntegTestCase.java:913)
13:29:07    > 	at org.elasticsearch.test.ESIntegTestCase.ensureGreen(ESIntegTestCase.java:902)
13:29:07    > 	at org.elasticsearch.action.admin.indices.create.ShrinkIndexIT.testShrinkIndexPrimaryTerm(ShrinkIndexIT.java:167)
13:29:07    > 	at java.lang.Thread.run(Thread.java:748)

./gradlew :core:integTest -Dtests.seed=CD5C54E500E5EC92 -Dtests.class=org.elasticsearch.action.admin.indices.create.ShrinkIndexIT -Dtests.method="testShrinkIndexPrimaryTerm" -Dtests.security.manager=true -Dtests.locale=ar-AE -Dtests.timezone=Asia/Ust-Nera

The text was updated successfully, but these errors were encountered:

dnhatn · 2018-01-09T15:25:38Z

Log: testShrinkIndexPrimaryTerm.txt

dnhatn · 2018-01-26T17:08:31Z

Another instance on master https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-unix-compatibility/os=sles/1929/console

Log: testShrinkIndexPrimaryTerm.txt

If we have created 210 shards, we may need more than 30 seconds for all shards become green. Relates #28153

In some cases testShrinkIndexPrimaryTerm creates then 'mutates' 210 shards. If each shard opens more than 10 files (translog, lucene index), we exceeded the maximum allowed file handles. In our test, the number of file handles is limited to 2048 by HandleLimitFS. This commit reduces the number of shards in testShrinkIndexPrimaryTerm to avoid such errors. Closes #28153

mayya-sharipova · 2018-07-05T19:00:06Z

The same test failed on master Java 10 today.
Is not reproduced locally for me.

REPRODUCE WITH: ./gradlew :server:integTest \
  -Dtests.seed=2A7F48E21D6F8FBF \
  -Dtests.class=org.elasticsearch.action.admin.indices.create.ShrinkIndexIT \
  -Dtests.method="testShrinkIndexPrimaryTerm" \
  -Dtests.security.manager=true \
  -Dtests.locale=hy \
  -Dtests.timezone=America/Sitka

Console log: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+matrix-java-periodic/ES_BUILD_JAVA=java10,ES_RUNTIME_JAVA=java10,nodes=virtual&&linux/158/console

Errors:

16:10:08 FAILURE 4.58s J1 | ShrinkIndexIT.testShrinkIndexPrimaryTerm <<< FAILURES!
16:10:08    > Throwable #1: java.lang.AssertionError: Accounting breaker not reset to 0 on node: node_s2
16:10:08    > Expected: <0L>
16:10:08    >      but: was <1322L>
16:10:08    > 	at __randomizedtesting.SeedInfo.seed([2A7F48E21D6F8FBF:F37D7B03379AF2]:0)
16:10:08    > 	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
16:10:08    > 	at org.elasticsearch.test.InternalTestCluster.ensureEstimatedStats(InternalTestCluster.java:2020)
16:10:08    > 	at org.elasticsearch.test.TestCluster.assertAfterTest(TestCluster.java:93)
16:10:08    > 	at org.elasticsearch.test.InternalTestCluster.assertAfterTest(InternalTestCluster.java:2051)
16:10:08    > 	at org.elasticsearch.test.ESIntegTestCase.afterInternal(ESIntegTestCase.java:592)
16:10:08    > 	at org.elasticsearch.test.ESIntegTestCase.cleanUpCluster(ESIntegTestCase.java:2185)
16:10:08    > 	at jdk.internal.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
16:10:08    > 	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
16:10:08    > 	at java.base/java.lang.reflect.Method.invoke(Method.java:564)
16:10:08    > 	at java.base/java.lang.Thread.run(Thread.java:844)

elasticmachine · 2018-07-05T19:00:22Z

Pinging @elastic/es-distributed

dnhatn · 2018-07-05T19:12:57Z

16:10:08 > Throwable #1: java.lang.AssertionError: Accounting breaker not reset to 0 on node: node_s2
16:10:08 > Expected: <0L>
16:10:08 > but: was <1322L>

@mayya-sharipova I think this test failed because the accounting breaker was not reset correctly. This is a known issue and being tracked at #30290. I added this failure to #30290.

tlrx · 2019-02-27T16:31:38Z

This test failed again today:
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+7.x+internalClusterTest/1370

With the same error as the one reported in the description of the issue:

FAILURE 30.6s J6 | ShrinkIndexIT.testShrinkIndexPrimaryTerm <<< FAILURES!
   > Throwable #1: java.lang.AssertionError: timed out waiting for green state
   > 	at __randomizedtesting.SeedInfo.seed([6DE07894235EC66A:476C4D0D3D06D327]:0)
   > 	at org.elasticsearch.test.ESIntegTestCase.ensureColor(ESIntegTestCase.java:975)
   > 	at org.elasticsearch.test.ESIntegTestCase.ensureGreen(ESIntegTestCase.java:931)
   > 	at org.elasticsearch.test.ESIntegTestCase.ensureGreen(ESIntegTestCase.java:920)
   > 	at org.elasticsearch.action.admin.indices.create.ShrinkIndexIT.testShrinkIndexPrimaryTerm(ShrinkIndexIT.java:221)
   > 	at java.lang.Thread.run(Thread.java:748)

It does not seem that the accounting breaker is in cause here.

consoleText.txt

ywelsch · 2019-05-07T13:43:16Z

This test continues to fail (newest failure: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+7.x+release-tests/308/console)
It seems to be just too brittle. I wonder if we should make it so that the test waits after each primary failure (i.e. call to failShard) for the primary term to be incremented, i.e., the failure to have taken effect. WDYT @dnhatn ?

This test can create and shuffle 2*(3*5*7) = 210 shards which is quite heavy for our CI. This commit reduces the load, so we don't timeout on CI. Closes #28153

alpar-t · 2019-05-22T14:55:39Z

This failed again : https://elasticsearch-ci.elastic.co/view/Elasticsearch/job/elastic+elasticsearch+master+matrix-java-periodic/ES_BUILD_JAVA=openjdk12,ES_RUNTIME_JAVA=java12,nodes=immutable&&linux&&docker/434/

dnhatn · 2019-05-22T14:59:13Z

java.lang.AssertionError: Accounting breaker not reset to 0 on node: node_sd2, are there still Lucene indices around?
Expected: <0L>
     but: was <4172L>

Closing this issue again since the actual failure is tracked #30290.

This test can create and shuffle 2*(3*5*7) = 210 shards which is quite heavy for our CI. This commit reduces the load, so we don't timeout on CI. Closes elastic#28153

cbuescher added >test Issues or PRs that are addressing/adding tests >test-failure Triaged test failures from CI labels Jan 9, 2018

cbuescher assigned bleskes Jan 9, 2018

dnhatn self-assigned this Jan 9, 2018

dnhatn added a commit that referenced this issue Jan 26, 2018

Increase timeout for ensureGreen in testShrinkIndexPrimaryTerm

583085d

If we have created 210 shards, we may need more than 30 seconds for all shards become green. Relates #28153

dnhatn added a commit that referenced this issue Jan 26, 2018

Increase timeout for ensureGreen in testShrinkIndexPrimaryTerm

877c594

If we have created 210 shards, we may need more than 30 seconds for all shards become green. Relates #28153

dnhatn closed this as completed in 414b5de Jan 26, 2018

mayya-sharipova reopened this Jul 5, 2018

mayya-sharipova added the :Distributed Indexing/Distributed A catch all label for anything in the Distributed Area. Please avoid if you can. label Jul 5, 2018

dnhatn closed this as completed Jul 5, 2018

tlrx reopened this Feb 27, 2019

tlrx mentioned this issue Feb 28, 2019

Backport support for replicating closed indices to 7.x #39506

Merged

ywelsch unassigned bleskes May 7, 2019

dnhatn mentioned this issue May 10, 2019

Adjust load and timeout in testShrinkIndexPrimaryTerm #42098

Merged

dnhatn closed this as completed in #42098 May 14, 2019

dnhatn added a commit that referenced this issue May 14, 2019

Adjust load and timeout in testShrinkIndexPrimaryTerm (#42098)

77d0c79

This test can create and shuffle 2*(3*5*7) = 210 shards which is quite heavy for our CI. This commit reduces the load, so we don't timeout on CI. Closes #28153

dnhatn added a commit that referenced this issue May 15, 2019

Adjust load and timeout in testShrinkIndexPrimaryTerm (#42098)

a75e916

This test can create and shuffle 2*(3*5*7) = 210 shards which is quite heavy for our CI. This commit reduces the load, so we don't timeout on CI. Closes #28153

dnhatn added a commit that referenced this issue May 15, 2019

Adjust load and timeout in testShrinkIndexPrimaryTerm (#42098)

66d555b

This test can create and shuffle 2*(3*5*7) = 210 shards which is quite heavy for our CI. This commit reduces the load, so we don't timeout on CI. Closes #28153

dnhatn added a commit that referenced this issue May 16, 2019

Adjust load and timeout in testShrinkIndexPrimaryTerm (#42098)

f50346b

This test can create and shuffle 2*(3*5*7) = 210 shards which is quite heavy for our CI. This commit reduces the load, so we don't timeout on CI. Closes #28153

dnhatn added a commit that referenced this issue May 16, 2019

Adjust load and timeout in testShrinkIndexPrimaryTerm (#42098)

5bca517

This test can create and shuffle 2*(3*5*7) = 210 shards which is quite heavy for our CI. This commit reduces the load, so we don't timeout on CI. Closes #28153

alpar-t reopened this May 22, 2019

dnhatn closed this as completed May 22, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[CI] ShrinkIndexIT.testShrinkIndexPrimaryTerm fails #28153

[CI] ShrinkIndexIT.testShrinkIndexPrimaryTerm fails #28153

cbuescher commented Jan 9, 2018 •

edited by dakrone

Loading

dnhatn commented Jan 9, 2018

dnhatn commented Jan 26, 2018

mayya-sharipova commented Jul 5, 2018 •

edited

Loading

elasticmachine commented Jul 5, 2018

dnhatn commented Jul 5, 2018

tlrx commented Feb 27, 2019

ywelsch commented May 7, 2019

alpar-t commented May 22, 2019

dnhatn commented May 22, 2019

[CI] ShrinkIndexIT.testShrinkIndexPrimaryTerm fails #28153

[CI] ShrinkIndexIT.testShrinkIndexPrimaryTerm fails #28153

Comments

cbuescher commented Jan 9, 2018 • edited by dakrone Loading

dnhatn commented Jan 9, 2018

dnhatn commented Jan 26, 2018

mayya-sharipova commented Jul 5, 2018 • edited Loading

elasticmachine commented Jul 5, 2018

dnhatn commented Jul 5, 2018

tlrx commented Feb 27, 2019

ywelsch commented May 7, 2019

alpar-t commented May 22, 2019

dnhatn commented May 22, 2019

cbuescher commented Jan 9, 2018 •

edited by dakrone

Loading

mayya-sharipova commented Jul 5, 2018 •

edited

Loading