[CI] ClusterDisruptionIT#testAckedIndexing failure on 6.6 #38318

cbuescher · 2019-02-04T10:35:43Z

Build: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.6+periodic/281/console

Reproduce

./gradlew :server:integTest \
  -Dtests.seed=28EA74D48DC5F83F \
  -Dtests.class=org.elasticsearch.discovery.ClusterDisruptionIT \
  -Dtests.method="testAckedIndexing" \
  -Dtests.security.manager=true \
  -Dtests.locale=zh-SG \
  -Dtests.timezone=Asia/Almaty \
  -Dcompiler.java=11 \
  -Druntime.java=8

I ran this in a loop locally and it took about 15 repetitions before I got the first failure. It looks like this:

   > Throwable #1: java.lang.AssertionError: [test][1], node[L-28xZP7Q0WyfVS1JrcTHA], [R], s[STARTED], a[id=EX57TRsNTX-vc3tfmteydQ] seq_no_stats mismatch
   > Expected: <SeqNoStats{maxSeqNo=4, localCheckpoint=4, globalCheckpoint=4}>
   >      but: was <SeqNoStats{maxSeqNo=4, localCheckpoint=4, globalCheckpoint=1}>
   >    at __randomizedtesting.SeedInfo.seed([28EA74D48DC5F83F:A22BC027D1B81E74]:0)
   >    at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
   >    at org.elasticsearch.test.InternalTestCluster.lambda$assertSeqNos$8(InternalTestCluster.java:1328)
   >    at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:848)
   >    at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:822)
   >    at org.elasticsearch.test.InternalTestCluster.assertSeqNos(InternalTestCluster.java:1298)
   >    at org.elasticsearch.discovery.AbstractDisruptionTestCase.beforeIndexDeletion(AbstractDisruptionTestCase.java:104)
   >    at org.elasticsearch.test.ESIntegTestCase.afterInternal(ESIntegTestCase.java:585)
   >    at org.elasticsearch.test.ESIntegTestCase.cleanUpCluster(ESIntegTestCase.java:2199)
   >    at java.lang.Thread.run(Thread.java:748)
   >    Suppressed: java.lang.AssertionError: [test][1], node[L-28xZP7Q0WyfVS1JrcTHA], [R], s[STARTED], a[id=EX57TRsNTX-vc3tfmteydQ] seq_no_stats mismatch
   > Expected: <SeqNoStats{maxSeqNo=4, localCheckpoint=4, globalCheckpoint=4}>
   >      but: was <SeqNoStats{maxSeqNo=4, localCheckpoint=4, globalCheckpoint=1}>
   >            at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
   >            at org.elasticsearch.test.InternalTestCluster.lambda$assertSeqNos$8(InternalTestCluster.java:1328)
   >            at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:836)
   >            ... 40 more

This is also what I see in the build logs.
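For anyone else trying to reproduce this: since the failure only shows up after a number of runs, a small wrapper that re-runs a command until it first fails can save some babysitting. This is a minimal sketch, not the exact loop used above. The function name `run_until_failure` and the attempt limit are made up for illustration; only the Gradle invocation itself comes from the reproduce line.

```shell
#!/bin/sh
# Re-run a command until it fails or a maximum number of attempts is reached.
# Usage: run_until_failure MAX_ATTEMPTS COMMAND [ARGS...]
# Returns 1 as soon as the command fails, 0 if every attempt passed.
# (Hypothetical helper; not part of the Elasticsearch build.)
run_until_failure() {
  ruf_max="$1"
  shift
  ruf_attempt=1
  while [ "$ruf_attempt" -le "$ruf_max" ]; do
    echo "attempt ${ruf_attempt}/${ruf_max}"
    if ! "$@"; then
      echo "failed on attempt ${ruf_attempt}"
      return 1
    fi
    ruf_attempt=$((ruf_attempt + 1))
  done
  echo "no failure in ${ruf_max} attempts"
  return 0
}

# Example (not executed here): wrap the reproduce line from this issue.
# run_until_failure 50 ./gradlew :server:integTest \
#   -Dtests.seed=28EA74D48DC5F83F \
#   -Dtests.class=org.elasticsearch.discovery.ClusterDisruptionIT \
#   -Dtests.method="testAckedIndexing"
```

The wrapper stops at the first failing run so the build output of that run is the last thing in the terminal. The remaining `-D` flags from the reproduce line can be appended unchanged.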


cbuescher · 2019-02-04T10:36:57Z

@ywelsch I'll keep this unmuted for now, since I'm not sure the issue is specific to this particular test. Let me know if you think it is specific enough that muting would work.

We verify that seq_no_stats is aligned between copies at the end of some disruption tests. Sometimes the `assertSeqNos` assertion trips because the global checkpoint on a replica lags behind. It lags because we sync the global checkpoint 30 seconds (by default) after the last replication operation. This change reduces the global checkpoint sync interval to 1s in the disruption tests.

Closes #38318. Closes #36789.

cbuescher added the >test-failure, :Distributed Indexing/Distributed, and v6.6.1 labels Feb 4, 2019

cbuescher assigned ywelsch Feb 4, 2019

dnhatn assigned dnhatn and unassigned ywelsch Feb 10, 2019

jakelandis added v6.6.2 and removed v6.6.1 labels Feb 13, 2019

dnhatn mentioned this issue Feb 14, 2019

Reduce global checkpoint sync interval in disruption tests #38931

Merged

dnhatn closed this as completed in #38931 Feb 15, 2019
