[CI] ClusterDisruptionIT#testAckedIndexing failure on 6.6 #38318

Closed
cbuescher opened this issue Feb 4, 2019 · 1 comment · Fixed by #38931
Labels
:Distributed Indexing/Distributed A catch all label for anything in the Distributed Area. Please avoid if you can. >test-failure Triaged test failures from CI v6.6.2

Comments

@cbuescher
Member

Build: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.6+periodic/281/console

Reproduce

./gradlew :server:integTest \
  -Dtests.seed=28EA74D48DC5F83F \
  -Dtests.class=org.elasticsearch.discovery.ClusterDisruptionIT \
  -Dtests.method="testAckedIndexing" \
  -Dtests.security.manager=true \
  -Dtests.locale=zh-SG \
  -Dtests.timezone=Asia/Almaty \
  -Dcompiler.java=11 \
  -Druntime.java=8

I kept this running in a loop for about 15 repetitions locally before I got the first failure. It looks like this:

   > Throwable #1: java.lang.AssertionError: [test][1], node[L-28xZP7Q0WyfVS1JrcTHA], [R], s[STARTED], a[id=EX57TRsNTX-vc3tfmteydQ] seq_no_stats mismatch
   > Expected: <SeqNoStats{maxSeqNo=4, localCheckpoint=4, globalCheckpoint=4}>
   >      but: was <SeqNoStats{maxSeqNo=4, localCheckpoint=4, globalCheckpoint=1}>
   >    at __randomizedtesting.SeedInfo.seed([28EA74D48DC5F83F:A22BC027D1B81E74]:0)
   >    at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
   >    at org.elasticsearch.test.InternalTestCluster.lambda$assertSeqNos$8(InternalTestCluster.java:1328)
   >    at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:848)
   >    at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:822)
   >    at org.elasticsearch.test.InternalTestCluster.assertSeqNos(InternalTestCluster.java:1298)
   >    at org.elasticsearch.discovery.AbstractDisruptionTestCase.beforeIndexDeletion(AbstractDisruptionTestCase.java:104)
   >    at org.elasticsearch.test.ESIntegTestCase.afterInternal(ESIntegTestCase.java:585)
   >    at org.elasticsearch.test.ESIntegTestCase.cleanUpCluster(ESIntegTestCase.java:2199)
   >    at java.lang.Thread.run(Thread.java:748)
   >    Suppressed: java.lang.AssertionError: [test][1], node[L-28xZP7Q0WyfVS1JrcTHA], [R], s[STARTED], a[id=EX57TRsNTX-vc3tfmteydQ] seq_no_stats mismatch
   > Expected: <SeqNoStats{maxSeqNo=4, localCheckpoint=4, globalCheckpoint=4}>
   >      but: was <SeqNoStats{maxSeqNo=4, localCheckpoint=4, globalCheckpoint=1}>
   >            at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
   >            at org.elasticsearch.test.InternalTestCluster.lambda$assertSeqNos$8(InternalTestCluster.java:1328)
   >            at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:836)
   >            ... 40 more

This is also what I see in the build logs.

@cbuescher cbuescher added >test-failure Triaged test failures from CI :Distributed Indexing/Distributed A catch all label for anything in the Distributed Area. Please avoid if you can. v6.6.1 labels Feb 4, 2019
@cbuescher
Member Author

@ywelsch I'll keep this unmuted for now, since I'm not sure the failure is specific to this particular test. Let me know if you think it is specific enough that muting would work.

@dnhatn dnhatn assigned dnhatn and unassigned ywelsch Feb 10, 2019
@jakelandis jakelandis added v6.6.2 and removed v6.6.1 labels Feb 13, 2019
dnhatn added a commit that referenced this issue Feb 15, 2019
We verify seq_no_stats is aligned between copies at the end of some
disruption tests. Sometimes, the assertion `assertSeqNos` is tripped due
to a lagged global checkpoint on replicas. The global checkpoint on
replicas is lagged because we sync the global checkpoint 30 seconds (by
default) after the last replication operation. This change reduces the
global checkpoint sync interval to 1s in the disruption tests.

Closes #38318
Closes #36789
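
To illustrate why a replica's global checkpoint can trail the primary's until a sync runs, here is a minimal toy model in Java. It is a sketch under stated assumptions, not Elasticsearch code: the class names `Primary`, `Replica`, and `GlobalCheckpointLagSketch` and all of their methods are hypothetical, there is a single replica, operations arrive in order, and each replicated operation is assumed to carry the global checkpoint the primary knew when it sent the operation. The explicit `backgroundSync()` call stands in for the periodic sync whose interval the commit shortens.

```java
// Toy model only: none of these types exist in Elasticsearch.
class GlobalCheckpointLagSketch {

    /** Hypothetical replica: it only knows the global checkpoint the primary last sent it. */
    static final class Replica {
        long localCheckpoint = -1;
        long globalCheckpoint = -1;

        // Assumption: a replicated operation carries the primary's global checkpoint at send time.
        void applyOperation(long seqNo, long primaryGlobalCheckpoint) {
            localCheckpoint = seqNo; // toy model: operations arrive in order, without gaps
            globalCheckpoint = Math.max(globalCheckpoint, primaryGlobalCheckpoint);
        }

        // Stand-in for the periodic background sync of the global checkpoint.
        void syncGlobalCheckpoint(long primaryGlobalCheckpoint) {
            globalCheckpoint = Math.max(globalCheckpoint, primaryGlobalCheckpoint);
        }
    }

    /** Hypothetical primary with exactly one replica. */
    static final class Primary {
        long localCheckpoint = -1;
        long globalCheckpoint = -1;
        final Replica replica;

        Primary(Replica replica) {
            this.replica = replica;
        }

        void index() {
            long seqNo = ++localCheckpoint;
            // The replica sees the global checkpoint from *before* this operation is acknowledged.
            replica.applyOperation(seqNo, globalCheckpoint);
            // After the acknowledgement the primary advances its own global checkpoint.
            globalCheckpoint = Math.min(localCheckpoint, replica.localCheckpoint);
        }

        void backgroundSync() {
            replica.syncGlobalCheckpoint(globalCheckpoint);
        }
    }

    public static void main(String[] args) {
        Replica replica = new Replica();
        Primary primary = new Primary(replica);
        for (int i = 0; i < 5; i++) {
            primary.index(); // sequence numbers 0..4
        }
        System.out.println("before sync: primary=" + primary.globalCheckpoint
                + " replica=" + replica.globalCheckpoint);
        primary.backgroundSync();
        System.out.println("after sync:  primary=" + primary.globalCheckpoint
                + " replica=" + replica.globalCheckpoint);
    }
}
```

In this model the replica stays one operation behind after the last write and only catches up once the sync runs. The numbers will not match the `globalCheckpoint=1` seen in the failure above, since the real test also involves network disruption and multiple shard copies; the point is only that a check made before the sync fires can observe a mismatch.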
dnhatn added four more commits with the same message that referenced this issue, two on Feb 15, 2019 and two on Feb 16, 2019