
UpdateByQueryBasicTests#testMultipleSources fails in CI #27820

Closed
danielmitterdorfer opened this issue Dec 14, 2017 · 5 comments
Labels
:Distributed Indexing/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. >test Issues or PRs that are addressing/adding tests >test-failure Triaged test failures from CI

Comments

@danielmitterdorfer
Member

test failure link: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.x+intake/850/console (full build log and test-cluster logs)

failure:

14:43:59 FAILURE 33.8s J2 | UpdateByQueryBasicTests.testMultipleSources <<< FAILURES!
14:43:59    > Throwable #1: java.lang.AssertionError: Delete Index failed - not acked
14:43:59    > Expected: <true>
14:43:59    >      but: was <false>
14:43:59    > 	at __randomizedtesting.SeedInfo.seed([9604E02021D6042D:11ADCC34CB9CAA79]:0)
14:43:59    > 	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
14:43:59    > 	at org.elasticsearch.test.hamcrest.ElasticsearchAssertions.assertAcked(ElasticsearchAssertions.java:133)
14:43:59    > 	at org.elasticsearch.test.hamcrest.ElasticsearchAssertions.assertAcked(ElasticsearchAssertions.java:129)
14:43:59    > 	at org.elasticsearch.test.TestCluster.wipeIndices(TestCluster.java:142)
14:43:59    > 	at org.elasticsearch.test.TestCluster.wipe(TestCluster.java:79)
14:43:59    > 	at org.elasticsearch.test.ESIntegTestCase.afterInternal(ESIntegTestCase.java:578)
14:43:59    > 	at org.elasticsearch.test.ESIntegTestCase.cleanUpCluster(ESIntegTestCase.java:2068)
14:43:59    > 	at java.lang.Thread.run(Thread.java:748)

So the test fails to delete the indices it used. Above the failure, the log shows:

14:43:59   1> [2017-12-14T14:43:25,919][INFO ][o.e.i.r.UpdateByQueryBasicTests] [UpdateByQueryBasicTests#testMultipleSources]: cleaning up after test
14:43:59   1> [2017-12-14T14:43:25,996][INFO ][o.e.c.m.MetaDataDeleteIndexService] [node_sm0] [test1/GI9mScCtQfCfA1HbjOufZw] deleting index
14:43:59   1> [2017-12-14T14:43:25,996][INFO ][o.e.c.m.MetaDataDeleteIndexService] [node_sm0] [test0/_LQPQ_6NQ7-Niuo1voVfRQ] deleting index
14:43:59   1> [2017-12-14T14:43:55,996][WARN ][o.e.d.z.PublishClusterStateAction] [node_sm0] timed out waiting for all nodes to process published state [23] (timeout [30s], pending nodes: [{node_sd1}{JRpUXkrPT2K8ZVQyA9u6uw}{Qft0uWCtRmSiN19dSn_flg}{127.0.0.1}{127.0.0.1:9602}])

So I think the test may just be the victim and the root cause is in the test infrastructure. If you agree, please feel free to reassign.
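The acknowledgement mechanics behind "not acked" can be sketched as follows. This is a minimal illustration under assumed semantics, not Elasticsearch code; `AckSketch` and `publishAndAwaitAcks` are invented names. The master waits a bounded time for every node to apply the published cluster state, and the API response carries `acknowledged=false` if any node misses the deadline:

```java
// Minimal sketch (invented names, not Elasticsearch APIs) of why a publish
// timeout surfaces as "Delete Index failed - not acked": the master waits a
// bounded time for all nodes to apply the new cluster state.
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class AckSketch {

    /** Waits up to timeoutMs for all nodes to ack; returns the "acknowledged" flag. */
    public static boolean publishAndAwaitAcks(int nodes, int acksDelivered, long timeoutMs)
            throws InterruptedException {
        CountDownLatch allNodesApplied = new CountDownLatch(nodes);
        for (int i = 0; i < acksDelivered; i++) {
            allNodesApplied.countDown(); // nodes that applied the state in time
        }
        // A hanging node (like node_sd1 in the logs above) never counts down,
        // so the master gives up after the timeout and reports not-acked.
        return allNodesApplied.await(timeoutMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("all nodes acked:  " + publishAndAwaitAcks(2, 2, 100));
        System.out.println("one node pending: " + publishAndAwaitAcks(2, 1, 100));
    }
}
```

In this picture the test itself did nothing wrong; a single slow node is enough to flip the flag, which matches the 30s publish timeout in the log.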

reproduction line:

gradle :modules:reindex:test \
  -Dtests.seed=9604E02021D6042D \
  -Dtests.class=org.elasticsearch.index.reindex.UpdateByQueryBasicTests \
  -Dtests.method="testMultipleSources" \
  -Dtests.security.manager=true \
  -Dtests.locale=es-PR \
  -Dtests.timezone=Africa/Monrovia

reproduces locally: no

failure frequency: this test failed once in CI within the last year

@danielmitterdorfer danielmitterdorfer added the >test Issues or PRs that are addressing/adding tests label Dec 14, 2017
@andyb-elastic
Contributor

Hmm, I wasn't able to reproduce this at the method, class, or project level. I also don't see anything in the build or test-cluster logs that indicates a cause. @danielmitterdorfer, do you have any advice for further ruling out a problem with the test?

@danielmitterdorfer
Member Author

We do see the following in the log:

  1> [2017-12-14T14:43:25,996][INFO ][o.e.c.m.MetaDataDeleteIndexService] [node_sm0] [test1/GI9mScCtQfCfA1HbjOufZw] deleting index
  1> [2017-12-14T14:43:25,996][INFO ][o.e.c.m.MetaDataDeleteIndexService] [node_sm0] [test0/_LQPQ_6NQ7-Niuo1voVfRQ] deleting index
  1> [2017-12-14T14:43:55,996][WARN ][o.e.d.z.PublishClusterStateAction] [node_sm0] timed out waiting for all nodes to process published state [23] (timeout [30s], pending nodes: [{node_sd1}{JRpUXkrPT2K8ZVQyA9u6uw}{Qft0uWCtRmSiN19dSn_flg}{127.0.0.1}{127.0.0.1:9602}])
[...]
  1> [2017-12-14T14:43:57,573][WARN ][o.e.c.s.MasterService    ] [node_sm0] cluster state update task [delete-index [[test1/GI9mScCtQfCfA1HbjOufZw], [test0/_LQPQ_6NQ7-Niuo1voVfRQ]]] took [31.5s] above the warn threshold of 30s
  1> [2017-12-14T14:43:57,739][WARN ][o.e.c.s.ClusterApplierService] [node_sd1] cluster state applier task [apply cluster state (from master [master {node_sm0}{A0mc7ckXQp2nqeAoajchpA}{1zRMtijCQlKk0lIE2NiwKw}{127.0.0.1}{127.0.0.1:9601} committed version [23]])] took [31.6s] above the warn threshold of 30s

So it seems that for some reason these nodes were hanging, but this is all I can see in the log. Maybe it makes sense to increase the log level so we get more info in case this test fails again?
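For the record, the Elasticsearch test framework's `@TestLogging` annotation is one way to raise logger levels for the duration of a test class. A hedged sketch; the logger names below are illustrative guesses, not necessarily what was actually bumped:

```java
import org.elasticsearch.test.junit.annotations.TestLogging;

// Sketch only: the logger:LEVEL pairs here are guesses at what would help
// diagnose a slow cluster-state publish, not the levels the fix used.
@TestLogging("org.elasticsearch.cluster.service:DEBUG,org.elasticsearch.discovery:TRACE")
public class UpdateByQueryBasicTests extends ReindexTestCase {
    // ... existing tests unchanged ...
}
```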

@andyb-elastic
Contributor

Gotcha, I'll increase the test log level since it can't hurt. Since this is the only time this test has failed, I'm inclined to think it was caused by something else.

martijnvg added a commit that referenced this issue Dec 21, 2017
* es/master: (45 commits)
  Adapt scroll rest test after backport. relates #27842
  Move early termination based on index sort to TopDocs collector (#27666)
  Upgrade beats templates that we use for bwc testing. (#27929)
  ingest: upgraded ingest geoip's geoip2's dependencies.
  [TEST] logging for update by query test #27820
  Add elasticsearch-nio jar for base nio classes (#27801)
  Use full profile on JDK 10 builds
  Require Gradle 4.3
  Enable grok processor to support long, double and boolean (#27896)
  Add unreleased v6.1.2 version
  TEST: reduce blob size #testExecuteMultipartUpload
  Check index under the store metadata lock (#27768)
  Fixes DocStats to not report index size < -1 (#27863)
  Fixed test to be up to date with the new database files.
  Upgrade to Lucene 7.2.0. (#27910)
  Disable TestZenDiscovery in cloud providers integrations test
  Use `_refresh` to shrink the version map on inactivity (#27918)
  Make KeyedLock reentrant (#27920)
  ingest: Upgraded the geolite2 databases.
  [Test] Fix IndicesClientDocumentationIT (#27899)
  ...
martijnvg added a commit that referenced this issue Dec 21, 2017
* es/6.x: (43 commits)
  ingest: upgraded ingest geoip's geoip2's dependencies.
  [TEST] logging for update by query test #27820
  Use full profile on JDK 10 builds
  Require Gradle 4.3
  Add unreleased v6.1.2 version
  TEST: reduce blob size #testExecuteMultipartUpload
  Check index under the store metadata lock (#27768)
  Upgrade to Lucene 7.2.0. (#27910)
  Fixed test to be up to date with the new database files.
  Use `_refresh` to shrink the version map on inactivity (#27918)
  Make KeyedLock reentrant (#27920)
  Fixes DocStats to not report index size < -1 (#27863)
  Disable TestZenDiscovery in cloud providers integrations test
  ingest: Upgraded the geolite2 databases.
  [Issue-27716]: CONTRIBUTING.md IntelliJ configurations settings are confusing. (#27717)
  [Test] Fix IndicesClientDocumentationIT (#27899)
  Move uid lock into LiveVersionMap (#27905)
  Mute testRetentionPolicyChangeDuringRecovery
  Increase Gradle heap space to 1536m
  Move GlobalCheckpointTracker and remove SequenceNumbersService (#27837)
  ...
@clintongormley clintongormley added the >test-failure Triaged test failures from CI label Jan 11, 2018
@nik9000
Member

nik9000 commented Jan 23, 2018

This looks like the node hanging for an unreasonable amount of time. The failure isn't particularly reindex-based either, and we haven't seen anything like this since. I'm going to close this; we'll reopen if we see something similar.

@nik9000 nik9000 closed this as completed Jan 23, 2018
@lcawl lcawl added :Distributed Indexing/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. and removed :Reindex API labels Feb 13, 2018
@jtibshirani
Contributor

A recent failure popped up that may be related. The test timed out while attempting to process cluster events for 'put mapping'. I don't see general slowness or node hangs, but the logs are difficult to understand because lines from different outputs are interleaved.

Reproduction line:

./gradlew :modules:reindex:test --tests "org.elasticsearch.index.reindex.UpdateByQueryBasicTests.testMultipleSources" \
  -Dtests.seed=2CC9E93C207EC061 \
  -Dtests.security.manager=true \
  -Dtests.locale=ee-GH \
  -Dtests.timezone=US/Pacific-New \
  -Dcompiler.java=12 \
  -Druntime.java=11

Link to build logs: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-unix-compatibility/os=debian-9/320/console

Relevant excerpt:

 2> java.lang.AssertionError: 
    Expected: an empty iterable
  1> --------[test3][3], node[ijtePTJlRCOtyTK6Cmekyw], [P], s[STARTED], a[id=ZBh60YktReS7ukvj5RG8-g]
         but: [<ProcessClusterEventTimeoutException[failed to process cluster event (put-mapping) within 13s]>,<ProcessClusterEventTimeoutException[failed to process cluster event (put-mapping) within 13s]>,<ProcessClusterEventTimeoutException[failed to process cluster event (put-mapping) within 13s]>,<ProcessClusterEventTimeoutException[failed to process cluster event (put-mapping) within 13s]>]
        at __randomizedtesting.SeedInfo.seed([2CC9E93C207EC061:AB60C528CA346E35]:0)
        at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
  1> --------[test3][2], node[ijtePTJlRCOtyTK6Cmekyw], [R], s[STARTED], a[id=AofokK8sQv26mX0qWHqW-g]
  1> --------[test3][7], node[ijtePTJlRCOtyTK6Cmekyw], [P], s[STARTED], a[id=9rHRKFY3QsqIN5SIFwtymQ]
        at org.junit.Assert.assertThat(Assert.java:956)
        at org.junit.Assert.assertThat(Assert.java:923)
  1> --------[test3][8], node[ijtePTJlRCOtyTK6Cmekyw], [R], s[STARTED], a[id=os7XYhW1StKht1dCdlNHMg]
  1> --------[test3][5], node[ijtePTJlRCOtyTK6Cmekyw], [P], s[STARTED], a[id=u4X28Zq6RaedP9u9bRUO_Q]
  1> --------[test3][4], node[ijtePTJlRCOtyTK6Cmekyw], [R], s[STARTED], a[id=ZoCBY6WoTmi_zstDsmUoYQ]
        at org.elasticsearch.test.ESIntegTestCase.indexRandom(ESIntegTestCase.java:1521)
