
UpdateByQueryBasicTests#testMultipleSources fails in CI #27820

Closed
danielmitterdorfer opened this issue Dec 14, 2017 · 5 comments
Labels
:Distributed Indexing/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. >test Issues or PRs that are addressing/adding tests >test-failure Triaged test failures from CI

Comments

@danielmitterdorfer
Member

test failure link: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.x+intake/850/console (full build log and test-cluster logs)

failure:

14:43:59 FAILURE 33.8s J2 | UpdateByQueryBasicTests.testMultipleSources <<< FAILURES!
14:43:59    > Throwable #1: java.lang.AssertionError: Delete Index failed - not acked
14:43:59    > Expected: <true>
14:43:59    >      but: was <false>
14:43:59    > 	at __randomizedtesting.SeedInfo.seed([9604E02021D6042D:11ADCC34CB9CAA79]:0)
14:43:59    > 	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
14:43:59    > 	at org.elasticsearch.test.hamcrest.ElasticsearchAssertions.assertAcked(ElasticsearchAssertions.java:133)
14:43:59    > 	at org.elasticsearch.test.hamcrest.ElasticsearchAssertions.assertAcked(ElasticsearchAssertions.java:129)
14:43:59    > 	at org.elasticsearch.test.TestCluster.wipeIndices(TestCluster.java:142)
14:43:59    > 	at org.elasticsearch.test.TestCluster.wipe(TestCluster.java:79)
14:43:59    > 	at org.elasticsearch.test.ESIntegTestCase.afterInternal(ESIntegTestCase.java:578)
14:43:59    > 	at org.elasticsearch.test.ESIntegTestCase.cleanUpCluster(ESIntegTestCase.java:2068)
14:43:59    > 	at java.lang.Thread.run(Thread.java:748)

So the test fails to delete the indices it used. Above the failure, the log shows:

14:43:59   1> [2017-12-14T14:43:25,919][INFO ][o.e.i.r.UpdateByQueryBasicTests] [UpdateByQueryBasicTests#testMultipleSources]: cleaning up after test
14:43:59   1> [2017-12-14T14:43:25,996][INFO ][o.e.c.m.MetaDataDeleteIndexService] [node_sm0] [test1/GI9mScCtQfCfA1HbjOufZw] deleting index
14:43:59   1> [2017-12-14T14:43:25,996][INFO ][o.e.c.m.MetaDataDeleteIndexService] [node_sm0] [test0/_LQPQ_6NQ7-Niuo1voVfRQ] deleting index
14:43:59   1> [2017-12-14T14:43:55,996][WARN ][o.e.d.z.PublishClusterStateAction] [node_sm0] timed out waiting for all nodes to process published state [23] (timeout [30s], pending nodes: [{node_sd1}{JRpUXkrPT2K8ZVQyA9u6uw}{Qft0uWCtRmSiN19dSn_flg}{127.0.0.1}{127.0.0.1:9602}])

So I think the test may just be the victim and the root cause is in the test infrastructure. If you agree, please feel free to reassign.
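The acknowledgement mechanics behind "not acked" can be sketched as follows. This is a minimal illustration under assumed semantics, not Elasticsearch code; `AckSketch` and `publishAndAwaitAcks` are invented names. The master waits a bounded time for every node to apply the published cluster state, and the API response carries `acknowledged=false` if any node misses the deadline:

```java
// Minimal sketch (invented names, not Elasticsearch APIs) of why a publish
// timeout surfaces as "Delete Index failed - not acked": the master waits a
// bounded time for all nodes to apply the new cluster state.
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class AckSketch {

    /** Waits up to timeoutMs for all nodes to ack; returns the "acknowledged" flag. */
    public static boolean publishAndAwaitAcks(int nodes, int acksDelivered, long timeoutMs)
            throws InterruptedException {
        CountDownLatch allNodesApplied = new CountDownLatch(nodes);
        for (int i = 0; i < acksDelivered; i++) {
            allNodesApplied.countDown(); // nodes that applied the state in time
        }
        // A hanging node (like node_sd1 in the logs above) never counts down,
        // so the master gives up after the timeout and reports not-acked.
        return allNodesApplied.await(timeoutMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("all nodes acked:  " + publishAndAwaitAcks(2, 2, 100));
        System.out.println("one node pending: " + publishAndAwaitAcks(2, 1, 100));
    }
}
```

In this picture the test itself did nothing wrong; a single slow node is enough to flip the flag, which matches the 30s publish timeout in the log.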

reproduction line:

gradle :modules:reindex:test \
  -Dtests.seed=9604E02021D6042D \
  -Dtests.class=org.elasticsearch.index.reindex.UpdateByQueryBasicTests \
  -Dtests.method="testMultipleSources" \
  -Dtests.security.manager=true \
  -Dtests.locale=es-PR \
  -Dtests.timezone=Africa/Monrovia

reproduces locally: no

failure frequency: this test failed once in CI within the last year

@danielmitterdorfer danielmitterdorfer added the >test Issues or PRs that are addressing/adding tests label Dec 14, 2017
@andyb-elastic
Contributor

Hmm, I wasn't able to reproduce this at the method, class, or project level. I also don't see anything in the build or test-cluster logs that indicates a cause. @danielmitterdorfer, do you have any advice for further ruling out a problem with the test?

@danielmitterdorfer
Member Author

We do see the following in the log:

  1> [2017-12-14T14:43:25,996][INFO ][o.e.c.m.MetaDataDeleteIndexService] [node_sm0] [test1/GI9mScCtQfCfA1HbjOufZw] deleting index
  1> [2017-12-14T14:43:25,996][INFO ][o.e.c.m.MetaDataDeleteIndexService] [node_sm0] [test0/_LQPQ_6NQ7-Niuo1voVfRQ] deleting index
  1> [2017-12-14T14:43:55,996][WARN ][o.e.d.z.PublishClusterStateAction] [node_sm0] timed out waiting for all nodes to process published state [23] (timeout [30s], pending nodes: [{node_sd1}{JRpUXkrPT2K8ZVQyA9u6uw}{Qft0uWCtRmSiN19dSn_flg}{127.0.0.1}{127.0.0.1:9602}])
[...]
  1> [2017-12-14T14:43:57,573][WARN ][o.e.c.s.MasterService    ] [node_sm0] cluster state update task [delete-index [[test1/GI9mScCtQfCfA1HbjOufZw], [test0/_LQPQ_6NQ7-Niuo1voVfRQ]]] took [31.5s] above the warn threshold of 30s
  1> [2017-12-14T14:43:57,739][WARN ][o.e.c.s.ClusterApplierService] [node_sd1] cluster state applier task [apply cluster state (from master [master {node_sm0}{A0mc7ckXQp2nqeAoajchpA}{1zRMtijCQlKk0lIE2NiwKw}{127.0.0.1}{127.0.0.1:9601} committed version [23]])] took [31.6s] above the warn threshold of 30s

So it seems that for some reason these nodes were hanging, but this is all I can see in the log. Maybe it makes sense to increase the log level so we get more info in case this test fails again?
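For the record, the Elasticsearch test framework's `@TestLogging` annotation is one way to raise logger levels for the duration of a test class. A hedged sketch; the logger names below are illustrative guesses, not necessarily what was actually bumped:

```java
import org.elasticsearch.test.junit.annotations.TestLogging;

// Sketch only: the logger:LEVEL pairs here are guesses at what would help
// diagnose a slow cluster-state publish, not the levels the fix used.
@TestLogging("org.elasticsearch.cluster.service:DEBUG,org.elasticsearch.discovery:TRACE")
public class UpdateByQueryBasicTests extends ReindexTestCase {
    // ... existing tests unchanged ...
}
```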

@andyb-elastic
Contributor

Gotcha, I'll increase the test log level since it can't hurt. Since this is the only time this test has failed, I'm inclined to think it was caused by something else.

martijnvg added a commit that referenced this issue Dec 21, 2017
* es/master: (45 commits)
  Adapt scroll rest test after backport. relates #27842
  Move early termination based on index sort to TopDocs collector (#27666)
  Upgrade beats templates that we use for bwc testing. (#27929)
  ingest: upgraded ingest geoip's geoip2's dependencies.
  [TEST] logging for update by query test #27820
  Add elasticsearch-nio jar for base nio classes (#27801)
  Use full profile on JDK 10 builds
  Require Gradle 4.3
  Enable grok processor to support long, double and boolean (#27896)
  Add unreleased v6.1.2 version
  TEST: reduce blob size #testExecuteMultipartUpload
  Check index under the store metadata lock (#27768)
  Fixes DocStats to not report index size < -1 (#27863)
  Fixed test to be up to date with the new database files.
  Upgrade to Lucene 7.2.0. (#27910)
  Disable TestZenDiscovery in cloud providers integrations test
  Use `_refresh` to shrink the version map on inactivity (#27918)
  Make KeyedLock reentrant (#27920)
  ingest: Upgraded the geolite2 databases.
  [Test] Fix IndicesClientDocumentationIT (#27899)
  ...
martijnvg added a commit that referenced this issue Dec 21, 2017
* es/6.x: (43 commits)
  ingest: upgraded ingest geoip's geoip2's dependencies.
  [TEST] logging for update by query test #27820
  Use full profile on JDK 10 builds
  Require Gradle 4.3
  Add unreleased v6.1.2 version
  TEST: reduce blob size #testExecuteMultipartUpload
  Check index under the store metadata lock (#27768)
  Upgrade to Lucene 7.2.0. (#27910)
  Fixed test to be up to date with the new database files.
  Use `_refresh` to shrink the version map on inactivity (#27918)
  Make KeyedLock reentrant (#27920)
  Fixes DocStats to not report index size < -1 (#27863)
  Disable TestZenDiscovery in cloud providers integrations test
  ingest: Upgraded the geolite2 databases.
  [Issue-27716]: CONTRIBUTING.md IntelliJ configurations settings are confusing. (#27717)
  [Test] Fix IndicesClientDocumentationIT (#27899)
  Move uid lock into LiveVersionMap (#27905)
  Mute testRetentionPolicyChangeDuringRecovery
  Increase Gradle heap space to 1536m
  Move GlobalCheckpointTracker and remove SequenceNumbersService (#27837)
  ...
@clintongormley clintongormley added the >test-failure Triaged test failures from CI label Jan 11, 2018
@nik9000
Member

nik9000 commented Jan 23, 2018

This looks like the node hanging for an unreasonable amount of time. The failure isn't particularly reindex-based either, and we haven't seen anything like this since. I'm going to close this; we'll reopen if we see something similar.

@nik9000 nik9000 closed this as completed Jan 23, 2018
@lcawl lcawl added :Distributed Indexing/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. and removed :Reindex API labels Feb 13, 2018
@jtibshirani
Contributor

A recent failure popped up that may be related. The test timed out while attempting to process cluster events for 'put mapping'. I don't see general slowness or node hangs, but the logs are difficult to understand because lines from different outputs are interleaved.

Reproduction line:

./gradlew :modules:reindex:test --tests "org.elasticsearch.index.reindex.UpdateByQueryBasicTests.testMultipleSources" \
  -Dtests.seed=2CC9E93C207EC061 \
  -Dtests.security.manager=true \
  -Dtests.locale=ee-GH \
  -Dtests.timezone=US/Pacific-New \
  -Dcompiler.java=12 \
  -Druntime.java=11

Link to build logs: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-unix-compatibility/os=debian-9/320/console

Relevant excerpt:

 2> java.lang.AssertionError: 
    Expected: an empty iterable
  1> --------[test3][3], node[ijtePTJlRCOtyTK6Cmekyw], [P], s[STARTED], a[id=ZBh60YktReS7ukvj5RG8-g]
         but: [<ProcessClusterEventTimeoutException[failed to process cluster event (put-mapping) within 13s]>,<ProcessClusterEventTimeoutException[failed to process cluster event (put-mapping) within 13s]>,<ProcessClusterEventTimeoutException[failed to process cluster event (put-mapping) within 13s]>,<ProcessClusterEventTimeoutException[failed to process cluster event (put-mapping) within 13s]>]
        at __randomizedtesting.SeedInfo.seed([2CC9E93C207EC061:AB60C528CA346E35]:0)
        at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
  1> --------[test3][2], node[ijtePTJlRCOtyTK6Cmekyw], [R], s[STARTED], a[id=AofokK8sQv26mX0qWHqW-g]
  1> --------[test3][7], node[ijtePTJlRCOtyTK6Cmekyw], [P], s[STARTED], a[id=9rHRKFY3QsqIN5SIFwtymQ]
        at org.junit.Assert.assertThat(Assert.java:956)
        at org.junit.Assert.assertThat(Assert.java:923)
  1> --------[test3][8], node[ijtePTJlRCOtyTK6Cmekyw], [R], s[STARTED], a[id=os7XYhW1StKht1dCdlNHMg]
  1> --------[test3][5], node[ijtePTJlRCOtyTK6Cmekyw], [P], s[STARTED], a[id=u4X28Zq6RaedP9u9bRUO_Q]
  1> --------[test3][4], node[ijtePTJlRCOtyTK6Cmekyw], [R], s[STARTED], a[id=ZoCBY6WoTmi_zstDsmUoYQ]
        at org.elasticsearch.test.ESIntegTestCase.indexRandom(ESIntegTestCase.java:1521)
