[CI] CcrRollingUpgradeIT#testIndexFollowing failure #38835

Closed
martijnvg opened this issue Feb 13, 2019 · 2 comments · Fixed by #38838
Assignees: martijnvg
Labels: :Distributed Indexing/CCR (Issues around the Cross Cluster State Replication features), >test-failure (Triaged test failures from CI)

martijnvg commented Feb 13, 2019

ERROR   3.80s | CcrRollingUpgradeIT.testIndexFollowing <<< FAILURES!
   > Throwable #1: org.elasticsearch.client.ResponseException: method [PUT], host [http://[::1]:43255], URI [/follower_index3/_ccr/follow?wait_for_active_shards=1], status line [HTTP/1.1 400 Bad Request]
   > {"error":{"root_cause":[{"type":"remote_transport_exception","reason":"[upgraded-node-follower-0][127.0.0.1:36653][indices:admin/xpack/ccr/put_follow]"}],"type":"illegal_argument_exception","reason":"the leader and follower index settings must be identical"},"status":400}
   > 	at __randomizedtesting.SeedInfo.seed([A114A4C994A3D7F3:2ECAB4031417AA50]:0)
   > 	at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:260)
   > 	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:238)
   > 	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:212)
   > 	at org.elasticsearch.upgrades.CcrRollingUpgradeIT.followIndex(CcrRollingUpgradeIT.java:134)
   > 	at org.elasticsearch.upgrades.CcrRollingUpgradeIT.testIndexFollowing(CcrRollingUpgradeIT.java:70)
   > 	at java.lang.Thread.run(Thread.java:748)
  2> NOTE: leaving temporary files on disk at: /var/lib/jenkins/workspace/elastic+elasticsearch+master+release-tests/x-pack/qa/rolling-upgrade-multi-cluster/build/testrun/v7.1.0#follower#twoThirdsUpgradedTestRunner/J0/temp/org.elasticsearch.upgrades.CcrRollingUpgradeIT_A114A4C994A3D7F3-001
  2> NOTE: test params are: codec=Lucene80, sim=Asserting(org.apache.lucene.search.similarities.AssertingSimilarity@3ab677d8), locale=ar-AE, timezone=SystemV/CST6
  2> NOTE: Linux 3.10.0-957.5.1.el7.x86_64 amd64/Oracle Corporation 1.8.0_202 (64-bit)/cpus=16,threads=1,free=424806384,total=514850816
  2> NOTE: All tests run in this JVM: [CcrRollingUpgradeIT]
Completed [1/1] in 6.26s, 2 tests, 1 error, 1 skipped <<< FAILURES!

This rolling upgrade test only exists in master and hasn't been backported yet.
I haven't been able to reproduce this failure locally. Somehow the leader and follower index settings are not aligned.

Build url: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+release-tests/444/console
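For reference, the failing call comes out of the test's followIndex helper, which issues a plain low-level REST client request against the CCR follow API. Below is a minimal sketch of what that request looks like, assuming the standard low-level RestClient API; the host, port, and request body are illustrative, not copied from the test:

```java
import org.apache.http.HttpHost;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

public class FollowIndexSketch {
    public static void main(String[] args) throws Exception {
        // Build a low-level client against the follower cluster
        // (host and port are placeholders).
        try (RestClient client = RestClient.builder(
                new HttpHost("localhost", 9200)).build()) {
            // PUT /follower_index3/_ccr/follow?wait_for_active_shards=1,
            // the request seen in the failure above.
            Request request = new Request("PUT", "/follower_index3/_ccr/follow");
            request.addParameter("wait_for_active_shards", "1");
            // "leader_cluster" is a hypothetical remote cluster alias;
            // remote_cluster and leader_index are the documented
            // parameters of the CCR follow API.
            request.setJsonEntity(
                "{\"remote_cluster\":\"leader_cluster\"," +
                "\"leader_index\":\"leader_index3\"}");
            // On the failing run this threw ResponseException with
            // HTTP 400: "the leader and follower index settings must
            // be identical".
            Response response = client.performRequest(request);
            System.out.println(response.getStatusLine());
        }
    }
}
```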

@martijnvg martijnvg added >test-failure Triaged test failures from CI :Distributed Indexing/CCR Issues around the Cross Cluster State Replication features labels Feb 13, 2019
@martijnvg martijnvg self-assigned this Feb 13, 2019
elasticmachine (Collaborator) commented:

Pinging @elastic/es-distributed

martijnvg added a commit that referenced this issue Feb 13, 2019
Relates to #38835
martijnvg (Member, Author) commented:

I locally changed the TransportResumeFollowAction#validate(...) method to include the index settings of both the follower and leader indices in the exception message:

Throwable #1: org.elasticsearch.client.ResponseException: method [PUT], host [http://[::1]:40807], URI [/follower_index3/_ccr/follow?wait_for_active_shards=1], status line [HTTP/1.1 400 Bad Request]
   > {"error":{"root_cause":[{"type":"remote_transport_exception","reason":"[upgraded-node-follower-0][127.0.0.1:33471][indices:admin/xpack/ccr/put_follow]"}],"type":"illegal_argument_exception","reason":"the leader [{\"index.number_of_shards\":\"1\"}] and follower index [{\"index.number_of_shards\":\"1\",\"index.version.upgraded\":\"8000099\"}] settings must be identical"},"status":400}

The follower index has the index.version.upgraded index setting, while the leader index does not. I suspect that the follower index got this setting when CCR restored the leader index on an upgraded node. The fix is that TransportResumeFollowAction#validate(...) should also ignore any index.version* settings.
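A minimal sketch of the idea behind the fix, using plain java.util.Map instead of Elasticsearch's internal Settings class; the class and method names here are illustrative, not the actual TransportResumeFollowAction code:

```java
import java.util.HashMap;
import java.util.Map;

public class SettingsComparisonSketch {

    // Drops settings that may legitimately differ between leader and
    // follower, such as index.version.upgraded and
    // index.version.upgraded_string.
    static Map<String, String> filterVersionSettings(Map<String, String> settings) {
        Map<String, String> filtered = new HashMap<>(settings);
        filtered.keySet().removeIf(key -> key.startsWith("index.version"));
        return filtered;
    }

    public static void main(String[] args) {
        Map<String, String> leader = new HashMap<>();
        leader.put("index.number_of_shards", "1");

        Map<String, String> follower = new HashMap<>();
        follower.put("index.number_of_shards", "1");
        // Added when the follower shard was restored on an upgraded node:
        follower.put("index.version.upgraded", "8000099");

        // Before the fix: the raw settings differ, so validation rejects
        // the follow request with "the leader and follower index settings
        // must be identical".
        System.out.println("raw equal: " + leader.equals(follower)); // false

        // After the fix: index.version* settings are ignored, so the
        // comparison succeeds.
        System.out.println("filtered equal: " + filterVersionSettings(leader)
            .equals(filterVersionSettings(follower))); // true
    }
}
```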

martijnvg added a commit to martijnvg/elasticsearch that referenced this issue Feb 13, 2019
The `index.version.upgraded` and `index.version.upgraded_string` settings are likely to differ between the leader and follower index when a follower index gets restored on an upgraded node while the leader index is still on non-upgraded nodes.

Closes elastic#38835
martijnvg added a commit that referenced this issue Feb 13, 2019
Filter out upgraded version index settings when starting index following (#38838)
martijnvg added a commit to martijnvg/elasticsearch that referenced this issue Feb 13, 2019
Filter out upgraded version index settings when starting index following (elastic#38838)
martijnvg added a commit that referenced this issue Feb 14, 2019
* Add rolling upgrade multi cluster test module (#38277)

This test starts two clusters, each with three nodes. First the leader cluster is started and tests are run against it; then the follower cluster is started and tests are run against the two clusters.

Then the follower cluster is upgraded, one node at a time. After that the leader cluster is upgraded, one node at a time. Each time a node is upgraded, tests are run while both clusters are online (and either the leader or the follower cluster has mixed node versions).

This commit only tests CCR index following, but the module could be used for CCS tests as well. In particular for CCR, unidirectional index following is tested during a rolling upgrade. During the test, several indices are created and followed in the leader cluster before or while the follower cluster is being upgraded.

The test also verifies that attempting to follow an index in the upgraded cluster from the not-yet-upgraded cluster fails. After both clusters are upgraded, following the index that previously failed should succeed.

Relates to #37231 and #38037

* Filter out upgraded version index settings when starting index following (#38838)

The `index.version.upgraded` and `index.version.upgraded_string` settings are likely to differ between the leader and follower index when a follower index gets restored on an upgraded node while the leader index is still on non-upgraded nodes.

Closes #38835
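For readers unfamiliar with the test module, here is a rough outline of the scenario described above; all helper names are hypothetical stand-ins, not the actual CcrRollingUpgradeIT methods:

```java
// Illustrative outline only; the helpers below are stubs standing in
// for the REST calls the real test performs.
public class RollingUpgradeOutline {

    enum Cluster { LEADER, FOLLOWER }

    public static void main(String[] args) {
        // The leader cluster is started first and gets an index; the
        // follower cluster then starts following it.
        createIndex(Cluster.LEADER, "leader_index0");
        followIndex(Cluster.FOLLOWER, "leader_index0", "follower_index0");

        // Upgrade the follower cluster one node at a time; after each
        // node, both clusters stay online and following is re-checked
        // (this is where the index.version.upgraded mismatch appeared).
        for (int node = 0; node < 3; node++) {
            upgradeNode(Cluster.FOLLOWER, node);
            assertFollowing("follower_index0");
        }

        // Then the leader cluster is upgraded the same way.
        for (int node = 0; node < 3; node++) {
            upgradeNode(Cluster.LEADER, node);
            assertFollowing("follower_index0");
        }
    }

    // Hypothetical stubs for the REST calls the real test makes.
    static void createIndex(Cluster c, String index) { log("create " + index + " on " + c); }
    static void followIndex(Cluster c, String leader, String follower) { log("follow " + leader + " as " + follower + " on " + c); }
    static void upgradeNode(Cluster c, int node) { log("upgrade node " + node + " of " + c); }
    static void assertFollowing(String index) { log("verify " + index + " is following"); }
    static void log(String msg) { System.out.println(msg); }
}
```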
martijnvg added a commit that referenced this issue Feb 14, 2019
martijnvg added a commit to martijnvg/elasticsearch that referenced this issue Feb 14, 2019
martijnvg added a commit that referenced this issue Feb 14, 2019