[CI] CcrRollingUpgradeIT#testIndexFollowing failure #38835

Closed
martijnvg opened this issue Feb 13, 2019 · 2 comments · Fixed by #38838
Assignees: martijnvg
Labels: :Distributed Indexing/CCR (Issues around the Cross Cluster State Replication features), >test-failure (Triaged test failures from CI)

martijnvg commented Feb 13, 2019

ERROR   3.80s | CcrRollingUpgradeIT.testIndexFollowing <<< FAILURES!
   > Throwable #1: org.elasticsearch.client.ResponseException: method [PUT], host [http://[::1]:43255], URI [/follower_index3/_ccr/follow?wait_for_active_shards=1], status line [HTTP/1.1 400 Bad Request]
   > {"error":{"root_cause":[{"type":"remote_transport_exception","reason":"[upgraded-node-follower-0][127.0.0.1:36653][indices:admin/xpack/ccr/put_follow]"}],"type":"illegal_argument_exception","reason":"the leader and follower index settings must be identical"},"status":400}
   > 	at __randomizedtesting.SeedInfo.seed([A114A4C994A3D7F3:2ECAB4031417AA50]:0)
   > 	at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:260)
   > 	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:238)
   > 	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:212)
   > 	at org.elasticsearch.upgrades.CcrRollingUpgradeIT.followIndex(CcrRollingUpgradeIT.java:134)
   > 	at org.elasticsearch.upgrades.CcrRollingUpgradeIT.testIndexFollowing(CcrRollingUpgradeIT.java:70)
   > 	at java.lang.Thread.run(Thread.java:748)
  2> NOTE: leaving temporary files on disk at: /var/lib/jenkins/workspace/elastic+elasticsearch+master+release-tests/x-pack/qa/rolling-upgrade-multi-cluster/build/testrun/v7.1.0#follower#twoThirdsUpgradedTestRunner/J0/temp/org.elasticsearch.upgrades.CcrRollingUpgradeIT_A114A4C994A3D7F3-001
  2> NOTE: test params are: codec=Lucene80, sim=Asserting(org.apache.lucene.search.similarities.AssertingSimilarity@3ab677d8), locale=ar-AE, timezone=SystemV/CST6
  2> NOTE: Linux 3.10.0-957.5.1.el7.x86_64 amd64/Oracle Corporation 1.8.0_202 (64-bit)/cpus=16,threads=1,free=424806384,total=514850816
  2> NOTE: All tests run in this JVM: [CcrRollingUpgradeIT]
Completed [1/1] in 6.26s, 2 tests, 1 error, 1 skipped <<< FAILURES!

This rolling upgrade test only exists in master and hasn't been backported yet.
I haven't been able to reproduce this failure locally. Somehow the leader and follower index settings are not aligned.

Build url: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+release-tests/444/console
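For reference, the failing call comes out of the test's followIndex helper, which issues a plain low-level REST client request against the CCR follow API. Below is a minimal sketch of what that request looks like, assuming the standard low-level RestClient API; the host, port, and request body are illustrative, not copied from the test:

```java
import org.apache.http.HttpHost;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

public class FollowIndexSketch {
    public static void main(String[] args) throws Exception {
        // Build a low-level client against the follower cluster
        // (host and port are placeholders).
        try (RestClient client = RestClient.builder(
                new HttpHost("localhost", 9200)).build()) {
            // PUT /follower_index3/_ccr/follow?wait_for_active_shards=1,
            // the request seen in the failure above.
            Request request = new Request("PUT", "/follower_index3/_ccr/follow");
            request.addParameter("wait_for_active_shards", "1");
            // "leader_cluster" is a hypothetical remote cluster alias;
            // remote_cluster and leader_index are the documented
            // parameters of the CCR follow API.
            request.setJsonEntity(
                "{\"remote_cluster\":\"leader_cluster\"," +
                "\"leader_index\":\"leader_index3\"}");
            // On the failing run this threw ResponseException with
            // HTTP 400: "the leader and follower index settings must
            // be identical".
            Response response = client.performRequest(request);
            System.out.println(response.getStatusLine());
        }
    }
}
```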

@martijnvg martijnvg added >test-failure Triaged test failures from CI :Distributed Indexing/CCR Issues around the Cross Cluster State Replication features labels Feb 13, 2019
@martijnvg martijnvg self-assigned this Feb 13, 2019
elasticmachine (Collaborator) commented:

Pinging @elastic/es-distributed

martijnvg added a commit that referenced this issue Feb 13, 2019
Relates to #38835
martijnvg (Member, Author) commented:

I locally changed the TransportResumeFollowAction#validate(...) method to include the index settings of both the follower and leader indices in the exception message:

Throwable #1: org.elasticsearch.client.ResponseException: method [PUT], host [http://[::1]:40807], URI [/follower_index3/_ccr/follow?wait_for_active_shards=1], status line [HTTP/1.1 400 Bad Request]
   > {"error":{"root_cause":[{"type":"remote_transport_exception","reason":"[upgraded-node-follower-0][127.0.0.1:33471][indices:admin/xpack/ccr/put_follow]"}],"type":"illegal_argument_exception","reason":"the leader [{\"index.number_of_shards\":\"1\"}] and follower index [{\"index.number_of_shards\":\"1\",\"index.version.upgraded\":\"8000099\"}] settings must be identical"},"status":400}

The follower index has the index.version.upgraded index setting, while the leader index does not. I suspect that the follower index got this setting when CCR restored the leader index on an upgraded node. The fix is that TransportResumeFollowAction#validate(...) should also ignore any index.version* settings.
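A minimal sketch of the idea behind the fix, using plain java.util.Map instead of Elasticsearch's internal Settings class; the class and method names here are illustrative, not the actual TransportResumeFollowAction code:

```java
import java.util.HashMap;
import java.util.Map;

public class SettingsComparisonSketch {

    // Drops settings that may legitimately differ between leader and
    // follower, such as index.version.upgraded and
    // index.version.upgraded_string.
    static Map<String, String> filterVersionSettings(Map<String, String> settings) {
        Map<String, String> filtered = new HashMap<>(settings);
        filtered.keySet().removeIf(key -> key.startsWith("index.version"));
        return filtered;
    }

    public static void main(String[] args) {
        Map<String, String> leader = new HashMap<>();
        leader.put("index.number_of_shards", "1");

        Map<String, String> follower = new HashMap<>();
        follower.put("index.number_of_shards", "1");
        // Added when the follower shard was restored on an upgraded node:
        follower.put("index.version.upgraded", "8000099");

        // Before the fix: the raw settings differ, so validation rejects
        // the follow request with "the leader and follower index settings
        // must be identical".
        System.out.println("raw equal: " + leader.equals(follower)); // false

        // After the fix: index.version* settings are ignored, so the
        // comparison succeeds.
        System.out.println("filtered equal: " + filterVersionSettings(leader)
            .equals(filterVersionSettings(follower))); // true
    }
}
```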

martijnvg added a commit to martijnvg/elasticsearch that referenced this issue Feb 13, 2019
The `index.version.upgraded` and `index.version.upgraded_string` settings are likely to differ between the leader and follower index when a follower index gets restored on an upgraded node while the leader index is still on non-upgraded nodes.

Closes elastic#38835
martijnvg added a commit that referenced this issue Feb 13, 2019
Filter out upgraded version index settings when starting index following (#38838)
martijnvg added a commit to martijnvg/elasticsearch that referenced this issue Feb 13, 2019
Filter out upgraded version index settings when starting index following (elastic#38838)
martijnvg added a commit that referenced this issue Feb 14, 2019
* Add rolling upgrade multi cluster test module (#38277)

This test starts two clusters, each with three nodes. First the leader cluster is started and tests are run against it; then the follower cluster is started and tests are run against the two clusters.

Then the follower cluster is upgraded, one node at a time. After that the leader cluster is upgraded, one node at a time. Each time a node is upgraded, tests are run while both clusters are online (and either the leader or the follower cluster has mixed node versions).

This commit only tests CCR index following, but the module could be used for CCS tests as well. In particular for CCR, unidirectional index following is tested during a rolling upgrade. During the test, several indices are created and followed in the leader cluster before or while the follower cluster is being upgraded.

The test also verifies that attempting to follow an index in the upgraded cluster from the not-yet-upgraded cluster fails. After both clusters are upgraded, following the index that previously failed should succeed.

Relates to #37231 and #38037

* Filter out upgraded version index settings when starting index following (#38838)

The `index.version.upgraded` and `index.version.upgraded_string` settings are likely to differ between the leader and follower index when a follower index gets restored on an upgraded node while the leader index is still on non-upgraded nodes.

Closes #38835
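For readers unfamiliar with the test module, here is a rough outline of the scenario described above; all helper names are hypothetical stand-ins, not the actual CcrRollingUpgradeIT methods:

```java
// Illustrative outline only; the helpers below are stubs standing in
// for the REST calls the real test performs.
public class RollingUpgradeOutline {

    enum Cluster { LEADER, FOLLOWER }

    public static void main(String[] args) {
        // The leader cluster is started first and gets an index; the
        // follower cluster then starts following it.
        createIndex(Cluster.LEADER, "leader_index0");
        followIndex(Cluster.FOLLOWER, "leader_index0", "follower_index0");

        // Upgrade the follower cluster one node at a time; after each
        // node, both clusters stay online and following is re-checked
        // (this is where the index.version.upgraded mismatch appeared).
        for (int node = 0; node < 3; node++) {
            upgradeNode(Cluster.FOLLOWER, node);
            assertFollowing("follower_index0");
        }

        // Then the leader cluster is upgraded the same way.
        for (int node = 0; node < 3; node++) {
            upgradeNode(Cluster.LEADER, node);
            assertFollowing("follower_index0");
        }
    }

    // Hypothetical stubs for the REST calls the real test makes.
    static void createIndex(Cluster c, String index) { log("create " + index + " on " + c); }
    static void followIndex(Cluster c, String leader, String follower) { log("follow " + leader + " as " + follower + " on " + c); }
    static void upgradeNode(Cluster c, int node) { log("upgrade node " + node + " of " + c); }
    static void assertFollowing(String index) { log("verify " + index + " is following"); }
    static void log(String msg) { System.out.println(msg); }
}
```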
martijnvg added a commit that referenced this issue Feb 14, 2019
martijnvg added a commit to martijnvg/elasticsearch that referenced this issue Feb 14, 2019
martijnvg added a commit that referenced this issue Feb 14, 2019