IndexingIT.testUpdateSnapshotStatus fails due to a snapshot with the same name being present in master #47406

Closed
dakrone opened this issue Oct 1, 2019 · 6 comments · Fixed by #47458
Assignees
Labels
:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >test-failure Triaged test failures from CI

Comments

@dakrone
Member

dakrone commented Oct 1, 2019

The failure looks like:

org.elasticsearch.backwards.IndexingIT > testUpdateSnapshotStatus FAILED
    org.elasticsearch.client.ResponseException: method [PUT], host [http://127.0.0.1:39507], URI [/_snapshot/repo/bwc-snapshot?wait_for_completion=true], status line [HTTP/1.1 400 Bad Request]
    {"error":{"root_cause":[{"type":"invalid_snapshot_name_exception","reason":"[repo:bwc-snapshot] Invalid snapshot name [bwc-snapshot], snapshot with the same name already exists"}],"type":"invalid_snapshot_name_exception","reason":"[repo:bwc-snapshot] Invalid snapshot name [bwc-snapshot], snapshot with the same name already exists"},"status":400}
        at __randomizedtesting.SeedInfo.seed([18868B88F75EAF0E:7D69A4C50DAE9DB9]:0)
        at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:253)
        at org.elasticsearch.client.RestClient.performRequest(RestClient.java:231)
        at org.elasticsearch.client.RestClient.performRequest(RestClient.java:237)
        at org.elasticsearch.client.RestClient.performRequest(RestClient.java:205)
        at org.elasticsearch.backwards.IndexingIT.testUpdateSnapshotStatus(IndexingIT.java:267)

The reproduction line is:

./gradlew ':qa:mixed-cluster:v7.5.0#mixedClusterTest' --tests "org.elasticsearch.backwards.IndexingIT.testUpdateSnapshotStatus" -Dtests.seed=18868B88F75EAF0E -Dtests.security.manager=true -Dtests.locale=gsw -Dtests.timezone=Etc/GMT+12 -Dcompiler.java=12 -Druntime.java=11

But it did not reproduce for me.

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob+fast+bwc/1496/consoleFull
https://gradle-enterprise.elastic.co/s/34ngdnkksq3ce

@dakrone dakrone added :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >test-failure Triaged test failures from CI labels Oct 1, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

@original-brownbear original-brownbear self-assigned this Oct 1, 2019
@original-brownbear
Member

This looks like it might be a REST test retry hitting us again: a snapshot with the same name but a different UUID finishes concurrently with this failure. I will look into it in more detail tomorrow.

@original-brownbear
Member

This is an actual bug, tripping an assertion in a BwC test:

» ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [v7.5.0-0] fatal error in thread [elasticsearch[v7.5.0-0][generic][T#3]], exiting
»  java.lang.AssertionError: Received null generation for shard state [SUCCESS]

on it

original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Oct 2, 2019
This relates to the effort towards elastic#46250. We added
tracking of the shard generation for successful
snapshots to `8.0`.
This assertion isn't correct though. While an `8.0`
master won't create an entry with success state and
a null shard generation, it may still (e.g. on master
with a `null` generation over the wire.

Closes elastic#47406
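
The distinction drawn in the commit message above can be sketched as follows. This is only an illustration, not the code from the actual fix: the class `ShardStatusSketch`, the nested `ShardStatus` type, and the methods `createdLocally` and `readFromWire` are all hypothetical names. The sketch assumes that the strict non-null check applies only to entries a new master creates itself, while entries deserialized from another node must tolerate a `null` generation because an older master may have created them.

```java
import java.util.Objects;

// Hypothetical sketch; none of these names come from the Elasticsearch code base.
public class ShardStatusSketch {

    enum ShardState { INIT, SUCCESS, FAILED }

    /** Simplified stand-in for a per-shard snapshot status entry. */
    static final class ShardStatus {
        final ShardState state;
        final String generation; // null is possible if an older master created the entry

        private ShardStatus(ShardState state, String generation) {
            this.state = Objects.requireNonNull(state);
            this.generation = generation;
        }

        /** Entries created by the new master itself: SUCCESS requires a generation. */
        static ShardStatus createdLocally(ShardState state, String generation) {
            if (state == ShardState.SUCCESS && generation == null) {
                throw new IllegalArgumentException(
                    "Received null generation for shard state [" + state + "]");
            }
            return new ShardStatus(state, generation);
        }

        /** Entries received over the wire: accept a null generation without complaint. */
        static ShardStatus readFromWire(ShardState state, String generation) {
            return new ShardStatus(state, generation);
        }
    }

    public static void main(String[] args) {
        // A SUCCESS entry created by an older master and forwarded after failover.
        ShardStatus legacy = ShardStatus.readFromWire(ShardState.SUCCESS, null);
        System.out.println(legacy.state + " generation=" + legacy.generation);

        // The same combination is rejected when created locally.
        try {
            ShardStatus.createdLocally(ShardState.SUCCESS, null);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Separating the two construction paths keeps the invariant useful for locally created entries without crashing a node that merely receives older state.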
original-brownbear added a commit that referenced this issue Oct 3, 2019
@DaveCTurner
Contributor

This looks to be affecting 7.x too: https://gradle-enterprise.elastic.co/s/c6xy2fgsavlhm

@DaveCTurner DaveCTurner reopened this Oct 25, 2019
@original-brownbear
Member

original-brownbear commented Oct 25, 2019

This is a new issue:


» ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [v7.5.0-1] fatal error in thread [elasticsearch[v7.5.0-1][generic][T#5]], exiting
»  java.lang.AssertionError: Received null generation for shard state [SUCCESS]
»  	at org.elasticsearch.cluster.SnapshotsInProgress$ShardSnapshotStatus.<init>(SnapshotsInProgress.java:357) ~[elasticsearch-7.6.0-SNAPSHOT.jar:7.6.0-SNAPSHOT]
»  	at org.elasticsearch.cluster.SnapshotsInProgress.<init>(SnapshotsInProgress.java:517) ~[elasticsearch-7.6.0-SNAPSHOT.jar:7.6.0-SNAPSHOT]
»  	at org.elasticsearch.common.io.stream.NamedWriteableAwareStreamInput.readNamedWriteable(NamedWriteableAwareStreamInput.java:46) ~[elasticsearch-7.6.0-SNAPSHOT.jar:7.6.0-SNAPSHOT]
»  	at org.elasticsearch.cluster.AbstractNamedDiffable$CompleteNamedDiff.<init>(AbstractNamedDiffable.java:87) ~[elasticsearch-7.6.0-SNAPSHOT.jar:7.6.0-SNAPSHOT]
»  	at org.elasticsearch.cluster.AbstractNamedDiffable.readDiffFrom(AbstractNamedDiffable.java:47) ~[elasticsearch-7.6.0-SNAPSHOT.jar:7.6.0-SNAPSHOT]
»  	at org.elasticsearch.cluster.SnapshotsInProgress.readDiffFrom(SnapshotsInProgress.java:496) ~[elasticsearch-7.6.0-SNAPSHOT.jar:7.6.0-SNAPSHOT]
»  	at org.elasticsearch.common.io.stream.NamedWriteableAwareStreamInput.readNamedWriteable(NamedWriteableAwareStreamInput.java:46) ~[elasticsearch-7.6.0-SNAPSHOT.jar:7.6.0-SNAPSHOT]
»  	at org.elasticsearch.cluster.NamedDiffableValueSerializer.readDiff(NamedDiffableValueSerializer.java:56) ~[elasticsearch-7.6.0-SNAPSHOT.jar:7.6.0-SNAPSHOT]
»  	at org.elasticsearch.cluster.NamedDiffableValueSerializer.readDiff(NamedDiffableValueSerializer.java:30) ~[elasticsearch-7.6.0-SNAPSHOT.jar:7.6.0-SNAPSHOT]
»  	at org.elasticsearch.cluster.DiffableUtils$MapDiff.<init>(DiffableUtils.java:406) ~[elasticsearch-7.6.0-SNAPSHOT.jar:7.6.0-SNAPSHOT]
»  	at org.elasticsearch.cluster.DiffableUtils$ImmutableOpenMapDiff.<init>(DiffableUtils.java:234) ~[elasticsearch-7.6.0-SNAPSHOT.jar:7.6.0-SNAPSHOT]
»  	at org.elasticsearch.cluster.DiffableUtils.readImmutableOpenMapDiff(DiffableUtils.java:127) ~[elasticsearch-7.6.0-SNAPSHOT.jar:7.6.0-SNAPSHOT]
»  	at org.elasticsearch.cluster.ClusterState$ClusterStateDiff.<init>(ClusterState.java:885) ~[elasticsearch-7.6.0-SNAPSHOT.jar:7.6.0-SNAPSHOT]
»  	at org.elasticsearch.cluster.ClusterState.readDiffFrom(ClusterState.java:793) ~[elasticsearch-7.6.0-SNAPSHOT.jar:7.6.0-SNAPSHOT]
»  	at org.elasticsearch.cluster.coordination.PublicationTransportHandler.handleIncomingPublishRequest(PublicationTransportHandler.java:418) ~[elasticsearch-7.6.0-SNAPSHOT.jar:7.6.0-SNAPSHOT]
»  	at org.elasticsearch.cluster.coordination.PublicationTransportHandler.lambda$new$0(PublicationTransportHandler.java:100) ~[elasticsearch-7.6.0-SNAPSHOT.jar:7.6.0-SNAPSHOT]
»  	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:63) ~[elasticsearch-7.6.0-SNAPSHOT.jar:7.6.0-SNAPSHOT]
»  	at org.elasticsearch.transport.InboundHandler$RequestHandler.doRun(InboundHandler.java:264) ~[elasticsearch-7.6.0-SNAPSHOT.jar:7.6.0-SNAPSHOT]
»  	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:773) ~[elasticsearch-7.6.0-SNAPSHOT.jar:7.6.0-SNAPSHOT]
»  	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.6.0-SNAPSHOT.jar:7.6.0-SNAPSHOT]
»  	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_221]
»  	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_221]
»  	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_221]


There seems to be some trouble in the shard generation handling in BwC between 7.6 and 7.5. Looking into it shortly.

EDIT: Turns out this was just a missing backport #48514

original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Oct 25, 2019
original-brownbear added a commit that referenced this issue Oct 25, 2019
@original-brownbear
Member

Fixed in 7.x via #48514
