Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle Concurrent Repo Modification to Fix Test #48433

Merged
merged 1 commit into from
Oct 24, 2019
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -215,6 +215,10 @@ public void testRetentionWhileSnapshotInProgress() throws Exception {
SnapshotsStatusResponse s =
client().admin().cluster().prepareSnapshotStatus(REPO).setSnapshots(completedSnapshotName).get();
assertNull("expected no snapshot but one was returned", s.getSnapshots().get(0));
} catch (RepositoryException e) {
// Concurrent status calls and write operations may lead to failures in determining the current repository generation
// TODO: Remove this hack once tracking the current repository generation has been made consistent
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an issue we could link to here so that we have something we can reference to see if it's safe to remove the hack?

Copy link
Member Author

@original-brownbear original-brownbear Oct 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not yet I'm afraid. We have #38941 but the fix to the corruption issue may not fully resolve this situation yet.

Not even sure we have to solve this one with any kind of priority outside of tests:

What's basically happening here is that the snapshot status API breaks for a tiny window during snapshot delete and create (in practice that window will be a little more than the latency of one API request so it's really really hard to actually run into it and you'll probably run into other IO issues more often than this ... but as it turns out SLM tests are the only tests running these APIs in hot loops and are shaking these kinds of issues out). Maybe this is even an ok long term solution? (I'll gather some feedback on that tomorrow and will create an issue accordingly :))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed this today during snapshot resiliency sync and I'll code up a fix for this shortly and will remove the todos :)

throw new AssertionError(e);
} catch (SnapshotMissingException e) {
// Great, we wanted it to be deleted!
}
Expand Down