Cannot restore snapshot on new cluster #78320
I think that we should add an option to ignore system indices, perhaps by default.
Pinging @elastic/es-distributed (Team:Distributed)
Pinging @elastic/es-data-management (Team:Data Management)
There are a couple of concerns here:
This restores the cluster state (cluster settings, etc.) as well as the system indices. The second way specifies the features¹ whose state should be overwritten. The features present in a cluster depend on the installed plugins and can be viewed with the Get Features API. We can tell Elasticsearch that only
I've tested both of these locally on a 7.15.0 cluster. While we have a workaround here, it's clear that the initial user experience isn't the best. I'm not sure how best to improve it without changing the behavior in such a way that it acts dangerously by default, but at a bare minimum we can improve the error message here.
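Assuming the repository and snapshot names from the reproduction steps later in this issue, the two workarounds described above might look roughly like this (a sketch against the 7.15 restore API; exact parameters may vary by version):

```
# Option 1: restore only regular indices, skipping dot-prefixed
# (system) indices and the global cluster state.
POST /_snapshot/backup/snapshot-2021.09.23/_restore
{
  "indices": "*,-.*",
  "include_global_state": false
}

# Option 2: restore the snapshot without any feature (system index) state.
POST /_snapshot/backup/snapshot-2021.09.23/_restore
{
  "feature_states": ["none"]
}
```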
I was restoring a snapshot that contained multiple indices. Indeed, I managed to restore the snapshot by manually specifying the indices, but this is a workaround.
While I agree that this would be ideal, there are things that make this difficult, if not impossible, to achieve without sacrificing other critical qualities, especially when the cluster(s) in question already have some amount of configuration in place. At snapshot creation time we want to default to capturing as much data as possible: not just user-created indices, but system-owned indices and global cluster state as well. That way, we can be sure that if the snapshot wasn't configured more precisely, we have whatever the user wanted to save. But when we go to restore that snapshot, we need to make sure that restoring it won't have any destructive effects on data already in the cluster unless Elasticsearch has explicitly been told by an administrator that that's okay. In order to make that happen without any arguments at all, we'd have to choose between two undesirable defaults.
Both options lead to obscure problems where the data in the system isn't what one would expect. If we encounter a situation where the only way forward is to drop data, we've found that it's best to raise an error and ask a human rather than guess at the best thing to do. Unfortunately, while this frequently averts disaster, it does mean that some of our APIs are picky. All of this to say: I don't necessarily disagree with you that the current situation isn't very user-friendly and should be improved, but this is a hard problem, and figuring out how to improve it is likely to be challenging. We could simply omit this system index, but this error will happen any time you try to restore a snapshot that contains an index already present in the restoring cluster, so that would be a band-aid fix for one very particular instance of this problem. A similar situation can occur with the
From a user perspective, I think the most sensible behavior would be to merge the existing data with the snapshot data (by default).
While that might seem intuitive, what's the intuitive behavior if those indices have document IDs that conflict? Do you take the one from the cluster or from the snapshot? Or do you merge them? If so, how does that logic work: do fields from the live index or the snapshot take priority? And what about all the applications out there today that can't handle a foreign process merging documents into an index they expect to have complete control over? Regardless of whether it would be intuitive, merging two indices is not something Elasticsearch is capable of at this time, or at any time in the near future. The post you link to is effectively reindexing both indices, which takes vastly more time and resources than restoring a snapshot. Making index merging more efficient would be both challenging (see above) and consume a lot of development resources that could be spent building something else. To bring this back around to the original issue: I think we can correct this behavior in the next major version to at least be a little more intuitive. For the 7.x series of releases, there's nothing we can do without breaking our backwards compatibility policy. But in 8.0 and later, I believe we can change the behavior on snapshot restoration to not include system indices unless they're explicitly requested.
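Under that proposed 8.x behavior, a restore would leave system indices alone unless the request opts in to a feature's state; a rough sketch, where the feature name "geoip" is purely illustrative:

```
POST /_snapshot/backup/snapshot-2021.09.23/_restore
{
  "feature_states": ["geoip"]
}
```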
As the user requested a snapshot restore, the snapshot takes priority, so overwrite. I'm guessing the case where only some fields differ in the same document is very rare, because docs should be immutable (add a new doc instead of updating the old one), so this should cover most use cases (a no-op when overwriting the same doc with the same fields and values).
You provide additional options to the restore command so that users have control over it. But the default behavior should be made to work for most use cases.
Given the proliferation of system indices over 6.x and 7.x, perhaps it's worth revisiting some of the longer-held defaults that snapshot and restore uses (which seem to be causing the issue here)?
Elasticsearch version (bin/elasticsearch --version): Version: 7.14.1, Build: default/docker/66b55ebfa59c92c15db3f69a335d500018b3331e/2021-08-26T09:01:05.390870785Z, JVM: 16.0.2
Plugins installed: []
JVM version (java -version): OpenJDK 64-Bit Server VM Temurin-16.0.2+7 (build 16.0.2+7, mixed mode, sharing)
OS version (uname -a if on a Unix-like system): Linux d3463a9ac7de 4.9.0-14-amd64 #1 SMP Debian 4.9.246-2 (2020-12-17) x86_64 x86_64 x86_64 GNU/Linux
Description of the problem including expected versus actual behavior:
Trying to restore a snapshot on a new single-node cluster throws an error.
Steps to reproduce:
PUT /_snapshot/backup/%3Csnapshot-%7Bnow%2Fd%7D%3E
POST /_snapshot/backup/snapshot-2021.09.23/_restore
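For completeness, the steps above assume a filesystem snapshot repository named `backup` has already been registered, e.g. (the location path is illustrative):

```
PUT /_snapshot/backup
{
  "type": "fs",
  "settings": {
    "location": "/mnt/backups"
  }
}
```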
Provide logs (if relevant):
Discuss url: https://discuss.elastic.co/t/cannot-restore-snapshot-to-new-single-node-cluster/285025