Keystore loading behavior is inconsistent between reload and startup #43722
Pinging @elastic/es-core-infra
You will find it hard to make this case. If we have a realm configured in elasticsearch.yml, and then a keystore entry for that realm under a slightly different name, then the user has made a mistake. They've named their realm "corp_ldap" in the YAML and "corp-ldap" in the keystore. We want to detect that and fail fast. We could (and in the past have) make the assumption that the user knows what they're doing, and in this case they might be planning to create a second "corp-ldap" realm in the future, but such leniency leads to far more problems than it solves.
I understand why this causes problems. From an admin's point of view, there's no reasonable way to update both files (YAML and keystore) that guarantees your node can be safely restarted. If you change the yml first and the node is restarted before you change the keystore, it will fail (which is the correct outcome, but not nice from an operations point of view), and likewise if you update them in the reverse order.
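The configuration snippets in this comment did not survive extraction. A hedged reconstruction of the kind of mismatch being described (the realm settings and paths here are illustrative, not copied from the original) might look like:

```shell
# elasticsearch.yml names the realm with an underscore:
#   xpack.security.authc.realms.ldap.corp_ldap.order: 0
#
# ...while the keystore entry is added under a hyphenated name:
bin/elasticsearch-keystore add \
    xpack.security.authc.realms.ldap.corp-ldap.secure_bind_password
```

On startup the node would then see a `corp-ldap` secure setting with no matching realm definition, which is the situation the comment argues should fail fast.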
I think we can make progress on this (or at least, even if we're not ready to make it a hard failure, it could report back the settings that were incorrect). However, there are 2 complexities: (1) On our side, it's a little trickier than it sounds. (2) On your side, I don't think this would actually do what you want. However, if you were able to differentiate between keystore changes that are part of a larger "config change" (where you change both the yml and the keystore) and those that are intended to modify a running node (e.g. a keystore change to update a password) then we probably could make some progress.
Other random thoughts: I wonder whether there's much of a use case for dynamically adding a setting to the keystore. Most of the cases I can think of are for updating a value rather than adding it. Updating an existing setting is always a safer change, because you know the key was valid when the node started, so the only potential problem is that the value is wrong (which could still prevent node startup). How do you deal with the fact that not all keystore settings are dynamic?
Indeed, they have, and we do! I agree that ES should detect, and fail fast on, situations where the configuration is invalid, just not when there is some (yet) unused key in the keystore. I am not sure I understand what the harm is in letting people add things to the keystore before the rest of the configuration is in place (as in my example above, the two could actually be handled by different people).
Credentials for snapshot repos are one example I can think of.
We don't (it seems like that would require too much knowledge on our side about every individual setting); it is up to the user to do a rolling restart after they change something that is not reloadable in ES.
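The split being discussed here can be illustrated with a small sketch (the flag sets, setting names, and function are hypothetical, not the actual Elasticsearch setting registry): a reload can only safely apply keys that were registered as reloadable, and everything else has to wait for a restart.

```python
# Hypothetical sketch: secure settings carry a "reloadable" flag; a keystore
# reload applies only the reloadable ones and reports the rest as requiring
# a (rolling) restart.
RELOADABLE = {"repository.s3.client.default.secure_access_key"}
STATIC = {"xpack.security.authc.realms.ldap.corp.secure_bind_password"}

def apply_reload(keystore):
    """Split keystore entries into applied-now vs restart-required."""
    applied = {k: v for k, v in keystore.items() if k in RELOADABLE}
    needs_restart = sorted(k for k in keystore if k in STATIC)
    return applied, needs_restart

applied, needs_restart = apply_reload({
    "repository.s3.client.default.secure_access_key": "AKIA...",
    "xpack.security.authc.realms.ldap.corp.secure_bind_password": "s3cret",
})
```

Nothing in this sketch validates unknown keys, which is precisely the asymmetry the issue is about: the reload path quietly ignores what startup would reject.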
There is no reasonable heuristic that can tell the difference between a mistake and an intentional pre-configuration well enough to be of benefit to users. The suggestion presumes that a typo in a setting key will always cause a hard failure due to the correct key being missing. In some cases this is true, but not always. Some settings are optional: it is intentional that the node will start without the setting being present. In such cases, a typo in a setting can only be detected because the incorrect setting exists, which is why we enforce that every setting has to be meaningful. This is not just an academic problem. I have certainly seen far more cases of people putting unsupported entries into the keystore because they made a mistake (a typo in the key, misreading the docs and getting the key name wrong, or putting non-secure settings into the keystore) than I have seen people intentionally put settings in there that they knew would be unused (yet).
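The point about optional settings can be made concrete with a sketch (all setting names invented for illustration): a typo in a required key is caught by the missing-key check, but a typo in an optional key is only detectable by rejecting unknown keys.

```python
# Illustrative validator with two independent checks. Only the unknown-key
# check can catch a typo in an *optional* setting's name.
REGISTERED = {
    "xpack.required.secure_password",        # node cannot start without it
    "xpack.optional.proxy.secure_password",  # node starts fine without it
}
REQUIRED = {"xpack.required.secure_password"}

def validate(keystore_keys):
    errors = [f"missing required setting [{k}]"
              for k in sorted(REQUIRED - keystore_keys)]
    errors += [f"unknown setting [{k}]"
               for k in sorted(keystore_keys - REGISTERED)]
    return errors

# Typo in the optional key: nothing required is missing, so without the
# unknown-key check this keystore would be silently accepted.
errs = validate({"xpack.required.secure_password",
                 "xpack.optional.proxy.secure_pasword"})  # note the typo
```

Dropping the second check, as the lenient proposal implies, would let `secure_pasword` sit unused forever with no diagnostic at all.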
I think you are mixing concerns here. The concerns at play are:
The approach you have taken is that when incorrect settings are added to the keystore, you fail the process startup. You think you are failing fast, but you are actually violating the design boundary of the keystore. A keystore should be able to contain arbitrary keys; it is the process accessing it that should know how to handle them. As it stands, the keystore is becoming a validator. If you want to help users detect incorrect settings, give them a script that validates which settings are correct. Here is how the design of the tools should be:
Putting older or redundant entries in a keystore is a perfectly valid scenario. Imagine a multi-tenant keystore situation: the keys for all processes are added to the keystore for simplicity, and each process picks what it needs. Another design mistake is that there are two sources of truth for the elasticsearch process. If there are two sources of truth, eventually there will always be conflict (which is partly why we are having this discussion). To have a single source of truth, it is preferable to rely on the YAML only and reference keystore entries from it.
The elasticsearch keystore is not, and was never intended to be, a general purpose keystore. It should not be shared between processes (and cannot be, because it must be stored in the same directory as the YAML). There are other designs we could have implemented, including the reference approach you are advocating; however, such a design would not have satisfied the constraints we wanted for the keystore.
I understand that we might be coming up with suggestions based on partial knowledge and hence not be able to figure out why something makes sense for you. I am not sure if you are at liberty to share more design details. I think the actionable points to know would be the following:
To expand on what @tvernum has been saying here, we have long wanted to add validation to the keystore cli. The way validation of all settings works (elasticsearch.yml and the keystore) is that the code internally registers every allowed setting, and launching a node validates the type and data of each setting found. Unfortunately, running this validation in the cli is difficult because plugins are allowed to add settings (and in fact much of the system, including all of xpack, is implemented through plugin apis). These plugins are loaded into separate classloaders when a node is launched, and we are also already locked down with a SecurityManager. Additionally, these plugins can do complex initialization to work correctly with the security manager. In the keystore cli tool we do not run with a security manager, so loading the plugins is not currently an option. The long term plan is to extract the part of plugins that exposes settings into a minimal api that would not trigger the complex setup that full plugin initialization needs.
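A rough sketch of that registration model (plain Python standing in for the Java plugin machinery; all names are illustrative): core and each plugin contribute their allowed settings, and validation checks every keystore key against the merged registry. Building that merged registry is exactly the part the CLI cannot easily do today, since it requires loading the plugins.

```python
# Illustrative model of setting registration. In the real system each plugin
# contributes its settings from its own classloader, which is why the CLI
# tool cannot easily assemble this registry.
def core_settings():
    return {"bootstrap.password", "keystore.seed"}

def example_plugin_settings():
    # stands in for a plugin's settings contribution
    return {"xpack.notification.slack.account.ops.secure_url"}

def validate_keystore(keys, registries):
    """Return keystore keys that no registry recognises."""
    allowed = set().union(*(reg() for reg in registries))
    return sorted(k for k in keys if k not in allowed)

unknown = validate_keystore(
    {"bootstrap.password", "xpack.notification.slack.acount.ops.secure_url"},
    [core_settings, example_plugin_settings],
)
```

Run without the plugin registry in the list, even the correctly spelled plugin setting would be reported as unknown, which is the false-positive risk of validating from a tool that cannot load plugins.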
This was considered when we developed our keystore, but it would have complicated how the keystore settings are read by the system, especially around validation. It is not something we plan to change.
As described above, we want the keystore cli itself to do this validation, and we very much want to add it in a future release, but there is not yet any concrete timeline for when that will happen.
I've opened a separate issue, #46148, which details the desire to have keystore validation. I'm closing this issue in favor of that one.
If Elasticsearch finds a key it does not recognize in the keystore on startup, it refuses to start. However, if the keystore is reloaded dynamically, the errors are not reported.
This is a problem in the Cloud, because a user may add a value to the keystore and then, possibly months later, the cluster starts bootlooping after an unrelated change.
Making the behavior consistent in either of two ways would solve the problem: either make the reload API fail and return an error if settings are invalid, or allow the startup process to continue even if a key is not recognized.
My preference would be the latter option: I think there are valid use cases where a user would want to add values to the keystore before configuring the rest of the functionality.
For example, a domain admin might add ldap secrets to the cluster without having to coordinate with the cluster owner to configure the rest of the settings.
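The first option (fail the reload) could be sketched as follows; the handler and response shape are hypothetical, not the actual `_nodes/reload_secure_settings` implementation.

```python
# Hypothetical reload handler that validates keystore keys against the
# registered settings and surfaces per-key errors in the response, instead
# of silently ignoring them as described in this issue.
def reload_secure_settings(keystore_keys, registered_keys):
    unknown = sorted(k for k in keystore_keys if k not in registered_keys)
    if unknown:
        return {"status": 500,
                "error": "unknown secure settings: " + ", ".join(unknown)}
    return {"status": 200, "reloaded": sorted(keystore_keys)}

resp = reload_secure_settings(
    {"bootstrap.password", "xpack.securty.typo_key"},  # note the typo
    {"bootstrap.password"},
)
```

With this behavior the mistake surfaces at reload time, when the operator is watching, instead of at the next unrelated restart.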
/cc @tvernum