Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TemplateUpgradeService runs updates under existing ThreadContext #30603

Closed
tvernum opened this issue May 15, 2018 · 8 comments
Closed

TemplateUpgradeService runs updates under existing ThreadContext #30603

tvernum opened this issue May 15, 2018 · 8 comments
Labels
:Core/Infra/Plugins Plugin API and infrastructure :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. :Security/Authentication Logging in, Usernames/passwords, Realms (Native/LDAP/AD/SAML/PKI/etc)

Comments

@tvernum
Copy link
Contributor

tvernum commented May 15, 2018

The TemplateUpgradeService has a high level flow of:

  • receive ClusterChangedEvent
  • check preconditions (global block, master node, etc)
  • gather necessary updates from registered upgraders (plugins)
  • apply template updates on the generic thread pool.

However

  • the ClusterChangedEvent comes in with the same ThreadContext as the action that triggered the event (which might be a node join/leave, but it also might be a settings change or index create/delete over REST).
  • The generic threadpool execute preserves the ThreadContext from the calling code.

Consequently, the template update runs with a ThreadContext that matches the original triggering action.
If X-Pack Security is enabled, that means that update which should run as _system might attempt to run as the user which authenticated to the Rest API. That user may not have privileges to perform that update.

@tvernum tvernum added :Core/Infra/Plugins Plugin API and infrastructure :Security/Authentication Logging in, Usernames/passwords, Realms (Native/LDAP/AD/SAML/PKI/etc) team-discuss labels May 15, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-security

@jaymode jaymode self-assigned this May 15, 2018
jaymode added a commit to jaymode/elasticsearch that referenced this issue May 15, 2018
The TemplateUpgradeService is a system service that allows for plugins
to register templates that need to be upgraded. These template upgrades
should always happen in a system context as they are not a user
initiated action. For security integrations, the lack of running this
in a system context could lead to unexpected failures. The changes in
this commit set an empty system context for the execution of the
template upgrades performed by this service.

Closes elastic#30603
@jaymode
Copy link
Member

jaymode commented May 15, 2018

@tvernum I am removing the discuss label as I am not sure that discussion is needed about this issue. Please add it back and comment about what needs to be discussed if you still believe there is something to discuss.

@jaymode jaymode removed the discuss label May 15, 2018
@tvernum
Copy link
Contributor Author

tvernum commented May 16, 2018

@ywelsch And I discussed this just after I raised it, and believed that it would be good to discuss this more widely.
We were of the view that cluster state events should be published in a system context (which they probably are on non-master nodes, but the master node shortcuts its own update event).

jaymode added a commit that referenced this issue May 16, 2018
The TemplateUpgradeService is a system service that allows for plugins
to register templates that need to be upgraded. These template upgrades
should always happen in a system context as they are not a user
initiated action. For security integrations, the lack of running this
in a system context could lead to unexpected failures. The changes in
this commit set an empty system context for the execution of the
template upgrades performed by this service.

Relates #30603
@jaymode
Copy link
Member

jaymode commented May 16, 2018

Thanks for the background @tvernum. I moved forward with merging #30621 as an interim fix; the idea that cluster state events should be published in a context unrelated to a user makes sense to me. I would have concerns about making such a change in a bugfix (ie 6.3.x) release as there are unknowns as to what this change would affect.

One thing that I'd like to ensure we are on the same page with is that we currently use the term "system context" to mean a context that is used for normal actions (non internal:) that are executed by elasticsearch itself and not due to a specific user action.

@ywelsch ywelsch added the :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. label May 16, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

jaymode added a commit that referenced this issue May 16, 2018
The TemplateUpgradeService is a system service that allows for plugins
to register templates that need to be upgraded. These template upgrades
should always happen in a system context as they are not a user
initiated action. For security integrations, the lack of running this
in a system context could lead to unexpected failures. The changes in
this commit set an empty system context for the execution of the
template upgrades performed by this service.

Relates #30603
jaymode added a commit that referenced this issue May 16, 2018
The TemplateUpgradeService is a system service that allows for plugins
to register templates that need to be upgraded. These template upgrades
should always happen in a system context as they are not a user
initiated action. For security integrations, the lack of running this
in a system context could lead to unexpected failures. The changes in
this commit set an empty system context for the execution of the
template upgrades performed by this service.

Relates #30603
@jaymode jaymode removed their assignment May 16, 2018
@bleskes
Copy link
Contributor

bleskes commented May 17, 2018

IMO we should make all cluster state handling (appliers + cluster state update tasks) user agnostics - i.e., run with no user context. The reason is that it will take a lot of effort to define and maintain a model here. Cluster state appliers can be called on the master after committing a cluster state, where we can potentially carry things over easily, but they can also be called as a result of an incoming cluster state from the master. Another complication is that both the cluster state update task and the incoming cluster state queue support batching. If we want to make things "correct" we need to make sure that those batches only batch changes that are made by the same (or equivalent) user context. This means we'll need to transfer the user context that have caused a cluster state to change via the publishing mechanism and persist it in the pending cluster state queue. We will also need to change the batching in the cluster state update task executor to be aware of the thread contexts.

All in all - it's really complicated and I would really like to avoid it. So far I didn't hear a very compelling need, so I suggest we go with the model of "all cluster state changes/code runs under the system user". This doesn't mean that cluster state update tasks listeners can't restore the context when they get a response.

ywelsch pushed a commit to ywelsch/elasticsearch that referenced this issue May 23, 2018
The TemplateUpgradeService is a system service that allows for plugins
to register templates that need to be upgraded. These template upgrades
should always happen in a system context as they are not a user
initiated action. For security integrations, the lack of running this
in a system context could lead to unexpected failures. The changes in
this commit set an empty system context for the execution of the
template upgrades performed by this service.

Relates elastic#30603
ywelsch added a commit that referenced this issue Jun 18, 2018
This commit makes it so that cluster state update tasks always run under the system context, only
restoring the original context when the listener that was provided with the task is called. A notable
exception is the clusterStatePublished(...) callback which will still run under system context,
because it's defined on the executor-level, and not the task level, and only called once for the
combined batch of tasks and can therefore not be uniquely identified with a task / thread context.

Relates #30603
ywelsch added a commit that referenced this issue Jun 18, 2018
This commit makes it so that cluster state update tasks always run under the system context, only
restoring the original context when the listener that was provided with the task is called. A notable
exception is the clusterStatePublished(...) callback which will still run under system context,
because it's defined on the executor-level, and not the task level, and only called once for the
combined batch of tasks and can therefore not be uniquely identified with a task / thread context.

Relates #30603
@ywelsch
Copy link
Contributor

ywelsch commented Jun 26, 2018

Closed by #31241

@ywelsch ywelsch closed this as completed Jun 26, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Core/Infra/Plugins Plugin API and infrastructure :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. :Security/Authentication Logging in, Usernames/passwords, Realms (Native/LDAP/AD/SAML/PKI/etc)
Projects
None yet
Development

No branches or pull requests

6 participants