
Support shard request cache for queries with DLS and FLS #70191

Merged
merged 56 commits into elastic:master on Jun 28, 2021

Conversation

ywangd
Member

@ywangd ywangd commented Mar 10, 2021

Today the shard request cache is disabled by RequestInterceptor when it detects that any DLS/FLS is involved in the query. We'd like to enable the cache with the changes proposed by this PR. This is labelled as "draft" because we would like to validate the overall approach before polishing it. In a nutshell, the changes are as follows:

  1. Add a new cacheModifier field to ShardSearchRequest.
  2. The ShardSearchRequestInterceptor sets the value of cacheModifier by essentially serialising the DLS queries and FLS field definitions.
  3. The cacheModifier field becomes part of the ShardSearchRequest#cacheKey method. Hence two otherwise identical queries will be cached under different keys if their cacheModifiers differ.
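The key-derivation idea in steps 1–3 can be sketched in isolation. This is a minimal standalone illustration, not the actual ShardSearchRequest code; the class and method names and the SHA-256 choice are assumptions for the sketch.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Standalone sketch only: not the real ShardSearchRequest#cacheKey code.
// Two otherwise identical requests get different cache keys when their
// serialised DLS/FLS "cache modifier" bytes differ.
public class CacheKeySketch {
    public static byte[] cacheKey(String requestBytes, String cacheModifier) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            digest.update(requestBytes.getBytes(StandardCharsets.UTF_8));
            if (cacheModifier != null) {
                // The serialised DLS queries / FLS field definitions are folded in here
                digest.update(cacheModifier.getBytes(StandardCharsets.UTF_8));
            }
            return digest.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String query = "{\"match_all\":{}}";
        byte[] plain = cacheKey(query, null);
        byte[] withDls = cacheKey(query, "{\"term\":{\"dept\":\"eng\"}}");
        // Same query, different DLS context: distinct cache entries
        System.out.println(Arrays.equals(plain, withDls)); // prints "false"
    }
}
```

Because identical modifier bytes always hash to the same key, two requests from users with byte-for-byte identical DLS/FLS permissions can still share a cache entry.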

@tvernum came up with the overall idea and did most of the code changes. I have only done some tweaks so far. But I'll keep working on it and see it through (if we agree on the proposed changes).

Since this is just a draft, there are many things we glossed over. Other than tests and docs, here are a few things that we talked about but didn't have time to work on:

  • How does this work with aliases? For example, two aliases pointing to the same concrete index but with different DLS/FLS applied to them.
  • We didn't attempt to normalise the DLS queries and FLS field definitions. Hence it is possible that two functionally identical queries would result in different cache entries. The same thing can happen with FLS field definitions. We generally consider this acceptable because the overall cache strategy favours safety over efficiency.
  • The computation of CacheKey itself may need to be cached if it proves expensive enough.
  • We haven't evaluated whether increasing the CacheKey size would negatively impact the existing cache.
  • Cache efficiency - Queries with DLS/FLS will increase the variety of queries, which could lead to a higher cache eviction rate or even cache thrashing (think templated DLS queries). But we are not sure whether the concern is justified, because a similar issue can already happen today if users send in a large number of different queries.

Resolves: #44581

tvernum and others added 4 commits February 23, 2021 18:15
Historically we have disabled the shard request cache when FLS/DLS is
enabled.

This commit changes the behaviour so that caching is enabled, but
takes into account the FLS/DLS context of the request.
Contributor

@jimczi jimczi left a comment


Thanks for tackling this @ywangd. I left a comment on how we plug in the cache modifier. I think we can make it cleaner with a clear extension point in our plugins.

Contributor

@mayya-sharipova mayya-sharipova left a comment


@ywangd Thanks for the PR! Exciting development! Nice to see progress on this! Your approach looks simple and understandable to me, but I wonder if what Jim suggested is a cleaner way.

How does this work with aliases? For example, two aliases pointing to the same concrete index but with different DLS/FLS applied to them.

Wouldn't bytes in ShardSearchRequest.cacheModifier be different in this case for each different alias?

@ywangd
Member Author

ywangd commented Mar 17, 2021

@elasticmachine update branch

@ywangd
Member Author

ywangd commented Mar 21, 2021

@jimczi I updated the PR based on your suggestion of adding a new SearchPlugin extension point. A new class, DlsFlsRequestCacheDifferentiator, is added to support this new extension point. Its code logic is largely a copy/paste from FieldAndDocumentLevelSecurityRequestInterceptor, and most other changes remain the same. Please let me know whether the new changes look right to you. Once we agree on the high level approach, I'll work on this PR to make it formal. Thanks!

@ywangd
Member Author

ywangd commented Mar 22, 2021

I just realised a BWC implication with the approach of the SearchPlugin extension point. The extension works by hooking into the cacheKey computation process without changing the ShardSearchRequest itself. This means a node where the change is available will send the ShardSearchRequest as is to a remote node. On the remote node, we have three possible scenarios:

  1. The remote node also has changes of this PR. This is the ideal case and everything should just work.
  2. The remote node does not have changes of this PR, but does have changes of Improve shard level request cache efficiency #69505 (v6.8.15 or v7.11.2+). Things will still work because the remote node will intercept the ShardSearchRequest to disable the request cache.
  3. The remote node does not have any of the above changes (e.g. v7.11.0). This is where things will break, because the remote node does not intercept the ShardSearchRequest and caching will behave as if there were no DLS/FLS.

Scenario 3 basically says that we cannot backport the changes of this PR into 7.x, because we support 7.x talking to any v7.1+ node, which means we would run into scenario 3. Since we support wire compatibility between the current major and the last minor of the previous major, these changes should be safe in v8.0, because it should only communicate with 7.last nodes, which is the safe scenario 2.

@tvernum
Contributor

tvernum commented Mar 22, 2021

Scenario 3 basically says that we cannot backport changes of this PR into 7.x

For non-CCS cases you should be able to work around this with a simple check on the ClusterState (DiscoveryNodes.getMinNodeVersion): this new behaviour should simply be disabled if the cluster has not completely upgraded to a supported version.

I'm not sure how easy it is to do the same for CCS, but we can work something out.
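The non-CCS workaround can be sketched as follows. This is a standalone illustration: the Version record and the threshold constant are simplified stand-ins for the real cluster-state check via DiscoveryNodes.getMinNodeVersion() and the actual release version.

```java
import java.util.Collections;
import java.util.List;

// Standalone sketch of the suggested gate: only use the new DLS/FLS-aware cache
// key once every node in the cluster understands it. The Version record and the
// threshold constant are simplified stand-ins for the real check via
// DiscoveryNodes.getMinNodeVersion().
public class VersionGateSketch {
    record Version(int major, int minor, int patch) implements Comparable<Version> {
        @Override
        public int compareTo(Version o) {
            int c = Integer.compare(major, o.major);
            if (c != 0) return c;
            c = Integer.compare(minor, o.minor);
            return c != 0 ? c : Integer.compare(patch, o.patch);
        }
    }

    // Illustrative threshold: the first version that intercepts DLS/FLS requests
    static final Version SUPPORTED_SINCE = new Version(7, 11, 2);

    static boolean dlsFlsCachingAllowed(List<Version> nodeVersions) {
        Version minNodeVersion = Collections.min(nodeVersions);
        return minNodeVersion.compareTo(SUPPORTED_SINCE) >= 0;
    }

    public static void main(String[] args) {
        List<Version> mixed = List.of(new Version(7, 14, 0), new Version(7, 11, 0));
        List<Version> upgraded = List.of(new Version(7, 14, 0), new Version(7, 11, 2));
        System.out.println(dlsFlsCachingAllowed(mixed));    // prints "false"
        System.out.println(dlsFlsCachingAllowed(upgraded)); // prints "true"
    }
}
```

Gating on the minimum node version means the behaviour flips on automatically once the last old node leaves the cluster, with no per-request negotiation needed.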

@ywangd
Member Author

ywangd commented Mar 22, 2021

Scenario 3 basically says that we cannot backport changes of this PR into 7.x

For non-CCS cases you should be able to work around this with a simple check on the ClusterState (DiscoveryNodes.getMinNodeVersion): this new behaviour should simply be disabled if the cluster has not completely upgraded to a supported version.

I'm not sure how easy it is to do the same for CCS, but we can work something out.

Right. I just had another thought on it and found that scenario 3 is not specific to the search plugin extension approach. The same thing can happen with the original approach of adding a cacheModifier to ShardSearchRequest. We need to handle the BWC issue regardless, which means we cannot delete the interceptor code 😞 because it needs to keep working depending on the minNodeVersion.

I'll comment on the CCS in a separate channel.

@ywangd
Member Author

ywangd commented Jun 28, 2021

@tvernum I have updated the PR as discussed, i.e. disable caching for stored scripts, but otherwise build the cache key using the evaluated script result. The main changes are in DocumentPermissions and ShardSearchRequestInterceptor. Also added some more tests to cover the new scenarios.
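The "disable for stored scripts, otherwise use the evaluated result" rule can be sketched in isolation. Everything here is an illustrative stand-in, not the real DocumentPermissions or ShardSearchRequestInterceptor code, and the stored-script detection is a simplified assumption for the sketch.

```java
import java.util.Optional;

// Standalone sketch of the behaviour described above; the method, the template
// format, and the stored-script detection are illustrative stand-ins, not the
// real DocumentPermissions or ShardSearchRequestInterceptor code.
public class DlsCacheDecisionSketch {
    // Returns the cache-key component for a DLS query template, or empty if the
    // request cache must stay disabled for this query.
    static Optional<String> cacheKeyComponent(String dlsQueryTemplate, String username) {
        if (dlsQueryTemplate.contains("\"id\":")) {
            // A stored script (referenced by id) can change server-side without the
            // request changing, so its evaluated result is not safe to cache against.
            return Optional.empty();
        }
        // Inline templates are evaluated eagerly; the evaluated result is what
        // contributes to the cache key.
        return Optional.of(dlsQueryTemplate.replace("{{username}}", username));
    }

    public static void main(String[] args) {
        System.out.println(cacheKeyComponent("{\"term\":{\"owner\":\"{{username}}\"}}", "alice"));
        // prints Optional[{"term":{"owner":"alice"}}]
        System.out.println(cacheKeyComponent("{\"template\":{\"id\":\"my_script\"}}", "alice").isPresent());
        // prints "false"
    }
}
```

Keying on the evaluated result rather than the raw template means two users whose templates evaluate to the same query can share a cache entry, while different evaluations stay isolated.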

@ywangd ywangd requested a review from tvernum June 28, 2021 05:18
public void buildCacheKey(StreamOutput out) throws IOException {
assert false == (queries == null && limitedByQueries == null) : "one of queries and limited-by queries must be non-null";
if (queries != null) {
assert evaluatedQueries != null : "queries are not evaluated";
Contributor


This feels wrong. It means that it's only possible to call this method if you have called some other method first.

But in fact there's no reason for the caller to know that they should do that, or to even expect them to be linked.

It might be what we need to do right now, but I'm very reluctant to leave it this way.

The obvious answers are to either:

  1. Inject enough services into the constructor (or a builder) to evaluate the queries immediately
  2. Split this class into 2, so that there's a separate EvaluatedDocumentPermissions class that implements CacheKey.

I don't know if either of those is feasible.

Member Author


It's certainly non-ideal, but it is not really feasible to go with either of your options. I thought about the first one, but it cascades changes all over the place, and it is questionable to inject ScriptService everywhere. The second option does not completely get rid of the "you must call some other method first" problem; it just moves it to a different place.

I had another idea initially but didn't go with it, though maybe it is a better approach after all: we can modify the buildCacheKey signature to take User and ScriptService. They are not really needed at the code level in most places, which is why I didn't go with it: they seemed superfluous, and I considered the problem you raised less of an issue (e.g. some objects need to be initialised after instantiation to be really usable). It's a balancing act; right now I am OK with either one.

Member Author


As discussed, we agree to change the signature of buildCacheKey to pass in a DlsQueryEvaluationContext. Overall this change is the least intrusive.
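The agreed shape can be sketched as follows: buildCacheKey receives the evaluation context as a parameter, so the hidden "evaluate first" ordering requirement disappears. Only the name DlsQueryEvaluationContext comes from the discussion above; its fields and everything else in this sketch are assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Sketch of the agreed direction: buildCacheKey takes an evaluation context, so
// callers never have to remember to "evaluate the queries first". Only the name
// DlsQueryEvaluationContext appears in the discussion; its shape and the rest
// of this class are illustrative assumptions.
public class BuildCacheKeySketch {
    // Stand-in for the real evaluation context (user, script service, etc.)
    record DlsQueryEvaluationContext(String username) {}

    record DocumentPermissions(List<String> queryTemplates) {
        void buildCacheKey(ByteArrayOutputStream out, DlsQueryEvaluationContext context) {
            for (String template : queryTemplates) {
                // Evaluation happens on demand with the supplied context: no hidden
                // "must call evaluate() before this" ordering requirement remains.
                String evaluated = template.replace("{{username}}", context.username());
                out.writeBytes(evaluated.getBytes(StandardCharsets.UTF_8));
            }
        }
    }

    public static void main(String[] args) {
        var permissions = new DocumentPermissions(List.of("{\"term\":{\"owner\":\"{{username}}\"}}"));
        var out = new ByteArrayOutputStream();
        permissions.buildCacheKey(out, new DlsQueryEvaluationContext("alice"));
        System.out.println(out.toString(StandardCharsets.UTF_8));
        // prints {"term":{"owner":"alice"}}
    }
}
```

Passing the context as a parameter keeps DocumentPermissions free of injected services while still making the method safe to call in any order.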

@ywangd ywangd merged commit 4a8ff0f into elastic:master Jun 28, 2021
ywangd added a commit to ywangd/elasticsearch that referenced this pull request Jun 28, 2021
Shard level request cache is now generally supported for queries with DLS
and/or FLS. The request cache is now enabled following the same rules as a
regular search without DLS/FLS, except in the following scenarios where the
request cache will still be disabled:
1. The DLS query uses a stored script
2. The search targets any remote indices
3. The cluster has any nodes older than v7.11.2

It is worth noting that the caching behaviour overall favours safety over
efficiency. This means two functionally equivalent sets of DLS or FLS
permissions can possibly result in different cache entries. We consider this a
better tradeoff given the higher priority of correctness and security.
ywangd added a commit that referenced this pull request Jun 28, 2021
…4618)

Shard level request cache is now generally supported for queries with DLS
and/or FLS. The request cache is now enabled following the same rules as a
regular search without DLS/FLS, except in the following scenarios where the
request cache will still be disabled:
1. The DLS query uses a stored script
2. The search targets any remote indices
3. The cluster has any nodes older than v7.11.2

It is worth noting that the caching behaviour overall favours safety over
efficiency. This means two functionally equivalent sets of DLS or FLS
permissions can possibly result in different cache entries. We consider this a
better tradeoff given the higher priority of correctness and security.
ywangd added a commit to ywangd/elasticsearch that referenced this pull request Jun 29, 2021
The test can now be re-enabled since the changes from elastic#70191 are
now backported.
ywangd added a commit to ywangd/elasticsearch that referenced this pull request Jun 30, 2021
Shard level request cache is now generally supported after elastic#70191
ywangd added a commit that referenced this pull request Jul 6, 2021
Shard level request cache is now generally supported after #70191
This PR updates the security limitation to state the effect.
elasticsearchmachine pushed a commit to elasticsearchmachine/elasticsearch that referenced this pull request Jul 6, 2021
Shard level request cache is now generally supported after elastic#70191
This PR updates the security limitation to state the effect.
ywangd added a commit to ywangd/elasticsearch that referenced this pull request Jul 6, 2021
Shard level request cache is now generally supported after elastic#70191
This PR updates the security limitation to state the effect.
elasticsearchmachine added a commit that referenced this pull request Jul 6, 2021
Shard level request cache is now generally supported after #70191
This PR updates the security limitation to state the effect.

Co-authored-by: Yang Wang <yang.wang@elastic.co>
Labels
>enhancement :Security/Authorization Roles, Privileges, DLS/FLS, RBAC/ABAC Team:Security Meta label for security team v7.14.0 v8.0.0-alpha1
Development

Successfully merging this pull request may close these issues.

Enable shard request cache with DLS and FLS
7 participants