Security: don't call prepare index for reads #34246

jaymode · 2018-10-02T20:02:19Z

The security native stores follow a pattern where
SecurityIndexManager#prepareIndexIfNeededThenExecute wraps most calls
made against the security index. The reasoning behind this was to check
in a consistent manner whether the security index had been upgraded to
the latest version. However, this has the potential side effect that a
read triggers the creation of the security index or an update of its
mappings. That can lead to failures, for example put mapping requests
timing out, even though we might have been able to read from the index
and get the necessary data.

This change introduces a new method, checkIndexVersionThenExecute,
that provides the consistent check that the security index has been
upgraded. That is the only check this method performs before running
the passed-in operation, so reads can no longer trigger index creation
or mapping updates.
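
A minimal sketch of the version-only check described above. `IndexVersionCheckSketch` is a hypothetical, simplified stand-in for `SecurityIndexManager`; the `indexExists` and `isIndexUpToDate` field names come from the diff in this PR, while the exception type and message are assumptions made for illustration.

```java
import java.util.function.Consumer;

// Hypothetical, simplified stand-in for SecurityIndexManager; not the real class.
final class IndexVersionCheckSketch {

    // Immutable snapshot of the index state; field names mirror the diff in this PR.
    static final class State {
        final boolean indexExists;
        final boolean isIndexUpToDate;

        State(boolean indexExists, boolean isIndexUpToDate) {
            this.indexExists = indexExists;
            this.isIndexUpToDate = isIndexUpToDate;
        }
    }

    private volatile State indexState;

    IndexVersionCheckSketch(State initial) {
        this.indexState = initial;
    }

    // Only verifies that an existing index is up to date.
    // It never creates the index and never updates mappings.
    void checkIndexVersionThenExecute(Consumer<Exception> consumer, Runnable andThen) {
        final State state = this.indexState; // local copy so all checks see the same state
        if (state.indexExists && state.isIndexUpToDate == false) {
            // Assumed exception type and message, for illustration only.
            consumer.accept(new IllegalStateException("security index is not on the current version"));
        } else {
            andThen.run();
        }
    }
}
```

Note that in this sketch a missing index still runs the operation, so the caller has to decide separately how to treat an index that does not exist or is not available.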

Additionally, areas where we do reads now check the availability of the
security index and can short-circuit requests. Availability in this
context means that the index exists and all primaries are active.

Relates #33205

elasticmachine · 2018-10-02T20:02:21Z

Pinging @elastic/es-security

jaymode · 2018-10-15T20:24:08Z

@bizybot @tvernum if you have time, can you please take a look at this PR?

bizybot

LGTM, once the comments are addressed. Thank you.

x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/security/authc/TokenService.java

bizybot · 2018-10-16T13:05:06Z

...in/security/src/main/java/org/elasticsearch/xpack/security/support/SecurityIndexManager.java

+     */
+    public void checkIndexVersionThenExecute(final Consumer<Exception> consumer, final Runnable andThen) {
+        final State indexState = this.indexState; // use a local copy so all checks execute against the same state!
+        if (indexState.indexExists && indexState.isIndexUpToDate == false) {


I see we invoke isAvailable() in most places before calling checkIndexVersionThenExecute, but I think it would be better to add a note to the public API documentation that callers need to check it, or else the runnable will be executed. I am unsure whether we should add the check inside checkIndexVersionThenExecute so that we throw an exception if someone forgets to handle it.

I added it to the docs. I do not think we should check this value in the method as isAvailable is only a short circuiting mechanism and the way to handle it depends on the caller.
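
A rough sketch of the caller-side pattern discussed in this thread: the caller checks availability first and chooses its own reaction, then delegates the version check. `IndexGate` and `CallerSideCheckSketch` are hypothetical names; only `isAvailable` and `checkIndexVersionThenExecute` appear in the PR.

```java
import java.util.function.Consumer;

// Hypothetical minimal view of the index manager; the real class has many more members.
interface IndexGate {
    boolean isAvailable();

    void checkIndexVersionThenExecute(Consumer<Exception> onFailure, Runnable andThen);
}

// Hypothetical caller showing that the reaction to "not available" is chosen per operation.
final class CallerSideCheckSketch {
    private final IndexGate index;

    CallerSideCheckSketch(IndexGate index) {
        this.index = index;
    }

    void read(Runnable doRead, Runnable onUnavailable, Consumer<Exception> onFailure) {
        if (index.isAvailable() == false) {
            // Short circuit: the caller supplies the behavior (empty result, error, retry, ...).
            onUnavailable.run();
        } else {
            // Only the version check happens here; no index creation or mapping update.
            index.checkIndexVersionThenExecute(onFailure, doRead);
        }
    }
}
```

Keeping the availability decision in the caller means one operation can report an error while another returns an empty result, which is the flexibility the reply above argues for.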

bizybot · 2018-10-16T13:16:51Z

x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/security/authc/TokenService.java

+                                attemptCount.incrementAndGet();
+                                findTokenFromRefreshToken(refreshToken, listener, attemptCount);
+                            } else if (searchResponse.getHits().getHits().length < 1) {
+                                logger.info("could not find token document with refresh_token [{}]", refreshToken);


Maybe debug or trace?

This is not changed by this PR so I am leaving it as is.

albertzaharovits

LGTM
I feel that SecurityIndexManager should have its own executeAsyncWithOrigin and handle the maybe-create-and-run checks inside.
With the tools at hand, it's good enough for me.

tvernum · 2018-10-17T00:49:44Z

Sorry - I did get part way through a review yesterday, but didn't finish it.

tvernum

I have a serious concern about the number of places where we're just treating unavailable shards as "everything is OK".
I didn't finish reviewing, since it seems to be a pervasive problem that we need to reach a consensus on.

tvernum · 2018-10-17T00:54:19Z

...security/src/main/java/org/elasticsearch/xpack/security/authc/esnative/NativeUsersStore.java

@@ -118,16 +118,15 @@ public void getUsers(String[] userNames, final ActionListener<Collection<User>>
            }
        };

-        if (securityIndex.indexExists() == false) {
-            // TODO remove this short circuiting and fix tests that fail without this!
+        if (securityIndex.isAvailable() == false) {


I think this is wrong.
If a primary shard is unavailable, then GET _xpack/security/users will return an empty list rather than an error.
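
To illustrate the concern, here is a small sketch that separates "the index does not exist" (an empty result is accurate) from "the index exists but is not available" (the answer is unknown, so the caller gets a failure). The `IndexStatus` enum and the two-callback signature are assumptions for illustration; the real code uses `ActionListener`, and this is not necessarily how the follow-up change handles it.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.function.Consumer;

// Hypothetical sketch; names and signatures are illustrative, not the real NativeUsersStore API.
final class UnavailableIsNotEmptySketch {

    enum IndexStatus { MISSING, UNAVAILABLE, AVAILABLE }

    static void getUsers(IndexStatus status,
                         Consumer<Collection<String>> onResponse,
                         Consumer<Exception> onFailure) {
        switch (status) {
            case MISSING:
                // No index means no stored users, so an empty list is a truthful answer.
                onResponse.accept(Collections.emptyList());
                break;
            case UNAVAILABLE:
                // Users may exist but cannot be read; report an error instead of "none".
                onFailure.accept(new IllegalStateException("security index is unavailable"));
                break;
            default:
                // Placeholder for the real search against the index.
                onResponse.accept(Collections.singletonList("stored-user"));
        }
    }
}
```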

tvernum · 2018-10-17T00:55:14Z

...security/src/main/java/org/elasticsearch/xpack/security/authc/esnative/NativeUsersStore.java

@@ -155,10 +154,10 @@ public void getUsers(String[] userNames, final ActionListener<Collection<User>>
    }

    void getUserCount(final ActionListener<Long> listener) {
-        if (securityIndex.indexExists() == false) {
+        if (securityIndex.isAvailable() == false) {


tvernum · 2018-10-17T00:57:28Z

...security/src/main/java/org/elasticsearch/xpack/security/authc/esnative/NativeUsersStore.java

@@ -182,11 +181,10 @@ public void onFailure(Exception e) {
     * Async method to retrieve a user and their password
     */
    private void getUserAndPassword(final String user, final ActionListener<UserAndPassword> listener) {
-        if (securityIndex.indexExists() == false) {
-            // TODO remove this short circuiting and fix tests that fail without this!
+        if (securityIndex.isAvailable() == false) {


This is OK (since the onFailure below returns null on error) but it means we lose any logging.

tvernum · 2018-10-17T00:58:53Z

...security/src/main/java/org/elasticsearch/xpack/security/authc/esnative/NativeUsersStore.java

@@ -459,24 +457,28 @@ public void onFailure(Exception e) {
    }

    public void deleteUser(final DeleteUserRequest deleteUserRequest, final ActionListener<Boolean> listener) {
-        securityIndex.prepareIndexIfNeededThenExecute(listener::onFailure, () -> {
-            DeleteRequest request = client.prepareDelete(SECURITY_INDEX_NAME,
+        if (securityIndex.isAvailable() == false) {


Arguably, this is incorrect too.
We claim the user doesn't exist and therefore doesn't need to be deleted, but we don't know that.

tvernum · 2018-10-17T00:59:35Z

...security/src/main/java/org/elasticsearch/xpack/security/authc/esnative/NativeUsersStore.java

@@ -498,11 +500,10 @@ void verifyPassword(String username, final SecureString password, ActionListener
    }

    void getReservedUserInfo(String username, ActionListener<ReservedUserInfo> listener) {
-        if (securityIndex.indexExists() == false) {
-            // TODO remove this short circuiting and fix tests that fail without this!
+        if (securityIndex.isAvailable() == false) {


I also think this is incorrect, since we have explicit isShardNotAvailableException handling below.

On second look, I think this is definitely a problem - it means if a primary shard is missing, then reserved users revert back to their default state. Disabled users would become enabled again and elastic would revert to accepting the bootstrap password.
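
A sketch of why the short circuit is risky for reserved users, under the assumption that "no stored info" falls back to a default that is enabled and uses the bootstrap password. All names here (`ReservedUserFallbackSketch`, `Info`, `lenient`, `strict`) are hypothetical and only model the behavior described in the comment above.

```java
import java.util.Map;

// Hypothetical model of reserved-user lookup; not the real NativeUsersStore code.
final class ReservedUserFallbackSketch {

    static final class Info {
        final boolean enabled;
        final boolean usesBootstrapPassword;

        Info(boolean enabled, boolean usesBootstrapPassword) {
            this.enabled = enabled;
            this.usesBootstrapPassword = usesBootstrapPassword;
        }
    }

    // Assumed default state for a reserved user with nothing stored in the index.
    static final Info DEFAULT = new Info(true, true);

    // Problematic variant: an unavailable index looks the same as "nothing stored".
    static Info lenient(Map<String, Info> stored, boolean indexAvailable, String user) {
        if (indexAvailable == false) {
            return DEFAULT;
        }
        return stored.getOrDefault(user, DEFAULT);
    }

    // Stricter variant: refuse to answer when the stored state cannot be read.
    static Info strict(Map<String, Info> stored, boolean indexAvailable, String user) {
        if (indexAvailable == false) {
            throw new IllegalStateException("security index is unavailable");
        }
        return stored.getOrDefault(user, DEFAULT);
    }
}
```

With a stored entry that disables `elastic`, the lenient variant reports the user as enabled whenever the index is unavailable, which is the regression described above.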

tvernum · 2018-10-17T01:00:08Z

...security/src/main/java/org/elasticsearch/xpack/security/authc/esnative/NativeUsersStore.java

-            executeAsyncWithOrigin(client.threadPool().getThreadContext(), SECURITY_ORIGIN,
-                client.prepareSearch(SECURITY_INDEX_NAME)
+        if (securityIndex.isAvailable() == false) {
+            listener.onResponse(Collections.emptyMap());


As above, this would cause the list of reserved users to revert to their default enabled state if a shard is unavailable. I don't think we should do that.

tvernum · 2018-10-17T01:01:21Z

.../main/java/org/elasticsearch/xpack/security/authc/support/mapper/NativeRoleMappingStore.java

-                client.prepareDelete(SECURITY_INDEX_NAME, SECURITY_GENERIC_TYPE, getIdForName(request.getName()))
+    private void innerDeleteMapping(DeleteRoleMappingRequest request, ActionListener<Boolean> listener) {
+        if (securityIndex.isAvailable() == false) {
+            listener.onResponse(false);


Likewise. I don't think we can claim that the mapping doesn't exist just because there are unavailable shards.

tvernum · 2018-10-17T01:01:37Z

...ecurity/src/main/java/org/elasticsearch/xpack/security/authz/store/NativePrivilegeStore.java

@@ -88,13 +88,15 @@ public NativePrivilegeStore(Settings settings, Client client, SecurityIndexManag

    public void getPrivileges(Collection<String> applications, Collection<String> names,
                              ActionListener<Collection<ApplicationPrivilegeDescriptor>> listener) {
-        if (applications != null && applications.size() == 1 && names != null && names.size() == 1) {
+        if (securityIndexManager.isAvailable() == false) {
+            listener.onResponse(Collections.emptyList());


jaymode · 2018-10-17T01:37:10Z

@tvernum your concerns make sense and I think we can finally address some of these. The lack of a role retrieval result and authentication result prevented some of these changes previously. I will look at addressing your comments tomorrow. Do you mind finishing looking at this and leaving any other comments that you have?

tvernum · 2018-10-17T10:15:09Z

x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/security/authc/TokenService.java

+            if (securityIndex.isAvailable() == false) {
+                logger.debug("security index is not available to find token from refresh token, retrying");
+                attemptCount.incrementAndGet();
+                findTokenFromRefreshToken(refreshToken, listener, attemptCount);


If I read this correctly, we will retry even if the security index doesn't exist, which seems unnecessary, although unlikely in practice - why would we have a refresh token but no security index?

The security index could have been deleted after the refresh token was generated?
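
One way to avoid the unnecessary retries raised here is to stop immediately when the index does not exist and retry only while it exists but is unavailable. The sketch below uses a loop instead of the recursive call in the diff, and `MAX_ATTEMPTS`, `Outcome`, and the supplier parameters are assumptions for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Hypothetical sketch of bounded retries; not the real TokenService logic.
final class RefreshTokenRetrySketch {

    static final int MAX_ATTEMPTS = 5; // assumed limit, chosen for illustration

    enum Outcome { SEARCHED, INDEX_MISSING, GAVE_UP }

    static Outcome find(BooleanSupplier indexExists, BooleanSupplier indexAvailable, AtomicInteger attemptCount) {
        while (true) {
            if (attemptCount.get() > MAX_ATTEMPTS) {
                return Outcome.GAVE_UP;
            }
            if (indexExists.getAsBoolean() == false) {
                // Retrying cannot help: without an index there is no token document to find.
                return Outcome.INDEX_MISSING;
            }
            if (indexAvailable.getAsBoolean() == false) {
                // The index exists but shards are not ready yet, so a retry may succeed.
                attemptCount.incrementAndGet();
                continue;
            }
            // Placeholder for the actual search for the token document.
            return Outcome.SEARCHED;
        }
    }
}
```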

tvernum · 2018-10-17T10:18:53Z

x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/security/authc/TokenService.java

-        final Instant now = clock.instant();
-        final BoolQueryBuilder boolQuery = QueryBuilders.boolQuery()
+        } else if (securityIndex.isAvailable() == false) {
+            listener.onResponse(Collections.emptyList());


I think it would be better to call onFailure if there are unavailable shards, rather than treating it as "no matching tokens".

tvernum · 2018-10-17T10:29:01Z

...security/src/main/java/org/elasticsearch/xpack/security/authc/esnative/NativeUsersStore.java

@@ -498,11 +500,10 @@ void verifyPassword(String username, final SecureString password, ActionListener
    }

    void getReservedUserInfo(String username, ActionListener<ReservedUserInfo> listener) {
-        if (securityIndex.indexExists() == false) {
-            // TODO remove this short circuiting and fix tests that fail without this!
+        if (securityIndex.isAvailable() == false) {


On second look, I think this is definitely a problem - it means if a primary shard is missing, then reserved users revert back to their default state. Disabled users would become enabled again and elastic would revert to accepting the bootstrap password.

This reverts commit 0b4e8db as some issues have been identified with the changed handling of a primary shard of the security index not being available.

This reverts commit 9e3e7e1 as some issues have been identified with the changed handling of a primary shard of the security index not being available.

jaymode · 2018-10-17T17:00:29Z

I pushed 46c7b5e to revert this on master and 65eeced on 6.x. As discussed, I will open a new PR with fixes.

The security native stores follow a pattern where `SecurityIndexManager#prepareIndexIfNeededThenExecute` wraps most calls made for the security index. The reasoning behind this was to check if the security index had been upgraded to the latest version in a consistent manner. However, this has the potential side effect that a read will trigger the creation of the security index or an updating of its mappings, which can lead to issues such as failures due to put mapping requests timing out even though we might have been able to read from the index and get the data necessary. This change introduces a new method, `checkIndexVersionThenExecute`, that provides the consistent checking of the security index to make sure it has been upgraded. That is the only check that this method performs prior to running the passed in operation, which removes the possible triggering of index creation and mapping updates for reads. Additionally, areas where we do reads now check the availability of the security index and can short circuit requests. Availability in this context means that the index exists and all primaries are active. This is the fixed version of #34246, which was reverted. Relates #33205

jaymode added >enhancement v7.0.0 :Security/Security Security issues without another label v6.5.0 labels Oct 2, 2018

jaymode requested review from bizybot, tvernum and albertzaharovits October 2, 2018 20:02

jaymode added 2 commits October 15, 2018 10:37

Merge branch 'master' into read_dont_prep_idx

97f5fcc

Merge branch 'master' into read_dont_prep_idx

42ee9af

bizybot approved these changes Oct 16, 2018

jaymode added 3 commits October 16, 2018 08:32

Merge branch 'master' into read_dont_prep_idx

facc0fc

add comment about why we don't use isAvailable

5f50fa2

javadoc about isAvailable being left to caller

6e9247c

albertzaharovits approved these changes Oct 16, 2018

jaymode merged commit 0b4e8db into elastic:master Oct 16, 2018

jaymode deleted the read_dont_prep_idx branch October 16, 2018 18:49

tvernum reviewed Oct 17, 2018

jaymode removed >enhancement v6.5.0 v7.0.0 labels Oct 17, 2018

jaymode added a commit to jaymode/elasticsearch that referenced this pull request Oct 17, 2018

Revert "Revert "Security: don't call prepare index for reads (elastic…

37501e7

…#34246)"" This reverts commit 46c7b5e.

jaymode mentioned this pull request Oct 17, 2018

Security: don't call prepare index for reads #34568

Merged

tomcallahan added the >non-issue label Oct 20, 2018

kcm pushed a commit that referenced this pull request Oct 30, 2018

Revert "Security: don't call prepare index for reads (#34246)"

751ef1e

This reverts commit 0b4e8db as some issues have been identified with the changed handling of a primary shard of the security index not being available.

jaymode mentioned this pull request Jan 15, 2019

Reduce the need to reload roles from index #33205

Closed
