-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Security: don't call prepare index for reads #34246
Conversation
The security native stores follow a pattern where `SecurityIndexManager#prepareIndexIfNeededThenExecute` wraps most calls made for the security index. The reasoning behind this was to check if the security index had been upgraded to the latest version in a consistent manner. However, this has the potential side effect that a read will trigger the creation of the security index or an updating of its mappings, which can lead to issues such as failures due to put mapping requests timing out even though we might have been able to read from the index and get the data necessary. This change introduces a new method, `checkIndexVersionThenExecute`, that provides the consistent checking of the security index to make sure it has been upgraded. That is the only check that this method performs prior to running the passed in operation, which removes the possible triggering of index creation and mapping updates for reads. Additionally, areas where we do reads now check the availability of the security index and can short circuit requests. Availability in this context means that the index exists and all primaries are active. Relates elastic#33205
Pinging @elastic/es-security |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, once the comments are addressed. Thank you.
x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/security/authc/TokenService.java
Outdated
Show resolved
Hide resolved
*/ | ||
public void checkIndexVersionThenExecute(final Consumer<Exception> consumer, final Runnable andThen) { | ||
final State indexState = this.indexState; // use a local copy so all checks execute against the same state! | ||
if (indexState.indexExists && indexState.isIndexUpToDate == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see we are invoking isAvailable()
most places before calling checkIndexVersionThenExecute
, but I think better to add a note to the public API documentation that the callers need to check or else the runnable will be executed. Unsure whether we should add the check in the checkIndexVersionThenExecute
so if someone misses to handle it we throw exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added it to the docs. I do not think we should check this value in the method as isAvailable is only a short circuiting mechanism and the way to handle it depends on the caller.
attemptCount.incrementAndGet(); | ||
findTokenFromRefreshToken(refreshToken, listener, attemptCount); | ||
} else if (searchResponse.getHits().getHits().length < 1) { | ||
logger.info("could not find token document with refresh_token [{}]", refreshToken); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
may be debug or trace?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not changed by this PR so I am leaving it as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I feel that SecurityIndexManager
should have it's own executeAsyncWithOrigin
and handle the maybe-create-and-run checks inside.
With the tools at hand, it's good enough for me.
The security native stores follow a pattern where `SecurityIndexManager#prepareIndexIfNeededThenExecute` wraps most calls made for the security index. The reasoning behind this was to check if the security index had been upgraded to the latest version in a consistent manner. However, this has the potential side effect that a read will trigger the creation of the security index or an updating of its mappings, which can lead to issues such as failures due to put mapping requests timing out even though we might have been able to read from the index and get the data necessary. This change introduces a new method, `checkIndexVersionThenExecute`, that provides the consistent checking of the security index to make sure it has been upgraded. That is the only check that this method performs prior to running the passed in operation, which removes the possible triggering of index creation and mapping updates for reads. Additionally, areas where we do reads now check the availability of the security index and can short circuit requests. Availability in this context means that the index exists and all primaries are active. Relates #33205
Sorry - I did get part way through a review yesterday, but didn't finish it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a serious concern about the number of places where we're just treating shards not available as "everything is OK".
I didn't finish reviewing, since it seems to be a pervasive problem that we need to reach a concensus on.
@@ -118,16 +118,15 @@ public void getUsers(String[] userNames, final ActionListener<Collection<User>> | |||
} | |||
}; | |||
|
|||
if (securityIndex.indexExists() == false) { | |||
// TODO remove this short circuiting and fix tests that fail without this! | |||
if (securityIndex.isAvailable() == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is wrong.
If a primary shard is unavailable, then GET _xpack/security/users
will return an empty list rather than an error.
@@ -155,10 +154,10 @@ public void getUsers(String[] userNames, final ActionListener<Collection<User>> | |||
} | |||
|
|||
void getUserCount(final ActionListener<Long> listener) { | |||
if (securityIndex.indexExists() == false) { | |||
if (securityIndex.isAvailable() == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above
@@ -182,11 +181,10 @@ public void onFailure(Exception e) { | |||
* Async method to retrieve a user and their password | |||
*/ | |||
private void getUserAndPassword(final String user, final ActionListener<UserAndPassword> listener) { | |||
if (securityIndex.indexExists() == false) { | |||
// TODO remove this short circuiting and fix tests that fail without this! | |||
if (securityIndex.isAvailable() == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is OK (since the onFailure below returns null on error) but it means we lose any logging.
@@ -459,24 +457,28 @@ public void onFailure(Exception e) { | |||
} | |||
|
|||
public void deleteUser(final DeleteUserRequest deleteUserRequest, final ActionListener<Boolean> listener) { | |||
securityIndex.prepareIndexIfNeededThenExecute(listener::onFailure, () -> { | |||
DeleteRequest request = client.prepareDelete(SECURITY_INDEX_NAME, | |||
if (securityIndex.isAvailable() == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Arguably, this is incorrect too.
We claim the user doesn't exist and therefore doesn't need to be deleted, but we don't know that.
@@ -498,11 +500,10 @@ void verifyPassword(String username, final SecureString password, ActionListener | |||
} | |||
|
|||
void getReservedUserInfo(String username, ActionListener<ReservedUserInfo> listener) { | |||
if (securityIndex.indexExists() == false) { | |||
// TODO remove this short circuiting and fix tests that fail without this! | |||
if (securityIndex.isAvailable() == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also think this is incorrect, since we have explicit isShardNotAvailableException
handling below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second look, I think this is definitely a problem - it means if a primary shard is missing, then reserved users revert back to their default state. Disabled users would become enabled again and elastic
would revert to accepting the bootstrap password.
executeAsyncWithOrigin(client.threadPool().getThreadContext(), SECURITY_ORIGIN, | ||
client.prepareSearch(SECURITY_INDEX_NAME) | ||
if (securityIndex.isAvailable() == false) { | ||
listener.onResponse(Collections.emptyMap()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above, this would cause the list of resevered users to revert to their default enabled state if a shard is unavailable. I don't think we should do that.
client.prepareDelete(SECURITY_INDEX_NAME, SECURITY_GENERIC_TYPE, getIdForName(request.getName())) | ||
private void innerDeleteMapping(DeleteRoleMappingRequest request, ActionListener<Boolean> listener) { | ||
if (securityIndex.isAvailable() == false) { | ||
listener.onResponse(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise. I don't think we can claim that the mapping doesn't exist just because there are unavailable shards.
@@ -88,13 +88,15 @@ public NativePrivilegeStore(Settings settings, Client client, SecurityIndexManag | |||
|
|||
public void getPrivileges(Collection<String> applications, Collection<String> names, | |||
ActionListener<Collection<ApplicationPrivilegeDescriptor>> listener) { | |||
if (applications != null && applications.size() == 1 && names != null && names.size() == 1) { | |||
if (securityIndexManager.isAvailable() == false) { | |||
listener.onResponse(Collections.emptyList()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise.
@tvernum your concerns make sense and I think we can finally address some of these. The lack of a role retrieval result and authentication result prevented some of these changes previously. I will look at addressing your comments tomorrow. Do you mind finishing looking at this and leaving any other comments that you have? |
if (securityIndex.isAvailable() == false) { | ||
logger.debug("security index is not available to find token from refresh token, retrying"); | ||
attemptCount.incrementAndGet(); | ||
findTokenFromRefreshToken(refreshToken, listener, attemptCount); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I read this correctly, we will retry even if the security index doesn't exist, which seems unnecessary, although unlikely in practice - why would we have a refresh token but no security index?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
security index could have been deleted after generation of a refresh token?
final Instant now = clock.instant(); | ||
final BoolQueryBuilder boolQuery = QueryBuilders.boolQuery() | ||
} else if (securityIndex.isAvailable() == false) { | ||
listener.onResponse(Collections.emptyList()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better to onFailure
if there are unavailable shards, rather than treating it as "no matching tokens"
@@ -498,11 +500,10 @@ void verifyPassword(String username, final SecureString password, ActionListener | |||
} | |||
|
|||
void getReservedUserInfo(String username, ActionListener<ReservedUserInfo> listener) { | |||
if (securityIndex.indexExists() == false) { | |||
// TODO remove this short circuiting and fix tests that fail without this! | |||
if (securityIndex.isAvailable() == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second look, I think this is definitely a problem - it means if a primary shard is missing, then reserved users revert back to their default state. Disabled users would become enabled again and elastic
would revert to accepting the bootstrap password.
This reverts commit 0b4e8db as some issues have been identified with the changed handling of a primary shard of the security index not being available.
This reverts commit 9e3e7e1 as some issues have been identified with the changed handling of a primary shard of the security index not being available.
The security native stores follow a pattern where `SecurityIndexManager#prepareIndexIfNeededThenExecute` wraps most calls made for the security index. The reasoning behind this was to check if the security index had been upgraded to the latest version in a consistent manner. However, this has the potential side effect that a read will trigger the creation of the security index or an updating of its mappings, which can lead to issues such as failures due to put mapping requests timing out even though we might have been able to read from the index and get the data necessary. This change introduces a new method, `checkIndexVersionThenExecute`, that provides the consistent checking of the security index to make sure it has been upgraded. That is the only check that this method performs prior to running the passed in operation, which removes the possible triggering of index creation and mapping updates for reads. Additionally, areas where we do reads now check the availability of the security index and can short circuit requests. Availability in this context means that the index exists and all primaries are active. This is the fixed version of elastic#34246, which was reverted. Relates elastic#33205
The security native stores follow a pattern where `SecurityIndexManager#prepareIndexIfNeededThenExecute` wraps most calls made for the security index. The reasoning behind this was to check if the security index had been upgraded to the latest version in a consistent manner. However, this has the potential side effect that a read will trigger the creation of the security index or an updating of its mappings, which can lead to issues such as failures due to put mapping requests timing out even though we might have been able to read from the index and get the data necessary. This change introduces a new method, `checkIndexVersionThenExecute`, that provides the consistent checking of the security index to make sure it has been upgraded. That is the only check that this method performs prior to running the passed in operation, which removes the possible triggering of index creation and mapping updates for reads. Additionally, areas where we do reads now check the availability of the security index and can short circuit requests. Availability in this context means that the index exists and all primaries are active. This is the fixed version of #34246, which was reverted. Relates #33205
The security native stores follow a pattern where `SecurityIndexManager#prepareIndexIfNeededThenExecute` wraps most calls made for the security index. The reasoning behind this was to check if the security index had been upgraded to the latest version in a consistent manner. However, this has the potential side effect that a read will trigger the creation of the security index or an updating of its mappings, which can lead to issues such as failures due to put mapping requests timing out even though we might have been able to read from the index and get the data necessary. This change introduces a new method, `checkIndexVersionThenExecute`, that provides the consistent checking of the security index to make sure it has been upgraded. That is the only check that this method performs prior to running the passed in operation, which removes the possible triggering of index creation and mapping updates for reads. Additionally, areas where we do reads now check the availability of the security index and can short circuit requests. Availability in this context means that the index exists and all primaries are active. Relates #33205
This reverts commit 0b4e8db as some issues have been identified with the changed handling of a primary shard of the security index not being available.
The security native stores follow a pattern where `SecurityIndexManager#prepareIndexIfNeededThenExecute` wraps most calls made for the security index. The reasoning behind this was to check if the security index had been upgraded to the latest version in a consistent manner. However, this has the potential side effect that a read will trigger the creation of the security index or an updating of its mappings, which can lead to issues such as failures due to put mapping requests timing out even though we might have been able to read from the index and get the data necessary. This change introduces a new method, `checkIndexVersionThenExecute`, that provides the consistent checking of the security index to make sure it has been upgraded. That is the only check that this method performs prior to running the passed in operation, which removes the possible triggering of index creation and mapping updates for reads. Additionally, areas where we do reads now check the availability of the security index and can short circuit requests. Availability in this context means that the index exists and all primaries are active. This is the fixed version of #34246, which was reverted. Relates #33205
The security native stores follow a pattern where
SecurityIndexManager#prepareIndexIfNeededThenExecute
wraps most callsmade for the security index. The reasoning behind this was to check if
the security index had been upgraded to the latest version in a
consistent manner. However, this has the potential side effect that a
read will trigger the creation of the security index or an updating of
its mappings, which can lead to issues such as failures due to put
mapping requests timing out even though we might have been able to read
from the index and get the data necessary.
This change introduces a new method,
checkIndexVersionThenExecute
,that provides the consistent checking of the security index to make
sure it has been upgraded. That is the only check that this method
performs prior to running the passed in operation, which removes the
possible triggering of index creation and mapping updates for reads.
Additionally, areas where we do reads now check the availability of the
security index and can short circuit requests. Availability in this
context means that the index exists and all primaries are active.
Relates #33205