Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Security: don't call prepare index for reads #34246

Merged
merged 6 commits into from
Oct 16, 2018

Conversation

jaymode
Copy link
Member

@jaymode jaymode commented Oct 2, 2018

The security native stores follow a pattern where
SecurityIndexManager#prepareIndexIfNeededThenExecute wraps most calls
made for the security index. The reasoning behind this was to check if
the security index had been upgraded to the latest version in a
consistent manner. However, this has the potential side effect that a
read will trigger the creation of the security index or an updating of
its mappings, which can lead to issues such as failures due to put
mapping requests timing out even though we might have been able to read
from the index and get the data necessary.

This change introduces a new method, checkIndexVersionThenExecute,
that provides the consistent checking of the security index to make
sure it has been upgraded. That is the only check that this method
performs prior to running the passed in operation, which removes the
possible triggering of index creation and mapping updates for reads.

Additionally, areas where we do reads now check the availability of the
security index and can short circuit requests. Availability in this
context means that the index exists and all primaries are active.

Relates #33205

The security native stores follow a pattern where
`SecurityIndexManager#prepareIndexIfNeededThenExecute` wraps most calls
made for the security index. The reasoning behind this was to check if
the security index had been upgraded to the latest version in a
consistent manner. However, this has the potential side effect that a
read will trigger the creation of the security index or an updating of
its mappings, which can lead to issues such as failures due to put
mapping requests timing out even though we might have been able to read
from the index and get the data necessary.

This change introduces a new method, `checkIndexVersionThenExecute`,
that provides the consistent checking of the security index to make
sure it has been upgraded. That is the only check that this method
performs prior to running the passed in operation, which removes the
possible triggering of index creation and mapping updates for reads.

Additionally, areas where we do reads now check the availability of the
security index and can short circuit requests. Availability in this
context means that the index exists and all primaries are active.

Relates elastic#33205
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-security

@jaymode
Copy link
Member Author

jaymode commented Oct 15, 2018

@bizybot @tvernum if you have time, can you please take a look at this PR?

Copy link
Contributor

@bizybot bizybot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, once the comments are addressed. Thank you.

*/
public void checkIndexVersionThenExecute(final Consumer<Exception> consumer, final Runnable andThen) {
final State indexState = this.indexState; // use a local copy so all checks execute against the same state!
if (indexState.indexExists && indexState.isIndexUpToDate == false) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see we are invoking isAvailable() most places before calling checkIndexVersionThenExecute, but I think better to add a note to the public API documentation that the callers need to check or else the runnable will be executed. Unsure whether we should add the check in the checkIndexVersionThenExecute so if someone misses to handle it we throw exception.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added it to the docs. I do not think we should check this value in the method as isAvailable is only a short circuiting mechanism and the way to handle it depends on the caller.

attemptCount.incrementAndGet();
findTokenFromRefreshToken(refreshToken, listener, attemptCount);
} else if (searchResponse.getHits().getHits().length < 1) {
logger.info("could not find token document with refresh_token [{}]", refreshToken);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may be debug or trace?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not changed by this PR so I am leaving it as is.

Copy link
Contributor

@albertzaharovits albertzaharovits left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
I feel that SecurityIndexManager should have it's own executeAsyncWithOrigin and handle the maybe-create-and-run checks inside.
With the tools at hand, it's good enough for me.

@jaymode jaymode merged commit 0b4e8db into elastic:master Oct 16, 2018
@jaymode jaymode deleted the read_dont_prep_idx branch October 16, 2018 18:49
jaymode added a commit that referenced this pull request Oct 16, 2018
The security native stores follow a pattern where
`SecurityIndexManager#prepareIndexIfNeededThenExecute` wraps most calls
made for the security index. The reasoning behind this was to check if
the security index had been upgraded to the latest version in a
consistent manner. However, this has the potential side effect that a
read will trigger the creation of the security index or an updating of
its mappings, which can lead to issues such as failures due to put
mapping requests timing out even though we might have been able to read
from the index and get the data necessary.

This change introduces a new method, `checkIndexVersionThenExecute`,
that provides the consistent checking of the security index to make
sure it has been upgraded. That is the only check that this method
performs prior to running the passed in operation, which removes the
possible triggering of index creation and mapping updates for reads.

Additionally, areas where we do reads now check the availability of the
security index and can short circuit requests. Availability in this
context means that the index exists and all primaries are active.

Relates #33205
@tvernum
Copy link
Contributor

tvernum commented Oct 17, 2018

Sorry - I did get part way through a review yesterday, but didn't finish it.

Copy link
Contributor

@tvernum tvernum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a serious concern about the number of places where we're just treating shards not available as "everything is OK".
I didn't finish reviewing, since it seems to be a pervasive problem that we need to reach a concensus on.

@@ -118,16 +118,15 @@ public void getUsers(String[] userNames, final ActionListener<Collection<User>>
}
};

if (securityIndex.indexExists() == false) {
// TODO remove this short circuiting and fix tests that fail without this!
if (securityIndex.isAvailable() == false) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is wrong.
If a primary shard is unavailable, then GET _xpack/security/users will return an empty list rather than an error.

@@ -155,10 +154,10 @@ public void getUsers(String[] userNames, final ActionListener<Collection<User>>
}

void getUserCount(final ActionListener<Long> listener) {
if (securityIndex.indexExists() == false) {
if (securityIndex.isAvailable() == false) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above

@@ -182,11 +181,10 @@ public void onFailure(Exception e) {
* Async method to retrieve a user and their password
*/
private void getUserAndPassword(final String user, final ActionListener<UserAndPassword> listener) {
if (securityIndex.indexExists() == false) {
// TODO remove this short circuiting and fix tests that fail without this!
if (securityIndex.isAvailable() == false) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is OK (since the onFailure below returns null on error) but it means we lose any logging.

@@ -459,24 +457,28 @@ public void onFailure(Exception e) {
}

public void deleteUser(final DeleteUserRequest deleteUserRequest, final ActionListener<Boolean> listener) {
securityIndex.prepareIndexIfNeededThenExecute(listener::onFailure, () -> {
DeleteRequest request = client.prepareDelete(SECURITY_INDEX_NAME,
if (securityIndex.isAvailable() == false) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arguably, this is incorrect too.
We claim the user doesn't exist and therefore doesn't need to be deleted, but we don't know that.

@@ -498,11 +500,10 @@ void verifyPassword(String username, final SecureString password, ActionListener
}

void getReservedUserInfo(String username, ActionListener<ReservedUserInfo> listener) {
if (securityIndex.indexExists() == false) {
// TODO remove this short circuiting and fix tests that fail without this!
if (securityIndex.isAvailable() == false) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think this is incorrect, since we have explicit isShardNotAvailableException handling below.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second look, I think this is definitely a problem - it means if a primary shard is missing, then reserved users revert back to their default state. Disabled users would become enabled again and elastic would revert to accepting the bootstrap password.

executeAsyncWithOrigin(client.threadPool().getThreadContext(), SECURITY_ORIGIN,
client.prepareSearch(SECURITY_INDEX_NAME)
if (securityIndex.isAvailable() == false) {
listener.onResponse(Collections.emptyMap());
Copy link
Contributor

@tvernum tvernum Oct 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, this would cause the list of resevered users to revert to their default enabled state if a shard is unavailable. I don't think we should do that.

client.prepareDelete(SECURITY_INDEX_NAME, SECURITY_GENERIC_TYPE, getIdForName(request.getName()))
private void innerDeleteMapping(DeleteRoleMappingRequest request, ActionListener<Boolean> listener) {
if (securityIndex.isAvailable() == false) {
listener.onResponse(false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise. I don't think we can claim that the mapping doesn't exist just because there are unavailable shards.

@@ -88,13 +88,15 @@ public NativePrivilegeStore(Settings settings, Client client, SecurityIndexManag

public void getPrivileges(Collection<String> applications, Collection<String> names,
ActionListener<Collection<ApplicationPrivilegeDescriptor>> listener) {
if (applications != null && applications.size() == 1 && names != null && names.size() == 1) {
if (securityIndexManager.isAvailable() == false) {
listener.onResponse(Collections.emptyList());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise.

@jaymode
Copy link
Member Author

jaymode commented Oct 17, 2018

@tvernum your concerns make sense and I think we can finally address some of these. The lack of a role retrieval result and authentication result prevented some of these changes previously. I will look at addressing your comments tomorrow. Do you mind finishing looking at this and leaving any other comments that you have?

if (securityIndex.isAvailable() == false) {
logger.debug("security index is not available to find token from refresh token, retrying");
attemptCount.incrementAndGet();
findTokenFromRefreshToken(refreshToken, listener, attemptCount);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I read this correctly, we will retry even if the security index doesn't exist, which seems unnecessary, although unlikely in practice - why would we have a refresh token but no security index?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security index could have been deleted after generation of a refresh token?

final Instant now = clock.instant();
final BoolQueryBuilder boolQuery = QueryBuilders.boolQuery()
} else if (securityIndex.isAvailable() == false) {
listener.onResponse(Collections.emptyList());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to onFailure if there are unavailable shards, rather than treating it as "no matching tokens"

@@ -498,11 +500,10 @@ void verifyPassword(String username, final SecureString password, ActionListener
}

void getReservedUserInfo(String username, ActionListener<ReservedUserInfo> listener) {
if (securityIndex.indexExists() == false) {
// TODO remove this short circuiting and fix tests that fail without this!
if (securityIndex.isAvailable() == false) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second look, I think this is definitely a problem - it means if a primary shard is missing, then reserved users revert back to their default state. Disabled users would become enabled again and elastic would revert to accepting the bootstrap password.

jaymode added a commit that referenced this pull request Oct 17, 2018
This reverts commit 0b4e8db as some
issues have been identified with the changed handling of a primary
shard of the security index not being available.
jaymode added a commit that referenced this pull request Oct 17, 2018
This reverts commit 9e3e7e1 as some
issues have been identified with the changed handling of a primary
shard of the security index not being available.
@jaymode
Copy link
Member Author

jaymode commented Oct 17, 2018

I pushed 46c7b5e to revert this on master and 65eeced on 6.x. As discussed, I will open a new PR with fixes.

jaymode added a commit to jaymode/elasticsearch that referenced this pull request Oct 17, 2018
matriv pushed a commit to matriv/elasticsearch that referenced this pull request Oct 22, 2018
The security native stores follow a pattern where
`SecurityIndexManager#prepareIndexIfNeededThenExecute` wraps most calls
made for the security index. The reasoning behind this was to check if
the security index had been upgraded to the latest version in a
consistent manner. However, this has the potential side effect that a
read will trigger the creation of the security index or an updating of
its mappings, which can lead to issues such as failures due to put
mapping requests timing out even though we might have been able to read
from the index and get the data necessary.

This change introduces a new method, `checkIndexVersionThenExecute`,
that provides the consistent checking of the security index to make
sure it has been upgraded. That is the only check that this method
performs prior to running the passed in operation, which removes the
possible triggering of index creation and mapping updates for reads.

Additionally, areas where we do reads now check the availability of the
security index and can short circuit requests. Availability in this
context means that the index exists and all primaries are active.

This is the fixed version of elastic#34246, which was reverted.

Relates elastic#33205
jaymode added a commit that referenced this pull request Oct 22, 2018
The security native stores follow a pattern where
`SecurityIndexManager#prepareIndexIfNeededThenExecute` wraps most calls
made for the security index. The reasoning behind this was to check if
the security index had been upgraded to the latest version in a
consistent manner. However, this has the potential side effect that a
read will trigger the creation of the security index or an updating of
its mappings, which can lead to issues such as failures due to put
mapping requests timing out even though we might have been able to read
from the index and get the data necessary.

This change introduces a new method, `checkIndexVersionThenExecute`,
that provides the consistent checking of the security index to make
sure it has been upgraded. That is the only check that this method
performs prior to running the passed in operation, which removes the
possible triggering of index creation and mapping updates for reads.

Additionally, areas where we do reads now check the availability of the
security index and can short circuit requests. Availability in this
context means that the index exists and all primaries are active.

This is the fixed version of #34246, which was reverted.

Relates #33205
kcm pushed a commit that referenced this pull request Oct 30, 2018
The security native stores follow a pattern where
`SecurityIndexManager#prepareIndexIfNeededThenExecute` wraps most calls
made for the security index. The reasoning behind this was to check if
the security index had been upgraded to the latest version in a
consistent manner. However, this has the potential side effect that a
read will trigger the creation of the security index or an updating of
its mappings, which can lead to issues such as failures due to put
mapping requests timing out even though we might have been able to read
from the index and get the data necessary.

This change introduces a new method, `checkIndexVersionThenExecute`,
that provides the consistent checking of the security index to make
sure it has been upgraded. That is the only check that this method
performs prior to running the passed in operation, which removes the
possible triggering of index creation and mapping updates for reads.

Additionally, areas where we do reads now check the availability of the
security index and can short circuit requests. Availability in this
context means that the index exists and all primaries are active.

Relates #33205
kcm pushed a commit that referenced this pull request Oct 30, 2018
This reverts commit 0b4e8db as some
issues have been identified with the changed handling of a primary
shard of the security index not being available.
kcm pushed a commit that referenced this pull request Oct 30, 2018
The security native stores follow a pattern where
`SecurityIndexManager#prepareIndexIfNeededThenExecute` wraps most calls
made for the security index. The reasoning behind this was to check if
the security index had been upgraded to the latest version in a
consistent manner. However, this has the potential side effect that a
read will trigger the creation of the security index or an updating of
its mappings, which can lead to issues such as failures due to put
mapping requests timing out even though we might have been able to read
from the index and get the data necessary.

This change introduces a new method, `checkIndexVersionThenExecute`,
that provides the consistent checking of the security index to make
sure it has been upgraded. That is the only check that this method
performs prior to running the passed in operation, which removes the
possible triggering of index creation and mapping updates for reads.

Additionally, areas where we do reads now check the availability of the
security index and can short circuit requests. Availability in this
context means that the index exists and all primaries are active.

This is the fixed version of #34246, which was reverted.

Relates #33205
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>non-issue :Security/Security Security issues without another label
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants