[BUG] Cluster status YELLOW after configuring Security Plugin in single-node clusters #3130

Open
williamtrelawny opened this issue Aug 8, 2023 · 14 comments
Labels
bug Something isn't working good first issue These are recommended starting points for newcomers looking to make their first contributions. triaged Issues labeled as 'Triaged' have been reviewed and are deemed actionable.

Comments

@williamtrelawny

What is the bug?
After activating the Security plugin in a single-node cluster, the cluster status is always YELLOW because of unassigned replica shards in three indices:

  • .plugins-ml-config
  • .opensearch-sap-pre-packaged-rules-config
  • .opensearch-sap-log-types-config

How can one reproduce the bug?
Steps to reproduce the behavior:

  1. Start with a single-node cluster.
  2. Configure the Security plugin per the instructions in the docs.
  3. Observe the YELLOW cluster status after restarting.
  4. Query the /_cat/shards API to see the unassigned shards.

What is the expected behavior?
Default sharding for the above indices should be 1 primary / 0 replicas to account for single-node clusters, or perhaps use some degree of intelligent sharding based on cluster size.

What is your host/environment?

  • OS: Debian 11.7
  • Version: OpenSearch 2.9
  • Plugins: Security

Do you have any screenshots?

$ curl https://example.org:9200/_cat/shards?v -u admin
Enter host password for user 'admin':
index                                     shard prirep state
.plugins-ml-config                        0     p      STARTED
.plugins-ml-config                        0     r      UNASSIGNED
.opensearch-observability                 0     p      STARTED
.opensearch-sap-pre-packaged-rules-config 0     p      STARTED
.opensearch-sap-pre-packaged-rules-config 0     r      UNASSIGNED
.opensearch-sap-log-types-config          0     p      STARTED
.opensearch-sap-log-types-config          0     r      UNASSIGNED
.opendistro_security                      0     p      STARTED
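
For anyone who wants to check this in an automated test, here is a minimal Python sketch that picks the UNASSIGNED rows out of `_cat/shards?v` text like the listing above. The function name `unassigned_shards` and the `SAMPLE` constant are made up for illustration, and the parser assumes the four-column layout shown here (index, shard, prirep, state); other column sets would need adjusting.

```python
def unassigned_shards(cat_output: str) -> list[tuple[str, str, str]]:
    """Return (index, shard, prirep) for every UNASSIGNED row of _cat/shards text."""
    rows = []
    for line in cat_output.strip().splitlines():
        fields = line.split()
        # Skip blank lines and the header row that ?v adds.
        if len(fields) < 4 or fields[0] == "index":
            continue
        index, shard, prirep, state = fields[:4]
        if state == "UNASSIGNED":
            rows.append((index, shard, prirep))
    return rows


# Sample text copied from the listing in this report.
SAMPLE = """\
index                                     shard prirep state
.plugins-ml-config                        0     p      STARTED
.plugins-ml-config                        0     r      UNASSIGNED
.opensearch-observability                 0     p      STARTED
.opensearch-sap-pre-packaged-rules-config 0     p      STARTED
.opensearch-sap-pre-packaged-rules-config 0     r      UNASSIGNED
.opensearch-sap-log-types-config          0     p      STARTED
.opensearch-sap-log-types-config          0     r      UNASSIGNED
.opendistro_security                      0     p      STARTED
"""

for index, shard, prirep in unassigned_shards(SAMPLE):
    print(f"{index} shard {shard} ({prirep}) is unassigned")
```

On this sample, only replica (`r`) rows are reported; every primary is STARTED.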

Do you have any additional context?
This is related to the sentiment behind opensearch-project/anomaly-detection#847, namely that plugins enabled on a single-node OpenSearch cluster should Just Work and maintain GREEN cluster status.

@williamtrelawny williamtrelawny added bug Something isn't working untriaged Require the attention of the repository maintainers and may need to be prioritized labels Aug 8, 2023
@peternied peternied removed the untriaged Require the attention of the repository maintainers and may need to be prioritized label Aug 9, 2023
@stephen-crawford
Contributor

[Triage] Thank you for filing this issue @williamtrelawny. Looking at this issue, it seems that your index configuration leaves a shard unassigned. The Security plugin does not have any impact on sharding strategies.

Closing this issue.

@stuartwakefield

I'm also having this issue with the default setup for running a single node in Docker. The instructions at https://hub.docker.com/r/opensearchproject/opensearch indicate that the following starts up a single node cluster:

$ docker run -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" --name opensearch-node -d opensearchproject/opensearch:latest

However, the cluster remains in "yellow" status:

$ curl -sX GET "https://localhost:9200/_cluster/health" -ku admin:admin | jq -r .status
yellow

Many of the internal indices are configured with replicas, but because this is a single-node cluster, those replicas cannot be assigned to any node.

$ curl https://localhost:9200/_cat/shards -ku admin:admin                       
.opensearch-observability                 0 p STARTED     0   208b 172.17.0.3 f9447c29352c
.plugins-ml-config                        0 p STARTED     1  3.8kb 172.17.0.3 f9447c29352c
.plugins-ml-config                        0 r UNASSIGNED                      
.opensearch-sap-pre-packaged-rules-config 0 p STARTED              172.17.0.3 f9447c29352c
.opensearch-sap-pre-packaged-rules-config 0 r UNASSIGNED                      
.opensearch-sap-log-types-config          0 p STARTED              172.17.0.3 f9447c29352c
.opensearch-sap-log-types-config          0 r UNASSIGNED                      
security-auditlog-2023.08.15              0 p STARTED     5 63.7kb 172.17.0.3 f9447c29352c
security-auditlog-2023.08.15              0 r UNASSIGNED                      
.opendistro_security                      0 p STARTED    10 74.8kb 172.17.0.3 f9447c29352c

My understanding of OpenSearch is insufficient for me to configure these indices so that they are not replicated. I understand it may be possible to alter these indices after the fact to use no replicas:

$ curl -XPUT https://localhost:9200/security-auditlog-2023.08.15/_settings -H 'Content-Type: application/json' -d '{"index":{"number_of_replicas":0}}' -ku admin:admin
{"acknowledged":true}
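
For test harnesses that prefer code over curl, here is a hedged Python sketch that builds the same `_settings` request with the standard library. It only constructs the request object and does not send it; authentication and the `-k` style TLS handling from the curl command are deliberately left out, and the helper name `replica_settings_request` is hypothetical.

```python
import json
import urllib.request


def replica_settings_request(base_url: str, index: str, replicas: int) -> urllib.request.Request:
    """Build (but do not send) a PUT <index>/_settings request that sets number_of_replicas."""
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Same host and index as the curl command above.
req = replica_settings_request("https://localhost:9200", "security-auditlog-2023.08.15", 0)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would send it, but that needs credentials and certificate handling appropriate to the cluster.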

However, I have no idea what this will do to the cluster. This strategy also falls down when we try to modify the .plugins-ml-config index, for which I receive the following permissions-related error:

{"error":{"root_cause":[{"type":"security_exception","reason":"no permissions for [] and User [name=admin, backend_roles=[admin], requestedTenant=null]"}],"type":"security_exception","reason":"no permissions for [] and User [name=admin, backend_roles=[admin], requestedTenant=null]"},"status":403}

Ideally, we are looking for a very minimal OpenSearch image that we can run automated integration tests against. It may be that I'm unaware of configuration settings that would help, but I'm also struggling to find comprehensive documentation for them.

Right now we are using an Elasticsearch image instead as a workaround.

@williamtrelawny
Author

> [Triage] Thank you for filing this issue @williamtrelawny. Looking at this issue, it seems that your index configuration leaves a shard unassigned. The Security plugin does not have any impact on sharding strategies.
>
> Closing this issue.

Please do not close this issue, as it is not resolved. I have not made any changes to index parameters, sharding, replication, etc. at all. I have simply installed OpenSearch and configured the Security plugin.

For whatever reason, two shard copies (one primary and one replica) are created by default on all deployments of the Security plugin, regardless of the number of nodes in the cluster.

If the Security Plugin does not affect sharding strategies, then why is it that other default indices not part of the Security Plugin do not have this issue?

Whether the root cause is within the Security Plugin code or not, the issue arises only after initializing the plugin, so from a procedural standpoint the issue does lie here.

And it definitely IS an issue if OpenSearch with the Security plugin does not work "out of the box" on a single node.

@davidlago

> If the Security Plugin does not affect sharding strategies, then why is it that other default indices not part of the Security Plugin do not have this issue?

I'm not sure I follow. The security plugin owns and creates the .opendistro_security index, and based on the cluster health printouts above, that one does not have unassigned shards. The indices that are affected are others, such as .plugins-ml-config, .opensearch-sap-pre-packaged-rules-config, .opensearch-sap-log-types-config and security-auditlog-2023.08.15, none of which are owned by the security plugin.

If I'm understanding correctly, there is a state where the cluster is green and those indices (for example, .plugins-ml-config) show no unallocated shards, and then after a step in the configuration of the security plugin the problem starts and they begin requiring additional shards to be allocated?

If so, it would help a lot to get confirmation (i.e. that those indices are green to begin with) and then to narrow this down to the step in the security setup at which these settings unexpectedly change.

@todvora

todvora commented Aug 29, 2023

Hello,
I am having the very same issue as well. In my case it's also a single-node cluster (with discovery.type=single-node).

The problem and these indices do not come from the security plugin but rather originate in the opensearch-security-analytics and opensearch-ml plugins. When I remove these plugins, everything runs fine and the cluster status is green, because these 3 indices won't be created.

I think these two plugins should adapt the number_of_replicas when OpenSearch is running in single-instance mode.

Maybe we should move/reopen the issue in other repo(s)?

Thanks!

@dennisoelkers

@scrawfor99: I think there was a misunderstanding when this issue was closed. The named indices are created with a replica configuration by default which makes it impossible to get them to green for a single-node setup. For anomaly detection, this was already acknowledged as requiring a change.

@peternied
Member

@dennisoelkers Thanks for calling this out. Across the OpenSearch project we should have a consistent philosophy, so we should re-triage this issue with this context in mind.

@peternied peternied reopened this Sep 26, 2023
@peternied peternied added the untriaged Require the attention of the repository maintainers and may need to be prioritized label Sep 26, 2023
@peternied
Member

The cluster health documentation [link] suggests that the security plugin should allow for a configuration with no replicas.

OpenSearch expresses cluster health in three colors: green, yellow, and red. A green status means all primary shards and their replicas are allocated to nodes. A yellow status means all primary shards are allocated to nodes, but some replicas aren’t. A red status means at least one primary shard is not allocated to any
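
To make the three colors concrete, here is a rough Python sketch of the rule as quoted. It is a simplification of my own, not OpenSearch code: it treats a shard as allocated only when its state is STARTED, and the function name `cluster_status` is made up for illustration.

```python
def cluster_status(shards: list[tuple[str, str]]) -> str:
    """Map (prirep, state) pairs to green/yellow/red, following the quoted rule.

    prirep is "p" for a primary or "r" for a replica.
    Simplifying assumption: only STARTED counts as allocated.
    """
    primaries_ok = all(state == "STARTED" for prirep, state in shards if prirep == "p")
    replicas_ok = all(state == "STARTED" for prirep, state in shards if prirep == "r")
    if not primaries_ok:
        return "red"
    return "green" if replicas_ok else "yellow"


# A single-node cluster where one index has a replica with nowhere to go.
single_node = [("p", "STARTED"), ("r", "UNASSIGNED"), ("p", "STARTED")]
print(cluster_status(single_node))

# The same cluster once that index is configured with zero replicas.
no_replicas = [("p", "STARTED"), ("p", "STARTED")]
print(cluster_status(no_replicas))
```

Under this reading, a single node can only be green if no index asks for a replica, which is the crux of this issue.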

Personally, I have a bias towards multi-node clusters as I have experience with single machine sources of failure causing significant impact. As much as I think all clusters should be multi-node, that is a preference and pushing that preference via the cluster health check is not transparent to operators of OpenSearch.

@davidlago davidlago removed the untriaged Require the attention of the repository maintainers and may need to be prioritized label Oct 2, 2023
@stephen-crawford
Contributor

[Triage] Given the feedback on this issue, we will make an action item here to allow for 1 node clusters to be green. We will need to change it so that 1 node clusters can be set to 1 so that it is green.

@stephen-crawford stephen-crawford added the triaged Issues labeled as 'Triaged' have been reviewed and are deemed actionable. label Oct 9, 2023
@peternied peternied added the good first issue These are recommended starting points for newcomers looking to make their first contributions. label Oct 17, 2023
@samuelcostae
Contributor

I will start looking into this. Can you assign it to me, @scrawfor99?

@cthtrifork
Contributor

cthtrifork commented Dec 15, 2023

.opensearch-sap-log-types-config also breaks upgrades from older versions of OpenSearch (<2.10) to the latest OpenSearch on multi-node clusters. When one of the data nodes is updated (rolling upgrade), it applies the new index to that node. However, the index cannot be replicated to the "old nodes" that have not been upgraded yet. This breaks the upgrade, as the status is now YELLOW forever.

We had to do this:

PUT /.opensearch-sap-log-types-config/_settings
{
  "index" : {
    "auto_expand_replicas" : "false",
    "number_of_replicas" : 0
  }
}

This let the status return to GREEN so that we could continue upgrading all the data nodes.

P.S. Setting auto_expand_replicas: "0-all" seems unnecessary/aggressive as a default setting?
https://github.com/opensearch-project/security-analytics/blob/main/src/main/java/org/opensearch/securityanalytics/logtype/LogTypeService.java#L448
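
For readers unfamiliar with `auto_expand_replicas`, here is a simplified Python model of how I understand a `"min-max"` value to translate into a replica count: replicas grow with the number of data nodes, bounded by the range. This is an assumption-laden sketch, not code from OpenSearch or from this thread; the real logic may also account for allocation filtering or awareness, and the function name `desired_replicas` is hypothetical.

```python
from typing import Optional


def desired_replicas(auto_expand: str, data_nodes: int) -> Optional[int]:
    """Simplified model of auto_expand_replicas ("false", "0-all", "0-5", ...).

    Returns the replica count the setting would ask for, or None when
    auto-expansion is disabled or the range cannot be satisfied.
    """
    if auto_expand == "false":
        return None
    low_text, high_text = auto_expand.split("-", 1)
    low = int(low_text)
    ceiling = data_nodes - 1  # at most one copy per data node
    high = ceiling if high_text == "all" else min(int(high_text), ceiling)
    return high if low <= high else None


for nodes in (1, 3, 40):
    print(nodes, "data nodes ->", desired_replicas("0-all", nodes), "replicas")
```

In this model, `"0-all"` asks for a copy on every data node, so the replica count scales with cluster size rather than staying fixed, which is why it can feel aggressive as a default.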

@nitinjagjivan

Is it fixed? Looks good on v2.12.0

@LHozzan

LHozzan commented Mar 20, 2024

We are using 2.11.1 and I don't see the problem. I think the problem was fixed in v2.11.0.

@godber

godber commented Jul 11, 2024

This problem appears to still exist in 2.12.0 and is not isolated to single-node clusters. I am pretty sure we just saw it on a 20-node cluster when we expanded it to 40 nodes. We saw the following error message:

"explanation": "there are too many copies of the shard allocated to nodes with attribute [host.rack], there are [40] total configured shard copies for this shard id and [7] total attribute values, expected the allocated shard count per attribute [7] to be less than or equal to the upper bound of the required number of shards per attribute [6]"

So the problem is related to rack affinity for shard allocation.
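
The numbers in that message are consistent with a per-attribute limit of ceil(total copies / attribute values). The Python sketch below just reproduces that arithmetic; that the allocation decider computes the bound exactly this way is my assumption from the figures in the message, and the function name is hypothetical.

```python
def max_copies_per_attribute(total_copies: int, attribute_values: int) -> int:
    """Ceiling of total_copies / attribute_values, using integer arithmetic."""
    return -(-total_copies // attribute_values)


# Figures from the error message: 40 shard copies across 7 host.rack values.
print(max_copies_per_attribute(40, 7))

# With 20 replicas there are 21 copies (1 primary + 20 replicas).
print(max_copies_per_attribute(21, 7))
```

With 40 copies over 7 racks the bound works out to 6, matching the "[6]" in the message, so a rack already holding 7 copies exceeds it.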

I fixed it by following the advice above, but rather than disabling plugins or setting replicas to 0, I set it to the original number of replicas.

curl -XPUT -H 'Content-Type: application/json' http://es-foo.bar.lan/.opensearch-sap-log-types-config/_settings -d '{ "index" : {  "auto_expand_replicas" : "false", "number_of_replicas" : 20 } }'
