
Add topology spread constraint option #343

Merged
4 commits merged into cortexproject:master on Apr 21, 2022

Conversation

locmai
Contributor

@locmai locmai commented Apr 18, 2022

What this PR does:

Provide the ability to configure Pod Topology Spread Constraints.
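For illustration, an override enabling the new option might look like the sketch below. The `topologySpreadConstraints` value follows the standard Kubernetes `pod.spec.topologySpreadConstraints` schema; the `ingester` key mirrors the chart section shown in the diff, but the label selector values are assumptions, not taken from this chart.

```yaml
# Hypothetical values.yaml override (labels are assumed):
# spread ingester pods evenly across nodes, best effort.
ingester:
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: kubernetes.io/hostname
      whenUnsatisfiable: ScheduleAnyway
      labelSelector:
        matchLabels:
          app.kubernetes.io/component: ingester
```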

Checklist

  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

locmai added 2 commits April 18, 2022 15:44
Signed-off-by: Loc Mai <locmai0201@gmail.com>
Signed-off-by: Loc Mai <locmai0201@gmail.com>
@@ -468,6 +470,7 @@ ingester:
prometheus.io/port: '8080'

nodeSelector: {}
topologySpreadConstraints: []
Contributor

Question for reviewers: given that ingesters have a pod disruption budget of 1, would it make sense to revise the default podAntiAffinity config a few lines below this to make it a hard requirement that more than one ingester should never be scheduled to the same node? That behavior could also be implemented using topologySpreadConstraints constructs, which would eliminate the need for the podAntiAffinity config.

This is desirable because presently it is possible to get two pods on one node, and then when node patching happens, the process gets stuck because it can't evict two ingester pods in order to patch the node.
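The reviewer's suggestion could be expressed as a hard spread constraint instead of `podAntiAffinity`. This is a sketch only, assuming the chart labels ingester pods with `app.kubernetes.io/component: ingester` (not confirmed by this thread); `DoNotSchedule` is what makes the rule a hard requirement.

```yaml
# Sketch of the suggested replacement for the soft podAntiAffinity
# default: refuse to schedule a second ingester onto the same node.
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app.kubernetes.io/component: ingester  # assumed label
```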

Collaborator

would you be open to creating a separate PR after this is merged?

Contributor

Yes!

Collaborator

would it make sense to revise the default podAntiAffinity config a few lines below this to make it a hard requirement that more than one ingester should never be scheduled to the same node?

We do this for our production deployments with an override. We should probably pick a clear strategy for this chart on whether the chart defaults represent good production values or whether they are more tuned for testing. It is helpful to be able to deploy multiple ingester pods to a single node for testing.
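The production override described here could take the shape below, using hard inter-pod anti-affinity. This is a sketch under the same assumption about the ingester pod labels; the `affinity` structure is the standard Kubernetes `pod.spec.affinity` API.

```yaml
# Sketch of a production-style override (labels are assumed):
# hard anti-affinity so no two ingester pods share a node.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - topologyKey: kubernetes.io/hostname
        labelSelector:
          matchLabels:
            app.kubernetes.io/component: ingester  # assumed label
```

For testing, leaving the chart's soft default in place (or omitting the override) keeps it possible to pack multiple ingesters onto one node, which matches the trade-off discussed above.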

Collaborator

@nschad nschad left a comment


argh, nice, but you have to use helm-docs@1.8.1 — apparently they changed some things:

https://github.com/norwoodj/helm-docs/releases/tag/v1.8.1

@nschad nschad linked an issue Apr 19, 2022 that may be closed by this pull request
Signed-off-by: Loc Mai <locmai0201@gmail.com>
@locmai locmai requested a review from nschad April 19, 2022 07:31
@locmai
Contributor Author

locmai commented Apr 19, 2022

@nschad Thanks for taking a look — I just pushed a new commit to fix that.

@nschad nschad requested a review from kd7lxl April 19, 2022 07:40
@locmai
Contributor Author

locmai commented Apr 21, 2022

Hi @nschad, are we waiting for @kd7lxl to review this before it can be merged?

Collaborator

@kd7lxl kd7lxl left a comment


Sorry, didn't realize you were waiting on me. Looks good.

@kd7lxl kd7lxl merged commit cb85c4c into cortexproject:master Apr 21, 2022
@nschad nschad mentioned this pull request May 12, 2022
1 task
Development

Successfully merging this pull request may close these issues.

Add support for topology spread constraints
4 participants