Add topologySpreadConstraints #2091

jbhalodia-slack (Contributor) commented Jul 22, 2024

Purpose of this PR

It's good to spread the Spark Operator pods across failure domains such as regions, zones, nodes, and other user-defined topology domains. This can help achieve high availability as well as efficient resource utilization.

Proposed changes:

Change Category

Indicate the type of change by marking the applicable boxes:

  • Bugfix (non-breaking change which fixes an issue)
  • Feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that could affect existing functionality)
  • Documentation update

Rationale

Production workloads should enable topologySpreadConstraints so that they run in HA and are resilient to node- or AZ-specific failures.
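
As a rough illustration of the rationale above, the values override below sketches how such constraints might be set. It assumes the chart exposes a top-level `topologySpreadConstraints` list that is passed through to the operator pod spec, and that `replicaCount` controls the number of operator pods; the file name is hypothetical. The fields inside each constraint (`maxSkew`, `topologyKey`, `whenUnsatisfiable`) are the standard Kubernetes pod-spec fields, and the topology keys are the well-known Kubernetes node labels.

```yaml
# values-ha.yaml: hypothetical override file, shown for illustration only
replicaCount: 2  # spreading only has an effect with more than one replica

topologySpreadConstraints:
  # Prefer an even spread across zones, but still schedule if that is not possible.
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
  # Keep the per-node pod count difference at 1 or less; pods stay Pending otherwise.
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule
```

Whether a `labelSelector` has to be supplied per constraint or is filled in by the chart template is not stated in this description, so the chart README should be checked before relying on this sketch.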

Checklist

Before submitting your PR, please review the following:

  • I have conducted a self-review of my own code.
  • I have updated documentation accordingly.
  • I have added tests that prove my changes are effective or that my feature works.
  • Existing unit tests pass locally with my changes.

Additional Notes

GitHub Issue: #2086
Slack Thread: https://cloud-native.slack.com/archives/C074588U7EG/p1721240818494049

ChenYi015 and others added 4 commits July 22, 2024 12:30
* Update docs

Signed-off-by: Yi Chen <github@chenyicn.net>

* Remove docs and update README

Signed-off-by: Yi Chen <github@chenyicn.net>

* Add link to monthly community meeting

Signed-off-by: Yi Chen <github@chenyicn.net>

---------

Signed-off-by: Yi Chen <github@chenyicn.net>
Signed-off-by: jbhalodia-slack <jbhalodia@salesforce.com>
* Add PodDisruptionBudget to chart

Signed-off-by: Carlos Sánchez Páez <karlossanpa@gmail.com>
Signed-off-by: Carlos Sánchez Páez <sanchezpaezcarlos33@gmail.com>
Signed-off-by: Carlos Sánchez Páez <karlossanpa@gmail.com>

* PR comments

Signed-off-by: Carlos Sánchez Páez <karlossanpa@gmail.com>

---------

Signed-off-by: Carlos Sánchez Páez <karlossanpa@gmail.com>
Signed-off-by: Carlos Sánchez Páez <sanchezpaezcarlos33@gmail.com>
Signed-off-by: jbhalodia-slack <jbhalodia@salesforce.com>
Signed-off-by: jbhalodia-slack <jbhalodia@salesforce.com>
Signed-off-by: jbhalodia-slack <jbhalodia@salesforce.com>
@jbhalodia-slack jbhalodia-slack force-pushed the jigar/set-topologySpreadConstraints branch from 0728f55 to e119dcd Compare July 22, 2024 16:30
@google-oss-prow google-oss-prow bot added size/XXL and removed size/L labels Jul 22, 2024
Signed-off-by: jbhalodia-slack <jbhalodia@salesforce.com>
Signed-off-by: jbhalodia-slack <jbhalodia@salesforce.com>
@jbhalodia-slack jbhalodia-slack force-pushed the jigar/set-topologySpreadConstraints branch from dc1427c to 2c4b7d2 Compare July 22, 2024 17:28
@google-oss-prow google-oss-prow bot added size/M and removed size/L labels Jul 22, 2024
@jbhalodia-slack jbhalodia-slack changed the title Set topologySpreadConstraints Add topologySpreadConstraints Jul 22, 2024
Signed-off-by: jbhalodia-slack <jbhalodia@salesforce.com>
@jbhalodia-slack jbhalodia-slack force-pushed the jigar/set-topologySpreadConstraints branch from 0c0ba32 to 00a26df Compare July 22, 2024 17:55
@@ -17,21 +17,22 @@ tests:

  - it: Should render spark operator podDisruptionBudget if podDisruptionBudget.enable is true
    set:
      replicaCount: 2
Contributor Author

PDB tests were failing on the master branch so these are fixes to get them to pass.
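
For context, a helm-unittest case of the kind touched in this diff might look roughly like the following. Only the test title and `replicaCount: 2` come from the diff; the suite name, the template path, and the assertion are assumptions made for illustration and may not match the chart's actual test file.

```yaml
# Sketch of a helm-unittest suite; suite name and template path are hypothetical.
suite: Test spark operator podDisruptionBudget
templates:
  - poddisruptionbudget.yaml
tests:
  - it: Should render spark operator podDisruptionBudget if podDisruptionBudget.enable is true
    set:
      replicaCount: 2        # a PDB is only meaningful with more than one replica
      podDisruptionBudget:
        enable: true
    asserts:
      # Check that the rendered document is a PodDisruptionBudget.
      - isKind:
          of: PodDisruptionBudget
```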

@jbhalodia-slack
Contributor Author

Hi @vara-bonthu @andreyvelich @ChenYi015 @yuchaoran2011, could you please review this PR? 🙇‍♂️

Contributor

@vara-bonthu vara-bonthu left a comment

Thanks for the PR @jbhalodia-slack
/approve

@yuchaoran2011 @ChenYi015 Please review

Contributor

@ChenYi015 ChenYi015 left a comment

Thanks for the contribution!
/lgtm

Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ChenYi015, vara-bonthu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [ChenYi015,vara-bonthu]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow bot merged commit 4108f54 into kubeflow:master Jul 26, 2024
7 checks passed
ChenYi015 pushed a commit to ChenYi015/spark-operator that referenced this pull request Aug 1, 2024
* Update README and documentation (kubeflow#2047)

* Update docs

Signed-off-by: Yi Chen <github@chenyicn.net>

* Remove docs and update README

Signed-off-by: Yi Chen <github@chenyicn.net>

* Add link to monthly community meeting

Signed-off-by: Yi Chen <github@chenyicn.net>

---------

Signed-off-by: Yi Chen <github@chenyicn.net>
Signed-off-by: jbhalodia-slack <jbhalodia@salesforce.com>

* Add PodDisruptionBudget to chart (kubeflow#2078)

* Add PodDisruptionBudget to chart

Signed-off-by: Carlos Sánchez Páez <karlossanpa@gmail.com>
Signed-off-by: Carlos Sánchez Páez <sanchezpaezcarlos33@gmail.com>
Signed-off-by: Carlos Sánchez Páez <karlossanpa@gmail.com>

* PR comments

Signed-off-by: Carlos Sánchez Páez <karlossanpa@gmail.com>

---------

Signed-off-by: Carlos Sánchez Páez <karlossanpa@gmail.com>
Signed-off-by: Carlos Sánchez Páez <sanchezpaezcarlos33@gmail.com>
Signed-off-by: jbhalodia-slack <jbhalodia@salesforce.com>

* Set topologySpreadConstraints

Signed-off-by: jbhalodia-slack <jbhalodia@salesforce.com>

* Update README and increase patch version

Signed-off-by: jbhalodia-slack <jbhalodia@salesforce.com>

* Revert replicaCount change

Signed-off-by: jbhalodia-slack <jbhalodia@salesforce.com>

* Update README after master merger

Signed-off-by: jbhalodia-slack <jbhalodia@salesforce.com>

* Update README

Signed-off-by: jbhalodia-slack <jbhalodia@salesforce.com>

---------

Signed-off-by: Yi Chen <github@chenyicn.net>
Signed-off-by: jbhalodia-slack <jbhalodia@salesforce.com>
Signed-off-by: Carlos Sánchez Páez <karlossanpa@gmail.com>
Signed-off-by: Carlos Sánchez Páez <sanchezpaezcarlos33@gmail.com>
Co-authored-by: Yi Chen <github@chenyicn.net>
Co-authored-by: Carlos Sánchez Páez <karlossanpa@gmail.com>
(cherry picked from commit 4108f54)
google-oss-prow bot pushed a commit that referenced this pull request Aug 1, 2024
* Update helm docs (#2081)

Signed-off-by: Carlos Sánchez Páez <karlossanpa@gmail.com>
(cherry picked from commit eca3fc8)

* Update the process to build api-docs, generate CRD manifests and code (#2046)

* Update .gitignore

Signed-off-by: Yi Chen <github@chenyicn.net>

* Update .dockerignore

Signed-off-by: Yi Chen <github@chenyicn.net>

* Update Makefile

Signed-off-by: Yi Chen <github@chenyicn.net>

* Update the process to generate api docs

Signed-off-by: Yi Chen <github@chenyicn.net>

* Update the workflow to generate api docs

Signed-off-by: Yi Chen <github@chenyicn.net>

* Use controller-gen to generate CRD and deep copy related methods

Signed-off-by: Yi Chen <github@chenyicn.net>

* Update helm chart CRDs

Signed-off-by: Yi Chen <github@chenyicn.net>

* Update workflow for building spark operator

Signed-off-by: Yi Chen <github@chenyicn.net>

* Update README.md

Signed-off-by: Yi Chen <github@chenyicn.net>

---------

Signed-off-by: Yi Chen <github@chenyicn.net>
(cherry picked from commit 779ea3d)

* Add topologySpreadConstraints (#2091)

* Update README and documentation (#2047)

* Update docs

Signed-off-by: Yi Chen <github@chenyicn.net>

* Remove docs and update README

Signed-off-by: Yi Chen <github@chenyicn.net>

* Add link to monthly community meeting

Signed-off-by: Yi Chen <github@chenyicn.net>

---------

Signed-off-by: Yi Chen <github@chenyicn.net>
Signed-off-by: jbhalodia-slack <jbhalodia@salesforce.com>

* Add PodDisruptionBudget to chart (#2078)

* Add PodDisruptionBudget to chart

Signed-off-by: Carlos Sánchez Páez <karlossanpa@gmail.com>
Signed-off-by: Carlos Sánchez Páez <sanchezpaezcarlos33@gmail.com>
Signed-off-by: Carlos Sánchez Páez <karlossanpa@gmail.com>

* PR comments

Signed-off-by: Carlos Sánchez Páez <karlossanpa@gmail.com>

---------

Signed-off-by: Carlos Sánchez Páez <karlossanpa@gmail.com>
Signed-off-by: Carlos Sánchez Páez <sanchezpaezcarlos33@gmail.com>
Signed-off-by: jbhalodia-slack <jbhalodia@salesforce.com>

* Set topologySpreadConstraints

Signed-off-by: jbhalodia-slack <jbhalodia@salesforce.com>

* Update README and increase patch version

Signed-off-by: jbhalodia-slack <jbhalodia@salesforce.com>

* Revert replicaCount change

Signed-off-by: jbhalodia-slack <jbhalodia@salesforce.com>

* Update README after master merger

Signed-off-by: jbhalodia-slack <jbhalodia@salesforce.com>

* Update README

Signed-off-by: jbhalodia-slack <jbhalodia@salesforce.com>

---------

Signed-off-by: Yi Chen <github@chenyicn.net>
Signed-off-by: jbhalodia-slack <jbhalodia@salesforce.com>
Signed-off-by: Carlos Sánchez Páez <karlossanpa@gmail.com>
Signed-off-by: Carlos Sánchez Páez <sanchezpaezcarlos33@gmail.com>
Co-authored-by: Yi Chen <github@chenyicn.net>
Co-authored-by: Carlos Sánchez Páez <karlossanpa@gmail.com>
(cherry picked from commit 4108f54)

* Use controller-runtime to reconsturct spark operator (#2072)

* Use controller-runtime to reconstruct spark operator

Signed-off-by: Yi Chen <github@chenyicn.net>

* Update helm charts

Signed-off-by: Yi Chen <github@chenyicn.net>

* Update examples

Signed-off-by: Yi Chen <github@chenyicn.net>

---------

Signed-off-by: Yi Chen <github@chenyicn.net>
(cherry picked from commit 0dc641b)

---------

Co-authored-by: Carlos Sánchez Páez <karlossanpa@gmail.com>
Co-authored-by: jbhalodia-slack <jbhalodia@salesforce.com>
YanivKunda pushed a commit to YanivKunda/spark-operator that referenced this pull request Aug 5, 2024
sigmarkarl pushed a commit to spotinst/spark-on-k8s-operator that referenced this pull request Aug 7, 2024
jbhalodia-slack added a commit to jbhalodia-slack/spark-operator that referenced this pull request Oct 4, 2024
…ubeflow#2108)

4 participants