
Add benchmark test to compare EvenPodsSpreadPriority and SelectorSpreadingPriority #84606

Merged: 1 commit into kubernetes:master, Nov 6, 2019

Conversation

alculquicondor (Member) commented Oct 31, 2019

/kind feature

What this PR does / why we need it:
The added tests exercise EvenPodsSpreadPriority using constraints that have a similar effect to the hardcoded parameters of the SelectorSpreadingPriority algorithm.

Current results:

BenchmarkTestDefaultEvenPodsSpreadPriority/100nodes-56     2000     918838 ns/op
BenchmarkTestDefaultEvenPodsSpreadPriority/1000nodes-56     300    5137881 ns/op
BenchmarkTestSelectorSpreadingPriority/100nodes-56        10000     181157 ns/op
BenchmarkTestSelectorSpreadingPriority/1000nodes-56        1000    1654661 ns/op

Note that EvenPodsSpreadPriority is roughly 5x slower than SelectorSpreadingPriority at 100 nodes and roughly 3x slower at 1000 nodes.
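A minimal sketch of the Go benchmark shape that produces output like the above; every name here is a hypothetical stand-in, not the PR's actual fixtures. Each b.Run sub-benchmark yields one line of per-iteration timing, and the metadata pre-calculation sits inside the timed loop:

```go
package priorities_test

import "testing"

// Hypothetical stand-ins for the real fixtures and the priority under test;
// the actual benchmark wires up real pods, nodes, and scheduler metadata.
type node struct{ name string }

func makeNodes(count int) []node       { return make([]node, count) }
func computeMetadata(nodes []node) int { return len(nodes) } // pre-calculation pass
func scoreNode(meta int, n node) int64 { return int64(meta) + int64(len(n.name)) }

func BenchmarkSpreadPriority(b *testing.B) {
	for _, tt := range []struct {
		name  string
		count int
	}{
		{"100nodes", 100},
		{"1000nodes", 1000},
	} {
		b.Run(tt.name, func(b *testing.B) {
			nodes := makeNodes(tt.count)
			b.ResetTimer()
			for i := 0; i < b.N; i++ {
				meta := computeMetadata(nodes) // metadata calculation is timed too
				for _, n := range nodes {
					_ = scoreNode(meta, n) // per-node map step
				}
			}
		})
	}
}
```

Running something like `go test -bench=BenchmarkSpreadPriority` then prints one ns/op figure per sub-benchmark, in the format quoted above.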

Which issue(s) this PR fixes:
Part of #80639

Does this PR introduce a user-facing change?:

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. release-note-none Denotes a PR that doesn't merit a release note. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Oct 31, 2019
@k8s-ci-robot k8s-ci-robot requested review from ahg-g and k82cn October 31, 2019 15:12
@k8s-ci-robot k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Oct 31, 2019
alculquicondor (Member, Author):

/assign @Huang-Wei

Huang-Wei (Member):

@alculquicondor There are some things I don't quite follow:

  1. In the BenchmarkTestSelectorSpreadingPriority test, there is one selector in the Service, while in BenchmarkTestDefaultEvenPodsSpreadPriority there are two constraints, hence two selectors. I don't think their results are comparable.
  2. It seems we can only define one default global "topologyKey" and then apply it to all selectors? If so, that's not the same semantics as EvenPodsSpread, where users can define a different topologyKey on each topologySpreadConstraint.

alculquicondor (Member, Author):

  1. In the BenchmarkTestSelectorSpreadingPriority test, there is one selector in the Service, while in BenchmarkTestDefaultEvenPodsSpreadPriority there are two constraints, hence two selectors. I don't think their results are comparable.

Yes, EvenPodsSpreadPriority has two constraints, but the test is using the same selector for both.

  1. It seems we can only define one default global "topologyKey" and then apply it to all selectors? If so, that's not the same semantics as EvenPodsSpread, where users can define a different topologyKey on each topologySpreadConstraint.

SelectorSpreadingPriority spreads over two topology values: the node name and the zone. That's what we are trying to replicate with the two constraints for EvenPodsSpread. We are not concerned about flexibility in this case; we just want to ensure that, using those two exact constraints, we get similar performance for the same set of nodes and pods.
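A minimal sketch of what such a pair of soft constraints looks like, assuming a placeholder selector and the beta zone label in use around v1.17; this is illustrative, not the PR's fixture code:

```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// Both constraints reuse the same selector, mirroring how the benchmark
	// uses one selector for both topology keys.
	selector := &metav1.LabelSelector{
		MatchLabels: map[string]string{"app": "demo"}, // placeholder selector
	}
	constraints := []v1.TopologySpreadConstraint{
		{
			MaxSkew:           1,
			TopologyKey:       "kubernetes.io/hostname", // spread across node names
			WhenUnsatisfiable: v1.ScheduleAnyway,        // soft: scored, not filtered
			LabelSelector:     selector,
		},
		{
			MaxSkew:           1,
			TopologyKey:       "failure-domain.beta.kubernetes.io/zone", // spread across zones
			WhenUnsatisfiable: v1.ScheduleAnyway,
			LabelSelector:     selector,
		},
	}
	fmt.Printf("%+v\n", constraints)
}
```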

Huang-Wei (Member) commented Oct 31, 2019:

One reason why SelectorSpreadingPriority is quicker is that its map function only counts matching pods on the filtered nodes. However, that's inaccurate: we need to take into account the matching pods on unqualified nodes (i.e., nodes that failed the predicates/filter phase).

And in the test, only 10% of all the nodes are qualified. Edit: I misread; it's 100%.

alculquicondor (Member, Author):

I noticed that in the algorithm. However, we are using 100% in this test to make the comparison fairer.

Huang-Wei (Member):

However, we are using 100% in this test to make the comparison fairer.

@alculquicondor I misread the percentage; updated in #84606 (comment)


Essentially, it's not about the percentage of nodes that are "filtered"; it's about whether or not a particular Priority has a pre-calculation phase.

If we can skip the pre-calculation phase, it's much faster; in other words, it's a "pure" map-reduce-implementable Priority. Otherwise, we have to run the pre-calculation over all the nodes, which is slower.
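A toy illustration of that distinction, using made-up types rather than the scheduler's real signatures: the "pure" map step needs only the node it is handed, while the pre-calculation must walk every node before any scoring starts:

```go
package main

import "fmt"

// toy stand-in for the scheduler's per-node state
type nodeInfo struct {
	name         string
	matchingPods int // pods matching the spreading selector on this node
}

// pureMapScore is the "pure" map-reduce case: it needs nothing beyond the one
// node it is given, so it can run over just the filtered nodes.
func pureMapScore(n nodeInfo) int64 {
	return -int64(n.matchingPods) // fewer matching pods scores relatively higher
}

// precomputeCounts models the metadata phase: matching pods on nodes that
// failed filtering still affect skew, so this pass must cover every node.
func precomputeCounts(all []nodeInfo) map[string]int64 {
	counts := make(map[string]int64, len(all))
	for _, n := range all {
		counts[n.name] = int64(n.matchingPods)
	}
	return counts
}

func main() {
	all := []nodeInfo{{"n1", 2}, {"n2", 0}, {"n3", 5}}
	filtered := all[:2] // suppose n3 failed the filter phase

	meta := precomputeCounts(all) // O(all nodes), even though n3 is never scored
	for _, n := range filtered {
		fmt.Println(n.name, pureMapScore(n), meta[n.name])
	}
}
```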

alculquicondor (Member, Author):

They are different algorithms, and SelectorSpreadingPriority skips calculations by only going through the filtered nodes. However, when 100% of the nodes pass filtering, they are essentially doing the same calculations.

SelectorSpreadingPriority does the "pre-calculation" in Map, whereas EvenPodsSpread does it in metadata. But overall (at 100% of nodes), they are doing the same work.

ahg-g (Member) commented Nov 4, 2019:

I think we can merge the benchmark for now and iterate over it as we try to bridge the gap.

…adPriority

Signed-off-by: Aldo Culquicondor <acondor@google.com>
alculquicondor (Member, Author):

I just rebased onto the Map/Reduce implementation. Note that I'm including metadata calculation in the benchmarks.

alculquicondor (Member, Author):

/test pull-kubernetes-e2e-gce-device-plugin-gpu

alculquicondor (Member, Author):

/test pull-kubernetes-e2e-gce

Huang-Wei (Member) left a review comment:

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 5, 2019
k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull request has been approved by: alculquicondor, Huang-Wei

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 5, 2019
fejta-bot:

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@k8s-ci-robot k8s-ci-robot merged commit 3c4ae1c into kubernetes:master Nov 6, 2019
@k8s-ci-robot k8s-ci-robot added this to the v1.17 milestone Nov 6, 2019