Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add configurable throughput for clients #91

Closed
wants to merge 2 commits into from

Conversation

sonasingh46
Copy link

@sonasingh46 sonasingh46 commented Aug 27, 2021

Signed-off-by: Ashutosh Kumar sonasingh46@gmail.com

What type of PR is this?

/kind bug

What this PR does / why we need it:
This change adds two new flags "kube-api-qps" and "kube-api-burst" to configure the QPS and Burst of clients to the API server as the default value is not effective at larger scale.
This change also uses a separate client for leader election go routine.
This change also replace the native go flag library with github.com/spf13/pflag
Which issue(s) this PR fixes:

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

None

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. release-note-none Denotes a PR that doesn't merit a release note. labels Aug 27, 2021
@k8s-ci-robot
Copy link
Contributor

Welcome @sonasingh46!

It looks like this is your first PR to kubernetes-csi/external-health-monitor 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-csi/external-health-monitor has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot
Copy link
Contributor

Hi @sonasingh46. Thanks for your PR.

I'm waiting for a kubernetes-csi member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Aug 27, 2021
@xing-yang
Copy link
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Aug 27, 2021
@xing-yang
Copy link
Contributor

Please add a release note.


retryIntervalStart = flag.Duration("retry-interval-start", time.Second, "Initial retry interval of failed pv monitoring. It doubles with each failure, up to retry-interval-max. Default is 1 second.")
retryIntervalMax = flag.Duration("retry-interval-max", 5*time.Minute, "Maximum retry interval of failed pv monitoring. Default is 5 minutes.")
kubeAPIQPS = flag.Float32("kube-api-qps", 5, "QPS to use while communicating with the kubernetes apiserver. Defaults to 5.0.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are trying to decrease the API call frequency, I wonder if we should set the default lower than 5.0.
@pohly @msau42 what do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure. You would have to run a scalability experiment to determine which default works better. Also note that client-side throttling is being replaced by API priority and fairness.

IMHO more important than tweaks like this is to look at why health monitor causes API traffic. Wasn't it because it establishes watches for objects that change a lot (nodes, pods)? Then throttling won't help at all because it is applied to outbound requests, not incoming watch updates.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

API priority and fairness is now beta. Have you run any experiments that can demonstrate that this change is still effective?

@xing-yang
Copy link
Contributor

xing-yang commented Aug 27, 2021

Can you make the following workqueue ratelimited as well? Also check if there's anything else in the NodeWatcher that should be configurable.
https://github.com/kubernetes-csi/external-health-monitor/blob/v0.4.0/pkg/controller/node_watcher.go#L86

@sonasingh46
Copy link
Author

sonasingh46 commented Aug 28, 2021

@xing-yang

Can you make the following workqueue ratelimited as well? Also check if there's anything else in the NodeWatcher that should be configurable.

Sure, raised a commit for this. I do not see anything more configurable in the node watcher. Though there is DefaultNodeNotReadyTimeDuration flag but I am not sure if this should be configurable.
https://github.com/kubernetes-csi/external-health-monitor/blob/master/pkg/controller/node_watcher.go#L40

@pohly
Copy link
Contributor

pohly commented Aug 30, 2021

Fixes ##76

I have my doubts whether it actually fixes any of that... please remove that and manually close the issue after it has been confirmed that the issue is gone.

@sonasingh46
Copy link
Author

I have my doubts whether it actually fixes any of that... please remove that and manually close the issue after it has been confirmed that the issue is gone.

Done

fengzixu pushed a commit to fengzixu/external-health-monitor that referenced this pull request Oct 9, 2021
a0f195c Merge pull request kubernetes-csi#106 from msau42/fix-canary
7100c12 Only set staging registry when running canary job
b3c65f9 Merge pull request kubernetes-csi#99 from msau42/add-release-process
e53f3e8 Merge pull request kubernetes-csi#103 from msau42/fix-canary
d129462 Document new method for adding CI jobs are new K8s versions
e73c2ce Use staging registry for canary tests
2c09846 Add cleanup instructions to release-notes generation
60e1cd3 Merge pull request kubernetes-csi#98 from pohly/kubernetes-1-19-fixes
0979c09 prow.sh: fix E2E suite for Kubernetes >= 1.18
3b4a2f1 prow.sh: fix installing Go for Kubernetes 1.19.0
1fbb636 Merge pull request kubernetes-csi#97 from pohly/go-1.15
82d108a switch to Go 1.15
d8a2530 Merge pull request kubernetes-csi#95 from msau42/add-release-process
843bddc Add steps on promoting release images
0345a83 Merge pull request kubernetes-csi#94 from linux-on-ibm-z/bump-timeout
1fdf2d5 cloud build: bump timeout in Prow job
41ec6d1 Merge pull request kubernetes-csi#93 from animeshk08/patch-1
5a54e67 filter-junit: Fix gofmt error
0676fcb Merge pull request kubernetes-csi#92 from animeshk08/patch-1
36ea4ff filter-junit: Fix golint error
f5a4203 Merge pull request kubernetes-csi#91 from cyb70289/arm64
43e50d6 prow.sh: enable building arm64 image
0d5bd84 Merge pull request kubernetes-csi#90 from pohly/k8s-staging-sig-storage
3df86b7 cloud build: k8s-staging-sig-storage
c5fd961 Merge pull request kubernetes-csi#89 from pohly/cloud-build-binfmt
db0c2a7 cloud build: initialize support for running commands in Dockerfile
be902f4 Merge pull request kubernetes-csi#88 from pohly/multiarch-windows-fix
340e082 build.make: optional inclusion of Windows in multiarch images
5231f05 build.make: properly declare push-multiarch
4569f27 build.make: fix push-multiarch ambiguity
17dde9e Merge pull request kubernetes-csi#87 from pohly/cloud-build
bd41690 cloud build: initial set of shared files
9084fec Merge pull request kubernetes-csi#81 from msau42/add-release-process
6f2322e Update patch release notes generation command
0fcc3b1 Merge pull request kubernetes-csi#78 from ggriffiths/fix_csi_snapshotter_rbac_version_set
d8c76fe Support local snapshot RBAC for pull jobs
c1bdf5b Merge pull request kubernetes-csi#80 from msau42/add-release-process
ea1f94a update release tools instructions
152396e Merge pull request kubernetes-csi#77 from ggriffiths/snapshotter201_update
7edc146 Update snapshotter to version 2.0.1

git-subtree-dir: release-tools
git-subtree-split: a0f195cc2ddc2a1f07d4d3e46fc08187db358f94
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 29, 2021
@xing-yang
Copy link
Contributor

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 7, 2021
@xing-yang
Copy link
Contributor

Although this may not have fixed #76, it is still useful to make QPS and Burst settings configurable. While API server priority and fairness focuses on providing a granular rate limit mechanism on in-flight requests on the server side, QPS and Burst settings serve as a rate limiting mechanism on Kube API client side.

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 7, 2022
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 6, 2022
@k8s-ci-robot
Copy link
Contributor

@sonasingh46: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 14, 2022
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Copy link
Contributor

@k8s-triage-robot: Closed this PR.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@xing-yang
Copy link
Contributor

/reopen

@k8s-ci-robot
Copy link
Contributor

@xing-yang: Reopened this PR.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot reopened this Sep 12, 2022
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: sonasingh46
Once this PR has been reviewed and has the lgtm label, please ask for approval from msau42 by writing /assign @msau42 in a comment. For more information see:The Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot
Copy link
Contributor

@sonasingh46: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-kubernetes-csi-external-health-monitor d2eaf25 link true /test pull-kubernetes-csi-external-health-monitor

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Copy link
Contributor

@k8s-triage-robot: Closed this PR.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note-none Denotes a PR that doesn't merit a release note. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants