
fix: Add delay to prevent ssm rate limits if instance count is greater than max throughput #2305

Closed
wants to merge 8 commits

Conversation

devsh4
Contributor

@devsh4 devsh4 commented Aug 2, 2022

Fix for bug #1841: prevents hitting AWS SSM rate limits by adding a delay between subsequent putParameter calls.

Note: the 25ms delay is based on the AWS service quotas, which by default allow only 40 requests per second to the Parameter Store.
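The quota arithmetic above (40 requests/second → one call every 25ms) can be sketched roughly as follows. This is a hedged illustration, not the PR's actual diff; the names `SSM_MAX_THROUGHPUT`, `delay`, and `storeTokens` are hypothetical:

```typescript
// Hypothetical sketch: space out putParameter calls so a large batch of
// instances stays under the default SSM throughput quota.
const SSM_MAX_THROUGHPUT = 40; // default quota: requests per second
const DELAY_MS = Math.ceil(1000 / SSM_MAX_THROUGHPUT); // 1000 / 40 = 25 ms

function delay(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Only apply the delay when the batch is large enough to exceed the quota
// within one second; small batches run at full speed.
async function storeTokens(instanceIds: string[], put: (id: string) => Promise<void>): Promise<void> {
  const applyDelay = instanceIds.length >= SSM_MAX_THROUGHPUT;
  for (const id of instanceIds) {
    await put(id);
    if (applyDelay) {
      await delay(DELAY_MS);
    }
  }
}
```

With 250 runners this spreads the putParameter calls over roughly 250 × 25ms ≈ 6.25 seconds instead of firing them all at once.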

[Tests]:

  • Pre commit checks ✅
  • Ran build locally ✅
  • Ran linter ✅
  • Ran jest tests ✅

Related PR (never got merged): #1843

@devsh4 devsh4 changed the title Issue 1841: Add delay to prevent ssm rate limits if instance count is greater than max throughput Fix: 1841 - Add delay to prevent ssm rate limits if instance count is greater than max throughput Aug 2, 2022
@devsh4 devsh4 changed the title Fix: 1841 - Add delay to prevent ssm rate limits if instance count is greater than max throughput fix 1841 - Add delay to prevent ssm rate limits if instance count is greater than max throughput Aug 2, 2022
@devsh4 devsh4 changed the title fix 1841 - Add delay to prevent ssm rate limits if instance count is greater than max throughput fix: Add delay to prevent ssm rate limits if instance count is greater than max throughput Aug 2, 2022
@npalm npalm self-requested a review August 2, 2022 17:41
@devsh4
Contributor Author

devsh4 commented Aug 2, 2022

@npalm Looks like the jest step is failing because the coverage is dropping below 92%; however, all the tests themselves are passing.

Jest: "global" coverage threshold for branches (92%) not met: 91.37%

@npalm
Member

npalm commented Aug 3, 2022

Guess this is caused because the conditional delay block is not touched in the tests. This would require adding a test that creates 40 runners. Right now I don't have time to dig in further. Will check later.

@github-actions
Contributor

github-actions bot commented Sep 3, 2022

This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the Stale label Sep 3, 2022
@devsh4
Contributor Author

devsh4 commented Sep 3, 2022

@npalm Any chance you can look at this? I tried adding some tests with 40 runners but it didn't help.

@npalm
Member

npalm commented Sep 3, 2022

Yes, will do, but it will be after next week.

@github-actions github-actions bot removed the Stale label Sep 4, 2022
@npalm
Member

npalm commented Sep 16, 2022

@GuptaNavdeep1983 would you have time to take a look at this PR as well?

@devsh4
Contributor Author

devsh4 commented Oct 4, 2022

Any update on this? Looking to upgrade the module version but cannot until we have this PR merged in.

@npalm
Member

npalm commented Oct 5, 2022

Sorry, no updates; I'm a bit restricted by time.

@GuptaNavdeep1983
Contributor

@devsh4 can you have a look at the recent changes?

if (instances.length >= ssmParameterStoreMaxThroughput) {
isDelay = true;
}
const throttled = throttle(
Member


The lodash throttle function does not seem to delay calls but rather discards them during the throttle window. I don't think we need an extra library just to delay API calls; we can simply use the setTimeout function to avoid hitting the API too hard.

Not sure why we only start throttling above 40; maybe it's better to start earlier. There is a unit test that creates 2 instances, which fails when you lower the limit from 40 to 2, because putParameter is never called.
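The reviewer's point is that a lodash-style `throttle` coalesces calls rather than queuing them: the leading call runs immediately, and intermediate calls made inside the wait window are collapsed into at most one trailing call. A minimal sketch of that semantic (this is an illustration of throttle behavior in general, not lodash's actual implementation):

```typescript
// Minimal throttle in the lodash style: the leading call runs immediately,
// calls made during the wait window are coalesced into one trailing call,
// and all the intermediate invocations in between are simply discarded.
function throttle<T extends unknown[]>(fn: (...args: T) => void, wait: number): (...args: T) => void {
  let lastCall = 0;
  let pending: ReturnType<typeof setTimeout> | undefined;
  let lastArgs: T | undefined;
  return (...args: T) => {
    const now = Date.now();
    if (now - lastCall >= wait) {
      lastCall = now;
      fn(...args); // leading edge: invoke right away
    } else {
      lastArgs = args; // intermediate calls are not queued; only the latest survives
      if (pending === undefined) {
        pending = setTimeout(() => {
          pending = undefined;
          lastCall = Date.now();
          fn(...(lastArgs as T)); // trailing edge: one coalesced call
        }, wait - (now - lastCall));
      }
    }
  };
}
```

Calling a throttled `putParameter` 40 times in a tight loop would therefore execute the first call and one trailing call, silently dropping the other 38 parameters, which is why an awaited `setTimeout` delay between sequential calls fits this use case better.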

Contributor Author

@devsh4 devsh4 Oct 11, 2022


I agree with @npalm; we should revert this to the implementation I had before, as I had tried something similar and it didn't work.

@GuptaNavdeep1983 can you please revert this to the original change I had made? See more context below.

@npalm
Member

npalm commented Oct 10, 2022

The referenced AWS document only mentions GetParameters as being rate limited. The Lambda only calls putParameter, and I can't see a specific limit for putParameter. However, I have seen rate limits as well when we try to scale aggressively. AWS also uses a token bucket to rate limit: as soon as your bucket is empty, you get rate limited. The module makes several AWS calls, and putParameter will certainly play a part in this.

But I'm not sure if delaying the SSM putParameter call will solve the problem. Even once we solve the problem here, the GetParameter call from the instance start may be the next limit we hit.
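The token-bucket scheme mentioned above can be sketched as follows. This is a generic illustration of the rate-limiting model, not AWS's actual implementation; the class and parameter names are hypothetical:

```typescript
// Hypothetical token bucket: the bucket holds up to `capacity` tokens and
// refills continuously at `refillPerSecond`. Each request consumes one token;
// when the bucket is empty, the request is rate limited.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(private capacity: number, private refillPerSecond: number) {
    this.tokens = capacity;
    this.lastRefill = Date.now();
  }

  tryConsume(): boolean {
    const now = Date.now();
    const elapsedSeconds = (now - this.lastRefill) / 1000;
    // refill proportionally to elapsed time, capped at the bucket capacity
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false; // bucket empty: the caller is throttled
  }
}
```

This model explains the behavior seen during aggressive scale-ups: a burst up to `capacity` succeeds immediately, but once the bucket is drained, sustained traffic is limited to `refillPerSecond`, so several AWS calls sharing the same bucket can exhaust it together even when each caller's average rate looks fine.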

@devsh4
Contributor Author

devsh4 commented Oct 11, 2022

AWS service quotas

@npalm yes, I believe the putParameter call does play a role in hitting rate limits. We are running a forked version of this repo with the initial change I made here, and it has been running seamlessly for the past 6 months, spinning up more than ~250 runners at once without hitting rate limits.

@npalm
Member

npalm commented Oct 12, 2022

> AWS service quotas
>
> @npalm yes, I believe the putParameter call does play a role in hitting rate limits. We are running a forked version of this repo with the initial change I made here, and it has been running seamlessly for the past 6 months, spinning up more than ~250 runners at once without hitting rate limits.

Can you point to the fork you are running here (in case it is open source)?

@devsh4
Contributor Author

devsh4 commented Oct 14, 2022

@npalm Sure, the forked version is here; it's linked above as well. FYI we are using Linux runners.

@devsh4
Contributor Author

devsh4 commented Nov 8, 2022

@npalm Any update on this?

@github-actions
Contributor

github-actions bot commented Dec 9, 2022

This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed if no further activity occurs. Thank you for your contributions.
