Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce 'suspend_time' in example to minimize destroy leaving behind compute nodes #292

Conversation

nick-stroud
Copy link
Collaborator

@nick-stroud nick-stroud commented May 11, 2022

The default suspend_time is 5 minutes. This is a long time to ask our users to wait for nodes to be cleaned up in a 15 min tutorial. If we do not ask them to wait then there is a high likelihood that destroy will fail, requiring manual cleanup. It was decided that 1 minute was a reasonable time for the user to wait for cleanup.

Tested:
I manually tested to confirm the cleanup time. Nodes start to be deleted after 1 min but take another 3 minutes to actually be deleted fully.

Submission Checklist

  • Have you installed and run this change against pre-commit? pre-commit install
  • Are all tests passing? make tests
  • If applicable, have you written additional unit tests to cover this
    change?
  • Is unit test coverage still above 80%?
  • Have you updated any application documentation such as READMEs and user
    guides?
  • Have you followed the guidelines in our Contributing document?

@tpdownes
Copy link
Member

There is a recommendation here that the default values do not honor. This value dishonors it "more".

Not unreasonable to suspect the advice doesn't apply in our use case.

Can you pose this question to the SchedMD team? My own worry would be that, if you set the value too low, you might not be guaranteed of having a scheduling "round" attempt to schedule to the idle VM. In which case, the value might as well be 0. If this is not a concern (due to some sort of synchronization of clocks) and they see why lowering the value here is OK, then let's lower it.

@tpdownes tpdownes assigned nick-stroud and unassigned tpdownes May 11, 2022
@nick-stroud nick-stroud removed their assignment May 11, 2022
@nick-stroud
Copy link
Collaborator Author

There is a recommendation here that the default values do not honor. This value dishonors it "more".

Not unreasonable to suspect the advice doesn't apply in our use case.

Slurm team has confirmed that 60s is a reasonable value to use here.

@tpdownes
Copy link
Member

Slurm team has confirmed that 60s is a reasonable value to use here.

Who am I to say no? LGTM.

@nick-stroud nick-stroud merged commit 26a7153 into GoogleCloudPlatform:develop May 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants