[Bugfix] Fix a bug that was causing clustermgtd to fail when a field returned by the command scontrol show nodes has a value that contains the character =. #639

Merged

Conversation

gmarciani
Contributor

Description of changes

Fix a bug that was causing clustermgtd to fail when a field returned by the command scontrol show nodes has a value that contains the character =.

In particular, before this change, when the field Reason was set to a string containing =, clustermgtd failed with the error below, causing the node management loop to be interrupted.

Unable to get partition/node info from slurm, no other action can be performed. Sleeping... Exception: too many values to unpack (expected 2)
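
The "too many values to unpack (expected 2)" message is what Python raises when a string holding more than one = is split on every = and unpacked into two names. The sketch below reproduces that failure and shows one common remedy, splitting only on the first =. It is a minimal illustration, not the code changed in this PR: the helper name `parse_node_fields`, the one-field-per-line input layout, and the sample Reason value are all assumptions made for the example.

```python
def parse_node_fields(block):
    """Parse newline-separated KEY=VALUE lines into a dict.

    Hypothetical helper for illustration only; it is not the parser
    used by clustermgtd.
    """
    fields = {}
    for line in block.splitlines():
        if "=" not in line:
            continue  # skip blank or malformed lines
        # maxsplit=1 keeps any further "=" characters inside the value
        key, value = line.split("=", 1)
        fields[key] = value
    return fields


# Assumed sample data: a Reason value that itself contains "=".
sample = (
    "NodeName=multiple-dy-c5xlarge-3\n"
    "Reason=some reason containing key=value\n"
)

# Naive approach: splitting on every "=" yields three parts for the
# Reason line, so unpacking into two names raises ValueError.
try:
    key, value = "Reason=some reason containing key=value".split("=")
    naive_error = None
except ValueError as exc:
    naive_error = str(exc)

parsed = parse_node_fields(sample)
```

With the split limited to the first =, the Reason value is kept whole, including its embedded =, and the remaining fields parse as before.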

Tests

  • Unit Test

References

  • Link to impacted open issues.
  • Link to related PRs in other packages (i.e. cookbook, node).
  • Link to documentation useful to understand the changes.

Checklist

  • Make sure you are pointing to the right branch.
  • If you're creating a patch for a branch other than develop add the branch name as prefix in the PR title (e.g. [release-3.6]).
  • Check all commits' messages are clear, describing what and why vs how.
  • Make sure to have added unit tests or integration tests to cover the new/modified code.
  • Check if documentation is impacted by this change.

Please review the guidelines for contributing and Pull Request Instructions.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Fix a bug that was causing clustermgtd to fail when a field returned by the command `scontrol show nodes` has a value that contains the character `=`.

Signed-off-by: Giacomo Marciani <mgiacomo@amazon.com>
@gmarciani gmarciani requested review from a team as code owners July 10, 2024 14:29
@gmarciani gmarciani enabled auto-merge (rebase) July 10, 2024 16:54
"NodeName=multiple-dy-c5xlarge-4\n"
"NodeAddr=multiple-dy-c5xlarge-4\n"
"NodeHostName=multiple-dy-c5xlarge-4\n"
"NodeName=multiple-dy-c5xlarge-3\n"
Contributor

Why did you change the name from multiple-dy-c5xlarge-4 to multiple-dy-c5xlarge-3?

Contributor Author

no reason, just fake unit test data

@gmarciani gmarciani merged commit e8e7874 into aws:develop Jul 10, 2024
14 checks passed
@gmarciani gmarciani deleted the wip/mgiacomo/3110/fix-show-partitions-0710-1 branch July 10, 2024 17:02