Fix bug in hpc-cluster-slurm.yaml #4
Default values for Terraform variables have been taking precedence over global variables, so the default for the login node was taken rather than the global variable, unlike the controller, which had no default set.

To remedy this:
* The default was removed from zone for the login node
* Precedence has been modified to pull from globals before defaults

In addition to this, a few other fixes have been included:
* Updating the omnia branch name to include the version
* Better error handling in the expand step
* Tests to cover new behavior in applyGlobalVariables
Please address the few comments here.
I suggest we merge this into develop first. We test it a little more and then merge develop into main.
}
if err := bc.applyGlobalVariables(); err != nil {
	log.Fatal(err)
}
bc.expandVariables()
I think we should pick one behavior for error handling. expandVariables handles its own errors, whereas the other two are handled here...
I think that if the errors that happened in combineLabels and in applyGlobalVariables are unrecoverable, then we should call log.Fatal within these functions; otherwise, I see no good reason to propagate the error until expand, but not further. It just feels inconsistent.
That was done primarily to help with testing some of these lower-level functions, where we can return an error and verify that it is what we expected. I think we need to select a level at which it makes sense to actually handle the errors, which is probably in the functions called by bc.expand() and bc.validate(), where we can have the most intuitive error messages but keep the high-level functions in config.go clean. What are your thoughts on that?
I also added a task for improving error handling in the future, possibly using a tool like github.com/pkg/errors to process and wrap errors between functions.
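To make the trade-off concrete, here is a minimal sketch of returning and wrapping errors from lower-level functions and handling them once at the top. It uses the standard library's `fmt.Errorf` with `%w` rather than github.com/pkg/errors, and `applyGlobals`, `expand`, and `errMissingGlobal` are hypothetical stand-ins, not the functions in this PR.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// errMissingGlobal is a hypothetical sentinel error that tests can match on.
var errMissingGlobal = errors.New("missing global variable")

// applyGlobals is a hypothetical low-level function: it returns an error
// instead of exiting, so a unit test can inspect the failure.
func applyGlobals(required []string, globals map[string]string) error {
	for _, name := range required {
		if _, ok := globals[name]; !ok {
			return fmt.Errorf("%w: %s", errMissingGlobal, name)
		}
	}
	return nil
}

// expand is a hypothetical mid-level function: it adds context to the
// error and passes it up rather than deciding how to handle it.
func expand(required []string, globals map[string]string) error {
	if err := applyGlobals(required, globals); err != nil {
		return fmt.Errorf("expand: applying globals: %w", err)
	}
	return nil
}

func main() {
	// Only the top level decides that the error is fatal.
	err := expand([]string{"zone"}, map[string]string{"zone": "us-central1-a"})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("expand succeeded")
}
```

With this shape, a test can call `expand` with an empty globals map and check the result with `errors.Is(err, errMissingGlobal)`, while the process still exits in exactly one place.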
This bugs me big time, but ok to push as is, since it is an urgent patch. Then we should agree on how to handle errors.
My big inclination is towards "intent" (Unrecoverable errors should fail immediately), but I see that testing can be harder (but not impossible).
This commit creates a more explicit flow showing the precedence for sourcing a setting from explicit to global to default, and failing if it can't be found anywhere. Updating tests to reflect this as well.
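The precedence described in this commit could be sketched roughly as follows. This is an illustration under assumptions, not the code in the commit: `resolveSetting`, `errSettingNotFound`, and the plain string maps are hypothetical, and the zone values are made up for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// errSettingNotFound is a hypothetical sentinel returned when no source
// provides the requested setting.
var errSettingNotFound = errors.New("setting not found")

// resolveSetting is a hypothetical helper that looks a setting up in
// precedence order: explicit value, then global variable, then default.
// It fails if none of the three sources defines the setting.
func resolveSetting(name string, explicit, globals, defaults map[string]string) (string, error) {
	for _, source := range []map[string]string{explicit, globals, defaults} {
		// Indexing a nil map is safe in Go and simply reports "not present".
		if value, ok := source[name]; ok {
			return value, nil
		}
	}
	return "", fmt.Errorf("%w: %q", errSettingNotFound, name)
}

func main() {
	globals := map[string]string{"zone": "us-central1-a"}
	defaults := map[string]string{"zone": "us-east1-b"}

	// No explicit zone is set, so the global value wins over the default.
	zone, err := resolveSetting("zone", nil, globals, defaults)
	fmt.Println(zone, err)
}
```

Keeping the three sources in one ordered slice makes the precedence visible in a single place, which is the property the original bug violated.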
Moved templates from their own directory to within reswriter as is required by internal embed tooling and removed a test that is not properly self-contained.
Linter fixes for Go and Terraform: adding package comments and various other go-lint related fixes. This also updates the Python script for the omnia install and fixes an outdated branch name for omnia.
Approved, as long as we capture follow-up actions such as additional unit tests and a homogeneous error handling approach.
`.tflint.hcl` was added to cover pre-commit tests.