`SlurmScheduler`: always raise for non-zero exit code (#4332)
The `SlurmScheduler` intentionally ignored non-zero exit codes returned by SLURM when querying the status of a set of job ids. This was done because SLURM returns a non-zero exit code not only when an actual error prevents it from retrieving the status of the requested jobs, but also when a single job is specified and that job is no longer active. Since the latter is not really an error, yet is difficult to distinguish from a "real" error, the exit code was ignored.

However, this could cause the plugin to silently ignore a real problem and assume a job had completed when it was in fact still active.

The solution exploits a quirk of SLURM: when asked for the status of more than one job, it never returns a non-zero exit code, even if one or more of those jobs have finished. That is why, when asking for the status of a single job, we now duplicate the job id, so that the exit code is still zero even when the job is no longer active. Any non-zero exit code can therefore be treated as a real error and raised.
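The duplication trick described above can be sketched roughly as follows. This is a minimal illustration and not the plugin's actual code: the helper names `build_jobs_argument` and `check_exit_code` and the `SchedulerError` exception are hypothetical, and the sketch assumes the job ids end up in a comma-separated list such as the one accepted by `squeue --jobs`.

```python
from typing import Sequence, Union


class SchedulerError(Exception):
    """Hypothetical exception for a failed scheduler status query."""


def build_jobs_argument(job_ids: Union[str, Sequence[str]]) -> str:
    """Return a comma-separated job list, duplicating a lone job id.

    Assumption (from the commit message): when more than one job id is
    requested, SLURM does not return a non-zero exit code merely because
    a job has already finished.
    """
    # Accept a single job id passed as a plain string.
    ids = [job_ids] if isinstance(job_ids, str) else list(job_ids)
    if not ids:
        raise ValueError("at least one job id is required")
    if len(ids) == 1:
        # Duplicate the single id so the query counts as a multi-job query.
        ids.append(ids[0])
    return ",".join(ids)


def check_exit_code(retval: int, stderr: str) -> None:
    """Raise for any non-zero exit code instead of silently ignoring it."""
    if retval != 0:
        raise SchedulerError(
            f"status query failed with exit code {retval}: {stderr.strip()}"
        )
```

With the single-job case handled this way, a caller no longer needs a special case for "job not found" and can pass every exit code through `check_exit_code`.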
Showing 2 changed files with 65 additions and 16 deletions.