slurm-send-mail causes error if array job cancelled before any tasks start #141
Comments
Hi, thanks for reporting the issue. I will take a look and get back to you.
Note: there is an unmasked e-mail address in your last log file snippet. Edit: I have edited your message to remove it.
Bug confirmed - I have created a new integration test case that demonstrates the bug in the current version of Slurm-Mail. I am working on a fix.
Interestingly, the Slurm job ID for a cancelled job array that never dispatched is of the form
Issue fixed in release 4.21. Thanks again for reporting the issue.
Thanks for the quick response!
Many thanks for the sponsorship - that's my first one ever!
Versions
OS version: Rocky Linux 9.4
Slurm version: 22.05.9-1
Slurm Mail version: 4.20
Describe the bug
We have seen that if a user submits an array job and cancels it before any tasks start, the slurm-send-mail program generates an error message in the /var/log/slurm-mail/slurm-send-mail.log log file, and the slurm-mail file for the job is not deleted. In our case, thousands of these files accumulated over several months, and slurm-send-mail kept trying to reprocess them. We ended up just deleting the older files.
To replicate this, I submitted a simple shell script as an array job and immediately canceled the job:
and the following messages were seen in the slurm-send-mail.log file.
It seemed like the "jobs" python list was empty in this situation, so I was able to fix (or at least avoid) the issue by modifying line 360 in /usr/lib/python3.9/site-packages/slurmmail/cli.p from:
to:
With this change in place, the problematic slurm-email file was processed, no errors arose, and no email was sent, which I think is fine.
Further testing shows that slurm-email is working as it should.
While the change I made works, there may be a more intelligent way to deal with the situation.
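For illustration only, a guard of the kind described above might look like the following minimal sketch. The function name `process_spool_file`, the shape of the `jobs` list, and the spool file name are all hypothetical and are not taken from Slurm-Mail's code. The sketch assumes only that an empty `jobs` list should mean "send nothing, but still delete the file so it is not reprocessed".

```python
import logging
from pathlib import Path
from typing import List


def process_spool_file(spool_file: Path, jobs: List[dict]) -> int:
    """Hypothetical helper, NOT Slurm-Mail's actual implementation.

    Returns the number of e-mails that would have been sent and always
    removes the spool file afterwards, even when no job records exist.
    """
    sent = 0
    if not jobs:
        # A job array cancelled before any task started may yield no records.
        logging.warning(
            "No job records found for %s; skipping e-mail", spool_file.name
        )
    for _job in jobs:
        # Placeholder for the real e-mail rendering and sending step.
        sent += 1
    # Delete the spool file in every case so it is never picked up again.
    spool_file.unlink(missing_ok=True)
    return sent
```

The point of the design is that cleanup does not depend on a job record being found, so spool files for jobs that never ran cannot accumulate.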
Logs
Same as above example...
Thanks for all of your work in putting out a great e-mail tool for Slurm!