First part of the problem

Currently the collector naively takes a list of job states from the config file and tries to collect all corresponding jobs. However, not all job states make sense in this context. For example, `Pending` makes no sense, since a pending job has not run yet.

I think that we should document a list of job states that can be sensibly used with this collector.
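One way to enforce such a documented list is to validate the configured states at startup and reject anything outside it. The sketch below assumes a hypothetical allowlist of terminal Slurm states (`SENSIBLE_STATES`) and a hypothetical `validate_job_states` helper; neither exists in the collector, and the list itself is an illustration rather than a vetted recommendation.

```rust
/// Hypothetical allowlist: terminal Slurm job states, i.e. states in which
/// the job has finished. Illustrative only, not a vetted recommendation.
const SENSIBLE_STATES: &[&str] = &[
    "BOOT_FAIL", "CANCELLED", "COMPLETED", "DEADLINE", "FAILED",
    "NODE_FAIL", "OUT_OF_MEMORY", "PREEMPTED", "TIMEOUT",
];

/// Checks the job states read from the config file, normalizing them to
/// upper case. Returns an error naming every state outside the allowlist.
fn validate_job_states(configured: &[String]) -> Result<Vec<String>, String> {
    let mut accepted = Vec::new();
    let mut rejected = Vec::new();
    for state in configured {
        let upper = state.trim().to_ascii_uppercase();
        if SENSIBLE_STATES.contains(&upper.as_str()) {
            accepted.push(upper);
        } else {
            rejected.push(state.clone());
        }
    }
    if rejected.is_empty() {
        Ok(accepted)
    } else {
        Err(format!(
            "job state(s) not supported by the collector: {}",
            rejected.join(", ")
        ))
    }
}

fn main() {
    // `Pending` is rejected at startup instead of silently producing nothing.
    let config = vec!["completed".to_string(), "Pending".to_string()];
    match validate_job_states(&config) {
        Ok(states) => println!("collecting jobs in states {states:?}"),
        Err(e) => eprintln!("config error: {e}"),
    }
}
```

Failing fast at config load keeps the documented list and the behaviour in one place, instead of relying on users to read the documentation.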
Second part

Some job states are a little more involved, like `Cancelled`. You might want to account for cancelled jobs when they were cancelled after running for a few days. On the other hand, there is no guarantee that a cancelled job was ever started. In that case the `start_time` is `Unknown`, and tokenization of the sacct output will fail at `AUDITOR/collectors/slurm/src/sacctcaller.rs`, line 206 (commit 256395c).
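A minimal sketch of tokenizing a time field so that `Unknown` becomes `None` instead of a parse failure. It assumes that set times use the `YYYY-MM-DDTHH:MM:SS` layout and that the line is `|`-delimited as with `sacct --parsable2`; the field order in the sample line and the helper names `parse_optional_time` and `looks_like_timestamp` are hypothetical, not the collector's actual code.

```rust
/// Parses a sacct time field into `Some(timestamp)`, or `None` when sacct
/// reports it as `Unknown` (e.g. a job cancelled before it started).
/// Assumption: set times use the `YYYY-MM-DDTHH:MM:SS` layout.
fn parse_optional_time(field: &str) -> Result<Option<String>, String> {
    match field.trim() {
        "Unknown" => Ok(None),
        s if looks_like_timestamp(s) => Ok(Some(s.to_string())),
        other => Err(format!("unexpected time value {other:?}")),
    }
}

/// Cheap shape check for `YYYY-MM-DDTHH:MM:SS` (no calendar validation).
fn looks_like_timestamp(s: &str) -> bool {
    let b = s.as_bytes();
    b.len() == 19
        && b.iter().enumerate().all(|(i, &c)| match i {
            4 | 7 => c == b'-',
            10 => c == b'T',
            13 | 16 => c == b':',
            _ => c.is_ascii_digit(),
        })
}

fn main() {
    // Hypothetical `|`-delimited line: JobID|State|Start|End
    let line = "4242|CANCELLED by 1000|Unknown|2023-03-01T12:00:00";
    let fields: Vec<&str> = line.split('|').collect();
    let start = parse_optional_time(fields[2]);
    let end = parse_optional_time(fields[3]);
    println!("start = {start:?}, end = {end:?}");
}
```

Returning `Option` pushes the decision about what a missing `start_time` means out of the tokenizer and into the code that knows the job state.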
I believe we need to define which fields might be missing for which job states, and have the collector ignore certain entries (like a `Cancelled` job with no `start_time`) instead of crashing.

I didn't think the list through (https://slurm.schedmd.com/sacct.html#SECTION_JOB-STATE-CODES); `Cancelled` might be the only problem.
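The proposed per-state policy could look roughly like the sketch below: a record is accounted, skipped with a message, or reported as an error depending on its state and which fields are present. `Record`, `Decision` and `classify` are hypothetical names, and the policy (only `CANCELLED` may lack a start time) is an assumption based on the observation above. It matches on a `CANCELLED` prefix because sacct may report the state as e.g. `CANCELLED by <uid>`.

```rust
/// Simplified sacct record (hypothetical; not the collector's real type).
#[derive(Debug)]
struct Record {
    job_id: String,
    state: String,
    start: Option<String>,
}

/// What the collector should do with a record.
#[derive(Debug, PartialEq)]
enum Decision {
    Account,
    Skip(String),
    Fail(String),
}

/// Hypothetical per-state policy: a CANCELLED job without a start time was
/// cancelled before it ran, so it is skipped. A missing start time in any
/// other state is still reported as an error.
fn classify(rec: &Record) -> Decision {
    match (&rec.start, rec.state.starts_with("CANCELLED")) {
        (Some(_), _) => Decision::Account,
        (None, true) => Decision::Skip(format!(
            "job {} was cancelled before it started",
            rec.job_id
        )),
        (None, false) => Decision::Fail(format!(
            "job {} in state {} has no start time",
            rec.job_id, rec.state
        )),
    }
}

fn main() {
    let records = vec![
        Record { job_id: "1".into(), state: "COMPLETED".into(), start: Some("2023-03-01T08:00:00".into()) },
        Record { job_id: "2".into(), state: "CANCELLED by 1000".into(), start: None },
    ];
    let mut accounted = Vec::new();
    for rec in &records {
        match classify(rec) {
            Decision::Account => accounted.push(rec),
            Decision::Skip(reason) => eprintln!("skipping: {reason}"),
            Decision::Fail(reason) => eprintln!("error: {reason}"),
        }
    }
    println!("{} of {} records accounted", accounted.len(), records.len());
}
```

Keeping `Fail` separate from `Skip` means an unexpected gap (e.g. a `COMPLETED` job without a start time) is still surfaced rather than silently dropped.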