toil stats is way slow #3089
Comments
Adam Novak commented:

That's a rate of 1 job processed every 300 ms, which seems pretty fast if there's any networking involved. How fast is this filesystem?

We might need to move over to some kind of async IO here to up throughput when there's latency involved in IO requests.
On 6/18/20, Mark Diekhans wrote:

> this is on a smallish cactus run (10 mammals)
>
> ```
> INFO:toil.utils.toilStatus:Traversing the job graph gathering jobs. This may take a couple of minutes.
> Of the 145093 jobs considered, there are 8627 jobs with children, 135948 jobs ready to run, 518 zombie jobs, 0 jobs with services, 0 services, and 0 jobs with log files currently in FileJobStore(/lustre/scratch/mdiekhan/cactus/debug/run/jobStore).
>
> real 83m12.232s
> user 0m47.406s
> sys 2m12.397s
> ```
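For illustration, the kind of change Adam is suggesting might look something like the sketch below. This is not Toil's actual traversal code; `load_job` and `children_of` are hypothetical stand-ins for whatever accessors the job store exposes. The idea is just to keep many small reads in flight at once so a single slow round trip to the filesystem doesn't serialize the whole walk.

```python
# A rough sketch only: overlap the per-job reads with a thread pool so the
# graph walk isn't limited to one filesystem round trip at a time.
# load_job(job_store_dir, job_id) and children_of(job) are hypothetical
# stand-ins, not real Toil APIs.
from concurrent.futures import ThreadPoolExecutor

def walk_jobs(job_store_dir, root_ids, load_job, children_of, workers=32):
    """Breadth-first walk of the job graph with many reads in flight."""
    seen = set(root_ids)
    frontier = list(root_ids)
    jobs = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier:
            # Issue one read per frontier job; the pool hides the I/O latency.
            loaded = list(pool.map(lambda jid: load_job(job_store_dir, jid),
                                   frontier))
            jobs.extend(loaded)
            next_frontier = []
            for job in loaded:
                for child in children_of(job):
                    if child not in seen:
                        seen.add(child)
                        next_frontier.append(child)
            frontier = next_frontier
    return jobs
```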
Mark Diekhans commented:

Also look at the user and sys times. That real time is almost all burned waiting on IO.
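Plugging in the quoted numbers (just back-of-the-envelope arithmetic on the times above):

```python
# Back-of-the-envelope check using the times quoted above.
real = 83 * 60 + 12.232              # wall clock: ~4992 s
cpu  = 47.406 + (2 * 60 + 12.397)    # user + sys: ~180 s
print(f"fraction of wall time spent on CPU: {cpu / real:.1%}")  # ~3.6%
```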
Mark Diekhans commented:

The file system is Lustre. With the state splatted on the file system, I doubt it is worth trying to speed it up. Maybe just change the message to say: "This may take a couple of hours."
Adam Novak commented:

Probably we want to say it will take a certain amount of time per job, or "a while".