toil stats is way slow #3089
Comments
Adam Novak commented:

That's a rate of 1 job processed every 300 ms, which seems pretty fast if there's any networking involved. How fast is this filesystem?

We might need to move over to some kind of async IO here to up throughput when there's latency involved in IO requests.
On 6/18/20, Mark Diekhans wrote:

> this is on a smallish cactus run (10 mammals)
>
> ```
> INFO:toil.utils.toilStatus:Traversing the job graph gathering jobs. This may take a couple of minutes.
> Of the 145093 jobs considered, there are 8627 jobs with children, 135948 jobs ready to run, 518 zombie jobs, 0 jobs with services, 0 services, and 0 jobs with log files currently in FileJobStore(/lustre/scratch/mdiekhan/cactus/debug/run/jobStore).
>
> real 83m12.232s
> user 0m47.406s
> sys 2m12.397s
> ```
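For illustration, the kind of change Adam is suggesting might look something like the sketch below. This is not Toil's actual traversal code; `load_job` and `children_of` are hypothetical stand-ins for whatever accessors the job store exposes. The idea is just to keep many small reads in flight at once so a single slow round trip to the filesystem doesn't serialize the whole walk.

```python
# A rough sketch only: overlap the per-job reads with a thread pool so the
# graph walk isn't limited to one filesystem round trip at a time.
# load_job(job_store_dir, job_id) and children_of(job) are hypothetical
# stand-ins, not real Toil APIs.
from concurrent.futures import ThreadPoolExecutor

def walk_jobs(job_store_dir, root_ids, load_job, children_of, workers=32):
    """Breadth-first walk of the job graph with many reads in flight."""
    seen = set(root_ids)
    frontier = list(root_ids)
    jobs = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier:
            # Issue one read per frontier job; the pool hides the I/O latency.
            loaded = list(pool.map(lambda jid: load_job(job_store_dir, jid),
                                   frontier))
            jobs.extend(loaded)
            next_frontier = []
            for job in loaded:
                for child in children_of(job):
                    if child not in seen:
                        seen.add(child)
                        next_frontier.append(child)
            frontier = next_frontier
    return jobs
```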
Mark Diekhans commented:

Also look at the user and sys times. That real time is almost all burned waiting on IO.
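Plugging in the quoted numbers (just back-of-the-envelope arithmetic on the times above):

```python
# Back-of-the-envelope check using the times quoted above.
real = 83 * 60 + 12.232              # wall clock: ~4992 s
cpu  = 47.406 + (2 * 60 + 12.397)    # user + sys: ~180 s
print(f"fraction of wall time spent on CPU: {cpu / real:.1%}")  # ~3.6%
```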
Mark Diekhans commented:

The file system is Lustre. With the state splatted on the file system, I doubt it is worth trying to speed it up. Maybe just change the message to say: "This may take a couple of hours."
Adam Novak commented:

Probably we want to say it will take a certain amount of time per job, or "a while".