strelka2 very slow and thrashing disk on ext4 #89

Open
derijkp opened this issue Jan 30, 2019 · 11 comments

derijkp commented Jan 30, 2019

Running strelka2 on my ext4 file system leads to disk thrashing and slowness.
The excessive disk access is caused by the ext4 journalling process. This type of problem has been seen before with programs that call fsync many times.
I tested by removing the fsync calls from strelka (hard-coded), and the disk thrashing indeed stops. I assume the fsyncs are there to keep the "so far" data consistent in case of a crash, but having to wipe and completely restart a crashed analysis (which happens rarely) is a better option than constant disk thrashing and slowness.
A relatively easy solution could thus be to make fsyncing an option (so it can be turned off on filesystems that do not deal well with it).

Regards,

Peter

ctsa self-assigned this Feb 5, 2019

ctsa commented Feb 5, 2019

Thanks Peter. The fsync is used by pyflow to keep its logs up to date in the event of an error. I was not aware this could cause such significant complications. I will add a disable option as an improvement item for the pyflow API.
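For illustration, a minimal sketch of the kind of flag-guarded fsync helper such an option could use. The names here (`PYFLOW_FSYNC`, `append_log`) are hypothetical, not part of the actual pyflow API:

```python
import os

# Hypothetical switch; the real pyflow API does not expose this yet.
# The idea is to let users trade crash-log durability for less journal
# traffic on filesystems where frequent fsync hurts (ext4, NFS).
ENABLE_FSYNC = os.environ.get("PYFLOW_FSYNC", "1") != "0"

def append_log(path, text):
    """Append text to a log file, fsyncing only when durability is requested."""
    with open(path, "a") as fh:
        fh.write(text)
        fh.flush()
        if ENABLE_FSYNC:
            os.fsync(fh.fileno())  # skipped when PYFLOW_FSYNC=0
```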

@amizeranschi

I think I'm facing a related problem. I'm running Strelka2 through bcbio_nextgen on an NFS file system and it seems to run significantly slower than other tools. After canceling the run and attempting to delete the working directory, it takes a lot of time to remove files such as these:

deleting testingVC-merged/work/bcbiotx/tmpeYQMtc/testingVC-6_44316087_48702894-work/workspace/pyflow.data/logs/tmp/taskWrapperLogs/002/586/taskWrapperParameters.pickle
deleting testingVC-merged/work/bcbiotx/tmpeYQMtc/testingVC-6_44316087_48702894-work/workspace/pyflow.data/logs/tmp/taskWrapperLogs/002/586/pyflowTaskWrapper.signal.txt
deleting testingVC-merged/work/bcbiotx/tmpeYQMtc/testingVC-6_44316087_48702894-work/workspace/pyflow.data/logs/tmp/taskWrapperLogs/002/586/
deleting testingVC-merged/work/bcbiotx/tmpeYQMtc/testingVC-6_44316087_48702894-work/workspace/pyflow.data/logs/tmp/taskWrapperLogs/002/584/taskWrapperParameters.pickle
deleting testingVC-merged/work/bcbiotx/tmpeYQMtc/testingVC-6_44316087_48702894-work/workspace/pyflow.data/logs/tmp/taskWrapperLogs/002/584/pyflowTaskWrapper.signal.txt
deleting testingVC-merged/work/bcbiotx/tmpeYQMtc/testingVC-6_44316087_48702894-work/workspace/pyflow.data/logs/tmp/taskWrapperLogs/002/584/
deleting testingVC-merged/work/bcbiotx/tmpeYQMtc/testingVC-6_44316087_48702894-work/workspace/pyflow.data/logs/tmp/taskWrapperLogs/002/582/taskWrapperParameters.pickle
deleting testingVC-merged/work/bcbiotx/tmpeYQMtc/testingVC-6_44316087_48702894-work/workspace/pyflow.data/logs/tmp/taskWrapperLogs/002/582/pyflowTaskWrapper.signal.txt
deleting testingVC-merged/work/bcbiotx/tmpeYQMtc/testingVC-6_44316087_48702894-work/workspace/pyflow.data/logs/tmp/taskWrapperLogs/002/582/
deleting testingVC-merged/work/bcbiotx/tmpeYQMtc/testingVC-6_44316087_48702894-work/workspace/pyflow.data/logs/tmp/taskWrapperLogs/002/580/taskWrapperParameters.pickle
deleting testingVC-merged/work/bcbiotx/tmpeYQMtc/testingVC-6_44316087_48702894-work/workspace/pyflow.data/logs/tmp/taskWrapperLogs/002/580/pyflowTaskWrapper.signal.txt
deleting testingVC-merged/work/bcbiotx/tmpeYQMtc/testingVC-6_44316087_48702894-work/workspace/pyflow.data/logs/tmp/taskWrapperLogs/002/580/

@ctsa I'm guessing that writing these files to NFS is what's causing Strelka2 to run so slowly in my case. Is there a way to avoid creating these files?
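(Not an answer to avoiding the files, but as a stopgap for the slow cleanup: a sketch, assuming the directory layout shown above, that removes the per-task log directories in parallel. On NFS each unlink pays a network round trip, so parallelizing the removal can help noticeably. This is not a bcbio or Strelka feature.)

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Path taken from the listing above; adjust to your own run directory.
log_root = Path("testingVC-6_44316087_48702894-work/workspace/pyflow.data/logs/tmp/taskWrapperLogs")

# Remove the per-task leaf directories (e.g. 002/586) concurrently.
leaf_dirs = [d for d in log_root.glob("*/*") if d.is_dir()]
with ThreadPoolExecutor(max_workers=16) as pool:
    list(pool.map(shutil.rmtree, leaf_dirs))

shutil.rmtree(log_root)  # remove the now nearly-empty upper levels
```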


abenjak commented Apr 8, 2019

In case this helps (tumor BAM = 11 GB, normal BAM = 29 GB, using -j 6, on an i7-8750H CPU):
I ran strelka-2.9.10 on my internal 1 TB ext4 disk; it finished in 2 h.
I ran the same job on my tmpfs partition; it finished in 13 min.

@amizeranschi

Thanks for the comment. It could be worth running things inside a tmpfs partition, if the temporary data can fit there. How much RAM do you have available, and how large is the tmpfs partition that you used?


abenjak commented Apr 9, 2019

Sorry, it turns out I did not run it on tmpfs but on an SSD (my bad, I forgot to mount /tmp as tmpfs on my new laptop, which I normally do).

I re-ran it on a 20 GB tmpfs and it took 18 min (slower than on the SSD? I did not check the partition usage; is it possible that 20 GB was not enough and it started swapping?).

Either way, running Strelka2 on an SSD or tmpfs is much faster than on an HDD. Is there an option to define a temporary directory? This would be very practical, because I wouldn't need to configure the run in unusual locations and then move the results back to my actual working directory.

Cheers,
Andrej
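There doesn't seem to be a dedicated temp-dir option, but since the whole workspace (including pyflow.data) lives under the directory given to --runDir, one workaround is to configure the run on fast local storage and copy the results back afterwards. A rough sketch of that idea; the paths and -j value are placeholders:

```python
import shutil
import subprocess

fast_run_dir = "/dev/shm/strelka_run"            # tmpfs; or a local-SSD scratch dir
final_results = "/data/project/strelka_results"  # where the results should end up

# Configure the workflow with its run directory on the fast storage.
subprocess.run([
    "configureStrelkaSomaticWorkflow.py",
    "--normalBam", "normal.bam",
    "--tumorBam", "tumor.bam",
    "--referenceFasta", "reference.fa",
    "--runDir", fast_run_dir,
], check=True)

# Run locally with 6 threads; all logs and temp files stay on fast storage.
subprocess.run([f"{fast_run_dir}/runWorkflow.py", "-m", "local", "-j", "6"], check=True)

# Keep only the results; the pyflow workspace remains on the scratch area.
shutil.copytree(f"{fast_run_dir}/results", final_results)
```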

@amizeranschi

I was running Strelka2 through Bcbio-nextgen, which does offer a way to set the TMP location: https://bcbio-nextgen.readthedocs.io/en/latest/contents/configuration.html#temporary-directory.
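For anyone else hitting this through bcbio, the linked page describes pointing temporary files at local scratch via bcbio_system.yaml, roughly as below (recalled from the docs; please verify the exact keys against the link above):

```yaml
resources:
  tmp:
    dir: /path/to/local/scratch
```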

@serge2016

I have the same symptoms. Any progress here? Is there any other solution?

@skchronicles

@ctsa
Hey Chris, are there any updates on this issue? I am also experiencing a similar issue where pyflow is generating hundreds of thousands of log files.

Here is a small snippet:
[screenshot]

I am not sure of the exact number of log files it has generated, but it appears to be pretty significant. I have another find command that has been running for over 30 minutes.
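In case it helps to quantify it, a small sketch (the path is assumed from the run-directory layout shown earlier in the thread) that tallies the files without printing them all, roughly equivalent to `find ... | wc -l`:

```python
import os

# Assumed location of the pyflow task-wrapper logs inside the Strelka run directory.
log_root = "workspace/pyflow.data/logs/tmp/taskWrapperLogs"

total_files = 0
total_dirs = 0
for _, dirs, files in os.walk(log_root):
    total_files += len(files)
    total_dirs += len(dirs)
print(f"{total_files} files in {total_dirs} directories under {log_root}")
```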

@haochenz96

@ctsa I have the same issue! Is there any update here?

@skchronicles

Hey @ctsa, I am just checking in to see if you have time to look into this issue, or if you can pass it along to another @Illumina team member.

Thank you for your time.

Best Regards,
@skchronicles

@amizeranschi

@skchronicles I think it's quite safe to assume that this software has been abandoned for a while now. The same goes for Manta, abandoned in July 2019. In general, Illumina now seem to be putting all their effort into DRAGEN.

Is there any reason why you want to use Strelka2 so badly, instead of other variant callers such as those from GATK?
