strelka2 very slow and trashing disk on ext4 #89
Comments
Thanks Peter. The fsync is used by pyflow to keep its logs up to date in the event of an error. I was not aware this could cause such significant complications. I will add the disable option as an improvement item for the pyflow API.
I think I'm facing a related problem. I'm running Strelka2 through bcbio_nextgen on an NFS file system and it seems to run significantly slower compared to other tools. After canceling the run and attempting to delete the working directory, it takes a lot of time to remove files such as these:
@ctsa I'm guessing that writing these files to NFS is what's causing Strelka2 to run so slowly in my case. Is there a way to avoid creating these files?
In case this helps:
Thanks for the comment. It could be worth trying to run things inside a tmpfs partition, if the temporary data could fit in there. How much RAM do you have available and how large is the tmpfs partition that you used?
Sorry, it turns out I did not run it on tmpfs but on an SSD (my bad: I forgot to mount /tmp as tmpfs on my new laptop, which I normally do). I have now re-run it on a 20 GB tmpfs and it needed 18 min (slower than on the SSD? I did not check the partition usage; is it possible that 20 GB was not enough and it started swapping?). Either way, running Strelka2 on an SSD or tmpfs is much faster than on an HDD. Is there an option to define a temporary directory? This would be very practical, because I would not need to configure the run in an unusual location and then move the results to my actual working directory. Cheers,
I was running Strelka2 through Bcbio-nextgen, which does offer a way to set the TMP location: https://bcbio-nextgen.readthedocs.io/en/latest/contents/configuration.html#temporary-directory.
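For anyone who wants the "run on fast storage, then move the results" pattern without a dedicated option, here is a rough Python sketch. It is not part of Strelka2 or bcbio: `run_in_scratch`, `job` and `min_free_bytes` are hypothetical names, and `job` stands in for whatever launches the workflow with its run directory set to the path it is given. The free-space check is only a crude guard against a tmpfs that is too small.

```python
import shutil
import tempfile
from pathlib import Path


def run_in_scratch(job, scratch_root, final_dir, min_free_bytes=0):
    """Run job(run_dir) in a temporary directory, then move it to final_dir.

    job is a hypothetical callable that writes all of its output under the
    directory it receives. final_dir should not exist yet; if it does,
    shutil.move() places the run directory inside it instead.
    """
    scratch_root = Path(scratch_root)
    final_dir = Path(final_dir)

    # Refuse to start if the scratch file system looks too small.
    free = shutil.disk_usage(scratch_root).free
    if free < min_free_bytes:
        raise RuntimeError(f"only {free} bytes free on {scratch_root}")

    run_dir = Path(tempfile.mkdtemp(prefix="run_", dir=scratch_root))
    try:
        job(run_dir)
        final_dir.parent.mkdir(parents=True, exist_ok=True)
        # Works across file systems: falls back to copy + delete if needed.
        shutil.move(str(run_dir), str(final_dir))
    finally:
        # After a successful move this is a no-op; after a failure it
        # removes the partial run from the scratch area.
        shutil.rmtree(run_dir, ignore_errors=True)
    return final_dir
```

With this shape, a failed run leaves nothing behind on the scratch file system, and the slow disk only sees one bulk copy at the end instead of many small writes.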
I have the same symptoms. Any progress here? Is there any other solution?
@ctsa I am not sure on the exact number of log files it has generated, but it appears to be pretty significant. I have another
@ctsa I have the same issue! Is there any update here?
Hey @ctsa, I am just checking in to see if you have time to look into this issue, or if you can pass it along to another @Illumina team member. Thank you for your time. Best Regards,
@skchronicles I think it's quite safe to assume that this software has been abandoned for a while now. Same for Manta, abandoned in July 2019. In general, Illumina now seem to be putting all their efforts into Dragen. Is there any reason why you'd want to use Strelka2 so badly, instead of other variant callers such as those from GATK?
Running strelka2 on my ext4 file system leads to disk thrashing and slowness.
The excessive disk access is caused by the ext4 journalling process. This type of problem has been seen before with programs that call fsync many times.
I tested this by removing the fsync calls in strelka (hardcoded), and the disk thrashing indeed stops. I suppose that the fsyncs are used to keep the data written so far consistent in case of a crash, but having to wipe and completely restart a crashed analysis (which seldom happens) is a better option than constant disk thrashing and slowness.
A relatively easy solution could thus be to make fsyncing optional (so it can be turned off on file systems that do not deal well with it).
Regards,
Peter
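To make the proposal concrete, here is a minimal sketch of what an optional fsync could look like in a Python log writer. This is not pyflow's actual code: `LogWriter`, `use_fsync` and `time_writes` are hypothetical names used only for illustration. The only real calls are `flush()`, which hands data to the OS page cache, and `os.fsync()`, which asks the kernel to push it to the device.

```python
import os
import tempfile
import time


class LogWriter:
    """Append-only log writer with an optional fsync per line (hypothetical)."""

    def __init__(self, path, use_fsync=True):
        self._fp = open(path, "a", encoding="utf-8")
        self._use_fsync = use_fsync

    def write(self, line):
        self._fp.write(line + "\n")
        # flush() moves the data from Python's buffer to the OS page cache.
        self._fp.flush()
        if self._use_fsync:
            # fsync() forces the cached data out to the storage device; on a
            # journalling file system this is the expensive step.
            os.fsync(self._fp.fileno())

    def close(self):
        self._fp.close()


def time_writes(use_fsync, n_lines=200):
    """Write n_lines log lines and return (elapsed seconds, lines on disk)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "workflow.log")
        writer = LogWriter(path, use_fsync=use_fsync)
        start = time.perf_counter()
        for i in range(n_lines):
            writer.write(f"task {i} done")
        writer.close()
        elapsed = time.perf_counter() - start
        with open(path, encoding="utf-8") as fp:
            n_written = sum(1 for _ in fp)
    return elapsed, n_written


if __name__ == "__main__":
    for flag in (True, False):
        elapsed, n = time_writes(flag)
        print(f"use_fsync={flag}: {n} lines in {elapsed:.4f} s")
```

Without the fsync the log content is the same after a normal exit; what is given up is the guarantee that the last lines survive a power loss or kernel crash. How large the timing difference is depends on the file system and the storage device, so it is worth measuring on the affected machine.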