Run times slow with 76 genomes #298
Current run time is nearing 50 days or more.
My first question is whether the installed version of pggb was built for an older instruction set, leading to narrower vector instructions and slower processing. We have dealt with this before. I'm not sure of the state of the binaries available from Docker Hub and conda.
My installed version, which I pulled from Docker Hub using Singularity, is version 0.5.3 from February 10.
Docker/Singularity should be about 30% slower than building from GitHub source (at least on our cluster). Can you also share your current (I suppose very long) PGGB .log file?
Surprisingly, chromosome 1 has just completed after running for 50 days. Chromosome 1 would usually complete first. I expect the other chromosomes will take an additional week or two. I can provide the log file, but it's 1 GB in size. How could I get it to you?
I just gzipped the log file. It's now down to 25 MB. What's the limit for file attachments on here?
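For readers who want to do the same compression step from a script, here is a minimal sketch using Python's standard `gzip` module. The function name `gzip_file` and the chunk size are illustrative choices, not part of pggb, and the compression ratio reported above depends on how repetitive the log is.

```python
import gzip
import shutil
from pathlib import Path


def gzip_file(src, dst=None, level=6):
    """Compress src to dst (default: src + '.gz') without loading it into memory."""
    src = Path(src)
    dst = Path(dst) if dst is not None else src.with_name(src.name + ".gz")
    # Stream in 1 MiB chunks so a multi-gigabyte log does not need to fit in RAM.
    with open(src, "rb") as fin, gzip.open(dst, "wb", compresslevel=level) as fout:
        shutil.copyfileobj(fin, fout, length=1024 * 1024)
    return dst
```

Calling `gzip_file("run.log")` (a hypothetical file name) writes `run.log.gz` next to the original and leaves the original in place.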
LOL! Now it is a nice size! I think sharing it on GitHub could work. Or you could put the file temporarily on Google Drive or similar. I would like to check if your bottlenecks are in wfmash mapping and/or alignment, the GFA->ODGI conversion (it happens in smoothxg), the PO alignment in smoothxg, etc.
I've attached the gzipped log file. Hopefully no issues with the attachment. barley_pangenome_1H.fasta.4f79ff6.371d99c.2f0e65c.smooth.03-29-2023_07:39:12.log.gz
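As a rough first pass over a log this large, one could count how many lines mention each tool before reading it in detail. The sketch below is only an assumption about how to triage such a file: it assumes the log lines contain the tool names as plain text, the function name `count_stage_lines` is hypothetical, and line counts are a crude proxy for log volume, not a measurement of run time.

```python
import gzip
from collections import Counter

# Tool names taken from the comment above; matching is by substring only.
STAGES = ("wfmash", "smoothxg", "odgi")


def count_stage_lines(log_path, stages=STAGES):
    """Count log lines that mention each stage name (case-insensitive)."""
    counts = Counter()
    # Read gzipped logs directly so the file never has to be decompressed on disk.
    opener = gzip.open if str(log_path).endswith(".gz") else open
    with opener(log_path, "rt", errors="replace") as fh:
        for line in fh:
            lowered = line.lower()
            for stage in stages:
                if stage in lowered:
                    counts[stage] += 1
    return counts
```

A stage that dominates the counts is a reasonable place to start looking, but the actual timings still have to be read from the log's own timestamps.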
The log doesn't look 100% healthy, but I can see that the 1st round of "path fragments embedding" took ~18 days! I suppose the other 2 rounds took similar times too. That's surprising.
Hi @brettChapman, sorry for the extremely long wait. I worked on If you can work also with GitHub branches, it would be helpful if you could run the same
@brettChapman were you lucky enough to try the updated
Yes, I've used the latest version now and found smoothxg ran a lot faster. Recently we've gotten access to a larger cluster at a higher cost, with SSDs and 2 TB of RAM. We've found our PGGB jobs ran significantly faster, cutting months off the run time. The systems we previously had access to were limited to mechanical drives and limited RAM, but these were publicly funded resources.
Thanks for the update! Saving months seems to be hot enough for the environment and global warming xD
Hi
Continuing the discussion from waveygang/wfmash#171
PGGB is running slowly with 76 haplotypes, run per chromosome on assembled pseudomolecules, for genomes around 4-5 Gb in size.
-s 100Kbp
-p 93
-k 316
poa_params="asm20"
poa_length_target="700,900,1100"
transclose_batch=10000000
Remaining parameters default.