Use on large cohorts #176
Comments
Hello, I am experiencing a similar problem. Were you able to find a way to speed things up at the graph merging step? Thanks!
Hello, I was not. I run independent sets of samples in parallel, but in the end I have not been able to speed up the merge step itself.
I'm having a similar issue. I also had a similar issue with the SplAdder build function, and I'm curious to know how fast your build stage was. Mine took 3 days for 129 samples using 400 GB of RAM on an HPC; setting the --parallel parameter didn't seem to actually affect anything either.
Yeah, mine took about 2 days for 83 samples on an HPC system. I also don't think the --parallel parameter had any impact.
Dear all, thanks for the valuable feedback. One of the currently slowest parts of the pipeline is the progressive merging of individual single-sample graphs, which is hard to parallelize over genes. This has been on my to-do list for a while. As an intermediate solution, SplAdder offers an option for splitting the merge procedure into chunks and running them in parallel. Below, I have pasted the bit that I added to the docs. Best,
========================================================
Applying this to our 100 sample example
By default, the chunk size is 10, but it can be adapted.
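To make the chunking idea above concrete, here is a minimal Python sketch of how a cohort could be split into merge jobs level by level. It is only an illustration of the general scheme, not SplAdder's actual implementation: the function name `plan_chunked_merge` and the index-range representation are assumptions made for this example. All jobs within one level are independent of each other, which is what would allow them to run in parallel.

```python
def plan_chunked_merge(n_samples, chunk_size=10):
    """Plan a level-wise chunked merge (hypothetical helper, not SplAdder code).

    Returns one list per level; each entry is a half-open (start, end)
    range over the items available at that level. Level 1 ranges index
    single-sample graphs, later levels index the merged graphs produced
    by the previous level.
    """
    if n_samples < 1:
        raise ValueError("need at least one sample")
    if chunk_size < 2:
        raise ValueError("chunk_size must be at least 2 to make progress")

    levels = []
    n_items = n_samples
    while n_items > 1:
        # Cut the current items into consecutive chunks of at most chunk_size.
        chunks = [
            (start, min(start + chunk_size, n_items))
            for start in range(0, n_items, chunk_size)
        ]
        levels.append(chunks)
        # Each chunk yields one merged graph for the next level.
        n_items = len(chunks)
    return levels


if __name__ == "__main__":
    for level, chunks in enumerate(plan_chunked_merge(100, chunk_size=10), start=1):
        print(f"level {level}: {len(chunks)} independent merge job(s)")
```

With 100 samples and a chunk size of 10, this plan has ten independent jobs at the first level and a single final job at the second level. With 101 samples it needs three levels (11, 2 and 1 jobs).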
Hello Andre, thank you very much for this cool tool! Just to clear up my doubts about the necessity of graph merging in my particular case, could someone please ELI5 when we might be interested in merging multiple samples? What is the eventual point of that action? Is it to better analyze, for instance, whether a certain splicing event is more prevalent in one sample than in another? And why is this only recommended for large cohorts? If it confers an advantage for between-sample comparisons, it should probably be used even in the simplest setup of 2 samples:
I have a large number of samples (101) and I'm trying to run the build mode in several parts (as instructed in the SplAdder documentation, chapter 4).
I have finished the single graphs step by running each sample in parallel, and it was fast. Now I'm trying to run the merge graph part, and it has taken more than 5 days to process half of the samples. I have tried increasing the number of cores, but the job only uses one. I have also tried running the merge graph command in parallel and adding the --parallel parameter (used in the test mode), but neither worked.
In sum, I would like to know if there is any spladder parameter for build mode that would increase the speed of this step or parallelize it.
At first, I used this command for each single graph: spladder build -o spladder_out -a annotation.gtf -b file.bam --merge-strat single --no-extract-ase (and it worked without problems)
Right now, I'm using this command: spladder build -o spladder_out -a annotation.gtf -b bamfiles_path.txt --merge-strat merge_graphs --no-extract-ase
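For anyone reproducing this two-step setup, here is a small Python sketch that only assembles the two command lines quoted above, one per sample for the single-graph step and one for the merge step. The flags are copied from the commands in this post; the helper names and the BAM file names (`sample1.bam`, `sample2.bam`) are placeholders invented for the example, and nothing is executed here.

```python
import shlex


def single_graph_command(bam_path, annotation="annotation.gtf", out_dir="spladder_out"):
    """Command for one sample's single graph (flags as quoted in this post)."""
    return [
        "spladder", "build",
        "-o", out_dir,
        "-a", annotation,
        "-b", bam_path,
        "--merge-strat", "single",
        "--no-extract-ase",
    ]


def merge_graphs_command(bam_list_file, annotation="annotation.gtf", out_dir="spladder_out"):
    """Command for the merge step, given a text file listing all BAM paths."""
    return [
        "spladder", "build",
        "-o", out_dir,
        "-a", annotation,
        "-b", bam_list_file,
        "--merge-strat", "merge_graphs",
        "--no-extract-ase",
    ]


if __name__ == "__main__":
    # Placeholder BAM names; replace with the real paths of the cohort.
    bam_files = ["sample1.bam", "sample2.bam"]

    # One independent command per sample; these can be submitted as separate jobs.
    for bam in bam_files:
        print(shlex.join(single_graph_command(bam)))

    # The merge step is a single command over the whole list.
    print(shlex.join(merge_graphs_command("bamfiles_path.txt")))
```

The printed lines can be pasted into a scheduler submission script, or the argument lists can be passed to `subprocess.run` directly. Only the per-sample commands are independent of one another; the merge command depends on all of them having finished.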
Thank you very much,
André.