Metagenomics Mode (MetaFlye and Binning) #160
@LeeBergstrand For now, I'd suggest leaving genome binning out of the MVP. I think the pipeline should already be compatible with metagenomes up to the end of the circularization step (i.e., before annotation), although I'd need to double-check this to make sure. This level of metagenome compatibility might be enough for the MVP: users can use Rotary to assemble metagenomes with properly closed circular contigs and then handle genome binning themselves. Once the MVP is out, we could consider a meta-mode for Rotary as an extension. How does this sound?
P.S. The current config file already has a way to turn meta mode on or off for Flye, so that aspect is already addressed. Meta mode is sometimes helpful for genome assemblies too (e.g., if you're not sure whether the culture is pure; I wonder if it might also help with assembling differentially abundant plasmids).
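As a rough illustration of what that config toggle amounts to, here is a minimal sketch that builds a Flye command and appends Flye's `--meta` flag (metagenome / uneven-coverage mode) only when meta mode is on. The helper name, its parameters, and the default `--nano-raw` read type are assumptions for illustration, not Rotary's actual rule code.

```python
import shlex


def build_flye_command(reads, out_dir, threads=8, meta_mode=False, read_type="--nano-raw"):
    """Hypothetical helper: assemble a Flye command line from config-like values."""
    cmd = [
        "flye",
        read_type, reads,          # read-type flag is an assumption; Rotary may use another
        "--out-dir", out_dir,
        "--threads", str(threads),
    ]
    if meta_mode:
        # Flye's --meta mode is meant for uneven coverage (metagenomes, mixed cultures)
        cmd.append("--meta")
    return cmd


print(shlex.join(build_flye_command("sample1.fastq.gz", "sample1/assembly", meta_mode=True)))
# → flye --nano-raw sample1.fastq.gz --out-dir sample1/assembly --threads 8 --meta
```

Keeping the toggle as a single boolean means the same rule serves both isolate and metagenome runs.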
@jmtsuji This sounds good to me. For now, it's a low priority.
Here are some things to think about down the road:
@LeeBergstrand Good points. My guess is that existing genome binners (e.g., MetaBAT2) should work fine with Illumina, Nanopore, or hybrid data. In my understanding, MetaBAT2 uses only the contig coverage (obtained from BAM files) and the contig sequences themselves to guide genome binning. As long as read mapping is accurate and the contigs are error-free, I think genome binning from a mix of read types should be OK. It would be worthwhile to check this carefully later on, though.
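A basic MetaBAT2 run is two commands: `jgi_summarize_bam_contig_depths` turns sorted BAM files into a per-contig depth table, and `metabat2` bins the contigs using that table plus the assembly. The sketch below only builds those command lines; the function name and output paths are hypothetical. Passing one BAM per read type (e.g., Nanopore and Illumina mapped to the same contigs) gives MetaBAT2 one coverage column per BAM, which is how a hybrid setup would feed in. Whether that works well is exactly what still needs checking.

```python
def metabat2_commands(assembly, sorted_bams, out_prefix, depth_file="depth.txt"):
    """Hypothetical helper: return the two command lines for a basic MetaBAT2 run.

    sorted_bams: reads mapped back to `assembly`, one coordinate-sorted BAM per read set.
    """
    # Step 1: summarize per-contig depth from the BAMs (one depth column per BAM)
    summarize = ["jgi_summarize_bam_contig_depths", "--outputDepth", depth_file, *sorted_bams]
    # Step 2: bin contigs using sequence composition plus the depth table
    binning = ["metabat2", "-i", assembly, "-a", depth_file, "-o", out_prefix]
    return [summarize, binning]


for cmd in metabat2_commands("contigs.fasta", ["nanopore.sorted.bam", "illumina.sorted.bam"], "bins/bin"):
    print(" ".join(cmd))
```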
@jmtsuji This is becoming more of an issue for me. We are finding that more and more of the genomes we process are actually co-cultures, even though they were originally thought to be single strains.
@LeeBergstrand Thanks for picking up this thread again. Yeah, it sounds like adding some basic genome binning could be helpful even for "pure culture" genome work. We would probably just need some basic binning rules for Rotary: for example, map the reads to the assembled contigs (within the same sample), then run a single genome binner and split out the contigs. The annotation module could then be run on each bin separately. This might be fairly simple to implement. (Later on, we could consider adding more genome binners and aggregating their results to improve binning accuracy, but I'm not sure this would improve things much, given that the cultures should generally have a fairly simple microbial community.)

One potential issue we would need to address is how to handle binning of true isolates. The last time I tested binning tools carefully (a few years ago), they generally errored out if they could not produce at least 2 bins. We should check whether this is still the case. If so, we would need some strategy (e.g., based on CheckM2 scores) to determine whether the raw contigs likely represent a single isolate, and then skip binning if that is the case. Also, we could consider changing the default Flye mode to

@LeeBergstrand Any thoughts?
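The "skip binning for likely isolates, otherwise split the contigs by bin" idea could look roughly like this. It is a sketch under stated assumptions: the 90% completeness / 5% contamination cut-offs are placeholders (nothing in Rotary defines them), the CheckM2 scores are assumed to be computed on the whole unbinned assembly, and `contig_to_bin` stands in for whatever contig-to-bin table the chosen binner produces. All function names are hypothetical.

```python
from collections import defaultdict

# Placeholder cut-offs for "looks like one clean genome"; not defined anywhere in Rotary
MIN_COMPLETENESS = 90.0
MAX_CONTAMINATION = 5.0


def looks_like_isolate(completeness, contamination):
    """Decide from CheckM2 scores of the whole assembly whether binning can be skipped."""
    return completeness >= MIN_COMPLETENESS and contamination <= MAX_CONTAMINATION


def split_contigs(contigs, contig_to_bin, unbinned_label="unbinned"):
    """Group contigs {name: sequence} by bin; contigs without a bin go to unbinned_label."""
    bins = defaultdict(dict)
    for name, seq in contigs.items():
        bins[contig_to_bin.get(name, unbinned_label)][name] = seq
    return dict(bins)


def plan_annotation(sample, contigs, contig_to_bin, completeness, contamination):
    """Return {annotation unit: contigs}, i.e. what the annotation module would run on."""
    if looks_like_isolate(completeness, contamination):
        # Likely a true isolate: annotate the raw contigs as one genome, no binning
        return {sample: contigs}
    return {f"{sample}_{b}": seqs for b, seqs in split_contigs(contigs, contig_to_bin).items()}


contigs = {"c1": "ACGT", "c2": "GGCC", "c3": "TTAA"}
print(plan_annotation("S1", contigs, {"c1": "bin.1", "c2": "bin.2"}, 95.0, 60.0))
```

Checking the isolate case before calling the binner also sidesteps binners that fail when they cannot produce two or more bins.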
@jmtsuji Questions:
Where would be the optimal place to put binning?
Right now, a key issue is that Rotary does not yet understand the concept of sub-samples (bins). We use the following design pattern throughout Rotary:

```
rule annotation:
    input:
        summaries=expand("{sample}/{sample}_annotation_summary.zip", sample=SAMPLE_NAMES),
```

In this pattern, we frequently use the SAMPLE_NAMES variable. However, this will not work when there are bins, and fixing it will require significant refactoring. I suggest waiting until we refactor things into pipeline-independent modules before pursuing binning. That way, you can call the annotation module on either the bins or the single genomes.
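One way to picture the "sub-sample" problem: instead of one target per entry in SAMPLE_NAMES, the annotation targets would come from a sample-to-bins mapping. The sketch below is plain Python; the `{sample}/bins/{bin}/...` layout and the function name are hypothetical, and an empty bin list falls back to today's one-genome-per-sample path.

```python
def annotation_targets(sample_bins):
    """Hypothetical helper: annotation summary paths for samples that may contain bins.

    sample_bins: {sample name: [bin names]}; an empty list means "single genome",
    i.e. the current SAMPLE_NAMES behaviour.
    """
    targets = []
    for sample, bins in sample_bins.items():
        if not bins:
            targets.append(f"{sample}/{sample}_annotation_summary.zip")
        for bin_name in bins:
            targets.append(f"{sample}/bins/{bin_name}/{sample}_{bin_name}_annotation_summary.zip")
    return targets


print(annotation_targets({"S1": [], "S2": ["bin.1", "bin.2"]}))
```

In an actual Snakemake workflow, the bin names are only known after the binner has run, so the mapping would likely have to come from a Snakemake checkpoint rather than a static dict.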
Another option is that Rotary has a meta-mode, but we do things in two steps. You run Rotary in normal mode, and it gives you a list of genomes that CheckM flags as contaminated. Then you take these samples and manually do a second run with them in meta-mode. The meta-mode flag in the config would turn Flye's meta mode and binning on or off. It really depends on where binning happens: the later in the pipeline binning occurs, the easier it will be to add bin wildcards. There are also some modularization tools that might help here, too.
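The two-step workflow could be glued together with something like the sketch below: read a CheckM-style quality report, flag samples above a contamination threshold, and derive a config for the meta-mode rerun. The `Name`/`Contamination` column layout, the 10% threshold, and the `meta_mode`/`samples` config keys are all assumptions to be adjusted to the real CheckM output and Rotary config.

```python
import csv
import io

CONTAMINATION_THRESHOLD = 10.0  # hypothetical cut-off, in percent


def contaminated_samples(report_tsv, threshold=CONTAMINATION_THRESHOLD):
    """Return names of samples whose contamination exceeds threshold.

    report_tsv: text of a tab-separated report with 'Name' and 'Contamination'
    columns (assumed layout).
    """
    reader = csv.DictReader(io.StringIO(report_tsv), delimiter="\t")
    return [row["Name"] for row in reader if float(row["Contamination"]) > threshold]


def meta_mode_config(base_config, samples):
    """Copy the first run's config, switch meta mode on, and keep only flagged samples."""
    config = dict(base_config)        # shallow copy; the first-run config is left untouched
    config["meta_mode"] = True        # hypothetical key: would enable Flye --meta and binning
    config["samples"] = list(samples)
    return config


report = "Name\tCompleteness\tContamination\nS1\t99.1\t0.8\nS2\t97.5\t48.2\n"
print(meta_mode_config({"threads": 8, "meta_mode": False}, contaminated_samples(report)))
```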
We have previously discussed adding metagenomics compatibility by running Flye in meta mode and doing genome binning.