Metagenomics Mode (MetaFlye and Binning) #160

LeeBergstrand · 2024-04-19T22:47:30Z

We have previously discussed adding metagenomics compatibility by running Flye in meta mode and doing genome binning.

jmtsuji · 2024-04-22T02:37:08Z

@LeeBergstrand For now, I'd suggest leaving genome binning out of the MVP. I think the pipeline should already be compatible with metagenomes up until the end of the circularization step (i.e., before annotation), although I'd need to double-check this to make sure. This level of metagenome compatibility might be enough for the MVP: users can use rotary to assemble metagenomes with properly closed circular contigs and then handle genome binning themselves. Once the MVP is out, we could consider a meta mode for rotary as an extension. How does this sound?

jmtsuji · 2024-04-22T02:38:52Z

P.S. The current config file already has a way to turn meta mode on or off for Flye, so that aspect is already addressed. Meta mode is sometimes helpful for genome assemblies too (e.g., if you're not sure whether the culture is pure). I wonder if it might also help with assembling differentially abundant plasmids.
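As a minimal sketch of how a boolean meta-mode switch in a config could map onto Flye's real `--meta` option (metagenome / uneven-coverage mode): the config keys `flye_read_mode`, `threads`, and `flye_meta_mode` below are hypothetical placeholders, not necessarily the names used in rotary's config file. The Flye options themselves (`--nano-hq`, `--nano-raw`, `--out-dir`, `--threads`, `--meta`) are standard Flye CLI flags.

```python
from typing import Any


def build_flye_command(reads: str, out_dir: str, config: dict[str, Any]) -> list[str]:
    """Build a Flye command line, adding --meta only when the config asks for it.

    The config keys used here are hypothetical placeholders, not rotary's real keys.
    """
    read_mode = config.get("flye_read_mode", "nano-hq")  # e.g. "nano-raw" or "nano-hq"
    cmd = [
        "flye",
        f"--{read_mode}", reads,
        "--out-dir", out_dir,
        "--threads", str(config.get("threads", 1)),
    ]
    if config.get("flye_meta_mode", False):
        # metaFlye: Flye's metagenome / uneven-coverage assembly mode
        cmd.append("--meta")
    return cmd


if __name__ == "__main__":
    cmd = build_flye_command("reads.fastq.gz", "flye_out", {"flye_meta_mode": True, "threads": 8})
    print(" ".join(cmd))
```

Running this prints `flye --nano-hq reads.fastq.gz --out-dir flye_out --threads 8 --meta`. Keeping the flag as a single boolean means the same config switch could later gate other meta-mode steps as well.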

LeeBergstrand · 2024-04-22T19:12:36Z

@jmtsuji This sounds good to me. To me, it's a low priority at this time.

LeeBergstrand · 2024-04-22T19:14:50Z

Here are some things to think about down the road:

- Are there binners that work with Nanopore or hybrid data?
- Would we only bin using the Illumina data?

jmtsuji · 2024-04-23T02:34:47Z

@LeeBergstrand Good points. My guess is that existing genome binners (e.g., MetaBAT2) should work fine with Illumina, Nanopore, or hybrid data. As I understand it, MetaBAT2 just uses the coverage of each contig (obtained from BAM files) and the composition of the contig sequences themselves (tetranucleotide frequencies) to guide genome binning. As long as read mapping is accurate and the contigs are error-free, I think genome binning from a mix of read types should be OK. It would be worth checking this carefully later on, though.
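To make the two binning signals concrete, here is a simplified sketch that computes them in plain Python: normalized canonical tetranucleotide frequencies (TNF) from a contig's sequence, and mean per-contig depth from per-base depth rows (e.g., parsed from `samtools depth -a` output). This is only an illustration of the features, not MetaBAT2's implementation, which computes depth with its own `jgi_summarize_bam_contig_depths` tool and uses its own distance model. Because coverage here comes only from read-mapping output, nothing in it depends on which sequencing platform produced the reads.

```python
from collections import Counter, defaultdict
from collections.abc import Iterable
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


# The 136 canonical tetranucleotides (each k-mer merged with its reverse complement)
CANONICAL_TETRAMERS = sorted({canonical("".join(p)) for p in product("ACGT", repeat=4)})


def tetranucleotide_frequencies(seq: str) -> list[float]:
    """Normalized canonical tetranucleotide frequencies for one contig."""
    seq = seq.upper()
    counts = Counter(
        canonical(seq[i:i + 4])
        for i in range(len(seq) - 3)
        if set(seq[i:i + 4]) <= set("ACGT")  # skip windows containing N or other symbols
    )
    total = sum(counts.values()) or 1
    return [counts[k] / total for k in CANONICAL_TETRAMERS]


def mean_coverage(depth_rows: Iterable[tuple[str, int, int]],
                  contig_lengths: dict[str, int]) -> dict[str, float]:
    """Mean per-base depth for each contig from (contig, position, depth) rows.

    Positions missing from the rows are treated as depth 0.
    """
    totals: dict[str, int] = defaultdict(int)
    for contig, _pos, depth in depth_rows:
        totals[contig] += depth
    return {c: totals[c] / length for c, length in contig_lengths.items()}
```

A binner then groups contigs whose TNF profiles and coverage values are similar; contigs from a low-abundance contaminant should stand out mainly by coverage.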

LeeBergstrand · 2024-09-27T22:48:55Z

@jmtsuji This is becoming more and more of an issue for me. We are finding that more and more of the genomes we process are actually co-cultures, even though they were originally thought to be single strains.

jmtsuji · 2024-10-11T06:58:14Z

@LeeBergstrand Thanks for picking up this thread again. Yeah, it sounds like adding some basic genome binning could be helpful even for "pure culture" genome work.

We would probably just need some basic binning rules for rotary: for example, map the reads to the assembled contigs (within the same sample), run one genome binner, and split out the contigs by bin. The annotation module could then be run on each bin separately. This might be pretty simple to implement. (Later on, we could consider adding more genome binners and aggregating their results to improve binning accuracy, but I'm not sure this would improve things much, given that cultures should generally have a fairly simple microbial community.)
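The "split out the contigs" step could look roughly like the sketch below, assuming the binner's result has been reduced to a contig-to-bin mapping (MetaBAT2 itself writes one FASTA file per bin). The function names and the `unbinned` label for contigs the binner did not assign are hypothetical, chosen for illustration.

```python
def split_contigs_by_bin(
    contigs: dict[str, str],
    assignments: dict[str, str],
    unbinned_label: str = "unbinned",  # hypothetical label for unassigned contigs
) -> dict[str, dict[str, str]]:
    """Group contig sequences by bin ID; contigs with no assignment are collected together."""
    bins: dict[str, dict[str, str]] = {}
    for name, seq in contigs.items():
        bin_id = assignments.get(name, unbinned_label)
        bins.setdefault(bin_id, {})[name] = seq
    return bins


def to_fasta(records: dict[str, str], width: int = 80) -> str:
    """Render records as FASTA text, wrapping sequence lines at `width` characters."""
    lines: list[str] = []
    for name, seq in records.items():
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```

Each value returned by `split_contigs_by_bin` could then be written out with `to_fasta` and passed to the annotation module as if it were a single genome.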

One potential issue we would need to address is how to handle binning of true isolates. The last time I tested binning tools carefully (a few years ago), they generally errored out if they could not produce at least two bins. We should check whether this is still the case. If so, we would need some strategy (e.g., based on CheckM2 scores) to determine whether the raw contigs likely represent a single isolate, and then skip binning if they do.
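One possible shape for that check is sketched below: read a CheckM2-style quality report for the unbinned assembly and skip binning when it already looks like one clean genome. It assumes the report is tab-separated with `Name`, `Completeness`, and `Contamination` columns (as in CheckM2's `quality_report.tsv`). The 90% / 5% defaults are borrowed loosely from the MIMAG high-quality draft thresholds and are assumptions, not tested values for rotary.

```python
import csv
import io


def looks_like_single_isolate(completeness: float, contamination: float,
                              min_completeness: float = 90.0,
                              max_contamination: float = 5.0) -> bool:
    """True if the whole assembly already looks like one near-complete, clean genome."""
    return completeness >= min_completeness and contamination <= max_contamination


def samples_to_bin(quality_report_tsv: str, **thresholds: float) -> dict[str, bool]:
    """Map each assembly in a CheckM2-style report to True if binning should run on it."""
    reader = csv.DictReader(io.StringIO(quality_report_tsv), delimiter="\t")
    return {
        row["Name"]: not looks_like_single_isolate(
            float(row["Completeness"]), float(row["Contamination"]), **thresholds
        )
        for row in reader
    }
```

A co-culture of two complete genomes would typically show high completeness but also high contamination, so it would be routed to binning, while a clean isolate would skip it and avoid the binner's minimum-bin-count failure.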

Also, we could consider changing the default Flye mode to --meta in the config file. My guess is that this might make some assemblies of true isolates worse in a few edge cases, but if the input data quality is good, it would have limited impact on isolate assemblies. Based on a quick look at the methods in the metaFlye paper, I assume the way metaFlye identifies repeats in the assembly graph should still work for isolates, but it might be more error-prone than the algorithm used in the original Flye. I don't have any real evidence for this, though. I have seen some discussion on X where some folks say they prefer to use metaFlye by default. The alternative would be to predict whether a dataset is pure before assembly and then choose the Flye mode based on that, but this approach might be too complicated.

@LeeBergstrand Any thoughts?

LeeBergstrand · 2024-10-11T21:14:15Z

@jmtsuji Some questions:

- How would polishing affect binning? Do you want to bin before or after polishing?
- How would a mixed metagenome affect our circularization code? Would you like to bin before circularization?
- Where would be the optimal place in the pipeline to put binning?

LeeBergstrand · 2024-10-11T21:42:45Z

Right now, a vital issue is that Rotary needs to understand the concept of sub-samples (bins). We use the following design pattern throughout Rotary:

```python
rule annotation:
    input:
        summaries=expand("{sample}/{sample}_annotation_summary.zip", sample=SAMPLE_NAMES),
```

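This pattern relies on the SAMPLE_NAMES variable, which we use frequently. However, it will not work once there are bins, because each sample can produce a different set of bins, and that set is only known after binning has run.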
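A plain-Python illustration of the problem: Snakemake's default `expand()` takes the Cartesian product of wildcard values, so a `{sample}`×`{bin}` pattern would request bins that do not exist for some samples. The sketch below contrasts that product with a per-sample expansion. The `{sample}/bins/{bin}/...` path layout and the function names are hypothetical. In Snakemake itself, `expand(pattern, zip, sample=..., bin=...)` pairs values instead of taking the product, and because bin counts are only known at runtime, a checkpoint would likely be needed to collect them.

```python
from itertools import product


def expand_product(pattern: str, **wildcards: list[str]) -> list[str]:
    """Mimic Snakemake's default expand(): every combination of wildcard values."""
    keys = list(wildcards)
    return [
        pattern.format(**dict(zip(keys, combo)))
        for combo in product(*(wildcards[k] for k in keys))
    ]


def expand_bins(pattern: str, bins_per_sample: dict[str, list[str]]) -> list[str]:
    """One target per existing (sample, bin) pair; each sample has its own bin list."""
    return [
        pattern.format(sample=sample, bin=bin_id)
        for sample, bins in bins_per_sample.items()
        for bin_id in bins
    ]
```

With `{"S1": ["bin.1", "bin.2"], "S2": ["bin.1"]}`, `expand_bins` yields three targets, whereas the product over samples and bins yields four, including a nonexistent `S2` `bin.2` target.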

Fixing this will require significant refactoring.

I suggest waiting until we refactor things into pipeline-independent modules before pursuing binning. That way, you can call the annotation module on the bins or the single genomes.

LeeBergstrand · 2024-10-11T22:15:59Z

Another option is that Rotary has a meta-mode, but we do things in two steps. You run Rotary in normal mode, and we give you a list of genomes that CheckM flags as contaminated. You then take these samples and manually do a second run with them in meta-mode. The meta-mode flag in the config turns Flye's meta mode and binning on or off together.
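The two-step workflow could be sketched as follows: split samples by a CheckM-style contamination estimate (%) from the normal-mode run, then build the config for a second run in which one meta-mode flag drives both Flye's meta mode and binning. All function names and config keys (`samples`, `meta_mode`, `flye_meta`, `run_binning`) and the 5% default threshold are hypothetical, for illustration only.

```python
from typing import Any


def split_for_meta_rerun(contamination: dict[str, float],
                         max_contamination: float = 5.0) -> tuple[list[str], list[str]]:
    """Split samples into (keep, rerun) lists using contamination estimates in percent."""
    keep = sorted(s for s, c in contamination.items() if c <= max_contamination)
    rerun = sorted(s for s, c in contamination.items() if c > max_contamination)
    return keep, rerun


def meta_mode_settings(meta_mode: bool) -> dict[str, bool]:
    """One user-facing flag controls both Flye's meta mode and the binning step."""
    return {"flye_meta": meta_mode, "run_binning": meta_mode}


def second_run_config(rerun: list[str]) -> dict[str, Any]:
    """Config for the manual second run: only flagged samples, with meta-mode on."""
    return {"samples": rerun, "meta_mode": True, **meta_mode_settings(True)}
```

Keeping the derived settings behind a single flag means users only decide "meta-mode or not", while the pipeline decides which individual steps that implies.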

It really depends on where binning happens: the later in the pipeline binning occurs, the easier it will be to add bin wildcards. There are also some modularization tools that might help here.

LeeBergstrand mentioned this issue in Task list for rotary #15 (closed, 11 tasks) Apr 19, 2024

LeeBergstrand added the long_term label Apr 19, 2024

LeeBergstrand changed the title ~~Optional genome binning -- but how to do this? It might end up as a separate tool from rotary. Discussion might be needed.~~ Metagenomics Mode (MetaFly and Binning) Apr 19, 2024

LeeBergstrand changed the title ~~Metagenomics Mode (MetaFly and Binning)~~ Metagenomics Mode (MetaFlye and Binning) Apr 19, 2024

jmtsuji added the enhancement and question labels and removed the long_term label Oct 11, 2024
