Complex - BFB - ecDNA - Differences #12

Open
DSchreyer opened this issue Oct 12, 2022 · 1 comment

@DSchreyer

The AmpliconClassifier documentation states that only cyclic amplicons can be classified as BFB or ecDNA.
However, in my analysis, some amplicons were classified as Complex but were also flagged as BFB-positive. How can this be? Does this complex amplicon also contain BFB signatures?

[Screenshot: 2022-10-12, 11:49:50]

I also noticed that, with changes in the downsampling method, AmpliconArchitect/AmpliconClassifier sometimes identifies amplicons as BFB and sometimes as ecDNA. Our coverage is 15x, so the downsampled results should be fairly similar. I suspect that reads connecting different amplified regions are lost in the downsampled BAM, so those connections are not identified (see below; left: 15x, right: downsampled to 10x).

[Screenshot: 2022-10-12, 11:47:30; left: 15x, right: downsampled to 10x]
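The suspected loss of connecting reads can be made concrete with a rough sketch. It assumes each read supporting an SV is kept independently with probability equal to the downsampling fraction (10/15 for 15x to 10x), and that an SV is only detected with at least `min_support` supporting reads. Both the independence assumption and the `min_support` threshold are hypothetical simplifications, not AmpliconArchitect's actual detection rule.

```python
from math import comb


def prob_sv_retained(n_support: int, fraction: float, min_support: int) -> float:
    """Probability that at least `min_support` of `n_support` supporting reads
    survive downsampling, if each read is kept independently with probability `fraction`."""
    # Binomial upper tail: sum P(exactly k reads kept) for k >= min_support.
    return sum(
        comb(n_support, k) * fraction**k * (1 - fraction) ** (n_support - k)
        for k in range(min_support, n_support + 1)
    )


fraction = 10 / 15  # 15x downsampled to 10x
for n_support in (3, 5, 10, 20):
    # min_support=3 is a hypothetical detection threshold, chosen only for illustration
    p = prob_sv_retained(n_support, fraction, min_support=3)
    print(f"{n_support} supporting reads -> P(still detected) = {p:.3f}")
```

Under these assumptions, SVs backed by only a few reads, which is more likely in low-CN amplicons, are the ones most likely to drop below the threshold after downsampling, while well-supported SVs are much more likely to survive.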

On another note: is an amplicon immediately defined as BFB if it contains a BFB signature, even if it could also contain ecDNA? I noticed that every amplicon is identified as either BFB-positive or ecDNA-positive, but none are both, even when the downsampled and full-coverage results suggest it could be both.

@jluebeck
Member

Hi Daniel,

Thanks for reaching out with these excellent questions.

The documentation does note that AA amplicons receiving a "cyclic" classification may contain BFB or ecDNA; however, this does not mean that only "cyclic" amplicons can have BFB or ecDNA. As you have observed, some BFBs are present in AA amplicons classified as complex non-cyclic, and this is completely fine. I will update the documentation to make this distinction clear and avoid future misinterpretation.

Downsampling variability in AA is indeed on our radar. More generally, changes in the seed identification method (e.g. replacing CNVkit with Battenberg) or in the downsampling may disproportionately affect the classification of low copy-number amplicons (CN < 10, like the example you shared). For these amplicons the evidence in the underlying reads for the amplified SVs is already weaker, and the amplified content may not be well separated from the baseline chromosomal copy number. The random downsampling itself also introduces some variability, which we intend to make more deterministic. Generally, for cell lines with high CN the default downsampling is fine, but for very impure or possibly highly subclonal samples I recommend increasing the effective coverage to 40x. We are still refining best practices here, so I am not yet ready to add this to the documentation.
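One common way to make downsampling deterministic, shown here as a minimal sketch rather than AA's implementation, is to decide per read name using a seeded hash instead of a random draw. Repeated runs then keep exactly the same reads, and both mates of a pair (which share a name) are kept or dropped together. The function names and the SHA-256 choice are assumptions for illustration; the fraction to keep is simply the target effective coverage divided by the current coverage, capped at 1.

```python
import hashlib


def downsample_fraction(current_cov: float, target_cov: float) -> float:
    """Fraction of reads to keep; capped at 1.0 because downsampling cannot add coverage."""
    return min(1.0, target_cov / current_cov)


def keep_read(read_name: str, fraction: float, seed: int = 0) -> bool:
    """Deterministically decide whether to keep a read (hypothetical helper).

    The same (seed, read_name) always hashes to the same value, so repeated runs
    keep the same reads, and both mates of a pair get the same decision.
    """
    digest = hashlib.sha256(f"{seed}:{read_name}".encode()).digest()
    # Take the top 53 bits of the hash so the float in [0, 1) is exact.
    value = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return value < fraction
```

For example, `downsample_fraction(60, 40)` gives a keep fraction of about 0.67, while `downsample_fraction(15, 40)` returns 1.0, since a 15x sample cannot be brought up to 40x by downsampling. Changing `seed` gives a different but still reproducible subset.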

The logic for BFB/ecDNA separation goes as follows:

  1. Check the AA paths/cycles to determine which conform to the BFB mechanism.
  2. Determine whether an adequate fraction of the AA paths/cycles conform to the BFB mechanism and whether there is an adequate fraction of foldback SVs.
  3. If a BFB is present, mark those genomic regions as BFB.
  4. From the remaining cycles (all cycles, if there was no BFB), weigh the ecDNA-like cycles and determine whether ecDNA is present (including how many distinct, non-overlapping ecDNA were captured in the AA amplicon).
  5. If neither ecDNA nor BFB is found, mark the amplification "feature" as Complex non-cyclic/Linear/No amp.
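The five steps above can be sketched roughly as follows. This is not AmpliconClassifier's code: the `Path` structure, the three threshold values, and the use of summed cycle weight as the ecDNA score are all hypothetical. Step 4's count of distinct ecDNA is omitted, and step 5 collapses the choice among Complex non-cyclic, Linear, and No amp into one placeholder label. Cycles overlapping regions already marked as BFB are excluded from the ecDNA step, which is what lets a non-overlapping ecDNA and BFB both be called while an overlapping ecDNA is missed.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration only; not AmpliconClassifier's values.
MIN_BFB_PATH_FRACTION = 0.5
MIN_FOLDBACK_FRACTION = 0.25
MIN_ECDNA_WEIGHT = 0.5

Region = tuple[str, int, int]  # (chrom, start, end), half-open


@dataclass
class Path:
    regions: list[Region]
    weight: float    # hypothetical score, e.g. share of amplified CN explained
    is_cycle: bool
    bfb_like: bool   # step 1: does this path conform to the BFB mechanism?


def overlaps(a: Region, b: Region) -> bool:
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def classify(paths: list[Path], foldback_fraction: float) -> list[str]:
    calls: list[str] = []
    # Step 2: enough BFB-like paths and enough foldback SVs?
    bfb_paths = [p for p in paths if p.bfb_like]
    bfb_fraction = len(bfb_paths) / len(paths) if paths else 0.0
    bfb_regions: list[Region] = []
    if bfb_fraction >= MIN_BFB_PATH_FRACTION and foldback_fraction >= MIN_FOLDBACK_FRACTION:
        # Step 3: mark the BFB paths' regions as BFB.
        bfb_regions = [r for p in bfb_paths for r in p.regions]
        calls.append("BFB")
    # Step 4: only cycles that do not touch BFB regions count toward ecDNA.
    remaining = [
        p for p in paths
        if p.is_cycle and not any(overlaps(r, b) for r in p.regions for b in bfb_regions)
    ]
    if sum(p.weight for p in remaining) >= MIN_ECDNA_WEIGHT:
        calls.append("ecDNA")
    # Step 5: neither feature found.
    return calls or ["Complex non-cyclic/Linear/No amp"]


if __name__ == "__main__":
    bfb_a = Path([("chr1", 0, 100)], weight=0.3, is_cycle=False, bfb_like=True)
    bfb_b = Path([("chr1", 50, 150)], weight=0.2, is_cycle=True, bfb_like=True)
    ecdna = Path([("chr2", 0, 100)], weight=0.6, is_cycle=True, bfb_like=False)
    ecdna_overlap = Path([("chr1", 80, 120)], weight=0.6, is_cycle=True, bfb_like=False)
    print(classify([bfb_a, bfb_b, ecdna], foldback_fraction=0.4))  # ['BFB', 'ecDNA']
    print(classify([bfb_a, bfb_b, ecdna_overlap], foldback_fraction=0.4))  # ['BFB']
```

In the second call the ecDNA-like cycle sits inside the BFB regions on chr1, so only the BFB is reported.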

If the ecDNA and BFB are captured in the same AA amplicon and do not overlap in genomic coordinates, then both can be called (we have examples where this happens). Ideally, an AA amplicon captures only one focal amplification event, so both would not appear at once, but this is not always the case.

Alternatively, if an ecDNA and a BFB overlap genomically, and the AA amplicon still has enough relative path/cycle decompositions to call the BFB, then the amplicon is called a BFB and the ecDNA is missed. Distinguishing ecDNA from BFB is already a challenging task, and distinguishing an ecDNA derived from a BFB, or one overlapping a BFB, is even more challenging, especially with short reads. We also lack cytogenetically validated test examples of this phenomenon, which would allow us to improve that aspect of the tool.

AmpliconClassifier is currently unpublished as we are still developing it, but hopefully we will have a preprint that outlines the classification logic in the next few months.

Thanks,
Jens
