VIRUSBreakend VCF detail. #489
Apart from …, the FILTER, FORMAT, and INFO fields are the outputs from running GRIDSS on the viral genome. VIRUSBreakend identifies viral single breakends that go to non-viral sequence and then identifies the integration locations in the host genome. The INFO and FORMAT annotations are the single breakend annotations from the viral side.
They are just the default GRIDSS FILTERs. GRIDSS applies a higher QUAL threshold to single breakends than to breakpoints, hence the LOW_QUAL filter being applied to these integration sites despite their relatively high QUAL scores.
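If you want a cut-off that differs from the default FILTER, one option is to ignore the FILTER column and apply your own QUAL threshold. The sketch below uses only the Python standard library and assumes an uncompressed VCF. The example records are hypothetical (they only reuse the QUAL values from the question), and the 300.0 cut-off is an arbitrary illustration, not a GRIDSS default or recommendation.

```python
import io
from typing import Iterator, TextIO


def filter_by_qual(vcf: TextIO, min_qual: float) -> Iterator[str]:
    """Yield header lines unchanged and records whose QUAL is >= min_qual.

    The existing FILTER column (e.g. LOW_QUAL) is ignored entirely;
    QUAL is the sixth tab-separated column of a VCF data line.
    """
    for line in vcf:
        if line.startswith("#"):
            yield line  # meta-information (##) and column header (#CHROM)
            continue
        qual = line.rstrip("\n").split("\t")[5]
        if qual == ".":
            continue  # a missing QUAL cannot satisfy a threshold, so drop it
        if float(qual) >= min_qual:
            yield line


# Hypothetical single-breakend records reusing the QUAL values from the question.
EXAMPLE_VCF = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t1000\tbe1\tA\tA.\t684.91\tLOW_QUAL\t.\n"
    "chr2\t2000\tbe2\tC\tC.\t357.52\tLOW_QUAL\t.\n"
    "chr3\t3000\tbe3\tG\tG.\t276.53\tLOW_QUAL\t.\n"
)

if __name__ == "__main__":
    # 300.0 is an arbitrary illustrative cut-off, not a GRIDSS default.
    for line in filter_by_qual(io.StringIO(EXAMPLE_VCF), min_qual=300.0):
        print(line, end="")
```

Raising `min_qual` makes the selection stricter; given the observation below about the Hartwig cohort, it may be worth checking how many calls a stricter cut discards before relying on it.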
In the Hartwig cohort, I've found that if an integration site was detected at all (regardless of FILTER), then it's highly likely to be a true positive. Is this not the case for your data?
No, they correspond to the support for the single breakend from the viral genome's perspective. This means that RP and SR are always zero, and REF/REFPAIR are the viral reference-supporting read counts.
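To see these values for a given call, you can pair the FORMAT keys with one sample's values. A minimal sketch, assuming an uncompressed VCF line with FORMAT in column 9 and samples from column 10 onwards; the record, its FORMAT layout, and its counts are hypothetical. Per the answer above, in VIRUSBreakend output these describe viral-side support, so RP and SR would be zero and REF/REFPAIR are not human read counts.

```python
from typing import Dict

# GRIDSS support keys discussed in this thread.
SUPPORT_KEYS = ("VF", "RP", "SR", "REF", "REFPAIR")


def sample_fields(record: str, sample_index: int = 0) -> Dict[str, str]:
    """Map FORMAT keys (column 9) to one sample's values (column 10 onwards)."""
    cols = record.rstrip("\n").split("\t")
    keys = cols[8].split(":")
    values = cols[9 + sample_index].split(":")
    return dict(zip(keys, values))


def support_counts(record: str, sample_index: int = 0) -> Dict[str, int]:
    """Return the integer support counts present for one sample, skipping '.' values."""
    fields = sample_fields(record, sample_index)
    return {
        key: int(float(fields[key]))
        for key in SUPPORT_KEYS
        if key in fields and fields[key] != "."
    }


# Hypothetical record: the FORMAT layout and values are made up for illustration.
EXAMPLE_RECORD = (
    "chr1\t1000\tbe1\tA\tA.\t684.91\tLOW_QUAL\t.\t"
    "GT:VF:RP:SR:REF:REFPAIR\t.:12:0:0:30:25\n"
)

if __name__ == "__main__":
    print(support_counts(EXAMPLE_RECORD))
```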
What do you mean by "supporting viral read count" and "human read count"? To prevent double-counting of breakpoint-supporting read pairs in which one read also has a split-read alignment, GRIDSS uses a supporting fragment approach. For a breakpoint, you have the following supporting fragment counts:
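The double-counting problem can be illustrated with read names: both mates of a fragment share one name, so a fragment that contributes both a discordant pair and a split read appears in both evidence sets. This is a conceptual sketch of fragment-level counting, not GRIDSS's actual implementation, and the fragment names are hypothetical.

```python
from typing import Iterable


def naive_read_count(split_read_frags: Iterable[str], read_pair_frags: Iterable[str]) -> int:
    """Sum the two evidence types; a fragment with both kinds of evidence is counted twice."""
    return len(set(split_read_frags)) + len(set(read_pair_frags))


def supporting_fragment_count(split_read_frags: Iterable[str], read_pair_frags: Iterable[str]) -> int:
    """Count each fragment (identified by the read name shared by both mates) at most once."""
    return len(set(split_read_frags) | set(read_pair_frags))


if __name__ == "__main__":
    # Hypothetical evidence: frag2 has a discordant pair AND a split read.
    split_reads = ["frag1", "frag2"]
    read_pairs = ["frag2", "frag3"]
    print(naive_read_count(split_reads, read_pairs))  # 4
    print(supporting_fragment_count(split_reads, read_pairs))  # 3
```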
When the location of the viral integration is ambiguous (e.g. integration into a centromere), it's not really possible to determine the number of reads/fragments supporting the reference allele on the host side. The simplest way to get the human reference support is to run
It'll be much faster to only process the relevant subset of reads. The relevant subset is the union of the reads extracted by VIRUSBreakend (already in the virusbreakend working directory), and the reads surrounding the breakpoint. The latter can be extracted by
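The two parts of that subset can be sketched as a region string around the integration site plus a union of read names. A minimal sketch, assuming samtools-style `chrom:start-end` regions (1-based, inclusive); the 1000 bp flank is a hypothetical value that should at least cover the library's fragment size, and the positions and read names are placeholders. This does not reproduce the extraction command referred to above.

```python
from typing import Iterable, Set


def breakpoint_region(chrom: str, pos: int, flank: int = 1000) -> str:
    """Return a samtools-style 'chrom:start-end' region (1-based, inclusive) around pos.

    The start is clamped to 1; the end is not clamped to the contig length.
    """
    start = max(1, pos - flank)
    end = pos + flank
    return f"{chrom}:{start}-{end}"


def relevant_read_names(virusbreakend_reads: Iterable[str], region_reads: Iterable[str]) -> Set[str]:
    """Union of read names from both sources, so no read is processed twice."""
    return set(virusbreakend_reads) | set(region_reads)


if __name__ == "__main__":
    print(breakpoint_region("chr1", 500))  # chr1:1-1500
    print(breakpoint_region("chr3", 10000))  # chr3:9000-11000
    print(sorted(relevant_read_names(["r1", "r2"], ["r2", "r3"])))  # ['r1', 'r2', 'r3']
```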
Note that this only works for viral integrations where the integration site occurs in mappable sequence. Integration sites in unmappable sequence (such as centromeres, telomeres, and some repetitive or low-complexity sequences) will not be called by GRIDSS.
REF+REFPAIR works except for the specific edge case of a concordant read pair that has an internal split read:

>>>>      >>>>------<<<<<
primary   sup       primary
(primaries are concordant)
        ^
        |
breakpoint position
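Given host-side counts from such a GRIDSS run, the reference support and an approximate allele fraction can be computed as below. This is a sketch under two assumptions: the counts come from the host-genome breakpoint call (not the viral-side VIRUSBreakend annotations), and the allele fraction is approximated as VF / (VF + REF + REFPAIR). The counts used are hypothetical, and the edge case above means the result may be inaccurate for some fragments.

```python
import math


def human_ref_support(ref: int, refpair: int) -> int:
    """Reference support as REF + REFPAIR (reads plus read pairs).

    Per the thread, this may be inaccurate for a concordant read pair
    that contains an internal split read.
    """
    return ref + refpair


def allele_fraction(vf: int, ref: int, refpair: int) -> float:
    """Approximate VAF as VF / (VF + REF + REFPAIR); NaN if there is no support at all."""
    total = vf + human_ref_support(ref, refpair)
    return vf / total if total > 0 else math.nan


if __name__ == "__main__":
    # Hypothetical host-side counts.
    print(human_ref_support(20, 20))  # 40
    print(allele_fraction(10, 20, 20))  # 0.2
```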
Hi,
I am using VIRUSBreakend to detect viral integration in human WGS (30X) samples.
I used the default parameters in the virusbreakend.sh script.
I got the output file in VCF format and converted it to Excel; some example records are provided in the preview.
I have the following questions about the output file:
1: What are the FILTER criteria? The QUAL values are quite high (684.91, 357.52, 276.53), but these records are marked LOW_QUAL in the FILTER column. Is it possible to change the filtering criteria and make them stricter?
2: I need the supporting viral read count and human read count at the breakpoint. Do the RP, SR, REF, RF, and VF FORMAT (or INFO) values correspond to these breakpoint counts?