Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing alignments #31

Open
yevshin opened this issue Mar 15, 2016 · 0 comments
Open

Missing alignments #31

yevshin opened this issue Mar 15, 2016 · 0 comments

Comments

@yevshin
Copy link

yevshin commented Mar 15, 2016

When the read maps to several places equally well and one of these mappings span exon/intron junction only one alignment is reported (alignment spanning exon/intron junction). I will provide test case:
file genome.fa:

1
AAAAAAAAAAGGGGCCCCAAAAAAAAAAAAAAAAAAAAAAAAAAAAGGGGTTTTTTTTTCCCCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

file test.gtf:

1 test exon 1 50 . + . exon_id "E1"; exon_number "1"; gene_id "G1"; transcript_id "T1"; tss_id "TSS1";
1 test exon 60 100 . + . exon_id "E2"; exon_number "2"; gene_id "G1"; transcript_id "T1"; tss_id "TSS1";
1 test transcript 1 100 . + . gene_id "G1"; transcript_id "T1"; tss_id "TSS1";

file test.fq:

@r1
AAAAAAAAAAGGGGCCCCAAAAAAAAAA
+
IIIIIIIIIIIIIIIIIIIIIIIIIIII

commands to run:

bowtie2-build genome.fa genome
tophat2 --transcriptome-only -G test.gtf genome test.fq

this will produce file tophat_out/accepted_hits.bam containing single alignment:

samtools view tophat_out/accepted_hits.bam
r1 0 1 37 50 14M9N14M * 0 0 AAAAAAAAAAGGGGCCCCAAAAAAAAAA IIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:28 YT:Z:UU XS:A:+ NH:i:1

but there should be two alignments. Note also that alignment is reported as unique (score 50 and NH:i:1), and when the user will filter uniquely mapped reads it will get many reads spanning exon/intron junctions that are actually not unique.

I tracked the problem to tophat/src/align_status.cpp:138 where score for alignments spanning exon/intron junctions is increased by 2, subsequently all other alignments will be considered as secondary since they have less score (by 2). I don't understand the reason for re-weighting scores of these alignments, but commenting out this line solves the problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant