Error with exon end earlier than exon start #113

ESWRKJ · 2023-11-01T04:05:02Z

hello, RIK
when I use tama_collapse.py I found Error with exon end earlier than exon start
54827318 54827318
['m64041_201001_163935/150406967/ccs', 'm64041_201001_163935/99746185/ccs']
It is Pacbio Iso-seq data , please tell me what causes this?

GenomeRIK · 2023-11-02T00:28:25Z

Hello,

This typically happens when you get a lot of reads from a messy gene where you start having wobble bridging creating a collapse exon start which occurs before the collapse exon end. However, to be double sure it would help to see the SAM lines from those two reads. Could you post those?

Thank you,
Richard

ESWRKJ · 2023-11-02T02:19:04Z

hello, Richard
those are two reads information , Could you please assist me in identifying the issue?
150406967.txt
99746185.txt

GenomeRIK · 2023-11-02T02:39:09Z

What parameters are you using to run TAMA Collapse?

ESWRKJ · 2023-11-02T02:40:44Z

python /public/home/tama/tama_collapse.py -d merge_dup -x no_cap -b BAM -f /public/home/genome.fasta -s /public/home/Gb.bam -p Gb

GenomeRIK · 2023-11-02T02:46:56Z

This is quite strange. What mapper did you use and what parameters?

ESWRKJ · 2023-11-02T02:48:31Z

pbmm2 align --preset ISOSEQ --unmapped --sort -j 6 --log-level INFO --log-file pbmm2.log /public/home/genome.fasta --sample Gb ./Gb.flnc.bam Gb.bam

GenomeRIK · 2023-11-02T02:53:34Z

The problem seems to be coming from the alignment. You can see from the CIGAR string for read "m64041_201001_163935/99746185/ccs":

2194=2I1172=2767N183=1I132=1D2=510N326I850N254=1D518=202N914=

The issue is here:
Exon: 2194=2I1172=
Intron: 2767N
Exon: 183=1I132=1D2=
Intron: 510N
Exon: 326I <-- This exon is just one big insertion which makes no sense. So the mapper has a bug here for some reason. You may want to contact PacBio about this as this is their aligner version.
Intron: 850N
Exon: 254=1D518=
Intron: 202N
Exon: 914=

ESWRKJ · 2023-11-02T02:59:43Z

ok, So if I extract the reads, can the problem be solved?

GenomeRIK · 2023-11-02T03:01:31Z

In theory yes but if the mapper has a bug then there are likely issues with more mappings so I would not trust it until the bug is resolved. Until they know how this bug works then you cannot know if it is causing silent errors with other transcript models.

ESWRKJ · 2023-11-02T03:09:47Z

ok I hava a try. Thank you so much!

GenomeRIK · 2023-11-02T03:11:18Z

Just to clarify are you trying to contact Pacbio about the bug or trying to remove the read info from the file to make it run through TAMA Collapse?

ESWRKJ · 2023-11-02T03:13:50Z

All. two methods are being carried out simultaneously

GenomeRIK · 2023-11-02T03:39:22Z

Nice! Very efficient.

ESWRKJ · 2023-11-07T03:18:45Z

Hi @GenomeRIK
when I remove the two reads , the error is disappeared ; Pacbio suggests that I change the mapper to minimap2 ，Using minimap2 is normal. But I see tama_report.txt is difference.

Details

minimap2 results Total Gene Count: 59385 Total Transcript Count: 874323 Total Accepted Reads: 3944979 Total Discarded Reads: 4086923

Details

pbmm2 Total Gene Count: 59485 Total Transcript Count: 876767 Total Accepted Reads: 3965819 Total Discarded Reads: 63775

Discarded Reads Huge difference

GenomeRIK · 2023-11-07T23:59:29Z

The larger number of discarded reads from minimap is probably because you did not turn off secondary mappings when running minimap. You can check the read.txt file to see what is the reason for read discards.

ESWRKJ · 2023-11-08T08:17:14Z

yes minimap2 have some secondary mappings

In addition, The gene count and transcript count of the two are slightly different. Does this have any impact?

GenomeRIK · 2023-11-14T00:20:30Z

There is some impact but it may not make much of a difference to you depending on your objectives. Either way the Pacbio mapper is not working for you.

ESWRKJ · 2023-11-14T12:47:58Z

OK thank you for your assistance！

GenomeRIK closed this as completed Nov 17, 2023

ksahlin mentioned this issue Feb 16, 2024

TAMA collapse IndexError #126

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Error with exon end earlier than exon start #113

Error with exon end earlier than exon start #113

ESWRKJ commented Nov 1, 2023

GenomeRIK commented Nov 2, 2023

ESWRKJ commented Nov 2, 2023

GenomeRIK commented Nov 2, 2023

ESWRKJ commented Nov 2, 2023

GenomeRIK commented Nov 2, 2023

ESWRKJ commented Nov 2, 2023

GenomeRIK commented Nov 2, 2023

ESWRKJ commented Nov 2, 2023

GenomeRIK commented Nov 2, 2023

ESWRKJ commented Nov 2, 2023

GenomeRIK commented Nov 2, 2023

ESWRKJ commented Nov 2, 2023

GenomeRIK commented Nov 2, 2023

ESWRKJ commented Nov 7, 2023

GenomeRIK commented Nov 7, 2023

ESWRKJ commented Nov 8, 2023

GenomeRIK commented Nov 14, 2023

ESWRKJ commented Nov 14, 2023

Error with exon end earlier than exon start #113

Error with exon end earlier than exon start #113

Comments

ESWRKJ commented Nov 1, 2023

GenomeRIK commented Nov 2, 2023

ESWRKJ commented Nov 2, 2023

GenomeRIK commented Nov 2, 2023

ESWRKJ commented Nov 2, 2023

GenomeRIK commented Nov 2, 2023

ESWRKJ commented Nov 2, 2023

GenomeRIK commented Nov 2, 2023

ESWRKJ commented Nov 2, 2023

GenomeRIK commented Nov 2, 2023

ESWRKJ commented Nov 2, 2023

GenomeRIK commented Nov 2, 2023

ESWRKJ commented Nov 2, 2023

GenomeRIK commented Nov 2, 2023

ESWRKJ commented Nov 7, 2023

GenomeRIK commented Nov 7, 2023

ESWRKJ commented Nov 8, 2023

GenomeRIK commented Nov 14, 2023

ESWRKJ commented Nov 14, 2023