Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error with exon end earlier than exon start #113

Closed
ESWRKJ opened this issue Nov 1, 2023 · 18 comments
Closed

Error with exon end earlier than exon start #113

ESWRKJ opened this issue Nov 1, 2023 · 18 comments

Comments

@ESWRKJ
Copy link

ESWRKJ commented Nov 1, 2023

hello, RIK
when I use tama_collapse.py I found Error with exon end earlier than exon start
54827318 54827318
['m64041_201001_163935/150406967/ccs', 'm64041_201001_163935/99746185/ccs']
It is Pacbio Iso-seq data , please tell me what causes this?

@GenomeRIK
Copy link
Owner

Hello,

This typically happens when you get a lot of reads from a messy gene where you start having wobble bridging creating a collapse exon start which occurs before the collapse exon end. However, to be double sure it would help to see the SAM lines from those two reads. Could you post those?

Thank you,
Richard

@ESWRKJ
Copy link
Author

ESWRKJ commented Nov 2, 2023

hello, Richard
those are two reads information , Could you please assist me in identifying the issue?
150406967.txt
99746185.txt

@GenomeRIK
Copy link
Owner

What parameters are you using to run TAMA Collapse?

@ESWRKJ
Copy link
Author

ESWRKJ commented Nov 2, 2023

python /public/home/tama/tama_collapse.py -d merge_dup -x no_cap -b BAM -f /public/home/genome.fasta -s /public/home/Gb.bam -p Gb

@GenomeRIK
Copy link
Owner

This is quite strange. What mapper did you use and what parameters?

@ESWRKJ
Copy link
Author

ESWRKJ commented Nov 2, 2023

pbmm2 align --preset ISOSEQ --unmapped --sort -j 6 --log-level INFO --log-file pbmm2.log /public/home/genome.fasta --sample Gb ./Gb.flnc.bam Gb.bam

@GenomeRIK
Copy link
Owner

The problem seems to be coming from the alignment. You can see from the CIGAR string for read "m64041_201001_163935/99746185/ccs":

2194=2I1172=2767N183=1I132=1D2=510N326I850N254=1D518=202N914=

The issue is here:
Exon: 2194=2I1172=
Intron: 2767N
Exon: 183=1I132=1D2=
Intron: 510N
Exon: 326I <-- This exon is just one big insertion which makes no sense. So the mapper has a bug here for some reason. You may want to contact PacBio about this as this is their aligner version.
Intron: 850N
Exon: 254=1D518=
Intron: 202N
Exon: 914=

@ESWRKJ
Copy link
Author

ESWRKJ commented Nov 2, 2023

ok, So if I extract the reads, can the problem be solved?

@GenomeRIK
Copy link
Owner

In theory yes but if the mapper has a bug then there are likely issues with more mappings so I would not trust it until the bug is resolved. Until they know how this bug works then you cannot know if it is causing silent errors with other transcript models.

@ESWRKJ
Copy link
Author

ESWRKJ commented Nov 2, 2023

ok I hava a try. Thank you so much!

@GenomeRIK
Copy link
Owner

Just to clarify are you trying to contact Pacbio about the bug or trying to remove the read info from the file to make it run through TAMA Collapse?

@ESWRKJ
Copy link
Author

ESWRKJ commented Nov 2, 2023

All. two methods are being carried out simultaneously

@GenomeRIK
Copy link
Owner

Nice! Very efficient.

@ESWRKJ
Copy link
Author

ESWRKJ commented Nov 7, 2023

Hi @GenomeRIK
when I remove the two reads , the error is disappeared ; Pacbio suggests that I change the mapper to minimap2 ,Using minimap2 is normal. But I see tama_report.txt is difference.

Details

minimap2 results Total Gene Count: 59385 Total Transcript Count: 874323 Total Accepted Reads: 3944979 Total Discarded Reads: 4086923

Details

pbmm2 Total Gene Count: 59485 Total Transcript Count: 876767 Total Accepted Reads: 3965819 Total Discarded Reads: 63775

Discarded Reads Huge difference

@GenomeRIK
Copy link
Owner

The larger number of discarded reads from minimap is probably because you did not turn off secondary mappings when running minimap. You can check the read.txt file to see what is the reason for read discards.

@ESWRKJ
Copy link
Author

ESWRKJ commented Nov 8, 2023

yes minimap2 have some secondary mappings
image

In addition, The gene count and transcript count of the two are slightly different. Does this have any impact?

@GenomeRIK
Copy link
Owner

There is some impact but it may not make much of a difference to you depending on your objectives. Either way the Pacbio mapper is not working for you.

@ESWRKJ
Copy link
Author

ESWRKJ commented Nov 14, 2023

OK thank you for your assistance!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants