TAMA collapse IndexError #126

mainciburu · 2024-02-12T13:51:46Z

Hi,
I'm trying to run TAMA collapse on ONT aligned reads that have gone through the following pipeline:
-Adapter trimming and reorienting with Pychopper

PolyA trimming with TAMA
Alignment to genome with uLTRA

Now, I'm using this command to run TAMA collapse

python tama/tama_collapse.py -s reads.filtered.sorted.sam -f $ref_genome_dir/$ref_genome -x no_cap -a 100 -z 100 -m 10 -p tama/collapse/mysample

And I'm getting this error

sam count 1530000 Traceback (most recent call last): File "tama/tama_collapse.py", line 5190, in <module> [h_count,s_count,i_count,d_count,mis_count,nomatch_dict,sj_pre_error_list,sj_post_error_list] = calc_error_rate(start_pos,cigar,seq_list,scaff_name,read_id) File "tama/tama_collapse.py", line 847, in calc_error_rate next_cig_flag = cig_char_list[i + 1] IndexError: list index out of ran

I've guess it has something to do with the cigar string and errors around splice junction. I've also found the first read in my sam that causes the error (I think there's more than one). It's this one:

99:1914|a1ac6b88-3c74-4652-8b19-b682f6e0e40d 0 NC_027836.1 1435 60 2X3=1D1X1=1X4=4I3=3X3=3X166=1X8=1X11=3D104=1I84=2D56=1X111=2X113=1X74=1I1X12=2X6=1X8=1X26=1X2=1X17=1X6=1X88=1X15=1X58=1X137=1X38=1X11=1X5=1X14=1X7=1X10=1X31=1I3=1I32=44I1X2=1I2=1D1X2=1D2=180I1=1X2=1X3=1X2=1D1=1X43=1X2=1X3=1D1=1X3=1X45=1D112=1X7D2992N * 0 0 CGACTTGGCATTAGAATTAGGCTTTGGGGCGAAAATGACTTTATTCAACAAATCATAAAGATATTGGAACATTATATTTTATTTTTGGAATTTGAGCAGGGATAGTAGGTACTTCTTTAAGTTTATTAATTCGAGCTGAATTAGGGACTCCAGGATCTTTAATTGGAGATGATCAAATTTATAATACTATTGTAGCAGCTCATACTTTTATTATATTTTTTATAGTTATACCTATTATAATTGGAGGATTTGGAAATTGACTTGTACCTTTAATATTAGGAGCCCCTGATATAGCTTTCCCACGTATAAATAATATAAGTTTTTTGACTTTTACCCCCATCTTTAACTTTATTAATTTCTAGTAGCATTGTAGAAAATGGAGCAGGAACTGGATGAACAGTTTACCCCCCTCTCCTCTAATATTGCTCATGGTGGTAGTTCAGTAGATTTAGCTATTTTCCCACTTCATTTAGCTGGAATTTCATCTATTTTAGGAGCTATTAACTTTATTACTACTATTATTAATATACGATTAAATAATTTATCATTTGATCAAATACCTTTATTTATTTAGGCTGTAGGTATTACTGCATTCTTATTATTATTATCTTTACCTGTTTTAGCCGGAGCTATTACTATATTACTTACTGATCGAAATTTAAATACATCATTTTTCGATCCTGCAGGTGGAGGTGATCCTATTCTTTATCAACATTTATTTTGATTTTTTGGACATCCTGAAGTATATATTTTAATTTTACCAAGGATTTGGTATACATTCTCATATTATTTCCCAAGAAAGAGGTAAAAAGGAAACATTCGGGTGTTTAGGTATAATTTACGCTATACTAGCAATTGGTTTATTAGGATTTATTGTTTGAGCTCATCATATATTTACTGTAGGAATAGATATTGATACACGAGCATATTTTACATCAGCAACAATAATTATTACTGTACCAACAGGTATTAAAATTTTTAGTTGATTAGCTACTTTCCATGGAACTCAAATTAATTATTCCCCATCTATTTTATGAAGATTAGGATTTGTATTTTTATTTACTGTAGGAGGATTAACAGGTGTAATTTTATCTAATTCTTCTATTGATATTACTTTACATGATACTTACTATGTAGTTGCTCATTTCCATTATGTTTTATCAATAGGAGCTGTATTTGCTATTTTAGGGGGATTTATTCATTGATACCCATTATTTACTGGTTTATCTTCAAATCCTTATTTATTAAAAATTCAATTTTTTATTATATTTATCCGGAAGTAAATTTAACTTTCTTCCCACAACATTTTTGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGATGCATGCCTGGTGGTATTCTGATTATCCTGATTCTTATATTTCATGAAATATTATTTTATTATTGAATCATATATTTCATTATTAGCAATCATATTTATATTAATTATTATTTGAAATCTATAATTAATCAACGAATTTCTTTATTTACTTTAAATTTATCTTCTTCAATTGAATGATATCAAAATTTACCACCAGCTGAACATTCATATAATGAATTGCCTATTTTT * XA:Z: XC:Z:Insufficient_junction_coverage_unclassified NM:i:297

Could you help me out understand what's going on? How could I filter the sam file to get rid of problematic reads?
Thank you!

The text was updated successfully, but these errors were encountered:

ksahlin · 2024-02-12T14:12:39Z

Hi,

My bad. I am the developer of uLTRA. What you observe is an error in how uLTRA output the cigar (problematic cigar string as it ends with 112=1X7D2992N). It would take me some time to track down and fix this error. I do see that the read looks problematic.

In the meantime, perhaps it is possible to handle such errors gracefully in TAMA with a try except statemrnt and printing a warning if an error occurs in this part of the code?

Best,
Kristoffer

mainciburu · 2024-02-12T16:49:51Z

Thanks a lot for the fast reply! I've been trying to make sense of this error for a few days and I would have never found the answer. When you say that read I showed looks problematic, is it a formatting issue or does it look bad quality / bad alignment...?
In any case, I found and removed the reads with cigar string ending in N (it was just 3). I run TAMA again and the original error disappeared! Unfortunately, now I have a new error, probably not related to uLTRA anymore (?). I've seen it has been reported a couple of times here, but didn't get a clear idea of what to do about it in my case. It's the following:

Error with exon end earlier than exon start 738633 738633 ['105:518|689ca4ad-c28a-4bc4-aa8c-51cede2beb56', '99:514|04e01855-0675-4357-bac9-8cc36ad5497b', '133:524|25e08f18-7d09-4bb4-aab9-bfe7f4d66be7', '132:413|493618ca-d927-44c7-931e-c99a14ff3822', '105:523|9fbfd390-4964-454d-881d-3441cb994cfa', '119:412|dc5a1936-d9f9-4944-af15-ade882739d93', '121:325|54554b0c-84d6-4611-bdbf-c0fc12a887d3', '131:562|8a715b1c-eb42-4789-b79e-787b500e2b8c', '138:488|9b56f899-461f-4407-a590-8fa04b572ad4', '128:537|6655c133-3d0b-45d3-9e7d-3f1c09128914', '105:383|5c6fe978-7fba-46bb-8a79-b29c1f020d63', '108:395|79a13c40-b842-4f9f-b895-4a5fb21f61e0', '126:383|7cd9ba40-065a-4ee3-b6be-cbc70c24f81d', '139:519|20ff1d7f-5853-41bc-95ba-67141321180f', '126:415|09635947-bb94-4922-a3cf-acaff07eeac9', '130:541|45d993fe-091a-4a45-8944-adee3ab2e009', '136:549|9b0186ba-411a-4699-b259-552a1b257141', '98:311|250ea703-de96-40e3-9724-2084768bb525', '127:509|4ebcafbb-83af-4fdc-ba1b-b5523f43d734', '128:374|6026b742-a4d9-4d27-b33b-36956afd61fd', '129:379|8dea2747-d088-46f6-99a0-861dcd40c336', '129:384|2af9aaea-c4b1-477f-89e3-5b44629e65d1', '121:591|700b6c2b-320c-4a22-8c54-3327954e1be6', '126:507|de27e02f-378c-42c6-8174-4efa6d594b58', '104:411|f2bc7335-a24b-4a0d-9feb-c0835e17be5c', '136:545|e12af295-37af-4ef3-a6aa-2e895b22a41b', '129:417|68ad3d6c-e02a-4819-8f42-ca404a023041', '98:309|80ef9789-6713-4838-afd3-b33a35c91918', '136:574|f2dc187d-0b30-467b-8fbd-c0cf66bff3e5', '137:398|1c647ab7-c18a-4373-8d28-e8adb668983f', '131:532|e2e37bc5-1f95-4ecb-8ba8-b5d90f22e17e', '121:531|8bc49d2e-ac63-41c4-ab4f-77f9532d78d3']

So I still have a bunch of problematic reads... I attach the sam file with the reads mentioned in the error message here:
error_subset_sam.txt
I'll appreciate any hint on how to proceed. Thank you!!

ksahlin · 2024-02-16T14:10:15Z

When you say that read I showed looks problematic, is it a formatting issue or does it look bad quality / bad alignment...?

I was referring to the long homopolymer/STR. However, a mature aligner should handle these things seamlessly (diss to my own aligner). Although I now see that also other established aligners are struggling with the cigar output as reported in this repo.

I've seen it has been reported a couple of times here, but didn't get a clear idea of what to do about it in my case.

Not sure. Perhaps you can try to remove also those reads (as done here: #113). The 30-40 reads affected is only a tiny subset of your data I assume(?), so shouldn't change your results notably unless they are all from some gene(s) that you are interested in.

mainciburu · 2024-02-20T12:08:17Z

Thanks for the clarification! And yeh, I ended up removing those reads too.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

TAMA collapse IndexError #126

TAMA collapse IndexError #126

mainciburu commented Feb 12, 2024

ksahlin commented Feb 12, 2024

mainciburu commented Feb 12, 2024

ksahlin commented Feb 16, 2024 •

edited

Loading

mainciburu commented Feb 20, 2024

TAMA collapse IndexError #126

TAMA collapse IndexError #126

Comments

mainciburu commented Feb 12, 2024

ksahlin commented Feb 12, 2024

mainciburu commented Feb 12, 2024

ksahlin commented Feb 16, 2024 • edited Loading

mainciburu commented Feb 20, 2024

ksahlin commented Feb 16, 2024 •

edited

Loading