Error in version 0.1.2 #8

proteinosome · 2024-02-08T02:16:33Z

Hi @fenderglass and @aysegokce , this is Khi Pin. I've been trying to implement v0.1.2 in my WDL workflow (set to go public sometimes this month), but I've been running into errors where v0.1.1 had no issue:

[2024-02-08 09:17:40] INFO: Starting Severus 0.1.1
[2024-02-08 09:17:40] INFO: Parsing reads from COLO829.tumor.aligned.hiphase.bam
[2024-02-08 09:27:12] INFO: 	Total read length: 196360947730
[2024-02-08 09:27:12] INFO: 	Total aligned length: 196397433408 (1.00)
[2024-02-08 09:27:12] INFO: 	Read N50 / N90: 20063 / 12917
[2024-02-08 09:27:12] INFO: 	Alignments N50 / N90: 19557 / 11599
[2024-02-08 09:27:12] INFO: 	Read error rate (Q25 / Q50 / Q75): 0.0018 / 0.0035 / 0.0067
[2024-02-08 09:27:12] INFO: 	Read mismatch rate (Q25 / Q50 / Q75): 0.0004 / 0.0010 / 0.0018
[2024-02-08 09:27:14] INFO: Parsing reads from COLO829.normal.aligned.hiphase.bam
[2024-02-08 09:37:33] INFO: 	Total read length: 204693733243
[2024-02-08 09:37:33] INFO: 	Total aligned length: 204676920510 (1.00)
[2024-02-08 09:37:33] INFO: 	Read N50 / N90: 21131 / 15095
[2024-02-08 09:37:33] INFO: 	Alignments N50 / N90: 20621 / 13732
[2024-02-08 09:37:33] INFO: 	Read error rate (Q25 / Q50 / Q75): 0.0017 / 0.0033 / 0.0060
[2024-02-08 09:37:33] INFO: 	Read mismatch rate (Q25 / Q50 / Q75): 0.0004 / 0.0010 / 0.0017
[2024-02-08 09:37:35] INFO: Computing read quality
[2024-02-08 09:42:05] INFO: Annotating reads
[2024-02-08 09:47:06] INFO: Computing coverage histogram
[2024-02-08 09:50:17] INFO: 	Median coverage by PASS reads for COLO829.tumor.aligned.hiphase.bam (H1 / H2 / H0): 19.0 / 20.0 / 0.0
[2024-02-08 09:50:18] INFO: 	Median coverage by PASS reads for COLO829.normal.aligned.hiphase.bam (H1 / H2 / H0): 31.0 / 31.0 / 0.0
[2024-02-08 09:50:18] INFO: Resolving overlaps
[2024-02-08 09:50:18] INFO: Extracting split alignments
[2024-02-08 09:50:30] INFO: Extracting clipped reads
[2024-02-08 09:50:57] INFO: Clustering unmapped insertions
[2024-02-08 09:51:05] INFO: Starting breakpoint detection
[2024-02-08 09:51:39] INFO: Starting match_long_ins
[2024-02-08 09:51:39] INFO: Starting compute_bp_coverage
[2024-02-08 10:02:18] INFO: Filtering breakpoints
Traceback (most recent call last):
  File "/home/kpin/miniconda3/envs/severus/bin/severus", line 33, in <module>
    sys.exit(load_entry_point('severus==0.1.1', 'console_scripts', 'severus')())
  File "/home/kpin/miniconda3/envs/severus/lib/python3.10/site-packages/severus/main.py", line 205, in main
    double_breaks = call_breakpoints(segments_by_read, ref_lengths, coverage_histograms, bam_files, genome_ids, control_genomes, thread_pool, args)
  File "/home/kpin/miniconda3/envs/severus/lib/python3.10/site-packages/severus/breakpoint_finder.py", line 2036, in call_breakpoints
    double_breaks = double_breaks_filter(double_breaks, args.bp_min_support, cont_id)
  File "/home/kpin/miniconda3/envs/severus/lib/python3.10/site-packages/severus/breakpoint_finder.py", line 598, in double_breaks_filter
    dbs2.bp_1.spanning_reads[genome_id][ind_phased[0]] += db0.bp_1.spanning_reads[genome_id][0]
IndexError: list index out of range

real	45m39.039s
user	199m59.976s
sys	11m8.395s

Do you know why that is happening? The BAM files were aligned with pbmm2 and phased with hiphase using variants from clair3.

Thank you!

The text was updated successfully, but these errors were encountered:

aysegokce · 2024-02-09T21:24:59Z

Hello @proteinosome,

That part is updated in the new version. Can you share how you haplotag the bams?

Ayse

proteinosome · 2024-02-10T01:24:28Z

Hi @aysegokce , Clair3 was used to call germline variants on the tumor, then used as an input to HiPhase to haplotag the BAMs.

aysegokce · 2024-02-11T06:32:00Z

Hi @proteinosome,
There is a compatibility issue with HiPhase, primarily due to the difference in supplementary alignment phasing. We'll include it in the next releases.
Ayse

willhooper · 2024-02-14T21:48:04Z

Hi, I'm seeing a similar issue with v01.2. My CRAMs were aligned with minimap2, germline variants called with clair3, and phasing/haplotagging was done with longphase. v0.1.1 works fine:

[2024-02-14 20:55:10] INFO: Starting Severus 0.1.1
[2024-02-14 20:55:10] INFO: Parsing reads from COLO829-T.10X.haplotagged.cram
[2024-02-14 21:01:17] INFO: 	Total read length: 28095583258
[2024-02-14 21:01:17] INFO: 	Total aligned length: 27992370914 (1.00)
[2024-02-14 21:01:17] INFO: 	Read N50 / N90: 12194 / 6901
[2024-02-14 21:01:17] INFO: 	Alignments N50 / N90: 10909 / 5562
[2024-02-14 21:01:17] INFO: 	Read error rate (Q25 / Q50 / Q75): 0.0080 / 0.0121 / 0.0229
[2024-02-14 21:01:17] INFO: 	Read mismatch rate (Q25 / Q50 / Q75): 0.0027 / 0.0043 / 0.0080
[2024-02-14 21:01:17] INFO: Parsing reads from COLO829-N.5X.haplotagged.cram
[2024-02-14 21:04:10] INFO: 	Total read length: 8622146850
[2024-02-14 21:04:10] INFO: 	Total aligned length: 8576689196 (0.99)
[2024-02-14 21:04:10] INFO: 	Read N50 / N90: 10730 / 6269
[2024-02-14 21:04:10] INFO: 	Alignments N50 / N90: 9425 / 5052
[2024-02-14 21:04:10] INFO: 	Read error rate (Q25 / Q50 / Q75): 0.0079 / 0.0120 / 0.0227
[2024-02-14 21:04:10] INFO: 	Read mismatch rate (Q25 / Q50 / Q75): 0.0027 / 0.0043 / 0.0079
[2024-02-14 21:04:10] INFO: Computing read quality
[2024-02-14 21:06:29] INFO: Annotating reads
[2024-02-14 21:08:58] INFO: Computing coverage histogram
[2024-02-14 21:09:43] INFO: 	Median coverage by PASS reads for COLO829-T.10X.haplotagged.cram (H1 / H2 / H0): 2.0 / 1.0 / 1.0
[2024-02-14 21:09:45] INFO: 	Median coverage by PASS reads for COLO829-N.5X.haplotagged.cram (H1 / H2 / H0): 1.0 / 1.0 / 0.0
[2024-02-14 21:09:45] INFO: Resolving overlaps
[2024-02-14 21:09:45] INFO: Extracting split alignments
[2024-02-14 21:09:50] INFO: Extracting clipped reads
[2024-02-14 21:10:04] INFO: Clustering unmapped insertions
[2024-02-14 21:10:08] INFO: Starting breakpoint detection
[2024-02-14 21:12:04] INFO: Starting match_long_ins
[2024-02-14 21:12:04] INFO: Starting compute_bp_coverage
[2024-02-14 21:16:50] INFO: Filtering breakpoints
Traceback (most recent call last):
  File "//Severus/severus.py", line 30, in <module>
    main()
  File "//Severus/severus.py", line 26, in main
    sys.exit(main())
  File "/Severus/severus/main.py", line 205, in main
    double_breaks = call_breakpoints(segments_by_read, ref_lengths, coverage_histograms, bam_files, genome_ids, control_genomes, thread_pool, args)
  File "/Severus/severus/breakpoint_finder.py", line 2036, in call_breakpoints
    double_breaks = double_breaks_filter(double_breaks, args.bp_min_support, cont_id)
  File "/Severus/severus/breakpoint_finder.py", line 601, in double_breaks_filter
    to_keep.remove(ind0)
ValueError: list.remove(x): x not in list

linsindian · 2024-02-26T16:04:10Z

Hello,

I am Sin-Dian, a member of the Longphase team.Thank you for your contributions to somatic SV calling. Recently, while testing the COLO829 data using your software, we encountered a situation similar to the one described above. According to the error report, this issue seems to originate from lines 590 to 601 under the double_breaks_filter function. From my rough understanding of this segment, it primarily deals with a special case of unphased alignment on one side of a double break. Due to Longphase considering supplementary alignments during phasing, it's possible for the primary and supplementary alignment to appear on different haplotypes after haplotagging. This consequently increases the likelihood of entering this function.I've noticed a few potential errors as follows:

In the pairs stored in val, 'a' represents the phase tag, and 'b' represents the corresponding index in the to_keep list. It seems to aim to handle unphased break points, so should it be written as:

#original:
ind_unphased = [b for (a,b) in val if a]
ind_phased = [b for (a,b) in val if  not a]

#modified:
ind_unphased = [b for (a,b) in val if not a]
ind_phased = [b for (a,b) in val if  a]

In line 589, in certain situations, an "index out of range" error may occur. After comparing with the surrounding code, it seems that it should be modified to:

#original:
dbs2.bp_1.spanning_reads[genome_id][ind_phased[0]] += db0.bp_1.spanning_reads[genome_id][0]

#modified:
dbs2.bp_1.spanning_reads[genome_id][haplotype2[ind_phased[0]]] += db0.bp_1.spanning_reads[genome_id][0]

In line 601, a list.remove(x): x not in list error may occur. Since ind0 is a list and list.remove() cannot directly remove the list, and ind0 will only have one element, should it be written as to_keep.remove(ind0[0])?

#original:
ind0 = [i for i, db in enumerate(dbs) if db == db0]
to_keep.remove(ind0)

#modified:
ind0 = [i for i, db in enumerate(dbs) if db == db0]
to_keep.remove(ind0[0])

I think to_keep.remove(ind0[0]) and to_keep.remove(ind_unphase[0]) can achieve the same result. If changed to the latter, there's no need to generate ind0 separately.

Here's the code after summarizing the above suggestions:

ind_unphased = [b for (a,b) in val if not a]
ind_phased = [b for (a,b) in val if a]
db0 = dbs[ind_unphased[0]]
dbs2 = dbs[ind_phased[0]]
dbs2.supp_read_ids += db0.supp_read_ids
dbs2.supp = len(set(dbs2.supp_read_ids))
db0.supp = 0
span_bp1 = db0.bp_1.spanning_reads[genome_id][0]
dbs2.bp_1.spanning_reads[genome_id][haplotype2[ind_phased[0]]] += span_bp1 
db0.bp_1.spanning_reads[genome_id][0] = 0
ind0 = [i for i, db in enumerate(dbs) if db == db0]
to_keep.remove(ind_unphased[0])

Hope these suggestions can be helpful.

Best regards,
Sin-Dian

aysegokce · 2024-02-26T20:42:40Z

Hello @linsindian,
Thank you for using Severus and your suggestions! We are working on improving SV phasing and extending its use with other phasing tools. It will be a part of the next release!
Ayse

aysegokce added the enhancement New feature or request label Feb 11, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Error in version 0.1.2 #8

Error in version 0.1.2 #8

proteinosome commented Feb 8, 2024

aysegokce commented Feb 9, 2024

proteinosome commented Feb 10, 2024

aysegokce commented Feb 11, 2024

willhooper commented Feb 14, 2024

linsindian commented Feb 26, 2024 •

edited

Loading

aysegokce commented Feb 26, 2024

Error in version 0.1.2 #8

Error in version 0.1.2 #8

Comments

proteinosome commented Feb 8, 2024

aysegokce commented Feb 9, 2024

proteinosome commented Feb 10, 2024

aysegokce commented Feb 11, 2024

willhooper commented Feb 14, 2024

linsindian commented Feb 26, 2024 • edited Loading

aysegokce commented Feb 26, 2024

linsindian commented Feb 26, 2024 •

edited

Loading