Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error in version 0.1.2 #8

Open
proteinosome opened this issue Feb 8, 2024 · 6 comments
Open

Error in version 0.1.2 #8

proteinosome opened this issue Feb 8, 2024 · 6 comments
Labels
enhancement New feature or request

Comments

@proteinosome
Copy link

Hi @fenderglass and @aysegokce , this is Khi Pin. I've been trying to implement v0.1.2 in my WDL workflow (set to go public sometimes this month), but I've been running into errors where v0.1.1 had no issue:

[2024-02-08 09:17:40] INFO: Starting Severus 0.1.1
[2024-02-08 09:17:40] INFO: Parsing reads from COLO829.tumor.aligned.hiphase.bam
[2024-02-08 09:27:12] INFO: 	Total read length: 196360947730
[2024-02-08 09:27:12] INFO: 	Total aligned length: 196397433408 (1.00)
[2024-02-08 09:27:12] INFO: 	Read N50 / N90: 20063 / 12917
[2024-02-08 09:27:12] INFO: 	Alignments N50 / N90: 19557 / 11599
[2024-02-08 09:27:12] INFO: 	Read error rate (Q25 / Q50 / Q75): 0.0018 / 0.0035 / 0.0067
[2024-02-08 09:27:12] INFO: 	Read mismatch rate (Q25 / Q50 / Q75): 0.0004 / 0.0010 / 0.0018
[2024-02-08 09:27:14] INFO: Parsing reads from COLO829.normal.aligned.hiphase.bam
[2024-02-08 09:37:33] INFO: 	Total read length: 204693733243
[2024-02-08 09:37:33] INFO: 	Total aligned length: 204676920510 (1.00)
[2024-02-08 09:37:33] INFO: 	Read N50 / N90: 21131 / 15095
[2024-02-08 09:37:33] INFO: 	Alignments N50 / N90: 20621 / 13732
[2024-02-08 09:37:33] INFO: 	Read error rate (Q25 / Q50 / Q75): 0.0017 / 0.0033 / 0.0060
[2024-02-08 09:37:33] INFO: 	Read mismatch rate (Q25 / Q50 / Q75): 0.0004 / 0.0010 / 0.0017
[2024-02-08 09:37:35] INFO: Computing read quality
[2024-02-08 09:42:05] INFO: Annotating reads
[2024-02-08 09:47:06] INFO: Computing coverage histogram
[2024-02-08 09:50:17] INFO: 	Median coverage by PASS reads for COLO829.tumor.aligned.hiphase.bam (H1 / H2 / H0): 19.0 / 20.0 / 0.0
[2024-02-08 09:50:18] INFO: 	Median coverage by PASS reads for COLO829.normal.aligned.hiphase.bam (H1 / H2 / H0): 31.0 / 31.0 / 0.0
[2024-02-08 09:50:18] INFO: Resolving overlaps
[2024-02-08 09:50:18] INFO: Extracting split alignments
[2024-02-08 09:50:30] INFO: Extracting clipped reads
[2024-02-08 09:50:57] INFO: Clustering unmapped insertions
[2024-02-08 09:51:05] INFO: Starting breakpoint detection
[2024-02-08 09:51:39] INFO: Starting match_long_ins
[2024-02-08 09:51:39] INFO: Starting compute_bp_coverage
[2024-02-08 10:02:18] INFO: Filtering breakpoints
Traceback (most recent call last):
  File "/home/kpin/miniconda3/envs/severus/bin/severus", line 33, in <module>
    sys.exit(load_entry_point('severus==0.1.1', 'console_scripts', 'severus')())
  File "/home/kpin/miniconda3/envs/severus/lib/python3.10/site-packages/severus/main.py", line 205, in main
    double_breaks = call_breakpoints(segments_by_read, ref_lengths, coverage_histograms, bam_files, genome_ids, control_genomes, thread_pool, args)
  File "/home/kpin/miniconda3/envs/severus/lib/python3.10/site-packages/severus/breakpoint_finder.py", line 2036, in call_breakpoints
    double_breaks = double_breaks_filter(double_breaks, args.bp_min_support, cont_id)
  File "/home/kpin/miniconda3/envs/severus/lib/python3.10/site-packages/severus/breakpoint_finder.py", line 598, in double_breaks_filter
    dbs2.bp_1.spanning_reads[genome_id][ind_phased[0]] += db0.bp_1.spanning_reads[genome_id][0]
IndexError: list index out of range

real	45m39.039s
user	199m59.976s
sys	11m8.395s

Do you know why that is happening? The BAM files were aligned with pbmm2 and phased with hiphase using variants from clair3.

Thank you!

@aysegokce
Copy link
Contributor

Hello @proteinosome,

That part is updated in the new version. Can you share how you haplotag the bams?

Ayse

@proteinosome
Copy link
Author

Hi @aysegokce , Clair3 was used to call germline variants on the tumor, then used as an input to HiPhase to haplotag the BAMs.

@aysegokce
Copy link
Contributor

Hi @proteinosome,
There is a compatibility issue with HiPhase, primarily due to the difference in supplementary alignment phasing. We'll include it in the next releases.
Ayse

@aysegokce aysegokce added the enhancement New feature or request label Feb 11, 2024
@willhooper
Copy link

Hi, I'm seeing a similar issue with v01.2. My CRAMs were aligned with minimap2, germline variants called with clair3, and phasing/haplotagging was done with longphase. v0.1.1 works fine:

[2024-02-14 20:55:10] INFO: Starting Severus 0.1.1
[2024-02-14 20:55:10] INFO: Parsing reads from COLO829-T.10X.haplotagged.cram
[2024-02-14 21:01:17] INFO: 	Total read length: 28095583258
[2024-02-14 21:01:17] INFO: 	Total aligned length: 27992370914 (1.00)
[2024-02-14 21:01:17] INFO: 	Read N50 / N90: 12194 / 6901
[2024-02-14 21:01:17] INFO: 	Alignments N50 / N90: 10909 / 5562
[2024-02-14 21:01:17] INFO: 	Read error rate (Q25 / Q50 / Q75): 0.0080 / 0.0121 / 0.0229
[2024-02-14 21:01:17] INFO: 	Read mismatch rate (Q25 / Q50 / Q75): 0.0027 / 0.0043 / 0.0080
[2024-02-14 21:01:17] INFO: Parsing reads from COLO829-N.5X.haplotagged.cram
[2024-02-14 21:04:10] INFO: 	Total read length: 8622146850
[2024-02-14 21:04:10] INFO: 	Total aligned length: 8576689196 (0.99)
[2024-02-14 21:04:10] INFO: 	Read N50 / N90: 10730 / 6269
[2024-02-14 21:04:10] INFO: 	Alignments N50 / N90: 9425 / 5052
[2024-02-14 21:04:10] INFO: 	Read error rate (Q25 / Q50 / Q75): 0.0079 / 0.0120 / 0.0227
[2024-02-14 21:04:10] INFO: 	Read mismatch rate (Q25 / Q50 / Q75): 0.0027 / 0.0043 / 0.0079
[2024-02-14 21:04:10] INFO: Computing read quality
[2024-02-14 21:06:29] INFO: Annotating reads
[2024-02-14 21:08:58] INFO: Computing coverage histogram
[2024-02-14 21:09:43] INFO: 	Median coverage by PASS reads for COLO829-T.10X.haplotagged.cram (H1 / H2 / H0): 2.0 / 1.0 / 1.0
[2024-02-14 21:09:45] INFO: 	Median coverage by PASS reads for COLO829-N.5X.haplotagged.cram (H1 / H2 / H0): 1.0 / 1.0 / 0.0
[2024-02-14 21:09:45] INFO: Resolving overlaps
[2024-02-14 21:09:45] INFO: Extracting split alignments
[2024-02-14 21:09:50] INFO: Extracting clipped reads
[2024-02-14 21:10:04] INFO: Clustering unmapped insertions
[2024-02-14 21:10:08] INFO: Starting breakpoint detection
[2024-02-14 21:12:04] INFO: Starting match_long_ins
[2024-02-14 21:12:04] INFO: Starting compute_bp_coverage
[2024-02-14 21:16:50] INFO: Filtering breakpoints
Traceback (most recent call last):
  File "//Severus/severus.py", line 30, in <module>
    main()
  File "//Severus/severus.py", line 26, in main
    sys.exit(main())
  File "/Severus/severus/main.py", line 205, in main
    double_breaks = call_breakpoints(segments_by_read, ref_lengths, coverage_histograms, bam_files, genome_ids, control_genomes, thread_pool, args)
  File "/Severus/severus/breakpoint_finder.py", line 2036, in call_breakpoints
    double_breaks = double_breaks_filter(double_breaks, args.bp_min_support, cont_id)
  File "/Severus/severus/breakpoint_finder.py", line 601, in double_breaks_filter
    to_keep.remove(ind0)
ValueError: list.remove(x): x not in list

@linsindian
Copy link

linsindian commented Feb 26, 2024

Hello,

I am Sin-Dian, a member of the Longphase team.Thank you for your contributions to somatic SV calling. Recently, while testing the COLO829 data using your software, we encountered a situation similar to the one described above. According to the error report, this issue seems to originate from lines 590 to 601 under the double_breaks_filter function. From my rough understanding of this segment, it primarily deals with a special case of unphased alignment on one side of a double break. Due to Longphase considering supplementary alignments during phasing, it's possible for the primary and supplementary alignment to appear on different haplotypes after haplotagging. This consequently increases the likelihood of entering this function.I've noticed a few potential errors as follows:

  1. In the pairs stored in val, 'a' represents the phase tag, and 'b' represents the corresponding index in the to_keep list. It seems to aim to handle unphased break points, so should it be written as:
#original:
ind_unphased = [b for (a,b) in val if a]
ind_phased = [b for (a,b) in val if  not a]

#modified:
ind_unphased = [b for (a,b) in val if not a]
ind_phased = [b for (a,b) in val if  a]
  1. In line 589, in certain situations, an "index out of range" error may occur. After comparing with the surrounding code, it seems that it should be modified to:
#original:
dbs2.bp_1.spanning_reads[genome_id][ind_phased[0]] += db0.bp_1.spanning_reads[genome_id][0]

#modified:
dbs2.bp_1.spanning_reads[genome_id][haplotype2[ind_phased[0]]] += db0.bp_1.spanning_reads[genome_id][0]
  1. In line 601, a list.remove(x): x not in list error may occur. Since ind0 is a list and list.remove() cannot directly remove the list, and ind0 will only have one element, should it be written as to_keep.remove(ind0[0])?
#original:
ind0 = [i for i, db in enumerate(dbs) if db == db0]
to_keep.remove(ind0)

#modified:
ind0 = [i for i, db in enumerate(dbs) if db == db0]
to_keep.remove(ind0[0])

I think to_keep.remove(ind0[0]) and to_keep.remove(ind_unphase[0]) can achieve the same result. If changed to the latter, there's no need to generate ind0 separately.

Here's the code after summarizing the above suggestions:

ind_unphased = [b for (a,b) in val if not a]
ind_phased = [b for (a,b) in val if a]
db0 = dbs[ind_unphased[0]]
dbs2 = dbs[ind_phased[0]]
dbs2.supp_read_ids += db0.supp_read_ids
dbs2.supp = len(set(dbs2.supp_read_ids))
db0.supp = 0
span_bp1 = db0.bp_1.spanning_reads[genome_id][0]
dbs2.bp_1.spanning_reads[genome_id][haplotype2[ind_phased[0]]] += span_bp1 
db0.bp_1.spanning_reads[genome_id][0] = 0
ind0 = [i for i, db in enumerate(dbs) if db == db0]
to_keep.remove(ind_unphased[0])

Hope these suggestions can be helpful.

Best regards,
Sin-Dian

@aysegokce
Copy link
Contributor

Hello @linsindian,
Thank you for using Severus and your suggestions! We are working on improving SV phasing and extending its use with other phasing tools. It will be a part of the next release!
Ayse

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants