Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compatability of minimap2 #1

Closed
jamesc99 opened this issue Aug 22, 2024 · 5 comments
Closed

compatability of minimap2 #1

jamesc99 opened this issue Aug 22, 2024 · 5 comments

Comments

@jamesc99
Copy link

Hi there,

Thanks for developing this tool!

I was trying to run minimap2-aligned BAM files with Sawfish and encountered an error saying 'no rq tag'. I checked PacBio's doc and found it is the tag describing the read expected accuracy.

I am wondering if I can run a minimap2-aligned BAM file on Sawfish, like adding specific parameter, etc.

Thank you!
Ryan

Error log file:

[2024-08-21][16:53:50][sawfish][INFO] Starting sawfish
[2024-08-21][16:53:50][sawfish][INFO] cmdline: sawfish discover --threads 8 --bam ../../../test_data/hg002_pacbio_minimap2.bam --ref /users/u250191/ryan_scratch_ln/reference/human-grch38.fasta
[2024-08-21][16:53:50][sawfish][INFO] Running on 8 threads
[2024-08-21][16:53:51][sawfish][INFO] Writing discover settings to file: 'sawfish_discover_output/discover.settings.json'
[2024-08-21][16:53:51][sawfish][INFO] Reading reference genome from file '/users/u250191/ryan_scratch_ln/reference/human-grch38.fasta'
[2024-08-21][16:55:18][sawfish][INFO] Processing alignment file '../../../test_data/hg002_pacbio_minimap2.bam'
[2024-08-21][17:00:18][sawfish][INFO] Processed alignments on 1,852,004 of 3,088,286 ref genome kb (59%)
[2024-08-21][17:03:32][sawfish][INFO] Finished processing all alignments
[2024-08-21][17:03:32][sawfish][INFO] Writing max sv depth genome regions to bed file: 'sawfish_discover_output/max.depth.bed'
[2024-08-21][17:03:32][sawfish][INFO] Getting depth bin GC content
[2024-08-21][17:03:36][sawfish][INFO] Getting GC depth correction levels
[2024-08-21][17:03:36][sawfish][INFO] Completing breakpoint observation clustering
[2024-08-21][17:03:37][sawfish][INFO] Finished breakpoint observation clustering
[2024-08-21][17:03:37][sawfish][INFO] Consolidating adjacent indel candidates
[2024-08-21][17:03:37][sawfish][INFO] Finding large insertion candidates
[2024-08-21][17:03:37][sawfish][INFO] Writing breakpoint_cluster debug bed file: 'sawfish_discover_output/debug.breakpoint_clusters.bed'
[2024-08-21][17:03:37][sawfish][INFO] Refining SV Candidates
[2024-08-21][17:03:37][sawfish][INFO] Writing debug assembly region bed file: 'sawfish_discover_output/assembly.regions.bed'
[2024-08-21][17:03:37][sawfish][INFO] Starting refinement of breakpoint evidence clusters 
thread '<unnamed>' panicked at src/bam_utils.rs:412:5:
Missing rq tag in read m54329_180926_230856/7012789/ccs
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
/var/spool/slurm/d/job538394/slurm_script: line 34: 805518 Aborted                 (core dumped) sawfish discover --threads 8 --bam $input_bam --ref $ref38

codes for minimap2 mapping:
$minimap -Y -ax map-hifi -R "@RG\tSM:${INFOR}\tID:${RG_ID}" $ref38 $FASTQ_FILE --MD -t 8

@ctsa
Copy link
Member

ctsa commented Aug 22, 2024

Hi Thanks for reporting this issue.

I should first note that we strongly recommend pbmm2 for all sawfish analysis, as discussed in a bit more detail here:

https://github.com/PacificBiosciences/sawfish/blob/main/docs/user_guide.md#read-mapper

It may be possible to use minimap2 with the appropriate flags to prevent hard-clipping but this is untested/unsupported.

The good news for this specific issue with the rq tag is we've just released an update (0.12.2) yesterday to relax this requirement, so a version update should take care of this specific error.

@zengxi-hada
Copy link

zengxi-hada commented Aug 23, 2024

@zengxi-hada
Copy link

zengxi-hada commented Aug 25, 2024

I just runned version 0.12.2, but get new errors:
thread '' panicked at src/refine_sv/trimmed_reads.rs:197:5: assertion failed: !is_hard_clipped(&record.cigar())

Below is the full running log:

[2024-08-25][09:31:28][sawfish][INFO] Starting sawfish
[2024-08-25][09:31:28][sawfish][INFO] cmdline: /home/xiz978/bin/BCH/sawfish-v0.12.2-x86_64-unknown-linux-gnu/bin/sawfish discover --threads 16 --ref /n/data1/bch/genetics/lee/reference/hg38/Homo_sapiens_assembly38.fasta --bam /n/storage_test/xiz978/Wangziying/HG002/hifi/HG002_aligned_GRCh38_winnowmap.sorted.bam --output-dir HG002_discover_dir
[2024-08-25][09:31:28][sawfish][INFO] Running on 16 threads
[2024-08-25][09:31:29][sawfish][INFO] Writing discover settings to file: 'HG002_discover_dir/discover.settings.json'
[2024-08-25][09:31:29][sawfish][INFO] Reading reference genome from file '/n/data1/bch/genetics/lee/reference/hg38/Homo_sapiens_assembly38.fasta'
[2024-08-25][09:31:58][sawfish][INFO] Processing alignment file '/n/storage_test/xiz978/Wangziying/HG002/hifi/HG002_aligned_GRCh38_winnowmap.sorted.bam'
[2024-08-25][09:36:58][sawfish][INFO] Processed alignments on 1,181,630 of 3,099,922 ref genome kb (38%)
[2024-08-25][09:42:05][sawfish][INFO] Processed alignments on 2,595,202 of 3,099,922 ref genome kb (83%)
[2024-08-25][09:43:24][sawfish][INFO] Finished processing all alignments
[2024-08-25][09:43:24][sawfish][INFO] Writing max sv depth genome regions to bed file: 'HG002_discover_dir/max.depth.bed'
[2024-08-25][09:43:24][sawfish][INFO] Getting depth bin GC content
[2024-08-25][09:43:27][sawfish][INFO] Getting GC depth correction levels
[2024-08-25][09:43:27][sawfish][INFO] Completing breakpoint observation clustering
[2024-08-25][09:43:28][sawfish][INFO] Finished breakpoint observation clustering
[2024-08-25][09:43:28][sawfish][INFO] Consolidating adjacent indel candidates
[2024-08-25][09:43:28][sawfish][INFO] Finding large insertion candidates
[2024-08-25][09:43:28][sawfish][INFO] Writing breakpoint_cluster debug bed file: 'HG002_discover_dir/debug.breakpoint_clusters.bed'
[2024-08-25][09:43:28][sawfish][INFO] Refining SV Candidates
[2024-08-25][09:43:28][sawfish][INFO] Writing debug assembly region bed file: 'HG002_discover_dir/assembly.regions.bed'
[2024-08-25][09:43:28][sawfish][INFO] Starting refinement of breakpoint evidence clusters
thread '<unnamed>' panicked at src/refine_sv/trimmed_reads.rs:197:5:
assertion failed: !is_hard_clipped(&record.cigar())
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
/var/spool/slurmd/job45032830/slurm_script: line 9: 17052 Aborted                 /home/xiz978/bin/BCH/sawfish-v0.12.2-x86_64-unknown-linux-gnu/bin/sawfish discover --threads 16 --ref /n/data1/bch/genetics/lee/reference/hg38/Homo_sapiens_assembly38.fasta --bam /n/storage_test/xiz978/Wangziying/HG002/hifi/HG002_aligned_GRCh38_winnowmap.sorted.bam --output-dir HG002_discover_dir
[2024-08-25][09:43:41][sawfish][INFO] Starting sawfish
[2024-08-25][09:43:41][sawfish][INFO] cmdline: /home/xiz978/bin/BCH/sawfish-v0.12.2-x86_64-unknown-linux-gnu/bin/sawfish joint-call --threads 16 --sample HG002_discover_dir --output-dir HG002_joint_call_dir
[2024-08-25][09:43:41][sawfish][INFO] Running on 16 threads
[2024-08-25][09:43:41][sawfish][INFO] Reading reference genome from file '/n/data1/bch/genetics/lee/reference/hg38/Homo_sapiens_assembly38.fasta'
[2024-08-25][09:44:10][sawfish][INFO] Reading all sample discovery input
thread '<unnamed>' panicked at /var/lib/bamboo/.cargo/registry/src/index.crates.io-6f17d22bba15001f/unwrap-1.2.1/src/lib.rs:55:25:

@ctsa
Copy link
Member

ctsa commented Aug 26, 2024

Thanks @zengxi-hada,

I will take a note to make a more clear error message. The error indicates that your input bam uses hard-clipping for split reads, which sawfish doesn't support at this time. As enumerated above, we support and test the SV caller for pbmm2 only. It may be possible to get the analysis working with minimap, but at minimum you would need to change the settings to prevent hard-clipping of split reads, and even for that case minimap is not tested so there may be additional complications, I would recommend using pbmm2, some discussion on this in the user guide can be found here:

https://github.com/PacificBiosciences/sawfish/blob/main/docs/user_guide.md#read-mapper

@ctsa
Copy link
Member

ctsa commented Aug 26, 2024

Closing the original issues as answered, tracking msg improvement in #2.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants