Tutorial
Requirements:
- CPU: 8 vCPUs
- Memory: 40 GB RAM
- Free disk space: 500 GB
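Before starting, it can help to compare the machine against these numbers. The sketch below is a convenience check, not part of nanomonsv: it assumes a Linux host (it reads /proc/meminfo) with GNU coreutils (nproc, df), and check_resources is a hypothetical helper name. It treats GB as GiB and rounds down, so a borderline machine may be reported slightly low.

```shell
# Hypothetical helper: compare CPUs, RAM and free disk space with minimum values.
# Arguments: min CPUs, min RAM (GiB), min free disk (GiB), directory to check.
# Assumes Linux (/proc/meminfo) and GNU coreutils (nproc, df).
check_resources() {
  local min_cpus=${1:-8} min_mem_gb=${2:-40} min_disk_gb=${3:-500} dir=${4:-$PWD}
  local cpus mem_gb disk_gb ok=0

  cpus=$(nproc)
  # MemTotal is reported in kB; convert to GiB, rounding down.
  mem_gb=$(awk '/^MemTotal:/ {print int($2 / 1024 / 1024)}' /proc/meminfo)
  # With -Pk, df prints available space in 1024-byte blocks in column 4 of line 2.
  disk_gb=$(df -Pk "$dir" | awk 'NR == 2 {print int($4 / 1024 / 1024)}')

  if (( cpus < min_cpus )); then echo "WARN: $cpus CPUs < $min_cpus" >&2; ok=1; fi
  if (( mem_gb < min_mem_gb )); then echo "WARN: ${mem_gb} GiB RAM < ${min_mem_gb} GiB" >&2; ok=1; fi
  if (( disk_gb < min_disk_gb )); then echo "WARN: ${disk_gb} GiB free in $dir < ${min_disk_gb} GiB" >&2; ok=1; fi
  return $ok
}
```

For example, `check_resources 8 40 500 "$PWD"` prints a warning for each requirement that is not met and returns non-zero if any of them fails.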
Use the following commands to pull the required Singularity images.
mkdir -p $PWD/image
singularity pull $PWD/image/sra-tools_3.0.0.sif docker://ncbi/sra-tools:3.0.0
singularity pull $PWD/image/minimap2_2.17.sif docker://aokad/minimap2:2.17
singularity pull $PWD/image/nanomonsv_v0.5.0.sif docker://friend1ws/nanomonsv:v0.5.0
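To confirm the pulls succeeded before moving on, a small check can list any of the three image files that are missing or empty. This is a sketch: missing_images is a hypothetical helper, and the file names are simply the ones used in the pull commands above.

```shell
# Hypothetical helper: print the name of each expected image that is missing
# or zero-sized in the given directory; return non-zero if any is missing.
missing_images() {
  local dir=${1:-$PWD/image} img missing=0
  for img in sra-tools_3.0.0.sif minimap2_2.17.sif nanomonsv_v0.5.0.sif; do
    # -s is true only for a file that exists and is non-empty
    if [[ ! -s "$dir/$img" ]]; then
      echo "$img"
      missing=1
    fi
  done
  return $missing
}
```

Running `missing_images $PWD/image` prints nothing and succeeds once all three images are in place.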
Use the following commands to download the GRCh38 reference genome (FASTA and its index) from the Broad hg38 resource bundle.
mkdir -p $PWD/reference
wget https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.fasta \
-O $PWD/reference/Homo_sapiens_assembly38.fasta
wget https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.fasta.fai \
-O $PWD/reference/Homo_sapiens_assembly38.fasta.fai
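The Broad hg38 FASTA names its primary chromosomes with a "chr" prefix (chr1 to chr22, chrX, chrY), and the first column of a .fai index holds the sequence names. A quick sketch, using the hypothetical helper check_fai_chroms, to confirm the downloaded index contains those names:

```shell
# Hypothetical helper: check that a .fai index lists chr1..chr22, chrX and chrY.
# Column 1 of a .fai file is the sequence name.
check_fai_chroms() {
  local fai=$1 c missing=0
  for c in $(seq 1 22) X Y; do
    # awk exits 0 as soon as the exact name is found in column 1, else 1
    if ! awk -F'\t' -v name="chr$c" '$1 == name {found = 1; exit} END {exit !found}' "$fai"; then
      echo "missing: chr$c" >&2
      missing=1
    fi
  done
  return $missing
}
```

Usage: `check_fai_chroms $PWD/reference/Homo_sapiens_assembly38.fasta.fai`.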
The Oxford Nanopore sequencing data used in the bioRxiv paper are available from the public sequence repository service under BioProject ID PRJDB10898.
The SRA Toolkit is a set of tools for working with data deposited in the Sequence Read Archive (SRA).
Use the following commands to download the sequence data.
COLO829 (14.5 hours)
mkdir -p $PWD/fastq/COLO829
singularity exec $PWD/image/sra-tools_3.0.0.sif fasterq-dump -e 8 -O $PWD/fastq/COLO829 DRR258589
COLO829BL (13 hours)
mkdir $PWD/fastq/COLO829BL
singularity exec $PWD/image/sra-tools_3.0.0.sif fasterq-dump -e 8 -O $PWD/fastq/COLO829BL DRR258590
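Before spending hours on alignment, a simple sanity check on each downloaded FASTQ can count records and verify the basic layout: header starting with @, separator starting with +, and sequence and quality of equal length. This sketch assumes unwrapped 4-line records; fastq_read_count is a hypothetical helper, and it reads the whole file, which takes a while at this data size.

```shell
# Hypothetical helper: print the number of reads in a FASTQ file, or
# report "malformed FASTQ" and return 1 if the 4-line layout is broken.
fastq_read_count() {
  awk '
    NR % 4 == 1 && substr($0, 1, 1) != "@" { bad = 1 }  # header line must start with @
    NR % 4 == 2 { seqlen = length($0) }                  # remember sequence length
    NR % 4 == 3 && substr($0, 1, 1) != "+" { bad = 1 }  # separator line must start with +
    NR % 4 == 0 && length($0) != seqlen { bad = 1 }     # quality length must match sequence
    END {
      if (bad || NR % 4 != 0) { print "malformed FASTQ" > "/dev/stderr"; exit 1 }
      print NR / 4
    }' "$1"
}
```

Usage: `fastq_read_count $PWD/fastq/COLO829/DRR258589.fastq`.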
Next, align the reads with minimap2, then sort and index the alignments with samtools.
COLO829 (6 hours)
mkdir -p $PWD/bam/COLO829/
singularity exec $PWD/image/minimap2_2.17.sif sh -c \
"minimap2 -ax map-ont -t 8 -p 0.1 $PWD/reference/Homo_sapiens_assembly38.fasta $PWD/fastq/COLO829/DRR258589.fastq \
| samtools view -Shb > $PWD/bam/COLO829/COLO829.unsorted.bam && \
samtools sort -@ 8 -m 2G $PWD/bam/COLO829/COLO829.unsorted.bam -o $PWD/bam/COLO829/COLO829.bam && \
samtools index $PWD/bam/COLO829/COLO829.bam"
COLO829BL (5 hours)
mkdir -p $PWD/bam/COLO829BL/
singularity exec $PWD/image/minimap2_2.17.sif sh -c \
"minimap2 -ax map-ont -t 8 -p 0.1 $PWD/reference/Homo_sapiens_assembly38.fasta $PWD/fastq/COLO829BL/DRR258590.fastq \
| samtools view -Shb > $PWD/bam/COLO829BL/COLO829BL.unsorted.bam && \
samtools sort -@ 8 -m 2G $PWD/bam/COLO829BL/COLO829BL.unsorted.bam -o $PWD/bam/COLO829BL/COLO829BL.bam && \
samtools index $PWD/bam/COLO829BL/COLO829BL.bam"
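The sort step uses `-@ 8 -m 2G`. samtools documents `-m` as approximately the maximum memory per thread, so the sort needs roughly 8 × 2 GB = 16 GB, within the 40 GB listed above. The helpers below (hypothetical names) only make that arithmetic explicit so `-m` can be adjusted for other machines; the result is an estimate, not a measured peak.

```shell
# Hypothetical helper: approximate memory (GB) for "samtools sort -@ THREADS -m <GB>G",
# taking -m as a per-thread limit and ignoring small fixed overheads.
sort_mem_gb() {
  local threads=$1 per_thread_gb=$2
  echo $(( threads * per_thread_gb ))
}

# Hypothetical helper: largest whole-GB -m value that keeps the estimate
# within a memory budget (integer division rounds down).
max_per_thread_gb() {
  local threads=$1 budget_gb=$2
  echo $(( budget_gb / threads ))
}
```

For example, on a 16-thread machine with 24 GB set aside for sorting, `max_per_thread_gb 16 24` gives 1, i.e. `-m 1G`.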
Remove the intermediate files (FASTQ files and unsorted BAMs) to free disk space.
rm -r $PWD/fastq/
rm $PWD/bam/COLO829/COLO829.unsorted.bam
rm $PWD/bam/COLO829BL/COLO829BL.unsorted.bam
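The rm commands above assume the sort and index steps finished. A slightly safer sketch removes an unsorted BAM only when the sorted BAM and its index (`samtools index` writes `<file>.bam.bai` by default) are present and non-empty. safe_remove_unsorted is a hypothetical helper that follows the directory layout used above.

```shell
# Hypothetical helper: delete DIR/SAMPLE.unsorted.bam only if DIR/SAMPLE.bam
# and DIR/SAMPLE.bam.bai both exist and are non-empty.
safe_remove_unsorted() {
  local dir=$1 sample=$2
  local sorted="$dir/$sample.bam" unsorted="$dir/$sample.unsorted.bam"
  if [[ -s "$sorted" && -s "$sorted.bai" ]]; then
    rm -f -- "$unsorted"
  else
    echo "keeping $unsorted: $sorted or its index is missing" >&2
    return 1
  fi
}
```

Usage: `safe_remove_unsorted $PWD/bam/COLO829 COLO829`.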
We prepared a control panel built from 30 Nanopore sequencing datasets from the Human Pangenome Reference Consortium (HPRC). Download it with the following commands:
mkdir -p $PWD/control_panel
wget https://zenodo.org/api/files/08b52270-9f9b-47bd-b03d-81f5859d676f/hprc_year1_data_freeze_nanopore_guppy4_minimap2_2_24_merge_control_GRCh38.tar.gz -O $PWD/control_panel/hprc_year1_data_freeze_nanopore_guppy4_minimap2_2_24_merge_control_GRCh38.tar.gz
tar -xvf $PWD/control_panel/hprc_year1_data_freeze_nanopore_guppy4_minimap2_2_24_merge_control_GRCh38.tar.gz -C $PWD/control_panel/
This control panel was made by aligning 30 Nanopore sequencing datasets to the GRCh38 reference genome (obtained from here) with minimap2 version 2.24. If you use these control panels in a publication, please remember to credit the HPRC!
This step parses all the supporting reads of putative somatic SVs.
COLO829 (1 hour)
singularity exec $PWD/image/nanomonsv_v0.5.0.sif \
nanomonsv parse \
$PWD/bam/COLO829/COLO829.bam \
$PWD/output/COLO829/COLO829
COLO829BL (1 hour)
singularity exec $PWD/image/nanomonsv_v0.5.0.sif \
nanomonsv parse \
$PWD/bam/COLO829BL/COLO829BL.bam \
$PWD/output/COLO829BL/COLO829BL
After successful completion, you will find supporting reads stratified by deletions, insertions, and rearrangements:
$PWD/output/
|- COLO829/
| |- COLO829.deletion.sorted.bed.gz
| |- COLO829.insertion.sorted.bed.gz
| |- COLO829.rearrangement.sorted.bedpe.gz
| |- COLO829.bp_info.sorted.bed.gz
| |- COLO829.bp_info.sorted.bed.gz.tbi
|
|- COLO829BL/
  |- COLO829BL.deletion.sorted.bed.gz
  |- COLO829BL.insertion.sorted.bed.gz
  |- COLO829BL.rearrangement.sorted.bedpe.gz
  |- COLO829BL.bp_info.sorted.bed.gz
  |- COLO829BL.bp_info.sorted.bed.gz.tbi
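To verify that `nanomonsv parse` finished for a sample, one can check that every file in the tree exists for its output prefix. A sketch with the hypothetical helper check_parse_outputs; the suffixes are copied from the COLO829 entries of the tree.

```shell
# Hypothetical helper: check that all five parse outputs exist for a prefix.
check_parse_outputs() {
  local prefix=$1 suffix missing=0
  for suffix in deletion.sorted.bed.gz insertion.sorted.bed.gz \
                rearrangement.sorted.bedpe.gz bp_info.sorted.bed.gz \
                bp_info.sorted.bed.gz.tbi; do
    if [[ ! -e "$prefix.$suffix" ]]; then
      echo "missing: $prefix.$suffix" >&2
      missing=1
    fi
  done
  return $missing
}
```

Usage: `check_parse_outputs $PWD/output/COLO829/COLO829 && check_parse_outputs $PWD/output/COLO829BL/COLO829BL`.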
This step calls somatic SVs from the supporting-read data parsed above, using COLO829BL as the matched control.
COLO829 and COLO829BL (1.5 hours)
singularity exec $PWD/image/nanomonsv_v0.5.0.sif \
nanomonsv \
get \
$PWD/output/COLO829/COLO829 \
$PWD/bam/COLO829/COLO829.bam \
$PWD/reference/Homo_sapiens_assembly38.fasta \
--control_prefix $PWD/output/COLO829BL/COLO829BL \
--control_bam $PWD/bam/COLO829BL/COLO829BL.bam \
--processes 8 \
--single_bnd \
--use_racon \
--control_panel_prefix $PWD/control_panel/hprc_year1_data_freeze_nanopore_guppy4_minimap2_2_24_merge_control_GRCh38/hprc_year1_data_freeze_nanopore_guppy4_minimap2_2_24_merge_control_GRCh38
After successful execution, the result is written to $PWD/output/COLO829/COLO829.nanomonsv.result.txt.
One of the most effective filters removes insertions and deletions confined to simple repeat regions. For this, prepare a bgzip-compressed, tabix-indexed simple repeat BED file as follows:
wget http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/simpleRepeat.txt.gz
zcat simpleRepeat.txt.gz | cut -f 2-4 | sort -k1,1 -k2,2n -k3,3n > simpleRepeat.bed
bgzip -c simpleRepeat.bed > simpleRepeat.bed.gz
tabix -p bed simpleRepeat.bed.gz
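The `cut -f 2-4` step works because the UCSC simpleRepeat.txt table dump starts with a `bin` column, followed by `chrom`, `chromStart` and `chromEnd`. The sketch below wraps the same cut/sort pipeline in a hypothetical helper to_bed3 and adds is_bed_sorted, which uses `sort -c` with the same keys to confirm the file is ordered before running bgzip and tabix.

```shell
# Hypothetical helper: UCSC table rows on stdin -> sorted 3-column BED on stdout.
# Keeps chrom, chromStart, chromEnd (columns 2-4) and sorts by chrom, start, end.
to_bed3() {
  cut -f 2-4 | sort -k1,1 -k2,2n -k3,3n
}

# Hypothetical helper: succeed if a BED file is already in the order produced
# by to_bed3. sort -c only checks the order and exits non-zero on the first
# out-of-order line. Run it in the same locale used to create the file.
is_bed_sorted() {
  sort -c -k1,1 -k2,2n -k3,3n "$1" 2>/dev/null
}
```

For example, `zcat simpleRepeat.txt.gz | to_bed3 > simpleRepeat.bed` produces the same file as the command above, and `is_bed_sorted simpleRepeat.bed` succeeds if it is correctly ordered.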
Then, download the add_simple_repeat.py script and use it to annotate the nanomonsv result:
wget https://raw.githubusercontent.com/friend1ws/nanomonsv/master/misc/add_simple_repeat.py
singularity exec $PWD/image/nanomonsv_v0.5.0.sif \
python3 add_simple_repeat.py \
$PWD/output/COLO829/COLO829.nanomonsv.result.txt \
$PWD/output/COLO829/COLO829.nanomonsv.result.filt.txt \
simpleRepeat.bed.gz
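Conceptually, an indel counts as confined to a simple repeat when its whole span lies inside a single repeat interval. The sketch below illustrates only that containment test on an uncompressed 3-column BED file; it is not the logic of add_simple_repeat.py, which may use margins or different coordinate conventions. confined_in_repeat is a hypothetical helper, and both intervals are assumed to use the same coordinate convention.

```shell
# Hypothetical helper: succeed if [start, end] on chrom lies entirely within
# at least one interval of a plain 3-column BED file (chrom, start, end).
confined_in_repeat() {
  local bed=$1 chrom=$2 start=$3 end=$4
  awk -F'\t' -v c="$chrom" -v s="$start" -v e="$end" '
    $1 == c && $2 + 0 <= s + 0 && e + 0 <= $3 + 0 { found = 1; exit }
    END { exit !found }' "$bed"
}
```

On the real bgzip-compressed file, `tabix simpleRepeat.bed.gz chr1:120-180` can first narrow the search to overlapping repeats; overlap is weaker than containment, so the containment test still has to be applied to those lines.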
Indels confined to simple repeats are now labeled "Simple_repeat" in COLO829.nanomonsv.result.filt.txt. To create a file containing only the SVs that passed every filter check, run:
head -n 1 $PWD/output/COLO829/COLO829.nanomonsv.result.filt.txt > $PWD/output/COLO829/COLO829.nanomonsv.result.filt.pass.txt
tail -n +2 $PWD/output/COLO829/COLO829.nanomonsv.result.filt.txt | grep PASS >> $PWD/output/COLO829/COLO829.nanomonsv.result.filt.pass.txt
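`grep PASS` keeps any line that contains the string PASS anywhere. A stricter sketch keeps the header plus only the rows whose value in a named column is exactly PASS. It assumes a tab-delimited file with a header line; keep_pass is a hypothetical helper, and the column name is deliberately left to the caller: look it up in the header of COLO829.nanomonsv.result.filt.txt rather than relying on any name used here.

```shell
# Hypothetical helper: print the header and every row whose value in column
# COLNAME (looked up in the header line) is exactly "PASS".
# Usage: keep_pass FILE COLNAME
keep_pass() {
  local file=$1 col=$2
  awk -F'\t' -v name="$col" '
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == name) idx = i
      if (!idx) { print "column not found: " name > "/dev/stderr"; exit 1 }
      print; next
    }
    $idx == "PASS"' "$file"
}
```

With the filter-status column name taken from the header, `keep_pass COLO829.nanomonsv.result.filt.txt <column> > COLO829.nanomonsv.result.filt.pass.txt` replaces the head/tail/grep pair.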