Commit
Add doc and data updates for v0.1.7 release
Showing 17 changed files with 161 additions and 30 deletions.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,31 @@ | ||
#!/usr/bin/env bash | ||
|
||
set -o nounset | ||
|
||
# | ||
# This script yields a recommended set of exclusion regions for CNV calling on hs37d5, by | ||
# converting excluded regions from hg19. | ||
# | ||
hg19_excluded_regions=cnv.excluded_regions.hg19.bed.gz | ||
|
||
# This script depends on bgzip and tabix, customize these values if they are not already in the path | ||
bgzip=bgzip | ||
tabix=tabix | ||
|
||
hg19_renamed() { | ||
gzip -dc $hg19_excluded_regions |\ | ||
sed s/^chr// |\ | ||
awk '$1~/^([1-2]?[0-9]|[XY]|MT)$/' | ||
} | ||
|
||
other() { | ||
wget -O - http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz.fai |\ | ||
awk '$1!~/^([1-2]?[0-9]|[XY]|MT)$/ {printf "%s\t0\t%s\tother\n",$1,$2}' | ||
} | ||
|
||
|
||
label=cnv.excluded_regions.hs37d5 | ||
|
||
cat <(hg19_renamed) <(other) | $bgzip -c >| $label.bed.gz | ||
$tabix -p bed $label.bed.gz | ||
|
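Both awk filters in the script above partition contigs with the same regular expression: `hg19_renamed` keeps records whose first column matches it once the `chr` prefix is stripped, and `other` emits a whole-contig exclusion record for every name that does not match. The sketch below applies that pattern to a handful of contig names to show which side each falls on. The `classify_contig` helper and the sample names are illustrative and are not part of the commit.

```bash
#!/usr/bin/env bash
# Same pattern as the two awk filters in the script above.
primary_pattern='^([1-2]?[0-9]|[XY]|MT)$'

# classify_contig (hypothetical helper): reads contig names on stdin,
# strips a leading "chr" as hg19_renamed does, and labels each name
# "primary" if it matches the pattern and "other" if it does not.
classify_contig() {
    sed 's/^chr//' |
        awk -v pat="$primary_pattern" \
            '{ print $1 "\t" ($1 ~ pat ? "primary" : "other") }'
}

# Illustrative names: hg19-style chromosomes plus two hs37d5-only contigs.
printf '%s\n' chr1 chr22 chrX chrM GL000207.1 hs37d5 | classify_contig
```

Two properties of the pattern are worth knowing when adapting it. `chrM` becomes `M` after the prefix is stripped, and `M` does not match `MT`, so any hg19 mitochondrial records would be dropped rather than carried over. The numeric branch `[1-2]?[0-9]` also accepts `0` and `23` through `29`, which only matters if an input uses such contig names.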
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
chrX 0 2699520 chrX_PAR_1 2 | ||
chrX 2699520 154931043 chrX_uniq_1 2 | ||
chrX 154931043 155260560 chrX_PAR_2 2 | ||
chrY 0 2649520 chrY_PAR_1 0 | ||
chrY 2649520 59034049 chrY_uniq_1 0 | ||
chrY 59034049 59363566 chrY_PAR_2 0 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
chrX 0 2699520 chrX_PAR_1 2 | ||
chrX 2699520 154931043 chrX_uniq_1 1 | ||
chrX 154931043 155260560 chrX_PAR_2 2 | ||
chrY 0 2649520 chrY_PAR_1 0 | ||
chrY 2649520 59034049 chrY_uniq_1 1 | ||
chrY 59034049 59363566 chrY_PAR_2 0 |
File renamed without changes.
File renamed without changes.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
X 0 2699520 X_PAR_1 2 | ||
X 2699520 154931043 X_uniq_1 2 | ||
X 154931043 155260560 X_PAR_2 2 | ||
Y 0 2649520 Y_PAR_1 0 | ||
Y 2649520 59034049 Y_uniq_1 0 | ||
Y 59034049 59363566 Y_PAR_2 0 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
X 0 2699520 X_PAR_1 2 | ||
X 2699520 154931043 X_uniq_1 1 | ||
X 154931043 155260560 X_PAR_2 2 | ||
Y 0 2649520 Y_PAR_1 0 | ||
Y 2649520 59034049 Y_uniq_1 1 | ||
Y 59034049 59363566 Y_PAR_2 0 |
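Each of the four BED files above splits chrX and chrY into the two pseudoautosomal regions and the unique region between them, with an integer in the fifth column. The diff does not say what that integer means. One plausible reading is that it is the expected copy number for the region, with the files that have 1 in the unique regions fitting an XY sample and the files with 2 on X and 0 on Y fitting an XX sample. Under that assumption, the sketch below looks up the value at a position using BED's 0-based, half-open intervals. The `expected_cn` function name and the temporary file are illustrative.

```bash
#!/usr/bin/env bash
# expected_cn BED CONTIG POS (hypothetical helper)
# Prints the fifth column of the first record whose interval contains POS.
# BED intervals are 0-based and half-open: start <= POS < end.
expected_cn() {
    awk -v contig="$2" -v pos="$3" \
        '$1 == contig && pos >= $2 && pos < $3 { print $5; exit }' "$1"
}

# Rows copied from the last table above (unprefixed contig names).
bed="$(mktemp)"
trap 'rm -f "$bed"' EXIT
cat > "$bed" <<'EOF'
X 0 2699520 X_PAR_1 2
X 2699520 154931043 X_uniq_1 1
X 154931043 155260560 X_PAR_2 2
Y 0 2649520 Y_PAR_1 0
Y 2649520 59034049 Y_uniq_1 1
Y 59034049 59363566 Y_PAR_2 0
EOF

expected_cn "$bed" X 1000000     # inside X_PAR_1
expected_cn "$bed" X 100000000   # inside X_uniq_1
expected_cn "$bed" Y 59034049    # first base of Y_PAR_2
```

A position at or beyond the last interval's end prints nothing, because the end coordinate is exclusive.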
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,18 +1,30 @@ | ||
# Output files | ||
-All outputs are based on the provided `{OUTPUT_PREFIX}` and a inferred `{sample_name}`. | ||
+All outputs are based on the provided `{OUTPUT_PREFIX}` and an inferred `{sample_name}`. | ||
The `{sample_name}` is extracted from the alignment file. | ||
The name is taken from the first `@RG` tag in the alignment file header including a sample name. | ||
|
||
## Primary outputs | ||
* `{OUTPUT_PREFIX}.{sample_name}.vcf.gz` - the primary VCF output containing copy number variant calls for the sample | ||
-* `{OUTPUT_PREFIX}.{sample_name}.depth.bw` - a bigwig file containing the depth measurements | ||
+* `{OUTPUT_PREFIX}.{sample_name}.depth.bw` - a bigwig depth track | ||
* `{OUTPUT_PREFIX}.{sample_name}.copynum.bedgraph` - the copy number values calculated for each region | ||
|
||
## Secondary outputs | ||
* `{OUTPUT_PREFIX}.log` - the log file generated from running HiFiCNV | ||
* `{OUTPUT_PREFIX}.{sample_name}.maf.bw` - a bigwig file containing the minor allele frequency measurements, only generated if a VCF file is provided | ||
|
||
## Debug outputs | ||
Additional outputs related to GC correction can be obtained with the `--debug-gc-correction` option, these are debug | ||
outputs and may change in future updates: | ||
* `{OUTPUT_PREFIX}.gc_frac.bw` - A bigwig track of GC fraction windows (from the reference sequence) shared across all | ||
samples. | ||
* `{OUTPUT_PREFIX}.{sample_name}.gc_scaled_depth.bw` - A bigwig depth track, similar to the standard bigwig depth output | ||
except that all depths are scaled by their region's GC correction factor. Note that the internal segmentation model uses | ||
GC correction factors directly instead of these adjusted depths, so these depths are only used for visualization. | ||
* `{OUTPUT_PREFIX}.{sample_name}.gc_correction_table.tsv` - Sample GC correction factors as a function of GC fraction | ||
* `{OUTPUT_PREFIX}.{sample_name}.gc_reduction_factor.bw` - A bigwig track of sample GC correction factors by region | ||
|
||
## VCF notes | ||
HiFiCNV follows VCF format specification 4.2. | ||
The `QUAL` field is reported as an average of the next-most-likely copy-number state for each bin from the HMM (see Methods). | ||
It also includes a `TARGET_SIZE` filter flag for events that are smaller than 100kbp. | ||
This filter can be disabled using the `--disable-vcf-filters` option. |
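The naming rules from the "Output files" section above can be sketched end to end: take the sample name from the first `@RG` header line that carries one, then combine it with the output prefix. The sketch assumes the sample name is the `SM` field of the `@RG` line, which is the SAM convention (the documentation says only "a sample name"). It reads header text on stdin, which `samtools view -H` could supply for a real alignment file. The function names, the example header, and the `results/run` prefix are illustrative, and this is not HiFiCNV's own code.

```bash
#!/usr/bin/env bash
# first_sample_name (hypothetical helper): reads SAM header lines on stdin
# and prints the SM value of the first @RG line that has an SM field.
first_sample_name() {
    awk -F'\t' '
        /^@RG/ {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^SM:/) { print substr($i, 4); exit }
            }
        }'
}

# primary_outputs PREFIX SAMPLE (hypothetical helper): prints the three
# primary output paths listed in the documentation.
primary_outputs() {
    for suffix in vcf.gz depth.bw copynum.bedgraph; do
        printf '%s.%s.%s\n' "$1" "$2" "$suffix"
    done
}

# Illustrative header: the first @RG line has no SM field, so it is skipped.
header="$(printf '@HD\tVN:1.6\n@RG\tID:run1\n@RG\tID:run2\tSM:HG002\n@RG\tID:run3\tSM:other\n')"
sample="$(printf '%s\n' "$header" | first_sample_name)"
primary_outputs results/run "$sample"
```

The log file is the one output that omits the sample name, so under the same prefix it would be `results/run.log`.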