Eric T. Dawson, Mike Lin, Charles Markello, Jean Monlong, Adam Novak. MIT License, 2023.
Workflow Description Language (WDL) scripts for vg workflows.
- Giraffe-DeepVariant workflows. Either the full Giraffe-DeepVariant workflow or parts of it are available:
- Giraffe-DeepVariant workflow to run the full pipeline: starting from FASTQs/CRAM, align reads to a pangenome and run DeepVariant.
- Giraffe workflow to map reads and produce BAMs ready to be used by DeepVariant.
- Giraffe-DeepVariant from GAF workflow to project reads aligned to a pangenome (GAF), prepare them and run DeepVariant.
- Happy workflow to evaluate small variants against a truthset using hap.py/vcfeval.
- GAF to sorted GAM workflow to convert a GAF into a sorted and indexed GAM, e.g. for use with the sequenceTubeMap.
- Giraffe SV workflow to map short reads to a pangenome and genotype SVs with vg.
- Haplotype Sampling workflow to create a personalized pangenome using haplotype sampling.
- Map-call workflow to map reads and call small variants with vg, DeepVariant and GATK (legacy?).
- Map-call Pedigree workflow to map reads and call variants in a pedigree with vg (legacy?).
See also the Going further section for more details on some aspects and HOW-TOs:
- Path list
- Read realignment
- Using the HPRC pangenomes
- Reference prefix removal
- CRAM input
- Single-end reads
- Unmapped reads
- Reads chunking
The full workflow to go from sequencing reads (FASTQs, CRAM) to small variant calls (VCF).
- workflow file: workflows/giraffe_and_deepvariant.wdl
- Dockstore page
- If you use this workflow, please cite the HPRC preprint.
Parameters (semi-auto-generated from the parameter_meta section):
- INPUT_READ_FILE_1: Input sample 1st read pair fastq.gz
- INPUT_READ_FILE_2: Input sample 2nd read pair fastq.gz
- INPUT_CRAM_FILE: Input CRAM file
- CRAM_REF: Genome fasta file associated with the CRAM file
- CRAM_REF_INDEX: Index of the fasta file associated with the CRAM file
- GBZ_FILE: Path to .gbz index file
- DIST_FILE: Path to .dist index file
- MIN_FILE: Path to .min index file
- SAMPLE_NAME: The sample name
- OUTPUT_GAF: Should a GAF file with the aligned reads be saved? Default is 'true'.
- OUTPUT_SINGLE_BAM: Should a single merged BAM file be saved? If yes, unmapped reads will be included and 'calling bams' (one per contig) won't be output. Default is 'true'.
- PAIRED_READS: Are the reads paired? Default is 'true'.
- READS_PER_CHUNK: Number of reads contained in each mapping chunk. Default 20 000 000.
- PATH_LIST_FILE: (OPTIONAL) Text file where each line is a path name in the GBZ index, to use instead of CONTIGS. If neither is given, paths are extracted from the GBZ and subset to chromosome-looking paths.
- CONTIGS: (OPTIONAL) Desired reference genome contigs, which are all paths in the GBZ index.
- REFERENCE_PREFIX: Remove this off the beginning of path names in surjected BAM (set to match prefix in PATH_LIST_FILE)
- REFERENCE_FILE: (OPTIONAL) If specified, use this FASTA reference instead of extracting it from the graph. Required if the graph does not contain all bases of the reference.
- REFERENCE_INDEX_FILE: (OPTIONAL) If specified, use this .fai index instead of indexing the reference file.
- REFERENCE_DICT_FILE: (OPTIONAL) If specified, use this pre-computed .dict file of sequence lengths. Required if REFERENCE_INDEX_FILE is set
- LEFTALIGN_BAM: Whether or not to left-align reads in the BAM. Default is 'true'.
- REALIGN_INDELS: Whether or not to realign reads near indels. Default is 'true'.
- REALIGNMENT_EXPANSION_BASES: Number of bases to expand indel realignment targets by on either side, to free up read tails in slippery regions. Default is 160.
- MIN_MAPQ: Minimum MAPQ of reads to use for calling. 4 is the lowest at which a mapping is more likely to be right than wrong. Default is 1.
- MAX_FRAGMENT_LENGTH: Maximum distance at which to mark paired reads properly paired. Default is 3000.
- GIRAFFE_OPTIONS: (OPTIONAL) extra command line options for Giraffe mapper
- TRUTH_VCF: (OPTIONAL) Path to .vcf.gz to compare against
- TRUTH_VCF_INDEX: (OPTIONAL) Path to Tabix index for TRUTH_VCF
- EVALUATION_REGIONS_BED: (OPTIONAL) BED to restrict comparison against TRUTH_VCF to
- DV_MODEL_META: (OPTIONAL) .meta file for a custom DeepVariant calling model
- DV_MODEL_INDEX: (OPTIONAL) .index file for a custom DeepVariant calling model
- DV_MODEL_DATA: (OPTIONAL) .data-00000-of-00001 file for a custom DeepVariant calling model
- DV_KEEP_LEGACY_AC: Should DV use the legacy allele counter behavior? Default is 'true'.
- DV_NORM_READS: Should DV normalize reads itself? Default is 'false'.
- OTHER_MAKEEXAMPLES_ARG: Additional arguments for the make_examples step of DeepVariant
- SPLIT_READ_CORES: Number of cores to use when splitting the reads into chunks. Default is 8.
- MAP_CORES: Number of cores to use when mapping the reads. Default is 16.
- MAP_MEM: Memory, in GB, to use when mapping the reads. Default is 120.
- CALL_CORES: Number of cores to use when calling variants. Default is 8.
- CALL_MEM: Memory, in GB, to use when calling variants. Default is 50.
Related topics: read realignment, reference prefix removal, CRAM input, reads chunking, path list, single-end reads, unmapped reads, HPRC pangenomes.
Test locally with:
miniwdl run --as-me workflows/giraffe_and_deepvariant.wdl -i params/giraffe_and_deepvariant.json
miniwdl run --as-me workflows/giraffe_and_deepvariant.wdl -i params/giraffe_and_deepvariant_single_end.json
miniwdl run --as-me workflows/giraffe_and_deepvariant.wdl -i params/giraffe_and_deepvariant_cram.json
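In addition to the bundled parameter files, miniwdl accepts workflow inputs directly on the command line as name=value pairs. A minimal sketch (file names are placeholders; see the parameter list above for the full set of options):
miniwdl run --as-me workflows/giraffe_and_deepvariant.wdl \
    INPUT_READ_FILE_1=sample_R1.fastq.gz INPUT_READ_FILE_2=sample_R2.fastq.gz \
    GBZ_FILE=graph.gbz MIN_FILE=graph.min DIST_FILE=graph.dist \
    SAMPLE_NAME=mysample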
Core VG Giraffe mapping, usable for DeepVariant. Reads are mapped to a pangenome with vg giraffe and pre-processed (e.g. indel realignment).
- workflow file: workflows/giraffe.wdl
- Dockstore page
- If you use this workflow, please cite the HPRC preprint.
Parameters (semi-auto-generated from the parameter_meta section):
- INPUT_READ_FILE_1: Input sample 1st read pair fastq.gz
- INPUT_READ_FILE_2: Input sample 2nd read pair fastq.gz
- INPUT_CRAM_FILE: Input CRAM file
- CRAM_REF: Genome fasta file associated with the CRAM file
- CRAM_REF_INDEX: Index of the fasta file associated with the CRAM file
- GBZ_FILE: Path to .gbz index file
- DIST_FILE: Path to .dist index file
- MIN_FILE: Path to .min index file
- SAMPLE_NAME: The sample name
- OUTPUT_SINGLE_BAM: Should a single merged BAM file be saved? If yes, unmapped reads will be included and 'calling bams' (one per contig) won't be output. Default is 'true'.
- OUTPUT_CALLING_BAMS: Should individual contig BAMs be saved? Default is 'false'.
- OUTPUT_GAF: Should a GAF file with the aligned reads be saved? Default is 'false'.
- PAIRED_READS: Are the reads paired? Default is 'true'.
- READS_PER_CHUNK: Number of reads contained in each mapping chunk. Default 20 000 000.
- PATH_LIST_FILE: (OPTIONAL) Text file where each line is a path name in the GBZ index, to use instead of CONTIGS. If neither is given, paths are extracted from the GBZ and subset to chromosome-looking paths.
- CONTIGS: (OPTIONAL) Desired reference genome contigs, which are all paths in the GBZ index.
- REFERENCE_PREFIX: Remove this off the beginning of path names in surjected BAM (set to match prefix in PATH_LIST_FILE)
- REFERENCE_FILE: (OPTIONAL) If specified, use this FASTA reference instead of extracting it from the graph. Required if the graph does not contain all bases of the reference.
- REFERENCE_INDEX_FILE: (OPTIONAL) If specified, use this .fai index instead of indexing the reference file.
- REFERENCE_DICT_FILE: (OPTIONAL) If specified, use this pre-computed .dict file of sequence lengths. Required if REFERENCE_INDEX_FILE is set
- LEFTALIGN_BAM: Whether or not to left-align reads in the BAM. Default is 'true'.
- REALIGN_INDELS: Whether or not to realign reads near indels. Default is 'true'.
- REALIGNMENT_EXPANSION_BASES: Number of bases to expand indel realignment targets by on either side, to free up read tails in slippery regions. Default is 160.
- MAX_FRAGMENT_LENGTH: Maximum distance at which to mark paired reads properly paired. Default is 3000.
- GIRAFFE_OPTIONS: (OPTIONAL) extra command line options for Giraffe mapper
- SPLIT_READ_CORES: Number of cores to use when splitting the reads into chunks. Default is 8.
- MAP_CORES: Number of cores to use when mapping the reads. Default is 16.
- MAP_MEM: Memory, in GB, to use when mapping the reads. Default is 120.
- HAPLOTYPE_SAMPLING: Whether or not to use haplotype sampling before running giraffe. Default is 'true'.
- IN_DIPLOID: Whether or not to use diploid sampling during haplotype sampling. Requires HAPLOTYPE_SAMPLING=true. Default is 'true'.
- HAPL_FILE: (OPTIONAL) Path to .hapl file used in haplotype sampling
- R_INDEX_FILE: (OPTIONAL) Path to .ri file used in haplotype sampling
- IN_KFF_FILE: (OPTIONAL) Path to .kff file used in haplotype sampling
- IN_HAPLOTYPE_NUMBER: Number of generated synthetic haplotypes used in haplotype sampling. (Default: 4)
Related topics: read realignment, reference prefix removal, CRAM input, reads chunking, path list, single-end reads, unmapped reads, HPRC pangenomes, Haplotype Sampling.
Test locally with:
miniwdl run --as-me workflows/giraffe.wdl -i params/giraffe.json
miniwdl run --as-me workflows/giraffe.wdl -i params/giraffe.singleended.json
miniwdl run --as-me workflows/giraffe.wdl -i params/giraffe.singleended.cram.json
miniwdl run --as-me workflows/giraffe.wdl -i params/haplotype_sampling_and_giraffe.json
Surject a GAF and prepare the BAMs (e.g. fix names, indel realign), and call small variants with DeepVariant.
- workflow file: workflows/giraffe_and_deepvariant_fromGAF.wdl
- Dockstore page
- If you use this workflow, please cite the HPRC preprint.
Parameters (semi-auto-generated from the parameter_meta section):
- INPUT_GAF: Input gzipped GAF file
- GBZ_FILE: Path to .gbz index file
- SAMPLE_NAME: The sample name
- OUTPUT_SINGLE_BAM: Should a single merged BAM file be saved? If yes, unmapped reads will be included and 'calling bams' (one per contig) won't be output. Default is 'true'.
- PAIRED_READS: Are the reads paired? Default is 'true'.
- PATH_LIST_FILE: (OPTIONAL) Text file where each line is a path name in the GBZ index, to use instead of CONTIGS. If neither is given, paths are extracted from the GBZ and subset to chromosome-looking paths.
- CONTIGS: (OPTIONAL) Desired reference genome contigs, which are all paths in the GBZ index.
- REFERENCE_PREFIX: Remove this off the beginning of path names in surjected BAM (set to match prefix in PATH_LIST_FILE)
- REFERENCE_FILE: (OPTIONAL) If specified, use this FASTA reference instead of extracting it from the graph. Required if the graph does not contain all bases of the reference.
- REFERENCE_INDEX_FILE: (OPTIONAL) If specified, use this .fai index instead of indexing the reference file.
- REFERENCE_DICT_FILE: (OPTIONAL) If specified, use this pre-computed .dict file of sequence lengths. Required if REFERENCE_INDEX_FILE is set
- LEFTALIGN_BAM: Whether or not to left-align reads in the BAM. Default is 'true'.
- REALIGN_INDELS: Whether or not to realign reads near indels. Default is 'true'.
- REALIGNMENT_EXPANSION_BASES: Number of bases to expand indel realignment targets by on either side, to free up read tails in slippery regions. Default is 160.
- MIN_MAPQ: Minimum MAPQ of reads to use for calling. 4 is the lowest at which a mapping is more likely to be right than wrong. Default is 1.
- MAX_FRAGMENT_LENGTH: Maximum distance at which to mark paired reads properly paired. Default is 3000.
- DV_MODEL_META: (OPTIONAL) .meta file for a custom DeepVariant calling model
- DV_MODEL_INDEX: (OPTIONAL) .index file for a custom DeepVariant calling model
- DV_MODEL_DATA: (OPTIONAL) .data-00000-of-00001 file for a custom DeepVariant calling model
- DV_KEEP_LEGACY_AC: Should DV use the legacy allele counter behavior? Default is 'true'.
- DV_NORM_READS: Should DV normalize reads itself? Default is 'false'.
- OTHER_MAKEEXAMPLES_ARG: Additional arguments for the make_examples step of DeepVariant
- VG_CORES: Number of cores to use when projecting the reads. Default is 16.
- VG_MEM: Memory, in GB, to use when projecting the reads. Default is 120.
- CALL_CORES: Number of cores to use when calling variants. Default is 8.
- CALL_MEM: Memory, in GB, to use when calling variants. Default is 50.
Related topics: read realignment, reference prefix removal, path list, single-end reads, unmapped reads, HPRC pangenomes.
Test locally with:
miniwdl run --as-me workflows/giraffe_and_deepvariant_fromGAF.wdl -i params/giraffe_and_deepvariant_gaf.json
miniwdl run --as-me workflows/giraffe_and_deepvariant_fromGAF.wdl -i params/giraffe_and_deepvariant_gaf_single_end.json
Evaluation of the small variant calls using hap.py.
- workflow file: workflows/happy_evaluation.wdl
- Dockstore page
Parameters (semi-auto-generated from the parameter_meta section):
- VCF: bgzipped VCF with variant calls
- VCF_INDEX: (Optional) If specified, use this tabix index for the VCF instead of indexing it
- TRUTH_VCF: bgzipped VCF with truthset
- TRUTH_VCF_INDEX: (Optional) If specified, use this index for the truth VCF instead of indexing it
- REFERENCE_FILE: Use this FASTA reference.
- REFERENCE_INDEX_FILE: (Optional) If specified, use this .fai index instead of indexing the reference file.
- EVALUATION_REGIONS_BED: (Optional) BED to restrict comparison against TRUTH_VCF to
- REFERENCE_PREFIX: (Optional) Remove this off the beginning of sequence names in the VCF
Test locally with:
miniwdl run --as-me workflows/happy_evaluation.wdl -i params/happy_evaluation.json
Currently, only GAM files can be sorted and indexed, for example to extract a subgraph to visualize, or to use with the sequenceTubeMap. This workflow converts reads aligned to a pangenome in a GAF file into a sorted and indexed GAM file.
- workflow file: workflows/sort_graph_aligned_reads.wdl
- Dockstore page
Parameters (semi-auto-generated from the parameter_meta section):
- GAF_FILE: GAF file to convert and sort.
- GBZ_FILE: the GBZ index of the graph
- SAMPLE_NAME: (Optional) a sample name
Related topics: HPRC pangenomes.
Test locally with:
miniwdl run --as-me workflows/sort_graph_aligned_reads.wdl -i params/sort_graph_aligned_reads.gaf.json
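For reference, the conversion this workflow performs corresponds roughly to the following vg commands (an illustrative sketch, not necessarily the exact commands the workflow runs; flags may differ between vg versions, so check vg convert --help and vg gamsort --help):
vg convert graph.gbz -F aln.gaf > aln.gam
vg gamsort -i aln.sorted.gam.gai aln.gam > aln.sorted.gam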
Workflow for mapping short reads and genotyping the structural variants in a pangenome.
- workflow file
- parameter file
- Dockstore page
- If you use this workflow, please cite the Giraffe-SV article.
Workflow for creating a personalized pangenome with haplotype sampling.
Parameters (semi-auto-generated from the parameter_meta section):
- IN_GBZ_FILE: Path to .gbz index file
- INPUT_READ_FILE_FIRST: Input sample 1st read pair fastq.gz
- INPUT_READ_FILE_SECOND: Input sample 2nd read pair fastq.gz
- HAPL_FILE: Path to .hapl file
- IN_DIST_FILE: Path to .dist file
- R_INDEX_FILE: Path to .ri file
- KFF_FILE: Path to .kff file
- IN_OUTPUT_NAME_PREFIX: Name of the output file (Default: haplotype_sampled_graph)
- IN_KMER_LENGTH: Size of the kmers used for sampling (up to 31). (Default: 29)
- CORES: Number of cores to use with commands. (Default: 16)
- WINDOW_LENGTH: Window length used for building the minimizer index. (Default: 11)
- SUBCHAIN_LENGTH: Target length (in bp) for subchains. (Default: 10000)
- HAPLOTYPE_NUMBER: Number of generated synthetic haplotypes. (Default: 4)
- PRESENT_DISCOUNT: Multiplicative factor for discounting scores for present kmers. (Default: 0.9)
- HET_ADJUST: Additive term for adjusting scores for heterozygous kmers. (Default: 0.05)
- ABSENT_SCORE: Score for absent kmers. (Default: 0.8)
- INCLUDE_REFERENCE: Include reference paths and generic paths from the full graph in the sampled graph. (Default: true)
- DIPLOID: Activate diploid sampling. (Default: true)
Test locally with:
miniwdl run --as-me workflows/haplotype_sampling.wdl -i params/haplotype_sampling.json
- workflow file
- parameter file
- If you use this workflow, please cite the Pedigree-VG article.
See below for more information about: read realignment, reference prefix removal, CRAM input, reads chunking, path list, single-end reads, unmapped reads, HPRC pangenomes.
Once the reads are projected to a linear reference, we've noticed that realigning them can improve variant calling with DeepVariant. This helps mostly for small insertions and deletions (indels).
The full realignment process involves:
- Left-aligning the reads with freebayes' bamleftalign. Can be enabled/disabled with the LEFTALIGN_BAM parameter.
- Identifying regions to realign further with GATK RealignerTargetCreator.
- Expanding those regions with bedtools. The number of bases to expand is controlled by the REALIGNMENT_EXPANSION_BASES parameter.
- Realigning the reads in those regions with ABRA2.
The last 3 steps can be enabled/disabled with the REALIGN_INDELS parameter.
Although this produces the best variant calls, these extra steps increase the computational resources (and cost) of the workflow. For a lighter run, switch off both realignment steps and use DeepVariant's integrated realigner instead with:
LEFTALIGN_BAM=false
REALIGN_INDELS=false
DV_NORM_READS=true
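These can be set in the inputs JSON, or passed directly on the miniwdl command line alongside the rest of your inputs, for example (a sketch; the JSON file name is a placeholder):
miniwdl run --as-me workflows/giraffe_and_deepvariant.wdl -i your-inputs.json \
    LEFTALIGN_BAM=false REALIGN_INDELS=false DV_NORM_READS=true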
The names of contigs/paths/haplotypes in pangenomes sometimes contain a prefix that we'd want to remove. In the HPRC pangenomes, for example, the chromosomal contigs from GRCh38 are named GRCh38.chr1, etc. In practice, we want to remove this prefix from the variant calls (VCFs) or from the reads aligned to that reference (BAMs).
This is controlled by the REFERENCE_PREFIX parameter in the workflows. Setting REFERENCE_PREFIX="GRCh38." for example will ensure the VCFs/BAMs have chr1, etc. as contig names.
Because the pangenome uses them, the prefix must still be present when specifying the paths to project the reads to. Hence, CONTIGS and PATH_LIST_FILE must use the prefix.
However, any provided reference FASTA or dictionary must not have the prefix. These could be FASTA or .dict files from the "official" reference genome, or pre-computed from them, hence no prefix. So: no prefix in REFERENCE_FILE, REFERENCE_INDEX_FILE, or REFERENCE_DICT_FILE.
When the input is a CRAM file (INPUT_CRAM_FILE) instead of a pair of FASTQ files (INPUT_READ_FILE_1/INPUT_READ_FILE_2), the user must also provide the appropriate reference FASTA for that CRAM file with CRAM_REF, and its index with CRAM_REF_INDEX.
The CRAM file will be converted back to a pair of FASTQs, so it costs a little bit more to analyze CRAMs than FASTQs currently.
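If you prefer to convert a CRAM back to FASTQs yourself beforehand, samtools can do it; the command below is only an illustration (file names are placeholders) and not necessarily how the workflow converts CRAMs internally:
samtools collate -u -O --reference ref.fa input.cram | \
    samtools fastq -1 sample_R1.fastq.gz -2 sample_R2.fastq.gz -0 /dev/null -s /dev/null -n -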
Sequencing reads are chunked to parallelize read mapping.
The amount of chunking is controlled by the READS_PER_CHUNK parameter, which specifies how many reads each chunk should contain.
For a WGS experiment, we use chunks of about 20M reads.
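To estimate how many mapping chunks a run will produce, count the reads in a FASTQ (4 lines per read) and divide by READS_PER_CHUNK, for example (assuming reads_R1.fastq.gz is your input and the default 20M reads per chunk):
echo $(( $(zcat reads_R1.fastq.gz | wc -l) / 4 / 20000000 + 1 ))   # approximate number of chunks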
We might not always want to project the reads alignments to all the paths in the pangenome. For example, we might only care about alignment to chromosomes and not alternate contigs. Or there might be multiple sets of paths like in the CHM13-based HPRC pangenome which contains both reference paths for CHM13 and GRCh38. In that case, we can specify a list of paths to project the reads to using one of the following.
PATH_LIST_FILE is a file listing the path names, one per line. For the HPRC pangenomes, it looks like:
GRCh38.chr1
GRCh38.chr2
GRCh38.chr3
...etc
Otherwise, paths can be listed in the CONTIGS parameter as a list (WDL array).
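One way to build a PATH_LIST_FILE is to list the paths in the GBZ with vg and keep the chromosome paths; a sketch (the grep pattern depends on your pangenome's path naming):
vg paths -x graph.gbz -L | grep '^GRCh38\.chr' > GRCh38.path_list.txt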
Workflows expect paired-end reads, but some workflows can also analyze single-end reads.
To use single-end reads:
- If providing FASTQs, only provide INPUT_READ_FILE_1 (no INPUT_READ_FILE_2).
- Use PAIRED_READS=false
If including unmapped reads in the BAMs is important, make sure to switch on OUTPUT_SINGLE_BAM=true in the Giraffe-DeepVariant workflow and Giraffe workflow.
We recommend using the filtered CHM13-based pangenome (freeze 1). It contains both the CHM13 and GRCh38 reference paths.
Use the following indexes for the pangenome:
- GBZ_FILE: hprc-v1.0-mc-chm13-minaf.0.1.gbz - GBZ with the pangenome and haplotypes.
- MIN_FILE: hprc-v1.0-mc-chm13-minaf.0.1.min - Minimizer index.
- DIST_FILE: hprc-v1.0-mc-chm13-minaf.0.1.dist - Distance index.
To project reads and call variants relative to the GRCh38 reference:
- REFERENCE_PREFIX="GRCh38."
- PATH_LIST_FILE containing GRCh38.chr1, GRCh38.chr2, etc. File available at GRCh38.path_list.txt.
- REFERENCE_FILE: hg38.fa
- REFERENCE_INDEX_FILE: hg38.fa.fai. Optional, the workflow will create it if necessary (for a small extra cost/time).
- REFERENCE_DICT_FILE: hg38.dict. Optional, the workflow will create it if necessary (for a small extra cost/time).
To project reads and call variants relative to the CHM13 reference:
- REFERENCE_PREFIX="CHM13."
- PATH_LIST_FILE containing CHM13.chr1, CHM13.chr2, etc. File available at CHM13.path_list.txt.
- REFERENCE_FILE: chm13v2.0.plus_hs38d1_analysis_set.compact_decoys.fa
- REFERENCE_INDEX_FILE: chm13v2.0.plus_hs38d1_analysis_set.compact_decoys.fa.fai. Optional, the workflow will create it if necessary (for a small extra cost/time).
- REFERENCE_DICT_FILE: chm13v2.0.plus_hs38d1_analysis_set.compact_decoys.dict. Optional, the workflow will create it if necessary (for a small extra cost/time).
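Putting it together, a GRCh38-based run of the Giraffe-DeepVariant workflow with this pangenome would use inputs along these lines (illustrative sketch; replace the file names with local paths or URLs to the files listed above):
miniwdl run --as-me workflows/giraffe_and_deepvariant.wdl \
    INPUT_READ_FILE_1=sample_R1.fastq.gz INPUT_READ_FILE_2=sample_R2.fastq.gz \
    SAMPLE_NAME=mysample \
    GBZ_FILE=hprc-v1.0-mc-chm13-minaf.0.1.gbz \
    MIN_FILE=hprc-v1.0-mc-chm13-minaf.0.1.min \
    DIST_FILE=hprc-v1.0-mc-chm13-minaf.0.1.dist \
    REFERENCE_PREFIX="GRCh38." \
    PATH_LIST_FILE=GRCh38.path_list.txt \
    REFERENCE_FILE=hg38.fa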
For earlier versions of DeepVariant (<1.5), models were retrained using reads aligned to the HPRC pangenomes.
The corresponding model files were deposited
at: https://s3-us-west-2.amazonaws.com/human-pangenomics/index.html?prefix=publications/PANGENOME_2022/DeepVariant/models/DEEPVARIANT_MC_Y1/.
They can be passed to the workflows using the DV_MODEL_META, DV_MODEL_INDEX, and DV_MODEL_DATA parameters.
Note that it is not necessary to use custom models in the latest version of the workflows as DeepVariant v1.5
includes default models suited for analyzing reads mapped to pangenomes (and projected back to a linear reference).
The workflows deposited on Dockstore can be launched using its command line or on platforms like Terra.
Install miniwdl, for example with pip:
pip3 install miniwdl
Clone this repo somewhere with git clone https://github.com/vgteam/vg_wdl.git
Run a workflow using:
miniwdl run /path/to/vg_wdl/workflows/WORKFLOW.wdl -i your-inputs.json
To modify the input parameters, edit the input .json with the necessary changes.
Cromwell can run WDL workflows with:
java -jar $CROMWELL_JAR run workflow.wdl -i inputs.json
where CROMWELL_JAR points at the Cromwell jar downloaded from their releases page, for example set with CROMWELL_JAR=/path/to/cromwell-<whatever>.jar in your shell.
To run one of the workflows in this repo, clone the repo somewhere with git clone https://github.com/vgteam/vg_wdl.git and run the desired workflow .wdl file:
java -jar $CROMWELL_JAR run /path/to/vg_wdl/workflows/WORKFLOW.wdl -i inputs.json
WDL needs the runtime Docker image to be present online (e.g. Dockerhub). Cromwell/miniwdl will pull those images automatically. VG images are available at quay.io and can be pulled with:
docker pull quay.io/vgteam/vg:v1.44.0
Specific versions can be specified as above, e.g. version v1.44.0.
To test the workflow locally, e.g. on the small simulated dataset, you can run it with Cromwell or miniwdl (see Usage). So, from the root of this repo, run something like:
java -jar $CROMWELL_JAR run workflows/WORKFLOW.wdl -i params/INPUTS.json
## or
miniwdl run --as-me workflows/WORKFLOW.wdl -i params/INPUTS.json
Miniwdl might be slightly more useful when developing/testing a WDL because it catches errors in WDL syntax faster and is a bit more explicit about them.
If you use the Giraffe-DeepVariant workflows, please cite the HPRC preprint:
Liao, Asri, Ebler, et al. A Draft Human Pangenome Reference. preprint, bioRxiv 2022; doi: https://doi.org/10.1101/2022.07.09.499321
If you use the SV genotyping workflow with vg giraffe, please cite this article:
Sirén, Monlong, Chang, Novak, Eizenga, et al. Pangenomics Enables Genotyping of Known Structural Variants in 5202 Diverse Genomes. Science, vol. 374, no. 6574, Dec. 2021; doi: https://doi.org/10.1126/science.abg8871.
If you use the pedigree-based workflow for rare variant discovery, please cite this article:
Markello et al. A Complete Pedigree-Based Graph Workflow for Rare Candidate Variant Analysis. Genome Research, Apr. 2022; doi: https://doi.org/10.1101/gr.276387.121.
Please open an Issue on GitHub for help, bug reports, or feature requests. When doing so, please remember that vg_wdl is open-source software made by a community of developers. Please be considerate and support a positive environment.