workflow/kfdrc_RNAseq_workflow.cwl

cwlVersion: v1.2
class: Workflow
id: kfdrc-rnaseq-workflow
label: Kids First DRC RNAseq Workflow
doc: |
  # Kids First RNA-Seq Workflow V4

  This is the Kids First RNA-Seq pipeline, which calculates gene and transcript isoform expression and detects fusions and splice junctions.
  We have transitioned to the current version, which upgrades several software components.
  Our legacy workflow is still available as [v3.0.1](https://github.com/kids-first/kf-rnaseq-workflow/tree/v3.0.1), and on CAVATICA, [revision 8](https://cavatica.sbgenomics.com/public/apps/cavatica/apps-publisher/kfdrc-rnaseq-workflow/8).

  <p align="center">
    <img src="docs/kids_first_logo.svg" alt="Kids First repository logo" width="660px" />
  </p>
  <p align="center">
    <a href="https://github.com/kids-first/kf-rnaseq-workflow/blob/main/LICENSE"><img src="https://img.shields.io/github/license/kids-first/kf-rnaseq-workflow.svg?style=for-the-badge"></a>
  </p>

  ## Introduction
  This pipeline optionally runs Cutadapt to trim adapters from the raw reads, converts alignment input to fastq if necessary, and passes the reads to STAR for alignment.
  The alignment output is used by RSEM for gene expression abundance estimation and by rMATS for detection of differential alternative splicing events.
  Additionally, Kallisto is used for quantification; it uses pseudoalignment to estimate gene abundance from the raw data.
  Fusion calling is performed using Arriba and STAR-Fusion detection tools on the STAR alignment outputs.
  Filtering and prioritization of fusion calls is done by annoFuse.
  Metrics for the workflow are generated by RNA-SeQC.
  Junction files for the workflow are generated by rMATS.

  If you would like to run this workflow using the CAVATICA public app, a basic primer on running public apps can be found [here](https://www.notion.so/d3b/Starting-From-Scratch-Running-Cavatica-af5ebb78c38a4f3190e32e67b4ce12bb).
  Alternatively, if you would like to run it locally using `cwltool`, a basic primer on that can be found [here](https://www.notion.so/d3b/Starting-From-Scratch-Running-CWLtool-b8dbbde2dc7742e4aff290b0a878344d) and combined with app-specific info from the readme below.
  This workflow is the current production workflow, equivalent to this [CAVATICA public app](https://cavatica.sbgenomics.com/public/apps#cavatica/apps-publisher/kfdrc-rnaseq-workflow).

  ### Cutadapt
  [Cutadapt v3.4](https://github.com/marcelm/cutadapt) Cut adapter sequences from raw reads if needed.
  ### [STAR](docs/STAR_2.7.10a.md)
  [STAR v2.7.10a](https://doi.org/f4h523) RNA-Seq raw data alignment.
  ### [RSEM](docs/RSEM_1.3.1.md)
  [RSEM v1.3.1](https://doi.org/10/cwg8n5) Calculation of gene expression.
  ### Kallisto
  [Kallisto v0.43.1](https://doi.org/10.1038/nbt.3519) Raw data pseudoalignment to estimate gene abundance.
  ### [STAR-Fusion](docs/STAR-Fusion_1.10.1.md)
  [STAR-Fusion v1.10.1](https://doi.org/10.1101/120295) Fusion detection for `STAR` chimeric reads.
  ### [Arriba](docs/ARRIBA_2.2.1.md)
  [Arriba v2.2.1](https://github.com/suhrig/arriba/) Fusion caller that uses `STAR` aligned reads and chimeric reads output.
  ### [annoFuse](docs/D3B_ANNOFUSE.md)
  [annoFuse 0.92.0](https://github.com/d3b-center/annoFuse/releases/tag/v0.92.0) Filter and prioritize fusion calls. For more information, please see the following [paper](https://www.biorxiv.org/content/10.1101/839738v3).
  ### RNA-SeQC
  [RNA-SeQC v2.3.4](https://github.com/broadinstitute/rnaseqc) Generate metrics such as gene and transcript counts, sense/antisense mapping, mapping rates, etc.
  ### [rMATS](docs/D3B_RMATS.md)
  [rMATS turbo v4.1.2](https://github.com/Xinglab/rmats-turbo) Computational tool to detect differential alternative splicing events from RNA-Seq data.
  ### [T1k](docs/T1K_README.md)
  [T1k v1.0.5](https://github.com/mourisl/T1K/) Genotype highly polymorphic genes (e.g. HLA) with bulk RNA-seq data.

  ## Usage

  ### Runtime Estimates:
  Based on a test set of five input BAMs, CAVATICA compute and storage estimates:
   - Typical 2 hour run time, 10 hours is a higher end possibility
   - Cost:
     - Pure spot instances with no terminations: $2.37 mean
     - Pure on-demand: $5.19 mean
     - Warning: If spot instance kill rate is high, especially for `c5.9xlarge` instance type, the cost could end up greater than on-demand
   - Storage:
     - Total output size 6GB mean
     - Storage estimate ~ $0.14 per month

  ### Inputs common:
  ```yaml
  inputs:
    output_basename: { type: 'string?', doc: "String to use as basename for outputs. Will use read1 file basename if null." }
    reads1: { type: File, doc: "Input fastq file, gzipped or uncompressed OR alignment file" }
    reads2: { type: 'File?', doc: "If paired end, R2 reads files, gzipped or uncompressed" }

    is_paired_end: {type: 'boolean?', doc: "For BAM inputs, are the reads paired end?"}
    wf_strand_param: { type: ['null', {type: 'enum', name: wf_strand_param, symbols: ["default",
            "rf-stranded", "fr-stranded"]}], doc: "use 'default' for unstranded/auto, 'rf-stranded' if read1 in the fastq read pairs is reverse complement to the transcript, 'fr-stranded' if read1 same sense as transcript" }
    gtf_anno: { type: 'File', doc: "General transfer format (gtf) file with gene models corresponding to fasta reference" }
    star_fusion_genome_untar_path: {type: 'string?', doc: "This is what the path will be when genome_tar is unpackaged", default: "GRCh38_v39_CTAT_lib_Mar242022.CUSTOM"}
    reference_fasta: {type: 'File', doc: "GRCh38.primary_assembly.genome.fa", "sbg:suggestedValue": {
      class: File, path: 5f500135e4b0370371c051b4, name: GRCh38.primary_assembly.genome.fa,
      secondaryFiles: [{class: File, path: 62866da14d85bc2e02ba52db, name: GRCh38.primary_assembly.genome.fa.fai}]},
    secondaryFiles: ['.fai']}

  ```

  ### Alignment (SAM/BAM/CRAM) input-specific:
  ```yaml
  inputs:
    reads1: File
  ```

  ### PE Fastq input-specific:
  ```yaml
  inputs:
    reads1: File
    reads2: File
  ```

  ### SE Fastq input-specific:
  ```yaml
  inputs:
    reads1: File
  ```

  ### Samtools fastq:
  ```yaml
  samtools_fastq_cores: { type: 'int?', doc: "Num cores for align2fastq conversion, if input is an alignment file", default: 16 }
  cram_reference: { type: 'File?', secondaryFiles: [.fai], doc: "If the input alignment is CRAM and you are uncertain that all contigs are registered at http://www.ebi.ac.uk/ena/cram/md5/, provide the reference here" }
  ```
  ### cutadapt:
  ```yaml
  r1_adapter: { type: 'string?', doc: "Optional input. If the input reads have already been trimmed, leave these as null. If they do need trimming, supply the adapters." }
  r2_adapter: { type: 'string?', doc: "Optional input. If the input reads have already been trimmed, leave these as null. If they do need trimming, supply the adapters." }
  min_len: { type: 'int?', doc: "If you do not use this option, reads that have a length of zero (empty reads) are kept in the output", default: 20 }
  quality_base: { type: 'int?', doc: "Phred scale used", default: 33 }
  quality_cutoff: {type: 'int[]?', doc: "Quality trim cutoff, see https://cutadapt.readthedocs.io/en/v3.4/guide.html#quality-trimming for how 5' 3' is handled" }

  ```
  ### STAR:
  This section may seem overwhelming.
  Many defaults are set.
  Kids First favors setting/overriding defaults with the "arriba-heavy" settings specified in the [STAR docs](docs/STAR_2.7.10a.md); however, if the sample is not a tumor sample, the GTEx settings are preferred.
  ```yaml
    outSAMattrRGline: {type: 'string?', doc: "Suggested setting format is: ID:sample_name LB:aliquot_id PL:platform SM:BSID for example ID:7316-242 LB:750189 PL:ILLUMINA SM:BS_W72364MN. STAR will automatically convert unquoted spaces into tabs. If you wish to have a value with whitespace, the KEY:VALUE must be enclosed in double quotes. Refer to the STAR documentation for complete input details. If not provided, value will be autogenerated based on the reads1 file basename."}
    STARgenome: {type: File, doc: "Tar gzipped reference that will be unzipped at run time", "sbg:suggestedValue": {class: File, path: 62853e7ad63f7c6d8d7ae5a7,
        name: STAR_2.7.10a_GENCODE39.tar.gz}}
    runThreadN: {type: 'int?', default: 36, doc: "Adjust this value to change number of cores used."}
    twopassMode: {type: ['null', {type: enum, name: twopassMode, symbols: ["Basic",
            "None"]}], default: "Basic", doc: "Enable two pass mode to detect novel splice events. Default is basic (on)."}
    alignSJoverhangMin: {type: 'int?', default: 8, doc: "minimum overhang for unannotated junctions. ENCODE default used."}
    outFilterMismatchNoverLmax: {type: 'float?', default: 0.1, doc: "alignment will be output only if its ratio of mismatches to *mapped* length is less than or equal to this value"}
    outFilterType: {type: ['null', {type: enum, name: outFilterType, symbols: ["BySJout",
            "Normal"]}], default: "BySJout", doc: "type of filtering. Normal: standard filtering using only current alignment. BySJout (default): keep only those reads that contain junctions that passed filtering into SJ.out.tab."}
    outFilterScoreMinOverLread: {type: 'float?', default: 0.33, doc: "alignment will be output only if its score is higher than or equal to this value, normalized to read length (sum of mate's lengths for paired-end reads)"}
    outFilterMatchNminOverLread: {type: 'float?', default: 0.33, doc: "alignment will be output only if the number of matched bases is higher than or equal to this value, normalized to the read length (sum of mates' lengths for paired-end reads)"}
    outReadsUnmapped: {type: ['null', {type: enum, name: outReadsUnmapped, symbols: [
            "None", "Fastx"]}], default: "None", doc: "output of unmapped and partially mapped (i.e. mapped only one mate of a paired end read) reads in separate file(s). None (default): no output. Fastx: output in separate fasta/fastq files, Unmapped.out.mate1/2."}
    limitSjdbInsertNsj: {type: 'int?', default: 1200000, doc: "maximum number of junction to be inserted to the genome on the fly at the mapping stage, including those from annotations and those detected in the 1st step of the 2-pass run"}
    outSAMstrandField: {type: ['null', {type: enum, name: outSAMstrandField, symbols: [
            "intronMotif", "None"]}], default: "intronMotif", doc: "Cufflinks-like strand field flag. None: not used. intronMotif (default): strand derived from the intron motif. This option changes the output alignments: reads with inconsistent and/or non-canonical introns are filtered out."}
    outFilterIntronMotifs: {type: ['null', {type: enum, name: outFilterIntronMotifs,
          symbols: ["None", "RemoveNoncanonical", "RemoveNoncanonicalUnannotated"]}],
      default: "None", doc: "filter alignment using their motifs. None (default): no filtering. RemoveNoncanonical: filter out alignments that contain non-canonical junctions RemoveNoncanonicalUnannotated: filter out alignments that contain non-canonical unannotated junctions when using annotated splice junctions database. The annotated non-canonical junctions will be kept."}
    alignSoftClipAtReferenceEnds: {type: ['null', {type: enum, name: alignSoftClipAtReferenceEnds,
          symbols: ["Yes", "No"]}], default: "Yes", doc: "allow the soft-clipping of the alignments past the end of the chromosomes. Yes (default): allow. No: prohibit, useful for compatibility with Cufflinks"}
    quantMode: {type: ['null', {type: enum, name: quantMode, symbols: [TranscriptomeSAM
              GeneCounts, '-', TranscriptomeSAM, GeneCounts]}], default: TranscriptomeSAM
        GeneCounts, doc: "types of quantification requested. -: none. TranscriptomeSAM: output SAM/BAM alignments to transcriptome into a separate file GeneCounts: count reads per gene. Choices are additive, so default is 'TranscriptomeSAM GeneCounts'"}
    outSAMtype: {type: ['null', {type: enum, name: outSAMtype, symbols: ["BAM Unsorted",
            "None", "BAM SortedByCoordinate", "SAM Unsorted", "SAM SortedByCoordinate"]}],
      default: "BAM Unsorted", doc: "type of SAM/BAM output. None: no SAM/BAM output. Otherwise, first word is output type (BAM or SAM), second is sort type (Unsorted or SortedByCoordinate)"}
    outSAMunmapped: {type: ['null', {type: enum, name: outSAMunmapped, symbols: ["Within",
            "None", "Within KeepPairs"]}], default: "Within", doc: "output of unmapped reads in the SAM format. None: no output. Within (default): output unmapped reads within the main SAM file (i.e. Aligned.out.sam) Within KeepPairs: record unmapped mate for each alignment, and, in case of unsorted output, keep it adjacent to its mapped mate. Only affects multi-mapping reads"}
    genomeLoad: {type: ['null', {type: enum, name: genomeLoad, symbols: ["NoSharedMemory",
            "LoadAndKeep", "LoadAndRemove", "LoadAndExit"]}], default: "NoSharedMemory",
      doc: "mode of shared memory usage for the genome file. In this context, the default value makes the most sense, the others are there as a courtesy."}
    chimMainSegmentMultNmax: {type: 'int?', default: 1, doc: "maximum number of multi-alignments for the main chimeric segment. =1 will prohibit multimapping main segments"}
    outSAMattributes: {type: 'string?', default: 'NH HI AS nM NM ch', doc: "a string of desired SAM attributes, in the order desired for the output SAM. Tags can be listed in any combination/order. Please refer to the STAR manual, as there are numerous combinations: https://raw.githubusercontent.com/alexdobin/star_2.7.10a/master/doc/STARmanual.pdf"}
    alignInsertionFlush: {type: ['null', {type: enum, name: alignInsertionFlush, symbols: [
            "None", "Right"]}], default: "None", doc: "how to flush ambiguous insertion positions. None (default): insertions not flushed. Right: insertions flushed to the right. STAR Fusion recommended (SF)"}
    alignIntronMax: {type: 'int?', default: 1000000, doc: "maximum intron size. SF recommends 100000"}
    alignMatesGapMax: {type: 'int?', default: 1000000, doc: "maximum genomic distance between mates, SF recommends 100000 to avoid readthru fusions within 100k"}
    alignSJDBoverhangMin: {type: 'int?', default: 1, doc: "minimum overhang for annotated junctions. SF recommends 10"}
    outFilterMismatchNmax: {type: 'int?', default: 999, doc: "maximum number of mismatches per pair, large number switches off this filter"}
    alignSJstitchMismatchNmax: {type: 'string?', default: "5 -1 5 5", doc: "maximum number of mismatches for stitching of the splice junctions. Value '5 -1 5 5' improves SF chimeric junctions, also recommended by arriba (AR)"}
    alignSplicedMateMapLmin: {type: 'int?', default: 0, doc: "minimum mapped length for a read mate that is spliced. SF recommends 30"}
    alignSplicedMateMapLminOverLmate: {type: 'float?', default: 0.5, doc: "alignSplicedMateMapLmin normalized to mate length. SF recommends 0, AR 0.5"}
    chimJunctionOverhangMin: {type: 'int?', default: 10, doc: "minimum overhang for a chimeric junction. SF recommends 8, AR 10"}
    chimMultimapNmax: {type: 'int?', default: 50, doc: "maximum number of chimeric multi-alignments. SF recommends 20, AR 50."}
    chimMultimapScoreRange: {type: 'int?', default: 1, doc: "the score range for multi-mapping chimeras below the best chimeric score. Only works with chimMultimapNmax > 1. SF recommends 3"}
    chimNonchimScoreDropMin: {type: 'int?', default: 20, doc: "int>=0: to trigger chimeric detection, the drop in the best non-chimeric alignment score with respect to the read length has to be greater than this value. SF recommends 10"}
    chimOutJunctionFormat: {type: 'int?', default: 1, doc: "formatting type for the Chimeric.out.junction file, value 1 REQUIRED for SF"}
    chimOutType: {type: ['null', {type: enum, name: chimOutType, symbols: ["Junctions SeparateSAMold WithinBAM SoftClip", "Junctions", "SeparateSAMold", "WithinBAM SoftClip", "WithinBAM HardClip", "Junctions SeparateSAMold", "Junctions WithinBAM SoftClip", "Junctions WithinBAM HardClip", "Junctions SeparateSAMold WithinBAM HardClip", "SeparateSAMold WithinBAM SoftClip", "SeparateSAMold WithinBAM HardClip"]}], default: "Junctions WithinBAM SoftClip", doc: "type of chimeric output. Args are additive, and defined as such - Junctions: Chimeric.out.junction. SeparateSAMold: output old SAM into separate Chimeric.out.sam file WithinBAM: output into main aligned BAM files (Aligned.*.bam). WithinBAM HardClip: hard-clipping in the CIGAR for supplemental chimeric alignments WithinBAM SoftClip:soft-clipping in the CIGAR for supplemental chimeric alignments"}
    chimScoreDropMax: {type: 'int?', default: 30, doc: "max drop (difference) of chimeric score (the sum of scores of all chimeric segments) from the read length. AR recommends 30"}
    chimScoreJunctionNonGTAG: {type: 'int?', default: -1, doc: "penalty for a non-GT/AG chimeric junction. default -1, SF recommends -4, AR -1"}
    chimScoreSeparation: {type: 'int?', default: 1, doc: "int>=0: minimum difference (separation) between the best chimeric score and the next one. AR recommends 1"}
    chimSegmentMin: {type: 'int?', default: 10, doc: "minimum length of chimeric segment length, if ==0, no chimeric output. REQUIRED for SF, 12 is their default, AR recommends 10"}
    chimSegmentReadGapMax: {type: 'int?', default: 3, doc: "maximum gap in the read sequence between chimeric segments. AR recommends 3"}
    outFilterMultimapNmax: {type: 'int?', default: 50, doc: "max number of multiple alignments allowed for a read: if exceeded, the read is considered unmapped. ENCODE value is default. AR recommends 50"}
    peOverlapMMp: {type: 'float?', default: 0.01, doc: "maximum proportion of mismatched bases in the overlap area. SF recommends 0.1"}
    peOverlapNbasesMin: {type: 'int?', default: 10, doc: "minimum number of overlap bases to trigger mates merging and realignment. Specify >0 value to switch on the 'merging of overlapping mates' algorithm. SF recommends 12, AR recommends 10"}
  ```
  ### arriba:
  ```yaml
    arriba_memory: {type: 'int?', doc: "Mem intensive tool. Set in GB", default: 64}
  ```
  ### STAR Fusion:
  ```yaml
    FusionGenome: {type: 'File', doc: "STAR-Fusion CTAT Genome lib", "sbg:suggestedValue": {
        class: File, path: 62853e7ad63f7c6d8d7ae5a8, name: GRCh38_v39_CTAT_lib_Mar242022.CUSTOM.tar.gz}}
    compress_chimeric_junction: {type: 'boolean?', default: true, doc: 'If part of a
        workflow, recommend compressing this file as final output'}
  ```
  ### RNAseQC:
  ```yaml
    RNAseQC_GTF: {type: 'File', doc: "gtf file from `gtf_anno` that has been collapsed GTEx-style", "sbg:suggestedValue": {class: File, path: 62853e7ad63f7c6d8d7ae5a3,
        name: gencode.v39.primary_assembly.rnaseqc.stranded.gtf}}
  ```
  ### kallisto
  ```yaml
    kallisto_idx: {type: 'File', doc: "Specialized index of a **transcriptome** fasta file for kallisto", "sbg:suggestedValue": {class: File, path: 62853e7ad63f7c6d8d7ae5a6,
        name: RSEM_GENCODE39.transcripts.kallisto.idx}}
  ```
  ### RSEM:
  ```yaml
    RSEMgenome: {type: 'File', doc: "RSEM reference tar ball", "sbg:suggestedValue": {
        class: File, path: 62853e7ad63f7c6d8d7ae5a5, name: RSEM_GENCODE39.tar.gz}}
    estimate_rspd: {type: 'boolean?', doc: "Set this option if you want to estimate the read start position distribution (RSPD) from data", default: true}
  ```
  ### annoFuse:
  ```yaml
    sample_name: {type: 'string?', doc: "Sample ID of the input reads. If not provided, will use reads1 file basename."}
    annofuse_col_num: {type: 'int?', doc: "0-based column number in file of fusion name."}
    fusion_annotator_ref: { type: 'File', doc: "Tar ball with fusion_annot_lib.idx and blast_pairs.idx from STAR-Fusion CTAT Genome lib. Can be same as FusionGenome, but only two files needed from that package", "sbg:suggestedValue": { class: 'File', path: '63cff818facdd82011c8d6fe', name: 'GRCh38_v39_fusion_annot_custom.tar.gz' }}
  ```
  ### rmats
  ```yaml
    rmats_variable_read_length: {type: 'boolean?', doc: "Allow reads with lengths that differ from --readLength to be processed. --readLength will still be used to determine IncFormLen and SkipFormLen."}
    rmats_novel_splice_sites: {type: 'boolean?', doc: "Select for novel splice site detection or unannotated splice sites. 'true' to detect or add this parameter, 'false' to disable denovo detection. Tool Default: false"}
    rmats_stat_off: {type: 'boolean?', doc: "Select to skip statistical analysis, either between two groups or on single sample group. 'true' to add this parameter. Tool default: false"}
    rmats_allow_clipping: {type: 'boolean?', doc: "Allow alignments with soft or hard clipping to be used."}
    rmats_threads: {type: 'int?', doc: "Threads to allocate to RMATs."}
    rmats_ram: {type: 'int?', doc: "GB of RAM to allocate to RMATs."}
  ```

  ### T1k
  ```yaml
    run_t1k: { type: 'boolean?', default: true, doc: "Set to false to disable T1k HLA typing" }
    hla_rna_ref_seqs: { type: 'File?', doc: "FASTA file containing the HLA allele reference sequences for RNA." }
    hla_rna_gene_coords: { type: 'File?', doc: "FASTA file containing the coordinates of the HLA genes for RNA." }
  ```

  ### Run:

  1) Reads inputs:

  - For PE fastq input, please enter the reads 1 file in `reads1` and the reads 2 file in `reads2`.
  - For SE fastq input, enter the single ends reads file in `reads1` and leave `reads2` empty as it is optional.
  - For alignment input (SAM/BAM/CRAM), please enter the reads file in `reads1` and leave `reads2` empty as it is optional.
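
  As a minimal sketch of the cases above, a `cwltool` inputs file might set the reads as follows. The file paths are placeholders rather than files shipped with this workflow, and the other required inputs (references and indexes) are omitted for brevity.

  ```yaml
  # PE fastq input: both reads1 and reads2 are set (paths are hypothetical)
  reads1:
    class: File
    path: /path/to/sample_R1.fastq.gz
  reads2:
    class: File
    path: /path/to/sample_R2.fastq.gz

  # SE fastq or alignment (SAM/BAM/CRAM) input: set reads1 only and leave reads2 out, e.g.
  # reads1:
  #   class: File
  #   path: /path/to/sample.bam
  # is_paired_end: true   # for BAM inputs, states whether the reads are paired end
  ```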

  2) `r1_adapter` and `r2_adapter` are OPTIONAL:

  - If the input reads have already been trimmed, leave these as null and the cutadapt step will simply pass the fastq files on to STAR.
  - If they do need trimming, supply the adapters and the cutadapt step will trim the reads and pass the trimmed fastqs along.
  - `min_len` applies if adapters are trimmed and is currently set to a minimum of `20` bp. Change this as you see fit.
  - `quality_base` is set to phred scale `33` by default if trimming. There was a time when `64` was used; change this if your data differ.
  - `quality_cutoff` applies if adapters are trimmed and you want to set a minimum base quality. A single value will apply to both paired ends; two values will allow you to assign a different one to each (unusual).
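
  A trimming setup in the inputs file could look like the sketch below. The adapter sequence and the quality cutoff are illustrative assumptions, not recommendations from this workflow; use the adapters that match your library preparation.

  ```yaml
  # Hypothetical trimming settings; leave r1_adapter/r2_adapter out entirely for pre-trimmed reads
  r1_adapter: AGATCGGAAGAGC   # example only: common prefix of Illumina adapters
  r2_adapter: AGATCGGAAGAGC   # example only
  min_len: 20                 # workflow default
  quality_base: 33            # workflow default
  quality_cutoff: [20]        # illustrative value; the workflow sets no default
  ```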

  3) `wf_strand_param` is now *optional* as the workflow will try to determine strandedness for you. Note: if the workflow fails to detect a strandedness, it will fail. If you'd like to override autodetect, it is a workflow convenience param so that, if you input the following, the equivalent will propagate to the four tools that use that parameter:

  - `default`: 'rsem_std': null, 'kallisto_std': null, 'rnaseqc_std': null, 'arriba_std': null. This means unstranded or auto in the case of arriba.
  - `rf-stranded`: 'rsem_std': 0, 'kallisto_std': 'rf-stranded', 'rnaseqc_std': 'rf', 'arriba_std': 'reverse'.  This means if read1 in the input fastq/bam is reverse complement to the transcript that it maps to.
  - `fr-stranded`: 'rsem_std': 1, 'kallisto_std': 'fr-stranded', 'rnaseqc_std': 'fr', 'arriba_std': 'yes'. This means if read1 in the input fastq/bam is the same sense (maps 5' to 3') to the transcript that it maps to.
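
  For example, to override autodetection for a library where read1 is the reverse complement of the transcript, the inputs file would only need the single workflow-level parameter; the per-tool values in the comment simply restate the list above.

  ```yaml
  # Override strandedness autodetection. Per the list above this propagates as:
  #   rsem_std: 0, kallisto_std: rf-stranded, rnaseqc_std: rf, arriba_std: reverse
  wf_strand_param: rf-stranded
  ```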

  4) Suggested STAR `outSAMattrRGline` format is `ID:sample_name LB:aliquot_id PL:platform SM:BSID`:

  For example, `ID:7316-242 LB:750189 PL:ILLUMINA SM:BS_W72364MN`

  These `KEY:VALUE` fields can be separated by either a whitespace or tab
  character. Any unquoted whitespace will be automatically converted to a tab
  value by STAR. If you wish to include whitespaces in your `VALUE`, you must put
  double quotes around the `KEY:VALUE`. For example if you wanted a `DS` key with a
  `I love read groups` value, the entry would look like: `ID:xxx "DS:I love read
  groups"`. See the STAR documentation on `outSAMattrRGline` for complete details.
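
  In a YAML inputs file, the quoting described above can be kept intact by wrapping the whole value in single quotes. This is a sketch using the example values from this section; the single-quote choice is a YAML convenience, not a STAR requirement.

  ```yaml
  # Plain read group line, using the example values above
  outSAMattrRGline: 'ID:7316-242 LB:750189 PL:ILLUMINA SM:BS_W72364MN'

  # Alternative with a value containing whitespace: the whole KEY:VALUE is double quoted,
  # and the surrounding YAML single quotes preserve those double quotes for STAR
  # outSAMattrRGline: 'ID:xxx "DS:I love read groups"'
  ```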

  5) Suggested REFERENCE inputs are:

  - `reference_fasta`: [GRCh38.primary_assembly.genome.fa](https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_39/GRCh38.primary_assembly.genome.fa.gz), will need to unzip
  - `gtf_anno`: [gencode.v39.primary_assembly.annotation.gtf](https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_39/gencode.v39.primary_assembly.annotation.gtf.gz), will need to unzip
  - `FusionGenome`: GRCh38_v39_CTAT_lib_Mar242022.CUSTOM.tar.gz. A custom library built using instructions from (https://github.com/STAR-Fusion/STAR-Fusion/wiki/installing-star-fusion#preparing-the-genome-resource-lib), using GENCODE 39 reference.
  - `RNAseQC_GTF`: gencode.v39.primary_assembly.rnaseqc.stranded.gtf OR gencode.v39.primary_assembly.rnaseqc.unstranded.gtf, built using `gtf_anno` and following build instructions [here](https://github.com/broadinstitute/rnaseqc#usage) and [here](https://github.com/broadinstitute/gtex-pipeline/tree/master/gene_model)
  - `RSEMgenome`: RSEM_GENCODE39.tar.gz, built using the `reference_fasta` and `gtf_anno`, following `GENCODE` instructions from [here](https://deweylab.github.io/RSEM/README.html), then creating a tar ball of the results.
  - `STARgenome`: STAR_2.7.10a_GENCODE39.tar.gz, created using the star_2.7.10a_genome_generate.cwl tool, using the `reference_fasta`, `gtf_anno`, and setting `sjdbOverhang` to 100
  - `kallisto_idx`: RSEM_GENCODE39.transcripts.kallisto.idx, built from the RSEM GENCODE 39 transcript fasta in the `RSEMgenome` tar ball, following instructions from [here](https://pachterlab.github.io/kallisto/manual)
  - `hla_rna_ref_seqs`: hla_v3.43.0_gencode_v39_rna_seq.fa, created using https://github.com/mourisl/T1K/blob/master/t1k-build.pl with [hla.dat v3.43.0](http://ftp.ebi.ac.uk/pub/databases/ipd/imgt/hla/hla.dat) and [GENCODE v39 primary assembly GTF](https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_39/gencode.v39.primary_assembly.annotation.gtf.gz)
  - `hla_rna_gene_coords`: hla_v3.43.0_gencode_v39_rna_coord.fa, created using https://github.com/mourisl/T1K/blob/master/t1k-build.pl with [hla.dat v3.43.0](http://ftp.ebi.ac.uk/pub/databases/ipd/imgt/hla/hla.dat) and [GENCODE v39 primary assembly GTF](https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_39/gencode.v39.primary_assembly.annotation.gtf.gz)
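
  For a local `cwltool` run, the reference inputs listed above might be wired up as in the sketch below. The file names follow the suggestions in this section, while the `/path/to/refs` directory is a placeholder for wherever you have downloaded or built them.

  ```yaml
  # Reference inputs; only the directory is hypothetical
  reference_fasta:
    class: File
    path: /path/to/refs/GRCh38.primary_assembly.genome.fa
    secondaryFiles:
      - {class: File, path: /path/to/refs/GRCh38.primary_assembly.genome.fa.fai}
  gtf_anno: {class: File, path: /path/to/refs/gencode.v39.primary_assembly.annotation.gtf}
  FusionGenome: {class: File, path: /path/to/refs/GRCh38_v39_CTAT_lib_Mar242022.CUSTOM.tar.gz}
  RNAseQC_GTF: {class: File, path: /path/to/refs/gencode.v39.primary_assembly.rnaseqc.stranded.gtf}
  RSEMgenome: {class: File, path: /path/to/refs/RSEM_GENCODE39.tar.gz}
  STARgenome: {class: File, path: /path/to/refs/STAR_2.7.10a_GENCODE39.tar.gz}
  kallisto_idx: {class: File, path: /path/to/refs/RSEM_GENCODE39.transcripts.kallisto.idx}
  hla_rna_ref_seqs: {class: File, path: /path/to/refs/hla_v3.43.0_gencode_v39_rna_seq.fa}
  hla_rna_gene_coords: {class: File, path: /path/to/refs/hla_v3.43.0_gencode_v39_rna_coord.fa}
  ```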

  6) rMATS requires the length of the reads in the sample. This workflow will attempt to estimate the read length based on a polling of reads. If the user wishes to override this value they can set `read_length_median` to their desired read length. Additionally, there is a `rmats_variable_read_length` boolean that users can set if their reads are not uniform in length. This workflow will poll the reads and set that value to true if it observes multiple read lengths. Like read length, user-provided input will override this guess.
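
  If you prefer to set these yourself rather than rely on the estimate, the override is two lines in the inputs file. The values shown are purely illustrative.

  ```yaml
  # Illustrative values only; omit both to let the workflow estimate them from the reads
  read_length_median: 101
  rmats_variable_read_length: true
  ```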

  7) While `output_basename`, `sample_name`, and `outSAMattrRGline` are optional, it is strongly recommended that the user provide these values for data quality purposes. If the user does not provide these values, the basename of the reads1 file will be substituted in their place.

  - `output_basename` and `sample_name` values will become `reads1.basename.split('.')[0]`
  - `outSAMattrRGline` value will become `ID:reads1.basename.split('.')[0]_1 LB:reads1.basename.split('.')[0] SM:reads1.basename.split('.')[0] PL:Illumina`
  - Additionally, if no `outSAMattrRGline` input is provided a disclaimer will be added to the `@RG` header line that reads: `DS:Values for this read group were auto-generated and may not reflect the true read group information.`
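
  To make the substitution concrete, consider a hypothetical `reads1` file named `sample01.R1.fastq.gz`. Leaving all three inputs unset would be equivalent to supplying the values below, apart from the `DS` disclaimer that is added only to the auto-generated read group.

  ```yaml
  # Derived values for a hypothetical reads1 named sample01.R1.fastq.gz,
  # where reads1.basename.split('.')[0] evaluates to "sample01"
  output_basename: sample01
  sample_name: sample01
  outSAMattrRGline: 'ID:sample01_1 LB:sample01 SM:sample01 PL:Illumina'
  ```

  Supplying real identifiers explicitly looks the same, with your own values in place of `sample01`.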

  ### Outputs:
  ```yaml
    cutadapt_stats: {type: 'File?', outputSource: cutadapt_3-4/cutadapt_stats, doc: "Cutadapt stats output, only if adapter is supplied."}
    STAR_sorted_genomic_cram: {type: 'File', outputSource: samtools_bam_to_cram/output,
      doc: "STAR sorted and indexed genomic alignment cram"}
    STAR_chimeric_junctions: {type: 'File?', outputSource: star_fusion_1-10-1/chimeric_junction_compressed,
      doc: "STAR chimeric junctions"}
    STAR_gene_count: {type: 'File', outputSource: star_2-7-10a/gene_counts, doc: "STAR genecounts"}
    STAR_junctions_out: {type: 'File', outputSource: star_2-7-10a/junctions_out, doc: "STAR junction reads"}
    STAR_final_log: {type: 'File', outputSource: star_2-7-10a/log_final_out, doc: "STAR metrics log file of unique, multi-mapping, unmapped, and chimeric reads"}
    STAR-Fusion_results: {type: 'File', outputSource: star_fusion_1-10-1/abridged_coding,
      doc: "STAR fusion detection from chimeric reads"}
    arriba_fusion_results: {type: 'File', outputSource: arriba_fusion_2-2-1/arriba_fusions,
      doc: "Fusion output from Arriba"}
    arriba_fusion_viz: {type: 'File', outputSource: arriba_draw_2-2-1/arriba_pdf, doc: "pdf output from Arriba"}
    RSEM_isoform: {type: 'File', outputSource: rsem/isoform_out, doc: "RSEM isoform expression estimates"}
    RSEM_gene: {type: 'File', outputSource: rsem/gene_out, doc: "RSEM gene expression estimates"}
    RNASeQC_Metrics: {type: 'File', outputSource: rna_seqc/Metrics, doc: "Metrics on mapping, intronic, exonic rates, count information, etc"}
    RNASeQC_counts: {type: 'File', outputSource: supplemental/RNASeQC_counts, doc: "Contains gene tpm, gene read, and exon counts"}
    kallisto_Abundance: {type: 'File', outputSource: kallisto/abundance_out, doc: "Gene abundance output from STAR genomic bam file"}
    annofuse_filtered_fusions_tsv: {type: 'File?', outputSource: annofuse/annofuse_filtered_fusions_tsv,
      doc: "Filtered fusions called by annoFuse."}
    rmats_filtered_alternative_3_prime_splice_sites_jc: {type: 'File', outputSource: rmats/filtered_alternative_3_prime_splice_sites_jc,
      doc: "Alternative 3 prime splice sites JC.txt output from RMATs containing only those calls with 10 or more junction spanning read counts of support"}
    rmats_filtered_alternative_5_prime_splice_sites_jc: {type: 'File', outputSource: rmats/filtered_alternative_5_prime_splice_sites_jc,
      doc: "Alternative 5 prime splice sites JC.txt output from RMATs containing only those calls with 10 or more junction spanning read counts of support"}
    rmats_filtered_mutually_exclusive_exons_jc: {type: 'File', outputSource: rmats/filtered_mutually_exclusive_exons_jc,
      doc: "Mutually exclusive exons JC.txt output from RMATs containing only those calls with 10 or more junction spanning read counts of support"}
    rmats_filtered_retained_introns_jc: {type: 'File', outputSource: rmats/filtered_retained_introns_jc,
      doc: "Retained introns JC.txt output from RMATs containing only those calls with 10 or more junction spanning read counts of support"}
    rmats_filtered_skipped_exons_jc: {type: 'File', outputSource: rmats/filtered_skipped_exons_jc,
      doc: "Skipped exons JC.txt output from RMATs containing only those calls with 10 or more junction spanning read counts of support"}
    t1k_genotype_tsv: {type: 'File?', outputSource: t1k/genotype_tsv, doc: "Genotyping results from T1k" }
  ```

  ## Reference build notes:
   - STAR-Fusion reference built with command `/usr/local/STAR-Fusion/ctat-genome-lib-builder/prep_genome_lib.pl --gtf gencode.v39.primary_assembly.annotation.gtf --annot_filter_rule ../AnnotFilterRule.pm --CPU 36 --fusion_annot_lib ../fusion_lib.Mar2021.dat.gz --genome_fa ../GRCh38.primary_assembly.genome.fa --output_dir GRCh38_v39_CTAT_lib_Mar242022.CUSTOM --human_gencode_filter --pfam_db current --dfam_db human 2> build.errs > build.out &`
   - fusion_annotator_ref built by placing GRCh38_v39_CTAT_lib_Mar242022.CUSTOM/fusion_annot_lib.idx and GRCh38_v39_CTAT_lib_Mar242022.CUSTOM/blast_pairs.idx into its own tar ball
   - kallisto index built using RSEM `RSEM_GENCODE39.transcripts.fa` file as transcriptome fasta, using command: `kallisto index -i RSEM_GENCODE39.transcripts.kallisto.idx RSEM_GENCODE39.transcripts.fa`
   - RNA-SeQC reference built using [collapse gtf script](https://github.com/broadinstitute/gtex-pipeline/blob/master/gene_model/collapse_annotation.py)
     - Two references needed if data are stranded vs. unstranded
     - Flag `--collapse_only` used for stranded
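
  With the references above downloaded or built, a job file for a local `cwltool` run of a stranded, paired-end FASTQ sample might look like the sketch below.
  It is an illustration only: the sample file names, the `rf-stranded` library type, and the assumption that every reference sits in the working directory under its suggested name are hypothetical.

  ```yaml
  # Hypothetical job file; adjust every path to wherever your copies live
  reads1: {class: File, path: sample_R1.fastq.gz}
  reads2: {class: File, path: sample_R2.fastq.gz}
  sample_name: sample
  output_basename: sample
  wf_strand_param: rf-stranded
  # the .fai index is expected next to the fasta
  reference_fasta: {class: File, path: GRCh38.primary_assembly.genome.fa}
  gtf_anno: {class: File, path: gencode.v39.primary_assembly.annotation.gtf}
  STARgenome: {class: File, path: STAR_2.7.10a_GENCODE39.tar.gz}
  RSEMgenome: {class: File, path: RSEM_GENCODE39.tar.gz}
  kallisto_idx: {class: File, path: RSEM_GENCODE39.transcripts.kallisto.idx}
  FusionGenome: {class: File, path: GRCh38_v39_CTAT_lib_Mar242022.CUSTOM.tar.gz}
  fusion_annotator_ref: {class: File, path: GRCh38_v39_fusion_annot_custom.tar.gz}
  # stranded collapse (built with --collapse_only); swap in an unstranded collapse for unstranded data
  RNAseQC_GTF: {class: File, path: gencode.v39.primary_assembly.rnaseqc.stranded.gtf}
  # HLA typing switched off here to keep the example short
  run_t1k: false
  ```

  Saved under a name of your choosing, such a file is passed to `cwltool` as the argument after the workflow path.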
requirements:
- class: ScatterFeatureRequirement
- class: MultipleInputFeatureRequirement
- class: SubworkflowFeatureRequirement
- class: InlineJavascriptRequirement
- class: StepInputExpressionRequirement
inputs:
  # inputs shared by multiple tools
  reference_fasta: {type: 'File', doc: "GRCh38.primary_assembly.genome.fa", "sbg:suggestedValue": {class: File, path: 5f500135e4b0370371c051b4,
      name: GRCh38.primary_assembly.genome.fa, secondaryFiles: [{class: File, path: 62866da14d85bc2e02ba52db, name: GRCh38.primary_assembly.genome.fa.fai}]},
    secondaryFiles: ['.fai']}
  output_basename: {type: 'string?', doc: "String to use as basename for outputs. Will use read1 file basename if null"}
  reads1: {type: File, doc: "Input fastq file, gzipped or uncompressed OR alignment file"}
  reads2: {type: 'File?', doc: "If paired end, R2 reads files, gzipped or uncompressed"}
  is_paired_end: {type: 'boolean?', doc: "For BAM inputs, are the reads paired end?"}
  wf_strand_param: {type: ['null', {type: 'enum', name: wf_strand_param, symbols: ["default", "rf-stranded", "fr-stranded"]}], doc: "use
      'default' for unstranded/auto, 'rf-stranded' if read1 in the fastq read pairs is reverse complement to the transcript, 'fr-stranded'
      if read1 same sense as transcript"}
  gtf_anno: {type: 'File', doc: "General transfer format (gtf) file with gene models corresponding to fasta reference", "sbg:suggestedValue": {
      class: File, path: 62853e7ad63f7c6d8d7ae5a4, name: gencode.v39.primary_assembly.annotation.gtf}}
  star_fusion_genome_untar_path: {type: 'string?', doc: "This is what the path will be when genome_tar is unpackaged", default: "GRCh38_v39_CTAT_lib_Mar242022.CUSTOM"}
  read_length_median: {type: 'int?', doc: "The median read length for the reads provided."}
  read_length_stddev: {type: 'float?', doc: "Standard deviation of the read length for the reads provided."}
  samtools_fastq_cores: {type: 'int?', doc: "Num cores for align2fastq conversion, if input is an alignment file", default: 16}
  cram_reference: {type: 'File?', secondaryFiles: [.fai], doc: "If input align is cram and you are uncertain all contigs are registered
      at http://www.ebi.ac.uk/ena/cram/md5/, provide here"}
  r1_adapter: {type: 'string?', doc: "Optional input. If the input reads have already been trimmed, leave these as null. If they do
      need trimming, supply the adapters."}
  r2_adapter: {type: 'string?', doc: "Optional input. If the input reads have already been trimmed, leave these as null. If they do
      need trimming, supply the adapters."}
  min_len: {type: 'int?', doc: "If you do not use this option, reads that have a length of zero (empty reads) are kept in the output",
    default: 20}
  quality_base: {type: 'int?', doc: "Phred scale used", default: 33}
  quality_cutoff: {type: 'int[]?', doc: "Quality trim cutoff, see https://cutadapt.readthedocs.io/en/v3.4/guide.html#quality-trimming
      for how 5' 3' is handled"}
  outSAMattrRGline: {type: 'string?', doc: "Suggested setting format is: ID:sample_name LB:aliquot_id PL:platform SM:BSID for example
      ID:7316-242 LB:750189 PL:ILLUMINA SM:BS_W72364M N. STAR will automatically convert unquoted spaces into tabs. If you wish to
      have a value with whitespace, the KEY:VALUE must be enclosed in double quotes. Refer to the STAR documentation for complete
      input details. If not provided, value will be autogenerated based on the reads1 file basename."}
  STARgenome: {type: File, doc: "Tar gzipped reference that will be unzipped at run time", "sbg:suggestedValue": {class: File, path: 62853e7ad63f7c6d8d7ae5a7,
      name: STAR_2.7.10a_GENCODE39.tar.gz}}
  runThreadN: {type: 'int?', default: 36, doc: "Adjust this value to change number of cores used by STAR."}
  twopassMode: {type: ['null', {type: enum, name: twopassMode, symbols: ["Basic", "None"]}], default: "Basic", doc: "Enable two pass
      mode to detect novel splice events. Default is basic (on)."}
  alignSJoverhangMin: {type: 'int?', default: 8, doc: "minimum overhang for unannotated junctions. ENCODE default used."}
  outFilterMismatchNoverLmax: {type: 'float?', default: 0.1, doc: "alignment will be output only if its ratio of mismatches to *mapped*
      length is less than or equal to this value"}
  outFilterType: {type: ['null', {type: enum, name: outFilterType, symbols: ["BySJout", "Normal"]}], default: "BySJout", doc: "type
      of filtering. Normal: standard filtering using only current alignment. BySJout (default): keep only those reads that contain
      junctions that passed filtering into SJ.out.tab."}
  outFilterScoreMinOverLread: {type: 'float?', default: 0.33, doc: "alignment will be output only if its score is higher than or equal
      to this value, normalized to read length (sum of mates' lengths for paired-end reads)"}
  outFilterMatchNminOverLread: {type: 'float?', default: 0.33, doc: "alignment will be output only if the number of matched bases
      is higher than or equal to this value, normalized to the read length (sum of mates' lengths for paired-end reads)"}
  outReadsUnmapped: {type: ['null', {type: enum, name: outReadsUnmapped, symbols: ["None", "Fastx"]}], default: "None", doc: "output
      of unmapped and partially mapped (i.e. mapped only one mate of a paired end read) reads in separate file(s). None (default):
      no output. Fastx: output in separate fasta/fastq files, Unmapped.out.mate1/2."}
  limitSjdbInsertNsj: {type: 'int?', default: 1200000, doc: "maximum number of junctions to be inserted to the genome on the fly at
      the mapping stage, including those from annotations and those detected in the 1st step of the 2-pass run"}
  outSAMstrandField: {type: ['null', {type: enum, name: outSAMstrandField, symbols: ["intronMotif", "None"]}], default: "intronMotif",
    doc: "Cufflinks-like strand field flag. None: not used. intronMotif (default): strand derived from the intron motif. This option
      changes the output alignments: reads with inconsistent and/or non-canonical introns are filtered out."}
  outFilterIntronMotifs: {type: ['null', {type: enum, name: outFilterIntronMotifs, symbols: ["None", "RemoveNoncanonical", "RemoveNoncanonicalUnannotated"]}],
    default: "None", doc: "filter alignments using their motifs. None (default): no filtering. RemoveNoncanonical: filter out alignments
      that contain non-canonical junctions. RemoveNoncanonicalUnannotated: filter out alignments that contain non-canonical unannotated
      junctions when using annotated splice junctions database. The annotated non-canonical junctions will be kept."}
  alignSoftClipAtReferenceEnds: {type: ['null', {type: enum, name: alignSoftClipAtReferenceEnds, symbols: ["Yes", "No"]}], default: "Yes",
    doc: "allow the soft-clipping of the alignments past the end of the chromosomes. Yes (default): allow. No: prohibit, useful for
      compatibility with Cufflinks"}
  quantMode: {type: ['null', {type: enum, name: quantMode, symbols: [TranscriptomeSAM GeneCounts, '-', TranscriptomeSAM, GeneCounts]}],
    default: TranscriptomeSAM GeneCounts, doc: "types of quantification requested. -: none. TranscriptomeSAM: output SAM/BAM alignments
      to transcriptome into a separate file. GeneCounts: count reads per gene. Choices are additive, so default is 'TranscriptomeSAM
      GeneCounts'"}
  outSAMtype: {type: ['null', {type: enum, name: outSAMtype, symbols: ["BAM Unsorted", "None", "BAM SortedByCoordinate", "SAM Unsorted",
          "SAM SortedByCoordinate"]}], default: "BAM Unsorted", doc: "type of SAM/BAM output. None: no SAM/BAM output. Otherwise,
      first word is output type (BAM or SAM), second is sort type (Unsorted or SortedByCoordinate)"}
  outSAMunmapped: {type: ['null', {type: enum, name: outSAMunmapped, symbols: ["Within", "None", "Within KeepPairs"]}], default: "Within",
    doc: "output of unmapped reads in the SAM format. None: no output. Within (default): output unmapped reads within the main SAM
      file (i.e. Aligned.out.sam). Within KeepPairs: record unmapped mate for each alignment, and, in case of unsorted output, keep
      it adjacent to its mapped mate. Only affects multi-mapping reads"}
  genomeLoad: {type: ['null', {type: enum, name: genomeLoad, symbols: ["NoSharedMemory", "LoadAndKeep", "LoadAndRemove", "LoadAndExit"]}],
    default: "NoSharedMemory", doc: "mode of shared memory usage for the genome file. In this context, the default value makes the
      most sense, the others are there as a courtesy."}
  chimMainSegmentMultNmax: {type: 'int?', default: 1, doc: "maximum number of multi-alignments for the main chimeric segment. =1 will
      prohibit multimapping main segments"}
  outSAMattributes: {type: 'string?', default: 'NH HI AS nM NM ch', doc: "a string of desired SAM attributes, in the order desired
      for the output SAM. Tags can be listed in any combination/order. Please refer to the STAR manual, as there are numerous combinations:
      https://raw.githubusercontent.com/alexdobin/star_2.7.10a/master/doc/STARmanual.pdf"}
  alignInsertionFlush: {type: ['null', {type: enum, name: alignInsertionFlush, symbols: ["None", "Right"]}], default: "None", doc: "how
      to flush ambiguous insertion positions. None (default): insertions not flushed. Right: insertions flushed to the right. STAR
      Fusion recommended (SF)"}
  alignIntronMax: {type: 'int?', default: 1000000, doc: "maximum intron size. SF recommends 100000"}
  alignMatesGapMax: {type: 'int?', default: 1000000, doc: "maximum genomic distance between mates, SF recommends 100000 to avoid readthru
      fusions within 100k"}
  alignSJDBoverhangMin: {type: 'int?', default: 1, doc: "minimum overhang for annotated junctions. SF recommends 10"}
  outFilterMismatchNmax: {type: 'int?', default: 999, doc: "maximum number of mismatches per pair, large number switches off this
      filter"}
  alignSJstitchMismatchNmax: {type: 'string?', default: "5 -1 5 5", doc: "maximum number of mismatches for stitching of the splice
      junctions. Value '5 -1 5 5' improves SF chimeric junctions, also recommended by arriba (AR)"}
  alignSplicedMateMapLmin: {type: 'int?', default: 0, doc: "minimum mapped length for a read mate that is spliced. SF recommends 30"}
  alignSplicedMateMapLminOverLmate: {type: 'float?', default: 0.5, doc: "alignSplicedMateMapLmin normalized to mate length. SF recommends
      0, AR 0.5"}
  chimJunctionOverhangMin: {type: 'int?', default: 10, doc: "minimum overhang for a chimeric junction. SF recommends 8, AR 10"}
  chimMultimapNmax: {type: 'int?', default: 50, doc: "maximum number of chimeric multi-alignments. SF recommends 20, AR 50."}
  chimMultimapScoreRange: {type: 'int?', default: 1, doc: "the score range for multi-mapping chimeras below the best chimeric score.
      Only works with chimMultimapNmax > 1. SF recommends 3"}
  chimNonchimScoreDropMin: {type: 'int?', default: 20, doc: "int>=0: to trigger chimeric detection, the drop in the best non-chimeric
      alignment score with respect to the read length has to be greater than this value. SF recommends 10"}
  chimOutJunctionFormat: {type: 'int?', default: 1, doc: "formatting type for the Chimeric.out.junction file, value 1 REQUIRED for
      SF"}
  chimOutType: {type: ['null', {type: enum, name: chimOutType, symbols: ["Junctions SeparateSAMold WithinBAM SoftClip", "Junctions",
          "SeparateSAMold", "WithinBAM SoftClip", "WithinBAM HardClip", "Junctions SeparateSAMold", "Junctions WithinBAM SoftClip",
          "Junctions WithinBAM HardClip", "Junctions SeparateSAMold WithinBAM HardClip", "SeparateSAMold WithinBAM SoftClip", "SeparateSAMold
            WithinBAM HardClip"]}], default: "Junctions WithinBAM SoftClip", doc: "type of chimeric output. Args are additive, and
      defined as such - Junctions: Chimeric.out.junction. SeparateSAMold: output old SAM into separate Chimeric.out.sam file. WithinBAM:
      output into main aligned BAM files (Aligned.*.bam). WithinBAM HardClip: hard-clipping in the CIGAR for supplemental chimeric
      alignments. WithinBAM SoftClip: soft-clipping in the CIGAR for supplemental chimeric alignments"}
  chimScoreDropMax: {type: 'int?', default: 30, doc: "max drop (difference) of chimeric score (the sum of scores of all chimeric segments)
      from the read length. AR recommends 30"}
  chimScoreJunctionNonGTAG: {type: 'int?', default: -1, doc: "penalty for a non-GT/AG chimeric junction. default -1, SF recommends
      -4, AR -1"}
  chimScoreSeparation: {type: 'int?', default: 1, doc: "int>=0: minimum difference (separation) between the best chimeric score and
      the next one. AR recommends 1"}
  chimSegmentMin: {type: 'int?', default: 10, doc: "minimum length of chimeric segment, if ==0, no chimeric output. REQUIRED
      for SF, 12 is their default, AR recommends 10"}
  chimSegmentReadGapMax: {type: 'int?', default: 3, doc: "maximum gap in the read sequence between chimeric segments. AR recommends
      3"}
  outFilterMultimapNmax: {type: 'int?', default: 50, doc: "max number of multiple alignments allowed for a read: if exceeded, the
      read is considered unmapped. ENCODE value is default. AR recommends 50"}
  peOverlapMMp: {type: 'float?', default: 0.01, doc: "maximum proportion of mismatched bases in the overlap area. SF recommends 0.1"}
  peOverlapNbasesMin: {type: 'int?', default: 10, doc: "minimum number of overlap bases to trigger mates merging and realignment.
      Specify >0 value to switch on the 'merging of overlapping mates' algorithm. SF recommends 12, AR recommends 10"}
  arriba_memory: {type: 'int?', doc: "Mem intensive tool. Set in GB", default: 64}
  FusionGenome: {type: 'File', doc: "STAR-Fusion CTAT Genome lib", "sbg:suggestedValue": {class: File, path: 62853e7ad63f7c6d8d7ae5a8,
      name: GRCh38_v39_CTAT_lib_Mar242022.CUSTOM.tar.gz}}
  compress_chimeric_junction: {type: 'boolean?', default: true, doc: 'If part of a workflow, recommend compressing this file as final
      output'}
  RNAseQC_GTF: {type: 'File', doc: "gtf file from `gtf_anno` that has been collapsed GTEx-style", "sbg:suggestedValue": {class: File,
      path: 62853e7ad63f7c6d8d7ae5a3, name: gencode.v39.primary_assembly.rnaseqc.stranded.gtf}}
  kallisto_idx: {type: 'File', doc: "Specialized index of a **transcriptome** fasta file for kallisto", "sbg:suggestedValue": {class: File,
      path: 62853e7ad63f7c6d8d7ae5a6, name: RSEM_GENCODE39.transcripts.kallisto.idx}}
  RSEMgenome: {type: 'File', doc: "RSEM reference tar ball", "sbg:suggestedValue": {class: File, path: 62853e7ad63f7c6d8d7ae5a5, name: RSEM_GENCODE39.tar.gz}}
  estimate_rspd: {type: 'boolean?', doc: "Set this option if you want to estimate the read start position distribution (RSPD) from
      data", default: true}
  sample_name: {type: 'string?', doc: "Sample ID of the input reads. If not provided, will use reads1 file basename."}
  annofuse_col_num: {type: 'int?', doc: "0-based column number in file of fusion name.", default: 30}
  fusion_annotator_ref: {type: 'File', doc: "Tar ball with fusion_annot_lib.idx and blast_pairs.idx from STAR-Fusion CTAT Genome lib.
      Can be same as FusionGenome, but only two files needed from that package", "sbg:suggestedValue": {class: 'File', path: '63cff818facdd82011c8d6fe',
      name: 'GRCh38_v39_fusion_annot_custom.tar.gz'}}
  rmats_variable_read_length: {type: 'boolean?', doc: "Allow reads with lengths that differ from --readLength to be processed. --readLength
      will still be used to determine IncFormLen and SkipFormLen."}
  rmats_novel_splice_sites: {type: 'boolean?', doc: "Select for novel splice site detection or unannotated splice sites. 'true' to
      detect or add this parameter, 'false' to disable denovo detection. Tool Default: false"}
  rmats_stat_off: {type: 'boolean?', doc: "Select to skip statistical analysis, either between two groups or on single sample group.
      'true' to add this parameter. Tool default: false"}
  rmats_allow_clipping: {type: 'boolean?', doc: "Allow alignments with soft or hard clipping to be used."}
  rmats_threads: {type: 'int?', doc: "Threads to allocate to RMATs."}
  rmats_ram: {type: 'int?', doc: "GB of RAM to allocate to RMATs."}
  run_t1k: {type: 'boolean?', default: true, doc: "Set to false to disable T1k HLA typing"}
  hla_rna_ref_seqs: {type: 'File?', doc: "FASTA file containing the HLA allele reference sequences for RNA.", "sbg:suggestedValue": {
      class: File, path: 6669ac8127374715fc3ba3c3, name: hla_v3.43.0_gencode_v39_rna_seq.fa}}
  hla_rna_gene_coords: {type: 'File?', doc: "FASTA file containing the coordinates of the HLA genes for RNA.", "sbg:suggestedValue": {
      class: File, path: 6669ac8127374715fc3ba3c1, name: hla_v3.43.0_gencode_v39_rna_coord.fa}}
  t1k_abnormal_unmap_flag: {type: 'boolean?', doc: "Set if the flag in BAM for the unmapped read-pair is nonconcordant"}
  t1k_ram: {type: 'int?', doc: "GB of RAM to allocate to T1k." }
outputs:
  cutadapt_stats: {type: 'File?', outputSource: cutadapt_3-4/cutadapt_stats, doc: "Cutadapt stats output, only if adapter is supplied."}
  STAR_sorted_genomic_cram: {type: 'File', outputSource: samtools_bam_to_cram/output, doc: "STAR sorted and indexed genomic alignment
      cram"}
  STAR_chimeric_junctions: {type: 'File?', outputSource: star_fusion_1-10-1/chimeric_junction_compressed, doc: "STAR chimeric junctions"}
  STAR_gene_count: {type: 'File', outputSource: star_2-7-10a/gene_counts, doc: "STAR gene counts"}
  STAR_junctions_out: {type: 'File', outputSource: star_2-7-10a/junctions_out, doc: "STAR junction reads"}
  STAR_final_log: {type: 'File', outputSource: star_2-7-10a/log_final_out, doc: "STAR metrics log file of unique, multi-mapping, unmapped,
      and chimeric reads"}
  STAR-Fusion_results: {type: 'File', outputSource: star_fusion_1-10-1/abridged_coding, doc: "STAR fusion detection from chimeric
      reads"}
  arriba_fusion_results: {type: 'File', outputSource: arriba_fusion_2-2-1/arriba_fusions, doc: "Fusion output from Arriba"}
  arriba_fusion_viz: {type: 'File', outputSource: arriba_draw_2-2-1/arriba_pdf, doc: "pdf output from Arriba"}
  RSEM_isoform: {type: 'File', outputSource: rsem/isoform_out, doc: "RSEM isoform expression estimates"}
  RSEM_gene: {type: 'File', outputSource: rsem/gene_out, doc: "RSEM gene expression estimates"}
  RNASeQC_Metrics: {type: 'File', outputSource: rna_seqc/Metrics, doc: "Metrics on mapping, intronic, exonic rates, count information,
      etc"}
  RNASeQC_counts: {type: 'File', outputSource: supplemental/RNASeQC_counts, doc: "Contains gene tpm, gene read, and exon counts"}
  kallisto_Abundance: {type: 'File', outputSource: kallisto/abundance_out, doc: "Abundance estimates from kallisto pseudoalignment of the input reads"}
  annofuse_filtered_fusions_tsv: {type: 'File?', outputSource: annofuse/annofuse_filtered_fusions_tsv, doc: "Filtered fusions called
      by annoFuse."}
  rmats_filtered_alternative_3_prime_splice_sites_jc: {type: 'File', outputSource: rmats/filtered_alternative_3_prime_splice_sites_jc,
    doc: "Alternative 3 prime splice sites JC.txt output from RMATs containing only those calls with 10 or more junction spanning
      read counts of support"}
  rmats_filtered_alternative_5_prime_splice_sites_jc: {type: 'File', outputSource: rmats/filtered_alternative_5_prime_splice_sites_jc,
    doc: "Alternative 5 prime splice sites JC.txt output from RMATs containing only those calls with 10 or more junction spanning
      read counts of support"}
  rmats_filtered_mutually_exclusive_exons_jc: {type: 'File', outputSource: rmats/filtered_mutually_exclusive_exons_jc, doc: "Mutually
      exclusive exons JC.txt output from RMATs containing only those calls with 10 or more junction spanning read counts of support"}
  rmats_filtered_retained_introns_jc: {type: 'File', outputSource: rmats/filtered_retained_introns_jc, doc: "Retained introns JC.txt
      output from RMATs containing only those calls with 10 or more junction spanning read counts of support"}
  rmats_filtered_skipped_exons_jc: {type: 'File', outputSource: rmats/filtered_skipped_exons_jc, doc: "Skipped exons JC.txt output
      from RMATs containing only those calls with 10 or more junction spanning read counts of support"}
  t1k_genotype_tsv: {type: 'File?', outputSource: t1k/genotype_tsv, doc: "Genotyping results from T1k"}
steps:
  basename_picker:
    run: ../tools/basename_picker.cwl
    in:
      root_name:
        source: reads1
        valueFrom: $(self.basename.split('.')[0])
      output_basename: output_basename
      sample_name: sample_name
      star_rg_line: outSAMattrRGline
    out: [outname, outsample, outrg]
  alignmentfile_pairedness:
    run: ../tools/alignmentfile_pairedness.cwl
    when: $(inputs.input_reads.basename.search(/\.(b|cr|s)am$/) != -1)
    in:
      input_reads: reads1
      input_reference: cram_reference
    out: [is_paired_end]
  align2fastq:
    # Skip if input is FASTQ already
    run: ../tools/samtools_fastq.cwl
    when: $(inputs.input_reads_1.basename.search(/\.(b|cr|s)am$/) != -1)
    in:
      input_reads_1: reads1
      SampleID: basename_picker/outname
      cores: samtools_fastq_cores
      is_paired_end:
        source: [is_paired_end, alignmentfile_pairedness/is_paired_end]
        pickValue: first_non_null
      cram_reference: cram_reference
    out: [fq1, fq2]
  cutadapt_3-4:
    # Skip if no adapter given, get fastq from prev step if not null or wf input
    run: ../tools/cutadapter_3.4.cwl
    when: $(inputs.r1_adapter != null)
    in:
      readFilesIn1:
        source: [align2fastq/fq1, reads1]
        pickValue: first_non_null
      readFilesIn2:
        source: [align2fastq/fq2, reads2]
        pickValue: first_non_null
      r1_adapter: r1_adapter
      r2_adapter: r2_adapter
      min_len: min_len
      quality_base: quality_base
      quality_cutoff: quality_cutoff
      sample_name: basename_picker/outname
    out: [trimmedReadsR1, trimmedReadsR2, cutadapt_stats]
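  # Illustrative sketch (not part of the workflow): a job-file fragment that would
  # switch the cutadapt step on. The adapter sequences are placeholders chosen for
  # this example, not values recommended by this repository.
  #
  #   r1_adapter: AGATCGGAAGAGC
  #   r2_adapter: AGATCGGAAGAGC
  #   min_len: 20
  #   quality_base: 33
  #
  # Leaving r1_adapter null makes the `when` expression false, so the step is
  # skipped and its outputs are null; downstream `pickValue: first_non_null`
  # sources then fall through to align2fastq or the workflow inputs.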
  star_2-7-10a:
    # will get fastq from first non-null in this order - cutadapt, align2fastq, wf input
    run: ../tools/star_2.7.10a_align.cwl
    in:
      outSAMattrRGline: basename_picker/outrg
      genomeDir: STARgenome
      readFilesIn1:
        source: [cutadapt_3-4/trimmedReadsR1, align2fastq/fq1, reads1]
        pickValue: first_non_null
      readFilesIn2:
        source: [cutadapt_3-4/trimmedReadsR2, align2fastq/fq2, reads2]
        pickValue: first_non_null
      outFileNamePrefix: basename_picker/outname
      runThreadN: runThreadN
      twopassMode: twopassMode
      alignSJoverhangMin: alignSJoverhangMin
      outFilterMismatchNoverLmax: outFilterMismatchNoverLmax
      outFilterType: outFilterType
      outFilterScoreMinOverLread: outFilterScoreMinOverLread
      outFilterMatchNminOverLread: outFilterMatchNminOverLread
      outReadsUnmapped: outReadsUnmapped
      limitSjdbInsertNsj: limitSjdbInsertNsj
      outSAMstrandField: outSAMstrandField
      outFilterIntronMotifs: outFilterIntronMotifs
      alignSoftClipAtReferenceEnds: alignSoftClipAtReferenceEnds
      quantMode: quantMode
      outSAMtype: outSAMtype
      outSAMunmapped: outSAMunmapped
      genomeLoad: genomeLoad
      chimMainSegmentMultNmax: chimMainSegmentMultNmax
      outSAMattributes: outSAMattributes
      alignInsertionFlush: alignInsertionFlush
      alignIntronMax: alignIntronMax
      alignMatesGapMax: alignMatesGapMax
      alignSJDBoverhangMin: alignSJDBoverhangMin
      outFilterMismatchNmax: outFilterMismatchNmax
      alignSJstitchMismatchNmax: alignSJstitchMismatchNmax
      alignSplicedMateMapLmin: alignSplicedMateMapLmin
      alignSplicedMateMapLminOverLmate: alignSplicedMateMapLminOverLmate
      chimJunctionOverhangMin: chimJunctionOverhangMin
      chimMultimapNmax: chimMultimapNmax
      chimMultimapScoreRange: chimMultimapScoreRange
      chimNonchimScoreDropMin: chimNonchimScoreDropMin
      chimOutJunctionFormat: chimOutJunctionFormat
      chimOutType: chimOutType
      chimScoreDropMax: chimScoreDropMax
      chimScoreJunctionNonGTAG: chimScoreJunctionNonGTAG
      chimScoreSeparation: chimScoreSeparation
      chimSegmentMin: chimSegmentMin
      chimSegmentReadGapMax: chimSegmentReadGapMax
      outFilterMultimapNmax: outFilterMultimapNmax
      peOverlapMMp: peOverlapMMp
      peOverlapNbasesMin: peOverlapNbasesMin
    out: [chimeric_junctions, chimeric_sam_out, gene_counts, genomic_bam_out, junctions_out, log_final_out, log_out, log_progress_out,
      transcriptome_bam_out]
  samtools_sort:
    run: ../tools/samtools_sort.cwl
    in:
      unsorted_bam: star_2-7-10a/genomic_bam_out
      chimeric_sam_out: star_2-7-10a/chimeric_sam_out
    out: [sorted_bam, sorted_bai, chimeric_bam_out]
  t1k:
    run: ../tools/t1k.cwl
    when: $(inputs.run_t1k)
    in:
      run_t1k: run_t1k
      bam:
        source: [samtools_sort/sorted_bam, samtools_sort/sorted_bai]
        valueFrom: |
          ${
            var bundle = self[0];
            bundle.secondaryFiles = [self[1]];
            return bundle;
          }
      reference: hla_rna_ref_seqs
      gene_coordinates: hla_rna_gene_coords
      abnormal_unmap_flag: t1k_abnormal_unmap_flag
      preset:
        valueFrom: "hla"
      output_basename:
        source: output_basename
        valueFrom: $(self).t1k_hla
      skip_post_analysis:
        valueFrom: $(1 == 1)
      ram: t1k_ram
    out: [genotype_tsv]
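  # Illustrative sketch (not part of the workflow): job-file fragments for the two
  # ways the t1k step can be controlled. The local paths are assumptions that
  # simply reuse the suggested reference file names.
  #
  #   # skip HLA typing entirely
  #   run_t1k: false
  #
  #   # or keep the default (run_t1k: true) and supply the references
  #   hla_rna_ref_seqs: {class: File, path: hla_v3.43.0_gencode_v39_rna_seq.fa}
  #   hla_rna_gene_coords: {class: File, path: hla_v3.43.0_gencode_v39_rna_coord.fa}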
  bam_strandness:
    run: ../tools/bam_strandness.cwl
    in:
      input_bam: samtools_sort/sorted_bam
      annotation_gtf: gtf_anno
      kallisto_idx: kallisto_idx
      paired_end:
        source: [reads2, is_paired_end, alignmentfile_pairedness/is_paired_end]
        valueFrom: |
          $(self[0] != null ? true : self[1] != null ? self[1] : self[2])
    out: [output, strandedness, read_length_median, read_length_stddev, is_paired_end]
  rmats:
    run: ../workflow/rmats_wf.cwl
    in:
      gtf_annotation: gtf_anno
      sample_1_bams:
        source: samtools_sort/sorted_bam
        valueFrom: |
          $([self])
      read_length: read_length_median
      variable_read_length: rmats_variable_read_length
      read_type:
        source: [is_paired_end, bam_strandness/is_paired_end]
        pickValue: first_non_null
        valueFrom: |
          $(self ? "paired" : "single")
      strandedness:
        source: [wf_strand_param, bam_strandness/strandedness]
        pickValue: first_non_null
        valueFrom: |
          $(self == "rf-stranded" ? "fr-firststrand" : self == "fr-stranded" ? "fr-secondstrand" : "fr-unstranded")
      novel_splice_sites: rmats_novel_splice_sites
      stat_off: rmats_stat_off
      allow_clipping: rmats_allow_clipping
      output_basename: basename_picker/outname
      rmats_threads: rmats_threads
      rmats_ram: rmats_ram
    out: [filtered_alternative_3_prime_splice_sites_jc, filtered_alternative_5_prime_splice_sites_jc, filtered_mutually_exclusive_exons_jc,
      filtered_retained_introns_jc, filtered_skipped_exons_jc]
  strand_parse:
    run: ../tools/expression_parse_strand_param.cwl
    in:
      wf_strand_param:
        source: [wf_strand_param, bam_strandness/strandedness]
        pickValue: first_non_null
    out: [rsem_std, kallisto_std, rnaseqc_std, arriba_std]
  star_fusion_1-10-1:
    run: ../tools/star_fusion_1.10.1_call.cwl
    in:
      Chimeric_junction: star_2-7-10a/chimeric_junctions
      genome_tar: FusionGenome
      output_basename: basename_picker/outname
      genome_untar_path: star_fusion_genome_untar_path
      compress_chimeric_junction: compress_chimeric_junction
    out: [abridged_coding, chimeric_junction_compressed]
  arriba_fusion_2-2-1:
    run: ../tools/arriba_fusion_2.2.1.cwl
    in:
      genome_aligned_bam:
        source: [samtools_sort/sorted_bam, samtools_sort/sorted_bai]
        valueFrom: |
          ${
            var bundle = self[0];
            bundle.secondaryFiles = [self[1]];
            return bundle;
          }
      memory: arriba_memory
      reference_fasta: reference_fasta
      gtf_anno: gtf_anno
      outFileNamePrefix: basename_picker/outname
      arriba_strand_flag: strand_parse/arriba_std
    out: [arriba_fusions]
  arriba_draw_2-2-1:
    run: ../tools/arriba_draw_2.2.1.cwl
    in:
      fusions: arriba_fusion_2-2-1/arriba_fusions
      genome_aligned_bam:
        source: [samtools_sort/sorted_bam, samtools_sort/sorted_bai]
        valueFrom: |
          ${
            var bundle = self[0];
            bundle.secondaryFiles = [self[1]];
            return bundle;
          }
      gtf_anno: gtf_anno
      memory: arriba_memory
    out: [arriba_pdf]
  rsem:
    run: ../tools/rsem_calc_expression.cwl
    in:
      bam: star_2-7-10a/transcriptome_bam_out
      paired_end:
        source: [is_paired_end, bam_strandness/is_paired_end]
        pickValue: first_non_null
      estimate_rspd: estimate_rspd
      genomeDir: RSEMgenome
      outFileNamePrefix: basename_picker/outname
      strandedness: strand_parse/rsem_std
    out: [gene_out, isoform_out]
  rna_seqc:
    run: ../tools/rnaseqc_2.4.2.cwl
    in:
      aligned_sorted_reads: samtools_sort/sorted_bam
      collapsed_gtf: RNAseQC_GTF
      stranded: strand_parse/rnaseqc_std
      unpaired:
        source: [is_paired_end, bam_strandness/is_paired_end]
        pickValue: first_non_null
        valueFrom: |
          $(!self)
    out: [Metrics, Gene_TPM, Gene_count, Exon_count]
  supplemental:
    run: ../tools/supplemental_tar_gz.cwl
    in:
      outFileNamePrefix: basename_picker/outname
      Gene_TPM: rna_seqc/Gene_TPM
      Gene_count: rna_seqc/Gene_count
      Exon_count: rna_seqc/Exon_count
    out: [RNASeQC_counts]
  kallisto:
    run: ../tools/kallisto_calc_expression.cwl
    in:
      transcript_idx: kallisto_idx
      strand: strand_parse/kallisto_std
      reads1:
        source: [cutadapt_3-4/trimmedReadsR1, align2fastq/fq1, reads1]
        pickValue: first_non_null
      reads2:
        source: [cutadapt_3-4/trimmedReadsR2, align2fastq/fq2, reads2]
        pickValue: first_non_null
      SampleID: basename_picker/outname
      avg_frag_len:
        source: [read_length_median, bam_strandness/read_length_median]
        valueFrom: |
          $(self.some(function(e){ return e != null }) ? self.filter(function(e) { return e != null })[0] : null)
      std_dev:
        source: [read_length_stddev, bam_strandness/read_length_stddev]
        valueFrom: |
          $(self.some(function(e){ return e != null }) ? self.filter(function(e) { return e != null })[0] : null)
    out: [abundance_out]
  annofuse:
    run: ../workflow/kfdrc_annoFuse_wf.cwl
    in:
      sample_name: basename_picker/outsample
      FusionGenome: fusion_annotator_ref
      genome_untar_path: star_fusion_genome_untar_path
      rsem_expr_file: rsem/gene_out
      arriba_output_file: arriba_fusion_2-2-1/arriba_fusions
      star_fusion_output_file: star_fusion_1-10-1/abridged_coding
      col_num: annofuse_col_num
      output_basename: basename_picker/outname
    out: [annofuse_filtered_fusions_tsv]
  samtools_bam_to_cram:
    run: ../tools/samtools_bam_to_cram.cwl
    in:
      reference: reference_fasta
      input_bam:
        source: [samtools_sort/sorted_bam, samtools_sort/sorted_bai]
        valueFrom: |
          ${
            var bundle = self[0];
            bundle.secondaryFiles = [self[1]];
            return bundle;
          }
    out: [output]
$namespaces:
  sbg: https://sevenbridges.com
hints:
- class: "sbg:maxNumberOfParallelInstances"
  value: 3
"sbg:license": Apache License 2.0
"sbg:publisher": KFDRC
"sbg:categories":
- ALIGNMENT
- ANNOFUSE
- ARRIBA
- BAM
- CRAM
- CUTADAPT
- FASTQ
- KALLISTO
- PE
- RNASEQ
- RNASEQC
- RMATS
- RSEM
- SE
- STAR
"sbg:links":
- id: 'https://github.com/kids-first/kf-rnaseq-workflow/releases/tag/v4.8.0'
  label: github-release