Autoscaling Example Using AWS
John Vivian edited this page Apr 21, 2018 · 1 revision
- Add SSH key to Agent if not already added
ssh-add
- Install Toil and dependencies if not already installed
sudo apt-get install build-essential python-dev libssl-dev libffi-dev
pip install toil[all]
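- Launch the cluster leader (replace the key pair name with your own AWS key pair)
toil launch-cluster aws-example --keyPairName yourkeypair@gmail.com --leaderNodeType t2.medium --zone us-west-2a
- SSH into the leader
toil ssh-cluster --zone us-west-2a aws-example
- On the leader, install the RNA-seq workflow
pip install toil-rnaseq
- Generate the manifest and config files, then edit them (examples of both are shown further down this page)
toil-rnaseq generate
- Run the workflow with autoscaling
toil-rnaseq run --retryCount=2 --nodeTypes=c3.8xlarge --maxNodes=1 --batchSystem=mesos --provisioner=aws aws:us-west-2:aws-example-jobstore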
- Mesos master address on the leader (5050 is the default Mesos master port)
<Leader IP>:5050
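The job store argument in the run command follows the pattern aws:<region>:<name>, where the region is the zone passed to launch-cluster without its trailing letter. A minimal sketch of that relationship follows; the helper names `region_from_zone` and `aws_job_store` are hypothetical and are not part of Toil.

```python
import re


def region_from_zone(zone):
    """Strip the availability-zone letter, e.g. 'us-west-2a' -> 'us-west-2'."""
    match = re.fullmatch(r"([a-z]+(?:-[a-z]+)+-\d+)[a-z]", zone)
    if match is None:
        raise ValueError("not an AWS availability zone: %r" % zone)
    return match.group(1)


def aws_job_store(zone, name):
    """Build a job store locator of the form aws:<region>:<name>."""
    return "aws:%s:%s" % (region_from_zone(zone), name)


if __name__ == "__main__":
    # Zone and job store name taken from the commands on this page.
    print(aws_job_store("us-west-2a", "aws-example-jobstore"))
```

Building the locator from the zone keeps the job store in the same region as the cluster, which is how the commands on this page are set up.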
##############################################################################################################
# TOIL RNA-SEQ WORKFLOW MANIFEST FILE #
##############################################################################################################
# Edit this manifest to include information pertaining to each sample to be run.
# There are 4 tab-separated columns: filetype, paired/single, UUID, URL(s) to sample
#
# filetype Filetype of the sample. Options: "tar", "fq", or "bam" for tarball, fastq/fastq.gz, or BAM
# paired Indicates whether the data is paired-end or single-end. Options: "paired" or "single"
# UUID A unique identifier for the sample to be processed
# URL A URL starting with one of http://, ftp://, file://, s3://, or gdc:// that points to the sample
#
# If sample is being submitted as a fastq or several fastqs, provide URLs separated by a comma.
# If providing paired fastqs, alternate the fastqs so every R1 is paired with its R2 as the next URL.
# Samples must have the same extension - do not mix and match gzip and non-gzipped sample pairs.
#
# Samples consisting of tarballs with fastq files inside must follow the file name convention of
# ending in an R1/R2 or _1/_2 followed by one of the 4 extensions: .fastq.gz, .fastq, .fq.gz, .fq
#
# BAMs are accepted, but must have been aligned from paired reads NOT single-end reads.
#
# GDC URLs may only point to individual BAM files. No other format is accepted.
#
# Examples of several combinations are provided below. Lines beginning with # are ignored.
#
# tar	paired	UUID_1	file:///path/to/sample.tar
# fq	paired	UUID_2	file:///path/to/R1.fq.gz,file:///path/to/R2.fq.gz
# tar	single	UUID_3	http://sample-depot.com/single-end-sample.tar
# tar	paired	UUID_4	s3://my-bucket-name/directory/paired-sample.tar.gz
# fq	single	UUID_5	s3://my-bucket-name/directory/single-end-file.fq
# bam	paired	UUID_6	gdc://1a5f5e03-4219-4704-8aaf-f132f23f26c7
#
# Place your samples below, one per line.
fq	paired	EXAMPLE	s3://example-aws-bucket/Read1.fq,s3://s3-example-bucket/Read2.fq
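The manifest rules above can be checked before launching a run. The following is a minimal sketch, not the workflow's own parser: the function name `parse_manifest_line` is hypothetical, and the accepted schemes are assumed from the list in the configuration file header.

```python
from urllib.parse import urlparse

FILETYPES = {"tar", "fq", "bam"}
PAIRINGS = {"paired", "single"}
# Assumption: same schemes as listed in the configuration file header.
SCHEMES = {"http", "ftp", "file", "s3", "gdc"}


def parse_manifest_line(line):
    """Return (filetype, pairing, uuid, urls), or None for comments and blanks."""
    line = line.rstrip("\r\n")
    if not line.strip() or line.lstrip().startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) != 4:
        raise ValueError("expected 4 tab-separated columns, got %d" % len(fields))
    filetype, pairing, uuid, url_field = fields
    if filetype not in FILETYPES:
        raise ValueError("unknown filetype: %r" % filetype)
    if pairing not in PAIRINGS:
        raise ValueError("pairing must be 'paired' or 'single': %r" % pairing)
    urls = url_field.split(",")
    schemes = [urlparse(url).scheme for url in urls]
    if any(scheme not in SCHEMES for scheme in schemes):
        raise ValueError("unsupported URL scheme in %r" % url_field)
    if "gdc" in schemes and filetype != "bam":
        raise ValueError("GDC URLs may only point to individual BAM files")
    if filetype == "bam" and pairing != "paired":
        raise ValueError("BAMs must have been aligned from paired reads")
    if filetype == "fq" and pairing == "paired" and len(urls) % 2 != 0:
        raise ValueError("paired fastqs need an R2 for every R1")
    if len({url.endswith(".gz") for url in urls}) > 1:
        raise ValueError("do not mix gzipped and non-gzipped files")
    return filetype, pairing, uuid, urls
```

Running each line of the manifest through this function catches the most common mistake, columns separated by spaces instead of tabs, before any nodes are provisioned.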
##############################################################################################################
# TOIL RNA-SEQ WORKFLOW CONFIGURATION FILE #
##############################################################################################################
# This configuration file is formatted in YAML. Simply write the value (at least one space) after the colon.
# Edit the values in this configuration file and then rerun the pipeline: "toil-rnaseq run"
# Just Kallisto or STAR/RSEM can be run by supplying only the inputs to those tools
#
# URLs can take the form: http://, ftp://, file://, s3://, gdc://
# Local inputs follow the URL convention: file:///full/path/to/input
# S3 URLs follow the convention: s3://bucket/directory/file.txt
#
# Comments (beginning with #) do not need to be removed. Optional parameters left blank are treated as false.
##############################################################################################################
# REQUIRED OPTIONS #
##############################################################################################################
# Required: Output location of sample. Can be full path to a directory or an s3:// URL
# WARNING: S3 buckets must exist prior to upload, or it will fail.
output-dir: s3://s3-example-bucket/
##############################################################################################################
# WORKFLOW INPUTS (Alignment and Quantification) #
##############################################################################################################
# URL to the index tarball used by STAR
star-index: http://hgwdev.soe.ucsc.edu/~jtvivian/toil-rnaseq-inputs/starIndex_hg38_no_alt.tar.gz
# URL to the reference tarball used by RSEM
# Running RSEM requires a star-index as well as an rsem-ref
rsem-ref: http://hgwdev.soe.ucsc.edu/~jtvivian/toil-rnaseq-inputs/rsem_ref_hg38_no_alt.tar.gz
# URL to the kallisto index file
kallisto-index: http://hgwdev.soe.ucsc.edu/~jtvivian/toil-rnaseq-inputs/kallisto_hg38.idx
# URL to the hera index
hera-index: http://hgwdev.soe.ucsc.edu/~jtvivian/toil-rnaseq-inputs/hera-index.tar.gz
# Maximum file size of input sample (for resource allocation during initial download)
max-sample-size: 20G
##############################################################################################################
# WORKFLOW OPTIONS (Quality Control) #
##############################################################################################################
# If true, will preprocess samples with cutadapt using adapter sequences.
cutadapt: true
# Adapter sequence to trim when running CutAdapt. Defaults set for Illumina
fwd-3pr-adapter: AGATCGGAAGAG
# Adapter sequence to trim (for reverse strand) when running CutAdapt. Defaults set for Illumina
rev-3pr-adapter: AGATCGGAAGAG
# If true, will run FastQC and include QC in sample output
fastqc: true
##############################################################################################################
# CREDENTIAL OPTIONS (for downloading samples from secure locations) #
##############################################################################################################
# Optional: Provide a full path to a 32-byte key used for SSE-C Encryption in Amazon
ssec:
# Optional: Provide a full path to the token.txt used to download from the GDC
gdc-token:
##############################################################################################################
# ADDITIONAL FILE OUTPUT OPTIONS #
##############################################################################################################
# Optional: If true, saves the wiggle file (.bg extension) output by STAR
# WARNING: Requires STAR sorting, which has memory leak issues that can crash the workflow.
wiggle:
# Optional: If true, saves the aligned BAM (by coordinate) produced by STAR
# You must also specify an ssec key if you want to upload to an S3 output-dir,
# as read data is assumed to be controlled access
save-bam:
##############################################################################################################
# DEVELOPER OPTIONS #
##############################################################################################################
# Optional: If true, uses resource requirements appropriate for continuous integration
ci-test:
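A few of the constraints stated in the comments above (required output-dir, RSEM needing a STAR index, save-bam needing an ssec key for S3 output) can be checked before a run. This is a rough sketch using only the standard library; `read_flat_config` and `check_config` are hypothetical helpers, and the parser handles only flat `key: value` lines, so a real YAML parser would be the better choice for anything more complex.

```python
def read_flat_config(text):
    """Parse flat 'key: value' lines; blank values become None."""
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so URLs in the value stay intact.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError("not a 'key: value' line: %r" % raw)
        config[key.strip()] = value.strip() or None
    return config


def check_config(config):
    """Return a list of problems based on the rules in the config comments."""
    problems = []
    output_dir = config.get("output-dir")
    if not output_dir:
        problems.append("output-dir is required")
    if config.get("rsem-ref") and not config.get("star-index"):
        problems.append("rsem-ref requires star-index")
    save_bam = (config.get("save-bam") or "").lower() == "true"
    to_s3 = bool(output_dir) and output_dir.startswith("s3://")
    if save_bam and to_s3 and not config.get("ssec"):
        problems.append("save-bam with an S3 output-dir requires ssec")
    return problems
```

An empty list from `check_config` means none of these particular rules were violated; it does not confirm that the URLs are reachable or that the S3 bucket exists.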