This repository has been archived by the owner on May 2, 2024. It is now read-only.

Nextflow Pipelines

Alexander Pico edited this page Aug 5, 2020 · 2 revisions

Bulk RNA-seq

We recommend the popular nf-core/rnaseq nextflow pipeline for RNA sequencing analysis. It uses STAR, HISAT2 and Salmon, and produces gene counts and quality control reports.

Useful links:

Installation and test run

  1. Log in to wynton
ssh user@log2.wynton.ucsf.edu
  2. ssh into a dev node
ssh dev2
  3. From your home directory, download nextflow
curl -fsSL get.nextflow.io | bash
  4. Make a bin directory (if you haven't already) and move nextflow there
mkdir -p ~/bin
mv nextflow ~/bin/
  5. Create a nextflow configuration file to specify SGE settings
mkdir -p ~/.nextflow
printf 'process.executor = "sge"\nprocess.penv = "smp"\nprocess.clusterOptions = "-S /bin/bash"\n' > ~/.nextflow/config
  6. Run the nextflow test pipeline, specifying the singularity profile. The console will display the progress in real time. A warning message will appear during the first run regarding the automatic creation of a singularity cache directory.
nextflow run nf-core/rnaseq -profile test,singularity
  7. The output will be in the results directory. Pipeline reports are in results/pipeline_info/. Note: if you get an error, try running it a second time.
ls results/
ls results/pipeline_info
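
The one-line printf in the configuration step is compact but hard to read and edit. As a minimal sketch, the same three SGE settings can be written with a here-document. The helper name write_sge_config and its optional directory argument are illustrative assumptions, not part of nextflow.

```shell
# write_sge_config is a hypothetical helper name, not a nextflow command.
# It writes the three SGE settings from the configuration step to <dir>/config,
# defaulting to ~/.nextflow, and overwrites any existing file there.
write_sge_config() {
    config_dir="${1:-$HOME/.nextflow}"
    mkdir -p "$config_dir"
    cat > "$config_dir/config" <<'EOF'
process.executor = "sge"
process.penv = "smp"
process.clusterOptions = "-S /bin/bash"
EOF
    echo "Wrote $config_dir/config"
}
```

Calling write_sge_config with no argument targets ~/.nextflow/config; passing a directory writes the file there instead, which is handy for inspecting the result before replacing an existing configuration.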

Custom runs

Now you can set up and run the pipeline on your own data with steps like the following:

  • Copy your fastq files over to wynton (see How to move data)
  • Specify max_memory, genome, reads and optional skip* arguments in the command (see docs on reads, genome and the many other args that should be considered carefully)
nextflow run nf-core/rnaseq --max_memory '8.GB' --skipBiotypeQC --skipFastQC --skipTrimming --genome GRCh38 --reads '*_R{1,2}.fastq.gz' -profile singularity
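
The reads pattern '*_R{1,2}.fastq.gz' expects paired files, so it can help to confirm that every R1 file has an R2 mate before launching a long run. The following is a minimal sketch under that assumption. The function name check_read_pairs is hypothetical, and it only checks file names, not file contents.

```shell
# check_read_pairs is a hypothetical helper, not part of nextflow or nf-core.
# It verifies that every *_R1.fastq.gz file in a directory has a matching
# *_R2.fastq.gz file, and returns non-zero if any mate is missing or if
# no R1 files are found at all.
check_read_pairs() {
    dir="${1:-.}"
    found=0
    missing=0
    for r1 in "$dir"/*_R1.fastq.gz; do
        # If the glob matched nothing, the literal pattern comes back; skip it.
        [ -e "$r1" ] || continue
        found=$((found + 1))
        r2="${r1%_R1.fastq.gz}_R2.fastq.gz"
        if [ ! -e "$r2" ]; then
            echo "Missing mate for $r1" >&2
            missing=$((missing + 1))
        fi
    done
    echo "$found R1 file(s) found, $missing without an R2 mate"
    [ "$found" -gt 0 ] && [ "$missing" -eq 0 ]
}
```

Run it from the directory holding the fastq files, for example check_read_pairs . and start the pipeline only if it succeeds.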

Pro-tips:

  • Review the execution_report.html to determine the necessary max_memory value for your analysis.
  • You may want to use screen or tmux to manage longer runs.
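
As a sketch of the tmux tip, a long run can be started in a detached session so that it keeps going if the ssh connection drops. The helper name start_detached and the session name used below are assumptions for illustration. It relies only on the standard tmux new-session and attach commands.

```shell
# start_detached is a hypothetical helper, not part of tmux or nextflow.
# Usage: start_detached <session-name> <command string>
# It runs the command in a detached tmux session that outlives the ssh login.
start_detached() {
    session="$1"
    shift
    if ! command -v tmux >/dev/null 2>&1; then
        echo "tmux not found; consider screen instead" >&2
        return 1
    fi
    # "$*" joins the remaining arguments into a single command string for tmux.
    if tmux new-session -d -s "$session" "$*"; then
        echo "Started session '$session'. Reattach with: tmux attach -t $session"
    else
        echo "Could not start tmux session '$session'" >&2
        return 1
    fi
}
```

For example, start_detached rnaseq "nextflow run nf-core/rnaseq -profile test,singularity" launches the test pipeline in a session named rnaseq. Pass the whole command as one quoted string so that its own quoting is preserved.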

ATAC-seq

Coming soon...

Other analyses
