The pipeline is inspired by the bulkRNAseq pipeline from the Bioinformatics and Biostatistics Core at Van Andel Institute.
- Create a samplesheet file to execute the pipeline. It should be a `csv` file following the format below:
sample | fq1 | fq2 |
---|---|---|
sampleA | sampleA_R1.fastq.gz | sampleA_R2.fastq.gz |
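
Before submitting a run, it can help to sanity-check the samplesheet. The following is a minimal sketch, not part of the pipeline itself. It assumes the file is comma-separated with a header row naming the `sample`, `fq1` and `fq2` columns shown above; the function name `read_samplesheet` is hypothetical.

```python
import csv
import io

# Column names taken from the samplesheet format shown above.
REQUIRED_COLUMNS = ["sample", "fq1", "fq2"]


def read_samplesheet(handle):
    """Parse a samplesheet CSV and return one dict per sample row.

    Assumes a comma-separated file with a header row (an assumption,
    the pipeline may accept other layouts).
    """
    reader = csv.reader(handle)
    header = [cell.strip() for cell in next(reader, [])]
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"samplesheet is missing columns: {missing}")

    rows = []
    for line_no, cells in enumerate(reader, start=2):
        if not any(cell.strip() for cell in cells):
            continue  # skip blank lines
        row = {name: cell.strip() for name, cell in zip(header, cells)}
        empty = [col for col in REQUIRED_COLUMNS if not row.get(col)]
        if empty:
            raise ValueError(f"line {line_no}: empty value for {empty}")
        rows.append(row)
    return rows


# Usage with an in-memory file mirroring the example row above.
example = io.StringIO(
    "sample,fq1,fq2\n"
    "sampleA,sampleA_R1.fastq.gz,sampleA_R2.fastq.gz\n"
)
for row in read_samplesheet(example):
    print(row["sample"], row["fq1"], row["fq2"])
```

The check only looks at the structure of the file; it does not verify that the fastq paths exist.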
- Execute the pipeline. The following steps/tools will be executed:
  - `fastqc` on each sample's raw fastq files
  - `fastp` to trim adapter sequences and low-quality reads
    - options used for `fastp`: `--qualified_quality_phred 20` and `--adapter_fasta $adapter`
  - `fastq_screen` on R1 fastq files to detect possible contaminants
  - `preseq` for library complexity
  - `qualimap` for gene body coverage plot
  - `sortMeRNA` for rRNA detection
  - `multiqc` for summarizing the output files of the QC tools
  - Read alignment using `STAR` with the `--quantMode GeneCounts` option to generate a gene count matrix
  - Transcript quantification using `Salmon`
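
To make the trimming step concrete, here is a rough sketch of how a `fastp` call with the two options listed above could be assembled for one samplesheet row. It only builds and prints the command and does not run it. The output file names, the `trimmed` directory, `adapters.fa` and the helper `fastp_command` are all hypothetical; the pipeline's actual invocation may pass further options.

```python
import shlex


def fastp_command(sample, fq1, fq2, adapter_fasta, out_dir="trimmed"):
    """Build a paired-end fastp command as an argument list.

    Output naming and out_dir are illustrative assumptions, not the
    pipeline's real layout.
    """
    out1 = f"{out_dir}/{sample}_R1.trimmed.fastq.gz"
    out2 = f"{out_dir}/{sample}_R2.trimmed.fastq.gz"
    return [
        "fastp",
        "--in1", fq1,
        "--in2", fq2,
        "--out1", out1,
        "--out2", out2,
        # Options named in the step list above.
        "--qualified_quality_phred", "20",
        "--adapter_fasta", adapter_fasta,
    ]


cmd = fastp_command(
    "sampleA", "sampleA_R1.fastq.gz", "sampleA_R2.fastq.gz", "adapters.fa"
)
print(shlex.join(cmd))
```

Keeping the command as a list avoids shell-quoting problems if it is later passed to `subprocess.run`.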
Adjust the configuration files, such as `bulk_rnaseq_conf/run.config` and `cluster.config`. After that, submit the job with `sbatch run_bulk_rnaseq.slurm`.
- cluster configuration -> `cluster.config`
- location of the reference genome used by `STAR` and `salmon` -> `reference.config`
- singularity image file path -> `processes.config`
- `run.config` for the location of the samplesheet and to turn on/off `Ribodetector` for rRNA removal, `salmon` and `tpm calculator`
Strand info: try https://github.com/signalbash/how_are_we_stranded_here

- https://github.com/igordot/genomics/blob/master/notes/rna-seq-strand.md
- Illumina TruSeq Stranded Total RNA libraries are reverse-stranded
- https://dbrg77.wordpress.com/2015/03/20/library-type-option-in-the-tuxedo-suite/
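
Strandedness matters when reading the gene counts that `STAR` writes with `--quantMode GeneCounts`. Its `ReadsPerGene.out.tab` file has four tab-separated columns: gene ID, unstranded counts, counts for read 1 on the RNA strand, and counts for the reverse orientation. The sketch below picks the column for a given library type. The function name, the strandedness labels and the example numbers are made up for illustration.

```python
import io

# 0-based column index in ReadsPerGene.out.tab for each library type.
STRAND_COLUMN = {"unstranded": 1, "forward": 2, "reverse": 3}

# Summary rows STAR writes before the per-gene rows.
SUMMARY_ROWS = {"N_unmapped", "N_multimapping", "N_noFeature", "N_ambiguous"}


def read_star_counts(handle, strandedness):
    """Return {gene_id: count} using the column matching the library type."""
    col = STRAND_COLUMN[strandedness]
    counts = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4 or fields[0] in SUMMARY_ROWS:
            continue  # skip summary rows and malformed lines
        counts[fields[0]] = int(fields[col])
    return counts


# Made-up example resembling a reverse-stranded library.
example = io.StringIO(
    "N_unmapped\t100\t100\t100\n"
    "N_multimapping\t50\t50\t50\n"
    "N_noFeature\t10\t400\t12\n"
    "N_ambiguous\t5\t1\t4\n"
    "geneA\t300\t8\t295\n"
    "geneB\t120\t3\t118\n"
)
print(read_star_counts(example, "reverse"))
```

For a reverse-stranded kit such as TruSeq Stranded Total RNA, the last column is the one to use; a much larger `N_noFeature` value in one stranded column than in the other is a common hint that the other column matches the library.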