feat: Support for 3-prime RNA sequencing #62

Merged
merged 150 commits into from
May 12, 2023
Changes from 140 commits
150 commits
849ed48
feat: added skeleton for 3-prime RNA-seq analysis
johanneskoester Jun 2, 2022
f8702fd
add a rule for obtaining the maximum read length after trimming
johanneskoester Jun 22, 2022
55a1fe6
remove unnecessary parameter
johanneskoester Jun 22, 2022
2bc7a0a
add script
johanneskoester Jun 22, 2022
381cfbc
add env
johanneskoester Jun 22, 2022
33dbb58
test push
Jun 23, 2022
fa2e575
added 3prime reference seq fetch code
Jun 27, 2022
e560178
updated the script get-3prime-seqs.R with 'coding' seqs
Jul 7, 2022
5eaa67d
added script for histogram and 3prime from cds
Jul 26, 2022
d88dbdd
added heatmap script for top 50 var genes
Aug 3, 2022
391af4c
added histogram and heatmap scripts
Aug 5, 2022
bfbcdfb
Modified the histogram plot and heatmap scripts
Aug 17, 2022
a1cce49
Updated histogram plot and dependency workflows
Aug 23, 2022
e651835
modified workflow with is-3-prime-rna-seq: true/false
manuelphilip Aug 26, 2022
d990322
Merge branch '3-prime-rna' of https://github.com/snakemake-workflows/…
manuelphilip Aug 26, 2022
e5426c8
resolved merge issue
manuelphilip Aug 26, 2022
2e5be41
modified script get_3prime-seq.py
manuelphilip Aug 26, 2022
5a179bf
updated workflow that includes filtering out non-canonical transcript…
manuelphilip Sep 1, 2022
5697b3c
fixed unfiltering of canonical transcripts and added rst file for pat…
manuelphilip Sep 2, 2022
26efc57
updated QC plot and workflow to get aligned reads from canonical tran…
manuelphilip Sep 28, 2022
42aa1ae
updated cutadapt rule for 3prime reads and dependencies
manuelphilip Oct 4, 2022
4412dd1
updated config file and dependencies
manuelphilip Oct 5, 2022
81c90ff
update code plot_ind-transcripts_histogram.py and its dependencies
manuelphilip Oct 6, 2022
e34a894
renamed file plot_ind-transcripts_histogram.py
manuelphilip Oct 6, 2022
e486860
Added `plot-qc: all` to config file
manuelphilip Oct 6, 2022
664b48e
Merge branch 'main' into 3-prime-rna
johanneskoester Oct 11, 2022
24e52ed
Update workflow/envs/QC.yaml
manuelphilip Oct 11, 2022
b2f6d4b
Update workflow/envs/aligned_pos.yaml
manuelphilip Oct 11, 2022
72daaa0
Update workflow/envs/canonical_reads.yaml
manuelphilip Oct 27, 2022
9139320
Update workflow/envs/get_canonical_ids.yaml
manuelphilip Oct 27, 2022
0d4faab
Update config/config.yaml
manuelphilip Oct 27, 2022
eba5fb4
Update config/config.yaml
manuelphilip Oct 27, 2022
739cd14
Update workflow/envs/heatmap.yaml
manuelphilip Oct 27, 2022
695236d
Update workflow/envs/pysam.yaml
manuelphilip Oct 27, 2022
ca55ffe
Update workflow/envs/r-fasta.yaml
manuelphilip Oct 27, 2022
16811e3
Update workflow/report/plot-QC.rst
manuelphilip Oct 27, 2022
e1954d3
Update workflow/rules/common.smk
manuelphilip Oct 27, 2022
40c1ac9
Update workflow/scripts/sleuth-diffexp.R
manuelphilip Oct 27, 2022
f31cb99
Added bwa rule and updated workflow and its dependencies.
manuelphilip Oct 28, 2022
a57424a
Merge branch '3-prime-rna' of github.com:snakemake-workflows/rna-seq-…
manuelphilip Oct 28, 2022
c204a20
Added spia-datavzrd to report and updated workflow and its dependencies
manuelphilip Nov 3, 2022
cdd0fc6
updated and renamed `plot_ind-transcripts_histogram.py` to `plot-ind…
manuelphilip Nov 3, 2022
aa90b63
modified the workflow and auxiliary files for the QC plot
manuelphilip Nov 16, 2022
835d05c
Updated kallisto rules and fix bugs in QC-plot script
manuelphilip Nov 24, 2022
1c1081a
Add datavzrd tables for diffexp, go_terms and updated dependencies
manuelphilip Nov 30, 2022
7aa5a84
updated `config.schema.yaml` file and fix `spia datavzrd bugs`
manuelphilip Dec 1, 2022
2680e4d
updated config.yaml
manuelphilip Dec 1, 2022
f7ef4dd
updated datavzrd version, diff exp tables
manuelphilip Dec 1, 2022
5e99ab5
Merge branch 'main' into 3-prime-rna
johanneskoester Dec 2, 2022
6add9aa
fixes
johanneskoester Dec 2, 2022
10d960f
fix
johanneskoester Dec 2, 2022
890620f
fix formatting of cutadapt rules
johanneskoester Dec 2, 2022
b5be7d7
minor
johanneskoester Dec 2, 2022
1dcdac4
fix config access for 3-prime-rna-seq keys
johanneskoester Dec 2, 2022
bdb9ec7
added labels to datavzrd output
johanneskoester Dec 2, 2022
f2796ce
fix lints
johanneskoester Dec 2, 2022
17c531c
fix quotes
johanneskoester Dec 2, 2022
0dbdd83
fix quant for non-3-prime data
johanneskoester Dec 2, 2022
50574b7
fixes and categories
johanneskoester Dec 2, 2022
f0577eb
Fix folder path in `kallisto_quant` rule
manuelphilip Dec 2, 2022
2ada6a2
updated datavzrd rule with volcano plot
manuelphilip Dec 5, 2022
a4a4a95
Merge branch '3-prime-rna' of github.com:snakemake-workflows/rna-seq-…
manuelphilip Dec 5, 2022
141a7d6
Fix kallisto input once `3-prime-rna-seq` is set to false. Added volc…
manuelphilip Dec 9, 2022
c448806
fix formatting rule datavzrd
manuelphilip Dec 9, 2022
9f95844
fix snakemake format for rule datavzrd
manuelphilip Dec 9, 2022
80d7a97
fix bugs when `3-prime-rna-seq` set to `false`
manuelphilip Dec 15, 2022
bee1b6c
fix formatting
manuelphilip Dec 15, 2022
f50673c
fix `vega_plot_volcano.py` bug
manuelphilip Dec 15, 2022
748ac56
Create config.yaml
manuelphilip Feb 1, 2023
79bc8d7
Update config.yaml
manuelphilip Feb 1, 2023
8eee57f
fixed formatting
manuelphilip Feb 1, 2023
d21b87e
updated cutadapt wrapper version
manuelphilip Feb 1, 2023
6dc9af5
fix cutadapt se
manuelphilip Feb 1, 2023
d7c364b
added `extra` options to `params` in cutadapt rule
manuelphilip Feb 1, 2023
789f296
fix cutadapt `se` params
manuelphilip Feb 1, 2023
99a18bd
fixed cutadapt `se` bug
manuelphilip Feb 1, 2023
4e7d22f
Update config.yaml
manuelphilip Feb 1, 2023
19c4793
Added test case file for 3-prime-RNA data
manuelphilip Feb 1, 2023
78c476b
Updated folder path of .test config and snakefile
manuelphilip Feb 1, 2023
4e1a913
updated .test raw fastq file path
manuelphilip Feb 2, 2023
5c5346e
Fix `batch_effect` to `Batch_effect` in config.yaml
manuelphilip Feb 2, 2023
7287bf3
Fix bugs
manuelphilip Feb 2, 2023
38caa9c
updated main.yaml
manuelphilip Feb 2, 2023
edc8ddd
Fix go significant terms sorting
manuelphilip Feb 2, 2023
c92afc3
Added 3prime specific smk files/updated dependencies
manuelphilip Feb 7, 2023
0f67ecb
Merge branch 'main' into 3-prime-rna
manuelphilip Feb 7, 2023
6fac807
updated workflow wrappers and dependencies
manuelphilip Feb 9, 2023
3bd7b92
added `pre-define-genelist` in .test/3prime-config/config.yaml
manuelphilip Feb 9, 2023
b7fd3e7
added `pre-define-genelist` in .test/config.yaml
manuelphilip Feb 9, 2023
e37fed3
updated `.test/3-prime-config/config.yaml`
manuelphilip Feb 9, 2023
fab67c4
fix 3prime smk file bugs
manuelphilip Feb 9, 2023
24432fd
Merge branch '3-prime-rna' of github.com:snakemake-workflows/rna-seq-…
manuelphilip Feb 9, 2023
3721489
updated `go-enrichment-template.yaml` file
manuelphilip Feb 9, 2023
a555328
Merge branch 'main' into 3-prime-rna
manuelphilip Feb 10, 2023
d6023b9
Updated differential expression heatmap script and dependencies
manuelphilip Feb 13, 2023
b8bf9d1
updated `.test` folder config.yaml file
manuelphilip Feb 13, 2023
e8dee5e
Merge branch '3-prime-rna' of github.com:snakemake-workflows/rna-seq-…
manuelphilip Feb 13, 2023
f7d6438
fix formatting
manuelphilip Feb 13, 2023
ecafb97
Fix .test/config.yaml file
manuelphilip Feb 13, 2023
c7997b8
fix fgsea path in `.test/3-prime-config/`
manuelphilip Feb 13, 2023
dfd6b25
fix fgsea path in `.test/3-prime-config/`
manuelphilip Feb 14, 2023
78ea54b
Update config/config.yaml
manuelphilip Feb 14, 2023
119b2e1
Update workflow/rules/common.smk
manuelphilip Feb 14, 2023
b753f89
Update workflow/rules/common.smk
manuelphilip Feb 14, 2023
1dec9bb
Update workflow/rules/common.smk
manuelphilip Feb 14, 2023
39e32f0
Updated differential heatmap script and its dependencies
manuelphilip Feb 15, 2023
33b571f
fix formatting `common.smk`
manuelphilip Feb 15, 2023
5a3a3f7
Update workflow/report/plot-heatmap.rst
manuelphilip Feb 17, 2023
d48ad74
Update workflow/rules/diffexp.smk
manuelphilip Feb 17, 2023
0d60d7c
updated differential expression rule and script
manuelphilip Feb 21, 2023
887fcff
updated differential expression/spia script and dependencies
manuelphilip Feb 22, 2023
16f4439
updated main.yaml for free space
manuelphilip Feb 23, 2023
c41fc28
fix main.yaml
manuelphilip Feb 23, 2023
0b905dd
updated main.yaml
manuelphilip Feb 23, 2023
41573ff
added `remove-android` and `remove-haskell` to get more disk space
manuelphilip Feb 23, 2023
0c3e033
fix formatting main.yaml
manuelphilip Feb 23, 2023
23d521d
fix formatting `main.yaml`
manuelphilip Feb 23, 2023
ea92da8
update main.yaml
manuelphilip Feb 23, 2023
3b4cf95
update main.yaml
manuelphilip Feb 23, 2023
b7508fb
update main.yaml
manuelphilip Feb 23, 2023
67dea25
update main.yaml
manuelphilip Feb 23, 2023
f47d6b2
updated main.yaml
manuelphilip Feb 23, 2023
f6b10b8
updated main.yaml file
manuelphilip Feb 23, 2023
2cee69f
updated main.yaml
manuelphilip Feb 23, 2023
ce72da3
updated main.yaml
manuelphilip Feb 23, 2023
e0a565f
update `main.yaml` file
manuelphilip Mar 9, 2023
e6a6598
fix `main.yaml` file and `snakefile`
manuelphilip Mar 9, 2023
2b61454
Fix path for 3-prime `volcano_plot` in test path
manuelphilip Mar 9, 2023
f4cafd6
fix `main.yaml` file
manuelphilip Mar 9, 2023
f54c1d3
check space of `.test`
manuelphilip Mar 9, 2023
6db5cdb
check `.test` space
manuelphilip Mar 9, 2023
9ed3f07
updated `.test` folder space
manuelphilip Mar 9, 2023
68e60b4
check space in `.test`
manuelphilip Mar 9, 2023
f5ac8f4
updated `main.yaml` file
manuelphilip Mar 10, 2023
5a5c475
Updated `args` for number of cores due to lack of memory
manuelphilip Mar 10, 2023
caf308b
Split into individual jobs in `main.yaml`
manuelphilip Mar 10, 2023
68bd3c3
Fix `main.yaml`
manuelphilip Mar 10, 2023
a4fc8a0
Fix heatmap plot width in pdf
manuelphilip Mar 10, 2023
bff05ce
Fix heatmap width while plotting in pdf
manuelphilip Mar 10, 2023
e2471fd
updated config.yaml/diffexp.smk and dependencies based on the comments
Mar 14, 2023
0dd5406
updated snakemake report removed redundant files/labels defined.
Mar 22, 2023
4871dd6
updated label config
Mar 29, 2023
07f54cc
fix formatting
Mar 29, 2023
fb9dc11
Updated Volcano plot/enrichment and spia datavzrd tables
Apr 27, 2023
8639c25
Fix formatting diffexp.smk
Apr 27, 2023
764af83
fix spia output file
Apr 28, 2023
52f1dc2
Updated datavzrd version and its dependencies
May 5, 2023
a40a218
Fix diffexp.smk formatting
May 5, 2023
9a31873
updated `main.yaml` file
May 5, 2023
0c78cf6
Set pandas version in `QC.yaml` to 1
May 5, 2023
52 changes: 50 additions & 2 deletions .github/workflows/main.yml
@@ -1,3 +1,33 @@
+#name: My build action requiring more space
+#on: push
+#
+#jobs:
+#  build:
+#    name: Build my artifact
+#    runs-on: ubuntu-latest
+#    steps:
+#      - name: Maximize build space
+#        uses: easimon/maximize-build-space@master
+#        with:
+#          root-reserve-mb: 512
+#          swap-size-mb: 1024
+#          remove-dotnet: 'true'
+#          remove-android: 'true'
+#          remove-haskell: 'true'
+#      - name: Checkout
+#        uses: actions/checkout@v2
+#
+#      - name: Build
+#        run: |
+#          echo "Free space:"
+#          df -h
+#          echo "free space in ${{ github.workspace }}"
+#          du -hs $(ls -A) ${{ github.workspace }}/*
+#          rm -rf ${{ github.workspace }}/*
+#          echo "free space in .test"
+#          du -hs $(ls -A) .test/*
+
+
 name: Tests
 
 on:
@@ -30,7 +60,7 @@ jobs:
           snakefile: workflow/Snakefile
           args: "--lint"
 
-  run-workflow:
+  run-rna-workflow:
     runs-on: ubuntu-latest
     needs:
       - linting
@@ -47,8 +77,26 @@ jobs:
         with:
           directory: .test
           snakefile: workflow/Snakefile
-          args: "--use-conda --show-failed-logs --cores 2 --conda-cleanup-pkgs cache --all-temp"
+          args: "--use-conda --show-failed-logs --cores 1 --conda-cleanup-pkgs cache --all-temp"
 
+  run-3prime-rna-workflow:
+    runs-on: ubuntu-latest
+    needs:
+      - linting
+      - formatting
+    steps:
+
+      - name: Checkout repository
+        uses: actions/checkout@v2
+        with:
+          submodules: recursive
+
+      - name: Test 3-prime-workflow
+        uses: snakemake/snakemake-github-action@v1.23.0
+        with:
+          directory: .test/3-prime-config
+          snakefile: workflow/Snakefile
+          args: "--use-conda --show-failed-logs --cores 1 --conda-cleanup-pkgs cache --all-temp"
       # Disable report testing for now since we mark all output files as temporary above.
       # TODO: add some kind of test mode to report generation which does not really try to include
       # results.
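The commented-out `maximize-build-space` job above was used to debug disk usage on the CI runner (`df -h`, `du -hs`), which is also what the "check space of `.test`" commits were about. A rough standard-library equivalent of those two checks is sketched below; `human_readable`, `tree_size` and `report_space` are hypothetical helpers, not part of the workflow.

```python
import os
import shutil


def human_readable(num_bytes: float) -> str:
    """Format a byte count with binary units, roughly like `df -h` / `du -h`."""
    for unit in ["B", "K", "M", "G"]:
        if num_bytes < 1024:
            return f"{num_bytes:.1f}{unit}"
        num_bytes /= 1024
    return f"{num_bytes:.1f}T"


def tree_size(path: str) -> int:
    """Sum the sizes of regular files below `path`, skipping symlinks (similar to `du -s`)."""
    if os.path.isfile(path) and not os.path.islink(path):
        return os.path.getsize(path)
    total = 0
    # os.walk yields nothing for a missing path, so that case returns 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def report_space(path: str = ".") -> None:
    """Print free space on the filesystem holding `path`, then the size of each entry in it."""
    usage = shutil.disk_usage(path)
    print(f"Free space: {human_readable(usage.free)}")
    for entry in sorted(os.listdir(path)):
        print(f"{human_readable(tree_size(os.path.join(path, entry)))}\t{entry}")
```

Calling `report_space(".test")` from the repository root would print the free space followed by one size line per entry in `.test`, similar to the `du -hs .test/*` step.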
138 changes: 138 additions & 0 deletions .test/3-prime-config/config/config.yaml
@@ -0,0 +1,138 @@
samples: config/samples.tsv
units: config/units.tsv

experiment:
  # If set to `true`, this option allows the workflow to analyse 3-prime RNA-seq data obtained with Lexogen's QuantSeq protocol.
  # For more information, see https://www.lexogen.com/quantseq-3mrna-sequencing/
  3-prime-rna-seq:
    activate: true
    # Specify the vendor of the protocol used. Currently, only lexogen is supported.
    vendor: lexogen
    # this allows plotting QC of aligned read positions for specific transcripts (or 'all' transcripts)
    plot-qc: all



resources:
  ref:
    # ensembl species name
    species: homo_sapiens
    # ensembl release version
    release: "104"
    # genome build
    build: GRCh38
    # pfam release to use for annotation of domains in differential splicing analysis
    pfam: "33.0"
    representative_transcripts: canonical
  ontology:
    # gene ontology to download, used e.g. in goatools
    gene_ontology: "http://current.geneontology.org/ontology/go-basic.obo"

pca:
  labels:
    # columns of sample sheet to use for PCA
    - condition

scatter:
  # for use as diagnostic plots
  # all samples are compared in pairs to assess their correlation
  # scatter plots are only created if parameter 'activate' is set to 'true'
  activate: true

diffexp:
  # samples to exclude (e.g. outliers due to technical problems)
  exclude:
  # model for sleuth differential expression analysis
  models:
    model_X:
      full: ~condition + batch_effect
      reduced: ~batch_effect
      # Binary valued covariate that shall be used for fold change/effect size
      # based downstream analyses.
      primary_variable: condition
      base_level: untreated
  # significance level to use for volcano, ma- and qq-plots
  sig-level:
    volcano-plot: 0.05
    ma-plot: 0.05
    qq-plot: 0.05
  # Optional (set `activate: true` to use): provide a list of genes that shall be shown in a heatmap
  # and for which bootstrap plots (see below) shall be created.
  genes_of_interest:
    activate: false
    genelist: "resources/gene_list.tsv"

diffsplice:
  activate: true
  # codingCutoff parameter of isoformSwitchAnalyzer, see
  # https://rdrr.io/bioc/IsoformSwitchAnalyzeR/man/analyzeCPAT.html
  coding_cutoff: 0.725
  # Should be set to true when using de-novo assembled transcripts.
  remove_noncoding_orfs: false
  # False discovery rate to control for.
  fdr: 1.0
  # Minimum size of differential isoform usage effect
  # (see dIFcutoff, https://rdrr.io/github/kvittingseerup/IsoformSwitchAnalyzeR/man/IsoformSwitchTestDEXSeq.html)
  min_effect_size: 0.0

enrichment:
  goatools:
    # tool is only run if set to `true`
    activate: true
    fdr_genes: 0.05
    fdr_go_terms: 0.05
  fgsea:
    gene_sets_file: "../ngs-test-data/ref/dummy.gmt"
    # tool is only run if set to `true`
    activate: true
    # if activated, you need to provide a GMT file with gene sets of interest
    fdr_gene_set: 0.05
    eps: 0.0001
  spia:
    # tool is only run if set to `true`
    activate: true
    # pathway database to use in SPIA, needs to be available for
    # the species specified by resources -> ref -> species above
    pathway_database: "panther"

bootstrap_plots:
  # desired false discovery rate for bootstrap plots, i.e. a lower FDR will result in fewer boxplots generated
  FDR: 0.01
  # maximum number of bootstrap plots to generate, i.e. top n discoveries to plot
  top_n: 3
  color_by: condition
  # for now, this will plot the sleuth-normalised kallisto count estimations with kallisto
  # for all the transcripts of the respective genes

plot_vars:
  # significance level used for plot_vars() plots
  sig_level: 0.1

params:
  kallisto: "-b 100"
  # these cutadapt parameters need to contain the required flag(s) for
  # the type of adapter(s) to trim, i.e.:
  # * https://cutadapt.readthedocs.io/en/stable/guide.html#adapter-types
  #   * `-a` for 3' adapter in the forward reads
  #   * `-g` for 5' adapter in the forward reads
  #   * `-b` for adapters anywhere in the forward reads
  # also, separate capitalised letter flags are required for adapters in
  # the reverse reads of paired end sequencing:
  # * https://cutadapt.readthedocs.io/en/stable/guide.html#trimming-paired-end-reads
  cutadapt-se:
    adapters: "-a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
    extra: "-q 20"
  # reasoning behind parameters:
  # For reads that are produced by 3’-end sequencing, depending on the protocol, it might be recommended to remove some leading bases (e.g. see https://www.nature.com/articles/s41598-019-55434-x#Sec10)
  # * `--minimum-length 33`:
  #   * kallisto needs non-empty reads in current versions (fixed for future releases:
  #     https://github.com/pachterlab/kallisto/commit/64fe837ca86f3664496483bcd2787c9376584fed)
  #   * kallisto default k-mer length is 31 and 33 should give at least 3 k-mers for a read
  # * `-e 0.005`: the default cutadapt maximum error rate of `0.1` is far too high, for Illumina
  #   data the error rate is more in the range of `0.005` and setting it accordingly should avoid
  #   false positive adapter matches
  # * `--overlap 7`: the cutadapt default minimum overlap of `3` did trimming on the level
  #   of expected adapter matches by chance
  cutadapt-pe:
    adapters: "-a ACGGATCGATCGATCGATCGAT -g GGATCGATCGATCGATCGAT -A ACGGATCGATCGATCGATCGAT -G GGATCGATCGATCGATCGAT"
    extra: "--minimum-length 33 -e 0.005 --overlap 7"
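The `--minimum-length 33` rationale above rests on simple k-mer arithmetic: a read of length L contains L - k + 1 overlapping k-mers, so with kallisto's default k = 31 a 33 bp read yields 33 - 31 + 1 = 3 k-mers. A tiny sketch of that calculation (the function names are illustrative only):

```python
def kmers_per_read(read_length: int, k: int = 31) -> int:
    """Number of overlapping k-mers in a read; 0 if the read is shorter than k."""
    return max(read_length - k + 1, 0)


def min_read_length(n_kmers: int, k: int = 31) -> int:
    """Shortest read length that still yields `n_kmers` overlapping k-mers."""
    return n_kmers + k - 1


print(kmers_per_read(33))  # 3
print(min_read_length(3))  # 33
print(kmers_per_read(20))  # 0: too short to contribute any k-mer
```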
5 changes: 5 additions & 0 deletions .test/3-prime-config/config/samples.tsv
@@ -0,0 +1,5 @@
sample condition batch_effect
A treated batch1
B untreated batch1
C treated batch2
D untreated batch2
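The sleuth model in the test config (`full: ~condition + batch_effect`, `reduced: ~batch_effect`) only works if every covariate it names is a column of `samples.tsv`. A minimal pre-flight check is sketched below, assuming a tab-separated sheet and purely additive formulas (interaction terms such as `a:b` are not handled); `formula_terms` and `missing_covariates` are hypothetical names.

```python
import csv
import io
import re

SAMPLES_TSV = (
    "sample\tcondition\tbatch_effect\n"
    "A\ttreated\tbatch1\n"
    "B\tuntreated\tbatch1\n"
    "C\ttreated\tbatch2\n"
    "D\tuntreated\tbatch2\n"
)


def formula_terms(formula: str) -> list[str]:
    """Split an additive R-style formula like '~condition + batch_effect' into covariate names."""
    return [term for term in re.split(r"[~+\s]+", formula) if term]


def missing_covariates(samples_tsv: str, formula: str) -> list[str]:
    """Return the covariates used in `formula` that are not columns of the sample sheet."""
    header = next(csv.reader(io.StringIO(samples_tsv), delimiter="\t"))
    return [term for term in formula_terms(formula) if term not in header]


print(missing_covariates(SAMPLES_TSV, "~condition + batch_effect"))  # []
print(missing_covariates(SAMPLES_TSV, "~condition + sex"))  # ['sex']
```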
6 changes: 6 additions & 0 deletions .test/3-prime-config/config/units.tsv
@@ -0,0 +1,6 @@
sample unit fragment_len_mean fragment_len_sd fq1 fq2
A 1 300 14 ../ngs-test-data/reads/a.chr21.2.fq
B 1 300 14 ../ngs-test-data/reads/b.chr21.1.fq
B 2 300 14 ../ngs-test-data/reads/b.chr21.2.fq
C 1 300 14 ../ngs-test-data/reads/a.chr21.2.fq
D 1 300 14 ../ngs-test-data/reads/b.chr21.2.fq
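In this `units.tsv` the `fq2` column is left empty, so every unit in the 3-prime test is single-end, and sample `B` contributes two units. A sketch of how such a sheet might be interpreted is shown below, assuming tab separation and treating a missing or blank `fq2` as single-end; the workflow's own helpers in `common.smk` are not part of this diff and may differ.

```python
import csv
import io
from collections import Counter

UNITS_TSV = (
    "sample\tunit\tfragment_len_mean\tfragment_len_sd\tfq1\tfq2\n"
    "A\t1\t300\t14\t../ngs-test-data/reads/a.chr21.2.fq\t\n"
    "B\t1\t300\t14\t../ngs-test-data/reads/b.chr21.1.fq\t\n"
    "B\t2\t300\t14\t../ngs-test-data/reads/b.chr21.2.fq\t\n"
    "C\t1\t300\t14\t../ngs-test-data/reads/a.chr21.2.fq\t\n"
    "D\t1\t300\t14\t../ngs-test-data/reads/b.chr21.2.fq\t\n"
)


def read_units(tsv_text: str) -> list[dict]:
    """Parse the units sheet into one dict per row, keyed by column name."""
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))


def is_single_end(unit: dict) -> bool:
    """A unit counts as single-end when its fq2 field is missing or blank."""
    fq2 = unit.get("fq2")
    return fq2 is None or fq2.strip() == ""


units = read_units(UNITS_TSV)
print(all(is_single_end(u) for u in units))  # True
print(Counter(u["sample"] for u in units)["B"])  # 2
```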
33 changes: 33 additions & 0 deletions .test/3-prime-config/workflow/Snakefile
@@ -0,0 +1,33 @@
from snakemake.utils import min_version

min_version("7.17.0")


configfile: "config/config.yaml"


report: "report/workflow.rst"


# this container defines the underlying OS for each job when using the workflow
# with --use-conda --use-singularity
container: "docker://continuumio/miniconda3"


include: "rules/common.smk"
include: "rules/trim.smk"
include: "rules/trim_3prime.smk"
include: "rules/qc_3prime.smk"
include: "rules/ref.smk"
include: "rules/ref_3prime.smk"
include: "rules/quant.smk"
include: "rules/quant_3prime.smk"
include: "rules/diffexp.smk"
include: "rules/diffsplice.smk"
include: "rules/enrichment.smk"
include: "rules/datavzrd.smk"


rule all:
    input:
        all_input,
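The Snakefile includes both the standard and the `*_3prime.smk` rule files, so switching between the two code paths depends on the nested `experiment: 3-prime-rna-seq:` settings. The actual lookup lives in `rules/common.smk`, which is not part of this diff; the following is only a hedged sketch of reading and validating those keys from the parsed config dict (`three_prime_settings` is a hypothetical name).

```python
SUPPORTED_VENDORS = {"lexogen"}  # per the config comment, only Lexogen QuantSeq is supported


def three_prime_settings(config: dict) -> dict:
    """Read the experiment -> 3-prime-rna-seq block, tolerating a missing or empty section."""
    section = (config.get("experiment") or {}).get("3-prime-rna-seq") or {}
    active = bool(section.get("activate", False))
    vendor = section.get("vendor")
    if active and vendor not in SUPPORTED_VENDORS:
        raise ValueError(f"unsupported 3-prime RNA-seq vendor: {vendor!r}")
    # 'all' or a selection of transcripts for the aligned-read-position QC plot
    return {"active": active, "vendor": vendor, "plot_qc": section.get("plot-qc", "all")}


config = {
    "experiment": {
        "3-prime-rna-seq": {"activate": True, "vendor": "lexogen", "plot-qc": "all"}
    }
}
print(three_prime_settings(config))
# {'active': True, 'vendor': 'lexogen', 'plot_qc': 'all'}
print(three_prime_settings({})["active"])  # False
```

Failing early on an unsupported vendor keeps a misconfiguration from surfacing later as an obscure trimming or quantification error.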
20 changes: 17 additions & 3 deletions .test/config/config.yaml
@@ -1,6 +1,16 @@
 samples: config/samples.tsv
 units: config/units.tsv
 
+experiment:
+  # If set to `true`, this option allows the workflow to analyse 3-prime RNA-seq data obtained with Lexogen's QuantSeq protocol.
+  # For more information, see https://www.lexogen.com/quantseq-3mrna-sequencing/
+  3-prime-rna-seq:
+    activate: false
+    # Specify the vendor of the protocol used. Currently, only lexogen is supported.
+    vendor: lexogen
+    # this allows plotting QC of aligned read positions for specific transcripts (or 'all' transcripts)
+    plot-qc: all
+
 resources:
   ref:
     # ensembl species name
@@ -44,6 +54,11 @@ diffexp:
     volcano-plot: 0.05
     ma-plot: 0.05
     qq-plot: 0.05
+  # Optional (set `activate: true` to use): provide a list of genes that shall be shown in a heatmap
+  # and for which bootstrap plots (see below) shall be created.
+  genes_of_interest:
+    activate: true
+    genelist: "resources/gene_list.tsv"
 
 diffsplice:
   activate: true
@@ -86,8 +101,6 @@ bootstrap_plots:
   color_by: condition
   # for now, this will plot the sleuth-normalised kallisto count estimations with kallisto
   # for all the transcripts of the respective genes
-  genes_of_interest:
-    - A4galt
 
 plot_vars:
   # significance level used for plot_vars() plots
@@ -108,6 +121,7 @@ params:
     adapters: "-a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
     extra: "-q 20"
   # reasoning behind parameters:
+  # For reads that are produced by 3’-end sequencing, depending on the protocol, it might be recommended to remove some leading bases (e.g. see https://www.nature.com/articles/s41598-019-55434-x#Sec10)
   # * `--minimum-length 33`:
   #   * kallisto needs non-empty reads in current versions (fixed for future releases:
   #     https://github.com/pachterlab/kallisto/commit/64fe837ca86f3664496483bcd2787c9376584fed)
@@ -119,4 +133,4 @@ params:
   #   of expected adapter matches by chance
   cutadapt-pe:
     adapters: "-a ACGGATCGATCGATCGATCGAT -g GGATCGATCGATCGATCGAT -A ACGGATCGATCGATCGATCGAT -G GGATCGATCGATCGATCGAT"
-    extra: "--minimum-length 33 -e 0.005 --overlap 7"
+    extra: "--minimum-length 33 -e 0.005 --overlap 7"
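With `genes_of_interest` moved out of `bootstrap_plots`, the remaining settings there (`FDR`, `top_n`, `color_by`) describe a filter-then-truncate selection: keep discoveries below the FDR threshold, rank them, and plot at most `top_n`. A sketch of that selection over (target, q-value) pairs is shown below, assuming ranking by ascending q-value; the function name and input shape are hypothetical and not taken from the workflow's R scripts.

```python
def select_bootstrap_targets(results, fdr=0.01, top_n=3):
    """Return up to `top_n` target IDs whose q-value is below `fdr`, most significant first."""
    significant = [(target, q) for target, q in results if q is not None and q < fdr]
    significant.sort(key=lambda pair: pair[1])
    return [target for target, _ in significant[:top_n]]


results = [("t1", 0.001), ("t2", 0.02), ("t3", 0.0001), ("t4", 0.005), ("t5", 0.009)]
print(select_bootstrap_targets(results))  # ['t3', 't1', 't4']
print(select_bootstrap_targets(results, fdr=0.002))  # ['t3', 't1']
```

This matches the config comments: lowering `FDR` shrinks the candidate set, while `top_n` caps the number of plots regardless of how many discoveries pass.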
18 changes: 18 additions & 0 deletions .test/resources/gene_list.tsv
@@ -0,0 +1,18 @@
STAT1
IRF1
HLA-A
HLA-DRB1
TYR
PMEL
DCT
MLANA
MITF
CDK2
SOX10
ERBB3
LEF1
CTNNB1
CDH1
FN1
NGFR
AXL
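The new `resources/gene_list.tsv` holds one gene symbol per line and is only relevant when `diffexp: genes_of_interest: activate` is `true`. A hedged sketch of that lookup is shown below, assuming a header-less, single-column file whose path is resolved relative to the working directory; `parse_gene_list` and `genes_of_interest` are hypothetical helpers.

```python
from pathlib import Path


def parse_gene_list(text: str) -> list[str]:
    """One gene symbol per line; blank lines are ignored."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def genes_of_interest(diffexp_cfg: dict) -> list[str]:
    """Return the configured genes, or an empty list if the feature is switched off."""
    goi = diffexp_cfg.get("genes_of_interest") or {}
    if not goi.get("activate", False):
        return []  # the gene list file is not read at all in this case
    return parse_gene_list(Path(goi["genelist"]).read_text())


print(parse_gene_list("STAT1\nIRF1\n\nHLA-A\n"))  # ['STAT1', 'IRF1', 'HLA-A']
print(genes_of_interest({"genes_of_interest": {"activate": False, "genelist": "resources/gene_list.tsv"}}))  # []
```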
23 changes: 21 additions & 2 deletions config/config.yaml
@@ -1,6 +1,18 @@
 samples: config/samples.tsv
 units: config/units.tsv
 
+experiment:
+  # If set to `true`, this option allows the workflow to analyse 3-prime RNA-seq data obtained with Lexogen's QuantSeq protocol.
+  # For more information, see https://www.lexogen.com/quantseq-3mrna-sequencing/
+  3-prime-rna-seq:
+    activate: false
+    # Specify the vendor of the protocol used. Currently, only lexogen is supported.
+    vendor: lexogen
+    # this allows plotting QC of aligned read positions for specific transcripts (or 'all' transcripts)
+    plot-qc: all
+
+
+
 resources:
   ref:
     # ensembl species name
@@ -52,6 +64,11 @@ diffexp:
     volcano-plot: 0.05
     ma-plot: 0.05
     qq-plot: 0.05
+  # Optional (set `activate: true` to use): provide a list of genes that shall be shown in a heatmap
+  # and for which bootstrap plots (see below) shall be created.
+  genes_of_interest:
+    activate: false
+    genelist: "resources/gene_list.tsv"
 
 diffsplice:
   activate: true
@@ -95,14 +112,14 @@ bootstrap_plots:
   color_by: condition
   # for now, this will plot the sleuth-normalised kallisto count estimations with kallisto
   # for all the transcripts of the respective genes
-  genes_of_interest:
-    - A4galt
 
 plot_vars:
   # significance level used for plot_vars() plots
   sig_level: 0.1
 
 params:
+  # For reads that are produced by 3’-end sequencing, the --single-overhang option does not discard
+  # reads where the expected fragment size goes beyond the transcript start
   kallisto: "-b 100"
   # these cutadapt parameters need to contain the required flag(s) for
   # the type of adapter(s) to trim, i.e.:
@@ -113,10 +130,12 @@ params:
   # also, separate capitalised letter flags are required for adapters in
   # the reverse reads of paired end sequencing:
   # * https://cutadapt.readthedocs.io/en/stable/guide.html#trimming-paired-end-reads
+
   cutadapt-se:
     adapters: "-a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
     extra: "-q 20"
   # reasoning behind parameters:
+  # For reads that are produced by 3’-end sequencing, depending on the protocol, it might be recommended to remove some leading bases (e.g. see https://www.nature.com/articles/s41598-019-55434-x#Sec10)
   # * `--minimum-length 33`:
   #   * kallisto needs non-empty reads in current versions (fixed for future releases:
   #     https://github.com/pachterlab/kallisto/commit/64fe837ca86f3664496483bcd2787c9376584fed)
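The new comment on `params: kallisto` refers to kallisto's `--single-overhang` option. For single-end input, `kallisto quant` also needs `--single` together with the fragment length mean (`-l`) and standard deviation (`-s`), which is what the `fragment_len_mean` and `fragment_len_sd` columns of `units.tsv` provide. The sketch below assembles those options; whether and where the workflow actually appends `--single-overhang` for 3' data is an assumption here, the function name is hypothetical, and index, output and FASTQ arguments are deliberately left out.

```python
import shlex


def kallisto_quant_options(unit: dict, extra: str = "-b 100", three_prime: bool = False) -> list[str]:
    """Build kallisto quant options for one unit (index, output dir and reads omitted)."""
    options = shlex.split(extra)  # e.g. the value of params -> kallisto from the config
    fq2 = (unit.get("fq2") or "").strip()
    if not fq2:
        # single-end: kallisto cannot infer fragment lengths, so they must be supplied
        options += ["--single", "-l", str(unit["fragment_len_mean"]), "-s", str(unit["fragment_len_sd"])]
        if three_prime:
            # assumption: keep reads whose unobserved rest of the fragment would lie outside a transcript
            options.append("--single-overhang")
    return options


unit = {"fragment_len_mean": "300", "fragment_len_sd": "14", "fq2": ""}
print(kallisto_quant_options(unit, three_prime=True))
# ['-b', '100', '--single', '-l', '300', '-s', '14', '--single-overhang']
```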