feat!: exact matches in assignment #90

Merged on Apr 24, 2024 (13 commits)
Changes from 12 commits
80 changes: 0 additions & 80 deletions config/config.yaml

This file was deleted.

27 changes: 27 additions & 0 deletions config/example_assignment_bwa.yaml
@@ -0,0 +1,27 @@
---
global: # general configs affecting one or multiple parts
  threads: 1
  assignments:
    split_number: 1 # number of files each fastq should be split into for parallelization
assignments:
  exampleAssignment: # name of an example assignment (can be any string)
    bc_length: 15
    alignment_tool:
      tool: bwa # bwa or exact
      configs:
        min_mapping_quality: 1 # integer >= 0; please use 1 when oligos in your reference/design file differ by 1 base
        sequence_length: # sequence length of the design excluding adapters
          min: 166
          max: 175
        alignment_start: # start of an alignment in the reference. Here using 15 bp adapters. Can be different when using adapter-free approaches
          min: 1 # integer
          max: 3 # integer
    FW:
      - resources/Assignment_BasiC/R1.fastq.gz
    BC:
      - resources/Assignment_BasiC/R2.fastq.gz
    REV:
      - resources/Assignment_BasiC/R3.fastq.gz
    reference: resources/design.fa
    configs:
      default: {} # name of an example filtering config
24 changes: 24 additions & 0 deletions config/example_assignment_exact_lazy.yaml
@@ -0,0 +1,24 @@
---
global: # general configs affecting one or multiple parts
  threads: 1
  assignments:
    split_number: 1 # number of files each fastq should be split into for parallelization
assignments:
  exampleAssignment: # name of an example assignment (can be any string)
    bc_length: 15
    alignment_tool:
      tool: exact # bwa or exact
      configs:
        sequence_length: 170 # sequence length of the design excluding adapters
        alignment_start: 1 # start of the alignment in the reference
    FW:
      - resources/Assignment_BasiC/R1.fastq.gz
    BC:
      - resources/Assignment_BasiC/R2.fastq.gz
    REV:
      - resources/Assignment_BasiC/R3.fastq.gz
    reference: resources/design.fa
    configs:
      lazy: # name of an example filtering config
        min_support: 2 # default 3
        fraction: 0.6 # default 0.75
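The diff does not spell out what `min_support` and `fraction` do. The following is a minimal sketch, assuming that `min_support` is the minimum number of reads that must link a barcode to its most frequent oligo and that `fraction` is the minimum share of the barcode's reads that must agree on that oligo. The function name `filter_barcodes` and the input format are hypothetical and are not taken from the workflow.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def filter_barcodes(
    pairs: Iterable[Tuple[str, str]],
    min_support: int = 3,     # default stated in the config comment
    fraction: float = 0.75,   # default stated in the config comment
) -> Dict[str, str]:
    """Keep barcodes whose dominant oligo passes both thresholds.

    `pairs` yields (barcode, oligo) tuples, one per assigned read.
    The threshold semantics are an assumption, see the text above.
    """
    per_barcode: Dict[str, Counter] = {}
    for barcode, oligo in pairs:
        per_barcode.setdefault(barcode, Counter())[oligo] += 1

    kept: Dict[str, str] = {}
    for barcode, counts in per_barcode.items():
        oligo, support = counts.most_common(1)[0]
        total = sum(counts.values())
        if support >= min_support and support / total >= fraction:
            kept[barcode] = oligo
    return kept


reads = [("AAA", "CRS1"), ("AAA", "CRS1"), ("AAA", "CRS2"), ("CCC", "CRS3")]
lazy = filter_barcodes(reads, min_support=2, fraction=0.6)
strict = filter_barcodes(reads)
print(lazy, strict)
```

Under these assumptions the `lazy` settings keep barcode `AAA` (2 of 3 reads agree), while the defaults drop it.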
22 changes: 22 additions & 0 deletions config/example_assignment_exact_linker.yaml
@@ -0,0 +1,22 @@
---
global: # general configs affecting one or multiple parts
  threads: 1
  assignments:
    split_number: 1 # number of files each fastq should be split into for parallelization
assignments:
  exampleAssignment: # name of an example assignment (can be any string)
    bc_length: 20
    BC_rev_comp: true
    linker: TCTAGACCGTCACTAACTAACAGTGGGTACCC
    alignment_tool:
      tool: exact # bwa or exact
      configs:
        sequence_length: 170 # sequence length of the design excluding adapters
        alignment_start: 1 # start of the alignment in the reference
    FW:
      - resources/Assignment_BasiC/R1.fastq.gz
    REV:
      - resources/Assignment_BasiC/R3.fastq.gz
    reference: resources/design.fa
    configs:
      default: {} # name of an example filtering config
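The three example files differ in how `alignment_tool.configs` is shaped: `bwa` takes `min`/`max` ranges plus `min_mapping_quality`, while `exact` takes single values. A minimal sketch of how such an entry could be normalized after loading the YAML into a dictionary follows. The helper names `as_range` and `alignment_settings` are hypothetical and not part of the workflow.

```python
from typing import Any, Dict, Tuple

Range = Tuple[int, int]


def as_range(value: Any) -> Range:
    """Turn a scalar (exact) or a {min, max} mapping (bwa) into (min, max)."""
    if isinstance(value, dict):
        return int(value["min"]), int(value["max"])
    return int(value), int(value)


def alignment_settings(assignment: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical normalization of one entry under `assignments:`."""
    tool_cfg = assignment["alignment_tool"]
    tool = tool_cfg["tool"]
    if tool not in ("bwa", "exact"):
        raise ValueError(f"unknown alignment tool: {tool!r}")
    cfg = tool_cfg["configs"]
    settings: Dict[str, Any] = {
        "tool": tool,
        "sequence_length": as_range(cfg["sequence_length"]),
        "alignment_start": as_range(cfg["alignment_start"]),
    }
    if tool == "bwa":
        # Only the bwa examples in this diff carry a mapping quality.
        settings["min_mapping_quality"] = int(cfg["min_mapping_quality"])
    return settings


# Dictionaries mirroring the alignment_tool blocks of the example files.
bwa_assignment = {
    "alignment_tool": {
        "tool": "bwa",
        "configs": {
            "min_mapping_quality": 1,
            "sequence_length": {"min": 166, "max": 175},
            "alignment_start": {"min": 1, "max": 3},
        },
    }
}
exact_assignment = {
    "alignment_tool": {
        "tool": "exact",
        "configs": {"sequence_length": 170, "alignment_start": 1},
    }
}

bwa_settings = alignment_settings(bwa_assignment)
exact_settings = alignment_settings(exact_assignment)
print(bwa_settings)
print(exact_settings)
```

Treating the single `exact` values as degenerate ranges is a design choice of this sketch; it lets downstream checks use one code path for both tools.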
51 changes: 19 additions & 32 deletions config/example_config.yaml
@@ -1,18 +1,23 @@
 ---
-global: # general configs affecting one or multiple parts
+global: # general configs affecting one or multiple parts
   threads: 1
   assignments:
-    split_number: 1 # number of files each fastq should be split into for parallelization
+    split_number: 1 # number of files each fastq should be split into for parallelization
 assignments:
-  exampleAssignment: # name of an example assignment (can be any string)
+  exampleAssignment: # name of an example assignment (can be any string)
     bc_length: 15
-    sequence_length: # sequence length of the design excluding adapters
-      min: 195
-      max: 205
-    alignment_start: # start of an alignment in the reference. Here using 15 bp adapters. Can be different when using adapter-free approaches
-      min: 15 # integer
-      max: 17 # integer
-    min_mapping_quality: 1 # integer >= 0; please use 1 when oligos in your reference/design file differ by 1 base
+    alignment_tool:
+      tool: exact # bwa or exact
+      configs:
+        sequence_length: 170 # sequence length of the design excluding adapters
+        alignment_start: 1 # start of the alignment in the reference
+        # min_mapping_quality: 1 # integer >= 0; please use 1 when oligos in your reference/design file differ by 1 base
+        # sequence_length: # sequence length of the design excluding adapters
+        #   min: 195
+        #   max: 205
+        # alignment_start: # start of an alignment in the reference. Here using 15 bp adapters. Can be different when using adapter-free approaches
+        #   min: 15 # integer
+        #   max: 17 # integer
     FW:
       - resources/Assignment_BasiC/R1.fastq.gz
     BC:
@@ -21,9 +26,7 @@ assignments:
       - resources/Assignment_BasiC/R3.fastq.gz
     reference: resources/design.fa
     configs:
-      exampleAssignmentConfig: # name of an example filtering config
-        min_support: 3
-        fraction: 0.7
+      default: {} # name of an example filtering config
 experiments:
   exampleCount:
     bc_length: 15
@@ -38,24 +41,8 @@ experiments:
       fromWorkflow:
         type: config
         assignment_name: exampleAssignment
-        assignment_config: exampleAssignmentConfig
+        assignment_config: default
     design_file: resources/design.fa
-    label_file: resources/labels.tsv # optional
+    label_file: resources/labels.tsv # optional
     configs:
-      exampleConfig:
-        filter:
-          bc_threshold: 10
-          DNA:
-            min_counts: 1
-          RNA:
-            min_counts: 1
-        sampling: # optional, just for benchmarking
-          DNA:
-            total: 30000000
-            threshold: 300
-          RNA:
-            total: 50000000
-            threshold: 300
-    variants: # optional
-      map: resources/variant_map.tsv
-      min_barcodes: [5, 10] # min BC for ref and alt sequence
+      default: {} # name of an example filtering config
18 changes: 15 additions & 3 deletions config/sbatch.yml
@@ -27,7 +27,7 @@ assignment_fastq_split:
  threads: 1
  mem: 10G
  queue: medium
assignment_mapping:
assignment_mapping_bwa:
  time: "0-02:00"
  threads: 30
  mem: 10G
@@ -46,6 +46,11 @@ assignment_collectBCs:
  threads: 20
  mem: 10G
  queue: medium
assignment_mapping_exact:
  time: "0-01:00"
  threads: 1
  mem: 10G
  queue: debug
assignment_statistic_totalCounts:
  time: "0-01:00"
  threads: 1
@@ -81,12 +86,16 @@ counts_umi_raw_counts:
  mem: 6G
  queue: "medium"
counts_noUMI_create_BAM:
  time: "4-00:00"
  time: 4-00:00
  mem: 12G
  queue: "medium"
  queue: medium
counts_filter_counts:
  time: 0-02:00
  queue: medium
counts_final_counts_samplerer:
  mem: 20G
  queue: medium

#########################
### (ASSIGNED) COUNTS ###
#########################
@@ -133,6 +142,9 @@ statistic_counts_barcode_base_composition:
  time: "1-00:00"
  queue: "medium"
  mem: 20G
statistic_counts_table:
  time: 0-02:00
  queue: medium

#############################
### Statistic/correlation ###
32 changes: 27 additions & 5 deletions docs/assignment.rst
@@ -32,6 +32,24 @@ Example file:
 >CRS4
 TTAGACCGCCCTTTACCCCGAGAAAACTCAGCTACACACTC

+Config File
+-----------
+Multiple mapping strategies are implemented to find the corresponding CRS sequence for each read. The mapping strategy (bwa mem or exact matches) is chosen in the config file. The config file is a YAML file and also defines how the mapping results are filtered.
+
+Example of an assignment file using bwa and the standard filtering:
+
+.. literalinclude:: ../config/example_assignment_bwa.yaml
+   :language: yaml
+
+Example of an assignment file using exact matches and non-default filtering of barcodes:
+
+.. literalinclude:: ../config/example_assignment_exact_lazy.yaml
+   :language: yaml
+
+Example of an assignment file using exact matches and read 1 containing BC, linker and oligo (no separate BC index read):
+
+.. literalinclude:: ../config/example_assignment_exact_linker.yaml
+   :language: yaml

 snakemake
 ============================
@@ -67,7 +85,7 @@ all
 assignment_attach_idx
   Extract the index sequence and add it to the header.
 assignment_bwa_ref
-  Create mapping reference for BWA from design file.
+  Create mapping reference for BWA from design file (:code:`bwa` mapping approach).
 assignment_collect
   Collect mapped reads into one BAM.
 assignment_collectBCs
@@ -82,12 +100,16 @@ assignmemt_hybridFWRead_get_reads_by_cutadapt
   Get the barcode and read from the FW read using cutadapt (when no index BC read is present). Uses the paired end mode of cutadapt to write the FW and BC read.
 assignment_merge
   Merge the FW, REV and BC fastq files into one. Extract the index sequence from the middle and end of an Illumina run. Separates reads for Paired End runs. Merge/Adapter trim reads stored in BAM.
-assignment_mapping
-  Map the reads to the reference.
+assignment_mapping_bwa
+  Map the reads to the reference (:code:`bwa` mapping approach).
 assignment_idx_bam
-  Index the BAM file
+  Index the BAM file (:code:`bwa` mapping approach).
 assignment_flagstat
-  Run samtools flagstat. Results are in :code:`results/assignment/<assignment_name>/statistic/assignment/bam_stats.txt`
+  Run samtools flagstat. Results are in :code:`results/assignment/<assignment_name>/statistic/assignment/bam_stats.txt` (:code:`bwa` mapping approach).
+assignment_mapping_exact_reference
+  Create reference to map the exact design (:code:`exact` mapping approach).
+assignment_mapping_exact
+  Map the reads to the reference and sort using exact match (:code:`exact` mapping approach).
 assignment_getBCs
   Get the barcodes (not filtered). Results are in :code:`results/assignment/<assignment_name>/barcodes_incl_other.sorted.tsv.gz`
 assignment_statistic_totalCounts
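The diff names the `exact` mapping approach and its two settings (`sequence_length`, `alignment_start`) but does not show how the match is performed. The following is a minimal sketch of one way exact matching could work: a dictionary lookup keyed on the expected window of each design sequence. It is not the workflow's implementation. The function names are hypothetical, and treating `alignment_start` as 1-based is an assumption.

```python
from typing import Dict, Optional


def build_exact_reference(
    design: Dict[str, str], alignment_start: int, sequence_length: int
) -> Dict[str, Optional[str]]:
    """Map each design window to its oligo name.

    Windows shared by several oligos map to None (ambiguous).
    `alignment_start` is assumed to be 1-based.
    """
    begin = alignment_start - 1
    lookup: Dict[str, Optional[str]] = {}
    for name, sequence in design.items():
        window = sequence[begin:begin + sequence_length].upper()
        if len(window) < sequence_length:
            continue  # design entry shorter than the expected window
        lookup[window] = None if window in lookup else name
    return lookup


def assign_exact(read: str, lookup: Dict[str, Optional[str]]) -> Optional[str]:
    """Return the oligo name for an exactly matching read, else None."""
    return lookup.get(read.upper())


# Toy design with one duplicated sequence (CRS1 and CRS3).
design = {"CRS1": "ACGTACGT", "CRS2": "ACGTTTTT", "CRS3": "ACGTACGT"}
reference = build_exact_reference(design, alignment_start=1, sequence_length=8)

print(assign_exact("ACGTTTTT", reference))
print(assign_exact("ACGTTTTA", reference))
print(assign_exact("ACGTACGT", reference))
```

In this sketch a read with a single mismatch is left unassigned, and so is a read matching a window that two oligos share.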