feat!: latest development for new release (#133)
* chore: Update development (#128)

* docs: enhancing documentation

* docs: better quickstart

* chore: update github actions to setup-micromamba

* docs: remove default channel from environment file

* docs: improvements, like QC report (#125)

* added .DS_Store to gitignore.

* Fixed the overflow of the features section by using the table.

* Fixed the broken report link.

* Sample QC report HTML file

* Added the link to the QC report in experiment.

* Added the assignment QC report.

* Add link to QC report in assignment documentation

* Update documentation in quickstart.rst. Fixed typos and grammatical mistakes.

* Update documentation in index.rst. Fix typos and grammatical mistakes.

* Fix typo in installation documentation

* Refactor documentation in config.rst

---------

Co-authored-by: Max <visze@users.noreply.github.com>

* docs: Fixed the link for the QC report in Experiment and Assignment (#126)

* added .DS_Store to gitignore.

* Fixed the overflow of the features section by using the table.

* Fixed the broken report link.

* fixed typo project

* Typo fix controlled

* Sample QC report HTML file

* Added the link to the QC report in experiment.

* Added the assignment QC report.

* Add link to QC report in assignment documentation

* Update documentation in quickstart.rst. Fixed typos and grammatical mistakes.

* Update documentation in index.rst. Fix typos and grammatical mistakes.

* Fix typo in installation documentation

* Refactor documentation in config.rst

* Update documentation links in assignment.rst and experiment.rst

* Testing the iframe html file.

* Update documentation links in assignment.rst and experiment.rst

---------

Co-authored-by: Max <visze@users.noreply.github.com>

* chore: delete not necessary files

* docs: automatic versioning

* style: automatic version printing of MPRAsnakeflow

* fix: memory resources for bbmap (#123)

* fix: add memory resources for bbmap

* set lower memory in bbmap workflow profile

* increasing memory for bbmap

---------

Co-authored-by: Max Schubach <max.schubach@bih-charite.de>
Co-authored-by: Max Schubach <max.schubach@bihealth.de>

* fix: Detach from anaconda (#122)

* fix: detach from anaconda. Remove defaults conda channels

* fixing linting errors

* update hashes in Dockerfile from linting errors

---------

Co-authored-by: Max Schubach <max.schubach@bih-charite.de>

* chore(master): release MPRAsnakeflow 0.1.1 (#124)

* chore(master): release MPRAsnakeflow 0.1.1

* Update .release-please-manifest.json

* Update version.txt

* Update CHANGELOG.md

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Max <visze@users.noreply.github.com>

* forgot to upgrade two envs

* docs: correct link in docs badge

---------

Co-authored-by: Max Schubach <max.schubach@bih-charite.de>
Co-authored-by: Ali <69039717+bioinformaticsguy@users.noreply.github.com>
Co-authored-by: Max Schubach <max.schubach@bihealth.de>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* feat!: igvf outputs (#129)

* refactor: removed statistics from final barcode to oligo map

* refactor outputs

* fix scripts due to renaming headers

* fix assignment statistic due to new output

* refactor!: moving files. Not-attached counts are no longer used, and neither is the median for scaling

* adding logs

---------

Co-authored-by: Max Schubach <max.schubach@bih-charite.de>

* chore!: supporting only snakemake >=8.24.1 (#130)

Co-authored-by: Max Schubach <max.schubach@bih-charite.de>

* refactor!: No min max length for bbmap. default mapq is 30. (#131)

Changes for bbmap
* no min and max for sequence length and start (like exact matching)
* using default of 30 mapq instead of 35

* feat!: outlier removal (#132)

* feat!: outlier detection
Might break older config files

* docs: update documentation for bbmap, apptainer and outlier removal

* use abs for zscore

* trying to fix outlier via zscore

* mad code change

* change outlier removal default to zscore

---------

Co-authored-by: Max Schubach <max.schubach@bih-charite.de>

* edit config

---------

Co-authored-by: Max Schubach <max.schubach@bih-charite.de>
Co-authored-by: Ali <69039717+bioinformaticsguy@users.noreply.github.com>
Co-authored-by: Max Schubach <max.schubach@bihealth.de>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
5 people authored Nov 5, 2024
1 parent b7f4cfd commit bdfc557
Showing 33 changed files with 671 additions and 399 deletions.
5 changes: 4 additions & 1 deletion .gitignore
@@ -7,6 +7,9 @@ logs
!config/*
!resources
!resources/**
resources/**/.local
resources/**/.cache
resources/**/.ipython
!workflow
!workflow/**
!.gitattributes
@@ -27,4 +30,4 @@ mix_data
*report.html
*.simg
*results
.DS_Store
.DS_Store
13 changes: 5 additions & 8 deletions README.md
@@ -1,7 +1,7 @@
# Snakemake workflow: MPRAsnakeflow

[![Documentation Status](https://readthedocs.org/projects/mprasnakeflow/badge/?version=latest)](https://mprasnakeflow.readthedocs.io/latest/?badge=latest)
[![Snakemake](https://img.shields.io/badge/snakemake-≥7.2.1-brightgreen.svg)](https://snakemake.bitbucket.io)
[![Snakemake](https://img.shields.io/badge/snakemake-≥8.24.1-brightgreen.svg)](https://snakemake.github.io/)
[![Tests](https://github.com/kircherlab/MPRAsnakeflow/actions/workflows/main.yml/badge.svg)](https://github.com/kircherlab/MPRAsnakeflow/actions/workflows/main.yml)

This pipeline processes sequencing data from Massively Parallel Reporter Assays (MPRA) to create count tables for candidate sequences tested in the experiment.
@@ -33,17 +33,17 @@ Create or adjust the `config/example_config.yaml` in the repository to your need

### Step 3: Install Snakemake

Install Snakemake (recommended version >= 8.x) using [conda](https://conda.io/projects/conda/en/latest/user-guide/install/index.html) or [mamba](https://mamba.readthedocs.io/en/latest/installation/mamba-installation.html) (recommended installation via [miniforge](https://github.com/conda-forge/miniforge)):
Install Snakemake (version >= 8.24.1) using [conda >24.7.1](https://conda.io/projects/conda/en/latest/user-guide/install/index.html) (recommended installation via [miniforge](https://github.com/conda-forge/miniforge)):

mamba create -c bioconda -n snakemake snakemake
conda create -c bioconda -n snakemake snakemake

For installation details, see the [instructions in the Snakemake documentation](https://snakemake.readthedocs.io/en/stable/getting_started/installation.html).

### Step 4: Execute workflow

Activate the conda environment:

mamba activate snakemake
conda activate snakemake

Test your configuration by performing a dry-run via

@@ -58,9 +58,6 @@ using `$N` cores or run it in a cluster environment (here SLURM) via the [slurm
snakemake --software-deployment-method conda --executor slurm --cores $N --configfile config.yaml --workflow-profile profiles/default

Please note that `profiles/default/config.yaml` has to be adapted to your needs (like partition names).
For snakemake 7.x this might work too using slurm sbatch (but depricated in newer snakemake versions:

snakemake --use-conda --configfile config.yaml --cluster "sbatch --nodes=1 --ntasks={cluster.threads} --mem={cluster.mem} -t {cluster.time} -p {cluster.queue} -o {cluster.output}" --jobs 100 --cluster-config config/sbatch.yaml


Please note that the log folder of the cluster environment has to be created first, e.g.:
@@ -71,7 +68,7 @@ For other cluster environments please check the [Snakemake](https://snakemake.re

If you want to pin not only the software stack but also the underlying OS, use

snakemake --sdm apptainer,conda --cores $N --configfile config.yaml --workflow-profile profiles/default
snakemake --sdm apptainer conda --cores $N --configfile config.yaml --workflow-profile profiles/default

in combination with any of the modes above. This will use a pre-built Singularity container of MPRAsnakeflow with the conda envs already installed in it.

10 changes: 3 additions & 7 deletions config/example_assignment_bbmap.yaml
@@ -8,13 +8,9 @@ assignments:
alignment_tool:
tool: bbmap
configs:
min_mapping_quality: 35 # integer >=0. 35 is default
sequence_length: # sequence length of design excluding adapters.
min: 166
max: 175
alignment_start: # start of an alignment in the reference/design_file. Here using 15 bp adapters. Can be different when using adapter free approaches
min: 1 # integer
max: 3 # integer
min_mapping_quality: 30 # 30 is default for bbmap
sequence_length: 171 # sequence length of design excluding adapters.
alignment_start: 1 # start of an alignment in the reference/design_file. Here using 15 bp adapters. Can be different when using adapter free approaches
FW:
- resources/Assignment_BasiC/R1.fastq.gz
BC:
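The bbmap settings above replace the former `min`/`max` ranges with single values, similar to exact matching. The following is a minimal Python sketch of what such a filter could look like. It assumes each alignment has already been reduced to its 1-based start position (SAM `POS`), the number of reference bases it covers, and its `MAPQ`. The `Alignment` record and `filter_alignments` function are hypothetical and are not part of MPRAsnakeflow.

```python
from typing import Iterable, List, NamedTuple


class Alignment(NamedTuple):
    """Hypothetical, simplified view of one SAM record."""
    barcode: str
    oligo: str
    pos: int             # 1-based leftmost mapping position (SAM POS)
    aligned_length: int  # reference bases covered by the alignment
    mapq: int            # mapping quality (SAM MAPQ)


def filter_alignments(
    alignments: Iterable[Alignment],
    sequence_length: int = 171,
    alignment_start: int = 1,
    min_mapping_quality: int = 30,
) -> List[Alignment]:
    """Keep alignments that start exactly at `alignment_start`, cover exactly
    `sequence_length` bases and reach at least `min_mapping_quality`."""
    return [
        a
        for a in alignments
        if a.pos == alignment_start
        and a.aligned_length == sequence_length
        and a.mapq >= min_mapping_quality
    ]


if __name__ == "__main__":
    records = [
        Alignment("AAA", "oligo1", 1, 171, 30),  # passes all checks
        Alignment("CCC", "oligo1", 1, 171, 29),  # MAPQ too low
        Alignment("GGG", "oligo2", 2, 171, 40),  # wrong start
        Alignment("TTT", "oligo2", 1, 170, 40),  # wrong length
    ]
    print([a.barcode for a in filter_alignments(records)])  # ['AAA']
```

Requiring an exact start and length is stricter than the old ranges. Reads with small indels near the ends would therefore be discarded, which is the trade-off the commit accepts in exchange for simpler configuration.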
4 changes: 2 additions & 2 deletions config/example_config.yaml
@@ -6,9 +6,9 @@ assignments:
exampleAssignment: # name of an example assignment (can be any string)
bc_length: 15
alignment_tool:
tool: exact # bbbmap, bwa or exact
tool: exact # bbmap, bwa or exact
configs:
sequence_length: 170 # sequence length of design excluding adapters.
sequence_length: 171 # sequence length of design excluding adapters.
alignment_start: 1 # start of the alignment in the reference/design_file
FW:
- resources/assoc_basic/data/SRR10800986_1.fastq.gz
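For `tool: exact`, the `sequence_length` and `alignment_start` values describe which part of each design sequence a read has to match. The following is a minimal sketch that assumes the read begins with the oligo sequence and that the selected design fragments are unique. `assign_exact` and its input layout are hypothetical and only illustrate the idea; they do not reproduce the MPRAsnakeflow implementation.

```python
from typing import Dict


def assign_exact(
    reads: Dict[str, str],
    design: Dict[str, str],
    sequence_length: int = 171,
    alignment_start: int = 1,
) -> Dict[str, str]:
    """Map barcode -> oligo name by exact string lookup.

    `reads` maps barcode -> read sequence, `design` maps oligo name -> design
    sequence. `alignment_start` is 1-based and refers to the design sequence.
    """
    start = alignment_start - 1
    # Index the expected fragment of every design sequence (assumes uniqueness;
    # with duplicate fragments the last oligo would silently win).
    lookup = {
        seq[start:start + sequence_length]: name for name, seq in design.items()
    }
    assigned = {}
    for barcode, read in reads.items():
        oligo = lookup.get(read[:sequence_length])
        if oligo is not None:  # reads without an exact match stay unassigned
            assigned[barcode] = oligo
    return assigned


if __name__ == "__main__":
    design = {"oligoA": "ACGTACGT", "oligoB": "TTTTGGGG"}
    reads = {"BC1": "ACGTNN", "BC2": "TTTTAA", "BC3": "GGGGCC"}
    print(assign_exact(reads, design, sequence_length=4))
    # {'BC1': 'oligoA', 'BC2': 'oligoB'}
```

A dictionary lookup makes each read assignment constant-time. In exchange, any sequencing error inside the compared fragment leaves the read unassigned.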
2 changes: 1 addition & 1 deletion docs/assignment.rst
@@ -73,7 +73,7 @@ Mandatory arguments:
:\-\-configfile:
Specify or overwrite the config file of the workflow (see the docs). Values specified in JSON or YAML format are available in the global config dictionary inside the workflow. Multiple files overwrite each other in the given order. Thereby missing keys in previous config files are extended by following configfiles. Note that this order also includes a config file defined in the workflow definition itself (which will come first). (default: None)
:\-\-sdm:
**Required to run MPRAsnakeflow.** : :code:`--sdm conda` or :code:`--sdm apptainer` Uses the defined conda environment per rule. We highly recommend to use apptainer where we build a predefined docker container with all software installewd within it. :code:`--sdm conda` teh conda envs will be installed by the first excecution of the workflow. If this flag is not set, the conda/apptainer directive is ignored. (default: False)
**Required to run MPRAsnakeflow.** : :code:`--sdm conda` or :code:`--sdm apptainer conda`. Uses the defined conda environment per rule. We highly recommend using apptainer, for which we build a predefined Docker container with all software installed within it. With :code:`--sdm conda`, the conda envs will be installed during the first execution of the workflow. If this flag is not set, the conda/apptainer directive is ignored. (default: False)
Recommended arguments:
:\-\-snakefile:
You should not need to specify this. By default, Snakemake will search for 'Snakefile', 'snakefile', 'workflow/Snakefile', 'workflow/snakefile' beneath the current working directory, in this order. Only if you definitely want a different layout do you need to use this parameter. This is very useful when you want to have the results in a different folder than the one MPRAsnakeflow is in. (default: None)
6 changes: 3 additions & 3 deletions docs/cluster.rst
@@ -32,10 +32,10 @@ Using the slurm excecutor plugin running 300 jobs in parallel.
snakemake --sdm conda --configfile config/config.yaml -j 300 --workflow-profile profiles/default --executor slurm
Snakemake 7
-----------
Snakemake 7 (not supported anymore)
-------------------------------------

Here we used the :code:`--cluster` option which is not anymo,onger available in snakemake 8. You can also use the predefined `config/sbatch.yaml` but this might be outdated and we highly recommend to use resources with the workfloe profile.
Here we used the :code:`--cluster` option, which is not available in snakemake 8. You can also use the predefined `config/sbatch.yaml`, but it might be outdated, and we highly recommend defining resources in the workflow profile instead.

.. code-block:: bash
34 changes: 20 additions & 14 deletions docs/config.rst
@@ -47,19 +47,19 @@ For each assignment you want to process you have to give him a name like :code:`
Alignment tool configuration that is used to map the reads to the oligos.

:tool:
Alignment tool that is used. Currently :code:`bwa` and :code:`exact` are supported.
Alignment tool that is used. Currently :code:`bbmap`, :code:`bwa`, and :code:`exact` are supported. Default is :code:`bbmap`.
:configs:
Configurations of the alignment tool selected.

:sequence_length (bwa):
Defines the :code:`min` and :code:`max` of a :code:`sequence_length` specify. :code:`sequence_length` is basically the length of a sequence alignment to an oligo in the design file. Because there can be insertion and deletions we recommend to vary it a bit around the exact length (e.g. +-5). In theory, this option enables designs with multiple sequence lengths.
:alignment_start (bwa):
Defines the :code:`min` and :code:`max` of the start of the alignment in an oligo. When using adapters you have to set basically the length of the adapter. Otherwise, 1 will be the choice for most cases. We also recommend varying this value a bit because the start might not be exact after the adapter. E.g. by +-1.
:min_mapping_quality (bwa):
(Optional) Defines the minimum mapping quality (MAPQ) of the alignment to an oligo. When using oligos with only 1bp difference it is recommended to set it to 1. For regions only with larger edit distances 30 or 40 might be a good choice. Default :code:`1`.
:sequence_length (exact):
:min_mapping_quality (bwa, bbmap):
(Optional) Defines the minimum mapping quality (MAPQ) of the alignment to an oligo. MAPQ values differ between bbmap and bwa. For bwa, when using oligos that differ by only 1 bp, we recommend setting it to 1. BBMap performs better here, so values such as 30 or 35 can be used. For regions with only larger edit distances, 30 or 40 might be a good choice. Default :code:`30` (suited to bbmap).
:sequence_length (exact, bbmap):
Defines the :code:`sequence_length`, which is the length of a sequence alignment to an oligo in the design file. Only designs with a single sequence length are supported.
:alignment_start (exact):
:alignment_start (exact, bbmap):
Defines the start of the alignment in an oligo. When using adapters, you basically have to set it to the length of the adapter. Otherwise, 1 will be the right choice in most cases.

:bc_length:
@@ -168,16 +168,22 @@ The experiment workflow is configured in the :code:`experiments` section. Each e

:bc_threshold:
Minimum number of different BCs required per oligo. A higher value normally increases the correlation between replicates but also reduces the number of final oligos. Default option is :code:`10`.
:DNA:
Settings for DNA

:min_counts:
Mimimum number of DNA counts per barcode. When set to :code:`0` a pseudo count is added. Default option is :code:`1`.
:RNA:
Settings for DNA
:min_dna_counts:
Minimum number of DNA counts per barcode. When set to :code:`0`, a pseudo count is added. Default option is :code:`1`.
:min_rna_counts:
Minimum number of RNA counts per barcode. When set to :code:`0`, a pseudo count is added. Default option is :code:`1`.
:outlier_detection:
(Optional) Outlier detection. Methods and strategies to remove outlier barcodes in the final counts. The following options are possible:

:method:
Method to remove outliers. Currently :code:`rna_counts_zscore`, :code:`ratio_mad` or :code:`none` (no outlier detection) are supported. Default option is :code:`rna_counts_zscore`.
:mad_bins:
(Optional) For method :code:`ratio_mad`: Number of bins for the median absolute deviation (MAD) method. Default option is :code:`20`.
:times_mad:
(Optional) For method :code:`ratio_mad`: multiple of the MAD beyond which barcodes are removed as outliers. Default option is :code:`5`.
:times_zscore:
(Optional) For method :code:`rna_counts_zscore`: z-score threshold (in standard deviations) beyond which barcodes are removed as outliers. Default option is :code:`3`.

:min_counts:
Mimimum number of RNA counts per barcode. When set to :code:`0` a pseudo count is added. Default option is :code:`1`.
:sampling:
(Optional) Options for sampling counts and barcodes. Intended for debugging only.

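The `ratio_mad` method is described above only through its parameters `mad_bins` and `times_mad`. The sketch below shows one plausible reading under explicit assumptions, and the actual MPRAsnakeflow implementation may differ:

- Barcodes are sorted by DNA count and split into `mad_bins` roughly equal-sized bins.
- The per-barcode statistic is log2((RNA + 1) / (DNA + 1)).
- A barcode is dropped when it deviates from its bin median by more than `times_mad` median absolute deviations.

The function name and input layout are hypothetical.

```python
from math import log2
from statistics import median
from typing import List, Tuple


def remove_outliers_ratio_mad(
    counts: List[Tuple[str, int, int]],
    mad_bins: int = 20,
    times_mad: float = 5.0,
) -> List[str]:
    """Return the barcodes kept after a binned MAD filter.

    `counts` holds (barcode, dna_count, rna_count) tuples (hypothetical layout).
    """
    rows = sorted(counts, key=lambda row: row[1])  # order barcodes by DNA count
    n = len(rows)
    kept = []
    for b in range(mad_bins):
        chunk = rows[b * n // mad_bins:(b + 1) * n // mad_bins]
        if not chunk:
            continue
        # Pseudo count of 1 avoids division by zero and log2(0).
        ratios = [log2((rna + 1) / (dna + 1)) for _, dna, rna in chunk]
        med = median(ratios)
        mad = median(abs(r - med) for r in ratios)
        for (barcode, _, _), r in zip(chunk, ratios):
            # With MAD == 0 no spread can be estimated, so the bin is kept as is.
            if mad == 0 or abs(r - med) <= times_mad * mad:
                kept.append(barcode)
    return kept


if __name__ == "__main__":
    rows = [("BC1", 10, 9), ("BC2", 10, 10), ("BC3", 10, 11), ("BC4", 10, 10),
            ("BC5", 10, 9), ("BC6", 10, 11), ("BC7", 10, 10), ("BC_out", 10, 500)]
    print(remove_outliers_ratio_mad(rows, mad_bins=1))
    # ['BC1', 'BC2', 'BC3', 'BC4', 'BC5', 'BC6', 'BC7']
```

Binning by DNA count is meant to compare each barcode only with barcodes of similar coverage, because low-count ratios are far noisier than high-count ones. The median and MAD are used instead of mean and standard deviation because they are not pulled toward the outliers they are supposed to detect.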
2 changes: 1 addition & 1 deletion docs/experiment.rst
@@ -88,7 +88,7 @@ Mandatory arguments:
:\-\-configfile:
Specify or overwrite the config file of the workflow (see the docs). Values specified in JSON or YAML format are available in the global config dictionary inside the workflow. Multiple files overwrite each other in the given order. Thereby missing keys in previous config files are extended by following configfiles. Note that this order also includes a config file defined in the workflow definition itself (which will come first). (default: None)
:\-\-sdm:
**Required to run MPRAsnakeflow.** : :code:`--sdm conda` or :code:`--sdm apptainer` Uses the defined conda environment per rule. We highly recommend to use apptainer where we build a predefined docker container with all software installewd within it. :code:`--sdm conda` teh conda envs will be installed by the first excecution of the workflow. If this flag is not set, the conda/apptainer directive is ignored. (default: False)
**Required to run MPRAsnakeflow.** : :code:`--sdm conda` or :code:`--sdm apptainer conda`. Uses the defined conda environment per rule. We highly recommend using apptainer, for which we build a predefined Docker container with all software installed within it. With :code:`--sdm conda`, the conda envs will be installed during the first execution of the workflow. If this flag is not set, the conda/apptainer directive is ignored. (default: False)
Recommended arguments:
:\-\-snakefile:
You should not need to specify this. By default, Snakemake will search for 'Snakefile', 'snakefile', 'workflow/Snakefile', 'workflow/snakefile' beneath the current working directory, in this order. Only if you definitely want a different layout do you need to use this parameter. This is very useful when you want to have the results in a different folder than the one MPRAsnakeflow is in. (default: None)
12 changes: 6 additions & 6 deletions docs/index.rst
@@ -4,19 +4,19 @@
MPRAsnakeflow's documentation
====================================

.. image:: https://img.shields.io/badge/snakemake-≥7.7.1-brightgreen.svg
:target: https://snakemake.bitbucket.io
.. image:: https://img.shields.io/badge/snakemake-≥8.24.1-brightgreen.svg
:target: https://snakemake.github.io/

.. image:: https://img.shields.io/badge/mamba-≥4.6-brightgreen.svg
:target: https://docs.conda.io/en/latest/miniconda.html
.. image:: https://img.shields.io/badge/conda->24.7.1-brightgreen.svg
:target: https://github.com/conda-forge/miniforge


**Welcome!**

The MPRAsnakeflow pipeline processes sequencing data from Massively Parallel Reporter Assays (MPRAs)
to create count tables for candidate sequences tested in the experiment.

MPRAsnakeflow is built on top of `Snakemake <https://snakemake.readthedocs.io/>`_ (version 8 preferred) and is configured via a ``.yaml`` file.
MPRAsnakeflow is built on top of `Snakemake <https://snakemake.readthedocs.io/>`_ (version ≥8.24.1 required) and is configured via a ``.yaml`` file.

Authors
Max Schubach (`@visze <https://github.com/visze>`_)
@@ -74,7 +74,7 @@ Features
* - Option
- Description
* - ``--software-deployment-method``
- When ``conda`` is set, the utility uses mamba to efficiently query repositories and query package dependencies. MPRAsnakeflow also can use containers via apptainer by using ``--software-deployment-method apptainer``. Recommended option: ``--software-deployment-method conda apptainer``
- When ``conda`` is set, the workflow uses conda to efficiently query repositories and resolve package dependencies. MPRAsnakeflow can also use containers via apptainer with ``--software-deployment-method apptainer conda``. This runs all rules inside a container and activates the pre-installed conda environments within it. Recommended option: ``--software-deployment-method apptainer conda``
* - ``--cores``
- This utility sets the number of cores (``$N``) to be used by MPRAsnakeflow.
* - ``--configfile``
10 changes: 5 additions & 5 deletions docs/install.rst
@@ -19,7 +19,7 @@ Package management

.. code-block:: bash
conda (mamba) 4.6 or above
conda >24.7.1
Download here: https://github.com/conda-forge/miniforge

@@ -36,7 +36,7 @@ Workflow language

.. code-block:: bash
snakemake 8.16.0 or above (snakemake >=7.15.1 will also work but cli might be different as here documented)
snakemake 8.24.1 or above
Download here: https://snakemake.readthedocs.io/

@@ -47,17 +47,17 @@ Clone repository
Download here: https://github.com/kircherlab/MPRAsnakeflow.git


Set up snakemake environment with conda/mamba
Set up snakemake environment with conda
=============================================

This pipeline uses python2.7 and python3.6 with additional R scripts in a Snakemake pipeline. The ``.yml`` files provided will create the appropriate environments and is completely handled by MPRAsnakeflow. The whole pipeline is set up to run on a Linux system.
This pipeline uses python2.7 and python ≥3.7 with additional R scripts in a Snakemake pipeline. The provided ``.yml`` files create the appropriate environments, which are handled completely by MPRAsnakeflow. The whole pipeline is set up to run on a Linux system.

Install the conda environment. The general conda environment is called ``snakemake``.

.. code-block:: bash
cd MPRAsnakeflow
mamba create -c conda-forge -c bioconda -n snakemake snakemake
conda create -c conda-forge -c bioconda -n snakemake snakemake
# activate snakemake
conda activate snakemake
8 changes: 2 additions & 6 deletions resources/assoc_basic/config.yml
@@ -8,12 +8,8 @@ assignments:
alignment_tool:
tool: bbmap
configs:
sequence_length:
min: 166
max: 175
alignment_start:
min: 1
max: 3
sequence_length: 171
alignment_start: 1
FW:
- data/SRR10800986_1.fastq.gz
BC:
8 changes: 2 additions & 6 deletions resources/combined_basic/config.yml
@@ -8,12 +8,8 @@ assignments:
alignment_tool:
tool: bbmap
configs:
sequence_length:
min: 166
max: 175
alignment_start:
min: 1
max: 3
sequence_length: 171
alignment_start: 1
FW:
- data/SRR10800986_1.fastq.gz
BC:
8 changes: 8 additions & 0 deletions resources/count_basic/config.yml
@@ -13,3 +13,11 @@ experiments:
design_file: design.fa
configs:
default: {}
outlierNone:
filter:
outlier_detection:
method: none
outlierZscore:
filter:
outlier_detection:
method: rna_counts_zscore
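The `outlierZscore` configuration above selects `rna_counts_zscore`. The following is a minimal sketch of a z-score filter on RNA counts. It assumes that z-scores are computed per oligo across that oligo's barcodes, using the population standard deviation, and that their absolute value is compared against `times_zscore` (default 3). The grouping, the choice of standard deviation and the function name are assumptions, not taken from the MPRAsnakeflow code.

```python
from statistics import mean, pstdev
from typing import Dict


def remove_outliers_zscore(
    rna_counts: Dict[str, int], times_zscore: float = 3.0
) -> Dict[str, int]:
    """Keep barcodes whose RNA-count |z-score| is at most `times_zscore`.

    `rna_counts` maps barcode -> RNA count for the barcodes of one oligo.
    """
    values = list(rna_counts.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # Identical counts: no barcode deviates, so nothing is removed.
        return dict(rna_counts)
    return {
        barcode: count
        for barcode, count in rna_counts.items()
        if abs((count - mu) / sigma) <= times_zscore
    }


if __name__ == "__main__":
    counts = {f"BC{i}": 10 for i in range(19)}
    counts["BC_out"] = 1000
    kept = remove_outliers_zscore(counts)
    print(len(kept), "BC_out" in kept)  # 19 False
```

With the population standard deviation, a single value among n can reach at most |z| = sqrt(n - 1). A threshold of 3 can therefore only remove anything when an oligo has at least 11 barcodes, which is worth keeping in mind alongside `bc_threshold`.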