This project generates an in silico metatranscriptomic dataset based on specified parameters.
It is recommended to install the package with conda.
Build the package with:
conda build .
For this you need to have conda-build installed (conda install conda-build).
Create a new environment and install the package:
conda create -n marbel
conda activate marbel
conda install --use-local marbel
You also need to install R and the Bioconductor package polyester. To install polyester, start R and run:
R
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("polyester")
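Before running marbel, it can help to confirm that R and polyester are actually reachable from the active environment. The following Python sketch is an illustration only and is not part of marbel: the helper names are hypothetical, and it assumes that the R command-line front end Rscript is what should be found on the PATH.

```python
import shutil
import subprocess


def rscript_path():
    """Return the path to the Rscript executable, or None if R is not on PATH."""
    return shutil.which("Rscript")


def polyester_installed(rscript):
    """Ask R whether polyester can be loaded (hypothetical helper)."""
    # requireNamespace() reports TRUE/FALSE without attaching the package;
    # quit(status = ...) turns that answer into a process exit code.
    r_code = (
        'quit(status = if (requireNamespace("polyester", quietly = TRUE)) 0 else 1)'
    )
    result = subprocess.run([rscript, "-e", r_code], capture_output=True)
    return result.returncode == 0


def check_r_setup():
    """Summarise the state of the R dependencies as a short message."""
    rscript = rscript_path()
    if rscript is None:
        return "R not found on PATH"
    if not polyester_installed(rscript):
        return "R found, but polyester is not installed"
    return "R and polyester are available"


if __name__ == "__main__":
    print(check_r_setup())
```

If the message reports that polyester is missing, repeat the BiocManager installation step above inside the same environment.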
Then install the package in editable mode from the repository root:
pip install -e .
To get help on how to use the script, run:
marbel --help
Usage: marbel [OPTIONS]
Options:
--n-species INTEGER Number of species to be drawn for the metatranscriptomic in silico dataset [default: 20]
--n-orthogroups INTEGER Number of orthologous groups to be drawn for the metatranscriptomic in silico dataset [default: 1000]
--n-samples <INTEGER INTEGER>... Number of samples to be created for the metatranscriptomic in silico dataset. The first number is the number of samples for group 1 and the second is the number of samples for group 2 [default: 10, 10]
--outdir TEXT Output directory for the metatranscriptomic in silico dataset [default: simulated_reads]
--max-phylo-distance TEXT Maximum mean phylogenetic distance for orthologous groups. Specify stricter limit to avoid groups with a more diverse phylogenetic distance. [default: None]
--min-identity FLOAT Minimum mean sequence identity score for orthologous groups. Specify for more stringent identity requirements. [default: None]
--deg-ratio <FLOAT FLOAT>... Ratio of up- and down-regulated genes. The first value is the ratio of up-regulated genes, the second represents the ratio of down-regulated genes [default: 0.1, 0.1]
--seed INTEGER Seed for sampling. Set for reproducibility [default: None]
--read-length INTEGER Read length for the generated reads [default: 100]
--output-format [fastq.gz|fastq|fasta] Output format for the reads [default: fastq.gz]
--version Show the version and exit.
--help Show this message and exit.
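When marbel is called from a pipeline or a notebook, assembling the argument list programmatically can avoid quoting mistakes. The sketch below is one possible way to do that, not something marbel ships: build_marbel_command is a hypothetical helper, its defaults are copied from the help text above, and --max-phylo-distance and --min-identity are left out for brevity.

```python
import shlex

VALID_FORMATS = ("fastq.gz", "fastq", "fasta")


def build_marbel_command(n_species=20, n_orthogroups=1000, n_samples=(10, 10),
                         deg_ratio=(0.1, 0.1), outdir="simulated_reads",
                         read_length=100, output_format="fastq.gz", seed=None):
    """Assemble a marbel argument list (hypothetical helper, not part of marbel)."""
    if output_format not in VALID_FORMATS:
        raise ValueError(f"output_format must be one of {VALID_FORMATS}")
    if len(n_samples) != 2 or len(deg_ratio) != 2:
        raise ValueError("n_samples and deg_ratio each take exactly two values")

    cmd = [
        "marbel",
        "--n-species", str(n_species),
        "--n-orthogroups", str(n_orthogroups),
        # Pair options take two values: group 1 first, group 2 second.
        "--n-samples", str(n_samples[0]), str(n_samples[1]),
        # Up-regulated ratio first, down-regulated ratio second.
        "--deg-ratio", str(deg_ratio[0]), str(deg_ratio[1]),
        "--outdir", outdir,
        "--read-length", str(read_length),
        "--output-format", output_format,
    ]
    # Only pass --seed when one is set; otherwise the default (None) applies.
    if seed is not None:
        cmd += ["--seed", str(seed)]
    return cmd


if __name__ == "__main__":
    # Print a shell-safe command line; pass the list to subprocess.run to execute it.
    print(shlex.join(build_marbel_command(n_species=30, seed=42)))
```

Returning a list rather than a single string keeps the command safe to hand to subprocess.run without shell quoting, and setting seed makes repeated runs reproducible, as the --seed description notes.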
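Run the tool with the default parameters:
marbel
Or set the number of species, orthologous groups, and samples explicitly: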
marbel --n-species 30 --n-orthogroups 1500 --n-samples 15 20
This command will generate a dataset with:
- 30 species
- 1500 orthologous groups
- 15 samples for group 1
- 20 samples for group 2
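The mapping from the two --n-samples values to the two groups can be sketched with a small parser. This is an illustrative mirror of three options using Python's argparse, not marbel's own parser, which may be implemented differently; the function name parse_example and the program name marbel-sketch are hypothetical.

```python
import argparse


def parse_example(argv):
    """Parse a subset of marbel-style options (illustrative, not marbel's parser)."""
    parser = argparse.ArgumentParser(prog="marbel-sketch")
    parser.add_argument("--n-species", type=int, default=20)
    parser.add_argument("--n-orthogroups", type=int, default=1000)
    # nargs=2 models the <INTEGER INTEGER> pair: group 1 first, group 2 second.
    parser.add_argument("--n-samples", type=int, nargs=2, default=[10, 10])
    args = parser.parse_args(argv)

    group1, group2 = args.n_samples
    return {
        "species": args.n_species,
        "orthogroups": args.n_orthogroups,
        "group1_samples": group1,
        "group2_samples": group2,
        "total_samples": group1 + group2,
    }


if __name__ == "__main__":
    argv = "--n-species 30 --n-orthogroups 1500 --n-samples 15 20".split()
    for key, value in parse_example(argv).items():
        print(f"{key}: {value}")
```

For the example command this yields 15 samples in group 1 and 20 in group 2, 35 samples in total.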
Contributions are welcome! Please open an issue or submit a pull request for any changes.
This project is licensed under the Apache License 2.0. See the LICENSE file for details.
Feel free to reach out if you have any questions or need further assistance with the usage of the tool.