This is a collection of scripts that convert BED files from ONT, oxBS and WGBS datasets into a format compatible with ChromHMM.
ChromHMM's built-in binarization works well in simple cases but struggles with datasets that are not traditionally peak called. In addition, more suitable peak callers (such as MACS) exist for ChIP-seq and ATAC-seq data. This repository therefore provides a separate suite of scripts that binarize these datasets into a format recognised by ChromHMM.
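To make "binarizing" concrete, the sketch below turns a BED file of intervals (for example, peaks) for one mark on one chromosome into ChromHMM's binarized text layout: a header line with the cell type and chromosome, a line with the mark name, then one 0/1 call per fixed-size bin. This is a minimal presence/absence sketch and not the method used by the scripts in this repository. The function name `binarize_bed` and its argument order are hypothetical, and the 200 bp bin size in the usage line is an assumption (it is ChromHMM's default bin size).

```shell
#!/usr/bin/env bash
# Hypothetical helper (illustration only, not this repository's method):
# a bin is 1 if any BED interval on the chosen chromosome overlaps it, else 0.
# Output layout: "cell<TAB>chrom", then the mark name, then one line per bin.
binarize_bed() {
    local bed=$1 cell=$2 chrom=$3 chrom_len=$4 bin_size=$5 mark=$6
    awk -v chr="$chrom" -v len="$chrom_len" -v bs="$bin_size" \
        -v cell="$cell" -v mark="$mark" '
        BEGIN { nbins = int((len + bs - 1) / bs) }   # ceil(len / bs) bins
        $1 == chr && $3 > $2 {
            # BED coordinates are 0-based and half-open: [start, end)
            first = int($2 / bs)
            last  = int(($3 - 1) / bs)
            for (b = first; b <= last && b < nbins; b++) on[b] = 1
        }
        END {
            print cell "\t" chr
            print mark
            for (b = 0; b < nbins; b++) print ((b in on) ? 1 : 0)
        }' "$bed"
}
```

Example usage, with the chromosome length read from a `.chrom.sizes` file (file names are placeholders): `binarize_bed peaks.bed sampleA chr1 "$(awk '$1=="chr1"{print $2}' hg38.chrom.sizes)" 200 H3K4me3 > sampleA_chr1_binary.txt`. Unlike ChromHMM's own `BinarizeBed`, this ignores read counts and background entirely.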
Note
In the following README (and the wider repository), the word 'methylation' means any type of DNA methylation. Where more precise language is required, '5mC', '5hmC', etc. are used instead. If at any point the wording feels ambiguous when it shouldn't be, please raise an issue.
In order to run these scripts you will first need to fill out the config file (template provided in `./config-setup.txt`). It is recommended that you put this config file near your data (note: this is not a requirement; you can put this file anywhere you wish).
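The config fields themselves are not described here, so the sketch below only covers placing a copy of the template next to your data before editing it. The helper name `copy_config_template` and the destination file name `config.txt` are hypothetical; any name and location will do.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy the config template into a data directory,
# refusing to overwrite an existing config, then print the new file's path.
copy_config_template() {
    local template=$1 data_dir=$2
    local dest="$data_dir/config.txt"   # destination name is an assumption
    if [ -e "$dest" ]; then
        echo "refusing to overwrite $dest" >&2
        return 1
    fi
    cp "$template" "$dest" && printf '%s\n' "$dest"
}
```

For example: `cfg=$(copy_config_template ./config-setup.txt /path/to/data) && "${EDITOR:-vi}" "$cfg"`.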
Next, run the setup script with:
./setup
The setup script asks for user input when removing SLURM directives and when setting up the conda environments. This is deliberate, so that you can check what conda is going to install first. Setup will also take quite some time because of the size of the R dependency tree (~49 packages).
You will see the following message on success:
[1] "success"
If you do not see this success message, please open up an issue.
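If you captured the setup output in a log file (for example with `./setup 2>&1 | tee setup.log`, which is an assumption about how you run it, not part of the pipeline), the success line can be checked afterwards. The helper name `setup_succeeded` is hypothetical; it simply searches for the exact line shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the log contains R's printed
# success line, i.e. [1] "success", as shown in this README.
setup_succeeded() {
    local log=$1
    grep -qF '[1] "success"' "$log"
}
```

For example: `if setup_succeeded setup.log; then echo "setup OK"; else echo "no success message: please open an issue"; fi`.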
After setup has completed, run the scripts in order using the SLURM Workload Manager:
sbatch path/to/script path/to/config/file
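Running the scripts sequentially means each job should only start once the previous one has finished; SLURM job dependencies can automate this instead of waiting by hand. The sketch below assumes, as in the command above, that each script takes only the config path. `submit_chain` is a hypothetical helper, while `--parsable` and `--dependency=afterok:<jobid>` are standard `sbatch` options.

```shell
#!/usr/bin/env bash
# Hypothetical helper: submit scripts in the given order with the same config,
# making each job wait until the previous job has completed successfully.
# --parsable makes sbatch print only the job ID (optionally "id;cluster");
# --dependency=afterok:<id> holds a job until job <id> exits with code 0.
submit_chain() {
    local config=$1; shift
    local prev="" out script
    for script in "$@"; do
        if [ -z "$prev" ]; then
            out=$(sbatch --parsable "$script" "$config") || return 1
        else
            out=$(sbatch --parsable --dependency="afterok:$prev" "$script" "$config") || return 1
        fi
        prev=${out%%;*}   # drop an optional ";cluster" suffix
        printf 'submitted %s as job %s\n' "$script" "$prev"
    done
}
```

For example, `submit_chain path/to/config/file step1.sh step2.sh step3.sh`, where the script names are placeholders; consult the wiki for the actual order. If a job fails, its dependants stay pending and must be cancelled with `scancel`.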
Note
For a quick summary of what a script does, run it without any positional parameters. In this case you can run it like a normal bash script; `sbatch` is not required.
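To print the summaries of several scripts at once, a loop over the scripts directory is enough. This sketch assumes that directory contains only the pipeline's bash scripts (every regular file is run with `bash`) and that each prints its summary when called with no arguments, as described above; `show_summaries` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run every file in a directory with no positional
# parameters so that each prints its built-in summary.
show_summaries() {
    local dir=$1 script
    for script in "$dir"/*; do
        [ -f "$script" ] || continue   # skip subdirectories and unmatched globs
        printf '== %s ==\n' "$(basename "$script")"
        bash "$script"
    done
}
```

For example: `show_summaries path/to/scripts | less`.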
This pipeline requires a Unix-like OS and the following software. The versions listed are those used during testing; lower minor versions will likely still work.
- bash (>=4.2.46(2))
- SLURM Workload Manager (>=20.02.3)
- Conda
  - Any installation should work; this has been tested with Miniconda 4.5.2 (from 2020)
  - Make sure conda can be found on your `PATH` (check with `which conda`)
- GNU awk (>=4.0.2)
- GNU gzip (>=1.5)
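A quick pre-flight check for the requirements above might look like the sketch below. It is hypothetical and deliberately partial: it checks the bash version and that awk is GNU awk, but only checks that `sbatch`, `conda`, `awk` and `gzip` are on `PATH`, not their versions. The version comparison relies on `sort -V` (GNU coreutils).

```shell
#!/usr/bin/env bash
# version_ge A B: succeed if version A >= version B (needs GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical pre-flight check; returns non-zero if anything looks missing.
check_prereqs() {
    local ok=0 cmd
    for cmd in sbatch conda awk gzip; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd" >&2
            ok=1
        fi
    done
    # BASH_VERSINFO holds the running bash's major, minor and patch numbers
    local bash_ver="${BASH_VERSINFO[0]}.${BASH_VERSINFO[1]}.${BASH_VERSINFO[2]}"
    if ! version_ge "$bash_ver" 4.2.46; then
        echo "bash $bash_ver is older than 4.2.46" >&2
        ok=1
    fi
    # GNU awk names itself on the first line of its --version output
    if ! awk --version 2>/dev/null | head -n 1 | grep -q 'GNU Awk'; then
        echo "awk does not appear to be GNU awk" >&2
        ok=1
    fi
    return "$ok"
}
```

For example: `check_prereqs && echo "prerequisites look OK"`.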
The following software and R packages are installed for you by the setup script:
- Bedtools (v2.29.2)
- R (4.4.1)
  - dplyr
  - data.table
  - fitdistrplus
- Supplementary scripts only:
Please consult the wiki for further documentation on specific scripts.