SkewX is a Nextflow pipeline to measure skewed X inactivation from long-read sequencing of native DNA, using either PacBio or Oxford Nanopore technologies. It starts from BAM files that include modified basecalls for 5mCG. It first calls heterozygous variants with DeepVariant and phases them into haplotypes with WhatsHap. It then clusters reads based on their methylation profile over CpG islands, and pools the haplotype and epiallele information to measure the skew in X inactivation.
The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers, making installation trivial and results highly reproducible. The Nextflow DSL2 implementation of this pipeline uses one container per process, which makes it much easier to maintain and update software dependencies. Where possible, these processes have been submitted to and installed from nf-core/modules in order to make them available to all nf-core pipelines, and to everyone within the Nextflow community!
The required input is modBAM files, i.e. BAM files carrying 5mCG modified-base calls. The main steps are:
- If the reads are not already aligned, align to the reference genome with 'Minimap2'
- If multiple samples per individual are present, for instance multiple tissues, merge them into a single bam file
- Call variants with 'DeepVariant'
- Phase SNPs with 'WhatsHap'
- Haplotype and tag reads with 'WhatsHap'
- Cluster reads based on methylation profile with 'NanoMethViz'
- Measure skew in X inactivation and generate a report for each individual.
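The last step combines two labels that each read receives: its haplotype from WhatsHap and its methylation cluster over CpG islands. This README does not describe the exact statistic SkewX reports. The following is therefore only a hypothetical illustration of the idea, and it rests on several assumptions:

- The input is a tab-separated table with one chrX read per line. Each line holds a read ID, a haplotype (1, 2, or "." if unphased) and a cluster label ("methylated" or "unmethylated").
- Methylated CpG-island reads come from the inactive X.
- The output is the fraction of those methylated reads that fall on haplotype 1.

The function name xi_skew and the input format are made up for this example.

```shell
#!/usr/bin/env bash
# Hypothetical illustration only: NOT the statistic implemented in SkewX.
# Assumed input: TSV with columns read_id, haplotype (1, 2 or "."),
# cluster ("methylated" or "unmethylated"), one chrX read per line.
set -euo pipefail

xi_skew() {
  # Among phased reads in the methylated (presumed inactive-X) cluster,
  # print the fraction carrying haplotype 1; print NA if there are none.
  awk -F'\t' '
    $3 == "methylated" && $2 == "1" { h1++ }
    $3 == "methylated" && $2 == "2" { h2++ }
    END {
      if (h1 + h2 == 0) { print "NA"; exit }
      printf "%.3f\n", h1 / (h1 + h2)
    }
  ' "$1"
}
```

Under these assumptions, a value near 0.5 indicates random X inactivation, while values near 0 or 1 indicate strong skew towards one haplotype being inactivated. Unphased reads and reads in the unmethylated cluster are simply ignored.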
- Install or module load Nextflow (>=21.10.3).
- Install any of Docker, Singularity (you can follow this tutorial), Podman, Shifter or Charliecloud for full pipeline reproducibility. You can use Conda both to install Nextflow itself and to manage software within pipelines, but please only use it within pipelines as a last resort (see docs).
- IMPORTANT: when using Singularity, make sure your home directory is mounted in the containers. Add "export NXF_SINGULARITY_HOME_MOUNT=true" to your .bashrc, or set it in your session environment, before launching the pipeline. By default, Singularity will not be able to find your home directory.
- Ensure the required files (.bed files, .fa reference) are properly specified as parameters, either in the config (nextflow.config) or on the command line.
- Start running your own analysis:
nextflow main.nf --input samplesheet.csv --outdir skew_results --fasta chm13v2.0.fa --cgi CGIs_CHM13v2_chrX.bed -profile singularity
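Before launching, a small shell check can confirm two of the requirements above: the Nextflow version and the Singularity home-mount setting. This is a sketch and is not part of SkewX. It makes the following assumptions:

- The helper names version_ge and check_singularity_home are hypothetical.
- The version comparison relies on sort -V, as provided by GNU coreutils.
- The usage comment assumes that nextflow -v prints a line of the form "nextflow version <version>".

```shell
#!/usr/bin/env bash
# Sketch of a pre-launch environment check (hypothetical helpers, not part of SkewX).
set -euo pipefail

MIN_NXF_VERSION="21.10.3"   # minimum Nextflow version stated in this README

# Succeed if version $1 >= version $2, using sort's version ordering (GNU sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Fail with a hint unless Singularity containers will mount the home directory.
check_singularity_home() {
  if [ "${NXF_SINGULARITY_HOME_MOUNT:-}" != "true" ]; then
    echo "NXF_SINGULARITY_HOME_MOUNT is not 'true'; run: export NXF_SINGULARITY_HOME_MOUNT=true" >&2
    return 1
  fi
}

# Example usage (assumes "nextflow -v" prints "nextflow version <version>"):
#   version_ge "$(nextflow -v | awk '{print $3}')" "$MIN_NXF_VERSION" \
#     || echo "Nextflow is older than $MIN_NXF_VERSION" >&2
#   check_singularity_home
```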
An example dataset is available in the test_data directory of this repository. The dataset covers a small region of the mouse X chromosome and includes a BAM file with methylation information. The pipeline can be run on this dataset with the following command:
nextflow main.nf --input test_data/samplesheet.csv --outdir skew_test_results --fasta test_data/mm10_chrX.fa --cgi test_data/mm10_chrX_CGI.bed -profile test
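The pipeline needs the files passed to --input, --fasta and --cgi, so it can save a failed launch to confirm they are present first. The helper below is a hypothetical sketch and is not part of SkewX. It only checks that each path exists and is non-empty, not that the files are well-formed.

```shell
#!/usr/bin/env bash
# Hypothetical pre-launch input check (not part of SkewX).
set -euo pipefail

# check_inputs FILE...
# Report every missing or empty file, and fail if any was found.
check_inputs() {
  local status=0 f
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      status=1
    fi
  done
  return "$status"
}

# Example usage with the test dataset:
#   check_inputs test_data/samplesheet.csv test_data/mm10_chrX.fa test_data/mm10_chrX_CGI.bed \
#     && nextflow main.nf --input test_data/samplesheet.csv --outdir skew_test_results \
#          --fasta test_data/mm10_chrX.fa --cgi test_data/mm10_chrX_CGI.bed -profile test
```

All problems are reported in a single pass rather than stopping at the first missing file, so several mistyped paths can be fixed at once.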
SkewX was originally written by Quentin Gouil, James Lancaster and Ed Yang.
We thank the following people for their extensive assistance in the development of this pipeline:
- Kathleen Zeglinski for her superior Nextflow expertise
- Shian Su for implementing new features in NanoMethViz
If you use SkewX for your analysis, please cite it using the following DOI: 10.1101/2024.03.20.585856
An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.