Measuring skewed X inactivation with long-read sequencing

QGouil/SkewX

SkewX

Introduction

SkewX is a Nextflow pipeline to measure skewed X inactivation from long-read sequencing of native DNA, using either PacBio or Nanopore technologies. It starts from BAM files that include modified basecalls for 5mCG. It first calls heterozygous variants with DeepVariant and phases them into haplotypes with WhatsHap. It then clusters reads based on their methylation profile over CpG islands, and pools this haplotype and epiallele information to measure the skew in X inactivation.

The pipeline is built using Nextflow, a workflow tool that runs tasks across multiple compute infrastructures in a portable manner. It uses Docker/Singularity containers, which makes installation trivial and results highly reproducible. The Nextflow DSL2 implementation of this pipeline uses one container per process, which makes it much easier to maintain and update software dependencies. Where possible, these processes have been submitted to and installed from nf-core/modules in order to make them available to all nf-core pipelines, and to everyone within the Nextflow community!

Pipeline summary

The required input is one or more modBAM files (BAM files with modified-base tags) carrying 5mCG information. The pipeline then runs the following steps:

  1. If the reads are not already aligned, align them to the reference genome with 'Minimap2'
  2. If multiple samples per individual are present, for instance multiple tissues, merge them into a single BAM file
  3. Call variants with 'DeepVariant'
  4. Phase SNPs with 'WhatsHap'
  5. Haplotag reads (assign each read to a haplotype) with 'WhatsHap'
  6. Cluster reads based on methylation profile with 'NanoMethViz'
  7. Measure skew in X inactivation and generate a report for each individual.
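
As a rough illustration of step 7, the sketch below pools per-read haplotype and methylation-cluster labels into a single skew estimate. It is not the pipeline's actual implementation: the `Read` record and the `estimate_skew` function are hypothetical names, and the sketch assumes that CpG islands are methylated on the inactive X, so that a methylated read from haplotype 1 and an unmethylated read from haplotype 2 both point to haplotype 1 being the inactive copy.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Read:
    """Hypothetical per-read summary over one CpG island (not a SkewX type)."""
    haplotype: Optional[int]  # 1 or 2 from haplotagging, None if unphased
    methylated: bool          # True if the read falls in the methylated cluster


def estimate_skew(reads: Iterable[Read]) -> Optional[float]:
    """Fraction of phased reads consistent with haplotype 1 being inactive.

    0.5 suggests balanced inactivation; values near 0 or 1 suggest strong skew.
    Returns None when no phased reads are available.
    """
    hap1_inactive = 0
    total = 0
    for read in reads:
        if read.haplotype not in (1, 2):
            continue  # unphased reads carry no haplotype information
        total += 1
        # Assumption: methylated hap1 read or unmethylated hap2 read
        # both mean that haplotype 1 is the inactive X.
        if (read.haplotype == 1) == read.methylated:
            hap1_inactive += 1
    return hap1_inactive / total if total else None


# Made-up reads for illustration only.
example = [
    Read(1, True), Read(1, True), Read(1, False),
    Read(2, False), Read(2, False), Read(2, True),
    Read(None, True),
]
skew = estimate_skew(example)
```

In this made-up example, 4 of the 6 phased reads support haplotype 1 being inactive, so `skew` is about 0.67. The real report may weight CpG islands or filter reads differently.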

Quick Start

  1. Install or module load Nextflow (>=21.10.3)

  2. Install any of Docker, Singularity (you can follow this tutorial), Podman, Shifter or Charliecloud for full pipeline reproducibility. Conda can be used both to install Nextflow itself and to manage software within pipelines, but please use it within pipelines only as a last resort; see docs.

  3. IMPORTANT - ensure that Singularity mounts your home directory: add "export NXF_SINGULARITY_HOME_MOUNT=true" to your .bashrc, or set it in your session environment, before launching the pipeline. By default, Singularity will not be able to find your home directory.

  4. Ensure required files (.bed files, .fa reference) are properly specified as parameters in the config (nextflow.config)

  5. Start running your own analysis!

    nextflow main.nf --input samplesheet.csv --outdir skew_results --fasta chm13v2.0.fa --cgi CGIs_CHM13v2_chrX.bed -profile singularity
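
Steps 3 to 5 can be checked before launching with a small pre-flight script. The following is a minimal sketch, not part of SkewX: the helper names `missing_inputs` and `build_command` are hypothetical, the file names are the ones from the example command above, and the script only verifies that the files exist and that NXF_SINGULARITY_HOME_MOUNT is set when the singularity profile is used.

```python
import os
import shlex
from pathlib import Path
from typing import Dict, List


def missing_inputs(paths: Dict[str, str]) -> List[str]:
    """Return the parameter names whose file does not exist."""
    return [name for name, path in paths.items() if not Path(path).is_file()]


def build_command(params: Dict[str, str], outdir: str, profile: str) -> List[str]:
    """Assemble the Quick Start launch command as an argument list."""
    cmd = ["nextflow", "main.nf"]
    for name, value in params.items():
        cmd += [f"--{name}", value]
    cmd += ["--outdir", outdir, "-profile", profile]
    return cmd


# File names taken from the example command; adjust to your own data.
params = {
    "input": "samplesheet.csv",
    "fasta": "chm13v2.0.fa",
    "cgi": "CGIs_CHM13v2_chrX.bed",
}
profile = "singularity"

problems = missing_inputs(params)
if profile == "singularity" and os.environ.get("NXF_SINGULARITY_HOME_MOUNT") != "true":
    problems.append("NXF_SINGULARITY_HOME_MOUNT")

command = build_command(params, outdir="skew_results", profile=profile)
if problems:
    print("Fix before launching:", ", ".join(problems))
else:
    print("Ready to run:", shlex.join(command))
```

The script only prints the command rather than running it, so it is safe to use on a login node before submitting the real job.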

Documentation

Example data

An example dataset is available in the test_data directory of this repository. It covers a small region of the mouse X chromosome and includes a BAM file with methylation information. The pipeline can be run on this dataset with the following command:

    nextflow main.nf --input test_data/samplesheet.csv --outdir skew_test_results --fasta test_data/mm10_chrX.fa --cgi test_data/mm10_chrX_CGI.bed -profile test

Credits

SkewX was originally written by Quentin Gouil, James Lancaster and Ed Yang.

We thank the following people for their extensive assistance in the development of this pipeline:

  • Kathleen Zeglinski for her superior Nextflow expertise
  • Shian Su for implementing new features in NanoMethViz

Citations

If you use SkewX for your analysis, please cite it using the following DOI: 10.1101/2024.03.20.585856

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.