TSA-seq_toolkit

TSA-seq_toolkit is a pipeline for processing and analyzing genome-wide proximity mapping data generated by the TSA-seq technique. In the paper "TSA-Seq Mapping of Cytological Distances to Nuclear Speckles and Lamina Reveals Spatial and Functional Nuclear Organization" (in submission), we used TSA-seq_toolkit to:

  • Normalize and process the SON TSA-seq data using a sliding-window approach.
  • Partition the genome into SON-enriched and SON-depleted regions.
  • Analyze the correlations of SON TSA-seq with other genomic features.
  • Generate data that can be loaded as UCSC Genome Browser tracks for visualization.

TSA-seq_toolkit requires:

  • Python (tested in Python 2.7)
  • argparse
  • numpy
  • scipy
  • bx-python
  • R (tested in R 3.0)
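
As a quick sanity check before running the toolkit, the Python dependencies listed above can be probed with a few lines of code. This is only a minimal sketch: the helper name `missing_modules` is hypothetical and not part of TSA-seq_toolkit, and it assumes bx-python is imported under the module name `bx`. It does not check the R installation.

```python
import importlib

# Import names of the Python requirements listed above.
# Assumption: bx-python is imported as "bx".
REQUIRED_MODULES = ["argparse", "numpy", "scipy", "bx"]


def missing_modules(names):
    """Return the subset of module names that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing Python modules: " + ", ".join(missing))
    else:
        print("All required Python modules are importable.")
```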

Read Mapping

TSA-seq_toolkit does not perform read mapping; it starts from a BAM file. Any well-known mapping tool, such as Bowtie, BWA, or SOAP2, can be used. We recommend removing PCR duplicates after mapping, and the BAM file should be sorted and indexed. For example, in the paper we used the following commands to map reads to the human genome and generate BAM files:

# Since K562 is derived from a female, we removed chromosome Y from the human reference genome hg19 and named the result hg19F
bowtie2 -p 8 -x hg19F -U SON_TSA-seq_pulldown.fastq -S SON_TSA-seq_pulldown.sam
# Convert Sam to Bam
samtools view -bS SON_TSA-seq_pulldown.sam > SON_TSA-seq_pulldown.bam
# You can also combine the above two steps if you do not want to save sam file on the disk using
# bowtie2 -p 8 -x hg19F -U SON_TSA-seq_pulldown.fastq | samtools view -bS - > SON_TSA-seq_pulldown.bam
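# Sort bam file (prefix-style syntax of older samtools releases; writes SON_TSA-seq_pulldown_sort.bam)
samtools sort SON_TSA-seq_pulldown.bam SON_TSA-seq_pulldown_sort
# Remove PCR duplicates from the sorted bam file
samtools rmdup SON_TSA-seq_pulldown_sort.bam SON_TSA-seq_pulldown_rmdup.bam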
# Index bam file
samtools index SON_TSA-seq_pulldown_rmdup.bam
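
To illustrate what PCR duplicate removal means conceptually, here is a toy sketch in Python. It is not how samtools is implemented and is not part of TSA-seq_toolkit: reads are represented as hypothetical `(chrom, pos, strand, mapq)` tuples instead of BAM records, and reads sharing the same chromosome, start position, and strand are treated as duplicates, keeping the one with the highest mapping quality.

```python
def remove_duplicates(reads):
    """Keep one read per (chrom, pos, strand), preferring the highest mapq.

    reads: iterable of (chrom, pos, strand, mapq) tuples (a simplified,
    hypothetical stand-in for alignment records).
    Returns the retained reads sorted by chromosome, position, and strand.
    """
    best = {}
    for read in reads:
        chrom, pos, strand, mapq = read
        key = (chrom, pos, strand)
        # Replace the stored read only if this one has a higher mapping quality.
        if key not in best or mapq > best[key][3]:
            best[key] = read
    return sorted(best.values())


if __name__ == "__main__":
    example = [
        ("chr1", 100, "+", 30),
        ("chr1", 100, "+", 42),  # duplicate of the read above, better mapq
        ("chr1", 100, "-", 20),  # same position, opposite strand: kept
        ("chr2", 5, "+", 10),
    ]
    for read in remove_duplicates(example):
        print(read)
```

In practice the samtools commands above should be used; the sketch only shows why duplicates inflate read counts at a single position and what is discarded.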

Normalization

We then normalize and process the TSA-seq data using a sliding-window approach. The basic idea is to calculate, in each sliding window, the fold-change ratio between the pulldown sample and the input sample, each normalized by its total number of mapped reads. The exact equation is given in the paper. In the paper, we used a window size of 20 kb with a sliding step of 100 bp:

# Normalize SON TSA-seq pulldown with matched input
python tsatools_normalization.py -N 20000 -r 100 -l 100 -o SON_on_input_20k -e SON_TSA-seq_pulldown_rmdup.bam -c SON_TSA-seq_pulldown_input_rmdup.bam
# -N => window size in base pair (bp)
# -r => sliding window step (bp)
# -l => read length (bp)
# -e => bam file for pulldown sample
# -c => bam file for matched input sample
# -o => output file prefix

Three wig files are generated when the script finishes: one with the normalized TSA-seq score, and two with the signal profiles of the pulldown and input samples at the resolution defined by -N and -r.
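
The sliding-window idea described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code in tsatools_normalization.py and not necessarily the equation from the paper: it assumes reads are reduced to their start positions on a single chromosome, that the score is the log2 ratio of per-window read fractions, and that a pseudocount of 1 avoids division by zero. The function names and the `pseudocount` parameter are hypothetical.

```python
import math


def window_counts(read_starts, chrom_length, window, step):
    """Count read start positions in each sliding window [start, start + window)."""
    starts = sorted(read_starts)
    counts = []
    lo = hi = 0
    for win_start in range(0, chrom_length - window + 1, step):
        win_end = win_start + window
        # Advance lo past reads that start before the window.
        while lo < len(starts) and starts[lo] < win_start:
            lo += 1
        # Advance hi past reads that start before the window end.
        while hi < len(starts) and starts[hi] < win_end:
            hi += 1
        counts.append(hi - lo)
    return counts


def normalized_log2_ratio(pulldown, control, chrom_length, window, step,
                          pseudocount=1.0):
    """Per-window log2 ratio of pulldown to input, each scaled by total reads.

    Assumption: the pseudocount is an illustrative choice, not taken from the paper.
    """
    p_counts = window_counts(pulldown, chrom_length, window, step)
    c_counts = window_counts(control, chrom_length, window, step)
    p_total = float(len(pulldown))
    c_total = float(len(control))
    scores = []
    for p, c in zip(p_counts, c_counts):
        p_frac = (p + pseudocount) / p_total
        c_frac = (c + pseudocount) / c_total
        scores.append(math.log2(p_frac / c_frac))
    return scores


if __name__ == "__main__":
    pulldown = [10, 20, 30, 150]   # toy read start positions
    control = [10, 120, 150, 180]
    print(normalized_log2_ratio(pulldown, control, 200, 100, 50))
    # [1.0, -1.0, -1.0]
```

With the toy data, the first window is enriched in the pulldown (positive score) and the other two are depleted (negative scores). In the real command, -N and -r play the roles of `window` and `step`.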

About

TSA-seq_toolkit is authored and maintained by Yang Zhang.