Single-cell 3D genomics notes. Please, contribute and get in touch! See MDmisc notes for other programming and genomics-related notes.
-
scHiCExplorer - Single cell Hi-C data analysis toolbox, Python, from processing to normalization, clustering, compartment identification, visualization.
-
lh3/hickit - TAD calling, phase imputation, 3D modeling and more for diploid single-cell Hi-C (Dip-C) and general Hi-C. See https://doi.org/10.1016/j.cell.2020.12.032
-
tanlongzhi/dip-c - Tools to analyze Dip-C (or other 3C/Hi-C) data. See https://doi.org/10.1126/science.aat5641
-
nuc_processing - Chromatin contact paired-read single-cell Hi-C processing module for Nuc3D and NucTools. See https://doi.org/10.1038/nature21429
- scHiCNorm - scHi-C normalization using regression against known biases (cutting site density, mappability, GC content) using six distributions. Filters out cells with fewer than 50,000 uniquely mapped reads, merges cells, 1Mb resolution. Ramani 2017 data, 74 matrices. Correlations are assumed to be driven by biases, so a decrease in between-dataset correlation and an increase in variability are judged as good.
- Liu, Tong, and Zheng Wang. “ScHiCNorm: A Software Package to Eliminate Systematic Biases in Single-Cell Hi-C Data.” Bioinformatics, (March 15, 2018)
-
BandNorm and 3DVI methods for normalizing and denoising single-cell Hi-C data. BandNorm - an R package, distance-centric band normalization approach, improves cell clustering. 3DVI - deep generative modeling framework using Poisson and Negative Binomial distributions to model scHi-C counts, accounting for library size and batch effect for each band matrix, learns low-dimensional representation of scHi-C data, denoises and enables 3D compartment identification (uses scvi-tools). Compared against library size scaling methods (global CellScale and local BandScale), and scHiCluster, scHiC Topics, Higashi (Table 1 - overview of 8 methods total). Evaluated clustering ARI/silhouette on Ramani2017, Kim2020, Lee2019, and Li2019 scHi-C datasets, 1Mb resolution (Supplementary Table 1). Differential TAD boundaries detection evaluated on TADcompare, diffHiC, CHESS using concordance with bulk data. Tweet
- Zheng, Ye, Siqi Shen, and Sunduz Keles. “Normalization and De-Noising of Single-Cell Hi-C Data with BandNorm and 3DVI.” Preprint. Bioinformatics, March 11, 2021
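A minimal sketch of distance-centric band scaling, in the spirit of the BandNorm entry above. It is not the BandNorm R implementation: it assumes that each diagonal band of a cell is divided by that cell's band total and then rescaled by the across-cell mean total of the same band. The function names `band_sums` and `band_normalize` are hypothetical.

```python
def band_sums(matrix):
    """Total contacts in each diagonal band d = j - i of the upper triangle."""
    n = len(matrix)
    return [sum(matrix[i][i + d] for i in range(n - d)) for d in range(n)]


def band_normalize(cells):
    """Rescale every band of every cell to the across-cell mean band total.

    `cells` is a list of symmetric contact matrices (lists of lists) of equal size.
    Assumption: this mirrors the idea of band normalization, not the package's exact formula.
    """
    n = len(cells[0])
    per_cell = [band_sums(m) for m in cells]
    mean_band = [sum(s[d] for s in per_cell) / len(cells) for d in range(n)]
    normalized = []
    for m, sums in zip(cells, per_cell):
        out = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(i, n):
                d = j - i
                if sums[d] > 0:  # bands without contacts stay at zero
                    out[i][j] = out[j][i] = m[i][j] / sums[d] * mean_band[d]
        normalized.append(out)
    return normalized


# Two toy cells with the same structure but a two-fold difference in depth.
cell_a = [[4, 2, 0], [2, 4, 2], [0, 2, 4]]
cell_b = [[8, 4, 0], [4, 8, 4], [0, 4, 8]]
norm_a, norm_b = band_normalize([cell_a, cell_b])
```

After scaling, cells that differ only in sequencing depth become identical band by band, which is the property that makes downstream clustering less sensitive to library size.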
-
Fast-Higashi - scHi-C analysis for precise single cell clustering (rare cell type identification), trajectory inference, differential contact analysis (meta-interactions). Models scHi-C data using tensor decomposition (PARAFAC2 joint factorization algorithm, decomposes chromosome-specific 3-way tensors into four factors, Figure 1, Methods). Partial random walk with restart to impute the data. Applied to 500kb scHi-C datasets (Tan et al. 2021, Liu et al. 2021, Lee et al. 2019, and more). Compared with 3DVI, scHiCluster, Higashi (modularity score, ARI, adjusted mutual information, F1 scores), improves detection of rare cell types, trajectories, cell type-specific connections, aggregated A/B compartment analysis. Fast. Can initialize Higashi for better performance. Python/Pytorch code on Zenodo.
Paper
Zhang, Ruochi, Tianming Zhou, and Jian Ma. “Ultrafast and Interpretable Single-Cell 3D Genome Analysis with Fast-Higashi.” Cell Systems 13, no. 10 (October 2022): 798-807.e6. https://doi.org/10.1016/j.cels.2022.09.004.
-
Higashi - hypergraph representation learning for scHi-C embedding (learning node embedding of the hypergraph) and imputation (predicting missing hyperedges within the hypergraph). Whole scHi-C dataset as a hypergraph, with cell nodes and genomic bin nodes. Uses Hyper-SAGNN architecture. Imputation by borrowing information from k-nearest neighbors in the embedding space. Detects TAD-like structures. Applied to 4D Nucleome, Ramani, Nagano scHi-C data. Outperforms HiCRep/MDS, scHiCluster. LDA in imputation, cell clustering. Can incorporate other omics modalities, and showed improved performance on single-nucleus methyl-3C (sn-m3C-seq) scHi-C and methylation data in human prefrontal cortex cells. Robust to downsampling. A/B compartments detection improved after imputation. Improved detection of TAD-like structures using insulation scores, genes associated with variable boundaries. Methods and supplementary detail network structure, input as triplets of attributes of one cell node and two genomic bin nodes, loss function, training.
- Zhang, Ruochi, Tianming Zhou, and Jian Ma. “Multiscale and Integrative Single-Cell Hi-C Analysis with Higashi.” Preprint. Bioinformatics, December 15, 2020.
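The Higashi entry above mentions insulation scores for detecting TAD-like structures. The sketch below shows the general idea only and is not Higashi's code: it assumes a sliding square window that averages contacts between the `window` bins upstream and the `window` bins downstream of each candidate boundary, and calls strict local minima as boundaries. The names `insulation_scores` and `call_boundaries` are hypothetical.

```python
def insulation_scores(matrix, window=2):
    """Score for the boundary located just before bin i.

    The score is the mean contact between up to `window` upstream bins
    (i - window .. i - 1) and up to `window` downstream bins (i .. i + window - 1).
    Positions with no upstream bins get None.
    """
    n = len(matrix)
    scores = []
    for i in range(n):
        upstream = range(max(0, i - window), i)
        downstream = range(i, min(n, i + window))
        values = [matrix[a][b] for a in upstream for b in downstream]
        scores.append(sum(values) / len(values) if values else None)
    return scores


def call_boundaries(scores):
    """Positions whose score is strictly lower than both neighbors."""
    boundaries = []
    for i in range(1, len(scores) - 1):
        left, mid, right = scores[i - 1], scores[i], scores[i + 1]
        if None in (left, mid, right):
            continue
        if mid < left and mid < right:
            boundaries.append(i)
    return boundaries


# Toy matrix: two 4-bin domains with strong within-domain contacts.
toy = [[10 if (i < 4) == (j < 4) else 1 for j in range(8)] for i in range(8)]
toy_scores = insulation_scores(toy, window=2)
toy_boundaries = call_boundaries(toy_scores)
```

On raw single-cell matrices the score is dominated by sparsity, which is why the notes above report improved boundary detection after imputation.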
-
Hyper-SAGNN - self-attention based graph neural network applicable to homogeneous and heterogeneous hypergraphs. Applied to scHi-C Ramani and Nagano data. Compared with DeepWalk, LINE, and HEBE. Not compared with hyper2vec and node2vec. Outperforms HiCRep+MDS and scHiCluster in measuring scHi-C similarity. Demo of other applications.
-
Zhang, Ruochi, Yuesong Zou, and Jian Ma. "Hyper-SAGNN: a self-attention based graph neural network for hypergraphs." arXiv preprint (November 6, 2019).
-
scHiCTools - a set of tools for high-level analysis (clustering) of scHi-C data. Projects single cells into a lower-dimensional Euclidean space. Three methods for smoothing scHi-C data (linear convolution, random walk, network enhancing), three projection methods (fastHiCRep, Selfish, newly developed InnerProduct), three embedding methods if assuming cells come from a continuous manifold (MDS, t-SNE, PHATE), or three clustering methods if assuming cells are from different clusters (k-means, spectral clustering, HiCluster). Brief Methods of each approach. Tested on Nagano 2017 cell-cycle dataset. InnerProduct captures cell similarity well, any embedding works well, linear convolution and random walk improve projections at high dropout rates. QC plots. ACROC - area under the curve of a circular ROC calculation. Input - text matrices. Python 3.
- Li, Xinjun, Fan Feng, Wai Yan Leung, and Jie Liu. “ScHiCTools: A Computational Toolbox for Analyzing Single-Cell Hi-C Data.” Preprint. Bioinformatics, September 18, 2019.
-
scHiCluster - single-cell Hi-C clustering algorithm based on imputation using linear convolution (neighborhood smoothing within a window of size 1 over 1Mb scHi-C matrices) and random walk with restarts. scHi-C challenges: variability, sparsity, coverage heterogeneity. Two-step imputation to resolve sparsity, top-ranked interactions after imputation to resolve heterogeneity. Tested on simulated (from bulk Hi-C controlling for sparsity, and pseudobulk) and experimental (Ramani, four human cell lines; Flyamer, mouse zygotes and oocytes; Nagano) scHi-C data. Against PCA, HiCRep+MDS, the eigenvector method, the decay profile method. Adjusted Rand Index to test clustering quality. TAD-like structures can be detected in imputed data (TopDom). At least 5k contacts per cell is sufficient. Python package. Input - sparse matrices, 1Mb resolution, or juicer-pre format for custom resolution.
- Zhou, Jingtian, Jianzhu Ma, Yusi Chen, Chuankai Cheng, Bokan Bao, Jian Peng, Terrence J. Sejnowski, Jesse R. Dixon, and Joseph R. Ecker. “Robust Single-Cell Hi-C Clustering by Convolution- and Random-Walk–Based Imputation.” Proceedings of the National Academy of Sciences, (July 9, 2019)
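The two imputation steps described for scHiCluster can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: the convolution here is a plain neighborhood mean (window `pad=1`), the random walk uses a row-normalized transition matrix, and the restart probability of 0.5 is a placeholder. The function names are hypothetical.

```python
def convolve(matrix, pad=1):
    """Replace each entry with the mean of its (2*pad+1) x (2*pad+1) neighborhood."""
    n = len(matrix)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            rows = range(max(0, i - pad), min(n, i + pad + 1))
            cols = range(max(0, j - pad), min(n, j + pad + 1))
            values = [matrix[a][b] for a in rows for b in cols]
            out[i][j] = sum(values) / len(values)
    return out


def random_walk_with_restart(matrix, restart=0.5, tol=1e-6, max_iter=100):
    """Iterate Q <- (1 - restart) * Q @ T + restart * I until convergence.

    T is the row-normalized input matrix; Q starts as the identity.
    """
    n = len(matrix)
    transition = []
    for row in matrix:
        total = sum(row)
        transition.append([v / total if total > 0 else 0.0 for v in row])
    q = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_iter):
        q_next = [
            [
                (1 - restart) * sum(q[i][k] * transition[k][j] for k in range(n))
                + restart * float(i == j)
                for j in range(n)
            ]
            for i in range(n)
        ]
        delta = max(abs(q_next[i][j] - q[i][j]) for i in range(n) for j in range(n))
        q = q_next
        if delta < tol:
            break
    return q


# Sparse toy cell: only two observed contacts (bins 0-1 and bins 2-3).
contacts = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
smoothed = convolve(contacts, pad=1)
imputed = random_walk_with_restart(smoothed, restart=0.5)
```

In this toy case the convolution alone leaves the contact between bins 0 and 3 at zero, while the random walk propagates signal through intermediate bins and fills it in, which illustrates why the two steps are combined.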
- scGSLoop - chromatin loop calling from scHi-C data. Graph representation of scHi-C data, the proximity-aware constrained variational graph autoencoder (PC-VGAE, GraphSAGE). Optional k-nearest neighborhood imputation. Compares with SnapHiC, better running time, memory usage, works especially well when the number of cells is low. Better enrichment in H3K27ac, H3K4me3 signals. PyTorch implementation.
Paper
Wang, Fuzhou, Hamid Alinejad‐Rokny, Jiecong Lin, Tingxiao Gao, Xingjian Chen, Zetian Zheng, Lingkuan Meng, Xiangtao Li, and Ka‐Chun Wong. “A Lightweight Framework For Chromatin Loop Detection at the Single‐Cell Level.” Advanced Science 10, no. 33 (November 2023): 2303502. https://doi.org/10.1002/advs.202303502.
-
SnapHiC - scHi-C analysis pipeline. Identifies chromatin loops at 10kb resolution. Imputes contact probability with the random walk with restart algorithm (scHiCluster method, considering the effective fragment size, GC content, mappability, details in Methods), distance-normalizes, applies the paired t-test using global and local background to identify loop candidates, groups the loop candidates using Rodriguez and Laio's algorithm, and identifies summits within each cluster. Considers global and local background to filter out false positives. Tested on 742 mouse embryonic stem cells, sn-methyl-3C-seq data from 2,869 human prefrontal cortical cells. Compared with HiCCUPS, discovers 4-70 times more cell-type-specific loops, achieves better F1, peak enrichment in APA analysis, CTCF convergent orientation, also detects known long-range interactions. Linking putative target genes and non-coding sequence variants associated with neuropsychiatric disorders. Ground truth for benchmarking: HiCCUPS loops plus long-range interactions from PLAC-seq and HiChIP experiments from mESCs (MAPS pipeline). Compared with Hi-C-FastHiC, FitHiC2, HiC-ACT, on downsampled data.
Paper
Yu, Miao, Armen Abnousi, Yanxiao Zhang, Guoqiang Li, Lindsay Lee, Ziyin Chen, Rongxin Fang, et al. “SnapHiC: A Computational Pipeline to Identify Chromatin Loops from Single-Cell Hi-C Data.” Nature Methods, August 26, 2021. https://doi.org/10.1038/s41592-021-01231-2.
Li, Xiaoqi, Lindsay Lee, Armen Abnousi, Miao Yu, Weifang Liu, Le Huang, Yun Li, and Ming Hu. “SnapHiC2: A Computationally Efficient Loop Caller for Single Cell Hi-C Data.” Computational and Structural Biotechnology Journal 20 (2022): 2778–83. https://doi.org/10.1016/j.csbj.2022.05.046. - SnapHiC2, fast reimplementation using a sliding window approach for random walk with restart. Enables data processing at 5kb resolution.
- DeDoc2 - scHi-C hierarchical TAD caller. Two variants, deDoc2.w and deDoc2.s, to predict higher and lower level TLDs (TAD-like domains). Minimizes structural entropy of the whole chromosome or of a sliding window. Benchmarked on downsampled, simulated, and experimental scHi-C data, against Higashi, scHiCluster, deTOKI, SpectralTAD, deDoc, GRiNCH, Insulation Score. Robust to noise, no need for data imputation.
Paper
Li, Angsheng, Guangjie Zeng, Haoyu Wang, Xiao Li, and Zhihua Zhang. “DeDoc2 Identifies and Characterizes the Hierarchy and Dynamics of Chromatin TAD-Like Domains in the Single Cells.” Advanced Science (Weinheim, Baden-Wurttemberg, Germany) 10, no. 20 (July 2023): e2300366. https://doi.org/10.1002/advs.202300366.
- Review of 12 scHi-C 3D modeling methods (Table 1), classified as bulk Hi-C models and scHi-C models. Most scHi-C models are consensus methods. Validation strategies with artificial, simulated datasets, or 3D-FISH. References to scHiC datasets.
Paper
Banecki, Krzysztof, Sevastianos Korsak, and Dariusz Plewczynski. “Advancements and Future Directions in Single-Cell Hi-C Based 3D Chromatin Modeling.” Computational and Structural Biotechnology Journal 23 (December 2024): 3549–58. https://doi.org/10.1016/j.csbj.2024.09.026.
- DPDchrom - reconstruction of the 3D chromatin conformation from single-cell Hi-C data. Relies on dissipative particle dynamics (DPD). Incorporates expectation whether the conformation should be coil-like or globular (at the resolution of 10kb and lower). Explicitly accounts for solvent. Compared with the Stevens method, classical molecular dynamics (CMD) method. Benchmarked on artificial polymer models, DPDchrom performs better at low contact density (up to 95% accuracy). On experimental data - up to 65% accuracy. Proposes the Modified Jaccard Index (Methods) to compare 3D structures irrespective of spatial orientation and scale. Many practical aspects and parameters affecting reconstruction accuracy, data sparsity exponentially affects accuracy. S2 Table - list of single nucleus Hi-C datasets, S1 Appendix - Details of simulation methods and analysis, ORBITA protocol for snHi-C. Tweet by Pavel Kos
- Kos, Pavel I., Aleksandra A. Galitsyna, Sergey V. Ulianov, Mikhail S. Gelfand, Sergey V. Razin, and Alexander V. Chertovich. “Perspectives for the Reconstruction of 3D Chromatin Conformation Using Single Cell Hi-C Data.” PLOS Computational Biology, (November 18, 2021)
- scHi-CSim - a single-cell Hi-C simulator (Python), estimates statistical properties from experimental data and generates simulated data closely resembling experimental data (cell type information, biological functions, enhancer-promoter interactions, loops, their statistical significance). Used for clustering benchmarking.
Paper
Fan, Shichen, Dachang Dang, Yusen Ye, Shao-Wu Zhang, Lin Gao, and Shihua Zhang. “scHi-CSim: A Flexible Simulator That Generates High-Fidelity Single-Cell Hi-C Data for Benchmarking.” Edited by Luonan Chen. Journal of Molecular Cell Biology 15, no. 1 (June 1, 2023): mjad003. https://doi.org/10.1093/jmcb/mjad003.
- scKTLD - TAD-like domain identification on single-cell Hi-C data using graph analysis. Hi-C contact matrix as the adjacency matrix, embeds the graph into a low-dimensional space, then applies kernel-based changepoint detection optimized with Pruned Exact Linear Time (PELT). Four types of TAD detection methods, review of single-cell-specific ones. Experimental bulk (GM12878, K562, downsampled), single-cell Hi-C data, simulated data. ChIP-seq data (CTCF, Rad21, Smc3, H3K4me3) to justify biological relevance. Methods, math. Two hyperparameters, the dimension of the embeddings (128 deemed optimal), the penalty constant in changepoint detection. Normalization (KR or ICE) decreases performance. Compared with 7 TAD callers, including single-cell-specific deTOKI, scHiCluster, and Higashi. Comparison of TAD sets - adjusted mutual information, measure of concordance, TAD-adjR2. Enrichment in CTCF signal (within 500kb up/down flanking), compactness of TADs (the distribution of IFs within TADs). Boundaries in single-cell Hi-C data are heterogeneous irrespective of cell type, but tend to overlap with boundaries in bulk Hi-C data.
Paper
Liu, Erhu, Hongqiang Lyu, Yuan Liu, Laiyi Fu, Xiaoliang Cheng, and Xiaoran Yin. “Identifying TAD-like Domains on Single-Cell Hi-C Data by Graph Embedding and Changepoint Detection,” https://doi.org/10.1093/bioinformatics/btae138
- Galitsyna, Aleksandra A, and Mikhail S Gelfand. “Single-Cell Hi-C Data Analysis: Safety in Numbers.” Briefings in Bioinformatics, August 18, 2021
- Single-cell Hi-C review, technology overview, analysis steps, challenges, tools. Mapping (split-read alignment, iterative mapping, read clipping, ORBITA), filtering spurious contacts, cells. Analysis, from 3D structure reconstruction, imputation, embedding, to clustering, pseudobulk analysis and AB compartments/TADs calling, deconvolution.
Zhou, Tianming, Ruochi Zhang, and Jian Ma. “The 3D Genome Structure of Single Cells.” Annual Review of Biomedical Data Science, (July 20, 2021) - Review of scHi-C technologies, computational methods. Table 1 - technologies (proximity ligation-based (e.g., sci-Hi-C, Dip-C), ligation-free (e.g., scSPRITE, ChIA-Drop), imaging-based (e.g., Oligopaint, OligoFISSEQ, hiFISH, HIPMap)), number of cells, depth. Data processing (demultiplexing, alignment, binning, filtering, storage, tool - scHiCExplorer), dimensionality reduction (HiCRep + MDS, scHiCluster, hypergraph-based Higashi + Hyper-SAGNN), imputation (scHiCluster, Higashi). Challenges in 3D structure modeling, compartment annotation, domain/loop identification. Multi-way interaction analysis methods (MIA-Sig, MATCHA).
- Li, Xiao, Ziyang An, and Zhihua Zhang. “Comparison of Computational Methods for 3D Genome Analysis at Single-Cell Hi-C Level.” Methods, August 2019 - Assessment of Hi-C methods applied to single-cell Hi-C data. Overview of computational analysis of Hi-C data (normalization, A/B compartment, TAD, loop calling, differential analysis), scRNA-seq data properties. Tested on systematically downsampled data and on experimental scHi-C data. HiCnorm performs best for normalization, Insulation Score and fastHiC for TAD/loop calling. A/B compartments are poorly defined in scHi-C data, TADs can be identified at single-cell level, aggregation improves TAD detection. Adjusted mutual information and weight similarity for TAD similarity assessment. Other methods, like TAD boundary prediction from epigenomic features.
-
Kim, Hyeon-Jin, Galip Gürkan Yardımcı, Giancarlo Bonora, Vijay Ramani, Jie Liu, Ruolan Qiu, Choli Lee, et al. “Capturing Cell Type-Specific Chromatin Compartment Patterns by Applying Topic Modeling to Single-Cell Hi-C Data.” PLOS Computational Biology, (September 18, 2020) - Topic modeling (Latent Dirichlet allocation, LDA) on sciHi-C data. 4D Nucleome datasets, 500kb resolution, newly generated data from GM12878, H1ESC, HFF, IMR90, HAP1 cells, >19,000 cells. Preprocessing and converting 500kb scHi-C matrices to locus-pairs (LPs), then tSNE. Cell-topics, LP-topics representation. Topics can capture A/B compartments. LDA using the cisTopic package, procedure for selecting the number of topics. Comparison with scHiCluster, similar performance.
-
Liu, Jie, Dejun Lin, Galip Gürkan Yardimci, and William Stafford Noble. “Unsupervised Embedding of Single-Cell Hi-C Data.” Bioinformatics, (July 1, 2018) - Embedding of scHi-C data. HiCRep with MDS performs best. Contact Probability Function as a means to compare Hi-C matrices. Methods for evaluating reproducibility also can be used to compare matrices, details of HiCRep, GenomeDISCO, HiC-Spector methods. Description of scHi-C datasets and their arrangement by cell cycle stage. 5K total reads per scHi-C matrix is sufficient for proper embedding.
- Single-cell 3D diploid chromatin conformation capture of human and mouse cerebellar cells (granule and Purkinje) across lifespan, coupled with simultaneous profiling of transcriptome and chromatin accessibility (integration with LIGER). Two 3D technologies: population-scale Dip-C (Pop-C) leverages the whole-genome sequencing capability of Dip-C; vDip-C, enables genomic profiling of rare cell populations (Purkinje) using a single viral vector containing a cell type–specific promoter, an ultrabright, fixation-resistant, monomeric fluorescent protein, and a nuclear membrane localization sequence. Granule cells show large structural transformation over the lifespan. scA/B generally correlates with cell type-specific gene expression. Raw and processed data at PRJNA933352 and GSE246785 (ArchR objects, contacts in text/pairs format).
Paper
Tan, Longzhi, Jenny Shi, Siavash Moghadami, Bibudha Parasar, Cydney P. Wright, Yunji Seo, Kristen Vallejo, et al. “Lifelong Restructuring of 3D Genome Architecture in Cerebellar Granule Cells.” Science 381, no. 6662 (September 8, 2023): 1112–19. https://doi.org/10.1126/science.adh3253.
-
scSPRITE - single-cell split-pool recognition of interactions by tag extension technology. Multi-way interactions. Applied to mESCs and detected chromosome territories, A/B compartments, TADs (heterogeneous), long-range interactions organized around various nuclear bodies. GEO GSE154353 - processed matrices. GitHub - Code
- Arrastia, Mary V., Joanna W. Jachowicz, Noah Ollikainen, Matthew S. Curtis, Charlotte Lai, Sofia A. Quinodoz, David A. Selck, Rustem F. Ismagilov, and Mitchell Guttman. "Single-cell measurement of higher-order 3D genome organization with scSPRITE." Nature Biotechnology (2021): 1-10.
-
Single-nucleus Hi-C data (scHi-C) of 88 Drosophila BG3 cells. 2-5M paired-end reads per cell, 10kb resolution. ORBITA pipeline to eliminate the effect of Phi29 DNA polymerase template switching. Chromatin compartments approx. 1Mb in size, non-hierarchical conserved TADs can be detected. Lots of biology, integration with other omics data. Raw and processed data in .cool format at GEO GSE131811
- Ulianov, Sergey V., Vlada V. Zakharova, Aleksandra A. Galitsyna, Pavel I. Kos, Kirill E. Polovnikov, Ilya M. Flyamer, Elena A. Mikhaleva, et al. “Order and Stochasticity in the Folding of Individual Drosophila Genomes.” Nature Communications 12, no. 1 (December 2021)
-
Four scHi-C datasets (Kim2020, Li2019, Ramani2017, and Lee2019) can be downloaded using the BandNorm package's function download_schic. Alternatively, direct download
-
Dip-C - diploid chromatin conformation capture, MALBAC-DT - multiple annealing and looping-based amplification cycles for digital transcriptomics. Single-cell 3D genomic (3646 cells) and transcriptomic (3517 cells) data of developing (7 time points post-natal) mouse brain (cortex and hippocampus). 3D genome separates cells into 13 structure types. Three main lineages: neurons, astrocytes, oligodendrocytes. Correlating cell type-specific gene expression with A/B compartmentalization helps to integrate gene expression and the 3D structure. WGCNA modules of correlated genes, dynamics of those modules across time. Integration with published 3D genome, transcriptome, methylome, and chromatin accessibility data. A major 3D genome transformation between P7 and P28 (postnatal days 7 and 28), genetics determines 3D structure irrespective of sensory stimuli (visual cortex in dark-reared animals), many more in the text. Data processed with dip-c - tools to analyze Dip-C data, with documentation and hickit - TAD calling, phase imputation, 3D modeling and more for diploid single-cell Hi-C (Dip-C) and general Hi-C. Processed MALBAC-DT and Dip-C data. References to other published data in the "Key resources" table.
Paper
Tan, Longzhi, Wenping Ma, Honggui Wu, Yinghui Zheng, Dong Xing, Ritchie Chen, Xiang Li, Nicholas Daley, Karl Deisseroth, and X. Sunney Xie. “[Changes in Genome Architecture and Transcriptional Dynamics Progress Independently of Sensory Experience during Post-Natal Brain Development](https://doi.org/10.1016/j.cell.2020.12.032).” Cell, January 2021
-
Arrastia, Mary V., Joanna W. Jachowicz, Noah Ollikainen, Matthew S. Curtis, Charlotte Lai, Sofia Quinodoz, David A. Selck, Mitchell Guttman, and Rustem F. Ismagilov. “A Single-Cell Method to Map Higher-Order 3D Genome Organization in Thousands of Individual Cells Reveals Structural Heterogeneity in Mouse ES Cells.” Preprint. Molecular Biology, August 12, 2020. - scSPRITE - Single-cell split-pool recognition of interactions by tag extension. Two triple-sets of split-pool barcoding, nuclear and spatial, DNA phosphate modified (DPM), odd, and even tagging. Paired-end sequencing, read 1 has genomic DNA and the DPM tag, read 2 has the other 5 tags. Detects chromatin structures at all scales. Applied to >1000 mESC nuclei. Highly correlated with bulk SPRITE data. Captures multi-way interactions. Captures >30-fold more contacts than scHi-C with <10-fold reads. Processed single-cell matrices, ensemble, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE154353. A Snakemake workflow for data processing
-
Tan, Longzhi, Dong Xing, Chi-Han Chang, Heng Li, and X. Sunney Xie. “Three-Dimensional Genome Structures of Single Diploid Human Cells.” Science, (31 2018) - Dip-C technology for single-cell Hi-C, multiplex end-tagging amplification (META). Detects ~5 times more contacts. Possible to make haplotype-separated Hi-C maps, detect CNVs, resolve X-chromosome inactivation. ~10kb-resolution Hi-C matrices, 3D genome reconstruction at 20kb resolution. PCA on chromatin compartments separates cell types. Comparison with Nagano data, bulk GM12878 Hi-C. Tools to analyze Dip-C (or other 3C/Hi-C) data, lh3/hickit. Processed data: GM12878, 17 cells, PBMC, 18 cells.
-
Lando, David, Tim J. Stevens, Srinjan Basu, and Ernest D. Laue. “Calculation of 3D Genome Structures for Comparison of Chromosome Conformation Capture Experiments with Microscopy: An Evaluation of Single-Cell Hi-C Protocols.” Nucleus, (January 1, 2018) - scHi-C protocol evaluation (Flyamer, Stevens, Nagano, Ramani). 100kb data. Practical observations, code for processing the data. NucProcess - Python toolkit for processing scHi-C-seq data. NucDynamics - calculating genome structures from scHi-C data, explanation of read alignment/filtering at restriction fragment resolution.
-
Flyamer, Ilya M., Johanna Gassler, Maxim Imakaev, Hugo B. Brandão, Sergey V. Ulianov, Nezar Abdennur, Sergey V. Razin, Leonid A. Mirny, and Kikuë Tachibana-Konwalski. “Single-Nucleus Hi-C Reveals Unique Chromatin Reorganization at Oocyte-to-Zygote Transition.” Nature, (06 2017) - snHi-C method, single-nucleus Hi-C that provides >10-fold more contacts per cell than the previous method. Omitted biotin incorporation and enrichment for ligated fragments steps. Applied to mouse oocyte-to-zygote transition, separately to maternal and paternal genomes (different patterns of chromatin reorganization, A/B compartments present in paternal nuclei only). Single cells have variable chromatin structure, but global patterns emerge when averaging. Changes in slope of the distance-dependent decay. TADs and loops may be generated by different mechanisms than compartments. Decrease in loop, TAD, and compartment strength during maturation. hiclib data processing, lavaburst for TAD identification. Data, pooled sparse contact matrices (individual samples available under their own GSM accessions)
-
Nagano, Takashi, Yaniv Lubling, Tim J. Stevens, Stefan Schoenfelder, Eitan Yaffe, Wendy Dean, Ernest D. Laue, Amos Tanay, and Peter Fraser. “Single-Cell Hi-C Reveals Cell-to-Cell Variability in Chromosome Structure.” Nature, (October 3, 2013) - Single-cell Hi-C, protocol. 10 cells analyzed individually and as an ensemble. Large domains are stable, within-domain interactions are stochastic. Active marks correlate with enrichment of trans-chromosomal contacts. Data: Mouse Th1 cells, 11 samples.
-
Takashi Nagano et al., “Cell-Cycle Dynamics of Chromosomal Organization at Single-Cell Resolution,” Nature, (July 5, 2017) - Single-cell Hi-C, mouse embryonic stem cells, diploid and haploid, over cell cycle, 45 samples. Data on GEO and data on the pipeline page
- Haploid single cell Hi-C processed counts from Nagano et al. were obtained from the Tanay lab, schic_hap_2i_adj_files.tar.gz and schic_hap_serum_adj_files.tar.gz. Only cells with a total of 100,000 reads or more were used. Data were further filtered using HiFive single cell Hi-C filters. This involved removing fragment ends (fends) with no interactions, fends smaller than 21 bp or larger than 10 Kb, and all fends not originating from chromosomes 1 through 19 or X. Next, because only haploid cell data were used any fend with more than two interactions was removed and fends with exactly two interactions were removed if the interactions occurred with partner fends more than 40 fends apart; otherwise, the longer of the two interactions was kept. Finally, fends were partitioned into 1 Mb bins. (from Luperchio et al., “The Repressive Genome Compartment Is Established Early in the Cell Cycle before Forming the Lamina Associated Domains.”)
- scHiC 2.0: Sequence and analysis pipeline of single-cell Hi-C datasets
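The fragment-end (fend) filters quoted above from Luperchio et al. can be sketched as follows. This covers only the interaction, size, and chromosome filters plus the 1 Mb binning; the haploid-specific rule on fends with two or more interactions is omitted. It does not use the HiFive API: the `Fend` record, the `chr`-prefixed chromosome names, and the midpoint-based bin assignment are assumptions made for illustration.

```python
from collections import Counter, namedtuple

# Hypothetical record for one fragment end; not a HiFive data structure.
Fend = namedtuple("Fend", "chrom start end n_interactions")

# Chromosomes 1 through 19 and X, assuming UCSC-style names.
ALLOWED_CHROMS = {f"chr{i}" for i in range(1, 20)} | {"chrX"}


def keep_fend(fend, min_size=21, max_size=10_000):
    """Keep fends with at least one interaction, size 21 bp to 10 kb, on chr1-19 or chrX."""
    size = fend.end - fend.start
    return (
        fend.n_interactions > 0
        and min_size <= size <= max_size
        and fend.chrom in ALLOWED_CHROMS
    )


def bin_fends(fends, bin_size=1_000_000):
    """Count retained fends per (chromosome, bin), using the fend midpoint."""
    counts = Counter()
    for fend in fends:
        if keep_fend(fend):
            midpoint = (fend.start + fend.end) // 2
            counts[(fend.chrom, midpoint // bin_size)] += 1
    return counts


example_fends = [
    Fend("chr1", 1_000, 1_500, 1),          # kept
    Fend("chr1", 2_000, 2_010, 1),          # too small
    Fend("chrY", 0, 500, 1),                # chromosome not allowed
    Fend("chr2", 1_500_000, 1_500_400, 0),  # no interactions
    Fend("chrX", 2_999_900, 3_000_300, 2),  # kept
    Fend("chr3", 0, 20_000, 1),             # too large
]
binned = bin_fends(example_fends)
```

The cell-level filter (at least 100,000 total reads) would be applied before this step, on per-cell read totals.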
-
Ramani, Vijay, Xinxian Deng, Ruolan Qiu, Kevin L. Gunderson, Frank J. Steemers, Christine M. Disteche, William S. Noble, Zhijun Duan, and Jay Shendure. “Massively Multiplex Single-Cell Hi-C.” Nature Methods, (2017) - single-cell combinatorial indexed Hi-C protocol, exploratory data analysis, PCA, QC and filtering steps. Some experiments contained mixture of human and mouse cells. Four human cell lines, HeLa S3, HAP1, K562, and GM12878. These cell lines are distributed over five sequencing libraries labeled as ml1, ml2, ml3, pl1, pl2, where pairs ml1 and ml2, and pl1 and pl2 are sequencing experiments with the same library preparations, respectively, and hence present different batches. >10,000 cells. GEO GSE84920
-
Stevens, Tim J., David Lando, Srinjan Basu, Liam P. Atkinson, Yang Cao, Steven F. Lee, Martin Leeb, et al. “3D Structures of Individual Mammalian Genomes Studied by Single-Cell Hi-C.” Nature, March 13, 2017 - 100kb five single-cell HiC. TADs are dynamic, A/B compartments, LADs, enhancers/promoters are consistent. 3D clustering of active histone marks, highly expressed genes. Co-expression of genes within TAD boundaries. Supplementary material has processing pipeline description, TheLaueLab/nuc_processing. Videos
-
Ulianov, Sergey V., Kikue Tachibana-Konwalski, and Sergey V. Razin. “Single-Cell Hi-C Bridges Microscopy and Genome-Wide Sequencing Approaches to Study 3D Chromatin Organization.” BioEssays, (2017) - scHi-C, review of the technology and six papers that generated scHi-C data.
- HiRES technology, Hi-C and RNA-seq employed simultaneously. Single-cell Hi-C and RNA-seq profiling from the same cells. Single-cell 3D structures depend on cell cycle but also diverge in cell type-specific manner. Interactions between B compartments increase during development. 3D changes occur before transcriptional changes. Brain cells and developing mouse embryos, between day 7 (E7.0) and E11.5. 20kb resolution, agrees with Dip-C. SimpleDiff pipeline for differential chromatin interaction analysis (Wilcoxon on distance-specific Z-score-transformed contacts between groups of cells), excitatory vs. inhibitory adult mouse brain neuron analysis. GSE223917 - processed data, description. Processing Scripts, Python, R, command line.
Paper
Liu, Zhiyuan, Yujie Chen, Qimin Xia, Menghan Liu, Heming Xu, Yi Chi, Yujing Deng, and Dong Xing. “Linking Genome Structures to Functions by Simultaneous Single-Cell Hi-C and RNA-Seq.” Science 380, no. 6649 (June 9, 2023): 1070–76. https://doi.org/10.1126/science.adg3797.
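The HiRES entry above describes SimpleDiff as a Wilcoxon test on distance-specific Z-score-transformed contacts. The sketch below covers only the Z-score transform, under the assumption that each diagonal (genomic distance) of a cell's matrix is standardized separately. It is not the SimpleDiff code, and `distance_zscore` is a hypothetical name.

```python
from statistics import fmean, pstdev


def distance_zscore(matrix):
    """Standardize contacts within each diagonal band d = j - i.

    Bands with zero variance (including single-entry bands) are set to 0.
    """
    n = len(matrix)
    z = [[0.0] * n for _ in range(n)]
    for d in range(n):
        band = [matrix[i][i + d] for i in range(n - d)]
        mu = fmean(band)
        sigma = pstdev(band)
        for i in range(n - d):
            value = (matrix[i][i + d] - mu) / sigma if sigma > 0 else 0.0
            z[i][i + d] = z[i + d][i] = value
    return z


# Toy symmetric contact matrix for one cell.
cell = [[5, 1, 0], [1, 5, 3], [0, 3, 5]]
cell_z = distance_zscore(cell)
```

With one transformed matrix per cell, each bin pair yields two samples of Z-scores (one per group of cells) that a rank-based test can then compare.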
- Single-cell DNA methylation (snmC-seq3) and 3D genome architecture (snm3C-seq) in the human brain. Additional snRNA-seq and snATAC-seq. 517K cells from 46 regions of three adult male brains. Epigenome-based classification of brain cell types, comparison of neurons and non-neurons in terms of loops, domains, compartments, differentially expressed genes, methylation patterns, association among modalities. Data at GEO GSE215353, at Brain initiative (download and visualize). Tools: scHiCluster, GitHub1, GitHub2, GitHub3. Supplementary tables include differential loops between major cell types, candidate cis-regulatory elements.
Paper
Tian, Wei, Jingtian Zhou, Anna Bartlett, Qiurui Zeng, Hanqing Liu, Rosa G. Castanon, Mia Kenworthy, et al. “Single-Cell DNA Methylation and 3D Genome Architecture in the Human Brain.” Science 382, no. 6667 (October 13, 2023): eadf5357. https://doi.org/10.1126/science.adf5357.
-
Lee, Dong-Sung, Chongyuan Luo, Jingtian Zhou, Sahaana Chandran, Angeline Rivkin, Anna Bartlett, Joseph R. Nery, et al. “Simultaneous Profiling of 3D Genome Structure and DNA Methylation in Single Human Cells.” Nature Methods, September 9, 2019 - sn-m3C-seq - single-nucleus methyl-3C sequencing, extension of snmC-seq2 method, DpnII digestion, Fluorescence-Activated Nuclei sorting, and the following bisulfite conversion. Cell types can be distinguished by hierarchical clustering (mouse cell types, 4238 human prefrontal cortex cells separated into 14 populations - Astro, Endo, L2/3, L4, L5, L6, MG, MP, Ndnf, ODC, OPC, Pvalb, Sst, Vip, originating from two donors with ages of 21 and 29 years and in a total of five sequencing libraries). TAURUS-MH pipeline, outperforms BWA-METH. sn-m3C-seq methylation correlates well with bulk and single-cell methylation measures. More Hi-C contacts than published datasets. Comparing brain cell subpopulations, chromatin interactions overlap, methylation differs, hypomethylation is associated with increased interactions, differential domain boundaries are associated with differential methylation. mESC data (raw FASTQ, >600 samples, >60Gb), human brain data (raw FASTQ, >4K samples, >700Gb), .cool 10Mb resolution files. Protocol. Interactive methylation data, Hi-C data. Code scripts, TAURUS-MH pipeline, Twitter
-
Li, Guoqiang, Yaping Liu, Yanxiao Zhang, Naoki Kubo, Miao Yu, Rongxin Fang, Manolis Kellis, and Bing Ren. “Joint Profiling of DNA Methylation and Chromatin Architecture in Single Cells.” Nature Methods, August 5, 2019 - Methyl-HiC - in situ Hi-C and WGBS. mESC cells cultured in serum and leukemia inhibitory factor (LIF) condition (serum mESCs: serum 1 and serum 2) and mESCs cultured in LIF with GSK3 and MEK inhibitors (2i) condition. Comparable Hi-C matrices, TADs. 20% fewer CpGs overall, more CpGs in open chromatin. Proximal CpGs correlate irrespectively of loop anchors, weaker for inter-chromosomal interactions. Application to single-cell, mouse ESCs under different conditions. Relevant clustering, cluster-specific genes. Methods for wet-lab and computational processing. Bulk (replicates) and single-cell Methyl-HiC data. Scripts, Bhmem pipeline to map bisulfite-converted reads, Juicer pipeline for processing, VC normalization, HiCRep at 1Mb matrix similarity.
-
MERFISH - Super-resolution imaging technology, reconstruction of 3D structure in single cells at 30kb resolution, 1.2Mb region of Chr21 in IMR90 cells. Distance maps obtained by microscopy show small distances for loci within, and larger distances between, TADs. TAD-like structures exist in single cells. 2.5Mb region of Chr21 in HCT116 cells, cohesin depletion does not abolish TADs, only alters their preferential positioning. Multi-point (triplet) interactions are prevalent. TAD boundaries are highly heterogeneous in single cells. Diffraction-limited and STORM (stochastic optical reconstruction microscopy) imaging. GitHub
- Bintu, Bogdan, Leslie J. Mateo, Jun-Han Su, Nicholas A. Sinnott-Armstrong, Mirae Parker, Seon Kinrot, Kei Yamaya, Alistair N. Boettiger, and Xiaowei Zhuang. “Super-Resolution Chromatin Tracing Reveals Domains and Cooperative Interactions in Single Cells.” Science, (October 26, 2018)
-
Single-cell level massively multiplexed FISH (MERFISH, sequential genome imaging) to measure 3D genome structure in context of gene expression and nuclear structures. Approx. 650 loci, 50kb resolution, on chr21 10.4-46.7Mb from the hg38 genome assembly, IMR90 cells, population average from approx. 12K chr21 copies, multiple rounds of hybridization. Investigation of TADs, A/B compartments, 87% agreement with bulk Hi-C. Association with cell type markers, transcription. Genome-scale imaging using barcodes, 1041 30kb loci covering autosomes and chrX of IMR90, over 5K cells, 5 replicates. Processed multiplexed FISH data and more, TXT format, GitHub
- Su, Jun-Han, Pu Zheng, Seon S. Kinrot, Bogdan Bintu, and Xiaowei Zhuang. “Genome-Scale Imaging of the 3D Organization and Transcriptional Activity of Chromatin.” Cell, August 2020
-
Parser of multiplexed single-cell imaging data from Bintu et al. 2018 and Su et al. 2020 - Take 3D coordinates of the regions as input and write the distance and contact matrices for these datasets.
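The conversion that the parser above performs, from 3D coordinates to distance and contact matrices, can be illustrated as follows. This is not the parser's code: the input layout (one `(x, y, z)` tuple per imaged region), the function names, and the contact threshold are assumptions, and the handling of missing loci in real chromatin-tracing data is left out.

```python
import math


def distance_matrix(coords):
    """Pairwise Euclidean distances between imaged regions of one chromosome copy."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def contact_matrix(distances, threshold):
    """Binary contacts: 1 where two regions are within `threshold` (same units as coords)."""
    return [[1 if d <= threshold else 0 for d in row] for row in distances]


# Toy trace of three regions; units and threshold are placeholders.
trace = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 10.0)]
trace_distances = distance_matrix(trace)
trace_contacts = contact_matrix(trace_distances, threshold=6.0)
```

Averaging the binary contact matrices over many chromosome copies gives a contact-frequency map that can be compared with population Hi-C, while the median of the distance matrices gives the population distance map.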