Graph abstraction reconciles clustering with trajectory inference through a topology preserving map of single cells
Graph abstraction is available within Scanpy. Central top-level functions are:
This repository lets you reproduce the analyses and figures of the preprint; all analyses were done using Scanpy 0.2.9. Later versions of Scanpy, which are much improved in stability and consistency, give exactly the same results as 0.2.9, even though the layout of the figures has changed. Use GitHub's history button if you want to see earlier versions.
In minimal_examples, we study clean simulated datasets with known ground truth. In particular, a dataset that contains a tree-like continuous manifold and disconnected clusters...
... and simple datasets that illustrate connectivity patterns of clusters.
You will also find an explanation of how to zoom in on particular regions of the dataset.
Here, we consider two well-studied datasets on hematopoietic differentiation.
Data from Paul et al. (2015)
In paul15, we analyze data for myeloid progenitor development. This is the same data that served as a benchmark for Monocle 2 (Qiu et al., Nat. Meth., 2017) and DPT (Haghverdi et al., Nat. Meth., 2016).
Note: Unfortunately, Firefox does not display the SVG heatmaps properly; all other browsers do.
Data from Nestorowa, Hamey et al. (2016)
In nestorowa16, we analyze data for early hematopoietic differentiation.
In planaria, we reconstruct the lineage tree of the whole cell atlas of planaria (Plass, Jordi et al., submitted, 2017).
In deep_learning, we use deep learning to generate a feature space and, through it, a distance metric, which induces a nearest-neighbor graph. For the problem of reconstructing the cell cycle (Eulenberg, Köhler, et al., Nat. Commun., 2017), we find that graph abstraction correctly separates a small cluster of dead cells from the cell evolution through the G1, S and G2 phases.
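To make the step from a feature space to a nearest-neighbor graph concrete, here is a minimal sketch in plain Python. It is not the code used in the deep_learning notebook: the function name `knn_graph`, the toy 2-D `features` and the choice of Euclidean distance are all assumptions made for illustration. In the actual analysis, the features come from a trained network.

```python
import math

def knn_graph(features, k):
    """Map each point index to the indices of its k nearest neighbors.

    Illustrative helper (hypothetical, not part of Scanpy): uses the
    Euclidean distance in the given feature space.
    """
    graph = {}
    for i, x in enumerate(features):
        # Distance from point i to every other point, paired with its index.
        dists = [(math.dist(x, y), j) for j, y in enumerate(features) if j != i]
        dists.sort()  # ties are broken by the smaller index
        graph[i] = [j for _, j in dists[:k]]
    return graph

# Toy embeddings (assumed data): two tight groups that lie far apart.
features = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
]
graph = knn_graph(features, k=2)
```

With these toy points, every neighbor list stays within its own group, which is the kind of separation a downstream graph-based method can then pick up.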
For all of the following scRNA-seq datasets (3K and 68K PBMC cells, all 10X Genomics), graph abstraction reconstructs correct lineage motifs. Because large parts of the data are disconnected, a global lineage tree cannot be inferred.
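Why disconnected data rules out a single global tree can be seen from a small sketch: a tree must span one connected graph, so a graph with more than one connected component cannot carry one. The example below is plain Python on an assumed toy edge list. `connected_components` is a hypothetical helper, not a Scanpy function, and the node labels stand in for clusters of cells.

```python
from collections import deque

def connected_components(n_nodes, edges):
    """Return the connected components of an undirected graph as sorted lists."""
    adjacency = {i: set() for i in range(n_nodes)}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    components = []
    for start in range(n_nodes):
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(sorted(component))
    return components

# Toy graph (assumed data): clusters 0-1-2 form a chain, 3-4 a pair, 5 is isolated.
edges = [(0, 1), (1, 2), (3, 4)]
components = connected_components(6, edges)

# A single lineage tree is only possible if there is exactly one component.
has_global_tree = len(components) == 1
```

Here the graph splits into three components, so lineage relations can only be read off within each component, not across them.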