Mowgli is a novel method for the integration of paired multi-omics data with any type and number of omics, combining integrative Nonnegative Matrix Factorization and Optimal Transport. Read the paper!
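One way to see how the two ingredients combine is to score each cell's NMF reconstruction with an Optimal Transport distance instead of a squared error. The NumPy sketch below assumes exactly that and computes the OT term with plain entropy-regularized Sinkhorn iterations. It is a simplified illustration, not Mowgli's implementation. The helper names, the dictionary layout (features × factors), the toy feature-to-feature cost matrix `C`, and `eps` are all hypothetical choices.

```python
import numpy as np


def sinkhorn_cost(a, b, C, eps=0.05, n_iter=500):
    """Entropy-regularized OT cost <P, C> between histograms a and b."""
    K = np.exp(-C / eps)                 # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):              # Sinkhorn matrix-scaling iterations
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]      # approximate optimal transport plan
    return float(np.sum(P * C))


def ot_reconstruction_loss(x, H, w, C, eps=0.05):
    """OT distance between a cell's profile x and its NMF reconstruction H @ w.

    x: nonnegative counts, shape (n_features,)
    H: nonnegative dictionary, shape (n_features, k)  (hypothetical layout)
    w: nonnegative embedding of the cell, shape (k,)
    C: ground cost between features, shape (n_features, n_features)
    """
    a = x / x.sum()                      # observed profile as a histogram
    recon = H @ w
    b = recon / recon.sum()              # reconstruction as a histogram
    return sinkhorn_cost(a, b, C, eps)


# Toy example: 5 features on a line, cost = normalized distance between them.
idx = np.arange(5)
C = np.abs(idx[:, None] - idx[None, :]) / 4.0
x = np.array([4.0, 1.0, 0.0, 0.0, 0.0])
w = np.array([4.0, 1.0])
H_good = np.eye(5)[:, [0, 1]]            # reconstructs x exactly
H_far = np.eye(5)[:, [3, 4]]             # puts the mass on distant features
print(ot_reconstruction_loss(x, H_good, w, C))  # close to 0
print(ot_reconstruction_loss(x, H_far, w, C))   # about 0.75
```

Unlike a squared error, the OT loss knows which features are "close" to each other through `C`. A reconstruction that misplaces mass onto similar features is penalized less than one that misplaces it onto unrelated features.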
Mowgli is implemented as a Python package seamlessly integrated within the scverse ecosystem, in particular Muon and Scanpy.
On all operating systems, the easiest way to install Mowgli is via PyPI. Installation should typically take a minute and is continuously tested with Python 3.10 on an Ubuntu virtual machine.
pip install mowgli
To install the development version from GitHub and run the test suite:
git clone git@github.com:cantinilab/Mowgli.git
pip install ./Mowgli/
pytest .
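After either installation route, a quick sanity check can confirm that the packages used in the example below are importable. This is a minimal sketch. It assumes the import names `mowgli`, `mudata`, and `scanpy` from that example, and `check_environment` is a hypothetical helper. It only looks modules up with `importlib.util.find_spec` and does not import them.

```python
import importlib.util
import sys


def check_environment(modules=("mowgli", "mudata", "scanpy")):
    """Return {module_name: importable?} for the given top-level module names."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    # Continuous tests use Python 3.10; other versions may work but are untested here.
    if sys.version_info[:2] != (3, 10):
        print(f"Note: running Python {sys.version.split()[0]}; tests use 3.10.")
    for name, ok in check_environment().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```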
Mowgli takes a Muon object as input and populates its obsm and uns fields with the embeddings and dictionaries, respectively. Visit mowgli.rtfd.io for more documentation and tutorials.
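The following toy integrative NMF makes the shape of these outputs concrete. It uses one shared cell embedding `W` (cells × factors), which is the kind of matrix stored in obsm, such as `mdata.obsm["W_OT"]` in the example below. It also uses one dictionary per modality that links factors to features, which is the kind of object stored in uns. This is a sketch under stated assumptions. It swaps in the classic Frobenius loss with Lee-Seung multiplicative updates instead of Mowgli's Optimal Transport loss. The dictionary orientation (features × factors), the uns keys, and both helper names are hypothetical.

```python
import numpy as np


def integrative_nmf(Xs, k, n_iter=1000, seed=0, eps=1e-10):
    """Toy integrative NMF with a shared embedding: X_m ≈ W @ H_m.T for each m.

    Xs: list of nonnegative arrays, one per modality, each (n_cells, n_features_m).
    Returns W with shape (n_cells, k) and a list of H_m with shape (n_features_m, k).
    Frobenius loss + Lee-Seung multiplicative updates (a stand-in for an OT loss).
    """
    rng = np.random.default_rng(seed)
    W = rng.random((Xs[0].shape[0], k))
    Hs = [rng.random((X.shape[1], k)) for X in Xs]
    for _ in range(n_iter):
        # Modality-specific dictionaries, each updated with W held fixed.
        Hs = [H * (X.T @ W) / (H @ (W.T @ W) + eps) for X, H in zip(Xs, Hs)]
        # Shared embedding: the update pools information from every modality.
        num = sum(X @ H for X, H in zip(Xs, Hs))
        den = W @ sum(H.T @ H for H in Hs) + eps
        W = W * num / den
    return W, Hs


def top_features(H, feature_names, factor, n=3):
    """Names of the n features with the largest weights in one factor of H."""
    order = np.argsort(H[:, factor])[::-1][:n]
    return [feature_names[i] for i in order]


# Toy paired data: 6 cells measured in two modalities that share 2 programs.
W_true = np.array([[1, 0], [2, 0], [1, 0], [0, 1], [0, 3], [0, 1]], dtype=float)
X_rna = W_true @ np.array([[1, 0], [2, 0], [0, 1], [0, 2]], dtype=float).T
X_atac = W_true @ np.array([[1, 0], [0, 1], [0, 3]], dtype=float).T
W, (H_rna, H_atac) = integrative_nmf([X_rna, X_atac], k=2)
print(W.shape, H_rna.shape, H_atac.shape)  # (6, 2) (4, 2) (3, 2)
```

Because `W` is shared, each cell gets a single embedding regardless of how many modalities are provided. The per-modality dictionaries are what you would inspect, for example with `top_features`, to interpret a factor in terms of genes or peaks.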
You may download a preprocessed 10X Multiome demo dataset here.
A GPU is not required for small datasets, but it is strongly recommended above 1,000 cells. On CPU, the cell lines demo (206 cells) should run in under 5 minutes and the PBMC demo (500 cells) in under 10 minutes (tested on an Ubuntu 20.04 machine with an 11th-generation Intel Core i7 processor).
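The 1,000-cell rule of thumb can be turned into a small helper. This is a sketch, assuming a PyTorch backend. In practice `gpu_available` would come from `torch.cuda.is_available()`, and the returned string would be handed to training, for instance through a `device` argument of `model.train`. That argument is an assumption here, so check the API reference at mowgli.rtfd.io. `choose_device` itself is hypothetical.

```python
import warnings


def choose_device(n_cells, gpu_available, threshold=1_000):
    """Pick "cuda" when a GPU exists; warn when a large dataset would run on CPU."""
    if gpu_available:
        return "cuda"
    if n_cells > threshold:
        warnings.warn(
            f"{n_cells} cells on CPU: a GPU is strongly recommended above {threshold} cells.",
            stacklevel=2,
        )
    return "cpu"
```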
import mowgli
import mudata as md
import scanpy as sc
# Load data into a Muon object.
mdata = md.read_h5mu("my_data.h5mu")
# Initialize and train the model.
model = mowgli.models.MowgliModel(latent_dim=15)
model.train(mdata)
# Visualize the embedding with UMAP.
sc.pp.neighbors(mdata, use_rep="W_OT")
sc.tl.umap(mdata)
sc.pl.umap(mdata)
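With `use_rep="W_OT"`, `sc.pp.neighbors` builds the cell graph from the Mowgli embedding stored in obsm rather than from a PCA. The core idea can be sketched as an exact k-nearest-neighbor search in NumPy. This is a simplification: Scanpy may use approximate search on larger datasets and turns distances into weighted connectivities before UMAP. The helper name is hypothetical, and `k=15` mirrors Scanpy's default `n_neighbors`.

```python
import numpy as np


def knn_indices(embedding, k=15):
    """Exact k nearest neighbors (Euclidean) of every row, excluding the row itself."""
    diff = embedding[:, None, :] - embedding[None, :, :]   # (n, n, d) pairwise differences
    d2 = np.einsum("ijk,ijk->ij", diff, diff)              # squared distances, (n, n)
    np.fill_diagonal(d2, np.inf)                           # a cell is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]


# Four cells in a 2-dimensional embedding, forming two well-separated pairs.
emb = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
print(knn_indices(emb, k=1).ravel())  # [1 0 3 2]
```

The dense pairwise computation uses O(n²) memory, which is fine for a toy but is one reason real tools rely on indexed or approximate search.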
If you use Mowgli in your work, please cite:
@article{huizing2023paired,
title={Paired single-cell multi-omics data integration with Mowgli},
author={Huizing, Geert-Jan and Deutschmann, Ina Maria and Peyr{\'e}, Gabriel and Cantini, Laura},
journal={Nature Communications},
volume={14},
number={1},
pages={7711},
year={2023},
publisher={Nature Publishing Group UK London}
}
If you're looking for the repository with code to reproduce the experiments in our preprint, here it is!