diff --git a/README.md b/README.md
index 484d8ff..63a1e0a 100644
--- a/README.md
+++ b/README.md
@@ -4,9 +4,17 @@ COpy number and Mutations Phylogeny from Amplicon Single-cell Sequencing
 
 This tool can be used to infer a tree of somatic events (mutations, copy number variants and copy-neutral loss of heterozygosity) that occurred in a tumor. It is specifically designed to be used for MissionBio's Tapestri data, where a small number of amplicons (50-300) are sequenced for thousands of single-cells.
 
-## Compilation
+## Quick start
+```
+git clone https://github.com/cbg-ethz/COMPASS.git
+cd COMPASS
+make
+./COMPASS -i data/AML-59-001 -o tree_AML-59-001.gv --nchains 4 --chainlength 5000 --CNV 1
+dot -Tpng -o tree_AML-59-001.png tree_AML-59-001.gv
+```
+
+Graphviz is required in order to plot the tree, which can be installed on Ubuntu by running `sudo apt-get install graphviz `
 
-A makefile is provided. Alternatively, COMPASS can be compiled by simply running `g++ -std=c++11 -O2 -fopenmp *.cpp -o COMPASS`
 
 ## Usage
 
@@ -26,4 +34,4 @@ COMPASS takes as input 2 files:
 * [sample_name]_variants.csv: each line corresponds to a variant. The first columns contain metadata and the remaining columns contain the counts of reference reads and alternative reads, separated by ":", for each cell.
 * [sample_name]_regions.csv: each line corresponds to a region (typically, a gene). The first column is CHR_REGIONNAME, and the remaining columns contain the number of reads in this region, for each cell. This file is only required in case CNVs are used (--CNV 1).
 
-The `data` directory contains an example synthetic input. The `Experiments` directory contains scripts used to preprocess the loom files generated by the Tapestri pipeline, as well as workflows used to run simulations on synthetic data.
+The `data` directory contains an example synthetic input, as well as two preprocessed real AML samples. The `Experiments` directory contains scripts used to preprocess the loom files generated by the Tapestri pipeline, as well as workflows used to run simulations on synthetic data.
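
Review note: the README context in the second hunk describes the per-cell `ref:alt` encoding of `[sample_name]_variants.csv` only in prose. The following is a minimal Python sketch of how one row in that layout could be read. It is not COMPASS's own parser. The comma delimiter, the number of metadata columns (`n_metadata`), the helper names, and the example row are all assumptions made for illustration; check them against the real files in `data` before relying on this.

```python
from typing import List, Tuple


def parse_counts(fields: List[str]) -> List[Tuple[int, int]]:
    """Turn per-cell 'ref:alt' strings into (ref, alt) integer pairs."""
    counts = []
    for field in fields:
        ref, alt = field.strip().split(":")
        counts.append((int(ref), int(alt)))
    return counts


def parse_variant_line(line: str, n_metadata: int) -> Tuple[List[str], List[Tuple[int, int]]]:
    """Split one row into metadata columns and per-cell read counts.

    n_metadata is an assumption supplied by the caller: the README only says
    that "the first columns contain metadata" without giving a number.
    """
    cols = line.rstrip("\n").split(",")  # comma delimiter assumed from the .csv extension
    return cols[:n_metadata], parse_counts(cols[n_metadata:])


# Hypothetical row: 5 invented metadata columns followed by 3 cells.
example = "chr1,12345,A,T,GENE1,10:0,7:5,0:12"
meta, counts = parse_variant_line(example, n_metadata=5)
```

Here `counts` holds one `(reference_reads, alternative_reads)` pair per cell, in column order, and `meta` keeps the leading columns as untouched strings.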