Merge pull request #140 from nbokulich/consistent-sweeps
updated readme for modern era
BenKaehler authored Jun 22, 2017
2 parents 9199710 + d2bbaee commit 857c9ee
Showing 3 changed files with 11 additions and 20 deletions.
27 changes: 9 additions & 18 deletions README.md
@@ -4,40 +4,31 @@

### A standardized and extensible evaluation framework for taxonomic classifiers

To view static versions of the reports presented in [Bokulich, et al., (Microbiome, under review)](https://peerj.com/preprints/934/), [start here](http://nbviewer.jupyter.org/github/nbokulich/tax-credit/blob/master/ipynb/Index.ipynb).
To view static versions of the reports, [start here](https://github.com/caporaso-lab/tax-credit/blob/master/ipynb/Index.ipynb).


Environment
-----------------
This repository contains python-3 code and Jupyter notebooks, but some taxonomy assignment methods (e.g., using QIIME-1 legacy methods) may require different python or software versions. Hence, we use conda parallel environments to support comparison of myriad methods in a single framework.

The first step is to create a conda environment with the necessary dependencies. This requires installing [miniconda 3](http://conda.pydata.org/miniconda.html) to manage parallel python environments. After miniconda (or another conda version) is installed, proceed with [installing QIIME 2](https://docs.qiime2.org/2.0.6/install/).
The first step is to install conda and install QIIME2 following the instructions provided [here](https://docs.qiime2.org/2017.6/install/native/).

An example of how to load different environments to support other methods can be seen in the [QIIME-1 taxonomy assignment notebook](https://github.com/nbokulich/tax-credit/tree/master/ipynb/mock-community/generate-tax-assignments.ipynb).
An example of how to load different environments to support other methods can be seen in the [QIIME-1 taxonomy assignment notebook](https://github.com/caporaso-lab/tax-credit/blob/master/ipynb/mock-community/taxonomy-assignment-qiime1.ipynb).


Setup and install
-----------------
The library code and IPython Notebooks are then installed as follows:

```
cd $HOME/projects
git clone https://github.com/gregcaporaso/tax-credit.git
cd $HOME/projects/tax-credit/code
sudo pip install .
```

To run the unit tests, run:

```
cd $HOME/projects/tax-credit/code
nosetests .
cd tax-credit/
pip install .
```

Finally, download and unzip the reference databases:

```
cd $HOME/ref_dbs/
wget https://unite.ut.ee/sh_files/sh_qiime_release_20.11.2016.zip
wget ftp://greengenes.microbio.me/greengenes_release/gg_13_5/gg_13_8_otus.tar.gz
unzip sh_qiime_release_20.11.2016.zip
```
@@ -46,11 +37,13 @@ tar -xzf gg_13_8_otus.tar.gz

Equipment
------------------
The analyses included here can all be run on a standard, modern laptop, provided you don't mind waiting a few hours on the most memory-intensive step (taxonomy classification of millions of sequences). All analyses presented in ``tax-credit`` were run in a single afternoon using a MacBook Pro with the following specifications:
The analyses included here can all be run on a standard, modern laptop, provided you don't mind waiting a few hours on the most memory-intensive step (taxonomy classification of millions of sequences). With the exception of the `q2-feature-classifier naive-bayes*` classifier sweeps, which were run on a high-performance cluster, all analyses presented in ``tax-credit`` were run in a single day using a MacBook Pro with the following specifications:
**OS** OS X 10.11.6 "El Capitan"
**Processor** 2.3 GHz Intel Core i7
**Memory** 8 GB 1600 MHz DDR3

If you intend to perform extensive parameter sweeps on a classifier (e.g., several hundred or more parameter combinations), you may want to consider running these analyses using cluster resources, if available.
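
As a rough illustration of how quickly a sweep grows, the sketch below counts the combinations in a small grid. The parameter names and values are hypothetical placeholders chosen for the example, not the settings swept in tax-credit, and `expand_grid` is a helper invented here.

```python
from itertools import product

# Hypothetical sweep grid: names and values are illustrative only,
# not the parameters actually swept in tax-credit.
param_grid = {
    "confidence": [0.0, 0.2, 0.4, 0.6, 0.8, 1.0],
    "kmer_length": [4, 6, 8, 16, 32],
    "alpha": [0.001, 0.01, 0.1],
    "fit_prior": [True, False],
}


def expand_grid(grid):
    """Return one dict per parameter combination in the grid."""
    names = list(grid)
    return [dict(zip(names, values)) for values in product(*grid.values())]


combinations = expand_grid(param_grid)
print(len(combinations))  # 6 * 5 * 3 * 2 = 180 combinations
```

Each added parameter multiplies the total, so a grid of this size repeated over a few reference databases or data sets already amounts to several hundred classifier runs.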


Using the Jupyter Notebooks included in this repository
-------------------------------------------------------
@@ -65,6 +58,4 @@ The notebooks menu should open in your browser. From the main index, you can fol
Citing
------

tax-credit is currently unpublished, but for now if you use any of the data or code included in this repository, please cite the following paper:

Bokulich NA, Rideout JR, Kopylova E, Bolyen E, Patnode J, Ellett Z, McDonald D, Wolfe B, Maurice CF, Dutton RJ, Turnbaugh PJ, Knight R, Caporaso JG. (2015) A standardized, extensible framework for optimizing classification improves marker-gene taxonomic assignments. PeerJ PrePrints 3:e1156 https://dx.doi.org/10.7287/peerj.preprints.934v1
A publication is on its way! For now, if you use any of the data or code included in this repository, please cite https://github.com/caporaso-lab/tax-credit
2 changes: 1 addition & 1 deletion ipynb/Index.ipynb
@@ -11,7 +11,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"These notebooks were used to perform the analyses presented in (Bokulich, Rideout, et al. (in preparation)), and can be used to reproduce the analyses in that paper, or to extend them to other data sets. \n",
"These notebooks were used to perform the analyses presented in (Bokulich, Kaehler, et al. (in preparation)), and can be used to reproduce the analyses in that paper, or to extend them to other data sets. \n",
"\n",
"To run any of the analysis notebooks, you'll need the [tax-credit project](https://github.com/caporaso-lab/tax-credit/). See the [README](https://github.com/caporaso-lab/tax-credit/blob/master/README.md) for installation instructions. For a static version of these notebooks that you can view as a webpage without installing anything, you can [view the notebooks on nbviewer](http://nbviewer.jupyter.org/github/caporaso-lab/tax-credit/blob/master/ipynb/Index.ipynb).\n",
"\n",
2 changes: 1 addition & 1 deletion ipynb/mock-community/README.md
@@ -1,6 +1,6 @@
# Mock community evaluations

This notebook describes how to apply the mock community evaluations presented in (Bokulich, Rideout, et al. (in preparation)) to reproduce the analyses in that paper, or to extend them to other data sets.
This notebook describes how to apply the mock community evaluations presented in (Bokulich, Kaehler, et al. (in preparation)) to reproduce the analyses in that paper, or to extend them to other data sets.

## Structuring new results for comparison to precomputed results
To prepare results from another classifier for analysis, you'll need to have [BIOM](http://www.biom-format.org) files with taxonomy assignments as an observation metadata category called ``taxonomy``. An example of how to generate these is presented in the [data generation notebook](./mock-dataset-generation.ipynb) in this directory, which was used to generate the precomputed data in the [tax-credit repository](https://github.com/caporaso-lab/tax-credit/).
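
To make the expected layout more concrete, here is a minimal sketch that builds a BIOM 1.0 (JSON) table by hand, with each observation carrying a ``taxonomy`` metadata entry. The observation IDs, sample IDs, counts, and lineages are made up for illustration, and `make_biom_json` is a hypothetical helper, not part of tax-credit. In practice you would more likely write these files with the biom-format tools, and the evaluation notebooks may expect a different BIOM version than the JSON one shown here.

```python
import json
from datetime import datetime


def make_biom_json(observations, sample_ids, counts):
    """Build a minimal BIOM 1.0 (JSON) table as a plain dict.

    observations: list of (observation_id, lineage) pairs, where lineage
                  is a list of taxonomic levels.
    sample_ids:   list of sample identifiers (table columns).
    counts:       dense matrix, one row of counts per observation.
    """
    rows = [
        {"id": obs_id, "metadata": {"taxonomy": lineage}}
        for obs_id, lineage in observations
    ]
    columns = [{"id": sample_id, "metadata": None} for sample_id in sample_ids]
    return {
        "id": None,
        "format": "Biological Observation Matrix 1.0.0",
        "format_url": "http://biom-format.org",
        "type": "OTU table",
        "generated_by": "example sketch",
        "date": datetime.now().isoformat(),
        "rows": rows,
        "columns": columns,
        "matrix_type": "dense",
        "matrix_element_type": "int",
        "shape": [len(rows), len(columns)],
        "data": counts,
    }


# Made-up observations and counts, for illustration only.
observations = [
    ("OTU_1", ["k__Bacteria", "p__Firmicutes", "c__Bacilli"]),
    ("OTU_2", ["k__Bacteria", "p__Proteobacteria", "c__Gammaproteobacteria"]),
]
table = make_biom_json(observations, ["sample_A", "sample_B"], [[10, 0], [3, 7]])
biom_text = json.dumps(table, indent=2)
```

The point of the sketch is only the placement of the assignment: the lineage sits under the ``taxonomy`` key of each row's (observation's) metadata, not in the sample metadata.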