Skip to content

annotation pipeline for microbial genomes and metagenomes

License

Notifications You must be signed in to change notification settings

sjanssen2/micronota

 
 

Repository files navigation

micronota

https://coveralls.io/repos/biocore/micronota/badge.svg?branch=master&service=github https://travis-ci.org/biocore/micronota.svg?branch=master https://badges.gitter.im/Join%20Chat.svg

micronota is an open-source, BSD-licensed package to annotate microbial genomes and metagenomes.

As Python 3 matures and majority python packages support Python 3, the scientific Python community is in favor of dropping Python 2 compatibility. Thus, micronota will only support Python 3. This will allow micronota to have few dependency and avoid maintenance of Python 2 legacy code.

Introduction

micronota can annotate multiple features including coding genes, prophage, CRISPR, tRNA, rRNA and other ncRNAs. It has a customizable framework to integrate additional tools and databases. Generally, the annotation can be classified into 2 categories: structural annotation and functional annotation. Structural annotation is the identification of the genetic elements on the sequence and functional annotation is to assign functions to those elements.

Install

To install the latest release of micronota:

conda install micronota

Or you can install through pip:

pip install micronota

To install the latest developping version:

pip install git+git://github.com/biocore/micronota.git

Prepare Databases

To prepare (download and format) the files of TIGRFAM to the right form read by micronota:

micronota database prepare tigrfam --cache_dir ~/database

Print Configure Info

To check the micronota setup, you can run:

micronota info

It will print out the system info, databases available, external tools, and other configuration info.

Config Files

There are 3 types of config files user can set for workflow, logging, and parameters, respectively. All of the 3 config files can be specified on the command line (--cfg, --log, and --param) to override the default settings. Besides, you can also put the frequently used settings into global config files. The global config files should be put in the config directory of micronota. On linux it is usually ~/.config/micronota; on Mac, it is usually ~/Library/Application Support/micronota. You can also find the directory by running the following command

python -c "import click; print(click.get_app_dir('micronota'))"

If the directory does not exist, just simply create it with mkdir command.

It is always good to confirm your settings by printing out the setup:

micronota info
# OR if you provide it on command line:
micronota --cfg misc.cfg --param param.cfg info

workflow config

This is how default workflow config looks like. You can copy and modify that to create your own config file either put in the config dir (with file name of misc.cfg) or provided in command with --cfg. micronota will only read one workflow config file, which is the first one it finds in the order of command line > global > default.

Here is an example modified that you can use on command line or global config dir:

[general]
# use another dir as database dir
db_dir = better/db

[feature]
# run prodigal first
prodigal = 1
# don't run infernal
infernal = 0

# next to annotate CDS
[cds]
# run diamond tegother with uniref50
diamond = uniref50

The format of the config file is widely used in different OS platforms and described here. 0 / 1, no / yes , false / true, on / off can all be used to turn off or on each tool. If the tool need a database file to run with, specify the database instead of the indicator.

Logging config

This is how default logging config looks like. It is used to config logging utilitiy to print out useful info. You can change logging config similarly as you do for workflow config. The global file should be named as log.cfg in the config dir if you plan to define global logging config.

For example, if you want to reduce the verbosity of logging, you can change level to ERROR in your global logging config file:

[loggers]
keys=root

[handlers]
keys=consoleHandler

[formatters]
keys=simpleFormatter

[logger_root]
level=ERROR
handlers=consoleHandler

[handler_consoleHandler]
class=StreamHandler
formatter=simpleFormatter
args=(sys.stdout,)

[formatter_simpleFormatter]
format=%(asctime)s %(name)s %(levelname)s %(message)s
datefmt=%Y-%m-%d %H:%M:%S

Parameter config

The parameter config is used to tune the parameters of each external tools. This is how the default parameter config looks like. You can specify the parameter for each individual tools. For example, if you want to run Prodigal with genetic translation table 1, instead of the default translation table, you can create a file param.cfg:

[prodigal]
# set translation table to 1
-t = 1

Different from the other 2 config files, all the param config files will be read by micronota in the order of default, global and command line param config, with the following one overriding its previous.

Sequence Features to Identify

FeaturesSupportedTools
coding geneyesProdigal
tRNAongoingAragorn
ncRNAyesInfernal
CRISPRongoingMinCED
ribosomal binding sitesongoingRBSFinder
prophageongoingPHAST
replication origintodoOri-Finder 1 (bacteria) & Ori-Finder 2 (archaea)
microsatellitestodonhmmer?
signal peptideongoingSignalP
transmembrane proteinsongoingTMHMM

Databases Supported

DatabasesSupported
TIGRFAMyes
UniRefyes
Rfamongoing

Getting help

To get help with micronota, you should use the micronota tag on Biostars. The developers regularly monitor the micronota tag on Biostars.

Developing

If you’re interested in getting involved in micronota development, see CONTRIBUTING.md.

See the list of micronota’s contributors.

Licensing

micronota is available under the new BSD license. See COPYING.txt for micronota’s license, and the licenses directory for the licenses of third-party software and databasese that are (either partially or entirely) distributed with micronota.

About

annotation pipeline for microbial genomes and metagenomes

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 99.2%
  • Other 0.8%