diff --git a/README.md b/README.md
index 0d7e0e5..2a9026e 100644
--- a/README.md
+++ b/README.md
@@ -42,35 +42,42 @@ Full CLI options (check out with ``geneplexus --help``)
 ```txt
 Run the GenePlexus pipline on a input gene list.

-optional arguments:
+options:
   -h, --help            show this help message and exit
-  -i , --input_file     Input gene list (.txt) file (one gene per line). (default: None)
+  -i , --input_file     Input gene list (.txt) file. (default: None)
   -d , --gene_list_delimiter
                         Delimiter used in the gene list. Use 'newline' if the genes are separated
                         by new line, and use 'tab' if the genes are seperate by tabs. Other
                         generic separator are also supported, e.g. ', '. (default: newline)
-  -n , --network        Network to use. {format_choices(config.ALL_NETWORKS)} (default: STRING)
-  -f , --feature        Types of feature to use. The choices are: {Adjacency, Embedding,
-                        Influence} (default: Embedding)
-  -g , --gsc            Geneset collection used to generate negatives and the modelsimilarities.
-                        The choices are: {GO, DisGeNet} (default: GO)
-  -s , --small_edgelist_num_nodes
-                        Number of nodes in the small edgelist. (default: 50)
   -dd , --data_dir      Directory in which the data are stored, if set to None, then use the
                         default data directory ~/.data/geneplexus (default: None)
+  -n , --network        Network to use. The choices are: {BioGRID, STRING, IMP} (default: STRING)
+  -f , --feature        Types of feature to use. The choices are: {SixSpeciesN2V} (default:
+                        SixSpeciesN2V)
+  -s1 , --sp_trn        Species of training data The choices are: {Human, Mouse, Fly, Worm,
+                        Zebrafish, Yeast} (default: Human)
+  -s2 , --sp_res        Species of results data The choices are: {Human, Mouse, Fly, Worm,
+                        Zebrafish, Yeast} (default: Mouse)
+  -g1 , --gsc_trn       Geneset collection used to generate negatives. The choices are: {GO,
+                        Monarch, Mondo, Combined} (default: GO)
+  -g2 , --gsc_res       Geneset collection used for model similarities. The choices are: {GO,
+                        Monarch, Mondo, Combined} (default: GO)
+  -s , --small_edgelist_num_nodes
+                        Number of nodes in the small edgelist. (default: 50)
   -od , --output_dir    Output directory with respect to the repo root directory. (default:
                         result/)
   -l , --log_level      Logging level. The choices are: {CRITICAL, ERROR, WARNING, INFO, DEBUG}
                         (default: INFO)
+  -ad, --auto_download_off
+                        Turns off autodownloader which is on by default. (default: False)
   -q, --quiet           Suppress log messages (same as setting log_level to CRITICAL). (default:
                         False)
   -z, --zip-output      If set, then compress the output directory into a Zip file. (default:
                         False)
   --clear-data          Clear data directory and exit. (default: False)
   --overwrite           Overwrite existing result directory if set. (default: False)
-  --skip-mdl-sim        Skip model similarity computation. This computation is not yet available
-                        when using custom networks due to the lack of pretrained models for
-                        comparison. (default: False)
+  --skip-mdl-sim        Skip model similarity computation (default: False)
+  --skip-sm-edgelist    Skip making small edgelist. (default: False)
 ```

 # Dev
diff --git a/docs/source/index.rst b/docs/source/index.rst
index 54ed29c..6b7b4ac 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -17,6 +17,7 @@ PyGenePlexus
    notes/api
    notes/r
    notes/data
+   notes/faqs

 .. toctree::
    :maxdepth: 1
diff --git a/docs/source/notes/data.rst b/docs/source/notes/data.rst
index 2edbcfa..7bde86c 100644
--- a/docs/source/notes/data.rst
+++ b/docs/source/notes/data.rst
@@ -17,6 +17,33 @@ Features :term:`SixSpeciesN2V`
 GSCs     [GO]_, [Monarch]_, [Mondo]_, Combined
 ======== =======================================================

+**Detailed species info:**
+
+.. list-table::
+   :widths: 10 10 10
+
+   * -
+     - Specific Name
+     - Taxon Id
+   * - Human
+     - Homo sapiens
+     - 9606
+   * - Mouse
+     - Mus musculus
+     - 10090
+   * - Fly
+     - Drosophila melanogaster
+     - 7227
+   * - Zebrafish
+     - Danio rerio
+     - 7955
+   * - Worm
+     - Caenorhabditis elegans
+     - 6239
+   * - Yeast
+     - Saccharomyces cerevisiae
+     - 4932
+
 Due to the availability of the data, the following combinations are supported:

 .. list-table:: Available Network Options
diff --git a/docs/source/notes/faqs.rst b/docs/source/notes/faqs.rst
new file mode 100644
index 0000000..2298c5f
--- /dev/null
+++ b/docs/source/notes/faqs.rst
@@ -0,0 +1,22 @@
+PyGenePlexus FAQs
+=====================
+
+Frequently Asked Questions
+--------------------------
+
+**How are positive and negative genes determined?**
+
+In the supervised machine learning model, any gene from the user-supplied
+gene list that is able to be converted to an Entrez ID and is also in the network is
+considered part of the positive class.
+
+Genes in the negative class are selected based on the chosen Geneset Context. The default Geneset
+Context is Combined, which uses all available geneset collections.
+
+GenePlexus then automatically selects the genes in the negative class by:
+
+#. Considering the total pool of possible negative genes to be any gene that has an annotation to at least one of the terms in the selected geneset collection.
+#. Retaining all terms in the selected geneset collection that have between 10 and 200 genes annotated to them.
+#. Removing genes that are in the positive class.
+#. Performing a hypergeometric test between the genes in the positive class and the lists of genes annotated to every term in the selected geneset collection. If the value of this hypergeometric test is less than 0.05, all genes from the given term are also removed from the pool of possible negative genes.
+#. Declaring all the remaining genes in the pool of possible negative genes as the negative class.
\ No newline at end of file
diff --git a/geneplexus/__init__.py b/geneplexus/__init__.py
index 92cd744..1b3dcac 100644
--- a/geneplexus/__init__.py
+++ b/geneplexus/__init__.py
@@ -2,10 +2,10 @@

 .. currentmodule:: geneplexus.GenePlexus

-PyGenePlexus enables researchers to predict novel genes similar to their
-genes of interest based on their patterns of connectivity in genome-scale
-molecular interaction networks, and addtionaly translate these findings
-across species.
+PyGenePlexus enables researchers to predict genes similar to an uploaded
+geneset of interest based on patterns of connectivity in genome-scale
+molecular interaction networks, with the ability to translate these
+findings across species.

 .. figure:: ../figures/mainfigure.png
    :scale: 20 %
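Note on the negative-class selection steps added in `docs/source/notes/faqs.rst`: the five steps can be illustrated with a minimal Python sketch. This is not the PyGenePlexus implementation. The names `hypergeom_sf` and `select_negatives` are hypothetical, the "value" of the hypergeometric test is assumed to be a one-sided enrichment p-value, and the background population is assumed to be every gene annotated in the collection. The FAQ states neither of these, and it does not mention any multiple-testing correction, so none is applied here.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for a hypergeometric draw (one-sided enrichment p-value)."""
    upper = min(successes, draws)
    if k > upper:
        return 0.0
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(max(k, 0), upper + 1)
    )
    return tail / comb(population, draws)


def select_negatives(positives, gsc, min_size=10, max_size=200, alpha=0.05):
    """Sketch of the FAQ steps; `gsc` maps term -> iterable of gene IDs."""
    positives = set(positives)
    terms = {term: set(genes) for term, genes in gsc.items()}

    # Step 1: the pool is every gene annotated to at least one term.
    pool = set().union(*terms.values())
    # Assumption: the same annotated genes form the test background.
    background = set(pool)

    # Step 2: retain terms with between min_size and max_size genes.
    kept = {t: g for t, g in terms.items() if min_size <= len(g) <= max_size}

    # Step 3: remove genes that are in the positive class.
    pool -= positives

    # Step 4: drop every gene of a term that is enriched for positives.
    pos_bg = positives & background
    for genes in kept.values():
        overlap = len(genes & pos_bg)
        p_value = hypergeom_sf(overlap, len(background), len(genes), len(pos_bg))
        if p_value < alpha:
            pool -= genes

    # Step 5: whatever remains is the negative class.
    return pool
```

In this sketch, a gene that shares a term with many positives is dropped from the negatives even though it is not itself a positive, which is the behaviour step 4 describes.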