CWL wrapper around Mutex.
Make sure you have Git, Docker, and cwltool installed on your system.
To pull the source code from the repository:
git clone https://github.com/OpenGenomics/mutex-tool
Go into the project directory and use a text editor to generate the JSON file.
cd mutex-tool
cp mutex.template.json mutex.json
vi mutex.json
Replace '</full/path/to/data_matrix.txt>' and '</full/path/to/network_file.txt>' with the complete paths to the input data file and the network file, respectively. Save the file as 'mutex.json'. Optional arguments can be added to this file to override the default settings. These settings include:
max_group_size
The maximum size of a result mutex group. Integer value. Default is 5.
first_level_random_iteration
Number of randomizations used to estimate the null distribution of member p-values in mutex groups. Integer. Default is 10000.
second_level_random_iteration
Number of runs to estimate the null distribution of final scores. Integer. Default is 100. If FDR control on results is not required and only the ranking of the result groups is sufficient, set this parameter to 0.
fdr_cutoff
Users can select a specific FDR cutoff. When not provided, or when set to a negative value, the FDR cutoff that maximizes the expected value of true positives - false positives is used.
score_cutoff
search_on_signaling_network
Whether to reduce the search space using the signaling network. 'true' or 'false'. Default is 'true'.
genes_file
This parameter can be used to limit the search to a subset of genes. The file should contain a gene symbol per line.
gene_ranking_file
sample_to_tissue_mapping_file
gene_limit
minimum_alteration_count_threshold
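The manual edit of mutex.json described above can also be scripted. The following is a minimal sketch, assuming that mutex.template.json contains the two placeholder strings exactly as quoted above; the function name fill_template is hypothetical and not part of this repository.

```python
from pathlib import Path

# Placeholder strings as quoted in the instructions above. It is an
# assumption that they appear verbatim in mutex.template.json.
DATA_PLACEHOLDER = "</full/path/to/data_matrix.txt>"
NETWORK_PLACEHOLDER = "</full/path/to/network_file.txt>"


def fill_template(template_text, data_matrix, network_file):
    """Return template_text with both placeholders replaced by absolute paths."""
    # resolve() makes the path absolute; as_posix() keeps forward slashes
    # so the result remains a valid JSON string on every platform.
    data_path = Path(data_matrix).resolve().as_posix()
    network_path = Path(network_file).resolve().as_posix()
    return (template_text
            .replace(DATA_PLACEHOLDER, data_path)
            .replace(NETWORK_PLACEHOLDER, network_path))
```

For example, read mutex.template.json with Path("mutex.template.json").read_text(), pass the text to fill_template together with the two file names, and write the returned text to mutex.json. Optional settings still need to be added by hand.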
Users should first prepare their dataset of gene alterations as a tab-delimited text file, where the first row contains the column headings and the first column contains gene symbols.
Symbol Sample1 Sample2 Sample3 ...
Gene1 0 1 0 ...
Gene2 2 0 4 ...
Gene3 0 0 0 ...
.
.
Use the following encoding for gene alterations.
0: No alteration
1: Mutation
2: Amplification
3: Deletion
4: Mutation and amplification
5: Mutation and deletion
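Before running the tool, it can help to check that the dataset file follows the layout and the alteration codes 0 through 5 described above. The following is a minimal sketch, not part of this repository; the function name check_alteration_matrix is hypothetical, and it assumes that every row must have as many tab-separated cells as the header row.

```python
# Alteration codes 0 through 5, as described in the encoding above.
VALID_CODES = {"0", "1", "2", "3", "4", "5"}


def check_alteration_matrix(text):
    """Return a list of problems found in a tab-delimited alteration matrix."""
    problems = []
    # Blank lines are ignored, so row numbers count non-blank lines only.
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return ["file is empty"]
    header = lines[0].split("\t")
    n_cols = len(header)
    for row_no, line in enumerate(lines[1:], start=2):
        cells = line.split("\t")
        if len(cells) != n_cols:
            problems.append(
                f"row {row_no}: expected {n_cols} columns, found {len(cells)}")
            continue
        # The first cell is the gene symbol; the rest are alteration codes.
        for sample, value in zip(header[1:], cells[1:]):
            if value.strip() not in VALID_CODES:
                problems.append(
                    f"row {row_no} ({cells[0]}), sample {sample}: "
                    f"invalid code {value!r}")
    return problems
```

An empty list means no problems were found. Pass the file content as a string, for example the result of open("dataset.txt").read().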
To customize the signaling network, make a tab-delimited network file containing 3 columns in the following format:
Gene Symbol 1<tab>interaction-type<tab>Gene Symbol 2
Users should prepare a file named parameters.txt and place it in a directory together with the dataset file. The parameters.txt file should contain a line that points to the dataset (assume the name of the dataset file is dataset.txt) as follows:
data-file = dataset.txt
network-file: To customize the signaling network, users should have a 3-column, tab-delimited text file as described above. The valid values for interaction-type are controls-state-change-of and controls-expression-of. The first type is meant to be used for post-translational modification relations between proteins, and the second is for transcriptional regulation.
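A custom network file can be checked against the 3-column layout and the two valid interaction types before use. The following is a minimal sketch, not part of this repository; the function name check_network is hypothetical.

```python
# The two interaction types named as valid in the description above.
VALID_INTERACTIONS = {"controls-state-change-of", "controls-expression-of"}


def check_network(text):
    """Return a list of problems found in a tab-delimited network file."""
    problems = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 3:
            problems.append(
                f"line {line_no}: expected 3 columns, found {len(fields)}")
        elif fields[1] not in VALID_INTERACTIONS:
            problems.append(
                f"line {line_no}: unknown interaction type {fields[1]!r}")
    return problems
```

An empty list means every non-blank line has three tab-separated fields and a recognized interaction type in the middle column.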
The other possible parameters (below) are optional.
max-group-size: The maximum size of a result mutex group. Integer value. Default is 5.
first-level-random-iteration: Number of randomizations used to estimate the null distribution of member p-values in mutex groups. Integer. Default is 10000.
second-level-random-iteration: Number of runs to estimate the null distribution of final scores. Integer. Default is 0, meaning that estimation is not required.
fdr-cutoff: Users can select a specific FDR cutoff. Only applicable when there is an estimation of the null distribution of scores, i.e., when second-level-random-iteration is greater than 0. When not provided, or when set to a negative value, the FDR cutoff that maximizes the expected value of true positives - false positives is used.
search-on-signaling-network: Whether to reduce the search space using the signaling network. true or false. Default is true. If this is set to true, but no network file is provided using the "network-file" argument, then a default signaling network composed from the Pathway Commons, SPIKE and SignaLink databases is used.
genes-file: This parameter can be used to limit the search to a subset of genes. The file should contain a gene symbol per line.
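The parameters.txt file can also be generated from a script. The following is a minimal sketch, not part of this repository; the function name render_parameters is hypothetical, and it assumes that the optional parameters use the same "name = value" syntax as the data-file line shown above.

```python
def render_parameters(data_file, **options):
    """Build the content of parameters.txt as 'name = value' lines."""
    lines = [f"data-file = {data_file}"]
    for name, value in options.items():
        # Python keyword arguments cannot contain hyphens, so underscores
        # are converted to the hyphenated parameter names.
        key = name.replace("_", "-")
        # Booleans are written as lowercase true/false, as in the list above.
        if isinstance(value, bool):
            value = "true" if value else "false"
        lines.append(f"{key} = {value}")
    return "\n".join(lines) + "\n"
```

For example, render_parameters("dataset.txt", max_group_size=5, search_on_signaling_network=True) returns three lines, starting with the data-file line. Write the returned text to parameters.txt in the same directory as the dataset file.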
To run the tool, execute the following from the project directory:
cwltool mutex.cwl mutex.json
After a run, the result files are output to the working directory.
ranked-groups.txt: Provides a ranked list of result groups, where the first column contains the score of the group. Use a text or spreadsheet editor to view this file.
fdr-guide.txt: Provides the mapping between a score cutoff and the corresponding expected false discovery rate. Use a text or spreadsheet editor to view this file.
result-groups.cus: Provides a graph that shows the result groups, the relations between them, and one of their common targets if a common target is not already in the result group. To visualize this file, open ChiBE, choose "SIF -> Load SIF File", change the file filter from sif to cus in the dialog, and select this file.
merged-network.sif: Provides the minimal network that is produced using the result groups. Group boundaries are not displayed in this graph. Non-member common targets are displayed in a pale color. To visualize this file, open ChiBE, choose "SIF -> Load SIF File", and select this file. Note that ChiBE also uses the file merged-network.format, so if you move the sif file, do not forget to move the format file along with it.
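The ranked list in ranked-groups.txt can also be read programmatically. The following is a minimal sketch, not part of this repository; the function name read_ranked_groups is hypothetical. Only the score in the first column is documented above, so the sketch assumes the file is tab-delimited, skips any line whose first column is not a number (such as a header), and returns the remaining columns unchanged.

```python
def read_ranked_groups(text):
    """Return a list of (score, other_columns) tuples, in file order."""
    groups = []
    for line in text.splitlines():
        fields = line.split("\t")
        try:
            score = float(fields[0])
        except ValueError:
            continue  # header line, blank line, or non-numeric first column
        groups.append((score, fields[1:]))
    return groups
```

Pass the file content as a string, for example the result of open("ranked-groups.txt").read(). The order of the returned list follows the order of the lines in the file.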
For details, please refer to the paper below.
Babur, Özgün, et al. "Systematic identification of cancer driving signaling pathways based on mutual exclusivity of genomic alterations." Genome Biology 16.1 (2015): 45.