The dataset contains gene expression, protein expression and patient survival times across 4 cancers (kidney renal clear cell carcinoma, ovarian serous cystadenocarcinoma, skin cutaneous melanoma, and head and neck squamous cell carcinoma). The genes and proteins are key members of 12 core pathways in those 4 cancers.
We retrieve the genomic, proteomic and clinical data from TCGA (https://tcga-data.nci.nih.gov/docs/publications/tcga/?) using TCGA-Assembler (Zhu et al., 2014).
The TCGA data portal has been moved to https://gdc.cancer.gov/access-data. TCGA-Assembler (http://www.compgenome.org/TCGA-Assembler/) or other similar software can be used to download the data.
The code implements the MCMC algorithm for the BEHAVIOR model described in Section 2. It draws posterior samples from the model and predicts survival times for a test dataset.
The compiled MATLAB executable is submitted in a zip file with the manuscript and can be run on any platform, with or without a MATLAB license.
The free MATLAB Runtime (v9.1) can be downloaded and installed from http://www.mathworks.com/products/compiler/mcr/.
Table 1 and Figures 2-6
The main function is BEHAVIOR, which takes 6 inputs and returns 10 outputs. These are what is needed to reproduce our results in the simulations and case studies.
parameter.csv: in the format of [N,s], where N is the number of MCMC iterations and s is the seed used by the random number generator.
y.csv: n by 2 survival variable, with the 1st column being survival/censoring times and the 2nd column being the censoring indicator (1=death, 0=censored)
P.csv: n by p proteins
G.csv: n by p genes
Pt.csv: n by p proteins for test data
Gt.csv: n by p genes for test data
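As an illustration of the input layout described above, the following Python sketch writes the six files with placeholder values of the stated shapes. It rests on several assumptions that the description does not confirm: that the files are comma-separated, contain only numbers, and have no header row. The helper names `write_matrix` and `make_inputs` are hypothetical, and the random values are placeholders, not TCGA data.

```python
import csv
import random
from pathlib import Path


def write_matrix(path, rows):
    """Write a list of numeric rows as comma-separated values (assumed: no header)."""
    with open(path, "w", newline="") as fh:
        csv.writer(fh).writerows(rows)


def make_inputs(out_dir, n_train=5, n_test=3, p=4, n_iter=1000, seed=1):
    """Create the six input files with placeholder values of the described shapes."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    rng = random.Random(seed)

    def gaussian(n_rows):
        # One row per patient, one column per gene or protein.
        return [[round(rng.gauss(0.0, 1.0), 4) for _ in range(p)]
                for _ in range(n_rows)]

    # parameter.csv: [N, s] = number of MCMC iterations and RNG seed.
    write_matrix(out / "parameter.csv", [[n_iter, seed]])
    # y.csv: column 1 = survival/censoring time, column 2 = 1 (death) or 0 (censored).
    y = [[round(rng.expovariate(1.0), 4), rng.randint(0, 1)]
         for _ in range(n_train)]
    write_matrix(out / "y.csv", y)
    # P.csv and G.csv: proteins and genes for the training patients.
    write_matrix(out / "P.csv", gaussian(n_train))
    write_matrix(out / "G.csv", gaussian(n_train))
    # Pt.csv and Gt.csv: the same layout for the test patients.
    write_matrix(out / "Pt.csv", gaussian(n_test))
    write_matrix(out / "Gt.csv", gaussian(n_test))
    return out
```

Calling `make_inputs("inputs")` would create a folder of correctly shaped files that can be replaced by real expression and survival data before running the executable.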
reg_coef: protein effect
reg_rate: inclusion probability of protein
lambda: threshold
lin_coef: linear gene effect
nlin_coef: nonlinear gene effect
const_coef: constant gene effect
lin_rate: inclusion probability of linear gene effect
nlin_rate: inclusion probability of nonlinear gene effect
const_rate: inclusion probability of constant gene effect
ypred: predictive values of y for test data Pt and Gt.
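One common way to read inclusion probabilities such as reg_rate or lin_rate is to keep the variables whose probability reaches a cutoff. The sketch below assumes the rates are available as a flat list of numbers and uses a cutoff of 0.5, which is a widespread convention and not necessarily the rule used in the paper. The function name `select_by_inclusion` and the example values are hypothetical.

```python
def select_by_inclusion(rates, cutoff=0.5):
    """Return the indices whose inclusion probability is at least `cutoff`."""
    return [i for i, r in enumerate(rates) if r >= cutoff]


# Hypothetical values for illustration only; not results from the paper.
reg_rate = [0.92, 0.10, 0.55, 0.03]
print(select_by_inclusion(reg_rate))  # [0, 2]
```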