- The dataset I have selected for this assignment is from the cBioPortal for Cancer Genomics website.
- The full dataset is distributed as a gzip-compressed tar archive, which can be downloaded from the study page at https://www.cbioportal.org/study/summary?id=mel_mskimpact_2020.
- I will focus on three files (data_clinical_patient.txt, data_clinical_sample.txt, and data_mutations.txt), primarily the clinical patient file. These are tab-delimited plain-text files that become available once the archive is extracted.
- The features are a mix of categorical (factor) and numeric (discrete and continuous) variables; some types are inferred by the R code.
- The data come from targeted sequencing (MSK-IMPACT) of 696 melanoma tumour/normal pairs.
- A corresponding journal article has been published, entitled “Therapeutic Implications of Detecting MAPK-Activating Alterations in Cutaneous and Unknown Primary Melanomas”.
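
As a rough illustration of how the tab-delimited clinical files could be loaded, here is a minimal sketch in base R. It assumes the files follow the usual cBioPortal layout, in which metadata rows at the top are prefixed with `#` and the first unprefixed row holds the column names. The helper `read_clinical()`, the column names, and the two sample rows are made up for the demonstration and are not taken from the actual study files.

```r
# Hypothetical helper (not part of the project code): read a cBioPortal-style
# clinical file, i.e. tab-delimited text whose metadata rows start with "#".
read_clinical <- function(path) {
  read.delim(
    path,
    comment.char = "#",          # skip the "#"-prefixed metadata rows
    na.strings = c("", "NA"),    # treat empty fields as missing
    stringsAsFactors = FALSE     # convert to factors explicitly, per column
  )
}

# Made-up stand-in for data_clinical_patient.txt so the sketch runs anywhere.
demo_lines <- c(
  "#Patient Identifier\tSex\tAge",
  "#STRING\tSTRING\tNUMBER",
  "PATIENT_ID\tSEX\tAGE",
  "P-0000001\tFemale\t61",
  "P-0000002\tMale\t47"
)
demo_path <- tempfile(fileext = ".txt")
writeLines(demo_lines, demo_path)

patients <- read_clinical(demo_path)

# Numeric columns are inferred by read.delim(); categorical columns are
# converted to factors by hand.
patients$SEX <- factor(patients$SEX)
str(patients)
```

With the real archive extracted, the same helper would be called on the path to data_clinical_patient.txt. One caveat of this approach: `comment.char = "#"` also truncates any data field that happens to contain a `#`, so it is worth checking the row and column counts after loading.
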
Run all code chunks in final-project-report.qmd from the RStudio project, then render to HTML or PDF.
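
The render step can also be run from the R console instead of the RStudio Render button. The sketch below assumes the `quarto` R package and the Quarto CLI are installed and that the working directory is the project root. It is guarded so that it does nothing harmful if either assumption fails.

```r
# Render the report from the R console (sketch; assumes the quarto R package
# and the Quarto CLI are available, and that the .qmd is in the working dir).
qmd <- "final-project-report.qmd"

if (file.exists(qmd) && requireNamespace("quarto", quietly = TRUE)) {
  # Use output_format = "pdf" for PDF output (needs a LaTeX installation).
  quarto::quarto_render(qmd, output_format = "html")
} else {
  message("Skipping render: report file or the 'quarto' package was not found.")
}
```
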
- See the full skin cancer dataset analysis in HTML report format: Full Report