This repository has been archived by the owner on Jun 21, 2023. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 67
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
PR 1 of n: Molecular Subtyping - HGG (Defining Lesions) (#352)
* Add `01-HGG-molecular-subtyping-data-prep.Rmd` - add this analysis to `.circleci` * Fix command in `.circleci` * Minor `lintr` format changes * Log2 transform expression data - rerun notebook * Use `controlfreec` cn data - rerun notebook * Create a column better distinguishing specific HGG mutations - rerun notebook * Change `01` nb to look only at HGG defining lesions - remove `results/HGG_molecular_subtypes.tsv` - new output file `results/HGG_defining_lesions.tsv` contains binary columns for all samples distinguishing whether or not they contain any of the four HGG defining lesions - rename `01` nb to better represent its purpose/content - rename object `tmb_df` to `snv_df` * Edit analysis in `.circleci` to reflect nb name change * Remove unused lines of code * Update code to reflect V12 change * Address @jharenza comments * Add to modules at a glance table Co-authored-by: Jaclyn Taroni <jaclyn.n.taroni@gmail.com>
- Loading branch information
1 parent
4a3bb7e
commit d26866f
Showing
5 changed files
with
4,423 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
141 changes: 141 additions & 0 deletions
141
analyses/molecular-subtyping-HGG/01-HGG-molecular-subtyping-defining-lesions.Rmd
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,141 @@ | ||
--- | ||
title: "High-Grade Glioma Molecular Subtyping - Defining Lesions" | ||
output: | ||
html_notebook: | ||
toc: TRUE | ||
toc_float: TRUE | ||
author: Chante Bethell for ALSF CCDL | ||
date: 2019 | ||
--- | ||
|
||
This notebook looks at the defining lesions for all samples for the issue of | ||
molecular subtyping high-grade glioma samples in the OpenPBTA dataset. | ||
|
||
# Usage | ||
|
||
This notebook is intended to be run via the command line from the top directory | ||
of the repository as follows: | ||
|
||
`Rscript -e "rmarkdown::render('analyses/molecular-subtyping-HGG/01-HGG-molecular-subtyping-defining-lesions.Rmd', clean = TRUE)"` | ||
|
||
# Set Up | ||
|
||
```{r} | ||
# Get `magrittr` pipe | ||
`%>%` <- dplyr::`%>%` | ||
``` | ||
|
||
## Directories and Files | ||
|
||
```{r} | ||
# Detect the ".git" folder -- this will in the project root directory. | ||
# Use this as the root directory to ensure proper sourcing of functions no | ||
# matter where this is called from | ||
root_dir <- rprojroot::find_root(rprojroot::has_dir(".git")) | ||
# File path to results directory | ||
results_dir <- | ||
file.path(root_dir, "analyses", "molecular-subtyping-HGG", "results") | ||
if (!dir.exists(results_dir)) { | ||
dir.create(results_dir) | ||
} | ||
# Read in metadata | ||
metadata <- | ||
readr::read_tsv(file.path(root_dir, "data", "pbta-histologies.tsv")) | ||
# Select wanted columns in metadata for merging and assign to a new object | ||
select_metadata <- metadata %>% | ||
dplyr::select(Kids_First_Participant_ID, | ||
sample_id, | ||
Kids_First_Biospecimen_ID, | ||
disease_type_new) | ||
# Read in snv consensus mutation data | ||
snv_df <- | ||
data.table::fread(file.path(root_dir, | ||
"data", | ||
"pbta-snv-consensus-mutation.maf.tsv.gz")) | ||
``` | ||
|
||
# Prepare Data | ||
|
||
## SNV consensus mutation data - defining lesions | ||
|
||
```{r} | ||
# Filter the snv consensus mutatation data for the target lesions | ||
snv_lesions_df <- snv_df %>% | ||
dplyr::select(Tumor_Sample_Barcode, Hugo_Symbol, HGVSp_Short) %>% | ||
dplyr::mutate( | ||
H3F3A.K28M = dplyr::case_when(Hugo_Symbol == "H3F3A" & | ||
HGVSp_Short == "p.K28M" ~ "Yes", | ||
TRUE ~ "No"), | ||
HIST1H3B.K28M = dplyr::case_when( | ||
Hugo_Symbol == "HIST1H3B" & HGVSp_Short == "p.K28M" ~ "Yes", | ||
TRUE ~ "No" | ||
), | ||
H3F3A.G35R = dplyr::case_when(Hugo_Symbol == "H3F3A" & | ||
HGVSp_Short == "p.G35R" ~ "Yes", | ||
TRUE ~ "No"), | ||
H3F3A.G35V = dplyr::case_when(Hugo_Symbol == "H3F3A" & | ||
HGVSp_Short == "p.G35V" ~ "Yes", | ||
TRUE ~ "No") | ||
) %>% | ||
dplyr::select( | ||
-HGVSp_Short, | ||
-Hugo_Symbol | ||
) | ||
# Join the selected variables from the metadata with the snv consensus mutation | ||
# and defining lesions data.frame | ||
snv_lesions_df <- select_metadata %>% | ||
dplyr::right_join(snv_lesions_df, | ||
by = c("Kids_First_Biospecimen_ID" = "Tumor_Sample_Barcode")) %>% | ||
dplyr::select( | ||
-disease_type_new, | ||
dplyr::everything() | ||
) %>% | ||
dplyr::distinct() %>% | ||
dplyr::mutate( | ||
disease_type_reclassified = dplyr::case_when( | ||
H3F3A.K28M == "Yes" ~ "High-grade glioma, H3 K28 mutant", | ||
HIST1H3B.K28M == "Yes" ~ "High-grade glioma, H3 K28 mutant", | ||
H3F3A.G35R == "Yes" ~ "High-grade glioma, H3 G35 mutant", | ||
H3F3A.G35V == "Yes" ~ "High-grade glioma, H3 G35 mutant", | ||
TRUE ~ as.character(disease_type_new) | ||
) | ||
) | ||
# Display `snv_lesions_df` | ||
snv_lesions_df | ||
``` | ||
|
||
## Save final table of results | ||
|
||
```{r} | ||
# Save final data.frame to file | ||
readr::write_tsv(snv_lesions_df, | ||
file.path(results_dir, "HGG_defining_lesions.tsv")) | ||
``` | ||
|
||
## Inconsistencies in disease classification | ||
|
||
```{r} | ||
# Isolate the samples with the specified mutations that were not classified | ||
# as HGG or DIPG | ||
snv_lesions_df %>% | ||
dplyr::filter( | ||
grepl("High-grade glioma", disease_type_reclassified) & | ||
!(disease_type_new %in% c("High-grade glioma", | ||
"Brainstem glioma- Diffuse intrinsic pontine glioma")) | ||
) | ||
``` | ||
|
||
# Session Info | ||
|
||
```{r} | ||
# Print the session information | ||
sessionInfo() | ||
``` | ||
|
3,204 changes: 3,204 additions & 0 deletions
3,204
analyses/molecular-subtyping-HGG/01-HGG-molecular-subtyping-defining-lesions.nb.html
Large diffs are not rendered by default.
Oops, something went wrong.
Oops, something went wrong.