This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

PR 1 of n: Molecular Subtyping - HGG (Defining Lesions) (#352)
* Add `01-HGG-molecular-subtyping-data-prep.Rmd`

- add this analysis to `.circleci`

* Fix command in `.circleci`

* Minor `lintr` format changes

* Log2 transform expression data 

- rerun notebook

* Use `controlfreec` cn data

- rerun notebook

* Create a column better distinguishing specific HGG mutations

- rerun notebook

* Change `01` nb to look only at HGG defining lesions

- remove `results/HGG_molecular_subtypes.tsv`
- new output file `results/HGG_defining_lesions.tsv` contains binary columns for all samples distinguishing whether or not they contain any of the four HGG defining lesions
- rename `01` nb to better represent its purpose/content
- rename object `tmb_df` to `snv_df`

* Edit analysis in `.circleci` to reflect nb name change

* Remove unused lines of code

* Update code to reflect V12 change

* Address @jharenza comments

* Add to modules at a glance table

Co-authored-by: Jaclyn Taroni <jaclyn.n.taroni@gmail.com>
cbethell and jaclyn-taroni committed Jan 4, 2020
1 parent 4a3bb7e commit d26866f
Showing 5 changed files with 4,423 additions and 0 deletions.
4 changes: 4 additions & 0 deletions .circleci/config.yml
@@ -99,6 +99,10 @@ jobs:
- run:
    name: Process SV file
    command: ./scripts/run_in_ci.sh Rscript analyses/sv-analysis/01-process-sv-file.R

- run:
    name: Molecular Subtyping - HGG
    command: ./scripts/run_in_ci.sh Rscript -e "rmarkdown::render('analyses/molecular-subtyping-HGG/01-HGG-molecular-subtyping-defining-lesions.Rmd', clean = TRUE)"

- run:
    name: Oncoprint plotting
1 change: 1 addition & 0 deletions analyses/README.md
@@ -23,6 +23,7 @@ Note that _nearly all_ modules use the harmonized clinical data file (`pbta-hist
| [`independent-samples`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/independent-samples) | `pbta-histologies.tsv` | Generates independent specimen lists for WGS/WXS samples | `independent-specimens.wgs.primary.tsv`, `independent-specimens.wgs.primary-plus.tsv`, `independent-specimens.wgswxs.primary.tsv`, `independent-specimens.wgswxs.primary-plus.tsv` (included in data download)
| [`interaction-plots`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/interaction-plots) | `independent-specimens.wgs.primary-plus.tsv`, `pbta-snv-consensus-mutation.maf.tsv.gz` | Creates interaction plots for mutation mutual exclusivity/co-occurrence [#13](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/13); may be updated to include other data types (e.g., fusions) | N/A
| [`molecular-subtyping-ATRT`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/molecular-subtyping-ATRT) | `analyses/gene-set-enrichment-analysis/results/gsva_scores_stranded.tsv`, `pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds`, `analyses/focal-cn-file-preparation/results/controlfreec_annotated_cn_autosomes.tsv.gz`, `pbta-snv-consensus-mutation-tmb.tsv` | *In progress*; summarizing data into tabular format in order to molecularly subtype ATRT samples [#244](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/244) | N/A
| [`molecular-subtyping-HGG`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/molecular-subtyping-HGG) | `pbta-snv-consensus-mutation.maf.tsv.gz` | *In progress*; molecular subtyping of high-grade glioma samples [#249](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/249) | N/A
| [`molecular-subtyping-SHH-tp53`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/molecular-subtyping-SHH-tp53) | `pbta-histologies` and `pbta-snv-consensus-mutation.maf.tsv.gz` | Identify the SHH-classified medulloblastoma samples that have TP53 mutations | N/A
| [`mutational-signatures`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/mutational-signatures) | `pbta-snv-consensus-mutation.maf.tsv.gz` | Performs COSMIC and Alexandrov et al. mutational signature analysis using the consensus SNV data | N/A
| [`mutect2-vs-strelka2`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/mutect2-vs-strelka2) | `pbta-snv-mutect2.vep.maf.gz`, `pbta-snv-strelka2.vep.maf.gz` | *Deprecated*; comparison of only two SNV callers, subsumed by `snv-callers` | N/A
141 changes: 141 additions & 0 deletions analyses/molecular-subtyping-HGG/01-HGG-molecular-subtyping-defining-lesions.Rmd
@@ -0,0 +1,141 @@
---
title: "High-Grade Glioma Molecular Subtyping - Defining Lesions"
output:
  html_notebook:
    toc: TRUE
    toc_float: TRUE
author: Chante Bethell for ALSF CCDL
date: 2019
---

This notebook flags the histone mutations (H3 K28M and H3 G35R/V) that define
molecular subtypes of high-grade glioma (HGG) across all samples in the
OpenPBTA dataset.

# Usage

This notebook is intended to be run via the command line from the top directory
of the repository as follows:

`Rscript -e "rmarkdown::render('analyses/molecular-subtyping-HGG/01-HGG-molecular-subtyping-defining-lesions.Rmd', clean = TRUE)"`

# Set Up

```{r}
# Get `magrittr` pipe
`%>%` <- dplyr::`%>%`
```

## Directories and Files

```{r}
# Detect the ".git" folder -- this will be in the project root directory.
# Use this as the root directory to ensure proper sourcing of functions no
# matter where this is called from
root_dir <- rprojroot::find_root(rprojroot::has_dir(".git"))

# File path to results directory
results_dir <-
  file.path(root_dir, "analyses", "molecular-subtyping-HGG", "results")

if (!dir.exists(results_dir)) {
  dir.create(results_dir)
}

# Read in metadata
metadata <-
  readr::read_tsv(file.path(root_dir, "data", "pbta-histologies.tsv"))

# Select wanted columns in metadata for merging and assign to a new object
select_metadata <- metadata %>%
  dplyr::select(Kids_First_Participant_ID,
                sample_id,
                Kids_First_Biospecimen_ID,
                disease_type_new)

# Read in snv consensus mutation data
snv_df <-
  data.table::fread(file.path(root_dir,
                              "data",
                              "pbta-snv-consensus-mutation.maf.tsv.gz"))
```
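
The chunks that follow use only three columns of the consensus MAF (`Tumor_Sample_Barcode`, `Hugo_Symbol`, `HGVSp_Short`). As a minimal sketch, assuming a small made-up TSV in place of the real file, the `select` argument of `data.table::fread()` could restrict the read to those columns. The temporary file, the sample IDs, and the `toy_` object names are hypothetical and exist only for illustration.

```r
# Toy stand-in for the consensus MAF (hypothetical values, not OpenPBTA data)
toy_maf <- data.frame(
  Tumor_Sample_Barcode = c("BS_A", "BS_B"),
  Hugo_Symbol = c("H3F3A", "TP53"),
  HGVSp_Short = c("p.K28M", "p.R175H"),
  Variant_Classification = c("Missense_Mutation", "Missense_Mutation"),
  stringsAsFactors = FALSE
)

# Write the toy table to a temporary tab-separated file
toy_file <- tempfile(fileext = ".tsv")
data.table::fwrite(toy_maf, toy_file, sep = "\t")

# Read back only the columns the defining-lesions step needs
toy_snv_df <- data.table::fread(
  toy_file,
  select = c("Tumor_Sample_Barcode", "Hugo_Symbol", "HGVSp_Short")
)
```

Whether this is worthwhile for the real file depends on its size and on whether later steps need additional columns.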

# Prepare Data

## SNV consensus mutation data - defining lesions

```{r}
# Flag the target lesions in the snv consensus mutation data
snv_lesions_df <- snv_df %>%
  dplyr::select(Tumor_Sample_Barcode, Hugo_Symbol, HGVSp_Short) %>%
  dplyr::mutate(
    H3F3A.K28M = dplyr::case_when(
      Hugo_Symbol == "H3F3A" & HGVSp_Short == "p.K28M" ~ "Yes",
      TRUE ~ "No"
    ),
    HIST1H3B.K28M = dplyr::case_when(
      Hugo_Symbol == "HIST1H3B" & HGVSp_Short == "p.K28M" ~ "Yes",
      TRUE ~ "No"
    ),
    H3F3A.G35R = dplyr::case_when(
      Hugo_Symbol == "H3F3A" & HGVSp_Short == "p.G35R" ~ "Yes",
      TRUE ~ "No"
    ),
    H3F3A.G35V = dplyr::case_when(
      Hugo_Symbol == "H3F3A" & HGVSp_Short == "p.G35V" ~ "Yes",
      TRUE ~ "No"
    )
  ) %>%
  dplyr::select(
    -HGVSp_Short,
    -Hugo_Symbol
  )

# Join the selected variables from the metadata with the snv consensus mutation
# and defining lesions data.frame
snv_lesions_df <- select_metadata %>%
  dplyr::right_join(snv_lesions_df,
                    by = c("Kids_First_Biospecimen_ID" = "Tumor_Sample_Barcode")) %>%
  # Move `disease_type_new` to the last column
  dplyr::select(
    -disease_type_new,
    dplyr::everything()
  ) %>%
  dplyr::distinct() %>%
  # Relabel rows carrying a defining lesion; otherwise keep the original
  # disease type
  dplyr::mutate(
    disease_type_reclassified = dplyr::case_when(
      H3F3A.K28M == "Yes" ~ "High-grade glioma, H3 K28 mutant",
      HIST1H3B.K28M == "Yes" ~ "High-grade glioma, H3 K28 mutant",
      H3F3A.G35R == "Yes" ~ "High-grade glioma, H3 G35 mutant",
      H3F3A.G35V == "Yes" ~ "High-grade glioma, H3 G35 mutant",
      TRUE ~ as.character(disease_type_new)
    )
  )

# Display `snv_lesions_df`
snv_lesions_df
```
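
Because a MAF file has one row per mutation, the flags above are computed per mutation, so a sample with a defining lesion and other mutations can still appear in more than one row after `dplyr::distinct()`. If one row per sample is wanted, the flags could be collapsed with a grouped summary. The following is a minimal sketch on toy data, not the notebook's method: the sample IDs, mutation calls, and `toy_` object names are invented, and only two of the four lesion columns are shown.

```r
# Get `magrittr` pipe
`%>%` <- dplyr::`%>%`

# Toy per-mutation table (hypothetical sample IDs and calls)
toy_snv_df <- data.frame(
  Tumor_Sample_Barcode = c("BS_A", "BS_A", "BS_B", "BS_C"),
  Hugo_Symbol = c("H3F3A", "TP53", "HIST1H3B", "TP53"),
  HGVSp_Short = c("p.K28M", "p.R175H", "p.K28M", "p.R248Q"),
  stringsAsFactors = FALSE
)

# One row per sample: a lesion is "Yes" if any of the sample's mutations match
toy_lesions_df <- toy_snv_df %>%
  dplyr::group_by(Tumor_Sample_Barcode) %>%
  dplyr::summarize(
    H3F3A.K28M = dplyr::if_else(
      any(Hugo_Symbol == "H3F3A" & HGVSp_Short == "p.K28M"), "Yes", "No"
    ),
    HIST1H3B.K28M = dplyr::if_else(
      any(Hugo_Symbol == "HIST1H3B" & HGVSp_Short == "p.K28M"), "Yes", "No"
    )
  ) %>%
  dplyr::ungroup()

toy_lesions_df
```

In this toy table, `BS_A` keeps a single row even though it has two mutations, and the row records that one of them is H3F3A K28M.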

## Save final table of results

```{r}
# Save final data.frame to file
readr::write_tsv(snv_lesions_df,
                 file.path(results_dir, "HGG_defining_lesions.tsv"))
```

## Inconsistencies in disease classification

```{r}
# Isolate the samples with the specified mutations that were not classified
# as HGG or DIPG
snv_lesions_df %>%
  dplyr::filter(
    grepl("High-grade glioma", disease_type_reclassified) &
      !(disease_type_new %in% c("High-grade glioma",
                                "Brainstem glioma- Diffuse intrinsic pontine glioma"))
  )
```
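
To see at a glance which original disease types are affected, the filtered rows could also be tallied. This is a minimal sketch on toy data, assuming the same column names as above; the sample IDs, disease labels in the toy table, and the `toy_` and `hgg_labels` names are made up for illustration.

```r
# Get `magrittr` pipe
`%>%` <- dplyr::`%>%`

# Toy reclassification table (hypothetical samples)
toy_reclassified_df <- data.frame(
  Kids_First_Biospecimen_ID = c("BS_A", "BS_B", "BS_C"),
  disease_type_new = c("High-grade glioma", "Ependymoma", "Medulloblastoma"),
  disease_type_reclassified = c("High-grade glioma, H3 K28 mutant",
                                "High-grade glioma, H3 G35 mutant",
                                "Medulloblastoma"),
  stringsAsFactors = FALSE
)

# Original labels that already count as HGG or DIPG
hgg_labels <- c("High-grade glioma",
                "Brainstem glioma- Diffuse intrinsic pontine glioma")

# Count reclassified rows per original disease type
toy_inconsistent_df <- toy_reclassified_df %>%
  dplyr::filter(
    grepl("High-grade glioma", disease_type_reclassified),
    !(disease_type_new %in% hgg_labels)
  ) %>%
  dplyr::count(disease_type_new, disease_type_reclassified)

toy_inconsistent_df
```

Here only `BS_B` is counted: it carries a defining lesion but was originally labeled with a non-HGG disease type.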

# Session Info

```{r}
# Print the session information
sessionInfo()
```
