This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

PR 1 of n: Molecular Subtyping - HGG (Defining Lesions) #352

Merged
21 commits
- e39ca95 Add `01-HGG-molecular-subtyping-data-prep.Rmd` (cbethell, Dec 18, 2019)
- ecc924f Fix command in `.circleci` (cbethell, Dec 18, 2019)
- ed68af3 Merge branch 'master' into hgg-molecular-subtyping-data-prep (cbethell, Dec 18, 2019)
- 02ba79e Minor `lintr` format changes (cbethell, Dec 18, 2019)
- 254a279 Merge branch 'master' into hgg-molecular-subtyping-data-prep (cbethell, Dec 18, 2019)
- c865960 Log2 transform expression data (cbethell, Dec 18, 2019)
- 2981f7c Merge branch 'master' into hgg-molecular-subtyping-data-prep (cbethell, Dec 18, 2019)
- 5ec08c4 Use `controlfreec` cn data (cbethell, Dec 18, 2019)
- fda9e31 Merge branch 'master' of https://github.com/cbethell/OpenPBTA-analysi… (cbethell, Dec 19, 2019)
- e2befa3 Merge branch 'master' into hgg-molecular-subtyping-data-prep (jaclyn-taroni, Dec 19, 2019)
- cd9bd8e Merge branch 'master' into hgg-molecular-subtyping-data-prep (cbethell, Dec 19, 2019)
- f0b7e4c Merge branch 'hgg-molecular-subtyping-data-prep' of https://github.co… (cbethell, Dec 19, 2019)
- a1a132e Create a column better distinguishing specific HGG mutations (cbethell, Dec 19, 2019)
- 3a5adb4 Merge branch 'master' of https://github.com/cbethell/OpenPBTA-analysi… (cbethell, Jan 3, 2020)
- 85b4b13 Change `01` nb to look only at HGG defining lesions (cbethell, Jan 3, 2020)
- 2885f1f Edit analysis in `.circleci` to reflect nb name change (cbethell, Jan 3, 2020)
- 900f803 Remove unused lines of code (cbethell, Jan 3, 2020)
- 8bdd4c6 Update code to reflect V12 change (cbethell, Jan 3, 2020)
- e5e4cf2 Merge branch 'master' into hgg-molecular-subtyping-data-prep (jaclyn-taroni, Jan 4, 2020)
- a10cded Address @jharenza comments (jaclyn-taroni, Jan 4, 2020)
- caf52e3 Add to modules at a glance table (jaclyn-taroni, Jan 4, 2020)
4 changes: 4 additions & 0 deletions .circleci/config.yml
@@ -99,6 +99,10 @@ jobs:
- run:
    name: Process SV file
    command: ./scripts/run_in_ci.sh Rscript analyses/sv-analysis/01-process-sv-file.R

- run:
    name: Molecular Subtyping - HGG
    command: ./scripts/run_in_ci.sh Rscript -e "rmarkdown::render('analyses/molecular-subtyping-HGG/01-HGG-molecular-subtyping-defining-lesions.Rmd', clean = TRUE)"

- run:
    name: Oncoprint plotting
1 change: 1 addition & 0 deletions analyses/README.md
@@ -23,6 +23,7 @@ Note that _nearly all_ modules use the harmonized clinical data file (`pbta-hist
| [`independent-samples`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/independent-samples) | `pbta-histologies.tsv` | Generates independent specimen lists for WGS/WXS samples | `independent-specimens.wgs.primary.tsv`, `independent-specimens.wgs.primary-plus.tsv`, `independent-specimens.wgswxs.primary.tsv`, `independent-specimens.wgswxs.primary-plus.tsv` (included in data download)
| [`interaction-plots`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/interaction-plots) | `independent-specimens.wgs.primary-plus.tsv`, `pbta-snv-consensus-mutation.maf.tsv.gz` | Creates interaction plots for mutation mutual exclusivity/co-occurrence [#13](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/13); may be updated to include other data types (e.g., fusions) | N/A
| [`molecular-subtyping-ATRT`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/molecular-subtyping-ATRT) | `analyses/gene-set-enrichment-analysis/results/gsva_scores_stranded.tsv`, `pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds`, `analyses/focal-cn-file-preparation/results/controlfreec_annotated_cn_autosomes.tsv.gz`, `pbta-snv-consensus-mutation-tmb.tsv` | *In progress*; summarizing data into tabular format in order to molecularly subtype ATRT samples [#244](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/244) | N/A
| [`molecular-subtyping-HGG`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/molecular-subtyping-HGG) | `pbta-snv-consensus-mutation.maf.tsv.gz` | *In progress*; molecular subtyping of high-grade glioma samples [#249](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/249) | N/A
| [`molecular-subtyping-SHH-tp53`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/molecular-subtyping-SHH-tp53) | `pbta-histologies` and `pbta-snv-consensus-mutation.maf.tsv.gz` | Identify the SHH-classified medulloblastoma samples that have TP53 mutations | N/A
| [`mutational-signatures`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/mutational-signatures) | `pbta-snv-consensus-mutation.maf.tsv.gz` | Performs COSMIC and Alexandrov et al. mutational signature analysis using the consensus SNV data | N/A
| [`mutect2-vs-strelka2`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/mutect2-vs-strelka2) | `pbta-snv-mutect2.vep.maf.gz`, `pbta-snv-strelka2.vep.maf.gz` | *Deprecated*; comparison of only two SNV callers, subsumed by `snv-callers` | N/A
@@ -0,0 +1,141 @@
---
title: "High-Grade Glioma Molecular Subtyping - Defining Lesions"
output:
  html_notebook:
    toc: TRUE
    toc_float: TRUE
author: Chante Bethell for ALSF CCDL
date: 2019
---

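This notebook flags the defining lesions of high-grade glioma (HGG), namely
H3F3A p.K28M, HIST1H3B p.K28M, H3F3A p.G35R and H3F3A p.G35V, for every sample
in the SNV consensus mutation data, as a first step toward molecularly subtyping
the HGG samples in the OpenPBTA dataset.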
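The four lesions flagged in the Prepare Data section could also be written down as a small lookup table and matched with a join, instead of one `dplyr::case_when()` column per lesion. The following is a minimal sketch of that alternative, not the approach this notebook uses; the `lesions` and `maf` objects and the `BS_*` barcodes are hypothetical toy data.

```r
# Lookup table of the defining lesions and the subtype each one implies
lesions <- tibble::tribble(
  ~Hugo_Symbol, ~HGVSp_Short, ~lesion_subtype,
  "H3F3A",      "p.K28M",     "High-grade glioma, H3 K28 mutant",
  "HIST1H3B",   "p.K28M",     "High-grade glioma, H3 K28 mutant",
  "H3F3A",      "p.G35R",     "High-grade glioma, H3 G35 mutant",
  "H3F3A",      "p.G35V",     "High-grade glioma, H3 G35 mutant"
)

# Hypothetical MAF-like rows: one K28M hit, one non-defining H3F3A change,
# and one mutation in an unrelated gene
maf <- tibble::tibble(
  Tumor_Sample_Barcode = c("BS_A", "BS_B", "BS_C"),
  Hugo_Symbol = c("H3F3A", "H3F3A", "TP53"),
  HGVSp_Short = c("p.K28M", "p.K28I", "p.R273H")
)

# Keep only the mutation rows that match a defining lesion, labelled with
# the subtype that lesion implies
lesion_hits <- dplyr::inner_join(maf, lesions,
                                 by = c("Hugo_Symbol", "HGVSp_Short"))
lesion_hits
```

With this layout, adding another lesion means adding a row to `lesions` rather than a new column and a new `case_when()` branch.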

# Usage

This notebook is intended to be run via the command line from the top directory
of the repository as follows:

`Rscript -e "rmarkdown::render('analyses/molecular-subtyping-HGG/01-HGG-molecular-subtyping-defining-lesions.Rmd', clean = TRUE)"`

# Set Up

```{r}
# Get `magrittr` pipe
`%>%` <- dplyr::`%>%`
```

## Directories and Files

```{r}
# Detect the ".git" folder -- this will be in the project root directory.
# Use this as the root directory to ensure proper sourcing of functions no
# matter where this is called from
root_dir <- rprojroot::find_root(rprojroot::has_dir(".git"))

# File path to results directory
results_dir <-
  file.path(root_dir, "analyses", "molecular-subtyping-HGG", "results")

if (!dir.exists(results_dir)) {
  dir.create(results_dir)
}

# Read in metadata
metadata <-
  readr::read_tsv(file.path(root_dir, "data", "pbta-histologies.tsv"))

# Select wanted columns in metadata for merging and assign to a new object
select_metadata <- metadata %>%
  dplyr::select(Kids_First_Participant_ID,
                sample_id,
                Kids_First_Biospecimen_ID,
                disease_type_new)

# Read in snv consensus mutation data
snv_df <-
  data.table::fread(file.path(root_dir,
                              "data",
                              "pbta-snv-consensus-mutation.maf.tsv.gz"))
```

# Prepare Data

## SNV consensus mutation data - defining lesions

```{r}
# Flag each row of the SNV consensus mutation data for the HGG defining lesions
snv_lesions_df <- snv_df %>%
  dplyr::select(Tumor_Sample_Barcode, Hugo_Symbol, HGVSp_Short) %>%
  dplyr::mutate(
    H3F3A.K28M = dplyr::case_when(
      Hugo_Symbol == "H3F3A" & HGVSp_Short == "p.K28M" ~ "Yes",
      TRUE ~ "No"
    ),
    HIST1H3B.K28M = dplyr::case_when(
      Hugo_Symbol == "HIST1H3B" & HGVSp_Short == "p.K28M" ~ "Yes",
      TRUE ~ "No"
    ),
    H3F3A.G35R = dplyr::case_when(
      Hugo_Symbol == "H3F3A" & HGVSp_Short == "p.G35R" ~ "Yes",
      TRUE ~ "No"
    ),
    H3F3A.G35V = dplyr::case_when(
      Hugo_Symbol == "H3F3A" & HGVSp_Short == "p.G35V" ~ "Yes",
      TRUE ~ "No"
    )
  ) %>%
  dplyr::select(
    -HGVSp_Short,
    -Hugo_Symbol
  )

# Join the selected metadata columns with the per-mutation defining lesion
# flags
snv_lesions_df <- select_metadata %>%
  dplyr::right_join(snv_lesions_df,
                    by = c("Kids_First_Biospecimen_ID" = "Tumor_Sample_Barcode")) %>%
  # Move `disease_type_new` to the last column
  dplyr::select(
    -disease_type_new,
    dplyr::everything()
  ) %>%
  # Drop duplicate rows left after removing the gene and protein change columns
  dplyr::distinct() %>%
  # Reclassify rows carrying a defining lesion; otherwise keep the original
  # `disease_type_new`
  dplyr::mutate(
    disease_type_reclassified = dplyr::case_when(
      H3F3A.K28M == "Yes" ~ "High-grade glioma, H3 K28 mutant",
      HIST1H3B.K28M == "Yes" ~ "High-grade glioma, H3 K28 mutant",
      H3F3A.G35R == "Yes" ~ "High-grade glioma, H3 G35 mutant",
      H3F3A.G35V == "Yes" ~ "High-grade glioma, H3 G35 mutant",
      TRUE ~ as.character(disease_type_new)
    )
  )

# Display `snv_lesions_df`
snv_lesions_df
```
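Because `snv_lesions_df` starts from one row per mutation, a biospecimen carrying, for example, an H3F3A p.K28M hit alongside other mutations appears to end up with two rows after `dplyr::distinct()`: one flagged `"Yes"` and reclassified, and one with every flag `"No"` that keeps its original `disease_type_new`. If one row per biospecimen is wanted, the flags could be collapsed with `any()` before reclassifying. Below is a minimal sketch on hypothetical toy data (`per_mutation`, `per_sample` and the `BS_*` IDs are illustrative; only two of the four flags are shown), not part of the merged notebook.

```r
`%>%` <- dplyr::`%>%`

# Toy per-mutation table: BS_A has a K28M hit plus an unrelated mutation
per_mutation <- tibble::tibble(
  Kids_First_Biospecimen_ID = c("BS_A", "BS_A", "BS_B"),
  disease_type_new = c("High-grade glioma", "High-grade glioma",
                       "Low-grade glioma"),
  H3F3A.K28M = c("Yes", "No", "No"),
  H3F3A.G35R = c("No", "No", "No")
)

per_sample <- per_mutation %>%
  dplyr::group_by(Kids_First_Biospecimen_ID, disease_type_new) %>%
  # A biospecimen carries a lesion if any of its mutation rows does
  dplyr::summarise(
    H3F3A.K28M = any(H3F3A.K28M == "Yes"),
    H3F3A.G35R = any(H3F3A.G35R == "Yes")
  ) %>%
  dplyr::ungroup() %>%
  # Reclassify once per biospecimen using the collapsed logical flags
  dplyr::mutate(
    disease_type_reclassified = dplyr::case_when(
      H3F3A.K28M ~ "High-grade glioma, H3 K28 mutant",
      H3F3A.G35R ~ "High-grade glioma, H3 G35 mutant",
      TRUE ~ disease_type_new
    )
  )
per_sample
```

`dplyr::ungroup()` is used instead of the `.groups` argument of `summarise()` so the sketch does not depend on a particular dplyr version.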

## Save final table of results

```{r}
# Save final data.frame to file
readr::write_tsv(snv_lesions_df,
                 file.path(results_dir, "HGG_defining_lesions.tsv"))
```

## Inconsistencies in disease classification

```{r}
# Isolate the samples with the specified mutations that were not classified
# as HGG or DIPG
snv_lesions_df %>%
  dplyr::filter(
    grepl("High-grade glioma", disease_type_reclassified) &
      !(disease_type_new %in% c("High-grade glioma",
                                "Brainstem glioma- Diffuse intrinsic pontine glioma"))
  )
```
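To see how the mutation-based labels line up with the original `disease_type_new` values overall, rather than listing only the mismatches, the two columns could be cross-tabulated with `dplyr::count()`. A minimal sketch on hypothetical toy rows (`toy` and `label_summary` are illustrative names), reusing the same inconsistency condition as the filter above:

```r
`%>%` <- dplyr::`%>%`

# Hypothetical rows pairing an original diagnosis with a reclassified label
toy <- tibble::tibble(
  disease_type_new = c("High-grade glioma",
                       "Brainstem glioma- Diffuse intrinsic pontine glioma",
                       "Low-grade glioma",
                       "Low-grade glioma"),
  disease_type_reclassified = c("High-grade glioma, H3 K28 mutant",
                                "High-grade glioma, H3 K28 mutant",
                                "High-grade glioma, H3 G35 mutant",
                                "Low-grade glioma")
)

# One row per (original, reclassified) pair with the number of rows, `n`
label_summary <- toy %>%
  dplyr::count(disease_type_new, disease_type_reclassified) %>%
  # Mark pairs where an HGG lesion label meets an original diagnosis
  # other than HGG or DIPG
  dplyr::mutate(
    inconsistent = grepl("High-grade glioma", disease_type_reclassified) &
      !(disease_type_new %in% c("High-grade glioma",
                                "Brainstem glioma- Diffuse intrinsic pontine glioma"))
  )
label_summary
```

In this toy input only the low-grade glioma row relabelled as an H3 G35 mutant is marked inconsistent.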

# Session Info

```{r}
# Print the session information
sessionInfo()
```
