This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Update ATRT subtyping to use minimal set of genes #462

Merged: 9 commits, Jan 29, 2020
2 changes: 1 addition & 1 deletion analyses/README.md
@@ -25,7 +25,7 @@ Note that _nearly all_ modules use the harmonized clinical data file (`pbta-hist
| [`immune-deconv`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/immune-deconv) | `pbta-gene-expression-rsem-fpkm-collapsed.polya.rds` <br> `pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds` | Immune/Stroma characterization across PBTA (part of [#15](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/15)) | `results/deconv-output.RData`
| [`independent-samples`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/independent-samples) | `pbta-histologies.tsv` | Generates independent specimen lists for WGS/WXS samples | `results/independent-specimens.wgs.primary.tsv` <br> `results/independent-specimens.wgs.primary-plus.tsv` <br> `results/independent-specimens.wgswxs.primary.tsv` <br> `results/independent-specimens.wgswxs.primary-plus.tsv` (included in data download)
| [`interaction-plots`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/interaction-plots) | `independent-specimens.wgs.primary-plus.tsv` <br> `pbta-snv-consensus-mutation.maf.tsv.gz` | Creates interaction plots for mutation mutual exclusivity/co-occurrence [#13](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/13); may be updated to include other data types (e.g., fusions) | N/A
| [`molecular-subtyping-ATRT`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/molecular-subtyping-ATRT) | `analyses/gene-set-enrichment-analysis/results/gsva_scores_stranded.tsv` <br> `pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds` <br> `analyses/focal-cn-file-preparation/results/cnvkit_annotated_cn_autosomes.tsv.gz` <br> `pbta-snv-consensus-mutation-tmb-all.tsv` <br> `pbta-cnv-cnvkit-gistic.zip` | *In progress*; summarizing data into tabular format in order to molecularly subtype ATRT samples [#244](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/244) | N/A
| [`molecular-subtyping-ATRT`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/molecular-subtyping-ATRT) | `analyses/gene-set-enrichment-analysis/results/gsva_scores_stranded.tsv` <br> `pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds` <br> `analyses/focal-cn-file-preparation/results/consensus_seg_annotated_cn_autosomes.tsv.gz` <br> `pbta-snv-consensus-mutation-tmb-all.tsv` <br> `2019-01-28-consensus-cnv.zip` from [#453 (comment)](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/453#issuecomment-579340618) | Summarizing data into tabular format in order to molecularly subtype ATRT samples [#244](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/244); this analysis did not work | N/A
| [`molecular-subtyping-embryonal`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/molecular-subtyping-embryonal) | `fusion_summary_embryonal_foi.tsv` <br> `pbta-histologies.tsv` <br> `analyses/focal-cn-file-preparation/cnvkit_annotated_cn_autosomes.tsv.gz` <br> `analyses/focal-cn-file-preparation/cnvkit_annotated_cn_x_and_y.tsv.gz` <br> `pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds` <br> `pbta-gene-expression-rsem-fpkm-collapsed.polya.rds` | *In progress*; molecular subtyping of non-medulloblastoma, non-ATRT embryonal tumors [#251](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/251) | N/A
| [`molecular-subtyping-HGG`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/molecular-subtyping-HGG) | `pbta-snv-consensus-mutation.maf.tsv.gz` <br> `analyses/focal-cn-preparation/results/cnvkit_annotated_cn_autosomes.tsv.gz` <br> `pbta-fusion-putative-oncogenic.tsv` <br> `pbta-cnv-cnvkit-gistic.zip` <br> `pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds` <br> `pbta-gene-expression-rsem-fpkm-collapsed.polya.rds` | *In progress*; molecular subtyping of high-grade glioma samples [#249](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/249) | N/A
| [`molecular-subtyping-SHH-tp53`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/molecular-subtyping-SHH-tp53) | `pbta-histologies` <br> `pbta-snv-consensus-mutation.maf.tsv.gz` | Identify the SHH-classified medulloblastoma samples that have TP53 mutations | N/A
26 changes: 14 additions & 12 deletions analyses/molecular-subtyping-ATRT/00-subset-files-for-ATRT.R
@@ -30,6 +30,8 @@ if (!dir.exists(results_dir)) {
dir.create(results_dir)
}

scratch_dir <- file.path(root_dir, "scratch")

# Read in metadata
metadata <-
readr::read_tsv(file.path(root_dir, "data", "pbta-histologies.tsv"))
@@ -63,15 +65,13 @@ stranded_expression <-
)

# Read in focal CN data
## TODO: This section will be updated to read in focal CN data derived from
## copy number consensus calls.
cn_df <- readr::read_tsv(
file.path(
root_dir,
"analyses",
"focal-cn-file-preparation",
"results",
"cnvkit_annotated_cn_autosomes.tsv.gz"
"consensus_seg_annotated_cn_autosomes.tsv.gz"
)
)

@@ -82,15 +82,17 @@ tmb_df <-
"pbta-snv-consensus-mutation-tmb-all.tsv"))

# Read in GISTIC `broad_values_by_arm.txt` file
gistic_df <-
data.table::fread(unzip(
file.path(root_dir, "data", "pbta-cnv-cnvkit-gistic.zip"),
files = file.path(
"2019-12-10-gistic-results-cnvkit",
"broad_values_by_arm.txt"
),
exdir = file.path(root_dir, "scratch")
), data.table = FALSE)
# TODO: update once the consensus GISTIC results are in the data release
download.file(url = "https://github.com/AlexsLemonade/OpenPBTA-analysis/files/4123481/2019-01-28-consensus-cnv.zip",
destfile = file.path(scratch_dir, "2019-01-28-consensus-cnv.zip"),
quiet = TRUE)
unzip(file.path(scratch_dir, "2019-01-28-consensus-cnv.zip"),
exdir = file.path(scratch_dir, "2019-01-28-consensus-cnv"),
files = file.path("2019-01-28-consensus-cnv", "broad_values_by_arm.txt"))
gistic_df <- data.table::fread(file.path(scratch_dir,
"2019-01-28-consensus-cnv",
"broad_values_by_arm.txt"),
data.table = FALSE)
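
The download step above fetches the archive every time the script runs. A minimal sketch of one way to skip the fetch when the file is already in `scratch/` follows. The helper name `fetch_once` and the demo paths and URL are hypothetical and not part of this PR; the only calls assumed are base R's `dir.create()`, `file.exists()`, and `download.file()`.

```r
# Hypothetical helper (not part of the PR): download a file only when it is
# not already present, so re-running the script does not fetch it again.
fetch_once <- function(url, destfile, quiet = TRUE) {
  # Make sure the destination directory (e.g. scratch/) exists
  dir.create(dirname(destfile), recursive = TRUE, showWarnings = FALSE)
  if (!file.exists(destfile)) {
    # mode = "wb" keeps binary files such as zip archives intact on Windows
    download.file(url = url, destfile = destfile, quiet = quiet, mode = "wb")
  }
  invisible(destfile)
}

# Demonstration with a file that already exists, so no network access occurs
cached_file <- file.path(tempdir(), "fetch-once-demo", "already-here.zip")
dir.create(dirname(cached_file), recursive = TRUE, showWarnings = FALSE)
writeLines("placeholder", cached_file)
result <- fetch_once("https://example.com/not-requested.zip", cached_file)
```

In the script, the same helper could wrap the `download.file()` call with `destfile = file.path(scratch_dir, "2019-01-28-consensus-cnv.zip")`; whether caching is desirable here depends on how often the attached archive is expected to change.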

#### Filter metadata -----------------------------------------------------------

@@ -198,51 +198,37 @@ collapsed_metadata %>%

```{r}
# Define target overexpressed gene vectors
# https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/244#issuecomment-576850172
tyr_genes <-
c("TYR",
"MITF",
"DCT",
"VEGFA",
"DNAH11",
"SPEF1",
"POU3F4",
"POU3F2",
"PBX1")
"MSX2",
"STAT3",
"PRRX1",
"LMX1",
"OTX2")
shh_genes <-
c(
"MYCN",
"GLI2",
"CDK6",
"ASCL1",
"HES5/6",
"DLL1/3",
"ZBTB7A",
"RXF3",
"RXF2",
"MYBL2",
"MXI1",
"MEIS3",
"MEIS2",
"MAX",
"INSM1",
"FOXK1"
"HES5",
"HES6",
"DLL1",
"DLL3",
"LHX2",
"TEAD1"
)
myc_genes <-
c(
"MYC",
"HOTAIR",
"HOX",
"TCF7L2",
"STAT1",
"REST",
"RARG",
"RAD21",
"NR4A2",
"IRF9",
"IRF8",
"FOXC1",
"CEBPB",
"ATF4"
"TEAD3"
)

# Filter to only the genes of interest
@@ -394,6 +380,12 @@ rm(gistic_df, atrt_expression_cn_tmb_df)
# Save final table of results

```{r}
# For reordering the output, we will use the vector of genes as input but we
# need to account for genes that are missing from the expression matrix
tyr_genes <- intersect(colnames(final_df), tyr_genes)
shh_genes <- intersect(colnames(final_df), shh_genes)
myc_genes <- intersect(colnames(final_df), myc_genes)

# Save final data.frame
final_df <- final_df %>%
dplyr::select(
@@ -407,31 +399,13 @@ final_df <- final_df %>%
location_summary,
chr_22q_loss,
SMARCB1_focal_status,
TYR,
MITF,
DCT,
VEGFA,
DNAH11,
SPEF1,
POU3F4,
POU3F2,
PBX1,
!!! rlang::syms(tyr_genes),
SMARCA4_focal_status,
HALLMARK_NOTCH_SIGNALING,
MYCN,
GLI2,
CDK6,
ASCL1,
ZBTB7A,
MYBL2,
MXI1,
MEIS3,
MEIS2,
MAX,
INSM1,
FOXK1,
!!! rlang::syms(shh_genes),
HALLMARK_MYC_TARGETS_V1,
HALLMARK_MYC_TARGETS_V2,
!!! rlang::syms(myc_genes),
dplyr::everything()
)
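
One detail of the `intersect()` step is easy to miss: for plain vectors, base R's `intersect(x, y)` returns the shared elements in the order of `x`. With `colnames(final_df)` as the first argument, the surviving genes therefore follow the existing column order rather than the order of the gene vector. The sketch below illustrates this with base R only; `toy_df` and `genes_of_interest` are made-up stand-ins, not data from this analysis.

```r
# Toy stand-in for final_df; column names and values are made up
toy_df <- data.frame(
  sample_id = c("s1", "s2"),
  MITF = c(0.4, 1.2),
  TYR = c(2.1, -0.3),
  other = c(5, 6)
)

# "LMX1" has no matching column here, mimicking a gene that is missing
# from the expression matrix
genes_of_interest <- c("TYR", "MITF", "LMX1")

# intersect() keeps the order of its FIRST argument
by_column_order <- intersect(colnames(toy_df), genes_of_interest)
by_vector_order <- intersect(genes_of_interest, colnames(toy_df))

# Reorder: id column, then the genes that exist, then everything else
front <- c("sample_id", by_vector_order)
reordered <- toy_df[, c(front, setdiff(colnames(toy_df), front))]
```

If the intended column order is that of the gene vectors, swapping the arguments as in `by_vector_order` gives it; if the existing column order is acceptable, the call as written in the diff is enough.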
