This repository has been archived by the owner on Jun 21, 2023. It is now read-only.
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Update CNV segment to gene mapping: support both formats, use GTF, etc. (#253)

* Add chromosome 1:22 filtering step
* Add notebook for including status in CNVkit
* WIP update CN file prep
* Remove outdated file
* Use GTF file + exons; add cytoband; support both methods
* Update module shell script and rerun
* Add TODO notes
* Remove chromosome filter; fixes to shell script
* Add -f to gzip step
* Add steps for saving annotation db
  Ignore due to file size
* Fix how results are compressed
* Add chromosome filtering option
* Revert "Add steps for saving annotation db"
  This reverts commit 36cbb2b.
* Revert "Revert "Add steps for saving annotation db""
  This reverts commit b0f3615.
1 parent: 390f1e0, commit: 713d2b8
Showing 8 changed files with 2,178 additions and 51 deletions.
@@ -0,0 +1,2 @@
# AnnotationDbi object is too large to be committed
annotation_files/txdb_from_gencode.v27.gtf.db
analyses/focal-cn-file-preparation/00-add-ploidy-cnvkit.Rmd (60 changes: 60 additions & 0 deletions)
@@ -0,0 +1,60 @@
---
title: "Add ploidy column, status to CNVkit output"
output: html_notebook
author: J. Taroni for ALSF CCDL
date: 2019
---

The `pbta-histologies.tsv` file contains a `tumor_ploidy` column, which is tumor ploidy as inferred by ControlFreeC.
The copy number information should be interpreted in the light of this information (see: [current version of Data Formats section of README](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/390f1e08e481da5ec0b2c62d886d5fd298bbf017#data-formats)).

This notebook adds ploidy information to the CNVkit results and adds a status column that defines gain and loss broadly.

```{r}
library(dplyr)
```

### Read in data

```{r}
cnvkit_file <- file.path("..", "..", "data", "pbta-cnv-cnvkit.seg.gz")
cnvkit_df <- readr::read_tsv(cnvkit_file)
```

```{r}
histologies_file <- file.path("..", "..", "data", "pbta-histologies.tsv")
histologies_df <- readr::read_tsv(histologies_file)
```

### Add inferred ploidy information to CNVkit results

```{r}
add_ploidy_df <- histologies_df %>%
  select(Kids_First_Biospecimen_ID, tumor_ploidy) %>%
  inner_join(cnvkit_df, by = c("Kids_First_Biospecimen_ID" = "ID")) %>%
  select(-tumor_ploidy, everything())
```

### Add status column

This is intended to mirror the information contained in the ControlFreeC output.

```{r}
add_ploidy_df <- add_ploidy_df %>%
  mutate(status = case_when(
    # when the copy number is less than inferred ploidy, mark this as a loss
    copy.num < tumor_ploidy ~ "loss",
    # if copy number is higher than ploidy, mark as a gain
    copy.num > tumor_ploidy ~ "gain",
    copy.num == tumor_ploidy ~ "neutral"
  ))
head(add_ploidy_df, 10)
```

### Write to `scratch`

```{r}
output_file <- file.path("..", "..", "scratch", "cnvkit_with_status.tsv")
readr::write_tsv(add_ploidy_df, output_file)
```
analyses/focal-cn-file-preparation/00-add-ploidy-cnvkit.nb.html (1,932 changes: 1,932 additions & 0 deletions)
Large diffs are not rendered by default.
Binary file renamed (+81.9 MB): ...file-preparation/results/annotated_cn.tsv → ...ults/cnvkit_annotated_cn_autosomes.tsv.gz
Binary file not shown.
Binary file added (+26.4 MB): analyses/focal-cn-file-preparation/results/controlfreec_annotated_cn_autosomes.tsv.gz
Binary file not shown.