Update CNV segment to gene mapping: support both formats, use GTF, etc. #253

jaclyn-taroni · 2019-11-08T21:19:09Z

Purpose/implementation

Here I'm making a number of updates to the focal CN file preparation:

- Adding a notebook that includes ploidy and status information in the CNVkit file (analyses/focal-cn-file-preparation/00-add-ploidy-cnvkit.Rmd)
- Updating analyses/focal-cn-file-preparation/01-prepare-cn-file.R
  - Use either ControlFreeC output from the data download or the CNVkit output from the above notebook
  - Use the GTF annotation file to make a GenomicRanges object
  - Merge using exons rather than genes
  - Map from Ensembl gene identifiers (result of exons change) to gene symbols and cytobands
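
For readers unfamiliar with the overlap step, here is a minimal sketch of merging copy number segments with exons using GenomicRanges. It is not the code in 01-prepare-cn-file.R: the toy coordinates, the identifiers (BS_A, ENSG_X, ENSG_Y), and the variable names are all made up for illustration, and in the script the exon ranges would come from the GTF file rather than being typed in by hand.

```r
library(GenomicRanges)

# Toy copy number segments (hypothetical values, one sample)
segments <- GRanges(
  seqnames = c("chr1", "chr1"),
  ranges = IRanges(start = c(100, 5000), end = c(2000, 6000)),
  biospecimen_id = c("BS_A", "BS_A"),
  copy_number = c(3, 1)
)

# Toy exon ranges (hypothetical); gene ENSG_X has two exons
exon_ranges <- GRanges(
  seqnames = c("chr1", "chr1", "chr1"),
  ranges = IRanges(start = c(150, 1800, 5500), end = c(300, 1900, 5600)),
  gene_id = c("ENSG_X", "ENSG_X", "ENSG_Y")
)

# Find every segment/exon pair that overlaps
hits <- findOverlaps(segments, exon_ranges)

# Carry the segment information over to the gene identifier of each exon hit
merged <- data.frame(
  biospecimen_id = segments$biospecimen_id[queryHits(hits)],
  copy_number = segments$copy_number[queryHits(hits)],
  ensembl = exon_ranges$gene_id[subjectHits(hits)],
  stringsAsFactors = FALSE
)

# A segment covering several exons of one gene yields duplicate rows; drop them
merged <- unique(merged)
```

Because the merge is at the exon level, a gene is only reported for a segment when the segment touches at least one of its exons, and the result is keyed by Ensembl gene identifier, which is why a later mapping to gene symbols and cytobands is needed.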

Issue

Related to #186 and to #217

Directions for reviewers

The sections for finding overlaps and mapping to different identifiers of the updated 01-prepare-cn-file.R should be looked at carefully.
It's possible the mapping to gene symbols should use the information in the GTF, rather than the org.Hs.eg.db package, but this was very easy to implement. Thoughts?
Are we okay with the new tabular format (see below)?

Results

No results yet, but I did make some scientific decisions worth noting here about how gain and loss are broadly defined in the CNVkit output:

- if copy number is less than ploidy, the segment is marked as a loss
- if copy number is greater than ploidy, the segment is marked as a gain

In analyses/focal-cn-file-preparation/01-prepare-cn-file.R, I change any gain where copy number is greater than (ploidy * 2) to be labeled as an amplification.
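
A minimal sketch of the status rules described above, in base R. The function name `classify_cn_status` is hypothetical, and the "neutral" label for copy number equal to ploidy is an assumption; this PR does not say how that case is labeled.

```r
# Hypothetical helper illustrating the rules; not taken from the PR's code
classify_cn_status <- function(copy_number, ploidy) {
  ifelse(
    copy_number > 2 * ploidy, "amplification",  # gain exceeding (ploidy * 2)
    ifelse(
      copy_number > ploidy, "gain",             # above ploidy
      ifelse(
        copy_number < ploidy, "loss",           # below ploidy
        "neutral"                               # assumed label when equal
      )
    )
  )
}

# Diploid examples: copy numbers 1, 2, 3, 4, and 5 with ploidy 2
statuses <- classify_cn_status(c(1, 2, 3, 4, 5), ploidy = 2)
```

Note that with a strict "greater than" comparison, a copy number of exactly twice the ploidy (4 in a diploid sample) stays a gain rather than becoming an amplification.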

The output file format is now:

biospecimen_id	status	copy_number	ploidy	gene_symbol	cytoband	ensembl
BS_007JTNB8	gain	3	2	DDX11L1	1p36.33	ENSG00000223972
BS_007JTNB8	gain	3	2	SH2D5	1p36.12	ENSG00000189410
BS_007JTNB8	gain	3	2	CSF3R	1p34.3	ENSG00000119535
BS_007JTNB8	gain	3	2	CC2D1B	1p32.3	ENSG00000154222
BS_007JTNB8	gain	3	2	FCGR1CP	1q21.1	ENSG00000265531

which is a bit different from what was discussed on #186

Docker and continuous integration

This has already been addressed in earlier pull requests.

jashapiro

Looks good, just one question about whether we need to filter chromosomes with the change to exons and a documentation note.
(Also, note that I updated the PR to fix gain/loss status info there)

jashapiro · 2019-11-09T14:55:26Z

analyses/focal-cn-file-preparation/01-prepare-cn-file.R

+chroms <- paste0("chr", 1:22)
+chrom_filter <- list(tx_chrom = chroms)


With GenomicFeatures::exons(), this should not be necessary, as no exon is mapped to multiple/alternate chromosomes. I don't know if any of the calls fall on non-canonical chromosomes, but we might not want to exclude them at this point.

My thought was it might be more efficient if we drop anything outside this filter, but I have no evidence whatsoever to suggest that I am right about that. I will take it out.

Okay - this step in CI takes much longer (granted I made other changes), but I'm thinking of implementing a filter for CI only per https://github.com/AlexsLemonade/OpenPBTA-analysis#passing-variables-only-in-ci.

To close the loop - that wasn't the issue and I was looking at the wrong branch 🙃 I will leave in the filtering changes though
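
As a rough illustration of what an optional chromosome filter could look like, here is a base R sketch on a toy table. The helper `filter_exon_table`, its `keep_autosomes_only` argument, and the toy data are hypothetical; the script under review passes the filter as a list to the annotation query instead (see the `chrom_filter` snippet above).

```r
# Autosomes only, as in the snippet under review
chroms <- paste0("chr", 1:22)

# Hypothetical helper: keep_autosomes_only stands in for an optional flag
filter_exon_table <- function(exon_df, keep_autosomes_only = FALSE) {
  if (keep_autosomes_only) {
    exon_df <- exon_df[exon_df$chrom %in% chroms, , drop = FALSE]
  }
  exon_df
}

# Toy annotation table (made-up identifiers)
toy_exons <- data.frame(
  chrom = c("chr1", "chrX", "chr22", "chrUn_KI270742v1"),
  gene_id = c("ENSG_A", "ENSG_B", "ENSG_C", "ENSG_D"),
  stringsAsFactors = FALSE
)

unfiltered <- filter_exon_table(toy_exons)
filtered <- filter_exon_table(toy_exons, keep_autosomes_only = TRUE)
```

With `1:22`, the filter drops the sex chromosomes as well as non-canonical contigs, so anything on chrX or chrY would be lost whenever the option is switched on.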

jaclyn-taroni · 2019-11-09T20:30:21Z

New format is:

biospecimen_id	status	copy_number	ploidy	ensembl	gene_symbol	cytoband
BS_007JTNB8	gain	3	2	ENSG00000223972	DDX11L1	1p36.33
BS_007JTNB8	gain	3	2	ENSG00000189410	SH2D5	1p36.12
BS_007JTNB8	gain	3	2	ENSG00000119535	CSF3R	1p34.3
BS_007JTNB8	gain	3	2	ENSG00000154222	CC2D1B	1p32.3
BS_007JTNB8	gain	3	2	ENSG00000265531	FCGR1CP	1q21.1

jaclyn-taroni added 7 commits November 8, 2019 14:34

Add chromosome 1:22 filtering step

c764edc

Add notebook for including status in CNVkit

05b0a90

WIP update CN file prep

2feea00

Remove outdated file

be0248e

Use GTF file + exons; add cytoband; support both methods

9cec863

Update module shell script and rerun

9bfe28e

Add TODO notes

a5f8a6b

jaclyn-taroni requested review from jashapiro and cbethell November 8, 2019 21:19

This was referenced Nov 8, 2019

SMARCB1 deletions in ATRT with current SEG to gene mapping #217

Closed

Proposed Analysis: map from SEG file to genes (and broader segments) #186

Closed

jashapiro approved these changes Nov 9, 2019

jaclyn-taroni added 7 commits November 9, 2019 11:09

Remove chromosome filter; fixes to shell script

4f827ca

Add -f to gzip step

f452653

Add steps for saving annotation db

36cbb2b

Ignore due to file size

Fix how results are compressed

741d55f

Add chromosome filtering option

eb183ae

Revert "Add steps for saving annotation db"

b0f3615

This reverts commit 36cbb2b.

Revert "Revert "Add steps for saving annotation db""

02e3e70

This reverts commit b0f3615.

jaclyn-taroni merged commit 713d2b8 into AlexsLemonade:master Nov 9, 2019

jaclyn-taroni deleted the 186-both-formats branch November 9, 2019 20:44

cbethell mentioned this pull request Nov 14, 2019

Plot focal CN expression #266

Merged

8 tasks

jaclyn-taroni mentioned this pull request Jan 28, 2020

Update focal CN file prep to use exons again and cover the consensus SEG case #479

Merged

5 tasks
