This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Updated analysis: generate cytoband copy number status file for consumption #497

Closed
jaclyn-taroni opened this issue Feb 2, 2020 · 2 comments
Labels: cnv, in progress, updated analysis

Comments

@jaclyn-taroni
Member

Related to the open draft pull request #485 and #186

What analysis module should be updated and why?

The annotated output of focal-cn-file-preparation (e.g., the contents of results) contains some information about cytobands. Specifically, the ranges in a SEG file are mapped to gene symbols when there's overlap with exons. We then use the Ensembl gene IDs to map to cytobands via the org.Hs.eg.db package. In #485, I looked at assigning cytoband status based on the proportion of gene symbols annotated to that cytoband. The results sometimes disagree with the GISTIC arm status (GISTIC's cutoff for calling an arm-level event is 0.98 of the arm). We can approach this more directly, which may lead to more agreement between cytoband status and arm status.

The output of this analysis will feed into #488. Here's a photo from a whiteboarding session; the ❌ marks indicate things that should be removed and replaced by this ticket.

[Image: whiteboard photo, "Image from iOS (24)"]

What changes need to be made? Please provide enough detail for another participant to make the update.

We can use bedtools intersect to look at the overlap between a data frame with copy number status (e.g., pbta-cnv-controlfreec.tsv.gz or the SEG file processed in focal-cn-file-preparation) and cytoband data from UCSC (http://hgdownload.cse.ucsc.edu/goldenpath/hg38/database/cytoBand.txt.gz).
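As a rough illustration of getting the UCSC table into BED form, here is a minimal Python sketch. It assumes the five-column, tab-separated cytoBand layout (chrom, chromStart, chromEnd, name, gieStain) with no header. The function name `cytoband_to_bed`, the toy rows, and the choice to prefix band names with the chromosome are illustrative assumptions, not part of the module.

```python
import csv
import io

# Toy rows in the assumed cytoBand layout (tab-separated, no header).
SAMPLE = (
    "chr1\t0\t2300000\tp36.33\tgneg\n"
    "chr1\t2300000\t5300000\tp36.32\tgpos25\n"
)


def cytoband_to_bed(handle):
    """Yield BED4 tuples (chrom, start, end, name), dropping the stain column."""
    for chrom, start, end, band, _stain in csv.reader(handle, delimiter="\t"):
        # Band names repeat across chromosomes, so prefix them (an assumption).
        short_chrom = chrom[3:] if chrom.startswith("chr") else chrom
        yield chrom, int(start), int(end), f"{short_chrom}{band}"


bed_rows = list(cytoband_to_bed(io.StringIO(SAMPLE)))
```

For the real download, the same function could be given a handle from `gzip.open("cytoBand.txt.gz", "rt")` in place of the in-memory sample.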

Chatted with @jashapiro on Friday and I'm recording his advice here. When we use bedtools intersect (docs), the cytoband BED should be treated as A and B is the BED with status, but we will have multiple instances of B - the losses and gains should be treated separately. We will want to use the -wa option to retain the original entry and we will need to pick some threshold for the minimum overlap with a cytoband (-f).
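To make the effect of -wa and -f concrete, here is a small pure-Python sketch of the pairwise logic, with cytobands as A and one status-specific set of segments as B. It is not bedtools itself: it only mimics the rule that an A entry is reported when a single B entry covers at least a given fraction of A. The coordinates are toy values, and the 0.5 threshold and all names (`Interval`, `intersect_wa`, `MIN_FRAC`) are hypothetical.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based start (BED convention)
    end: int    # exclusive end
    name: str


def overlap_bp(a, b):
    """Number of bases shared by two intervals (0 if none)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def intersect_wa(a_rows, b_rows, min_frac_of_a):
    """Report the original A entry once per B entry covering >= min_frac_of_a of A."""
    hits = []
    for a in a_rows:
        a_len = a.end - a.start
        for b in b_rows:
            ov = overlap_bp(a, b)
            if ov > 0 and ov >= min_frac_of_a * a_len:
                hits.append(a)
    return hits


# Toy data: two cytobands (A) and status-specific segment sets (B).
cytobands = [
    Interval("chr1", 0, 2_300_000, "p36.33"),
    Interval("chr1", 2_300_000, 5_300_000, "p36.32"),
]
losses = [Interval("chr1", 0, 3_000_000, "loss")]
gains = [Interval("chr1", 3_000_000, 6_000_000, "gain")]

MIN_FRAC = 0.5  # hypothetical threshold; the issue leaves this open

# Losses and gains are intersected separately, as suggested above.
loss_bands = intersect_wa(cytobands, losses, MIN_FRAC)
gain_bands = intersect_wa(cytobands, gains, MIN_FRAC)
```

In this toy case the loss segment covers all of p36.33 but under a quarter of p36.32, so only p36.33 is reported as lost, while the gain segment covers most of p36.32. Note that the fraction is checked per A/B pair, so several small segments that together cover a cytoband would not pass the threshold in this sketch.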

What input data should be used? Which data were used in the version being updated?

analyses/copy_number_consensus_calls/results/pbta-cnv-consensus.seg.gz or the file included upon the next release (#432) that becomes scratch/consensus_seg_with_status.tsv in analyses/focal-cn-file-preparation/02-add-ploidy-consensus.Rmd.

When do you expect the revised analysis will be completed?

~1 week

Who will complete the updated analysis?

@jaclyn-taroni and @cbethell

@cansavvy
Collaborator

> Chatted with @jashapiro on Friday and I'm recording his advice here. When we use bedtools intersect (docs), the cytoband BED should be treated as A and B is the BED with status, but we will have multiple instances of B - the losses and gains should be treated separately. We will want to use the -wa option to retain the original entry and we will need to pick some threshold for the minimum overlap with a cytoband (-f).

To confirm the thought here, @jashapiro and/or @jaclyn-taroni: we ultimately want the ranges in the BED with status to be treated separately, but annotated with the cytoband ranges reported in A? And we do not want ranges in B to be merged together based on A's status? If so, this makes me think we want -wb instead. Or the two inputs should be swapped: the BED with status should be A and the UCSC cytobands should be B, and then -wa is fine?
[Image: Screen Shot 2020-03-11 at 11 39 19 AM]
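One way to reason about the -wa versus -wb question is to mimic which columns each option keeps. The sketch below is a simplified pure-Python stand-in, not bedtools: with neither flag it reports the clipped A range, `wa=True` keeps the original A entry, and `wb=True` appends the original B entry. The tuples and the function names are toy assumptions.

```python
def overlap(a, b):
    """Return the shared (chrom, start, end) of two BED-like tuples, or None."""
    if a[0] != b[0]:
        return None
    start, end = max(a[1], b[1]), min(a[2], b[2])
    return (a[0], start, end) if start < end else None


def intersect(a_rows, b_rows, wa=False, wb=False):
    """Emit one row per overlapping A/B pair, keeping columns per wa/wb."""
    out = []
    for a in a_rows:
        for b in b_rows:
            ov = overlap(a, b)
            if ov is None:
                continue
            # wa keeps A as written; otherwise A is clipped to the overlap.
            left = a if wa else ov + a[3:]
            out.append(left + b if wb else left)
    return out


# Toy inputs: UCSC-style cytobands and one segment carrying a status.
cytobands = [("chr1", 0, 2_300_000, "p36.33"), ("chr1", 2_300_000, 5_300_000, "p36.32")]
segments = [("chr1", 1_000_000, 3_000_000, "loss")]

# Cytobands as A with wa only: cytoband rows come back, the status is not shown.
bands_only = intersect(cytobands, segments, wa=True)

# Segments as A with wa and wb: each segment row is annotated with a cytoband.
annotated = intersect(segments, cytobands, wa=True, wb=True)
```

With cytobands as A and only -wa, the status column never appears in the output, which is acceptable when gains and losses are run as separate B files. If the inputs are swapped, -f is measured against the segment rather than the cytoband; bedtools also offers -F for the fraction of B.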

@jaclyn-taroni
Member Author

Closed via #617
