Update CNV segment to gene mapping: support both formats, use GTF, etc. #253
Looks good, just one question about whether we need to filter chromosomes with the change to exons and a documentation note.
(Also, note that I updated the PR to fix gain/loss status info there)
chroms <- paste0("chr", 1:22)
chrom_filter <- list(tx_chrom = chroms)
With GenomicFeatures::exons(), this should not be necessary, as no exon is mapped to multiple/alternate chromosomes. I don't know if any of the calls fall on non-canonical chromosomes, but we might not want to exclude them at this point.
My thought was it might be more efficient if we drop anything outside this filter, but I have no evidence whatsoever to suggest that I am right about that. I will take it out.
Okay - this step in CI takes much longer (granted I made other changes), but I'm thinking of implementing a filter for CI only per https://github.com/AlexsLemonade/OpenPBTA-analysis#passing-variables-only-in-ci.
To close the loop - that wasn't the issue and I was looking at the wrong branch 🙃 I will leave in the filtering changes though
New format is:
Purpose/implementation
Here I'm making a number of updates to the focal CN file preparation:
analyses/focal-cn-file-preparation/01-prepare-cn-file.R
exons rather than genes
exons change) to gene symbols and cytobands
Issue
Related to #186 and to #217
Directions for reviewers
01-prepare-cn-file.R should be looked at carefully.
org.Hs.eg.db package, but this was very easy to implement. Thoughts?
Results
No results yet, but I did make some scientific decisions worth noting here about how gain and loss are defined broadly in the CNVkit output:
a) if copy number is less than ploidy, the call is marked as a loss
b) if copy number is greater than ploidy, the call is marked as a gain
In analyses/focal-cn-file-preparation/01-prepare-cn-file.R, I change any gain where copy number is greater than (ploidy * 2) to be labeled as an amplification.
The output file format is now:
which is a bit different from what was discussed on #186
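For reviewers who want to sanity-check the gain/loss/amplification rules described above, here is a minimal base R sketch. It is not the code in 01-prepare-cn-file.R: the function name `classify_cn_status` and its arguments are hypothetical, and the "neutral" label for copy number equal to ploidy is an assumption, since the description only defines loss, gain, and amplification.

```r
# Hypothetical helper for illustration only; not taken from 01-prepare-cn-file.R.
# copy_number: numeric vector of copy number calls
# ploidy: a single numeric value, or a vector the same length as copy_number
classify_cn_status <- function(copy_number, ploidy) {
  # Start from NA so that missing inputs stay missing
  status <- rep(NA_character_, length(copy_number))
  # ASSUMPTION: copy number equal to ploidy is labeled "neutral"
  status[which(copy_number == ploidy)] <- "neutral"
  # a) copy number less than ploidy is a loss
  status[which(copy_number < ploidy)] <- "loss"
  # b) copy number greater than ploidy is a gain
  status[which(copy_number > ploidy)] <- "gain"
  # Any gain with copy number greater than (ploidy * 2) becomes an amplification
  status[which(copy_number > 2 * ploidy)] <- "amplification"
  status
}

example_status <- classify_cn_status(c(1, 2, 3, 4, 5, NA), ploidy = 2)
print(example_status)
```

Note that the comparison is strict, so with ploidy 2 a copy number of exactly 4 stays a gain and only values above 4 are relabeled as amplifications.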
Docker and continuous integration
This has already been addressed in earlier pull requests.