-
Notifications
You must be signed in to change notification settings - Fork 67
Update/Revamp focal-cn-file-preparation
module
#452
Update/Revamp focal-cn-file-preparation
module
#452
Conversation
…into speed_up_granges
- compare cytoband calls - update `README` - update comment in shell script and add `exon_file` option - Incorporate GISTIC calls in final cn tables - rerun `01-prepare-cn-file.R`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool beans! We are getting there!! 🚀
copy_number = cnv_gr$copy_number[overlaps_ucsc@from], | ||
ploidy = cnv_gr$tumor_ploidy[overlaps_ucsc@from], | ||
ensembl = cnv_gr$org_id[overlaps_ucsc@from], | ||
cytoband = ucsc_cytoband_gr@elementMetadata@listData$cytoband[overlaps_ucsc@to], | ||
stringsAsFactors = FALSE | ||
) %>% | ||
dplyr::distinct() %>% |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's this distinct doing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Were you able to remove this distinct or determine what it was doing, @cbethell ?
@cbethell @cansavvy - as you are working on this, keep in mind that there may be a period of time where we have a consensus SEG file that we want to annotate but no GISTIC file yet. (See #186 (comment) for more context.) Are there ways to make the GISTIC steps optional here? That change can be implemented in a later pull request, but it's a good to keep an eye on any design decisions here that may make that more difficult to implement later. |
…into speed_up_granges
- rename instances of `exon` to `cds` - update comments - add large files (`gistic-results` and `results/cnvkit_annotated_cn_autosomes.tsv.gz` which is 101.2 MB) to `.gitignore` - add inclusion of gistic results as an option - update shell script to include gistic option - remove `cytoband_org` variable - update `README.md` - refactor section of script reading in and wrangling GISTIC data
- minor update to comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cbethell We are getting there! You did a good job trimming out some of the duplicate items! There's a few documentation adds I think we should make and I wanna take a look at your file that keeps changing sizes because I wanna get to the bottom of that before we call it a day.
# | ||
# Usage: bash run-prepare-cn.sh | ||
|
||
set -e | ||
set -o pipefail | ||
|
||
XYFLAG=${OPENPBTA_XY:-1} | ||
cds_file=../../scratch/gencode.v27.primary_assembly.annotation.bed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been meaning to do this where I have this in snv-callers
but we should probably include an indicator in this file name that tells people it's been filtered to CDS sequences only. Because it is a scratch file and isn't meant to stick around, I haven't made it a priority to change this. My bad.
- update comments - rename gene id column - update comment in shell script - update column documentation in README - run styler
- remove code in `.gitignore` file re large `cnvkit_annotated_autosomes.tsv.gz` file that no longer exists - remove `gzip` step in `run-prepare-cn.sh` as this is handled in the R script
- update `run-oncoprint.sh`, `00-subset-files-for-ATRT.R`, `02-HGG-molecular-subtyping-subset-files.R`
@cansavvy I made the above documentation changes and changed the format of the output compressed files to Let me know if you find something that I may have missed or that may need some refactoring. |
@cbethell has the file stopped changing sizes now? Were you able to get to the bottom of that? |
@cansavvy the file was always the same size, it seems that one of the columns just compressed differently. I realized that the smaller file that I thought was a new file at the end of the day yesterday was really just the old compressed file so it all makes sense now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool beans. This LGTM then! Assuming CircleCI passes.
…SEG case (#479) * Back to exons; before GISTIC Pre: #452 * Have expression script use gzipped; skip for now * Renumber scripts and notebooks * Add notebook for preparing consensus SEG file Add ploidy! * Make the cnvkit option name more general * Rip out CDS steps * Rip out GISTIC option * Add back in the GTF option * Remove the CDS + cytoband files * Rename ID column in consensus prep * Add consensus annotation to shell script; rerun * Remove numbering from RNA expression step * Update module README * Update modules at a glance entry * Add section summarizing seg.mean by status * Fix downstream modules * Update documentation to go back to gzipped file * Get a bit smarter about paths and filenames Also rerun + use $XYFLAG for the consensus step * Use compressed consensus SEG file * Rerun with new gzipped consensus SEG * Skip consensus file annotation in CI * Apply suggestions from code review Co-Authored-By: Candace Savonen <cansav09@gmail.com> * Update documentation * Remove outdated comment * Flesh out example usage Co-authored-by: Candace Savonen <cansav09@gmail.com>
Purpose/implementation Section
Per #186 (comment), update and revamp the
focal-cn-file-preparation
module.What GitHub issue does your pull request address?
This PR addresses issue #186.
Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.
Which areas should receive a particularly close look?
Is there anything that you want to discuss further?
Note: I’m summarizing the GISTIC
broad_values_by_arm.txt
calls intoloss
,gain
, andneutral
, andamplification
in thebroad_status
column, the same coding used for the values in thestatus
column.Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?
Yes, this is ready for review.
Results
The results in this PR include the updated tsv files found in this module's
results
directory as follows:analyses/focal-cn-file-preparation/results/cnvkit_annotated_cn_autosomes.tsv.gz
analyses/focal-cn-file-preparation/results/cnvkit_annotated_cn_x_and_y.tsv.gz
analyses/focal-cn-file-preparation/results/controlfreec_annotated_cn_autosomes.tsv.gz
analyses/focal-cn-file-preparation/results/controlfreec_annotated_cn_x_and_y.tsv.gz
Reproducibility Checklist
Documentation Checklist
README
and it is up to date.analyses/README.md
and the entry is up to date.