Update RNA expression of copy number losses #493

cbethell · 2020-01-31T16:03:54Z

Purpose/implementation Section

The purpose of this PR is to update/revamp the analysis involving RNA expression of copy number losses.
Command line options are implemented, the expression files are updated to use the collapsed files found in the data directory, and the custom functions are refactored.

What scientific question is your analysis addressing?

Do the RNA expression values of genes with copy number loss calls support calling them as losses?

What GitHub issue does your pull request address?

This PR addresses issue #387.

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

The `calculate_z_score` and `merge_expression` functions in the `rna-expression-functions.R` script were edited in this PR to speed up the analysis. These functions should be checked for correctness.

Note: For plotting, I am now using the z-score values (as opposed to the log-transformed values that were used before this PR). I noticed that we were calculating the z-scores but not actually using them. If there are any objections, I can revert this change.

Note 2: The plots using the stranded expression file in this PR were generated with the subsetted file in the testing directory (the script does not run successfully on my local machine because of the size of the stranded data).
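For reviewers checking the logic, the "log2-transform, then z-score the whole matrix first" idea can be sketched as below. This is a minimal, standard-library Python illustration, not the R code in `rna-expression-functions.R`. The pseudocount of 1, the per-gene (row-wise) scaling, the use of the sample standard deviation, and the name `log2_zscore_rows` are all assumptions made for the example.

```python
import math
import statistics


def log2_zscore_rows(matrix):
    """Log2-transform a gene -> [values per sample] mapping, then z-score each gene.

    Assumptions (not taken from the PR): a pseudocount of 1 before log2,
    z-scores computed per gene across samples, sample standard deviation.
    """
    result = {}
    for gene, values in matrix.items():
        logged = [math.log2(v + 1) for v in values]
        mean = statistics.mean(logged)
        sd = statistics.stdev(logged)
        if sd == 0:
            # A gene with no variation carries no signal; report zeros
            # instead of dividing by zero.
            result[gene] = [0.0 for _ in logged]
        else:
            result[gene] = [(x - mean) / sd for x in logged]
    return result


# Toy expression matrix: two hypothetical genes measured in four samples.
expression = {
    "GENE_A": [0.0, 1.0, 3.0, 7.0],
    "GENE_B": [5.0, 5.0, 5.0, 5.0],
}
z_scores = log2_zscore_rows(expression)
print(z_scores["GENE_A"])
```

Transforming the full matrix once, before any merge with the copy number calls, avoids recomputing the same per-gene statistics for every merged row, which is presumably where the speedup comes from.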

Is there anything that you want to discuss further?

Are there any suggestions for plots that may better display the RNA expression of the copy number loss instances? (I am currently looking into other possible options.)
There are 6 plots generated per annotated CN file, for a total of 36 plots in this module's plots directory. Should we keep only the plots that use the consensus SEG annotated file? Doing so would also shorten this module's run time, because it would no longer need to loop through the larger CNVkit and Controlfreec annotated files. My idea is to put the annotated consensus files in their own directory and have the shell script loop over the files there.
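The "loop over a directory that holds only the annotated consensus files" proposal could look roughly like the sketch below. It is written in Python for illustration, whereas the module itself uses a shell script. The directory contents, the file names, and both command line flags are hypothetical placeholders, not the actual options of `rna-expression-validation.R`. The commands are only assembled, not executed.

```python
import tempfile
from pathlib import Path


def build_commands(consensus_dir, expression_files):
    """Return one Rscript argument list per (annotated CN file, expression file) pair."""
    commands = []
    # Only files matching the pattern are picked up, so the larger
    # caller-specific files are skipped simply by not living in this directory.
    for cn_file in sorted(Path(consensus_dir).glob("*.tsv.gz")):
        for expression_file in expression_files:
            commands.append([
                "Rscript", "rna-expression-validation.R",
                "--annotated_cnv_file", str(cn_file),   # hypothetical flag name
                "--expression_file", expression_file,   # hypothetical flag name
            ])
    return commands


# Demonstration with a throwaway directory and made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("consensus_autosomes.tsv.gz", "consensus_x_and_y.tsv.gz", "notes.txt"):
        (Path(tmp) / name).touch()
    commands = build_commands(tmp, ["polya.rds", "stranded.rds"])

for command in commands:
    print(" ".join(command))
```

Keeping the selection in the directory layout rather than in a list of file names means new consensus files are picked up without editing the script.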

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Yes, this PR is ready for review.

Results

What types of results are included (e.g., table, figure)?

The results include plots for CNVkit, Controlfreec, and the consensus annotated files for each expression file (polyA and stranded).

What is your summary of the results?

Reproducibility Checklist

No new dependencies are included in this PR.

The dependencies required to run the code in this pull request have been added to the project Dockerfile.
This analysis has been added to continuous integration.

Documentation Checklist

I will update the README to include the plots generated with this module once we determine which plots we want to keep and/or remove.

This analysis module has a README and it is up to date.
This analysis is recorded in the table in analyses/README.md and the entry is up to date.
The analytical code is documented and contains comments.


- run script using polyA expression data for each of the files in `results` (stranded was not run because it could not complete on my local machine due to memory limits)
- introduce a for loop in the shell script that runs through each of the files in `results` for polyA and stranded expression data (this is commented out until the consensus file has a subset that can be used in CircleCI)
- replace old plots with new plots reflecting the z-scored polyA data


jaclyn-taroni

👍 I think this is correct and headed in the direction we want to go (e.g., skipping original caller files by default). As you noted on #387, we are looking to add density plots. We would not necessarily expect all losses to show changes when we look at z-scores (it depends on the gene!) and I have some ideas about where we would go next that I want to marinate on and discuss next week.

- update `README` files to reflect changes to the focal -cn-file-preparation module in PR #493 Co-authored-by: Jaclyn Taroni <jaclyn.n.taroni@gmail.com>

cbethell and others added 8 commits January 31, 2020 10:40

Implement command line options

f3b64af

- implement command line options in `rna-expression-validation.R` and `run-prepare-cn.sh` scripts
- adjust custom functions scripts to reflect command line options

Use collapsed expression files

c10c469

Make file path to independent specimens file a command line option

34db409

Perform log2 transformation and zscoring on matrix first

bc166da

- perform log transforming and zscoring on matrix first in `calculate_z_score` function
- propagate changes in `merge_expression` function and in `rna-expression-validation.R` script

Merge branch 'master' into revamp-focal-cn-expression

36447e9

run script with testing (subsetted) stranded expression files

5ec3602

- uncomment for loop in shell script
- add stranded plots for cnvkit, controlfreec, and consensus annotated files (autosomes and x/y files)

Merge branch 'master' into revamp-focal-cn-expression

2b95697

cbethell marked this pull request as ready for review February 3, 2020 15:26

cbethell changed the title ~~WIP: Update RNA expression of copy number losses~~ Update RNA expression of copy number losses Feb 3, 2020

cbethell requested a review from jaclyn-taroni February 3, 2020 16:48

jaclyn-taroni and others added 12 commits February 4, 2020 14:28

Focus on running consensus SEG file

74fb5d8

Forgot filename lead

b74210b

Add for loop to go through both annotated consensus files

ff4c0c6

- add space in square bracket to fix syntax warning

Add expression plotting for original caller files

d32c4a6

WIP: filter out conflicting status, copy number

9298d30

Use duplicate filtered copy number data.frame in function argument

cbe456c

- fix object name in `rna-expression-functions.R` script
- run `rna-expression-validation.R` script on cnvkit autosome annotated data with polyA expression

rerun rna-expression-validation.R script for remaining files

6f26b65

- updated `display-plots.md` to display only the consensus seg file plots

Merge branch 'master' into revamp-focal-cn-expression

d555a26

Update title of display-plots.md

0032a62

Merge branch 'master' into revamp-focal-cn-expression

e8e8f72

Make the change back to cn_df to save on memory

6a22e89

Rerun everything with full v14 data

aa70b48

cbethell added a commit to cbethell/OpenPBTA-analysis that referenced this pull request Feb 6, 2020

Update analyses/README and module's README to reflect changes

c31077f

- update `README` files to reflect changes to the focal-cn-file-preparation module in PR AlexsLemonade#493

cbethell mentioned this pull request Feb 6, 2020

Update focal-cn-file-preparation README #525

Merged

5 tasks

cbethell and others added 2 commits February 6, 2020 15:07

Remove flags in .circleci that are no longer defined in shell script

66054c1

Merge branch 'master' into revamp-focal-cn-expression

b6e67a8

cbethell mentioned this pull request Feb 7, 2020

Updated analysis: RNA expression of copy number losses #387

Open

Merge branch 'master' into revamp-focal-cn-expression

90f01cb

jaclyn-taroni approved these changes Feb 7, 2020


jaclyn-taroni merged commit 8b8ba2b into AlexsLemonade:master Feb 7, 2020
