This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Updated analysis: RNA expression of copy number losses #387

Open
jaclyn-taroni opened this issue Dec 31, 2019 · 2 comments
Assignees
Labels
cnv Related to or requires CNV data improvement in progress Someone is working on this issue, but feel free to propose an alternative approach! transcriptomic Related to or requires transcriptomic data updated analysis

Comments

@jaclyn-taroni
Member

What analysis module should be updated and why?

focal-cn-file-preparation, specifically the 02-rna-expression-validation.R.

Why should the module be updated?

As noted in #367 (comment), this step gets OOM-killed for me locally and we now include collapsed RNA-seq matrices in the data download that this script could use.

What changes need to be made? Please provide enough detail for another participant to make the update.

At the moment, it is not clear to me which step(s) require a lot of RAM or take a long time to run. This file is the first place I would look: https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/master/analyses/focal-cn-file-preparation/util/rna-expression-functions.R

What input data should be used? Which data were used in the version being updated?

Previously, RSEM FPKM files were used. I propose that we use pbta-gene-expression-rsem-fpkm-collapsed.polya.rds and pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds.

We are also currently using the ControlFreeC file produced by that module (analyses/focal-cn-file-preparation/results/controlfreec_annotated_cn_autosomes.tsv.gz), but we will eventually want to move to using some kind of consensus file (#128).

When do you expect the revised analysis will be completed?

This may take 1-3 days.

Who will complete the updated analysis?

Not sure.

@jaclyn-taroni added the cnv and transcriptomic labels on Jan 18, 2020
@jaclyn-taroni
Member Author

The focal-cn-file-preparation module is being revamped (and sped up!) over on #452. It's worth noting that those revisions may have addressed the major bottleneck for that analysis module. I think profiling and improving the code for the RNA-seq expression levels of losses is a good step to follow #452. As such, I am going to mark this in progress and assign @cbethell. As noted above, we also want to look at the expression levels for copy number losses that are in the consensus calls.

@cbethell
Contributor

cbethell commented Feb 7, 2020

In addition to the plots in PR #493, density plots were generated using the consensus SEG autosomes files, annotated in the focal-cn-file-preparation module. These plots use z-scored expression values (polyA and stranded expression files are handled separately) to look at the density of copy number calls, specifically looking to validate the losses.
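For readers who want a concrete picture of the z-scoring step, here is a minimal sketch in plain Python. It is not the module's actual code (that lives in the R scripts): the function names, the toy FPKM values, and the `"loss"`/`"neutral"` status strings are all hypothetical. It assumes expression is z-scored per gene across samples using the sample standard deviation, and that the function is run once per library type (polyA, stranded) so the two are never mixed.

```python
from collections import defaultdict
from statistics import mean, stdev


def zscore_by_gene(expression):
    """Z-score each gene's expression across samples.

    expression: dict mapping gene -> {sample_id: expression value}.
    Genes with fewer than two samples or zero variance are dropped,
    since a z-score is undefined for them.
    """
    zscores = {}
    for gene, by_sample in expression.items():
        values = list(by_sample.values())
        if len(values) < 2:
            continue
        mu = mean(values)
        sd = stdev(values)  # sample standard deviation (n - 1)
        if sd == 0:
            continue
        zscores[gene] = {s: (v - mu) / sd for s, v in by_sample.items()}
    return zscores


def zscores_by_status(zscores, cn_calls):
    """Collect z-scores by copy number status.

    cn_calls: dict mapping (gene, sample_id) -> status string.
    Calls without a matching z-score are skipped.
    """
    grouped = defaultdict(list)
    for (gene, sample), status in cn_calls.items():
        if gene in zscores and sample in zscores[gene]:
            grouped[status].append(zscores[gene][sample])
    return dict(grouped)


# Hypothetical toy data for a single library type (e.g. stranded only).
expression = {
    "GENE_A": {"S1": 10.0, "S2": 12.0, "S3": 2.0},
    "GENE_B": {"S1": 5.0, "S2": 5.0, "S3": 5.0},  # zero variance, dropped
}
cn_calls = {
    ("GENE_A", "S1"): "neutral",
    ("GENE_A", "S2"): "neutral",
    ("GENE_A", "S3"): "loss",
    ("GENE_B", "S3"): "loss",
}

z = zscore_by_gene(expression)
grouped = zscores_by_status(z, cn_calls)
```

The per-status lists in `grouped` are what a density plot would be drawn from. If the loss calls are valid, the loss distribution should sit to the left of the neutral one.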

The rendered notebook can be seen here.

The first two plots in the notebook look at calls across all genes in the annotated consensus SEG files. In these plots, there does not appear to be much differentiation between neutral and loss calls. Below them are faceted plots focusing on each driver gene. In some instances, these plots agree with the plots above; in others, they look slightly more like what we would expect (e.g., MET).
