This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Planned data release: V5 #121

Closed
jaclyn-taroni opened this issue Sep 20, 2019 · 8 comments

Comments

@jaclyn-taroni
Member

jaclyn-taroni commented Sep 20, 2019

Currently planned for 24-Sept-2019

Planned addition + changes from @jharenza :

@jaclyn-taroni
Member Author

@jharenza we may want to split the expression data files up by RNA library, based on @jashapiro's results on #120 - notebook here

@jharenza
Collaborator

ok - separate RDS files for FPKM from each algorithm, as well as counts?

@jaclyn-taroni
Member Author

That seems like it would be the most flexible way to go as far as downstream options if it's straightforward to do.

@jharenza
Collaborator

Ok, yeah once we merge, I can just separate into two RDS files.

@jharenza
Collaborator

> @jharenza we may want to split the expression data files up by RNA library, based on @jashapiro's results on #120 - notebook here

Thinking about this again: would it be easier for those working on RNA to do this separation themselves using the clinical file? I ask because many other analyses that use these files may have to recombine them (e.g., the fusion workflow will use both stranded and polyA, and if we look for evidence of CNV deletions on the basis of RNA expression < 1 FPKM, we would recombine these files). I agree we should not cluster or perform certain analyses on the two together, but I'm trying to think through what makes more sense: for the specific analyses affected by strandedness to do the separation, or for us to separate everything now and make sure other users know they may have to recombine. Thoughts?
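For reference, separating by library on the analysis side could look roughly like the sketch below. It is only an illustration: the column names `sample_id` and `RNA_library`, the sample and gene names, and the in-memory table layout are all assumptions, not the actual format of the release files.

```python
from collections import defaultdict


def split_by_library(expression, clinical,
                     id_key="sample_id", library_key="RNA_library"):
    """Split a gene-by-sample table into one table per RNA library type.

    expression: {gene: {sample_id: fpkm}}
    clinical:   list of row dicts (hypothetical column names, see defaults)
    """
    library_of = {row[id_key]: row[library_key] for row in clinical}
    split = defaultdict(dict)
    for gene, values in expression.items():
        for sample, fpkm in values.items():
            # Raises KeyError if a sample has no row in the clinical table.
            library = library_of[sample]
            split[library].setdefault(gene, {})[sample] = fpkm
    return dict(split)


# Made-up example data, for illustration only.
clinical = [
    {"sample_id": "S1", "RNA_library": "stranded"},
    {"sample_id": "S2", "RNA_library": "poly-A"},
    {"sample_id": "S3", "RNA_library": "stranded"},
]
expression = {
    "GENE_A": {"S1": 0.4, "S2": 12.0, "S3": 3.1},
    "GENE_B": {"S1": 7.5, "S2": 0.2, "S3": 0.0},
}
by_library = split_by_library(expression, clinical)
```

Whichever side does the split, the lookup through the clinical table is the same; the open question is only who runs it.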

@jashapiro
Member

I would still support separating the files and having users combine them as needed. This means that the user will have to explicitly acknowledge that they are combining two different data sets, which may give them a bit of pause to consider whether that is appropriate. For example, the meaning of FPKM < 1 is likely to be different in the two data sets, and using the same cutoffs may not be appropriate.
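To illustrate what an explicit recombination might look like, here is a minimal sketch in which the caller has to supply a cutoff for each library before the two sets are combined. The function name, the data layout, and the cutoff values are hypothetical placeholders, not recommended thresholds.

```python
def low_expression_calls(by_library, cutoffs):
    """Flag (gene, sample) pairs below a library-specific FPKM cutoff.

    by_library: {library: {gene: {sample_id: fpkm}}}
    cutoffs:    {library: fpkm_cutoff}; must cover every library present
    """
    missing = set(by_library) - set(cutoffs)
    if missing:
        # Refuse to combine data sets unless each has its own cutoff.
        raise ValueError(f"no cutoff given for libraries: {sorted(missing)}")
    calls = {}
    for library, table in by_library.items():
        for gene, values in table.items():
            for sample, fpkm in values.items():
                calls[(gene, sample)] = fpkm < cutoffs[library]
    return calls


# Made-up values; the cutoffs here are placeholders only.
example = {
    "stranded": {"GENE_A": {"S1": 0.4, "S3": 3.1}},
    "poly-A": {"GENE_A": {"S2": 0.8}},
}
calls = low_expression_calls(example, {"stranded": 1.0, "poly-A": 0.5})
```

With these placeholder numbers, sample `S2` (0.8 FPKM, poly-A) is not flagged, although it would be under a shared cutoff of 1.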

@jharenza
Collaborator

@jashapiro - makes sense! We will plan on separating them then.

@jharenza
Collaborator

@cgreene @jaclyn-taroni @jashapiro - we have the CHANGELOG and all files except the transcript counts file staged and ready to go. That merge is taking longer than anticipated, so @yuankunzhu will either create the PR without that file later tonight and add it tomorrow, or just create the whole PR tomorrow morning.


3 participants