Planned data release: V5 #121

jaclyn-taroni · 2019-09-20T20:00:19Z

Currently planned for 24-Sept-2019

Planned additions + changes from @jharenza:

Vardict - VEP annotated MAF (related: vardict and lancet variant calls on deck #103)
Lancet - VEP annotated MAF (related: vardict and lancet variant calls on deck #103)
Updated arriba fusion file (related: (PR 1 of 5) Fusion Analysis: filtering #92 (comment))
RSEM gene level count matrix
RSEM transcript level count matrix (related: Planned Analysis: Isoform analysis / patterns #14 (comment))
Updated lumpy file - to fix T/N columns (related: Planned Analysis: Integrated CNV and SV analyses and chromothripsis #27 (comment))

jaclyn-taroni · 2019-09-20T20:02:48Z

@jharenza we may want to split the expression data files up by RNA library type, based on @jashapiro's results on #120 - notebook here

jharenza · 2019-09-20T20:09:24Z

ok - separate RDS files for FPKM from each algorithm, as well as counts?

jaclyn-taroni · 2019-09-20T20:17:52Z

That seems like it would be the most flexible way to go as far as downstream options if it's straightforward to do.

jharenza · 2019-09-20T20:18:45Z

Ok, yeah once we merge, I can just separate into two RDS files.

jharenza · 2019-09-23T00:42:04Z

> @jharenza we may want to split the expression data files up based on the RNA library based on @jashapiro 's results on #120 - notebook here

Thinking about this again: would it be easier for those working on RNA to do this separation using the clinical file? I ask because many other analyses using these files may have to recombine them (e.g., the fusion workflow will use both stranded and polyA, and if we look for CNV deletion evidence on the basis of RNA expression < 1 FPKM, we would recombine these files). I agree we should not cluster or perform certain analyses on the two library types together, but I'm trying to think through what makes more sense: having the specific analyses affected by strandedness do the separation themselves, or separating everything now and making sure other users know they may have to recombine. Thoughts?
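To make the "separate using the clinical file" option concrete, here is a minimal Python sketch of what a downstream user would have to do. It is an illustration under stated assumptions, not the project's code: the function name, the sample IDs, the gene names, and the in-memory layout (nested dicts rather than RDS files) are all hypothetical. Only the two library labels, stranded and polyA, come from this thread.

```python
from collections import defaultdict


def split_by_library(expression, library_of):
    """Split a gene-by-sample table into one table per RNA library.

    expression: dict of gene -> dict of sample ID -> expression value
    library_of: dict of sample ID -> library label, as it might be read
                from a clinical file (hypothetical layout)
    """
    split = defaultdict(dict)
    for gene, values in expression.items():
        for sample_id, value in values.items():
            # Raises KeyError if a sample is missing from the clinical data,
            # so unlabeled samples are never silently assigned to a library.
            library = library_of[sample_id]
            split[library].setdefault(gene, {})[sample_id] = value
    return dict(split)


# Toy data; every ID and value here is made up for illustration.
library_of = {"S1": "stranded", "S2": "polyA", "S3": "stranded"}
expression = {
    "GENE_A": {"S1": 0.4, "S2": 12.0, "S3": 3.1},
    "GENE_B": {"S1": 7.5, "S2": 0.2, "S3": 0.0},
}
by_library = split_by_library(expression, library_of)
```

Either way the split itself is small; the open question is who performs it and whether the combined file makes it too easy to forget.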

jashapiro · 2019-09-23T01:14:18Z

I would still support separating the files and having users combine them as needed. This means that the user will have to explicitly acknowledge that they are combining two different data sets, which may give them a slight pause to consider whether that is appropriate. For example, the meaning of FPKM < 1 is likely to be different in the two data sets, and using the same cutoffs may not be appropriate.
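As a rough illustration of what "explicitly combining" could look like, the Python sketch below applies a low-expression call across both libraries while requiring a cutoff to be chosen for each one. Everything in it is an assumption made for the example: the function name, the sample and gene IDs, the FPKM values, and the cutoff values are hypothetical and are not recommendations.

```python
def low_expression_calls(tables, cutoffs):
    """Flag (gene, sample) pairs whose FPKM is below a per-library cutoff.

    tables:  dict of library label -> gene -> sample ID -> FPKM
    cutoffs: dict of library label -> FPKM cutoff for that library
    """
    calls = {}
    for library, table in tables.items():
        # Raises KeyError when no cutoff was chosen for a library, which
        # forces the caller to make that decision explicitly.
        cutoff = cutoffs[library]
        for gene, values in table.items():
            for sample_id, fpkm in values.items():
                calls[(gene, sample_id)] = fpkm < cutoff
    return calls


# Toy data; IDs, values, and cutoffs are illustrative only.
tables = {
    "stranded": {"GENE_A": {"S1": 0.4, "S3": 3.1}},
    "polyA": {"GENE_A": {"S2": 0.8}},
}
cutoffs = {"stranded": 1.0, "polyA": 0.5}
calls = low_expression_calls(tables, cutoffs)
```

With separate files, a combined analysis has to pass through a step like this one, which is where the question of shared versus per-library cutoffs gets asked.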

jharenza · 2019-09-23T01:42:46Z

@jashapiro - makes sense! We will plan on separating them then.

jharenza · 2019-09-24T21:23:21Z

@cgreene @jaclyn-taroni @jashapiro - we have the CHANGELOG and all files except the transcript counts file staged and ready to go. That merge is taking longer than anticipated, so @yuankunzhu will either create the PR without that file later tonight and add it tomorrow, or just create the whole PR tomorrow morning.

jaclyn-taroni added the planned data release label Sep 20, 2019

This was referenced Sep 23, 2019

Update create-subset-files post v5 release #124

Closed

Planned Analysis: Unsupervised analysis of transcriptomic differences between different cancer types #9

Closed

Sex prediction from RNASeq #123

Merged

yuankunzhu mentioned this issue Sep 25, 2019

Data release/add v5 release #127

Merged

jharenza closed this as completed Sep 27, 2019

This was referenced Sep 27, 2019

Ss gsea hallmark #133

Merged

Update how subset files for CI are generated #136

Merged
