Planned data release: V11 #287

jharenza · 2019-11-22T22:15:43Z

What data file(s) does this issue pertain to?

V10 collapsed files were tables of genes removed, not the collapsed matrices, #248

What release are you using?

V10

Put a link to the relevant section of the OpenPBTA-manuscript here.

NA

Put your question or report your issue here.

Planned data to release:

- RNA-Seq collapsed FPKM files (Collapsed RNA-seq matrices with unique gene symbols, #248)
- Putative oncogenic fusion TSV (Planned Analysis: Filter and Annotate Fusions, #39)

jashapiro · 2019-11-22T22:22:06Z

As long as there is an update planned, I will be able to provide updated consensus SNV files (with MNVs included).

jaclyn-taroni · 2019-11-23T11:32:48Z

Related: #275

jaclyn-taroni · 2019-11-25T19:55:20Z

New consensus files from @jashapiro: https://open-pbta.s3.us-east-1.amazonaws.com/data/snv-consensus/snv-consensus-20191125.zip

jharenza · 2019-11-26T12:56:30Z

Re: putative oncogenic fusion (above), I am thinking we release the final prioritized list for the oncoprints, which would also become a supplemental table (I guess we add to the manuscript doc later?). cc: @jaclyn-taroni

jaclyn-taroni · 2019-11-26T14:27:25Z

To clarify, you are referring to this file: https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/49acc98f5ffd86853fc70f220623311e13e3ca9f/analyses/fusion_filtering/results/PutativeOncogenicFusion.tsv correct @jharenza?

What are you going to name this file? I lean towards something like pbta-fusion-putative-oncogenic.tsv.

Regarding the new consensus mutation files, I think since we will probably put the pbta-fusion-putative-oncogenic.tsv directly in the release folder, we should make some changes to how the consensus mutation files are distributed.

The file structure of snv-consensus-20191125.zip is as follows:

.
├── README.md
├── consensus_mutation.maf.tsv
└── consensus_mutation_tmb.tsv

Could we rename the contents to:

- consensus_mutation.maf.tsv -> pbta-snv-consensus-mutation.maf.tsv (+ possibly compress this file)
- consensus_mutation_tmb.tsv -> pbta-snv-consensus-mutation-tmb.tsv

And then include these in the "top directory" of the release folder? I propose we move the information from the README file in that zip into 1) the release notes for this release and 2) the main README of the repository. We may also want to include a Markdown file with the headers for these consensus files in doc/format, as the first one is "MAF-like."
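For illustration only, the rename proposed above could be sketched in Python as follows. The function name `extract_and_rename` and the idea of copying straight from the zip into a release directory are assumptions made for this sketch, not part of the actual release tooling; only the two file name mappings come from this comment.

```python
import zipfile
from pathlib import Path

# Mapping proposed in this comment: name inside the zip -> name in the release folder.
RENAMES = {
    "consensus_mutation.maf.tsv": "pbta-snv-consensus-mutation.maf.tsv",
    "consensus_mutation_tmb.tsv": "pbta-snv-consensus-mutation-tmb.tsv",
}


def extract_and_rename(zip_path, release_dir):
    """Copy the mapped zip members into release_dir under their new names.

    Hypothetical helper: members without a mapping (such as README.md) are
    skipped, since their content would move into the release notes instead.
    """
    release_dir = Path(release_dir)
    release_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            new_name = RENAMES.get(Path(member).name)
            if new_name is None:
                continue
            (release_dir / new_name).write_bytes(archive.read(member))
            written.append(new_name)
    return sorted(written)
```

Calling `extract_and_rename("snv-consensus-20191125.zip", "release")` would place the two renamed TSV files directly in the top directory of a `release` folder (the folder name is a placeholder).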

It may behoove us to create a section under Data Formats that talks specifically about these "derivative" data files (e.g., prioritized fusion list, consensus mutation files, collapsed expression matrices and maybe even the independent specimens files) and how we expect folks to use them.

I want to note that I'd prefer we really consider the documentation changes and wait until after the holiday to release v11, rather than getting this out today without them. We're also going to have to make a bunch of changes to how we generate the CI files before anything consuming these files can get reviewed and merged.

jaclyn-taroni · 2019-11-26T14:35:00Z

Moving the discussion over from #248 -- can we include the collapsed/summarized matrices and call them pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds and pbta-gene-expression-rsem-fpkm-collapsed.polya.rds please?

This is what I would expect based on reading the comments in the code that generated them, and ideally we will link to that code in the new section under Data Formats that I proposed in the comment above.

I think we can remove the following files that were included in v10 from the download: pbta-gene-expression-rsem-fpkm-collapsed_table.polya.rds and pbta-gene-expression-rsem-fpkm-collapsed_table.stranded.rds

Sounds like @komalsrathi will be filing a pull request with some additional analyses today (#248 (comment)). We can link to the notebook/script that gets added in Data Formats as well and direct folks to the analyses/collapse-rnaseq/pbta-gene-expression-rsem-fpkm-collapsed_table.polya.rds and analyses/collapse-rnaseq/pbta-gene-expression-rsem-fpkm-collapsed_table.stranded.rds files if they need more information.

jharenza · 2019-11-26T18:24:43Z

@jaclyn-taroni yes to the renaming and the putative oncogenic fusion file. @yuankunzhu is on it!

jaclyn-taroni · 2019-11-26T18:34:12Z

Great, thank you. If that pull request allows edits from maintainers, we can have someone on our side make the doc/format change + consensus mutation README additions.

jharenza · 2019-12-02T21:53:44Z

closed with #293

### release-v12-20191217
- release date: 2019-12-17
- status: available
- changes:
  - Add `data-file-descriptions.md` with data release to better track file types, origins, and workflows per [#334](#334) and [#336](#336)
  - Add stranded RNA-Seq for 23 PNOC samples and 21 CBTTC samples previously sequenced using a polyA library prep. Files updated:
    - pbta-fusion-arriba.tsv.gz
    - pbta-fusion-starfusion.tsv.gz
    - pbta-gene-expression-rsem-tpm.stranded.rds
    - pbta-gene-expression-rsem-fpkm.stranded.rds
    - pbta-isoform-expression-rsem-tpm.stranded.rds
    - pbta-isoform-counts-rsem-expected_count.stranded.rds
    - pbta-gene-counts-rsem-expected_count.stranded.rds
    - pbta-gene-expression-kallisto.stranded.rds
    - pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds
  - Add recurrently-fused genes by histology and matrix of recurrently-fused genes by biospecimen from [fusion filtering and prioritization analysis](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/fusion_filtering)
  - Update consensus TMB files and MAF [#333](#333)
  - Add RNA-Seq [collapsed matrices](#287) - wrong files (tables of transcripts removed) were included with [V10](#273)
  - Rename `WGS.hg38.mutect2.unpadded.bed` to `WGS.hg38.mutect2.vardict.unpadded.bed` to better reflect usage
  - Update `pbta-histologies.tsv` to add new RNA-Seq samples listed above, [#222 harmonize disease separators](#222), and reran [medulloblastoma classifier](https://github.com/d3b-center/medullo-classifier-package) using V12 RSEM fpkm collapsed files
    - BS_2Z1MKS84, BS_5VQP0E6K re-classified from Group4 to WNT and BS_3BDAG9YN, BS_8T7DZV2F, and BS_JTMXAMB7 re-classified from Group3 to WNT
  - Add CNVkit GISTIC results focal CN analyses, e.g., [#244](#244) and [#8](#8)

* Release V12 data
* Update release-notes.md fix link
* Update data-files-description.md fix GISTIC table sectioning
* Update data-files-description.md fix spacing on data description table
* Update data-files-description.md fix more spacing in data file description file
* Update download-data.sh add new release date to download script
* Update the TMB file descriptions
* Update TMB file formats section
* Update fusion section of data formats Also more specific description of the by sample file
* Add GISTIC file to data-formats
* Update download-data.sh
* Update download-data.sh
* data description md is also included in md5sum
* TMB exon -> coding sequence
* Coding TMB CDS, not exon

jharenza added the data label Nov 22, 2019

jharenza self-assigned this Nov 22, 2019

jharenza added the planned data release label Nov 22, 2019

yuankunzhu mentioned this issue Nov 26, 2019

V11 Release #293

Merged

This was referenced Nov 27, 2019

Fix: use unzip -o in download script #295

Merged

04 project specific filtering #294

Closed

Updated analysis: snv-callers documentation #296

Closed

Updates to generating CI subset files for v11 #297

Merged

jharenza closed this as completed Dec 2, 2019

jharenza mentioned this issue Dec 17, 2019

Release V12 data #347

Merged
