Planned data release: V11 #287
Comments
As long as there is an update planned, I will be able to provide updated consensus SNV files (with MNVs included).
Related: #275
New consensus files from @jashapiro: https://open-pbta.s3.us-east-1.amazonaws.com/data/snv-consensus/snv-consensus-20191125.zip
Re: putative oncogenic fusion (above), I am thinking we release the final prioritized list for the oncoprints, which would also become a supplemental table (I guess we add to the manuscript doc later?). cc: @jaclyn-taroni
To clarify, you are referring to this file: https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/49acc98f5ffd86853fc70f220623311e13e3ca9f/analyses/fusion_filtering/results/PutativeOncogenicFusion.tsv, correct @jharenza? What are you going to name this file? I lean towards something like
Regarding the new consensus mutation files, I think since we will probably put the
The file structure of
Could we rename the contents to:
And then include these in the "top directory" of the release folder? I propose we move the information in the README file in the folder to probably 1) the release notes for this release and 2) the main README of the repository. We may also want to include a Markdown file with the headers for these consensus files in
It may behoove us to create a section under Data Formats that talks specifically about these "derivative" data files (e.g., prioritized fusion list, consensus mutation files, collapsed expression matrices and maybe even the independent specimens files) and how we expect folks to use them.
I want to note that I'd prefer we really consider the documentation changes and wait until after the holiday to release v11 over getting this out today without documentation changes. We're also going to have to make a bunch of changes to how we generate the CI files before things consuming these files can get reviewed and merged.
Moving the discussion over from #248 -- can we include the collapsed/summarized matrices and call them
This is what I would expect based on reading the comments in the code that generated them, and we will ideally link the code in a new section under
I think we can remove the following files that were included in v10 from the download:
Sounds like @komalsrathi will be filing a pull request with some additional analyses today (#248 (comment)). We can link to the notebook/script that gets added in
@jaclyn-taroni yes to the renaming and the putative oncogenic fusion file. @yuankunzhu is on it!
Great, thank you. If that pull request allows edits from maintainers, we can have someone on our side make the
closed with #293 |
### release-v12-20191217
- release date: 2019-12-17
- status: available
- changes:
  - Add `data-file-descriptions.md` with data release to better track file types, origins, and workflows per [#334](#334) and [#336](#336)
  - Add stranded RNA-Seq for 23 PNOC samples and 21 CBTTC samples previously sequenced using a polyA library prep. Files updated:
    - pbta-fusion-arriba.tsv.gz
    - pbta-fusion-starfusion.tsv.gz
    - pbta-gene-expression-rsem-tpm.stranded.rds
    - pbta-gene-expression-rsem-fpkm.stranded.rds
    - pbta-isoform-expression-rsem-tpm.stranded.rds
    - pbta-isoform-counts-rsem-expected_count.stranded.rds
    - pbta-gene-counts-rsem-expected_count.stranded.rds
    - pbta-gene-expression-kallisto.stranded.rds
    - pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds
  - Add recurrently-fused genes by histology and matrix of recurrently-fused genes by biospecimen from [fusion filtering and prioritization analysis](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/fusion_filtering)
  - Update consensus TMB files and MAF [#333](#333)
  - Add RNA-Seq [collapsed matrices](#287) - wrong files (tables of transcripts removed) were included with [V10](#273)
  - Rename `WGS.hg38.mutect2.unpadded.bed` to `WGS.hg38.mutect2.vardict.unpadded.bed` to better reflect usage
  - Update `pbta-histologies.tsv` to add new RNA-Seq samples listed above, [#222 harmonize disease separators](#222), and rerun [medulloblastoma classifier](https://github.com/d3b-center/medullo-classifier-package) using V12 RSEM fpkm collapsed files
    - BS_2Z1MKS84, BS_5VQP0E6K re-classified from Group4 to WNT and BS_3BDAG9YN, BS_8T7DZV2F, and BS_JTMXAMB7 re-classified from Group3 to WNT
  - Add CNVkit GISTIC results for focal CN analyses, e.g., [#244](#244) and [#8](#8)
* Release V12 data
* Update release-notes.md: fix link
* Update data-files-description.md: fix GISTIC table sectioning
* Update data-files-description.md: fix spacing on data description table
* Update data-files-description.md: fix more spacing in data file description file
* Update download-data.sh: add new release date to download script
* Update the TMB file descriptions
* Update TMB file formats section
* Update fusion section of data formats. Also more specific description of the by sample file
* Add GISTIC file to data-formats
* Update download-data.sh
* Update download-data.sh
* data description md is also included in md5sum
* TMB exon -> coding sequence
* Coding TMB CDS, not exon
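Since the commits above mention that the data description file is also covered by the md5sum, here is a minimal sketch of how someone could check a downloaded release folder against such a checksum manifest. It assumes the manifest is a plain `md5sum`-style text file (`<hash>  <filename>` per line); the manifest name `md5sum.txt` and the helper names (`md5_of_file`, `parse_manifest`, `verify_release`) are illustrative assumptions, not taken from `download-data.sh`.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_manifest(text):
    """Parse md5sum-style lines ("<hash>  <filename>") into {filename: hash}."""
    expected = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        checksum, name = line.split(maxsplit=1)
        # md5sum marks binary-mode entries with a leading "*" on the name.
        expected[name.lstrip("*")] = checksum.lower()
    return expected


def verify_release(release_dir, manifest_name="md5sum.txt"):
    """Return {filename: status}; status is "ok", "mismatch", or "missing".

    manifest_name is an assumed default, not confirmed by the release.
    """
    release_dir = Path(release_dir)
    expected = parse_manifest((release_dir / manifest_name).read_text())
    results = {}
    for name, checksum in expected.items():
        target = release_dir / name
        if not target.is_file():
            results[name] = "missing"
        elif md5_of_file(target) == checksum:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results
```

Calling `verify_release` on a local release directory returns one status per file listed in the manifest, so files such as the data description Markdown would be reported as `missing` or `mismatch` if the download was incomplete or stale.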
What data file(s) does this issue pertain to?
The V10 collapsed files were tables of the genes removed, not the collapsed matrices (see #248).
What release are you using?
V10
Put a link to the relevant section of the OpenPBTA-manuscript here.
NA
Put your question or report your issue here.
Planned data to release: