This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

v19-release #1026

Merged
merged 6 commits into from
May 6, 2021


@jharenza (Collaborator) commented Apr 23, 2021

Changes with the v19 release:

  • Remove BS_JXF8A2A6 due to misidentification, per #862
    • Updates:
      • pbta-fusion-starfusion.tsv.gz
      • pbta-fusion-arriba.tsv.gz
      • pbta-gene-expression-rsem-fpkm.stranded.rds
      • pbta-gene-counts-rsem-expected_count.stranded.rds
      • pbta-gene-expression-rsem-tpm.stranded.rds
      • pbta-gene-expression-kallisto.stranded.rds
      • pbta-isoform-counts-rsem-expected_count.stranded.rds
      • pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds
      • pbta-isoform-expression-rsem-tpm.stranded.rds
      • independent-specimens.rnaseq.primary-plus-stranded.tsv
      • pbta-star-log-final.tar.gz
      • pbta-star-log-manifest.tsv
      • pbta-mend-qc-manifest.tsv
      • pbta-mend-qc-results.tar.gz
      • pbta-fusion-putative-oncogenic.tsv
      • pbta-fusion-recurrently-fused-genes-byhistology.tsv
      • pbta-fusion-recurrently-fused-genes-bysample.tsv
      • fusion_summary_ewings_foi.tsv
      • fusion_summary_lgat_foi.tsv
      • fusion_summary_ependymoma_foi.tsv
      • fusion_summary_embryonal_foi.tsv
  • Add INDEL renormalization of MAF files per #1024
    • Updates:
      • pbta-snv-strelka2.vep.maf.gz
      • pbta-snv-lancet.vep.maf.gz
      • pbta-snv-vardict.vep.maf.gz
      • pbta-snv-mutect2.vep.maf.gz
      • pbta-tcga-snv-strelka2.vep.maf.gz
      • pbta-tcga-snv-mutect2.vep.maf.gz
      • pbta-tcga-snv-lancet.vep.maf.gz
  • Update pbta-histologies.tsv:
    • Add a PFS_days column per #963
    • Pull the latest clinical data
    • Rerun the molecular subtyping modules completed to date
    • Rename the PNOC003 cohort to "PNOC" so that samples from later PNOC trials can be included
    • Update CNS_region based on manual review per #1025
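A quick sanity check that the new PFS_days column actually made it into the released file could look like the sketch below. has_column is a hypothetical helper (not part of the repository), and it assumes pbta-histologies.tsv has a single tab-separated header row.

```shell
# Hypothetical helper: succeed (exit 0) if the header row of a tab-separated
# file contains a column with exactly the given name, e.g. PFS_days.
has_column() {
  # split the header on tabs, one name per line, and match a whole line literally
  head -n 1 "$1" | tr '\t' '\n' | grep -qxF "$2"
}

# usage: has_column pbta-histologies.tsv PFS_days && echo "PFS_days present"
```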

Data Release Checklist

  • Is the table in doc/data-file-descriptions.md up to date?
  • Is doc/data-format.md up to date?
  • Is doc/release-notes.md up to date?
  • Is download-data.sh up to date?
  • Was download-data.sh tested and did it complete without error?

- release-notes.md
- data-files-description.md
@jharenza added the work in progress label Apr 26, 2021
@jashapiro (Member)

In the latest version of pbta-histologies.tsv, the experimental_strategy column has recoded Panel as Targeted Sequencing. Was this an intentional change? I don't think this was the case in the files I downloaded previously, because the consensus script ran fine with them, but it broke when I redownloaded the files to rerun the caller comparison analysis for #1033 with the latest VarDict files.

I don't know if this will affect other scripts, but it may. Not a hard fix if we know what to look for.

@jharenza (Collaborator, Author)


This change may have come from an upstream change that added this information to Kids First, since that is how the panel data is coded there, so I would say we keep it and update the code that uses it on our end.
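As a rough illustration of updating downstream code, here is a minimal shell/awk sketch that selects panel samples while accepting both the old "Panel" and the new "Targeted Sequencing" labels. It assumes pbta-histologies.tsv stays tab-separated with an experimental_strategy header; select_panel_rows is a hypothetical name, and the actual analysis scripts would need the equivalent change in whatever language they are written in.

```shell
# Hypothetical helper: print the header plus every row whose experimental_strategy
# is a targeted panel, accepting both the old and the new label.
select_panel_rows() {
  awk '
    BEGIN { FS = "\t" }
    NR == 1 {
      # locate the column by its header name rather than by position
      for (i = 1; i <= NF; i++) if ($i == "experimental_strategy") col = i
      if (!col) { print "no experimental_strategy column" > "/dev/stderr"; exit 1 }
      print
      next
    }
    # a pattern without an action prints the matching row unchanged
    $col == "Panel" || $col == "Targeted Sequencing"
  ' "$1"
}

# usage: select_panel_rows pbta-histologies.tsv > panel-samples.tsv
```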

add mb path subtype file
@jharenza (Collaborator, Author) commented Apr 30, 2021

FYI @jashapiro and @kgaonkar6, just a heads up: I had forgotten to add the medulloblastoma pathology subtypes file to this release per #746, but I have just added it to S3. This means you will have to delete the old release-notes.md when you download the data again.

@jaclyn-taroni (Member)

@jharenza I think this is ready to go up next. I see this is marked as a work in progress, so I wanted to give you the opportunity to go over it one last time before it is reviewed & merged.

@jharenza (Collaborator, Author) commented May 5, 2021

Thanks, it's ready! It was marked as a work in progress while @kgaonkar6 was generating the histology file, but all good.

@jaclyn-taroni added the merge next label and removed the don't merge and work in progress labels May 5, 2021
@jashapiro (Member) left a comment

This looks like everything works. A couple of comments, though, from comparing the md5sum.txt files:

  • Minor, but the new file is not sorted alphabetically by filename, which made comparing a bit harder than it could have been.

  • BS_JXF8A2A6 does not seem to have been removed from the following files (possibly not exhaustive):

    • pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds
    • pbta-mend-qc-manifest.tsv (presumably also pbta-mend-qc-results.tar.gz)
    • pbta-star-log-manifest.tsv (presumably also pbta-star-log-final.tar.gz). The readme says this file is updated, but it is unchanged.

The first may require a rerun of the collapsing script, but for the others it is presumably just a matter of deleting some files within the packages.
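Since the ordering of md5sum.txt made the comparison harder, here is a minimal sketch of an order-independent comparison between two md5sum.txt files. It assumes the standard "<hash>  <filename>" layout and filenames without whitespace; compare_md5 is a hypothetical name. Files that only exist in one release show MISSING on the other side.

```shell
# Hypothetical helper: list files whose checksum differs between two md5sum.txt
# files, or that appear in only one of them, regardless of line order.
compare_md5() (
  tmp=$(mktemp -d) || exit 1
  # sort both files by filename (field 2) in byte order, as join requires
  LC_ALL=C sort -b -k2,2 "$1" > "$tmp/old"
  LC_ALL=C sort -b -k2,2 "$2" > "$tmp/new"
  # full outer join on the filename; a missing hash is filled in as MISSING
  LC_ALL=C join -1 2 -2 2 -a 1 -a 2 -e MISSING -o 0,1.1,2.1 "$tmp/old" "$tmp/new" |
    awk '$2 != $3 { print $1 ": " $2 " -> " $3 }'
  rm -rf "$tmp"
)

# usage: compare_md5 v18/md5sum.txt v19/md5sum.txt
```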

@kgaonkar6 (Collaborator)

On the ordering point: I believe I also have this in order, with the addition of pbta-mb-pathology-subtypes.tsv in v19.

Thanks for catching this. The files are now updated with BS_JXF8A2A6 removed.
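To double-check that a removed sample is really gone, a sketch like the hypothetical find_sample_id helper below could search plain, gzipped, and .tar.gz release files for the ID. It is only a rough check: .rds files are skipped because they are serialized R objects and are better checked from R, and a hit only shows that the string appears somewhere in the file or archive.

```shell
# Hypothetical helper: print each given file that still mentions a biospecimen ID.
find_sample_id() (
  id=$1
  shift
  for f in "$@"; do
    case $f in
      *.tar.gz)
        # check archive member names first, then the concatenated member contents
        if tar -tzf "$f" | grep -qF "$id" || tar -xzOf "$f" | grep -qaF "$id"; then
          echo "$f"
        fi ;;
      *.rds) echo "skipped (binary R object, check from R): $f" >&2 ;;
      *.gz) if gzip -dc "$f" | grep -qF "$id"; then echo "$f"; fi ;;
      *) if grep -qF "$id" "$f"; then echo "$f"; fi ;;
    esac
  done
)

# usage: find_sample_id BS_JXF8A2A6 pbta-star-log-manifest.tsv pbta-star-log-final.tar.gz
```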

@jashapiro (Member) commented May 5, 2021

I just tried the download again, and got the following md5 check error:
pbta-star-log-manifest.tsv: FAILED
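To re-check a single file like this one without rerunning the whole download, a minimal sketch follows. It assumes GNU coreutils md5sum, the standard "<hash>  <filename>" layout of md5sum.txt, and that it is run from the download directory; check_one is a hypothetical name.

```shell
# Hypothetical helper: verify one file against its entry in ./md5sum.txt.
check_one() {
  # keep only the line whose filename field matches exactly, then feed it to md5sum -c
  awk -v f="$1" '$2 == f' md5sum.txt | md5sum -c -
}

# usage: check_one pbta-star-log-manifest.tsv   (prints "<file>: OK" or "<file>: FAILED")
```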

@kgaonkar6 (Collaborator)

Sorry for the delay, I've updated that file now.

However, while testing the mend tar file, I found that it seems to error out in analyses/comparative-RNASeq-analysis/run-comparative-RNAseq.sh. I will follow up once the files are correctly generated. I also just realized that I will need to rerun CI as well, to update the mend-qc and star-log files in the testing data.

@jharenza (Collaborator, Author) commented May 5, 2021


Just noting that this is issue #890, which we also saw in v18.

@kgaonkar6 (Collaborator) commented May 5, 2021

Both pbta-mend-qc-results.tar.gz and pbta-star-log-final.tar.gz have now been re-created with tar and gzip in a Unix environment, and they are updated on S3 and in md5sum.txt.

Here is also the correct way to create pbta-mend-qc-results.tar.gz and pbta-star-log-final.tar.gz without any leading folder structure (thanks for the help, @migbro!):

```shell
mkdir pbta-mend-qc-results
cd pbta-mend-qc-results
# untar the files into this folder
tar -xvf ../pbta-mend-qc-results.tar.gz
# tar from *within the folder* to get a correctly formatted tar file without any leading folder structure
tar -czf ../pbta-mend-qc-results.tar.gz *
```
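Before uploading a re-created archive, its member list can be checked for a leading folder. This is a minimal sketch assuming the archive is meant to be flat, so any member path that still contains "/" after dropping a leading "./" is flagged; has_leading_folder is a hypothetical name and would give false positives for archives that legitimately contain subfolders.

```shell
# Hypothetical helper: succeed (exit 0) if any member of a .tar.gz sits under a folder.
has_leading_folder() {
  # list members, strip a leading "./", and look for any remaining path separator
  tar -tzf "$1" | sed 's|^\./||' | grep -q '/'
}

# usage: has_leading_folder pbta-mend-qc-results.tar.gz && echo "archive has a leading folder"
```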

@kgaonkar6 (Collaborator)

The CI files are now updated in s3://kf-openaccess-us-east-1-prd-pbta/data/testing_v19.zip as well.

@jashapiro (Member)


Some of the files (the fusion_summary files) seemed to be missing from that download, so I added them, generated the md5sum file, and uploaded the result to S3. Rerunning tests now.
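For regenerating an md5sum file like this, a minimal sketch that writes one line per regular file, sorted by filename so that successive releases can be compared line by line. It assumes GNU coreutils md5sum and filenames without whitespace; make_md5sum is a hypothetical name, and it deliberately leaves md5sum.txt itself out of the list.

```shell
# Hypothetical helper: (re)write DIR/md5sum.txt for the regular files in DIR.
make_md5sum() (
  cd "$1" || exit 1
  # collect the files first, skipping subfolders and md5sum.txt itself
  set --
  for f in *; do
    if [ -f "$f" ] && [ "$f" != md5sum.txt ]; then set -- "$@" "$f"; fi
  done
  [ "$#" -gt 0 ] || exit 1
  # checksum every file, then sort by filename (field 2) in byte order
  md5sum -- "$@" | LC_ALL=C sort -b -k2,2 > md5sum.txt
)

# usage: make_md5sum testing_v19 && (cd testing_v19 && md5sum -c md5sum.txt)
```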

@jharenza (Collaborator, Author) commented May 6, 2021

Hi @jashapiro, it looks like this passed. Does everything else look OK?

@jashapiro (Member) left a comment

LGTM
