Skip to content
This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Add shell script to generate analysis files that are included in data releases #1421

Conversation

jaclyn-taroni
Copy link
Member

@jaclyn-taroni jaclyn-taroni commented May 19, 2022

⚠️ Review #1419 and #1420 first ⚠️ this can not be stacked because these are from my branch.

Continuing to break up the changes described in #1399 (comment)

Here I'm adding scripts/generate-analysis-files-for-release.sh, a shell script for generating analysis files that are included in the data download. As a reminder, an analysis file is defined as:

Any file created by a script in analyses/*.

It is intended to be used as part of a series of steps during the data release process, which will be as follows:

  1. We start a release that has all of the PBTA data files (i.e., upstream files) included
  2. We run scripts/generate-analysis-files-for-release.sh, which should generate all the analysis files for a release and put them in scratch/analysis_files_for_release, and commit any changes to files that are included in the repository.
  3. We add all the analysis files to the release.
  4. We run scripts/run-for-subtyping.sh and commit any changes to files that are included in the repository. (PR coming soon!)
  5. We add pbta-histologies.tsv to the release

Copy link
Collaborator

@jharenza jharenza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Comment on lines +79 to +81
# Run hotspot detection
echo "Run hotspots detection"
bash ${analyses_dir}/hotspots-detection/run_overlaps_hotspots.sh
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noting that this module (and output file pbta-snv-scavenged-hotspots.maf.tsv.gz) appears to be missing from here but it should be there, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes they should, but maybe that should get fixed on #1424 and not here?

Copy link
Member

@sjspielman sjspielman May 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll open an issue. Edit: #1428

# Set the working directory to the directory of this file
cd "$(dirname "${BASH_SOURCE[0]}")"

# If RUN_LOCAL is used, the time-intensive steps are skipped because they cannot
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and memory-intensive!

Comment on lines 39 to 42
# Compile all the files that need to be included in the release in one place
# in the scratch directory
compiled_dir=${scratch_dir}/analysis_files_for_release
mkdir -p ${compiled_dir}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move these lines up before you start running modules, right after other directories are defined in lines 19-21


# Run GISTIC step -- only the part that generates ZIP file
echo "Run GISTIC"
# Run a step that subs ploidy for NA to allow GISTIC to run
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a question about the whether the CI if lines might be needed here.

# If this is CI, run the example included with GISTIC
# The sample size for the subset files are too small otherwise
IS_CI=${OPENPBTA_CI:-0}
if [[ "$IS_CI" -gt "0" ]]
then
# Environmental variables for MCR
ORIG_LD_LIBRARY_PATH=$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/mcr/v83/runtime/glnxa64:/opt/mcr/v83/bin/glnxa64:/opt/mcr/v83/sys/os/glnxa64
export XAPPLRESDIR=/opt/mcr/v83/X11/app-defaults
# We want this to fail if the GISTIC example fails only -- because we have
# some instances of running GISTIC that do not complete but do save some
# output
set -e
set -o pipefail
# Run the example that comes with GISTIC - that allows us to
cd /home/rstudio/gistic_install && ./run_gistic_example
# 'Undo' environmental variables for MCR
export LD_LIBRARY_PATH=$ORIG_LD_LIBRARY_PATH
unset XAPPLRESDIR

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't follow. I don't necessarily expect this script (scripts/generate-analysis-files-for-release.sh) to get run in CI.

We don't use analyses/run-gistic/run-gistic-module.sh at all here.

Copy link
Member

@sjspielman sjspielman May 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't necessarily expect this script (scripts/generate-analysis-files-for-release.sh) to get run in CI.

Just looking for a confirmation of this, and whether it might ever change! 👍

@sjspielman sjspielman self-requested a review May 20, 2022 18:58
Copy link
Member

@sjspielman sjspielman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The remaining small feedback items I left are quick and at your discretion, so I don't need to see again 🚀

@jaclyn-taroni jaclyn-taroni merged commit dd899a1 into AlexsLemonade:master May 24, 2022
@jaclyn-taroni jaclyn-taroni mentioned this pull request May 24, 2022
5 tasks
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants