Skip to content
This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Update focal-cn-preparation to use consensus SEG file in data download #1130

Conversation

jaclyn-taroni
Copy link
Member

@jaclyn-taroni jaclyn-taroni commented Aug 9, 2021

The focal-cn-preparation module was taking over an hour to run in CI. This is because it was using the consensus SEG file that is committed to the repository, which is not for a subset of samples but instead contains all samples. With the release of v20, there's no reason to continue to use the file in analyses/copy_number_consensus_call/results so here I am changing the path in the second notebook of focal-cn-preparation to use the file in the data download. In the context of CI, this will be the subset file.

Closes #1129.

# TODO: the consensus SEG file is not currently in the data download -- when it
# gets included we will have to change the file path here
consensus_seg_file <- file.path("..", "copy_number_consensus_call", "results",
consensus_seg_file <- file.path("..", "..", "data",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jharenza and @kgaonkar6 - is there any reason (that you are aware of) that prohibits me from making this change at this point?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think so - the latest file is in the download.

Copy link
Member Author

@jaclyn-taroni jaclyn-taroni Aug 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably need to wait until all the v20 CNV changes go through though? Or would just getting #1123 through -> updating the release (if not done yet) be sufficient?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated v20 release data with the latest consensus seg with #1123 b9284650be04df3538e6c6dba29b8eb0 pbta-cnv-consensus.seg.gz

I added the relative path so that we can run the code while running all the preprocessing steps of subtyping, so maybe we can add a logic to change to relative path or not within the if (params$base_run ==0)

```{r}
# TODO: the consensus SEG file is not currently in the data download -- when it
# gets included we will have to change the file path here
consensus_seg_file <- file.path("..", "copy_number_consensus_call", "results",
"pbta-cnv-consensus.seg.gz")
if ( params$base_run ==0 ){
histologies_file <- file.path("..", "..", "data", "pbta-histologies.tsv")
} else {
histologies_file <- file.path("..", "..", "data", "pbta-histologies-base.tsv")
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In which scenario would you use the file in analyses/copy_number_consensus_call/results? When you use pbta-histologies-base.tsv?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, we will rerun the consensus seg file module first so the latest consensus seg file can be used as input for the focal-cn module

@jaclyn-taroni jaclyn-taroni changed the title DRAFT Jaclyn taroni/fix focal long running Update focal-cn-preparation to use consensus SEG file in data download Aug 10, 2021
@jaclyn-taroni jaclyn-taroni marked this pull request as ready for review August 10, 2021 23:37
@jaclyn-taroni
Copy link
Member Author

I'm not going to update the HTML for this notebook, because I expect that will happen as part of getting #1124 through anyway.

@jaclyn-taroni
Copy link
Member Author

It is curious, though, that the consensus CN steps happens right before this one and I would expect then that the file in results for that module would then be from the subset of samples. There might be something snakemake-y happening that I don't understand.

@jaclyn-taroni
Copy link
Member Author

Closing in favor of #1124

@jaclyn-taroni jaclyn-taroni deleted the jaclyn-taroni/fix-focal-long-running branch August 11, 2021 14:31
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Focal CN preparation takes over an hour to run in CI
3 participants