
Part2 Freec as default: Cnv focal files update #2

Merged
merged 16 commits into cnv-consensus-update from focal-cn-update
May 7, 2021

Conversation

kgaonkar6
Owner

@kgaonkar6 kgaonkar6 commented Apr 7, 2021

Purpose/implementation Section

What scientific question is your analysis addressing?

We updated the consensus seg file creation in AlexsLemonade#987 to use freec CN calls as the default. In this PR we re-run focal-cn-file-preparation to identify changes in focal calls caused by that update.

What was your approach?

Simply re-run run-prepare-cn.sh.

What GitHub issue does your pull request address?

Related to AlexsLemonade#964.

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

🚨 I couldn't complete the plotting in the module.

I hadn't run this module before, so I started with 12 GB of memory in Docker, which killed 04-prepare-cn-file.R. I increased that to 15 GB and re-ran 04, 05, and 06, but it seems rna-expression-validation.R needs more than 15 GB.
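For pinning down where a script peaks, here is a minimal sketch using base R's gc() counters (some_heavy_step() is a placeholder, not a function from this module):

    # reset the "max used" counters, run the suspect step, then read the peak
    gc(reset = TRUE)
    result <- some_heavy_step()   # placeholder for the call being profiled
    gc()                          # the "max used" column shows the peak since reset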

Is this something that has been seen before?

Is there anything that you want to discuss further?

🚨 Plots are not complete for consensus calls because of the memory issue mentioned above.

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Tables in results: yes
Figures in plots: no

Results

What types of results are included (e.g., table, figure)?

What is your summary of the results?

Comparing consensus_seg_annotated_cn_autosomes.tsv.gz generated in this branch with the v18 version, the changes are logged in:
change_log.txt
Columns suffixed _latest are from the file in this branch and _previous from the v18 release file.

Here's a rough plot of the status changes (the majority seem to be from loss to gain) reported in change_log.txt above:
[screenshot: bar plot of status changes, 2021-04-22]

Reproducibility Checklist

  • The dependencies required to run the code in this pull request have been added to the project Dockerfile.
  • This analysis has been added to continuous integration.

Documentation Checklist

  • This analysis module has a README and it is up to date.
  • This analysis is recorded in the table in analyses/README.md and the entry is up to date.
  • The analytical code is documented and contains comments.

@kgaonkar6 kgaonkar6 changed the title Focal cn update Part2 Freec as default: Cnv focal files update Apr 7, 2021
@jaclyn-taroni

I increased that to 15 GB and re-ran 04, 05, and 06, but it seems rna-expression-validation.R needs more than 15 GB

I can't remember with absolute certainty, but I think 32GB is enough to do it.

@kgaonkar6
Owner Author

kgaonkar6 commented Apr 20, 2021

Sorry for the delay, @jharenza @jaclyn-taroni. I am having memory issues with this module: 04-prepare-cn-file.R kills the Docker container. I ran the script line by line, and it seems R/Docker gets killed/crashes at:

overlaps <- IRanges::mergeByOverlaps(cnv_gr, tx_exons)
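A possibly lighter-weight alternative to try here, as a sketch (assuming cnv_gr and tx_exons are GRanges objects; findOverlaps returns only the hit index pairs instead of the merged DataFrame that mergeByOverlaps materializes):

    # compute only the overlap index pairs, then subset each object as needed
    hits <- IRanges::findOverlaps(cnv_gr, tx_exons)
    cnv_hits  <- cnv_gr[S4Vectors::queryHits(hits)]
    exon_hits <- tx_exons[S4Vectors::subjectHits(hits)]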

I tested with 15 GB in my local Docker container and it still got killed, so I tried a 50 GB EC2 instance (t2.small with 12 CPUs), running Docker as docker run --name cnv-consensus -d --memory=30g --memory-swap=-1 -e PASSWORD=pass -p 8787:8787 -v "$PWD":/home/rstudio/OpenPBTA ccdlopenpbta/open-pbta:latest in the volume folder, which has 50 GB.
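To confirm the cap the container actually received, one cheap check from inside the R session (a sketch assuming a cgroup-v1 host, which is what Docker used at the time):

    # read the container's cgroup v1 memory limit, converted to GiB
    as.numeric(readLines("/sys/fs/cgroup/memory/memory.limit_in_bytes")) / 1024^3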

Weirdly, I am getting this error on the EC2 instance while running

snakemake -j 10 --snakefile run-bedtools.snakemake

but not while running locally:

1 of 2239 steps (0.04%) done
/bin/bash: line 1:   473 Killed                  bedtools coverage -a ../../scratch/ucsc_cytoband.bed -b ../../scratch/cytoband_status/segments/consensus_callable.BS_KAD49R68.bed -sorted
       474 Done                    | sed 's/$/	BS_KAD49R68/' > ../../scratch/cytoband_status/coverage/consensus_callable.BS_KAD49R68.coverage.bed
[Mon Apr 19 21:10:42 2021]
Error in rule bed_coverage:
    jobid: 617
    output: ../../scratch/cytoband_status/coverage/consensus_callable.BS_KAD49R68.coverage.bed
    shell:
        bedtools coverage  -a ../../scratch/ucsc_cytoband.bed -b ../../scratch/cytoband_status/segments/consensus_callable.BS_KAD49R68.bed -sorted | sed 's/$/	BS_KAD49R68/'  > ../../scratch/cytoband_status/coverage/consensus_callable.BS_KAD49R68.coverage.bed
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job bed_coverage since they might be corrupted:
../../scratch/cytoband_status/coverage/consensus_callable.BS_KAD49R68.coverage.bed
/bin/bash: line 1:   477 Killed                  bedtools coverage -a ../../scratch/ucsc_cytoband.bed -b ../../scratch/cytoband_status/segments/consensus_callable.BS_XMP9XNR9.bed -sorted
       478 Done                    | sed 's/$/	BS_XMP9XNR9/' > ../../scratch/cytoband_status/coverage/consensus_callable.BS_XMP9XNR9.coverage.bed

This is the disk layout (lsblk) within the Docker container:

root@9feaed131e4e:/home/rstudio/OpenPBTA/analyses/focal-cn-file-preparation# lsblk
NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop1     7:1    0 96.6M  1 loop 
loop2     7:2    0 99.2M  1 loop 
loop3     7:3    0 55.5M  1 loop 
loop4     7:4    0 33.3M  1 loop 
loop5     7:5    0   25M  1 loop 
xvda    202:0    0   50G  0 disk 
└─xvda1 202:1    0   50G  0 part /home/rstudio/kitematic
xvdb    202:16   0   50G  0 disk /home/rstudio/OpenPBTA

I'm trying again with more memory and will report back; more than happy to hear any suggestions.

@jaclyn-taroni

I unfortunately do not have any suggestions. I am going to tag @jashapiro, who may have more insight.

@jashapiro

A t2.small instance only has 2GB RAM and 1 CPU (ignore the CPU credits per hour), so that will not be enough. You will probably need a .2xlarge instance with 32GB RAM to run the module (I tend to stick with m5a instances for things like this; they are a touch cheaper). Not sure of the disk space needs, but 50GB should be enough.

@kgaonkar6
Owner Author

Thank you @jashapiro! That fixed my issue with the EC2 errors; for some reason I was only looking at the CPU credits per hour 😅

@kgaonkar6
Owner Author

Hmm, spoke too soon; it still gets killed at 04 with:

run-prepare-cn.sh: line 48: 7087 Killed Rscript --vanilla 04-prepare-cn-file.R --cnv_file ${scratch_dir}/consensus_seg_with_status.tsv --gtf_file $gtf_file --metadata $histologies_file --filename_lead "consensus_seg_annotated_cn" --seg

on an EC2 m5.2xlarge with 50 GB EBS.

@jaclyn-taroni

jaclyn-taroni commented Apr 21, 2021

I'd probably try going a bit bigger (e.g., m5a.4xlarge) as a next step.

@kgaonkar6
Owner Author

kgaonkar6 commented Apr 22, 2021

Updating to m5a.4xlarge helped. I also realized I was setting copy_number for neutral calls to 3, which was discussed before and which created a larger file to annotate and output. It is now updated to copy_number == NA, as seen in 6096f54#diff-cb6603f5ac5cd549a5ff342ce8d9b950a72351ca92dbb16c3ccff1b58e3637cd
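The shape of that change, as a sketch (assuming a dplyr pipeline and a status column marking neutral calls; the linked diff has the actual code):

    # set neutral calls to NA instead of copy_number = 3 so they drop out of
    # downstream annotation and the output shrinks
    library(dplyr)
    cnv_df <- cnv_df %>%
      mutate(copy_number = if_else(status == "neutral", NA_real_, as.numeric(copy_number)))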

Thank you for helping me figure this out! I've documented additional discussion about the output in AlexsLemonade#1010

@kgaonkar6 kgaonkar6 merged commit 0f1d2a8 into cnv-consensus-update May 7, 2021
@kgaonkar6 kgaonkar6 deleted the focal-cn-update branch May 13, 2021 17:19