
Part2 Freec as default: Cnv focal files update #2

Merged
merged 16 commits into cnv-consensus-update from focal-cn-update
May 7, 2021

Conversation

kgaonkar6
Owner

@kgaonkar6 kgaonkar6 commented Apr 7, 2021

Purpose/implementation Section

What scientific question is your analysis addressing?

We updated the consensus seg file creation in AlexsLemonade#987 to use freec CN calls as the default. In this PR we re-run focal-cn-file-preparation to identify changes in focal calls caused by that update.

What was your approach?

Simply re-run run-prepare-cn.sh.

What GitHub issue does your pull request address?

Related to AlexsLemonade#964.

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

🚨 I couldn't complete the plotting in the module.

I hadn't run this module before, so I started with 12 GB of memory in Docker, which killed 04-prepare-cn-file.R. I increased that to 15 GB and re-ran 04, 05, and 06, but it seems rna-expression-validation.R needs more than 15 GB.
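For pinning down where a script peaks, here is a minimal sketch using base R's gc() counters (some_heavy_step() is a placeholder, not a function from this module):

    # reset the "max used" counters, run the suspect step, then read the peak
    gc(reset = TRUE)
    result <- some_heavy_step()   # placeholder for the call being profiled
    gc()                          # the "max used" column shows the peak since reset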

Is this something that has been seen before?

Is there anything that you want to discuss further?

🚨 Plots are not complete for consensus calls because of the memory issue mentioned above.

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Tables in results: yes
Figures in plots: no

Results

What types of results are included (e.g., table, figure)?

What is your summary of the results?

Comparing consensus_seg_annotated_cn_autosomes.tsv.gz generated in this branch with the v18 version, the changes are logged in:
change_log.txt
Columns suffixed _latest are from the file in this branch and _previous from the v18 release file.

Here's a rough plot of the status changes (the majority seem to be from loss to gain) reported in change_log.txt above:
[screenshot: bar plot of status changes, 2021-04-22]

Reproducibility Checklist

  • The dependencies required to run the code in this pull request have been added to the project Dockerfile.
  • This analysis has been added to continuous integration.

Documentation Checklist

  • This analysis module has a README and it is up to date.
  • This analysis is recorded in the table in analyses/README.md and the entry is up to date.
  • The analytical code is documented and contains comments.

@kgaonkar6 kgaonkar6 changed the title Focal cn update Part2 Freec as default: Cnv focal files update Apr 7, 2021
@jaclyn-taroni

I increased that to 15 GB and re-ran 04, 05, and 06, but it seems rna-expression-validation.R needs more than 15 GB

I can't remember with absolute certainty, but I think 32GB is enough to do it.

@kgaonkar6
Owner Author

kgaonkar6 commented Apr 20, 2021

Sorry for the delay, @jharenza @jaclyn-taroni. I am having memory issues with this module: 04-prepare-cn-file.R kills the Docker container. I ran the script line by line, and it seems R/Docker gets killed/crashes at:

overlaps <- IRanges::mergeByOverlaps(cnv_gr, tx_exons)
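A possibly lighter-weight alternative to try here, as a sketch (assuming cnv_gr and tx_exons are GRanges objects; findOverlaps returns only the hit index pairs instead of the merged DataFrame that mergeByOverlaps materializes):

    # compute only the overlap index pairs, then subset each object as needed
    hits <- IRanges::findOverlaps(cnv_gr, tx_exons)
    cnv_hits  <- cnv_gr[S4Vectors::queryHits(hits)]
    exon_hits <- tx_exons[S4Vectors::subjectHits(hits)]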

I tested with 15 GB in my local Docker container and it still got killed, so I tried a 50 GB EC2 instance (t2.small with 12 CPUs), running Docker as docker run --name cnv-consensus -d --memory=30g --memory-swap=-1 -e PASSWORD=pass -p 8787:8787 -v "$PWD":/home/rstudio/OpenPBTA ccdlopenpbta/open-pbta:latest in the volume folder, which has 50 GB.
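To confirm the cap the container actually received, one cheap check from inside the R session (a sketch assuming a cgroup-v1 host, which is what Docker used at the time):

    # read the container's cgroup v1 memory limit, converted to GiB
    as.numeric(readLines("/sys/fs/cgroup/memory/memory.limit_in_bytes")) / 1024^3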

Weirdly, I am getting this error on the EC2 instance while running

snakemake -j 10 --snakefile run-bedtools.snakemake

but not while running locally:

1 of 2239 steps (0.04%) done
/bin/bash: line 1:   473 Killed                  bedtools coverage -a ../../scratch/ucsc_cytoband.bed -b ../../scratch/cytoband_status/segments/consensus_callable.BS_KAD49R68.bed -sorted
       474 Done                    | sed 's/$/	BS_KAD49R68/' > ../../scratch/cytoband_status/coverage/consensus_callable.BS_KAD49R68.coverage.bed
[Mon Apr 19 21:10:42 2021]
Error in rule bed_coverage:
    jobid: 617
    output: ../../scratch/cytoband_status/coverage/consensus_callable.BS_KAD49R68.coverage.bed
    shell:
        bedtools coverage  -a ../../scratch/ucsc_cytoband.bed -b ../../scratch/cytoband_status/segments/consensus_callable.BS_KAD49R68.bed -sorted | sed 's/$/	BS_KAD49R68/'  > ../../scratch/cytoband_status/coverage/consensus_callable.BS_KAD49R68.coverage.bed
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job bed_coverage since they might be corrupted:
../../scratch/cytoband_status/coverage/consensus_callable.BS_KAD49R68.coverage.bed
/bin/bash: line 1:   477 Killed                  bedtools coverage -a ../../scratch/ucsc_cytoband.bed -b ../../scratch/cytoband_status/segments/consensus_callable.BS_XMP9XNR9.bed -sorted
       478 Done                    | sed 's/$/	BS_XMP9XNR9/' > ../../scratch/cytoband_status/coverage/consensus_callable.BS_XMP9XNR9.coverage.bed

This is the disk layout (lsblk) within the Docker container:

root@9feaed131e4e:/home/rstudio/OpenPBTA/analyses/focal-cn-file-preparation# lsblk
NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop1     7:1    0 96.6M  1 loop 
loop2     7:2    0 99.2M  1 loop 
loop3     7:3    0 55.5M  1 loop 
loop4     7:4    0 33.3M  1 loop 
loop5     7:5    0   25M  1 loop 
xvda    202:0    0   50G  0 disk 
└─xvda1 202:1    0   50G  0 part /home/rstudio/kitematic
xvdb    202:16   0   50G  0 disk /home/rstudio/OpenPBTA

I'm trying again with more memory and will report back; more than happy to hear any suggestions.

@jaclyn-taroni

I unfortunately do not have any suggestions. I am going to tag @jashapiro, who may have more insight.

@jashapiro

A t2.small instance only has 2GB RAM and 1 CPU (ignore the CPU credits per hour), so that will not be enough. You will probably need a .2xlarge instance with 32GB RAM to run the module (I tend to stick with m5a instances for things like this; they are a touch cheaper). Not sure of the disk space needs, but 50GB should be enough.

@kgaonkar6
Owner Author

Thank you @jashapiro! That fixed my issue with the EC2 errors; for some reason I was only looking at the CPU credits per hour 😅

@kgaonkar6
Owner Author

Hmm, spoke too soon; it still gets killed at 04 with:

run-prepare-cn.sh: line 48: 7087 Killed Rscript --vanilla 04-prepare-cn-file.R --cnv_file ${scratch_dir}/consensus_seg_with_status.tsv --gtf_file $gtf_file --metadata $histologies_file --filename_lead "consensus_seg_annotated_cn" --seg

on an EC2 m5.2xlarge with 50 GB EBS.

@jaclyn-taroni

jaclyn-taroni commented Apr 21, 2021

I'd probably try going a bit bigger (e.g., m5a.4xlarge) as a next step.

@kgaonkar6
Owner Author

kgaonkar6 commented Apr 22, 2021

Updating to m5a.4xlarge helped. I also realized I was setting copy_number for neutral calls to 3, which was discussed before and which created a larger file to annotate and output. It is now updated to copy_number == NA, as seen in 6096f54#diff-cb6603f5ac5cd549a5ff342ce8d9b950a72351ca92dbb16c3ccff1b58e3637cd
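The shape of that change, as a sketch (assuming a dplyr pipeline and a status column marking neutral calls; the linked diff has the actual code):

    # set neutral calls to NA instead of copy_number = 3 so they drop out of
    # downstream annotation and the output shrinks
    library(dplyr)
    cnv_df <- cnv_df %>%
      mutate(copy_number = if_else(status == "neutral", NA_real_, as.numeric(copy_number)))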

Thank you for helping me figure this out! I've documented additional discussion about the output in AlexsLemonade#1010

@kgaonkar6 kgaonkar6 merged commit 0f1d2a8 into cnv-consensus-update May 7, 2021
@kgaonkar6 kgaonkar6 deleted the focal-cn-update branch May 13, 2021 17:19