Ensure consistent output directory requirements in CNV tools. #4825

Closed
sooheelee opened this issue May 29, 2018 · 15 comments


@sooheelee
Contributor

The exception is:

Traceback (most recent call last):
  File "/home/shlee/anaconda3/envs/gatk/lib/python3.6/site-packages/gcnvkernel/io/io_commons.py", line 98, in assert_output_path_writable
    filehandle = open(filename, 'w')
PermissionError: [Errno 13] Permission denied: '/home/shlee/gcc/hc24_soohee1k_chr1-model/write_tester'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tmp/shlee/cohort_denoising_calling.7832183760446168530.py", line 151, in <module>
    args.output_model_path)()
  File "/home/shlee/anaconda3/envs/gatk/lib/python3.6/site-packages/gcnvkernel/io/io_denoising_calling.py", line 28, in __init__
    io_commons.assert_output_path_writable(output_path)
  File "/home/shlee/anaconda3/envs/gatk/lib/python3.6/site-packages/gcnvkernel/io/io_commons.py", line 102, in assert_output_path_writable
    raise IOError("The output path \"{0}\" is not writeable".format(output_path))
OSError: The output path "/home/shlee/gcc/hc24_soohee1k_chr1-model" is not writeable
16:26:00.659 DEBUG ScriptExecutor - Result: 1
16:26:00.662 INFO  GermlineCNVCaller - Shutting down engine
[May 27, 2018 4:26:00 PM UTC] org.broadinstitute.hellbender.tools.copynumber.GermlineCNVCaller done. Elapsed time: 2,255.34 minutes.
Runtime.totalMemory()=8207728640
org.broadinstitute.hellbender.utils.python.PythonScriptExecutorException: 
python exited with 1
Command Line: python /tmp/shlee/cohort_denoising_calling.7832183760446168530.py --ploidy_calls_path=/home/shlee/gcnv/coverage_1k/hc24_soohee1kall_ploidy-calls --output_calls_path=/home/shlee/gcc/hc24_soohee1k_chr1-calls --modeling_interval_list=/tmp/shlee/intervals1147946183347323472.tsv --output_model_path=/home/shlee/gcc/hc24_soohee1k_chr1-model --enable_explicit_gc_bias_modeling=False --read_count_tsv_files /tmp/shlee/sample-04516283083315244626.tsv /tmp/shlee/sample-17497576995757363646.tsv /tmp/shlee/sample-21271002324475135098.tsv /tmp/shlee/sample-36985602309924438312.tsv /tmp/shlee/sample-44773997237633003175.tsv /tmp/shlee/sample-55563425618633690228.tsv /tmp/shlee/sample-66588087553393228850.tsv /tmp/shlee/sample-77682253251273859622.tsv /tmp/shlee/sample-85102983131590855033.tsv /tmp/shlee/sample-91827141265495425349.tsv /tmp/shlee/sample-103880243493985861748.tsv /tmp/shlee/sample-116680765766192550894.tsv /tmp/shlee/sample-12124239099888077446.tsv /tmp/shlee/sample-137882283449137233563.tsv /tmp/shlee/sample-143394886316760954899.tsv /tmp/shlee/sample-152734003651556671536.tsv /tmp/shlee/sample-164992531160419965633.tsv /tmp/shlee/sample-179066010714184030831.tsv /tmp/shlee/sample-186093298660325983992.tsv /tmp/shlee/sample-195129395964281545047.tsv /tmp/shlee/sample-205662400342674148944.tsv /tmp/shlee/sample-214794162996551180382.tsv /tmp/shlee/sample-221249794311427321965.tsv /tmp/shlee/sample-233870240722976827789.tsv --psi_s_scale=1.000000e-04 --mapping_error_rate=1.000000e-02 --depth_correction_tau=1.000000e+04 --q_c_expectation_mode=hybrid --max_bias_factors=5 --psi_t_scale=1.000000e-03 --log_mean_bias_std=1.000000e-01 --init_ard_rel_unexplained_variance=1.000000e-01 --num_gc_bins=20 --gc_curve_sd=1.000000e+00 --active_class_padding_hybrid_mode=50000 --enable_bias_factors=True --disable_bias_factors_in_active_class=False --p_alt=1.000000e-06 --cnv_coherence_length=1.000000e+04 --max_copy_number=5 --p_active=0.010000 
--class_coherence_length=10000.000000 --learning_rate=5.000000e-02 --adamax_beta1=9.000000e-01 --adamax_beta2=9.900000e-01 --log_emission_samples_per_round=50 --log_emission_sampling_rounds=10 --log_emission_sampling_median_rel_error=5.000000e-03 --max_advi_iter_first_epoch=100 --max_advi_iter_subsequent_epochs=100 --min_training_epochs=10 --max_training_epochs=50 --initial_temperature=2.000000e+00 --num_thermal_epochs=20 --convergence_snr_averaging_window=500 --convergence_snr_trigger_threshold=1.000000e-01 --convergence_snr_countdown_window=10 --max_calling_iters=10 --caller_update_convergence_threshold=1.000000e-03 --caller_admixing_rate=7.500000e-01 --disable_caller=false --disable_sampler=false --disable_annealing=false
        at org.broadinstitute.hellbender.utils.python.PythonExecutorBase.getScriptException(PythonExecutorBase.java:75)
        at org.broadinstitute.hellbender.utils.runtime.ScriptExecutor.executeCuratedArgs(ScriptExecutor.java:126)
        at org.broadinstitute.hellbender.utils.python.PythonScriptExecutor.executeArgs(PythonScriptExecutor.java:170)
        at org.broadinstitute.hellbender.utils.python.PythonScriptExecutor.executeScript(PythonScriptExecutor.java:151)
        at org.broadinstitute.hellbender.utils.python.PythonScriptExecutor.executeScript(PythonScriptExecutor.java:121)
        at org.broadinstitute.hellbender.tools.copynumber.GermlineCNVCaller.executeGermlineCNVCallerPythonScript(GermlineCNVCaller.java:450)
        at org.broadinstitute.hellbender.tools.copynumber.GermlineCNVCaller.doWork(GermlineCNVCaller.java:288)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:134)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:179)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:198)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
        at org.broadinstitute.hellbender.Main.main(Main.java:289)

Posting this as per @samuelklee's request.
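
The traceback above shows the writability probe (opening a write_tester file inside the model directory) failing only after about 2,255 minutes of runtime. A minimal sketch of running that kind of probe before any expensive work is shown below. It assumes nothing about gcnvkernel beyond the lines visible in the traceback; assert_dir_writable and run_pipeline are hypothetical names, not the library's API.

```python
import tempfile


def assert_dir_writable(output_path: str) -> None:
    """Raise OSError if a probe file cannot be created inside output_path."""
    try:
        # Create a uniquely named probe file; it is deleted again on close.
        with tempfile.NamedTemporaryFile(dir=output_path, prefix="write_tester_"):
            pass
    except OSError as err:
        raise OSError(f'The output path "{output_path}" is not writeable') from err


def run_pipeline(output_path: str, expensive_step):
    """Hypothetical driver: validate the output location, then do the long-running work."""
    # Probe first, so a permission problem surfaces in seconds rather than after hours.
    assert_dir_writable(output_path)
    return expensive_step()
```

With this ordering, a missing or read-only directory raises before expensive_step is ever called.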

@sooheelee changed the title from "gCNV does not overwrite existing file name and errors at the end of the run" to "GermlineCNVCaller does not overwrite existing file name and errors at the end of the run" on May 29, 2018
@samuelklee
Contributor

@sooheelee did you encounter this when running at the command line in a python environment, or when using the Docker image? And did the /home/shlee/gcc/hc24_soohee1k_chr1-model/ directory exist prior to running?

@sooheelee
Contributor Author

@samuelklee, at the command line, with the conda environment activated (source activate gatk) and within a tmux session. From my notes for this error, I see

(gatk) shlee@brie:~$ 

So definitely not a Docker run.

@Yu-jinKim

Hello, I've had a similar issue:

[February 25, 2019 3:59:35 PM GMT] org.broadinstitute.hellbender.tools.copynumber.GermlineCNVCaller done. Elapsed time: 106.19 minutes.
Runtime.totalMemory()=745537536
org.broadinstitute.hellbender.utils.python.PythonScriptExecutorException:
python exited with 137
Command Line: python /tmp/cohort_denoising_calling.5199478827672377510.py --ploidy_calls_path=/mnt/storage/home/kimy/projects/CNV_calling/results/190225.181217_K00178.ploidy/cohort-calls --output_calls_path=/mnt/storage/home/kimy/projects/CNV_calling/results/190225.181217_K00178.CNVCaller/cohort-calls --output_tracking_path=/mnt/storage/home/kimy/projects/CNV_calling/results/190225.181217_K00178.CNVCaller/cohort-tracking --modeling_interval_list=/tmp/intervals6808861825847823427.tsv --output_model_path=/mnt/storage/home/kimy/projects/CNV_calling/results/190225.181217_K00178.CNVCaller/cohort-model --enable_explicit_gc_bias_modeling=False --read_count_tsv_files /tmp/sample-03370867836644351516.tsv /tmp/sample-11290385693458589644.tsv /tmp/sample-26680414097017365398.tsv /tmp/sample-34861302842039719789.tsv /tmp/sample-46167771232071802189.tsv /tmp/sample-5397722415827930713.tsv /tmp/sample-68905563219726535027.tsv /tmp/sample-71822298944654562942.tsv /tmp/sample-82852469154924336573.tsv /tmp/sample-94435864773747747133.tsv /tmp/sample-10291209000691874035.tsv /tmp/sample-112960128087988103802.tsv /tmp/sample-125642393799602752840.tsv /tmp/sample-137913221119280339269.tsv /tmp/sample-148798104107874397064.tsv /tmp/sample-151707670833440725046.tsv /tmp/sample-165190254391986506906.tsv /tmp/sample-171669462045718151420.tsv /tmp/sample-1886514412161155023.tsv /tmp/sample-197025230919736630167.tsv /tmp/sample-202669755366650912708.tsv /tmp/sample-218211154753873225756.tsv /tmp/sample-227491296710974059874.tsv /tmp/sample-235757810671885317124.tsv --psi_s_scale=1.000000e-04 --mapping_error_rate=1.000000e-02 --depth_correction_tau=1.000000e+04 --q_c_expectation_mode=hybrid --max_bias_factors=5 --psi_t_scale=1.000000e-03 --log_mean_bias_std=1.000000e-01 --init_ard_rel_unexplained_variance=1.000000e-01 --num_gc_bins=20 --gc_curve_sd=1.000000e+00 --active_class_padding_hybrid_mode=50000 --enable_bias_factors=True --disable_bias_factors_in_active_class=False --p_alt=1.000000e-06 --cnv_coherence_length=1.000000e+04 --max_copy_number=5 --p_active=0.010000 --class_coherence_length=10000.000000 --learning_rate=1.000000e-02 --adamax_beta1=9.000000e-01 --adamax_beta2=9.900000e-01 --log_emission_samples_per_round=50 --log_emission_sampling_rounds=10 --log_emission_sampling_median_rel_error=5.000000e-03 --max_advi_iter_first_epoch=5000 --max_advi_iter_subsequent_epochs=200 --min_training_epochs=10 --max_training_epochs=50 --initial_temperature=1.500000e+00 --num_thermal_advi_iters=2500 --convergence_snr_averaging_window=500 --convergence_snr_trigger_threshold=1.000000e-01 --convergence_snr_countdown_window=10 --max_calling_iters=10 --caller_update_convergence_threshold=1.000000e-03 --caller_internal_admixing_rate=7.500000e-01 --caller_external_admixing_rate=1.000000e+00 --disable_caller=false --disable_sampler=false --disable_annealing=false
Stdout:
14:13:50.032 INFO cohort_denoising_calling - Loading 24 read counts file(s)...
14:13:53.719 INFO gcnvkernel.io.io_metadata - Loading germline contig ploidy and global read depth metadata...
14:13:58.626 INFO gcnvkernel.tasks.task_cohort_denoising_calling - Instantiating the denoising model (warm-up)...
14:14:04.543 INFO gcnvkernel.models.fancy_model - Global model variables: {'W_tu', 'psi_t_log__', 'ard_u_log__', 'log_mean_bias_t'}
14:14:04.544 INFO gcnvkernel.models.fancy_model - Sample-specific model variables: {'z_su', 'psi_s_log__', 'read_depth_s_log__'}
14:14:04.544 WARNING gcnvkernel.tasks.inference_task_base - No log emission sampler given; skipping the sampling step
14:14:04.544 WARNING gcnvkernel.tasks.inference_task_base - No caller given; skipping the calling step
14:14:04.544 INFO gcnvkernel.tasks.inference_task_base - Instantiating the convergence tracker...
14:14:04.544 INFO gcnvkernel.tasks.inference_task_base - Setting up DA-ADVI...
14:14:10.902 INFO gcnvkernel.tasks.inference_task_base - (denoising (warm-up)) starting...: 0%| | 0/5000 [00:00<?, ?it/s]
14:14:12.877 INFO gcnvkernel.tasks.inference_task_base - (denoising (warm-up) epoch 1) ELBO: N/A, SNR: N/A, T: 1.50: 0%| | 1/5000 [00:01<2:44:32, 1.97s/it]
14:14:14.753 INFO gcnvkernel.tasks.inference_task_base - (denoising (warm-up) epoch 1) ELBO: -145.294 +/- 0.000, SNR: 35869952999211676.0, T: 1.50: 0%| | 2/5000 [00:03<2:40:21, 1.93s/it]
14:14:16.609 INFO gcnvkernel.tasks.inference_task_base - (denoising (warm-up) epoch 1) ELBO: -110.174 +/- 10.941, SNR: 568.3, T: 1.50: 0%| | 3/5000 [00:05<2:38:25, 1.90s/it]
14:14:18.444 INFO gcnvkernel.tasks.inference_task_base - (denoising (warm-up) epoch 1) ELBO: -125.004 +/- 11.232, SNR: 292.3, T: 1.50: 0%| | 4/5000 [00:07<2:36:59, 1.89s/it]
14:14:20.289 INFO gcnvkernel.tasks.inference_task_base - (denoising (warm-up) epoch 1) ELBO: -144.050 +/- 15.199, SNR: 8.9, T: 1.50: 0%| | 5/5000 [00:09<2:36:16, 1.88s/it]
14:14:22.116 INFO gcnvkernel.tasks.inference_task_base - (denoising (warm-up) epoch 1) ELBO: -144.733 +/- 13.469, SNR: 1.3, T: 1.50: 0%| | 6/5000 [00:11<2:35:33, 1.87s/it]
14:14:23.926 INFO gcnvkernel.tasks.inference_task_base - (denoising (warm-up) epoch 1) ELBO: -147.237 +/- 12.420, SNR: 24.0, T: 1.50: 0%| | 7/5000 [00:13<2:34:49, 1.86s/it]
14:14:25.759 INFO gcnvkernel.tasks.inference_task_base - (denoising (warm-up) epoch 1) ELBO: -150.242 +/- 11.819, SNR: 53.3, T: 1.50: 0%| | 8/5000 [00:14<2:34:30, 1.86s/it]
14:14:27.583 INFO gcnvkernel.tasks.inference_task_base - (denoising (warm-up) epoch 1) ELBO: -148.414 +/- 11.116, SNR: 41.3, T: 1.50: 0%| | 9/5000 [00:16<2:34:10, 1.85s/it]
14:14:29.518 INFO gcnvkernel.tasks.inference_task_base - (denoising (warm-up) epoch 1) ELBO: -149.789 +/- 10.551, SNR: 55.2, T: 1.50: 0%| | 10/5000 [00:18<2:34:49, 1.86s/it]
14:14:31.282 INFO gcnvkernel.tasks.inference_task_base - (denoising (warm-up) epoch 1) ELBO: -150.978 +/- 10.081, SNR: 67.5, T: 1.50: 0%| | 11/5000 [00:20<2:34:02, 1.85s/it]
14:14:33.089 INFO gcnvkernel.tasks.inference_task_base - (denoising (warm-up) epoch 1) ELBO: -147.531 +/- 10.426, SNR: 41.0, T: 1.50: 0%| | 12/5000 [00:22<2:33:42, 1.85s/it]
14:14:34.882 INFO gcnvkernel.tasks.inference_task_base - (denoising (warm-up) epoch 1) ELBO: -148.847 +/- 10.066, SNR: 51.5, T: 1.50: 0%| | 13/5000 [00:23<2:33:18, 1.84s/it]
14:14:36.873 INFO gcnvkernel.tasks.inference_task_base - (denoising (warm-up) epoch 1) ELBO: -148.074 +/- 9.669, SNR: 48.5, T: 1.50: 0%| | 14/5000 [00:25<2:34:09, 1.86s/it]
14:14:38.734 INFO gcnvkernel.tasks.inference_task_base - (denoising (warm-up) epoch 1) ELBO: -149.072 +/- 9.380, SNR: 56.4, T: 1.50: 0%| | 15/5000 [00:27<2:34:09, 1.86s/it]
14:14:40.604 INFO gcnvkernel.tasks.inference_task_base - (denoising (warm-up) epoch 1) ELBO: -149.567 +/- 9.052, SNR: 61.6, T: 1.50: 0%| | 16/5000 [00:29<2:34:11, 1.86s/it]
14:14:42.485 INFO gcnvkernel.tasks.inference_task_base - (denoising (warm-up) epoch 1) ELBO: -148.402 +/- 8.903, SNR: 55.7, T: 1.50: 0%| | 17/5000 [00:31<2:34:17, 1.86s/it]
14:14:44.272 INFO gcnvkernel.tasks.inference_task_base - (denoising (warm-up) epoch 1) ELBO: -149.207 +/- 8.697, SNR: 61.7, T: 1.50: 0%| | 18/5000 [00:33<2:33:56, 1.85s/it]
14:14:46.178 INFO gcnvkernel.tasks.inference_task_base - (denoising (warm-up) epoch 1) ELBO: -149.406 +/- 8.426, SNR: 64.8, T: 1.50: 0%| | 19/5000 [00:35<2:34:07, 1.86s/it]
14:14:48.136 INFO gcnvkernel.tasks.inference_task_base - (denoising (warm-up) epoch 1) ELBO: -147.117 +/- 9.040, SNR: 48.9, T: 1.50: 0%| | 20/5000 [00:37<2:34:31, 1.86s/it]
14:14:50.017 INFO gcnvkernel.tasks.inference_task_base - (denoising (warm-up) epoch 1) ELBO: -146.929 +/- 8.773, SNR: 49.4, T: 1.50: 0%| | 21/5000 [00:39<2:34:33, 1.86s/it]
14:14:51.976 INFO gcnvkernel.tasks.inference_task_base - (denoising (warm-up) epoch 1) ELBO: -147.356 +/- 8.553, SNR: 52.7, T: 1.50: 0%| | 22/5000 [00:41<2:34:53, 1.87s/it]
14:14:53.981 INFO gcnvkernel.tasks.inference_task_base - (denoising (warm-up) epoch 1) ELBO: -148.649 +/- 8.651, SNR: 58.1, T: 1.50: 0%| | 23/5000 [00:43<2:35:21, 1.87s/it]
14:14:55.909 INFO gcnvkernel.tasks.inference_task_base - (denoising (warm-up) epoch 1) ELBO: -148.848 +/- 8.430, SNR: 60.5, T: 1.50: 0%| | 24/5000 [00:45<2:35:31, 1.88s/it]
14:14:57.761 INFO gcnvkernel.tasks.inference_task_base - (denoising (warm-up) epoch 1) ELBO: -148.098 +/- 8.342, SNR: 57.8, T: 1.50: 0%| | 25/5000 [00:46<2:35:24, 1.87s/it]
14:14:59.742 INFO gcnvkernel.tasks.inference_task_base - (denoising (warm-up) epoch 1) ELBO: -147.870 +/- 8.153, SNR: 58.2, T: 1.49: 1%| | 26/5000 [00:48<2:35:43, 1.88s/it]
14:15:01.759 INFO gcnvkernel.tasks.inference_task_base - (denoising (warm-up) epoch 1) ELBO: -147.645 +/- 7.977, SNR: 58.5, T: 1.49: 1%| | 27/5000 [00:50<2:36:07, 1.88s/it]
14:15:03.674 INFO gcnvkernel.tasks.inference_task_base - (denoising (warm-up) epoch 1) ELBO: -146.458 +/- 8.181, SNR: 52.3, T: 1.49: 1%| | 28/5000 [00:52<2:36:10, 1.88s/it]
14:15:05.669 INFO gcnvkernel.tasks.inference_task_base - (denoising (warm-up) epoch 1) ELBO: -145.601 +/- 8.211, SNR: 48.7, T: 1.49: 1%| | 29/5000 [00:54<2:36:27, 1.89s/it]
14:15:07.503 INFO gcnvkernel.tasks.inference_task_base - (denoising (warm-up) epoch 1) ELBO: -144.395 +/- 8.461, SNR: 42.9, T: 1.49: 1%| | 30/5000 [00:56<2:36:16, 1.89s/it]
14:15:09.330 INFO gcnvkernel.tasks.inference_task_base - (denoising (warm-up) epoch 1) ELBO: -143.653 +/- 8.457, SNR: 40.3, T: 1.49: 1%| | 31/5000 [00:58<2:36:05, 1.88s/it]
14:15:11.195 INFO gcnvkernel.tasks.inference_task_base - (denoising (warm-up) epoch 1) ELBO: -143.713 +/- 8.287, SNR: 41.3, T: 1.49: 1%| | 32/5000 [01:00<2:36:00, 1.88s/it]
14:15:13.165 INFO gcnvkernel.tasks.inference_task_base - (denoising (warm-up) epoch 1) ELBO: -144.466 +/- 8.290, SNR: 43.8, T: 1.49: 1%| | 33/5000 [01:02<2:36:11, 1.89s/it]
14:15:15.037 INFO gcnvkernel.tasks.inference_task_base - (denoising (warm-up) epoch 1) ELBO: -144.273 +/- 8.147, SNR: 43.9, T: 1.49: 1%| | 34/5000 [01:04<2:36:07, 1.89s/it]
14:15:17.018 INFO gcnvkernel.tasks.inference_task_base - (denoising (warm-up) epoch 1) ELBO: -143.297 +/- 8.331, SNR: 39.8, T: 1.49: 1%| | 35/5000 [01:06<2:36:18, 1.89s/it]
14:15:18.990 INFO gcnvkernel.tasks.inference_task_base - (denoising (warm-up) epoch 1) ELBO: -143.084 +/- 8.201, SNR: 39.7, T: 1.49: 1%| | 36/5000 [01:08<2:36:28, 1.89s/it]
14:15:20.885 INFO gcnvkernel.tasks.inference_task_base - (denoising (warm-up) epoch 1) ELBO: -143.530 +/- 8.123, SNR: 41.4, T: 1.49: 1%| | 37/5000 [01:09<2:36:27, 1.89s/it]
14:15:22.790 INFO gcnvkernel.tasks.inference_task_base - (denoising (warm-up) epoch 1) ELBO: -143.881 +/- 8.026, SNR: 43.0, T: 1.49: 1%| | 38/5000 [01:11<2:36:26, 1.89s/it]
14:15:24.618 INFO gcnvkernel.tasks.inference_task_base - (denoisi
Stderr:
        at org.broadinstitute.hellbender.utils.python.PythonExecutorBase.getScriptException(PythonExecutorBase.java:75)
        at org.broadinstitute.hellbender.utils.runtime.ScriptExecutor.executeCuratedArgs(ScriptExecutor.java:126)
        at org.broadinstitute.hellbender.utils.python.PythonScriptExecutor.executeArgs(PythonScriptExecutor.java:170)
        at org.broadinstitute.hellbender.utils.python.PythonScriptExecutor.executeScript(PythonScriptExecutor.java:151)
        at org.broadinstitute.hellbender.utils.python.PythonScriptExecutor.executeScript(PythonScriptExecutor.java:121)
        at org.broadinstitute.hellbender.tools.copynumber.GermlineCNVCaller.executeGermlineCNVCallerPythonScript(GermlineCNVCaller.java:441)
        at org.broadinstitute.hellbender.tools.copynumber.GermlineCNVCaller.doWork(GermlineCNVCaller.java:288)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:138)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
        at org.broadinstitute.hellbender.Main.main(Main.java:291)
Using GATK jar /mnt/storage/apps/software/gatk/4.1.0.0/gatk-package-4.1.0.0-local.jar
Running: java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /mnt/storage/apps/software/gatk/4.1.0.0/gatk-package-4.1.0.0-local.jar GermlineCNVCaller --run-mode COHORT --contig-ploidy-calls results/190225.181217_K00178.ploidy/cohort-calls/ --input results/200219_X008378.counts.tsv --input results/200219_X008388.counts.tsv --input results/200219_X008389.counts.tsv --input results/200219_X008409.counts.tsv --input results/200219_X008410.counts.tsv --input results/200219_X008411.counts.tsv --input results/200219_X008412.counts.tsv --input results/200219_X008415.counts.tsv --input results/200219_X008417.counts.tsv --input results/200219_X008420.counts.tsv --input results/200219_X008422.counts.tsv --input results/200219_X008423.counts.tsv --input results/200219_X008429.counts.tsv --input results/200219_X008430.counts.tsv --input results/200219_X008432.counts.tsv --input results/200219_X008458.counts.tsv --input results/200219_X008493.counts.tsv --input results/200219_X008504.counts.tsv --input results/200219_X008512.counts.tsv --input results/200219_X008522.counts.tsv --input results/200219_X008523.counts.tsv --input results/200219_X008525.counts.tsv --input results/200219_X008528.counts.tsv --input results/200219_X008543.counts.tsv --output results/190225.181217_K00178.CNVCaller --output-prefix cohort

I created the 190225.181217_K00178.CNVCaller folder beforehand because otherwise I'd get this error:

08:37:16.407 INFO GermlineCNVCaller - Start Date/Time: February 26, 2019 8:37:04 AM GMT
08:37:16.407 INFO GermlineCNVCaller - ------------------------------------------------------------
08:37:16.407 INFO GermlineCNVCaller - ------------------------------------------------------------
08:37:16.408 INFO GermlineCNVCaller - HTSJDK Version: 2.18.2
08:37:16.408 INFO GermlineCNVCaller - Picard Version: 2.18.25
08:37:16.408 INFO GermlineCNVCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 2
08:37:16.408 INFO GermlineCNVCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
08:37:16.408 INFO GermlineCNVCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
08:37:16.408 INFO GermlineCNVCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
08:37:16.408 INFO GermlineCNVCaller - Deflater: IntelDeflater
08:37:16.409 INFO GermlineCNVCaller - Inflater: IntelInflater
08:37:16.409 INFO GermlineCNVCaller - GCS max retries/reopens: 20
08:37:16.409 INFO GermlineCNVCaller - Requester pays: disabled
08:37:16.409 INFO GermlineCNVCaller - Initializing engine
08:37:21.698 INFO GermlineCNVCaller - Done initializing engine
08:37:22.015 INFO GermlineCNVCaller - Retrieving intervals from read-count file (results/200219_X008378.counts.tsv)...
08:37:22.119 INFO GermlineCNVCaller - No annotated intervals were provided...
08:37:22.120 INFO GermlineCNVCaller - No GC-content annotations for intervals found; explicit GC-bias correction will not be performed...
08:37:22.194 INFO GermlineCNVCaller - Running the tool in the COHORT mode...
08:37:22.195 INFO GermlineCNVCaller - Shutting down engine
[February 26, 2019 8:37:22 AM GMT] org.broadinstitute.hellbender.tools.copynumber.GermlineCNVCaller done. Elapsed time: 0.29 minutes.
Runtime.totalMemory()=330301440
java.lang.IllegalArgumentException: Output directory results/190226.181217_K00178.CNVCaller does not exist.
        at org.broadinstitute.hellbender.utils.Utils.validateArg(Utils.java:724)
        at org.broadinstitute.hellbender.tools.copynumber.GermlineCNVCaller.validateArguments(GermlineCNVCaller.java:361)
        at org.broadinstitute.hellbender.tools.copynumber.GermlineCNVCaller.doWork(GermlineCNVCaller.java:281)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:138)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
        at org.broadinstitute.hellbender.Main.main(Main.java:291)
Using GATK jar /mnt/storage/apps/software/gatk/4.1.0.0/gatk-package-4.1.0.0-local.jar
Running: java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /mnt/storage/apps/software/gatk/4.1.0.0/gatk-package-4.1.0.0-loc

I'm running these commands in the conda environment described in the GATK tutorial. Any insight?

@samuelklee
Contributor

The first issue looks like an out of memory error. You may need to scatter your intervals into separate shards, as is done in the WDLs: https://github.com/broadinstitute/gatk/tree/master/scripts/cnv_wdl/germline
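
For context on why "python exited with 137" points to the process being killed: by the usual convention, an exit status above 128 means the process died from signal number (status - 128), and 137 - 128 = 9 is SIGKILL, which is the signal the Linux out-of-memory killer sends. A small sketch of that arithmetic follows; describe_exit_status is a hypothetical helper, not part of GATK, and a SIGKILL can also come from other sources (for example a cluster scheduler), so this is a hint rather than proof.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a shell-style exit status; values above 128 mean 'killed by signal (status - 128)'."""
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not defined on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    return "success" if status == 0 else f"exited with error code {status}"


print(describe_exit_status(137))  # on Linux: killed by SIGKILL
print(describe_exit_status(1))    # exited with error code 1
```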

The second issue regarding the output directory creation is by design---CNV tools require the output directory to exist beforehand.
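
Since the directory has to exist beforehand, a wrapper script can create it before launching the tool. A minimal sketch, assuming a gatk launcher is available on the PATH (the function only builds the argument list and does not run anything). The flags are the ones visible in the command posted above; build_gcnv_command is a hypothetical helper.

```python
import os
from typing import List


def build_gcnv_command(output_dir: str, prefix: str,
                       ploidy_calls: str, inputs: List[str]) -> List[str]:
    """Create output_dir if needed and return an argument list for GermlineCNVCaller."""
    # The CNV tools expect the output directory to exist already, so create it up front.
    os.makedirs(output_dir, exist_ok=True)
    cmd = ["gatk", "GermlineCNVCaller",
           "--run-mode", "COHORT",
           "--contig-ploidy-calls", ploidy_calls]
    for path in inputs:
        cmd += ["--input", path]
    cmd += ["--output", output_dir, "--output-prefix", prefix]
    return cmd
```

The returned list could then be passed to subprocess.run(cmd, check=True) once the inputs are in place.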

@samuelklee
Contributor

samuelklee commented Feb 26, 2019

@sooheelee do you remember what version you were using when you encountered this? As far as I can tell, we should fail early in the most recent version (as you can see from @Yu-jinKim's post).

EDIT: Never mind, actually the fail-early check is only for existence and not for write permission.

@samuelklee changed the title from "GermlineCNVCaller does not overwrite existing file name and errors at the end of the run" to "Ensure consistent output directory requirements in CNV tools." on Feb 26, 2019
@samuelklee
Contributor

samuelklee commented Feb 26, 2019

Changed the name of the issue. As discussed in the above forum thread, behavior is already consistent across CNV tools---except for DetermineGermlineContigPloidy, where the check is missing, so this issue is really just a reminder to add that (along with checks for write permission). We don't really have conventions for this sort of thing GATK-wide, though.

@sooheelee
Contributor Author

@samuelklee Is this type of thing something you'd like for us to start documenting, e.g. in the repo wiki? There is some need for this type of document if we expect more external contributions going forward. We can ask @droazen to weigh in.

@samuelklee
Contributor

We had some discussion about conventions in the hellbender Slack. Different tools have different requirements or expectations, but it would be good to converge on and document a couple of patterns.

@sooheelee
Contributor Author

sooheelee commented Mar 1, 2019

If you don't mind the list being public, the GitHub wiki seems like a good place to start a list; see, for example, this existing article: https://github.com/broadinstitute/gatk/wiki/GATK4-Documentation-Guidelines.

@sooheelee
Contributor Author

I made you a page to start collecting such reminders at https://github.com/broadinstitute/gatk/wiki/Checks-and-tests-guidelines.

@samuelklee
Contributor

Excellent, thanks!

@samuelklee
Contributor

samuelklee commented Mar 3, 2019

Just finished switching over all of the CNV tools to fail early if directories are not writeable---or do not exist and cannot be created---only to realize that this behavior is inconsistent with that of Picard IntervalListTools (which is used in the gCNV pipeline).
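
The fail-early rule described here (reject a directory that is not writeable, or that does not exist and cannot be created) can be sketched as follows. This is an illustrative Python version, not the actual Java change made to the CNV tools; ensure_output_dir is a hypothetical name.

```python
import os


def ensure_output_dir(path: str) -> str:
    """Return path if it is, or can be made into, a writable directory; otherwise raise OSError."""
    if os.path.exists(path):
        if not os.path.isdir(path):
            raise OSError(f'Output path "{path}" exists but is not a directory')
    else:
        try:
            # Missing directory: try to create it, including any missing parents.
            os.makedirs(path)
        except OSError as err:
            raise OSError(
                f'Output directory "{path}" does not exist and could not be created'
            ) from err
    # Writing files into a directory needs both write and search (execute) permission.
    if not os.access(path, os.W_OK | os.X_OK):
        raise OSError(f'Output directory "{path}" is not writeable')
    return path
```

Calling this once during argument validation, before any computation starts, gives the early failure; note that os.access reflects permissions at the time of the call, so a later write can still fail if they change.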

That tool fails early if the output directory is not writeable or does not exist, and although there is a code path later that suggests that output directories should be created, it is not reached due to this early fail. It might be that this inconsistency was introduced in broadinstitute/picard#1208 and I did not catch it in my PR review. @yfarjoun any opinions what the intended behavior should be? Are there any conventions for Picard tools in general?

Perhaps we could enforce this at the engine level (maybe checks that are triggered by annotations such as suggested in #141, if possible)? But this would only work for GATK tools and would still rely on the diligence of developers.
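
One way to picture checks triggered by annotations is a Python decorator that validates a named argument before the tool body runs. This is only an analogy under stated assumptions: it is not what #141 proposes and not GATK engine code, and requires_writable_dir and do_work are hypothetical names.

```python
import functools
import os


def requires_writable_dir(arg_name: str):
    """Decorator factory: validate the named keyword argument before the wrapped function runs."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            path = kwargs[arg_name]
            if not os.path.isdir(path):
                raise ValueError(f"Output directory {path} does not exist.")
            if not os.access(path, os.W_OK):
                raise ValueError(f"Output directory {path} is not writeable.")
            return func(*args, **kwargs)
        return wrapper
    return decorator


@requires_writable_dir("output_dir")
def do_work(*, output_dir: str) -> str:
    """Stand-in for a tool's main method; only reached if the check passes."""
    return os.path.join(output_dir, "result.tsv")
```

As the comment notes, this still depends on each developer remembering to apply the marker to every output argument.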

In any case, I'll decide on and document a convention for the CNV tools, but I think it might be a quixotic dream to enforce consistent behavior---especially without breaking things downstream which may rely on existing, inconsistent behavior...

@yfarjoun
Contributor

yfarjoun commented Mar 6, 2019 via email

@samuelklee
Contributor

Thanks, @yfarjoun, I think those are reasonable. Just to be clear, the code for the tool mentioned above is a little confusing, in that an early fail for writability when the directory does not exist prevents us from reaching code that appears to be intended to create the directory. Not a big deal in the end (and I checked that this was also the case before the PR).

But minor things like this can easily break downstream scripts, etc., as was demonstrated above, so we should take some care. I agree that it's fine to leave some decisions up to each tool, but we should try to document them for the benefit of users and future devs that might need to maintain the behavior of the tool.

samuelklee added a commit that referenced this issue Mar 12, 2019
* Cleaned up intermediate files in gCNV WDL and fixed miscellaneous typos. (#5382)

* Added output of MAD values as floats in somatic CNV WDL. (#5591)

* Exposed boot disk space for Oncotator in somatic CNV WDL. (#3566)

* Added check to skip outlier truncation if number of matrix elements exceeds Integer.MAX_VALUE in CreateReadCountPanelOfNormals. (#4734)

* Miscellaneous boy scout activities.

* Fixed some issues concerning intervals in DetermineGermlineContigPloidy documentation.

* Fixed non-kebab-case argument in CollectAllelicCountsSpark and other minor issues.

* Improved consistency of style and input/output validation across CNV tools. (#4825)