🐛 timeout: montage_a & montage_s #1338
Comments
Aha! Someone else has finally seen this. Some of my big regression test runs have stalled at this point, but I was never able to replicate it on command.
I suspect this behavior won't show itself if the montage nodes are run individually or outside a Nipype workflow, but I am going to try a unit test for them just in case. I'll let you know if I end up with any useful info.
Update: this issue seems not to occur when running with just 1 CPU, so it's probably/possibly related to #1130
Okay, yeah, then I'm going to test it via a Nipype workflow.
Have you seen any other montage nodes hang other than these?
@shnizzedy in your run: are the resample_o/resample_u nodes completing successfully? Do they have full folders (report.rst etc.) in the working directory? They would be in /working/resting_preproc_{sub}/montage_csf_gm_wm_x
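One way to run the check asked about above is a small script that walks the working directory and flags node folders missing expected files. This is only a sketch: `find_incomplete_nodes` and its parameters are hypothetical names, and the idea that a finished node folder contains a `report.rst` somewhere beneath it is an assumption taken from the question above, not a confirmed guarantee.

```python
from pathlib import Path


def find_incomplete_nodes(workflow_dir, node_prefix="resample_", required=("report.rst",)):
    """Return names of node folders that lack any of the required files.

    Hypothetical helper: `node_prefix` and `required` are assumptions
    taken from the question above, not from C-PAC itself.
    """
    incomplete = []
    for node_dir in sorted(Path(workflow_dir).iterdir()):
        if not (node_dir.is_dir() and node_dir.name.startswith(node_prefix)):
            continue
        # Search recursively, since reports may sit in a subfolder.
        present = {p.name for p in node_dir.rglob("*") if p.is_file()}
        if any(name not in present for name in required):
            incomplete.append(node_dir.name)
    return incomplete
```

Pointing it at the montage_csf_gm_wm_x folder of a stalled participant would list any resample_* folders that never got their report written.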
Are they supposed to have both? Looks like yes to
Yes, for the csf_gm_wm montages, you'll get three
I noticed in some of the other participants from my test run, they had
I made a small workflow that only runs these montage nodes on whatever data you give it, with MultiProc enabled and basically the same environment as the whole pipeline; I haven't been able to replicate any hanging or stalling yet. I would say that might be a clue, though: for the stalling runs, we can check the upstream intermediates like the tissue files, the skullstrip, etc., but those all come from such different processes that I doubt it has something to do with upstream nodes stalling.
At least sometimes, maybe always, tracebacks like this appear when this behavior occurs.
Traceback (most recent call last):
  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/plugins/multiproc.py", line 69, in run_node
    result['result'] = node.run(updatehash=updatehash)
  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/nodes.py", line 471, in run
    result = self._run_interface(execute=True)
  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/nodes.py", line 555, in _run_interface
    return self._run_command(execute)
  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/nodes.py", line 635, in _run_command
    result = self._interface.run(cwd=outdir)
  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/interfaces/base/core.py", line 523, in run
    outputs = self.aggregate_outputs(runtime)
  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/interfaces/base/core.py", line 597, in aggregate_outputs
    predicted_outputs = self._list_outputs()
  File "/code/CPAC/utils/interfaces/datasink.py", line 586, in _list_outputs
    use_hardlink=use_hardlink)
  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/utils/filemanip.py", line 443, in copyfile
    os.unlink(newfile)
FileNotFoundError: [Errno 2] No such file or directory: '/outputs/output/pipeline_analysis_freq-filter_nuisance/sub-04_ses-movie/raw_functional/sub-04_ses-movie_task-movie_run-8_bold.nii.gz'
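The last frames show `os.unlink(newfile)` raising `FileNotFoundError`, meaning the path did not exist when the unlink ran. One possible explanation (an assumption, not something this thread confirms) is that two parallel workers are removing the same path, which would fit the observation that the problem has not appeared with 1 CPU. Below is a minimal sketch of a removal that tolerates that case; `remove_if_present` is a hypothetical helper and not part of Nipype or C-PAC.

```python
import os


def remove_if_present(path):
    """Delete `path`, treating "already gone" as success.

    Hypothetical helper illustrating a race-tolerant unlink; returns
    True if this call removed the file, False if it was already missing.
    """
    try:
        os.unlink(path)
    except FileNotFoundError:
        # Another process may have removed the file between an
        # existence check and this call; that is not an error here.
        return False
    return True
```

Catching the exception rather than checking for existence first avoids the gap between the check and the unlink, which is where a second worker could otherwise slip in.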
The same problem does not seem to exist for registration:
C-PAC/CPAC/registration/registration.py Line 662 in a218419
C-PAC/CPAC/registration/registration.py Line 828 in a218419
C-PAC/CPAC/registration/registration.py Lines 555 to 561 in e5b2320
C-PAC/CPAC/longitudinal_pipeline/longitudinal_workflow.py Lines 245 to 250 in a218419
C-PAC/CPAC/pipeline/cpac_pipeline.py Lines 1246 to 1251 in d3614c5
Closing as duplicate of #1404. Will reopen if it still occurs after resolving that issue.
Describe the bug
SLURM running C-PAC in Singularity with the following sbatch options times out with 5 montage_a and 5 montage_s tasks "running" for 12 hours.
One run:
Another run:
Expected behavior
C-PAC run completes or throws an error
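To illustrate the "throws an error" half of that expectation, here is a sketch of a watchdog that turns a stalled step into an explicit failure. It relies only on the timeout support in the standard library's `subprocess.run`; `run_with_deadline` is a hypothetical wrapper, and nothing here claims that C-PAC or Nipype work this way.

```python
import subprocess


def run_with_deadline(cmd, seconds):
    """Run `cmd`; raise RuntimeError if it exceeds `seconds`.

    Hypothetical wrapper: subprocess.run kills the child on timeout,
    so a hung step surfaces as an error instead of idling for hours.
    """
    try:
        return subprocess.run(cmd, timeout=seconds, check=True)
    except subprocess.TimeoutExpired as exc:
        raise RuntimeError(f"step timed out after {seconds}s: {cmd}") from exc
```

With a deadline well below the SLURM wall time, a stuck step would fail with a message naming the command rather than leaving tasks "running" until the job is killed.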
Versions
Additional context
Possibly related: