Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ASLPrep hangs when changing dseg datatype #310

Closed
Luoyu-Wang opened this issue Jun 26, 2023 · 29 comments · Fixed by #316
Closed

ASLPrep hangs when changing dseg datatype #310

Luoyu-Wang opened this issue Jun 26, 2023 · 29 comments · Fixed by #316
Labels
bug Something isn't working

Comments

@Luoyu-Wang
Copy link

Luoyu-Wang commented Jun 26, 2023

Summary

Thank you very much for your previous guidance. We used the unstable version of ASLPREP to process multi-PLD ASL data from GE scans and found that we got stuck at this step several days, we found the cpu was running very inefficiently(about 3%), while processing single-PLD ASL data normally, what is the reason for this?

docker run -ti --rm \
    -v /media/sf_data1/wly/ASL/BIDS:/data:ro \
    -v /media/sf_data1/wly/ASL/derivatives:/out:rw \
    -v /media/sf_data1/wly/ASL/wd_temp:/work:rw \
    -v /home/wly/softwares/freesurfer/license.txt:/opt/freesurfer/license.txt  \
    -v /home/wly/softwares/templateflow:/home/aslprep/.cache/templateflow/ \
    pennlinc/aslprep:unstable /data /out participant \
    -w /work --ignore fieldmaps \
    --output-spaces MNI152NLin2009cAsym:res-2
...
 [Node] Cached "_anat2std_tpms1" - collecting precomputed outputs
230626-01:21:26,471 nipype.workflow INFO:
[Node] "_anat2std_tpms1" found cached.
230626-01:21:26,472 nipype.workflow INFO:
[Node] Setting-up "_anat2std_tpms2" in "/work/aslprep_wf/single_subject_S0001_wf/anat_preproc_wf/anat_derivatives_wf/_in_tuple_MNI152NLin2009cAsym.res2/anat2std_tpms/mapflow/_anat2std_tpms2".
230626-01:21:26,472 nipype.workflow INFO:
[Node] Cached "_anat2std_tpms2" - collecting precomputed outputs
230626-01:21:26,473 nipype.workflow INFO:
[Node] "_anat2std_tpms2" found cached.
230626-01:21:27,729 nipype.workflow INFO:
[Node] Finished "norm_rpt", elapsed time 10.425669s.
230626-01:21:33,3 nipype.workflow INFO:
[Node] Finished "anat2std_mask", elapsed time 12.22636s.
230626-01:21:33,692 nipype.workflow INFO:
[Node] Finished "anat2std_dseg", elapsed time 12.916289s.
230626-01:21:34,425 nipype.interface WARNING:
Changing /out/aslprep/sub-S0001/anat/sub-S0001_space-MNI152NLin2009cAsym_res-2_desc-brain_mask.nii.gz dtype from float64 to uint8
230626-01:21:34,689 nipype.interface WARNING:
Changing /out/aslprep/sub-S0001/anat/sub-S0001_space-MNI152NLin2009cAsym_res-2_dseg.nii.gz dtype from float64 to int16

Additional details

  • ASLPrep version: unstable
  • Docker version: 23.0.1
  • Singularity version:

What were you trying to do?

We used the unstable version of ASLPREP to process multi-PLD ASL data from GE scans

docker run -ti --rm -v /media/sf_data1/wly/ASL/BIDS:/data:ro -v /media/sf_data1/wly/ASL/derivatives:/out:rw -v /media/sf_data1/wly/ASL/wd_temp:/work:rw -v /home/wly/softwares/freesurfer/license.txt:/opt/freesurfer/license.txt -v /home/wly/softwares/templateflow:/home/aslprep/.cache/templateflow/ pennlinc/aslprep:unstable /data /out participant -w /work --ignore fieldmaps --output-spaces MNI152NLin2009cAsym:res-2

What did you expect to happen?

What actually happened?

We got stuck at this step several days, we found the cpu was running very inefficiently(about 3%),

Reproducing the bug

@Luoyu-Wang Luoyu-Wang added the bug Something isn't working label Jun 26, 2023
@tsalo
Copy link
Member

tsalo commented Jun 26, 2023

Can you fill in all of the fields of the bug report template?

Also, I'm not sure what the traceback you posted refers to. Why did you share that section of the log? Is that the part that ASLPrep gets stuck at?

@Luoyu-Wang
Copy link
Author

Luoyu-Wang commented Jun 26, 2023

hank you very much for your guidance and I have filled in most the fields of the bug report template.

Yes, that is the part that ASLPrep gets stuck at and there is no information after that.

@tsalo
Copy link
Member

tsalo commented Jun 26, 2023

Thank you for the updated info. You are just using Docker directly now, right? Not within a virtual box?

Can you update the top comment with the command you used to call ASLPrep as well?
Also, how much memory and how many CPUs are allotted in your Docker settings?
One other question- are you using a Mac with an M1 or M2 chip?

@Luoyu-Wang
Copy link
Author

Luoyu-Wang commented Jun 27, 2023

Thanks for your guidance. Yes, in fact, I ran ASLprep with both the direct call and the virtualbox separately, but both had the same result.
Ok, I have updated.
CPU and memory did not make any limits.
CPU is intel 8350C *2

@tsalo
Copy link
Member

tsalo commented Jun 27, 2023

CPU and memory did not make any limits.

I think maybe you're referring to the settings in your ASLPrep call. Docker can't be run without any limits, as far as I know. If you open your Docker dashboard, you can go to the Settings page and then go to the Resources tab. You should be able to see how many CPUs, how much memory, and how much swap you've allowed Docker to have.

@Luoyu-Wang
Copy link
Author

Thanks for your guidance. I checked the memory (512GB) and CPU (128) through Docker Info -- format, without any restrictions

@tsalo
Copy link
Member

tsalo commented Jun 27, 2023

I'm quite stumped then. Are these data shareable? Would you be able to upload the failing data to something like Box or Dropbox and share that with me so I can debug locally?

@Luoyu-Wang
Copy link
Author

Luoyu-Wang commented Jun 28, 2023 via email

@tsalo
Copy link
Member

tsalo commented Jun 29, 2023

Thanks @Luoyu-Wang. I was able to download the data and start debugging. I wasn't able to run ASLPrep all the way through yesterday, but one thing I noticed is that ASLPrep's heuristic to switch between GE-specific processing and general processing uses the number of volumes in the ASL file. Any ASL file with <= 5 volumes is run through the GE processing pipeline, but your data have 6 volumes (5 PLDs, plus the M0 scan), so the general processing pipeline is being called incorrectly.

I think I will need to add parameters to explicitly control this behavior (e.g., --force-ge and --force-non-ge).

I don't know for sure that this is causing your problem, since my run didn't stall one the step you reported, but it should help.

@Luoyu-Wang
Copy link
Author

Thanks for your guidance. This error disappeared when I used 3PLD's ASL analysis, but a new errer appeared.

Node: aslprep_wf.single_subject_S0001_wf.asl_preproc_asl.compute_cbf_wf.compute_cbf
Working directory: /work/aslprep_wf/single_subject_S0001_wf/asl_preproc_asl/compute_cbf_wf/compute_cbf

Node inputs:

cbf_only = False
deltam = /work/aslprep_wf/single_subject_S0001_wf/asl_preproc_asl/compute_cbf_wf/extract_deltam/sub-S0001_asl_m0file_brain_mask_cbfdeltam.nii.gz
m0_file = /work/aslprep_wf/single_subject_S0001_wf/asl_preproc_asl/asl_reference_ge_wf/gen_ge_ref/sub-S0001_asl_m0file.nii
m0_scale = 1.0
mask = /work/aslprep_wf/single_subject_S0001_wf/asl_preproc_asl/compute_cbf_wf/refine_mask/sub-S0001_asl_m0file_brain_mask_refinemask.nii.gz
metadata = {'AcquisitionMatrixPE': 8, 'AcquisitionNumber': 1, 'AcquisitionTime': '17:02:21.000000', 'ArterialSpinLabelingType': 'PCASL', 'BackgroundSuppression': True, 'CoilString': '48HAP', 'ConversionSoftware': 'dcm2niix', 'ConversionSoftwareVersion': 'v1.0.20230411', 'DeviceSerialNumber': '0000082427140268', 'EchoTime': 0.012432, 'FlipAngle': 111, 'ImageOrientationPatientDICOM': [1, 0, 0, 0, 1, 0], 'ImageType': ['ORIGINAL', 'PRIMARY', 'OTHER', 'REAL'], 'ImagingFrequency': 127.77, 'InPlanePhaseEncodingDirectionDICOM': 'COL', 'InstitutionName': 'ZJ Hangzhou NO.1 hospital', 'InversionTime': 1, 'LabelingDuration': 1, 'M0Type': 'Included', 'MRAcquisitionType': '3D', 'MagneticFieldStrength': 3, 'Manufacturer': 'GE', 'ManufacturersModelName': 'SIGNA Premier', 'Modality': 'MR', 'NumberOfArms': 8, 'NumberOfPointsPerArm': 512, 'ParallelReductionFactorInPlane': 2, 'PatientPosition': 'HFS', 'PercentPhaseFOV': 100, 'PercentSampling': 100, 'PixelBandwidth': 976.56, 'PostLabelDelay': [1, 2, 3], 'PostLabelingDelay': [1, 2, 3, 0], 'PrescanReuseString': 'RN/s3,Coil/s1', 'ProtocolName': 'research_brain_test_2306', 'PulseSequenceName': 'research/eASL6/easl_PA6', 'ReconMatrixPE': 128, 'RepetitionTime': 10.156, 'RepetitionTimePreparation': 10.156, 'SAR': 1.0431, 'ScanOptions': 'SAT_GEMS\EDR_GEMS\SPIRAL_GEMS\FS', 'ScanningSequence': 'RM', 'SequenceVariant': 'NONE', 'SeriesDescription': 'NOT DIAGNOSTIC: (Raw) EASL_3PLD_1000', 'SeriesNumber': 400, 'ShimSetting': [28, -49, -198], 'SliceThickness': 2, 'SoftwareVersions': '29\LX\Product Software Version: RX29.1_R02_2131.a\Orchestra SDK', 'SpacingBetweenSlices': 2, 'StationName': 'GEHCGEHC', 'VascularCrushing': True, 'timeEncodedMatrixSize': [4, 4]}

Traceback (most recent call last):
File "/usr/local/miniconda/lib/python3.8/site-packages/nipype/pipeline/plugins/multiproc.py", line 344, in _send_procs_to_workers
self.procs[jobid].run(updatehash=updatehash)
File "/usr/local/miniconda/lib/python3.8/site-packages/nipype/pipeline/engine/nodes.py", line 527, in run
result = self._run_interface(execute=True)
File "/usr/local/miniconda/lib/python3.8/site-packages/nipype/pipeline/engine/nodes.py", line 645, in _run_interface
return self._run_command(execute)
File "/usr/local/miniconda/lib/python3.8/site-packages/nipype/pipeline/engine/nodes.py", line 771, in _run_command
raise NodeExecutionError(msg)
nipype.pipeline.engine.nodes.NodeExecutionError: Exception raised while executing Node compute_cbf.

Traceback:
Traceback (most recent call last):
File "/usr/local/miniconda/lib/python3.8/site-packages/nipype/interfaces/base/core.py", line 397, in run
runtime = self._run_interface(runtime)
File "/usr/local/miniconda/lib/python3.8/site-packages/aslprep/interfaces/cbf.py", line 571, in _run_interface
att, mean_cbf = estimate_cbf_pcasl_multipld(
File "/usr/local/miniconda/lib/python3.8/site-packages/aslprep/utils/cbf.py", line 700, in estimate_cbf_pcasl_multipld
raise ValueError(
ValueError: Number of PostLabelingDelays (4) does not match number of delta-M volumes (3).

@tsalo
Copy link
Member

tsalo commented Jun 29, 2023

That makes sense. You must not have modified the PostLabelingDelay field in the JSON file.

@Luoyu-Wang
Copy link
Author

Luoyu-Wang commented Jun 29, 2023

In fact, I started out using PostLabelingDelay:[1000, 2000, 3000], but ASLprep reported errors before start:
1: [ERR] The number of values for 'PostLabelingDelay' for this file does not match the 4th dimension of the NIfTI header. 'PostLabelingDelay' is the time, in seconds, after the the end of the labeling (for (P)CASL) or middle of the labeling pulse (for PASL) until the middle of the excitation pulse applied to the imaging slab (for 3D acquisition) or first slice (for 2D acquisition). Can be a number (for a single-PLD time series) or an array of numbers (for multi-PLD and Look-Locker). In the latter case, the array of numbers contains the PLD of each volume (i.e. each 'control' and 'label') in the acquisition order. Any image within the time-series without a PLD (e.g. an 'm0scan') is indicated by a zero. Based on DICOM Tags 0018,9079 Inversion Times and 0018,0082 InversionTime. (code: 173 - POST_LABELING_DELAY_NOT_MATCHING_NIFTI)
./sub-S0001/perf/sub-S0001_asl.nii

Please visit https://neurostars.org/search?q=POST_LABELING_DELAY_NOT_MATCHING_NIFTI for existing conversations about this issue.

2: [ERR] 'The number of values for PostLabelingDelay' for this file does not match the number of volumes in the 'sub-<label>[_ses-<label>][_acq-<label>][_rec-<label>][_run-<index>]_aslcontext.tsv'.'PostLabelingDelay' is the time, in seconds, after the the end of the labeling (for (P)CASL) or middle of the labeling pulse (for PASL) until the middle of the excitation pulse applied to the imaging slab (for 3D acquisition) or first slice (for 2D acquisition). Can be a number (for a single-PLD time series) or an array of numbers (for multi-PLD and Look-Locker). In the latter case, the array of numbers contains the PLD of each volume (i.e. each 'control' and 'label') in the acquisition order. Any image within the time-series without a PLD (e.g. an 'm0scan') is indicated by a zero. Based on DICOM Tags 0018,9079 Inversion Times and 0018,0082 InversionTime. (code: 174 - POST_LABELING_DELAY_NOT_MATCHING_ASLCONTEXT_TSV)

Later I changed PostLabelingDelay to [1000, 2000, 3000, 0], ASLprep can start running, but then it reports an error

Node: aslprep_wf.single_subject_S0001_wf.asl_preproc_asl.compute_cbf_wf.compute_cbf
Working directory: /work/aslprep_wf/single_subject_S0001_wf/asl_preproc_asl/compute_cbf_wf/compute_cbf

Node inputs:

cbf_only = False
deltam = /work/aslprep_wf/single_subject_S0001_wf/asl_preproc_asl/compute_cbf_wf/extract_deltam/sub-S0001_asl_m0file_brain_mask_cbfdeltam.nii.gz
m0_file = /work/aslprep_wf/single_subject_S0001_wf/asl_preproc_asl/asl_reference_ge_wf/gen_ge_ref/sub-S0001_asl_m0file.nii
m0_scale = 1.0
mask = /work/aslprep_wf/single_subject_S0001_wf/asl_preproc_asl/compute_cbf_wf/refine_mask/sub-S0001_asl_m0file_brain_mask_refinemask.nii.gz
metadata = {'AcquisitionMatrixPE': 8, 'AcquisitionNumber': 1, 'AcquisitionTime': '17:02:21.000000', 'ArterialSpinLabelingType': 'PCASL', 'BackgroundSuppression': True, 'CoilString': '48HAP', 'ConversionSoftware': 'dcm2niix', 'ConversionSoftwareVersion': 'v1.0.20230411', 'DeviceSerialNumber': '0000082427140268', 'EchoTime': 0.012432, 'FlipAngle': 111, 'ImageOrientationPatientDICOM': [1, 0, 0, 0, 1, 0], 'ImageType': ['ORIGINAL', 'PRIMARY', 'OTHER', 'REAL'], 'ImagingFrequency': 127.77, 'InPlanePhaseEncodingDirectionDICOM': 'COL', 'InstitutionName': 'ZJ Hangzhou NO.1 hospital', 'InversionTime': 1, 'LabelingDuration': 1, 'M0Type': 'Included', 'MRAcquisitionType': '3D', 'MagneticFieldStrength': 3, 'Manufacturer': 'GE', 'ManufacturersModelName': 'SIGNA Premier', 'Modality': 'MR', 'NumberOfArms': 8, 'NumberOfPointsPerArm': 512, 'ParallelReductionFactorInPlane': 2, 'PatientPosition': 'HFS', 'PercentPhaseFOV': 100, 'PercentSampling': 100, 'PixelBandwidth': 976.56, 'PostLabelDelay': [1, 2, 3], 'PostLabelingDelay': [1, 2, 3, 0], 'PrescanReuseString': 'RN/s3,Coil/s1', 'ProtocolName': 'research_brain_test_2306', 'PulseSequenceName': 'research/eASL6/easl_PA6', 'ReconMatrixPE': 128, 'RepetitionTime': 10.156, 'RepetitionTimePreparation': 10.156, 'SAR': 1.0431, 'ScanOptions': 'SAT_GEMS\EDR_GEMS\SPIRAL_GEMS\FS', 'ScanningSequence': 'RM', 'SequenceVariant': 'NONE', 'SeriesDescription': 'NOT DIAGNOSTIC: (Raw) EASL_3PLD_1000', 'SeriesNumber': 400, 'ShimSetting': [28, -49, -198], 'SliceThickness': 2, 'SoftwareVersions': '29\LX\Product Software Version: RX29.1_R02_2131.a\Orchestra SDK', 'SpacingBetweenSlices': 2, 'StationName': 'GEHCGEHC', 'VascularCrushing': True, 'timeEncodedMatrixSize': [4, 4]}

Traceback (most recent call last):
File "/usr/local/miniconda/lib/python3.8/site-packages/nipype/pipeline/plugins/multiproc.py", line 344, in _send_procs_to_workers
self.procs[jobid].run(updatehash=updatehash)
File "/usr/local/miniconda/lib/python3.8/site-packages/nipype/pipeline/engine/nodes.py", line 527, in run
result = self._run_interface(execute=True)
File "/usr/local/miniconda/lib/python3.8/site-packages/nipype/pipeline/engine/nodes.py", line 645, in _run_interface
return self._run_command(execute)
File "/usr/local/miniconda/lib/python3.8/site-packages/nipype/pipeline/engine/nodes.py", line 771, in _run_command
raise NodeExecutionError(msg)
nipype.pipeline.engine.nodes.NodeExecutionError: Exception raised while executing Node compute_cbf.

Traceback:
Traceback (most recent call last):
File "/usr/local/miniconda/lib/python3.8/site-packages/nipype/interfaces/base/core.py", line 397, in run
runtime = self._run_interface(runtime)
File "/usr/local/miniconda/lib/python3.8/site-packages/aslprep/interfaces/cbf.py", line 571, in _run_interface
att, mean_cbf = estimate_cbf_pcasl_multipld(
File "/usr/local/miniconda/lib/python3.8/site-packages/aslprep/utils/cbf.py", line 700, in estimate_cbf_pcasl_multipld
raise ValueError(
ValueError: Number of PostLabelingDelays (4) does not match number of delta-M volumes (3).

@tsalo
Copy link
Member

tsalo commented Jun 29, 2023

In fact, I started out using PostLabelingDelay:[1000, 2000, 3000]

PostLabelingDelay is in seconds, rather than milliseconds.

ValueError: Number of PostLabelingDelays (4) does not match number of delta-M volumes (3).

It sounds like you missed one or more of the files that need to be modified. You would need to modify PostLabelingDelay in the JSON, the NIFTI file, and the aslcontext.tsv file.

However, dropping some PLDs seems like a bad idea. I'm going to merge #313 soon and it will be available via the unstable Docker tag in about an hour. I would recommend waiting and then trying ASLPrep with the new --force-ge parameter.

@Luoyu-Wang
Copy link
Author

Luoyu-Wang commented Jun 29, 2023

I am very sorry for the error earlier, in fact I was using the unit seconds. I have modified the PostLabelingDelay in JSON to [1,2,3,0], NIFTI contains 3 Δ files and one m0 file, total 4 volumes, aslcontest,tsv I changed to
volume_type
deltam
deltam
deltam
m0scan
I modified the parameters of delta-M as much as possible, but I don‘t know how to change "delta-M volumes (3)"

This is the data:
https://1drv.ms/u/s!AoMxHEmbgddFreMAx5vtK3nYimtzqQ?e=MNzEQT

Thanks for your patience, in fact I scanned the ASL of 3 PLDs and 7 PLDs respectively, I will test it later and expect good results.

@tsalo
Copy link
Member

tsalo commented Jun 29, 2023

The unstable version just deployed to DockerHub. Please give that a try with the --force-ge option.

@Luoyu-Wang
Copy link
Author

I had a try with the --force-ge option,reoprt the same error also. ValueError: Number of PostLabelingDelays (4) does not match number of delta-M volumes (3).
I have modified the parameters of delta-M as much as possible, could you help to check my data? Thanks for your patient guidance.
This is the data:
https://1drv.ms/u/s!AoMxHEmbgddFreMAx5vtK3nYimtzqQ?e=MNzEQT

@tsalo
Copy link
Member

tsalo commented Jun 30, 2023

Sorry for the confusion. I meant run the new version of ASLPrep with your original data (i.e., without any modifications). The only reason you wanted to modify your dataset, as far as I know, was to trick ASLPrep into using the GE-specific workflow. Since the new ASLPrep version has the --force-ge parameter, you no longer need to do that.

I took a quick look at your modified data, although I don't think you should use it because it's not the full data. I noticed that you have an extra PostLabelDelay metadata field, but that shouldn't cause any problems in processing. It's possible that ASLPrep has a bug in how it splits up GE time series. I'll take a look at that today.

@Luoyu-Wang
Copy link
Author

Thanks for your patient guidance. But similar error was reported:Number of PostLabelingDelays (8) does not match number of delta-M volumes (7). Could the data run in your computer successful?

@adcohen2
Copy link

adcohen2 commented Jun 30, 2023

I just wanted to add on that I am having the same issue as the original post for single delay, non-GE data. Namely, the processing hangs at the float64 to int16 conversion for the dseg image.

Here are the last few outputs:

230630-17:04:30,149 nipype.workflow INFO:
	 [Node] Cached "_anat2std_tpms2" - collecting precomputed outputs
230630-17:04:30,149 nipype.workflow INFO:
	 [Node] "_anat2std_tpms2" found cached.
230630-17:04:46,192 nipype.workflow INFO:
	 [Node] Finished "anat2std_t1w", elapsed time 43.912203s.
230630-17:05:49,799 nipype.workflow INFO:
	 [Node] Finished "anat2std_mask", elapsed time 107.51075s.
230630-17:05:51,660 nipype.interface WARNING:
	 Changing /out/aslprep/sub-MCI1008/ses-1/anat/sub-MCI1008_ses-1_space-MNI152NLin2009cAsym_desc-brain_mask.nii.gz dtype from float64 to uint8
230630-17:06:09,131 nipype.workflow INFO:
	 [Node] Finished "anat2std_dseg", elapsed time 126.844922s.
230630-17:06:09,673 nipype.interface WARNING:
	 Changing /out/aslprep/sub-MCI1008/ses-1/anat/sub-MCI1008_ses-1_space-MNI152NLin2009cAsym_dseg.nii.gz dtype from float64 to int16

and the command:

docker run -ti  --rm  -v /usr/local/freesurfer/license.txt:/license/license.txt \
    -v /data/aslprep:/data:ro \
    -v /data/aslprep/derivatives:/out:rw \
    -v /data/aslprep-workdir:/work:rw \
    pennlinc/aslprep:unstable \
    /data \
    /out \
    participant --participant-label MCI1008 --fs-license-file /license/license.txt \
    -w /work --nprocs 20 --omp-nthreads 10 --dummy-vols 2 --scorescrub 

@tsalo
Copy link
Member

tsalo commented Jun 30, 2023

@adcohen2 thanks for sharing your traceback. It seems like there are two issues here.

The first issue is the hanging problem. It looks like this came up before in fMRIPrep (nipreps/fmriprep#2652 and nipreps/fmriprep#2725). @adcohen2, I'm not sure what causes that problem, but one thing that would be really helpful would be if you could clear out the working directory and re-run your job with --nprocs 1 --omp-nthreads 1 -vvv. Hopefully it will stall in that run as well, and if you share both the output and error logs then it will be easier to pin down the exact problem.

The second issue is that there is a bug in ASLPrep when it comes to multi-PLD GE data. I'm working on that and will hopefully have a fix around Wednesday of next week (Monday and Tuesday are holidays for me). I'll probably open a separate issue for this problem so we don't continue to clutter up the hanging issue thread with talk about multi-PLD GE data.

@tsalo tsalo changed the title multi-PLD ASL data from GE scans ASLPrep hangs when changing dseg datatype Jul 5, 2023
@adcohen2
Copy link

adcohen2 commented Jul 6, 2023

Per your suggestion, I reran the job with --nprocs 1 --omp-nthreads 1 -vvv.
Now, it shows an error, but doesn't crash. It just hangs here:

230705-21:54:20,295 nipype.workflow INFO:
	 [MultiProc] Running 1 tasks, and 18 jobs ready. Free memory (GB): 453.28/453.48, Free processors: 0/1.
                     Currently running:
                       * aslprep_wf.single_subject_MCI1008_wf.asl_preproc_ses_1_dir_PA_run_1_wf.asl_hmc_wf.combine_motpars
230705-21:54:20,295 nipype.workflow DEBUG:
	 No resources available
230705-21:54:22,299 nipype.workflow DEBUG:
	 No resources available
230705-21:54:24,298 nipype.workflow DEBUG:
	 No resources available
230705-21:54:26,300 nipype.workflow DEBUG:
	 No resources available
230705-21:54:28,302 nipype.workflow DEBUG:
	 No resources available
230705-21:54:29,265 nipype.workflow INFO:
	 [Node] Setting-up "aslprep_wf.single_subject_MCI1008_wf.asl_preproc_ses_1_dir_PA_run_1_wf.asl_hmc_wf.combine_motpars" in "/work/aslprep_wf/single_subject_MCI1008_wf/asl_preproc_ses_1_dir_PA_run_1_wf/asl_hmc_wf/combine_motpars".
exception calling callback for <Future at 0x7f03619779a0 state=finished raised FileNotFoundError>
concurrent.futures.process._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/local/miniconda/lib/python3.8/site-packages/nipype/pipeline/plugins/multiproc.py", line 67, in run_node
    result["result"] = node.run(updatehash=updatehash)
  File "/usr/local/miniconda/lib/python3.8/site-packages/nipype/pipeline/engine/nodes.py", line 497, in run
    self._get_hashval()
  File "/usr/local/miniconda/lib/python3.8/site-packages/nipype/pipeline/engine/nodes.py", line 548, in _get_hashval
    self._get_inputs()
  File "/usr/local/miniconda/lib/python3.8/site-packages/nipype/pipeline/engine/nodes.py", line 619, in _get_inputs
    self.set_input(key, deepcopy(output_value))
  File "/usr/local/miniconda/lib/python3.8/site-packages/nipype/pipeline/engine/nodes.py", line 312, in set_input
    setattr(self.inputs, parameter, deepcopy(val))
  File "/usr/local/miniconda/lib/python3.8/site-packages/traits/base_trait_handler.py", line 74, in error
    raise TraitError(
traits.trait_errors.TraitError: The 'm0scan_mat_files' trait of a _CombineMotionParametersInputSpec instance must be a list of items which are a pathlike object or string representing an existing file or None, but a value of '/work/aslprep_wf/single_subject_MCI1008_wf/asl_preproc_ses_1_dir_PA_run_1_wf/asl_hmc_wf/mcflirt_m0scan/sub-MCI1008_ses-1_dir-PA_run-1_asl_reduced_m0scan_mcf.nii.gz.mat/MAT_0000' <class 'str'> was specified.

Error setting node input:
Node: combine_motpars
input: m0scan_mat_files
results_file: /work/aslprep_wf/single_subject_MCI1008_wf/asl_preproc_ses_1_dir_PA_run_1_wf/asl_hmc_wf/mcflirt_m0scan/result_mcflirt_m0scan.pklz
value: /work/aslprep_wf/single_subject_MCI1008_wf/asl_preproc_ses_1_dir_PA_run_1_wf/asl_hmc_wf/mcflirt_m0scan/sub-MCI1008_ses-1_dir-PA_run-1_asl_reduced_m0scan_mcf.nii.gz.mat/MAT_0000

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/miniconda/lib/python3.8/concurrent/futures/process.py", line 239, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/usr/local/miniconda/lib/python3.8/site-packages/nipype/pipeline/plugins/multiproc.py", line 70, in run_node
    result["result"] = node.result
  File "/usr/local/miniconda/lib/python3.8/site-packages/nipype/pipeline/engine/nodes.py", line 223, in result
    return _load_resultfile(
  File "/usr/local/miniconda/lib/python3.8/site-packages/nipype/pipeline/engine/utils.py", line 291, in load_resultfile
    raise FileNotFoundError(results_file)
FileNotFoundError: /work/aslprep_wf/single_subject_MCI1008_wf/asl_preproc_ses_1_dir_PA_run_1_wf/asl_hmc_wf/combine_motpars/result_combine_motpars.pklz
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/miniconda/lib/python3.8/concurrent/futures/_base.py", line 328, in _invoke_callbacks
    callback(self)
  File "/usr/local/miniconda/lib/python3.8/site-packages/nipype/pipeline/plugins/multiproc.py", line 159, in _async_callback
    result = args.result()
  File "/usr/local/miniconda/lib/python3.8/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/local/miniconda/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
FileNotFoundError: /work/aslprep_wf/single_subject_MCI1008_wf/asl_preproc_ses_1_dir_PA_run_1_wf/asl_hmc_wf/combine_motpars/result_combine_motpars.pklz
230705-21:54:30,307 nipype.workflow DEBUG:
	 No resources available
230705-21:54:32,308 nipype.workflow DEBUG:

I can't seem to find an actual crash log or anything.
I also ran the job with --anat-only and it runs all the way through with no errors. It doesn't hang when changing the dseg datatype.

@tsalo
Copy link
Member

tsalo commented Jul 6, 2023

Thank you @adcohen2. That's really helpful. It looks like that the hanging problem stemmed from a bug in the motion correction, rather than the datatype conversion step you and @Luoyu-Wang were seeing. I will try to reproduce the problem locally today.

@tsalo
Copy link
Member

tsalo commented Jul 6, 2023

I think I see the problem. @adcohen2, how many M0 volumes do you have in your dataset? Are they in the ASL file, rather than stored in a separate m0scan file?

@adcohen2
Copy link

adcohen2 commented Jul 6, 2023

Good news! There is one M0 volume and it is the last volume in the asl file. I have is labeled in the aslcontext.txt file.

@tsalo
Copy link
Member

tsalo commented Jul 6, 2023

Ah, that makes sense. The issue is that the MCFLIRT interface is returning a string for the single M0 volume that is being motion corrected, rather than a list as expected. This would only arise when there's a single M0 scan inside the ASL file. I will work on a fix today.

@tsalo tsalo closed this as completed in #316 Jul 6, 2023
@tsalo
Copy link
Member

tsalo commented Jul 6, 2023

@adcohen2 I think I've fixed the problem. It will probably take about an hour and a half for the updated version to deploy to DockerHub. Once that happens, would you mind pulling the new unstable version and trying it out on your data?

@lizihuang1
Copy link

Hi, thank you for updating the code! I encountered the same "stuck forever in dtype change" problem. And I tried the unstable docker version. Good news is that it no longer got stuck on date type change. Bad news is it got stuck later on some "apply_mask" step (or step after that). After I split sub-01_asl.nii.gz into sub-01_asl.nii.gz and sub-01_m0scan.nii.gz, everything went fine. I think this can be a temporary solution before a new version is available. :)

@lucsanara
Copy link

Hello! we are having the same problem with version 0.4.0
Error setting node input:
Node: combine_motpars
input: m0scan_mat_files
We also have m0 image as part of asl file

@tsalo
Copy link
Member

tsalo commented Aug 3, 2023

@lucsanara please use unstable instead

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants