[ENH] Use different masking thresholds for denoising and classification #736

tsalo · 2021-06-17T15:29:10Z

Closes #685 and closes #680.

NOTE: While the proposed changes should affect the coverage of the T2*, combined, and denoised data, they should not impact component classification or values within the original coverage of any map.

Changes proposed in this pull request:

Create two sets of masks and adaptive masks:
- [mask|masksum]_denoise: Threshold of 1, used for denoising and optimal combination.
- [mask|masksum]_clf: Threshold of 3, used for decomposition and component selection.
Write out the full T2*/S0 maps as the primary maps (T2starmap and S0map), while the limited maps are now only written out on verbose runs and are named desc-limited_[T2starmap|S0map].

masksum_denoise: Threshold of 1, used for denoising and optimal combination. masksum_clf: Threshold of 3, used for decomposition and component selection.

tedana/workflows/tedana.py

codecov · 2021-06-17T15:50:35Z

Codecov Report

Merging #736 (19f9686) into main (7e1cf2c) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #736      +/-   ##
==========================================
+ Coverage   93.19%   93.20%   +0.01%     
==========================================
  Files          27       27              
  Lines        2204     2209       +5     
==========================================
+ Hits         2054     2059       +5     
  Misses        150      150

Impacted Files	Coverage Δ
tedana/workflows/t2smap.py	`96.29% <100.00%> (ø)`
tedana/workflows/tedana.py	`89.89% <100.00%> (+0.18%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7e1cf2c...19f9686. Read the comment docs.

tedana/workflows/tedana.py

handwerkerd

This looks good. I still plan to run this on several data sets to visually make sure everyone looks good. I hope to finish that up today.

One suggested change is to alter variable names with something like:

mask_denoise, masksum_denoise -> mask_1echo, masksum_1echo
mask_clf, masksum_clf -> mask_3echo, masksum_3echo

My sense is that this will help people keep track of the masks better. That is, one mask is for all voxels with 1 good echo and the other is for voxels with at lest 3 good echoes. Naming them by their purpose (denoise and clf) seems like it will cause confusion, particularly when we do things like classify with _clf and display the classifications with _denoise. This is NOT a strong opinion on my part.

My other general suggestion regards documentation and comments. With these two masks floating around, it's not always clear how various functions are handling masks with only one good echo. For example, to understand how a mask with only one good echo is treated in decay.fit_decay you really need to dig into the code. I'd suggest expanding some of the explanation for fit_decay and maybe adding more general documentation on how why we have these two masks bouncing around the code. This isn't critical for this PR so, if you want to merge this PR and open and new issue to improve this documentation, I'd be fine with that.

handwerkerd · 2021-07-12T21:22:06Z

@tsalo I'm seeing some issues when running this PR. The clearest issue is the program crashes when I run it with --fittype curvefit What's weird is that it does not crash when the T2* map is calculated, but much later. I'm still trying to trace the issue. T2starmap.nii.gz and desc-full_T2starmap.nii.gz both look reasonable. Here's the output fragment with the error:

Performing ICA component selection with Kundu decision tree v2.5
NumExpr defaulting to 8 threads.
Traceback (most recent call last):
  File "/opt/anaconda3/bin/tedana", line 33, in <module>
    sys.exit(load_entry_point('tedana', 'console_scripts', 'tedana')())
  File "/Users/handwerkerd/code/tedana_community/master/tedana/tedana/workflows/tedana.py", line 805, in _main
    tedana_workflow(**kwargs)
  File "/Users/handwerkerd/code/tedana_community/master/tedana/tedana/workflows/tedana.py", line 586, in tedana_workflow
    comptable, metric_metadata = selection.kundu_selection_v2(
  File "/Users/handwerkerd/code/tedana_community/master/tedana/tedana/selection/tedica.py", line 250, in kundu_selection_v2
    kappa_elbow = np.min((getelbow(kappas_nonsig, return_val=True),
  File "/Users/handwerkerd/code/tedana_community/master/tedana/tedana/selection/_utils.py", line 85, in getelbow
    p = coords - coords[:, 0].reshape(2, 1)
IndexError: index 0 is out of bounds for axis 1 with size 0

I've found one additional error that might be related. desc-adaptiveGoodSignal_mask.nii.gz cannot be read by AFNI with or without the curvefit option. Other files can be opened by AFNI.

I'll update if I localize this more, but wanted to share as is.

I'm running this locally using the 5 echo testing dataset with commands like:

tedana -d p05.SBJ01_S09_Task11_e1.in.nii.gz \
          p05.SBJ01_S09_Task11_e2.in.nii.gz \
          p05.SBJ01_S09_Task11_e3.in.nii.gz \
          p05.SBJ01_S09_Task11_e4.in.nii.gz \
          p05.SBJ01_S09_Task11_e5.in.nii.gz \
 -e 15.4 29.7 44.0 58.3 72.6 \
--out-dir tedana_out/PR736BIDS \
--verbose

tsalo · 2021-07-12T22:02:09Z

I don't think the p05.SBJ01_S09_Task11_e*.in.nii.gz data are up on the OSF. Can you share a link?

EDIT: I can replicate this on other data, though, and the reason is that there are no non-significant Kappa values, so it's trying to get an elbow out of an empty array. I think that's a general bug rather than something introduced in this PR.

EDIT 2: Indeed, this is the same general error as in #181.

tsalo · 2021-07-13T16:20:26Z

@handwerkerd I've opened #752 about the getelbow issue. I'm not sure how the adaptive mask managed to end up messed up. Would you mind sending me the file AFNI couldn't open?

Regarding the other big issue (documentation and variable naming), I will start working on that. To start, here is something I'd like to add to the text report, right after the adaptive masks are created:

A two-stage masking procedure was applied, in which a liberal mask (including voxels with good data in at least the first echo) was used for optimal combination, T2*/S0 estimation, and denoising, while a more conservative mask (restricted to voxels with good data in at least the first three echoes) was used for the component classification procedure.

What do you think of it? I'd like to workshop it before merging.

Now that the full maps are used throughout the pipeline, they are the primary T2*/S0 outputs. The limited maps are not used for anything, and are only written out if the workflow is run in verbose mode.

handwerkerd · 2021-07-13T16:58:35Z

The good/bad news is that both of the issues I list above: crashing with the curvefit option and the adaptive mask not playing well with AFNI also occur on the main branch. I need to get a few other things done today, but will try to share more info soon.

handwerkerd · 2021-07-15T21:35:54Z

I think I figured out the issue with the unreadable desc-adaptiveGoodSignal_mask.nii.gz
masksum_denoise is int64. io_generator.save_file checks the data type, but then uses the data type of the reference image (probably float) when calling new_nii_like. When I change masksum_denoise to float, the saved file is then readable. Assuming I'm right, this should probably be fixed in a distinct PR. Converting the mask to float would be easiest, but checking data type in save_img would be more elegant.

jbteves · 2021-07-15T21:48:19Z

It must be int64 because we read the data initially as float32 data, and convert it to int, and then it wants something with large enough capacity to fit a 32-bit float that's been converted to an integer. We may be able to get around it by specifying a datatype when we do the mask, since it should make a new matrix anyway. At least, if I'm recalling the code correctly, that's what happens and why we may be seeing this behavior.

jbteves · 2021-07-15T21:51:19Z

I would agree with coercing data to always saving as float32 since most programs will read it in that way anyway. WDYT @tsalo, @emdupre ? This arguably should go in its own issue, but we also have to make a decision for this PR in particular.

jbteves · 2021-07-19T17:54:05Z

@tsalo with #757 closed, I'm happy to re-review once the merge conflicts are fixed.

jbteves

This PR I think looks good. I agree with @handwerkerd that the documentation on what's going on is probably not enough to make things to clear in the future. However, I don't think that this is the fault of the PR, I just think that the PR made it clear how confusing the whole process is instead of exacerbating it by much (since previously we had one mask doing two tasks instead, which doesn't particularly seem clear either). I lean towards approving and merging this, and then deciding how to tackle the documentation differently since that problem is fairly pervasive throughout the package.

handwerkerd · 2021-07-21T19:59:49Z

I've been looking at the outputs from a test data set. The ICA results are different. We are using different voxels, but since the ICA is only done on voxels with 3 good echoes in both main & this PR, the voxels that end up in the ICA step shouldn't be THAT different. I can't compare metrics component-by-component, but there's a non-trivial difference between % variance explained in the accepted components:
Main:

PR736:

The big Q in my mind is how do results differ? I took the denoised time series from main & PR736 & ran 3dTcorrelate (with & without detrending) Here are the histograms

TCorrelation map (reds are near 1 & yellows are closer to 0.8)

You can see that correlations are almost all greater than r=0.8, but a non-trivial number are in that 0.8-0.85 range. These would normally be great values, but I’d expect things to be nearly identical. You can see from the images that the r=0.8 (yellow) aren’t simply edge artifacts. My best guess is a couple of components weren't formed in a way to get their content classified in the same way. The question is whether this just points to the brittleness of the method (which we all are working to improve!) or a unique issue created by this PR that should hold up acceptance. I’m leaning towards taking one more look at the code to make sure nothing jumps out & then approving, but figured I’d post & get feedback.
These data are from the same dataset as the 5 echo testing sample, but a different, full run.

tsalo · 2021-07-21T20:36:46Z

I think that difference comes from the change in the T2* map. I tested by running on one of the runs from Zaki's preprocessed Cambridge dataset with (1) the main branch, (2) the split-adaptive-mask branch, and (3) the split-adaptive-mask branch, using the T2* map from the run on the main branch. The main branch and split-adaptive-mask branch results differ, but the main branch and split-adaptive-mask branch + main branch's T2* map results appear almost identical.

With main branch

tedana -d \
    /Users/taylor/Documents/datasets/cambridge/sub-04570/func/sub-04570_task-rest_echo-1_space-scanner_desc-partialPreproc_bold.nii.gz \
    /Users/taylor/Documents/datasets/cambridge/sub-04570/func/sub-04570_task-rest_echo-2_space-scanner_desc-partialPreproc_bold.nii.gz \
    /Users/taylor/Documents/datasets/cambridge/sub-04570/func/sub-04570_task-rest_echo-3_space-scanner_desc-partialPreproc_bold.nii.gz \
    /Users/taylor/Documents/datasets/cambridge/sub-04570/func/sub-04570_task-rest_echo-4_space-scanner_desc-partialPreproc_bold.nii.gz \
    -e 12 28 44 60 --tedpca mdl \
    --debug \
    --out-dir /Users/taylor/Documents/datasets/cambridge/tedana-main/

With split-adaptive-mask branch

tedana -d \
    /Users/taylor/Documents/datasets/cambridge/sub-04570/func/sub-04570_task-rest_echo-1_space-scanner_desc-partialPreproc_bold.nii.gz \
    /Users/taylor/Documents/datasets/cambridge/sub-04570/func/sub-04570_task-rest_echo-2_space-scanner_desc-partialPreproc_bold.nii.gz \
    /Users/taylor/Documents/datasets/cambridge/sub-04570/func/sub-04570_task-rest_echo-3_space-scanner_desc-partialPreproc_bold.nii.gz \
    /Users/taylor/Documents/datasets/cambridge/sub-04570/func/sub-04570_task-rest_echo-4_space-scanner_desc-partialPreproc_bold.nii.gz \
    -e 12 28 44 60 --tedpca mdl \
    --debug \
    --out-dir /Users/taylor/Documents/datasets/cambridge/tedana-sam/

With split-adaptive-mask branch, using t2smap from main branch

# with split-adaptive-mask branch
tedana -d \
    /Users/taylor/Documents/datasets/cambridge/sub-04570/func/sub-04570_task-rest_echo-1_space-scanner_desc-partialPreproc_bold.nii.gz \
    /Users/taylor/Documents/datasets/cambridge/sub-04570/func/sub-04570_task-rest_echo-2_space-scanner_desc-partialPreproc_bold.nii.gz \
    /Users/taylor/Documents/datasets/cambridge/sub-04570/func/sub-04570_task-rest_echo-3_space-scanner_desc-partialPreproc_bold.nii.gz \
    /Users/taylor/Documents/datasets/cambridge/sub-04570/func/sub-04570_task-rest_echo-4_space-scanner_desc-partialPreproc_bold.nii.gz \
    -e 12 28 44 60 --tedpca mdl \
    --debug \
    --t2smap /Users/taylor/Documents/datasets/cambridge/tedana-main/T2starmap.nii.gz \
    --out-dir /Users/taylor/Documents/datasets/cambridge/tedana-sam-t2s/

EDIT: Also, looking at the correlations between the ICA components, my interpretation seems to bear out. The correlations between components from main and split-adaptive-mask are pretty randomly distributed, while the correlations between components from main and split-adaptive-mask with the T2* map from the main run are all >0.999.

handwerkerd · 2021-07-23T20:58:09Z

Sorry for the delay on this, but I'm still trying to wrap my head around things and I do think there's an issue. The ICA maps shouldn't be similar. They should be identical. The big change here is that there are voxels in the optimally combined data with <3 good echoes that would otherwise have been excluded. When we run the PCA (line 563 of tedana.py) we use mask_clf which means all those added voxels are not part of the PCA. I've checked and the PCA output, as viewable in desc-optcomPCAReduced_bold is identical. That output is then given to decomposition.tedica (line 578 of tedana.py). The outputted mixing matrices are different despite (I think) using identical input data and the default fixed seed.

I'm a bit stuck trying to figure out why this is happening and if I'm making a silly mistake. Can others confirm that desc-optcomPCAReduced_bold is identical between main & this PR but desc-ICA_mixing.tsv is not? If others are seeing this as well, I'll try to figure out why.

[Update: desc-optcomPCAReduced_bold files are not always identical, so now I'm trying to better understand why. Unless I'm missing something the same data will be the input to both]

tsalo · 2021-07-29T16:49:25Z

The ICA maps shouldn't be similar. They should be identical.

That's true, but I think the differences just come from changes in resolution due to loading a file instead of working directly from an array. Just to make sure, I ran six versions with (1) the most current version of main, (2) split-adaptive-masks after updating from main, and (3) split-adaptive-masks minus T2*-related changes and after updating from main (see #768):

main, no pre-set T2* map (main)
main, with T2* map from main run (main+main)
split-adaptive-masks, with no pre-set T2* map (sam)
split-adaptive-masks, with T2* map from main run (sam+main)
test-sam-2, with no pre-set T2* map (sam2)
test-sam-2, with T2* map from main run (sam2+main)

main vs. main+main: Pretty different. The difference is more pronounced than it was last time, probably because of #759.

main vs. sam: Different.

main+main vs. sam+main vs. sam2+main: No visual differences, at all. I think we can infer from this that the changes in the T2* map are driving things.

main vs. sam2: No visual differences.

sam vs. sam2: Different, like main vs. sam.

handwerkerd

With @tsalo last check and I gave this one more read-through, I think this looks good. Any issues I have with this aren't due to this PR.
It looks like there's now a CI failure, but hopefully running it again will fix that.

tsalo · 2021-08-06T01:56:30Z

I restarted the failing workflow and it looks like it fixed itself.

tsalo · 2021-08-12T15:29:50Z

Looks like we have two approvals, which I didn't realize. Does anyone have any issues with me merging this?

Use two different adaptive masks.

3bded74

masksum_denoise: Threshold of 1, used for denoising and optimal combination. masksum_clf: Threshold of 3, used for decomposition and component selection.

tsalo commented Jun 17, 2021

View reviewed changes

tedana/workflows/tedana.py Outdated Show resolved Hide resolved

tsalo commented Jun 17, 2021

View reviewed changes

tedana/workflows/tedana.py Outdated Show resolved Hide resolved

Fix long line.

a6d885d

tsalo added breaking change WIll make a non-trivial change to outputs output-change labels Jun 17, 2021

tsalo requested review from jbteves and handwerkerd June 17, 2021 16:22

tsalo commented Jun 17, 2021

View reviewed changes

tedana/workflows/tedana.py Show resolved Hide resolved

tsalo added 2 commits June 17, 2021 16:57

Merge branch 'main' into split-adaptive-masks

a1bcff9

Derive classifier masks directly from denoising ones.

1a54bb7

tsalo removed the output-change label Jun 21, 2021

Merge branch 'main' into split-adaptive-masks

ad9f99c

handwerkerd requested changes Jul 12, 2021

View reviewed changes

Use denoising mask for component figures.

8317ff1

tsalo mentioned this pull request Jul 13, 2021

getelbow raises uninformative exception when supplied with an empty array #752

Closed

tsalo added 2 commits July 13, 2021 12:28

Swap full and limited T2*/S0 maps.

7925a61

Now that the full maps are used throughout the pipeline, they are the primary T2*/S0 outputs. The limited maps are not used for anything, and are only written out if the workflow is run in verbose mode.

Apply the same logic to t2smap.

def354c

tsalo added 2 commits July 13, 2021 12:59

Fix tests.

f34c5bb

Fix documentation about full vs. limited maps.

953abeb

tsalo added 2 commits July 16, 2021 10:06

Merge branch 'main' into split-adaptive-masks

2014f02

Fix bad merge.

237cf1f

Log updated masking procedure.

95db93d

tsalo mentioned this pull request Jul 16, 2021

Integer-based output images are unreadable with AFNI #757

Closed

tsalo added 2 commits July 19, 2021 13:57

Merge branch 'main' into split-adaptive-masks

331df31

Fix!

05a1cd2

jbteves approved these changes Jul 20, 2021

View reviewed changes

Merge branch 'ME-ICA:main' into split-adaptive-masks

80d5f35

Merge branch 'ME-ICA:main' into split-adaptive-masks

19f9686

handwerkerd approved these changes Aug 4, 2021

View reviewed changes

tsalo merged commit 3147e01 into ME-ICA:main Aug 13, 2021

tsalo deleted the split-adaptive-masks branch August 13, 2021 15:06

handwerkerd mentioned this pull request Sep 22, 2021

[FIX] Address divide by 0 issue in decay.py #807

Merged

tsalo mentioned this pull request Oct 18, 2021

tedana's adaptive mask removes white matter and CSF necessary for nuisance regressions NBCLab/power-replication#14

Open

tsalo mentioned this pull request Nov 16, 2021

[ENH] Remove accepted and rejected outputs #822

Closed

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[ENH] Use different masking thresholds for denoising and classification #736

[ENH] Use different masking thresholds for denoising and classification #736

tsalo commented Jun 17, 2021 •

edited

Loading

codecov bot commented Jun 17, 2021 •

edited

Loading

handwerkerd left a comment

handwerkerd commented Jul 12, 2021

tsalo commented Jul 12, 2021 •

edited

Loading

tsalo commented Jul 13, 2021

handwerkerd commented Jul 13, 2021

handwerkerd commented Jul 15, 2021

jbteves commented Jul 15, 2021

jbteves commented Jul 15, 2021

jbteves commented Jul 19, 2021

jbteves left a comment

handwerkerd commented Jul 21, 2021

tsalo commented Jul 21, 2021 •

edited

Loading

handwerkerd commented Jul 23, 2021 •

edited

Loading

tsalo commented Jul 29, 2021

handwerkerd left a comment

tsalo commented Aug 6, 2021

tsalo commented Aug 12, 2021

[ENH] Use different masking thresholds for denoising and classification #736

[ENH] Use different masking thresholds for denoising and classification #736

Conversation

tsalo commented Jun 17, 2021 • edited Loading

codecov bot commented Jun 17, 2021 • edited Loading

Codecov Report

handwerkerd left a comment

Choose a reason for hiding this comment

handwerkerd commented Jul 12, 2021

tsalo commented Jul 12, 2021 • edited Loading

tsalo commented Jul 13, 2021

handwerkerd commented Jul 13, 2021

handwerkerd commented Jul 15, 2021

jbteves commented Jul 15, 2021

jbteves commented Jul 15, 2021

jbteves commented Jul 19, 2021

jbteves left a comment

Choose a reason for hiding this comment

handwerkerd commented Jul 21, 2021

tsalo commented Jul 21, 2021 • edited Loading

With main branch

With split-adaptive-mask branch

With split-adaptive-mask branch, using t2smap from main branch

handwerkerd commented Jul 23, 2021 • edited Loading

tsalo commented Jul 29, 2021

handwerkerd left a comment

Choose a reason for hiding this comment

tsalo commented Aug 6, 2021

tsalo commented Aug 12, 2021

tsalo commented Jun 17, 2021 •

edited

Loading

codecov bot commented Jun 17, 2021 •

edited

Loading

tsalo commented Jul 12, 2021 •

edited

Loading

tsalo commented Jul 21, 2021 •

edited

Loading

handwerkerd commented Jul 23, 2021 •

edited

Loading