MLE estimation and convergence #200

Closed
dowdlelt opened this issue Jan 24, 2019 · 12 comments
Labels
question: issues detailing questions about the project or its direction

Comments

@dowdlelt
Collaborator

There have been a number of discussions regarding PCA/ICA selection and convergence, and this issue is an attempt to bring those threads together in one place. In testing the current implementation of tedana with the 'mle' default, I am not seeing any dimensionality reduction. That is, the PCA step with MLE selects the maximum number of components, len(timeseries) - 1, in all 6 data sets (3 different tasks) I have tested thus far (more testing to come; @jbteves may have more to report as well).

This is an improvement over the problem @handwerkerd raises in #101, where too few components cause convergence or other issues. My original concern was that this was odd behavior, as I thought dimensionality reduction was required, but apparently whitening the data prior to ICA is the only 'requirement', per @emdupre's link in #101 and comments on Gitter. Thankfully, the vast majority of these components explain very low variance and are then ignored, though I haven't examined them closely yet; looking forward to #86.

My new concern is that this contributes to ICA failing to converge. Currently, when tedana doesn't converge it continues onward, producing output. In #181 and #162 @tsalo suggests that this is probably OK and maybe even a good thing, but it's not clear to me whether this is something that needs to be fixed or is acceptable long-term behavior, which touches on the points raised in #153.

This concern is distinct from #144, in which convergence fails due to a lack of preprocessing.

So the big questions I am left with are:

  1. Are others seeing the same PC selection behavior with 'mle': the max number of components?

  2. Is data that comes from a convergence failure usable?

  3. Should we include variance-based thresholding? I believe that is what @handwerkerd was testing at one point in #101 (Should PCA component selection leverage multi-echo information?). I believe it can be done by setting n_components to a value between 0 and 1, per https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html (a minimal sketch follows below this list).
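For reference, a minimal sketch of that variance-based option with scikit-learn; the 0.95 threshold and the random data matrix are purely illustrative, not a recommendation and not tedana's code:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
data = rng.standard_normal((200, 1000))  # stand-in for a (time x voxels) matrix

# A float in (0, 1) keeps enough components to explain that fraction of the
# total variance (used here with svd_solver='full').
pca = PCA(n_components=0.95, svd_solver='full')
reduced = pca.fit_transform(data)
print(reduced.shape[1], 'components retained')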

Thoughts?

@tsalo
Member

tsalo commented Jan 24, 2019

  1. Yes, at least with our test data, although if you look at preliminary results from the reliability analysis, you'll see that the number definitely varies by subject. Subsequent convergence failure is also generally driven by subject, with most subjects never having it happen and some having it happen almost every time.
  2. That's definitely something we're planning to look at with the reliability analysis, but as of yet I at least don't know. My intuition is that the statistics used to select ICA components are valid, but that the denoising won't be as efficient as possible. If we imagine generating random time series for the components, we can still determine whether weights for those random time series change with echo time (i.e., are BOLD, which is very unlikely) or do not (i.e., are noise, which is likely with random data). Thus, I imagine that denoising would likely ignore BOLD signal (which is fine) and would remove noise patterns that maybe don't align as well with the noise we want to remove (e.g., motion) as we would like. Of course this is all speculative, and hopefully the reliability analysis will clear things up for us.
  3. I think including additional selection methods, including a variance threshold, is a good idea, but we shouldn't make it the default without demonstrating that it outperforms other methods. That will definitely be a part of the comparison/validation analysis.

@dowdlelt
Collaborator Author

@tsalo this is great!

  1. I hadn't seen those figures from the reliability analysis, that's a wealth of information. Happy to have created this issue just to see those.
  2. So we are leaning heavily on multi-echo principles as our defense against a failure to converge causing problems. I follow the intuition, and I wish I had a better understanding of PCA/ICA. Did we ever increase the default number of convergence iterations?
  3. It just crossed my mind as I was reading back through #101 (Should PCA component selection leverage multi-echo information?), and it seemed related.

This is absolutely highlighting the importance of the comparison/validation/reliability analyses angle. Looking forward to (hopefully) contributing to that and hearing thoughts from other folks.

@tsalo
Member

tsalo commented Jan 24, 2019

Indeed, we have increased the number of iterations to 5000 to match mdp. We also switched to using the default convergence limit, which is higher than the one we were using and thus should increase the likelihood of convergence.
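As an aside for anyone testing this, here is a hedged sketch of one way to check whether a given run actually converged, by capturing the warning scikit-learn emits when FastICA hits the iteration cap. The parameter values and the random data are illustrative only; this is not tedana's code:

import warnings
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
data = rng.standard_normal((200, 1000))  # illustrative (time x voxels) matrix

ica = FastICA(n_components=50, algorithm='parallel',
              fun='logcosh', max_iter=5000, random_state=42)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    sources = ica.fit_transform(data)

# scikit-learn warns that FastICA 'did not converge' when max_iter is reached
converged = not any('did not converge' in str(w.message) for w in caught)
print('converged:', converged)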

@emdupre added the question label on Jan 25, 2019
@dowdlelt
Collaborator Author

Perhaps I am very confused here regarding whitening, but it looks like the sklearn.decomposition.PCA default is not to whiten the data.
In the code here we do not override this:

ppca = PCA(n_components='mle', svd_solver='full')

However, when calling FastICA, we also do not specify anything for its whiten argument, which defaults to whiten=True.

ica = FastICA(n_components=n_components, algorithm='parallel',
              fun='logcosh', max_iter=5000, random_state=rand_state)

So is it more correct to say that we do not whiten the data at the PCA step (since whiten is not set) but rather as part of ICA? Perhaps this doesn't matter - I just noticed it when looking through the code, recalled previous conversations, and wanted to seek clarity. This is all effectively my introduction to Python, so it is likely I missed something.
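To make that concrete, here is a hedged sketch with the PCA default written out explicitly; the data shape and parameter values are illustrative, and this is not tedana's actual code:

import numpy as np
from sklearn.decomposition import PCA, FastICA

# Illustrative low-rank signal plus noise, shaped (samples x features);
# n_components='mle' requires n_samples >= n_features.
rng = np.random.default_rng(0)
data = (rng.standard_normal((5000, 5)) @ rng.standard_normal((5, 200))
        + rng.standard_normal((5000, 200)))

# The PCA step only reduces dimensionality: whiten defaults to False.
ppca = PCA(n_components='mle', svd_solver='full', whiten=False)
reduced = ppca.fit_transform(data)

# FastICA's whiten argument is left at its default, which whitens the data
# internally before unmixing, so whitening happens at the ICA step even
# though the PCA step above did not whiten.
ica = FastICA(n_components=reduced.shape[1], algorithm='parallel',
              fun='logcosh', max_iter=5000, random_state=42)
sources = ica.fit_transform(reduced)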

@tsalo
Member

tsalo commented Jan 31, 2019

@emdupre and I have discussed this in the past. See this PR comment for the original discussion, but the gist is that we use PCA for dimensional reduction, but not whitening, and we do the whitening in the ICA because we do need it (and we can't turn it off). I think we can and should change our choice of words throughout the code and summary text to reflect that.

@jbteves
Collaborator

jbteves commented Jan 31, 2019

I think there may be confusion because, at the top of #101, which @dowdlelt and I have referenced in various discussions (though I'm speaking for my own confusion -- I'm not sure what his interpretation was), @emdupre mentions that the data must be whitened. I interpreted that as using the PCA step to perform whitening, especially since that's an available option in many PCA implementations (including scikit-learn's, though it's not used in tedana) and we state that PCA is the prerequisite for a successful ICA. However, it seems that's done in the ICA. It makes sense to me to update the following:

  1. The tedpca docstring
  2. The tedica docstring
  3. Add a comment just above the ica step within the tedica function which states that scikit-learn will automatically (with no option to override) pre-whiten the data, potentially with a documentation link?

Could we also ask that @emdupre edit that comment to avoid further confusion? Since a lot of discussion happens and starts from the top, I feel it may set the tone for the rest of the conversation and make it clear for new readers.

I hope that this isn't viewed as redundant to your comment above, @tsalo -- I felt that it may be helpful for everybody to see where these issues are being cross-referenced and help us zero in on where people may read and become confused. This is such a critical component of operation for tedana that I think a lot of users and future contributors will read it.

@dowdlelt
Collaborator Author

dowdlelt commented Feb 1, 2019

@tsalo I hunted all over for that comment and couldn't find it! I had this vague recollection that this had been answered, that perhaps I had seen this before and I was being redundant. I think I searched just within issues. This makes sense to me and I'm glad I was making sense of the code correctly.

@tsalo
Member

tsalo commented Feb 1, 2019

@jbteves Thank you for tracking down where we erroneously refer to the PCA step as "whitening" the data, rather than "dimensionally reducing" it. It's a good point, and I think we can fix them pretty easily. Would you mind opening a new issue with a list of those locations? It would also serve as a good first issue for new contributors, or as low-hanging fruit for any current contributors.

@emdupre
Member

emdupre commented Feb 1, 2019

Sorry if there's any confusion from that issue ! It sounds like it'd be a good idea to go through the issues and try to clean up those that have become sprawling discussions to more concretely align with our milestones. I'll try to allot some time for that, soon.

re updating documentation: I'd be happy to see updated docstrings ! PRs welcome, or I can tackle this myself this weekend 😸

One general idea, tying back into the top point: It looks like there are a few dimensions even to this single issue. Maybe we can focus the conversation here on the MLE and convergence questions, and open a new issue if we want to keep discussing the PCA / ICA relationship ? Or, if we'd like that conversation to happen in a PR review, that's fine as well !

@jbteves
Collaborator

jbteves commented Feb 1, 2019

@tsalo opened as #209
@emdupre I'm happy to help with the issues sprawl-- I've been reading through them a lot lately trying to get up to speed.

@tsalo
Member

tsalo commented Apr 28, 2019

I think that discussions of convergence are currently spread out over several issues and conversations, and also that evaluation of MLE dimensionality estimation and ICA convergence are under the umbrella of the reliability analysis at this point. @dowdlelt are you okay with closing this?

@dowdlelt
Collaborator Author

Absolutely - let's clean some things up.
