
PCA is applying an inappropriate transformation #654

Closed
notZaki opened this issue Jan 18, 2021 · 6 comments
Labels
bug issues describing a bug or error found in the project

Comments

@notZaki
Contributor

notZaki commented Jan 18, 2021

Summary

The PCA implementation in sklearn performs normalization along the voxel dimension and this isn't an appropriate strategy for fMRI data. We should switch to a different PCA implementation or transpose the data before PCA.

Additional Detail

The PCA implementation in sklearn centers the input data along the first dimension before decomposition. For tedana, the data is an NxT matrix and the appropriate normalization should be along the second dimension (#636).
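For concreteness, here is a small sketch of that centering behavior, using toy random data standing in for an NxT voxels-by-time array (not tedana code):

```python
# Toy illustration of sklearn's centering axis (assumes random data in
# place of real fMRI; array names are made up for this example).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n_voxels, n_trs = 100, 20
data = rng.normal(size=(n_voxels, n_trs))  # N x T, as in tedana

pca = PCA(n_components=5)
pca.fit(data)

# sklearn subtracts the mean of each column (feature), i.e. it centers
# across voxels (axis 0), not across time:
assert np.allclose(pca.mean_, data.mean(axis=0))

# Centering each voxel's time series instead would demean along axis 1:
centered_over_time = data - data.mean(axis=1, keepdims=True)
```

Note that `pca.mean_` has length T, one mean per timepoint computed across voxels, which is the opposite of demeaning each voxel's time series.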

The simplest change would be to define our own PCA function. This actually already exists in decomposition.ma_pca._icatb_svd.
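As a rough illustration of the idea (not the actual `_icatb_svd` function, just a hypothetical sketch of a PCA that demeans along time before decomposing):

```python
# Hypothetical sketch of a time-centered PCA on an N x T array.
# This is NOT tedana's decomposition.ma_pca._icatb_svd, only an
# illustration of centering along axis 1 before the SVD.
import numpy as np

def pca_time_centered(data, n_components):
    """PCA on an N x T array, demeaning each voxel's time series (axis 1)."""
    demeaned = data - data.mean(axis=1, keepdims=True)
    u, s, vt = np.linalg.svd(demeaned, full_matrices=False)
    # Fix a deterministic sign convention to reduce run-to-run variability
    signs = np.sign(u[np.argmax(np.abs(u), axis=0), np.arange(u.shape[1])])
    u, vt = u * signs, vt * signs[:, None]
    return u[:, :n_components], s[:n_components], vt[:n_components]
```

Here `u` holds the spatial maps and `vt` the component time series; keeping all components reconstructs the demeaned data exactly.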

Alternatively, we could transpose the data so that its dimensions are TxN. Then we can continue using sklearn, however this will require heavier refactoring because the left/right sides of the decomposition will be swapped.
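A quick toy-data sketch of the transpose option (hypothetical random data, not tedana's pipeline): feeding sklearn a TxN array makes its internal centering act along time for each voxel, which is what we want.

```python
# Transpose option: pass T x N so sklearn's axis-0 centering removes
# each voxel's mean over time (toy data, illustration only).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
data = rng.normal(size=(100, 20))   # N x T
pca = PCA(n_components=5)
comps = pca.fit_transform(data.T)   # T x N input

# Each "feature" is now a voxel, so centering removes each voxel's
# mean over time:
assert np.allclose(pca.mean_, data.mean(axis=1))

# comps (T x 5) holds the component time series and pca.components_
# (5 x N) the spatial maps -- the left/right roles are swapped
# relative to the N x T orientation.
```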

This normalization could be one of the reasons why the ICA decomposition is inconsistent (#629). A possible explanation is that the PCA step normalizes across voxels, which distorts the time series and makes ICA's job harder.
I did a quick test on the three-echo test data, and the variability across Python versions, CPUs, and single/multi-threading vanished after swapping sklearn's PCA for my own definition. [log]

@tsalo
Member

tsalo commented Jan 19, 2021

Would transposing impact the decomposition? I know that the number of components is limited by both T and N, but I assume one orientation amounts to a spatial decomposition and the other to a temporal one...

@tsalo tsalo added the bug issues describing a bug or error found in the project label Jan 19, 2021
@notZaki
Contributor Author

notZaki commented Jan 19, 2021

In theory, transposing shouldn't change the PCA results; the most significant difference would be that the U and Vt matrices swap roles. However, since sklearn internally centers the data along axis=0, that step will change the results. I'm not sure whether this difference can be undone by re-normalizing the PCA output. My guess is no, but I'll check on some test data.
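A quick toy-data check of both points (random data, assuming nothing about tedana itself): plain SVD of X and X.T just swaps U and Vt, but sklearn's centering makes PCA of X and PCA of X.T genuinely different decompositions, not a relabeling.

```python
# Toy check: SVD transposition swaps U/Vt, but sklearn's axis-0
# centering breaks the equivalence between PCA(X) and PCA(X.T).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
x = rng.normal(size=(30, 10))

u, s, vt = np.linalg.svd(x, full_matrices=False)
ut, st, vtt = np.linalg.svd(x.T, full_matrices=False)
assert np.allclose(s, st)                     # same singular values
assert np.allclose(np.abs(u), np.abs(vtt.T))  # U <-> Vt swap (up to sign)

sv_a = PCA(n_components=5).fit(x).singular_values_
sv_b = PCA(n_components=5).fit(x.T).singular_values_
assert not np.allclose(sv_a, sv_b)  # different centering axis, different spectra
```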

For ICA, I'm not sure. We could just try it on dummy data and check whether there is a difference. The current ICA in tedana looks a bit strange to me because it doesn't use ICA.fit_transform() to extract the sources; rather, it uses data from the mixing matrix. I was expecting the former to be used, but I haven't looked at the selection/metrics code yet.

How did the original meica code store the data? Was it NxT or TxN? (N = num voxels, T = number of TRs)

@eurunuela
Collaborator

I'm not sure I'm following you, @notZaki. Which PCA are we talking about? maPCA?

Also, as you mention, transposing the data before the PCA would only swap U and Vt. I'm not familiar with sklearn's internal normalization.

@notZaki
Contributor Author

notZaki commented Jan 19, 2021

@eurunuela I am focusing on sklearn's implementation of PCA and ICA. This is partially relevant to maPCA because it uses PCA at the very end, once the number of components is estimated.

@eurunuela
Collaborator

Gotcha!

We could simply transpose the input matrix and the resulting eigenvalues, then swap U and Vt.

@notZaki
Contributor Author

notZaki commented Jan 27, 2021

I am going to close this issue, since the underlying problem is the normalization, which is already under discussion in #653 and #655. Resolving those issues will resolve this one as well.

@notZaki notZaki closed this as completed Jan 27, 2021