[FIX] Normalize PCA mixing matrix over time, not component #228
Conversation
Codecov Report
@@ Coverage Diff @@
## master #228 +/- ##
==========================================
+ Coverage 47.83% 47.86% +0.02%
==========================================
Files 33 33
Lines 2013 2012 -1
==========================================
Hits 963 963
+ Misses 1050 1049 -1
Continue to review full report at Codecov.
To the best of my understanding, the purpose of normalization was so that each component would be on the same scale when passed on to the PCA decision tree (and therefore could use e.g. the same thresholds). If we normalize by time, that assumption no longer holds. Could you explain a little more why you were thinking it would make sense to normalize over time?
The old way was z-scoring each timepoint across components, while this fix z-scores each component across timepoints. The former approach doesn't just rescale the time series, it also changes them. Granted, the difference is small (the correlation for a random array with 80 components and 160 timepoints before vs. after z-scoring is ~0.99), but it doesn't seem like a valid transformation to make, as far as I can tell.
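The distinction between the two axes of z-scoring can be checked directly. A minimal sketch (not tedana's actual code; the array shape, seed, and variable names are assumptions for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mmix = rng.standard_normal((160, 80))  # hypothetical (timepoints, components) mixing matrix

# Old behavior: z-score each timepoint across components (axis=1),
# which mixes information across components and alters each time series.
old = stats.zscore(mmix, axis=1)

# This fix: z-score each component across timepoints (axis=0),
# a pure per-component centering and rescaling.
new = stats.zscore(mmix, axis=0)

# Correlate each component's original time series with its normalized version.
corr_new = [np.corrcoef(mmix[:, i], new[:, i])[0, 1] for i in range(mmix.shape[1])]
corr_old = [np.corrcoef(mmix[:, i], old[:, i])[0, 1] for i in range(mmix.shape[1])]

print(min(corr_new))       # ~1.0: time-axis z-scoring preserves each time series
print(np.mean(corr_old))   # high but strictly below 1.0: component-axis z-scoring distorts them
```

Because correlation is invariant to shifting and scaling a single series, z-scoring over time leaves each component's time series perfectly correlated with the original, while z-scoring over components does not.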
Ah, sorry, I think I misunderstood! So to clarify (for me): What is the correlation between the original components and this new normalization? The 0.99 is with the old normalization, correct? I think we should be checking the PCA selection tree in the integration tests. This seems like as good a time as any, since I want to know how this is impacting the PCA selection. WDYT? Should we add it to the three echo dataset?
The time series correlate perfectly after z-scoring the new way. Yeah, the old way gets 0.999. That's a good idea. I can change the three-echo integration test in this PR. |
If this is correlating at 1.0, then I'm wondering if it's really a necessary normalization -- obviously removing the old one seems to be! If we can fix the merge conflict (sorry, I think I pulled it in with #208) then this LGTM!
It's used to generate the normalized version of the mixing matrix, which is used in |
I think this is good to merge. It sounds like we need to do some issue clean-up around metric calculation -- we can deal with that after this is in :) |
References #223. One of the concerns I brought up in #223 is that the normalized PCA mixing matrix, which is only used to calculate the weight maps (WTS) within fitmodels_direct, is normalized over component, rather than over time. This strikes me as invalid, though I could be misinterpreting the purpose of the normalization. This will not impact the MLE dimensionality estimation, but should improve the validity of the Kundu PCA decision tree.

Changes proposed in this pull request: