get_coeffs run on mean-centered data for no reason #179

tsalo · 2019-01-09T15:14:15Z

The function get_coeffs performs a least-squares fit of the data (generally optcom or other data) onto the regressors in X (generally a mixing matrix). It is used in the following locations:

writeresults: data is not demeaned. add_const is not provided, so it defaults to False.
split_ts: X is the ICA mixing matrix and data is demeaned. add_const is not provided, so it defaults to False.
write_split_ts: X is the ICA mixing matrix and data is demeaned. add_const is not provided, so it defaults to False.
fitmodels_direct: X is the ICA or PCA mixing matrix and data is the demeaned optimally combined data. add_const is not provided, so it defaults to False.
fitmodels_direct (again): X is the ICA or PCA mixing matrix and data is the concatenated data (catd; not demeaned, afaik). add_const is not provided, so it defaults to False.
computefeats2: X is the ICA or PCA mixing matrix and data is demeaned. add_const is not provided, so it defaults to False.

As far as I can tell, add_const isn't used anywhere in the package. That said, I also realized that demeaning the data along axis 1 (which is how it's done whenever it's done) does not affect the results, so I'm not sure why we do it.

A really weird behavior that I don't understand at all: when Y (the data) is not mean-centered, the fit gives the correct betas (with no need for an intercept) if X is mean-centered, but not if it isn't. Does anyone know why that is?
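A likely explanation, offered as a hedged aside: a mean-centered column of X is orthogonal to a constant regressor, so adding an intercept to a centered X does not change the slope estimates; the intercept only absorbs the mean of Y. The sketch below checks this on simulated data. The orientation (time x components for X, time x voxels for Y), the helper `lstsq_betas`, and all numbers are hypothetical and not taken from tedana.

```python
import numpy as np

rng = np.random.default_rng(0)
n_time, n_comps, n_voxels = 100, 3, 5

# Hypothetical mixing matrix (time x components) with nonzero column means
X = rng.normal(loc=2.0, size=(n_time, n_comps))
true_betas = rng.normal(size=(n_comps, n_voxels))
# Simulated data (time x voxels) with a large offset and a little noise
Y = X @ true_betas + 10.0 + 0.1 * rng.normal(size=(n_time, n_voxels))


def lstsq_betas(design, data):
    """Plain OLS betas for data (time x voxels) on design (time x regressors)."""
    return np.linalg.lstsq(design, data, rcond=None)[0]


ones = np.ones((n_time, 1))
Xc = X - X.mean(axis=0)

# Reference: uncentered X plus an explicit intercept; drop the intercept row
ref = lstsq_betas(np.hstack([ones, X]), Y)[1:]

no_int_centered = lstsq_betas(Xc, Y)  # centered X, no intercept
no_int_raw = lstsq_betas(X, Y)        # uncentered X, no intercept

print(np.allclose(no_int_centered, ref))  # True
print(np.allclose(no_int_raw, ref))       # False
```

Without an intercept, the uncentered fit has to soak up the offset in Y through its slopes, which is why those betas come out wrong.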

Still, regardless of why that's the case, I think we just need to mean-center X in get_coeffs. Then we can stop mean-centering the data and also drop the add_const argument (it should be unnecessary as long as X is mean-centered, and it's never used anyway). This should not affect the results, but it will make the code much easier to understand. How does that sound?
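A minimal sketch of this proposal, assuming data is (voxels x time) and X is (time x components). The name `get_coeffs_centered` is hypothetical; this is an illustration of the idea, not tedana's get_coeffs.

```python
import numpy as np


def get_coeffs_centered(data, X):
    """Hypothetical sketch of the proposal: OLS fit of data on a mean-centered X.

    data : (n_voxels, n_time) array (orientation is an assumption)
    X    : (n_time, n_components) mixing matrix
    Returns betas with shape (n_voxels, n_components).
    """
    data = np.asarray(data, dtype=float)
    X = np.asarray(X, dtype=float)
    if data.shape[1] != X.shape[0]:
        raise ValueError("data and X must share the time dimension")
    # Center each regressor over time. Centered columns are orthogonal to a
    # constant, so no add_const argument is needed to get the slopes.
    Xc = X - X.mean(axis=0, keepdims=True)
    # Solve Xc @ B = data.T in the least-squares sense.
    betas, *_ = np.linalg.lstsq(Xc, data.T, rcond=None)
    return betas.T


# Usage with simulated values (hypothetical, for illustration only)
rng = np.random.default_rng(1)
mmix = rng.normal(loc=1.0, size=(50, 2))
data = rng.normal(size=(4, 50)) + 5.0
betas = get_coeffs_centered(data, mmix)
print(betas.shape)  # (4, 2)
```

Because the centering happens inside the function, callers no longer need to demean the data first.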


tsalo · 2019-02-19T15:05:56Z

Just to follow this up: demeaning the data has no impact only when X (generally mmix) is also demeaned. I noticed a difference in the PCA metric calculation because the PCA mixing matrix is not mean-centered. We should mean-center X inside get_coeffs, because I can't imagine a situation where we would want to run this on data without either a constant or mean-centered IVs; the parameter weights wouldn't be useful in that case.
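The dependence on X can be checked with a small simulation. This is a hedged sketch: the "PCA-like" mixing matrix is just random numbers with nonzero column means, and here the data are laid out as time x voxels, so demeaning over time is along axis 0 (tedana's layout may differ).

```python
import numpy as np

rng = np.random.default_rng(2)
n_time = 80
# Hypothetical mixing matrix whose columns are NOT mean-centered
mmix = rng.normal(loc=0.5, size=(n_time, 4))
data = rng.normal(size=(n_time, 6)) + 3.0  # time x voxels, nonzero means
data_dm = data - data.mean(axis=0)         # demean each voxel over time


def betas(design, y):
    """Plain OLS betas, no intercept."""
    return np.linalg.lstsq(design, y, rcond=None)[0]


mmix_c = mmix - mmix.mean(axis=0)

# Centered design: demeaning the data leaves the betas unchanged
print(np.allclose(betas(mmix_c, data), betas(mmix_c, data_dm)))  # True
# Uncentered design: demeaning the data changes the betas
print(np.allclose(betas(mmix, data), betas(mmix, data_dm)))      # False
```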

jbteves · 2019-05-23T19:55:22Z

I don't see any reason why the above couldn't be done. Can you point me to the blob where you'd like someone to see why de-meaning messes up the beta values?

CesarCaballeroGaudes · 2019-11-15T22:30:35Z

I have been working on the get_coeffs and computefeats2 functions (i.e. stats.py), which were basically performing an ordinary least squares (OLS) estimation and computing Z-values with a Fisher transformation, assuming that the OLS estimates were correlation coefficients, which in general they are not. Two related points:

I have changed the function to compute z-values by converting from t-statistics, based on the following code. The computations follow those in any linear regression textbook, i.e. t-value = beta / std(beta), where std(beta) is the standard error of the estimate; see, for instance, the section Estimation in the Wikipedia article Ordinary Least Squares.
Regardless of the issue of mean-centering the design matrix, my opinion is that get_coeffs should only do OLS estimation, whatever the design (i.e. mixing) matrix is, and provide the option of adding a constant regressor (i.e. an intercept). If the design matrix must have demeaned regressors, that should be done outside get_coeffs. Similarly, any data normalization must be done outside the function.
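The t-to-z conversion described in the first point can be sketched as follows. The linked code is not reproduced here; this is an independent, hypothetical illustration of the textbook computation, and the function name `ols_z_values` is made up. Each t-value is mapped to the z-value with the same one-tailed probability, keeping its sign.

```python
import numpy as np
from scipy import stats


def ols_z_values(X, Y):
    """Hypothetical sketch: OLS betas, t-statistics and z-values.

    X : (n_time, n_regressors) design matrix (add a constant column if wanted)
    Y : (n_time, n_voxels) data
    """
    n, p = X.shape
    betas, *_ = np.linalg.lstsq(X, Y, rcond=None)   # (p, n_voxels)
    resid = Y - X @ betas
    df = n - p
    sigma2 = (resid ** 2).sum(axis=0) / df          # residual variance per voxel
    xtx_inv_diag = np.diag(np.linalg.inv(X.T @ X))  # (p,)
    se = np.sqrt(np.outer(xtx_inv_diag, sigma2))    # standard error of each beta
    t_vals = betas / se
    # Same tail probability under t(df) and N(0, 1), sign preserved.
    z_vals = np.sign(t_vals) * stats.norm.isf(stats.t.sf(np.abs(t_vals), df))
    return betas, t_vals, z_vals


# Usage with simulated values (hypothetical)
rng = np.random.default_rng(3)
X = np.column_stack([np.ones(60), rng.normal(size=(60, 2))])
Y = X @ rng.normal(size=(3, 5)) + rng.normal(size=(60, 5))
b, t_vals, z_vals = ols_z_values(X, Y)
```

Because the t distribution has heavier tails than the normal, each |z| comes out slightly smaller than the corresponding |t|, and the two converge as the degrees of freedom grow.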

In this way, get_coeffs would be a simple OLS estimation plus computation of Z-values.
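A minimal sketch of this "plain OLS" design, assuming data is (voxels x time) and X is (time x regressors). The name `get_coeffs_ols` is hypothetical, and dropping the intercept beta from the output is a choice made here, not something stated in the thread.

```python
import numpy as np


def get_coeffs_ols(data, X, add_const=False):
    """Hypothetical sketch of the 'plain OLS' design discussed above.

    data      : (n_voxels, n_time) array (orientation is an assumption)
    X         : (n_time, n_regressors) design matrix, used exactly as given
    add_const : if True, append a constant (intercept) column to X
    Returns betas of shape (n_voxels, n_regressors). When add_const is True,
    the intercept beta is dropped so the output lines up with X's columns.
    No centering or scaling happens here; the caller is responsible for that.
    """
    data = np.asarray(data, dtype=float)
    X = np.asarray(X, dtype=float)
    if add_const:
        X = np.column_stack([X, np.ones(X.shape[0])])
    betas, *_ = np.linalg.lstsq(X, data.T, rcond=None)
    betas = betas.T
    if add_const:
        betas = betas[:, :-1]
    return betas


# Usage: centering X outside the function recovers the same slopes as
# fitting the uncentered X with an intercept.
rng = np.random.default_rng(5)
X = rng.normal(loc=2.0, size=(70, 3))
data = rng.normal(size=(6, 70)) + 4.0
b_centered = get_coeffs_ols(data, X - X.mean(axis=0))
b_intercept = get_coeffs_ols(data, X, add_const=True)
print(np.allclose(b_centered, b_intercept))  # True
```

Under this design both routes to the same betas stay available, and the preprocessing decision stays visible at the call site.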

These changes will also affect pull request #458 (comment).

emdupre · 2019-11-24T19:24:50Z

Thanks @CesarCaballeroGaudes! Does this mean that you'll be adding these changes into that PR? It'd be great if we could keep the PRs focused, if at all possible, so please keep us up to date with what you're intending to change! I think that can be separate from the improvements to the documentation (which are also very much needed!).

stale · 2020-02-22T20:16:57Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions to tedana! 🎉

tsalo added the question label (issues detailing questions about the project or its direction) Jan 9, 2019

emdupre added this to the transparent and reproducible processing milestone Jan 14, 2019

tsalo mentioned this issue Feb 19, 2019: Concerns regarding TE-(in)dependence metric calculation #223 (Closed)

tsalo mentioned this issue Mar 11, 2019: [FIX] Normalize PCA mixing matrix over time, not component #228 (Merged)

tsalo mentioned this issue Jul 17, 2019: [REF] Mean-center design matrix within getcoeffs #365 (Closed)

tsalo added the TE-dependence label (issues related to TE dependence metrics and component selection) Oct 4, 2019

tsalo mentioned this issue Nov 10, 2019: [REF] Mean-center design matrix within getcoeffs #443 (Closed)

stale bot added the stale label Feb 22, 2020

stale bot closed this as completed Feb 29, 2020

handwerkerd mentioned this issue Mar 13, 2020: Topics for March 2020 Developers’ call: Pandemic Edition #550 (Closed)
