-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add Continuum Power CCA #196
Conversation
xeofs/models/mca.py
Outdated
class MCA(ContinuousPowerCCA): | ||
def __init__( | ||
self, | ||
n_modes: int = 2, | ||
center: bool = True, | ||
standardize: bool = False, | ||
use_coslat: bool = False, | ||
check_nans: bool = True, | ||
n_pca_modes: Optional[int] = None, | ||
use_pca: bool = True, | ||
n_pca_modes: int | float = 0.999, | ||
compute: bool = True, | ||
sample_name: str = "sample", | ||
feature_name: str = "feature", | ||
solver: str = "auto", | ||
random_state: Optional[int] = None, | ||
solver_kwargs: Dict = {}, | ||
random_state: int | None = None, | ||
solver_kwargs: dict = {}, | ||
**kwargs, | ||
): | ||
# NOTE: **kwargs is only used here to catch any additional arguments that may be passed during deserialization. During .serialize(), the MCA model stores all model parameters from the parent class ContinuousPowerCCA. During deserialization, `use_pca` and `alpha` are passed to the child class MCA where there are not present and would raise an error. To avoid this, we catch all additional arguments here and ignore them. | ||
super().__init__( | ||
n_modes=n_modes, | ||
center=center, | ||
alpha=[1.0, 1.0], | ||
standardize=standardize, | ||
use_coslat=use_coslat, | ||
check_nans=check_nans, | ||
use_pca=use_pca, | ||
n_pca_modes=n_pca_modes, | ||
compute=compute, | ||
sample_name=sample_name, | ||
feature_name=feature_name, | ||
solver=solver, | ||
random_state=random_state, | ||
solver_kwargs=solver_kwargs, | ||
**kwargs, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @slevang I’m not sure if you’ll have time to review the PR when it’s ready, but I’d really appreciate your thoughts, especially on model serialization.
The MCA
class is really just an inheritance of the CPCCA
class with alpha=1
. When I save the MCA
class, all attributes from the parent class are serialized. However, when I load it, it tries to assign the alpha
parameter, which doesn’t exist in MCA
. To prevent an error, I’ve been using kwargs
to absorb the extra alpha
parameter.
Do you think using kwargs
like this is okay? Or do you have a cleaner solution in mind that avoids kwargs
altogether?
Hey @slevang, would you like to review this PR? It’s a bit extensive - sorry about that! No worries if you’re busy; I plan to merge it into the develop branch along with some other breaking changes before it goes live. |
I'll try to take a look at the serialization issue in the next few days, but probably won't have time to dig into the rest of the implementation right now. Nice work though, this looks cool! |
sample_name=sample_name, | ||
feature_name=feature_name, | ||
solver=solver, | ||
random_state=random_state, | ||
solver_kwargs=solver_kwargs, | ||
) | ||
self.attrs.update({"model": "Maximum Covariance Analysis"}) | ||
# Renove alpha from the inherited CPCCA serialization params because it is hard-coded for MCA | ||
self._params.pop("alpha") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we deserialize we use the _params
dictionary to reinstantiate the class with correct args, so I was able to fix this by removing anything that is hard coded (alpha
and center
in this case).
Looking again at the inheritance structure, I'm noticing we recreate the entire API on every child model class. This has the benefit of making the signature explicit and the auto built docstrings complete, but it also becomes a hassle to maintain such long argument lists. For example I also had to add verbose
back to these classes to get the parameter serialization list complete.
An alternative is to only specify whatever new parameters are included on the child class and pass everything else to the parent with **kwargs
, but this makes the signature less explicit. We could consider something like pydantic's BaseModel which I believe helps with this. Just something for future consideration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot! I agree, it’s a bit cumbersome, but for a clear documentation it's probably (still) worth it. I'll check out the package you mentioned. If we could have both - less code and clear documentation - that would be awesome!
This PR introduces Continuous Power CCA (CPCCA) model to the API (close #178 ). Along with this, significant refactoring of the
_CrossBaseModel
class was required, leading to the addition of several new features.Key Changes
New Features:
ContinuousPowerCCA
class and its Hilbert versionComplexCPCCA
.MCA
classes with new functionalities related to correlation coefficients, explained variance and apredict
method for forecasting Y from X (see here.homogeneous_patterns()
andheterogeneous_patterns()
methods for theComplexMCA
model.Testing:
ContinuousPowerCCA
MCA
classes to align with changes in default parametersMiscellaneous:
MCA
class (now a special case ofContinuousPowerCCA
) along with related clases:ComplexMCA
,MCARotator
,ComplexMCARotator
._BaseModel
, with specific model implementations now in_BaseModelSingleSet
and_BaseModelCossSet
.Open To Dos
Whitener
class