Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add Continuum Power CCA #196

Merged
merged 31 commits into from
Sep 1, 2024
Merged

feat: add Continuum Power CCA #196

merged 31 commits into from
Sep 1, 2024

Conversation

nicrie
Copy link
Contributor

@nicrie nicrie commented Aug 25, 2024

This PR introduces Continuous Power CCA (CPCCA) model to the API (close #178 ). Along with this, significant refactoring of the _CrossBaseModel class was required, leading to the addition of several new features.

Key Changes

  1. New Features:

    • Added the ContinuousPowerCCA class and its Hilbert version ComplexCPCCA.
    • Enhanced MCA classes with new functionalities related to correlation coefficients, explained variance and a predict method for forecasting Y from X (see here.
    • Introduced homogeneous_patterns() and heterogeneous_patterns() methods for the ComplexMCA model.
  2. Testing:

    • Added basics tests for ContinuousPowerCCA
    • Updated some of the existing tests for MCA classes to align with changes in default parameters
  3. Miscellaneous:

    • Refactored MCA class (now a special case of ContinuousPowerCCA) along with related clases: ComplexMCA, MCARotator, ComplexMCARotator.
    • Split the original base models into three classes, separating functionalities regarding serialization and lazy model evaluation into _BaseModel, with specific model implementations now in _BaseModelSingleSet and _BaseModelCossSet.

Open To Dos

  • update and streamline docstrings in new models and methods
  • allow full lazy evaluation mode; currently there are a few hiccups in the Whitener class

Comment on lines 4 to 37
class MCA(ContinuousPowerCCA):
def __init__(
self,
n_modes: int = 2,
center: bool = True,
standardize: bool = False,
use_coslat: bool = False,
check_nans: bool = True,
n_pca_modes: Optional[int] = None,
use_pca: bool = True,
n_pca_modes: int | float = 0.999,
compute: bool = True,
sample_name: str = "sample",
feature_name: str = "feature",
solver: str = "auto",
random_state: Optional[int] = None,
solver_kwargs: Dict = {},
random_state: int | None = None,
solver_kwargs: dict = {},
**kwargs,
):
# NOTE: **kwargs is only used here to catch any additional arguments that may be passed during deserialization. During .serialize(), the MCA model stores all model parameters from the parent class ContinuousPowerCCA. During deserialization, `use_pca` and `alpha` are passed to the child class MCA where there are not present and would raise an error. To avoid this, we catch all additional arguments here and ignore them.
super().__init__(
n_modes=n_modes,
center=center,
alpha=[1.0, 1.0],
standardize=standardize,
use_coslat=use_coslat,
check_nans=check_nans,
use_pca=use_pca,
n_pca_modes=n_pca_modes,
compute=compute,
sample_name=sample_name,
feature_name=feature_name,
solver=solver,
random_state=random_state,
solver_kwargs=solver_kwargs,
**kwargs,
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @slevang I’m not sure if you’ll have time to review the PR when it’s ready, but I’d really appreciate your thoughts, especially on model serialization.

The MCA class is really just an inheritance of the CPCCA class with alpha=1. When I save the MCA class, all attributes from the parent class are serialized. However, when I load it, it tries to assign the alpha parameter, which doesn’t exist in MCA. To prevent an error, I’ve been using kwargs to absorb the extra alpha parameter.

Do you think using kwargs like this is okay? Or do you have a cleaner solution in mind that avoids kwargs altogether?

@nicrie nicrie changed the title feat: add Continuous Power CCA feat: add Continuum Power CCA Aug 26, 2024
@nicrie nicrie marked this pull request as ready for review August 30, 2024 19:16
@nicrie
Copy link
Contributor Author

nicrie commented Aug 30, 2024

Hey @slevang, would you like to review this PR? It’s a bit extensive - sorry about that! No worries if you’re busy; I plan to merge it into the develop branch along with some other breaking changes before it goes live.

@slevang
Copy link
Contributor

slevang commented Aug 31, 2024

I'll try to take a look at the serialization issue in the next few days, but probably won't have time to dig into the rest of the implementation right now. Nice work though, this looks cool!

sample_name=sample_name,
feature_name=feature_name,
solver=solver,
random_state=random_state,
solver_kwargs=solver_kwargs,
)
self.attrs.update({"model": "Maximum Covariance Analysis"})
# Renove alpha from the inherited CPCCA serialization params because it is hard-coded for MCA
self._params.pop("alpha")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we deserialize we use the _params dictionary to reinstantiate the class with correct args, so I was able to fix this by removing anything that is hard coded (alpha and center in this case).

Looking again at the inheritance structure, I'm noticing we recreate the entire API on every child model class. This has the benefit of making the signature explicit and the auto built docstrings complete, but it also becomes a hassle to maintain such long argument lists. For example I also had to add verbose back to these classes to get the parameter serialization list complete.

An alternative is to only specify whatever new parameters are included on the child class and pass everything else to the parent with **kwargs, but this makes the signature less explicit. We could consider something like pydantic's BaseModel which I believe helps with this. Just something for future consideration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot! I agree, it’s a bit cumbersome, but for a clear documentation it's probably (still) worth it. I'll check out the package you mentioned. If we could have both - less code and clear documentation - that would be awesome!

@nicrie nicrie merged commit de9eaa7 into develop Sep 1, 2024
6 checks passed
@nicrie nicrie deleted the base-multi-set branch September 1, 2024 01:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants