
Simplify derivatives in pysindy.py #185

Merged — 115 commits merged into dynamicslab:master on Jul 3, 2022

Conversation

@Jacob-Stevens-Haas (Member) commented May 24, 2022:

Removes a lot of nested conditionals and ~150 SLOC. All derivatives/integrals are calculated in a single line of code inside pysindy.py. Passes all tests locally, but I'm leaving it as a draft until I can check that the notebooks all run.

As the former version of pysindy/pysindy.py said:

 # Note: This block for calculating x_dot for weak and pde libs is identical
 # to the one in score. Should we redefine the differentiation_method to cover
 # the weak and pde library cases more generally?
 # The validate_input should probably explicity handle the weak and pde cases...

This PR addresses that concern with two abstractions:

  1. Feature libraries are responsible for how differentiation occurs, via their calc_trajectory() method. For BaseFeatureLibrary, that simply means applying the differentiation function passed in. For PDELibrary, it means ignoring the passed differentiation method and using FiniteDifference. For WeakPDELibrary, it means ignoring the passed differentiation method and using convert_udot_integral. GeneralizedLibrary chooses a method depending on which types of library it holds, in an existing loop. This is how SINDy.fit() (and other methods) previously worked, but it allows all of the conditionals to be replaced with self.feature_library.calc_trajectory(self.differentiation_method, x, t). I chose that name because it's called in an iterator within _process_multiple_trajectories, but I'm happy to change it.
  2. Feature libraries are responsible for how input validation occurs. For BaseFeatureLibrary, that simply means applying utils.validate_input(). PDELibrary and WeakPDELibrary (and, when applicable, GeneralizedLibrary) just return x.

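The dispatch in point 1 can be sketched roughly like this. This is a hypothetical simplification, not the actual pysindy code: the class and method bodies here (including _convert_u_dot_integral) are made-up stand-ins just to show the shape of the abstraction.

```python
import numpy as np


class BaseFeatureLibrary:
    def calc_trajectory(self, diff_method, x, t):
        # Default behavior: defer to the differentiation method SINDy was given.
        return diff_method(x, t)


class WeakPDELibrary(BaseFeatureLibrary):
    def calc_trajectory(self, diff_method, x, t):
        # Weak form: ignore the passed differentiation method entirely.
        return self._convert_u_dot_integral(x, t)

    def _convert_u_dot_integral(self, x, t):
        # Placeholder: the real method computes weak-form integrals.
        return np.cumsum(x) * (t[1] - t[0])


# SINDy.fit() can then replace its nested conditionals with a single call:
# x_dot = self.feature_library.calc_trajectory(self.differentiation_method, x, t)
```

The point is that the caller no longer inspects library types; each library decides for itself how the left-hand side of the SINDy objective gets computed.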
It also adds two more abstractions that assist code deletion and DRY:

  3. _adapt_to_multiple_trajectories() makes all trajectories lists, so code for handling single trajectories is removed.
  4. _zip_like_sequence() allows combining the many conditionals around whether t is a sequence or a scalar:

if isinstance(t, Sequence):
    [something(xi, ti) for xi, ti in zip(x, t)]
else:
    [something(xi, t) for xi in x]

becomes

[something(xi, ti) for xi, ti in _zip_like_sequence(x, t)]

The only immediately obvious con of this PR is a slight performance hit in the non-PDE cases. The PDE cases previously ran validate_input() after their differentiation/integration step; now the non-PDE cases do so as well, before differentiation. Avoiding that isn't compatible with the current form of calc_trajectory() unless BaseFeatureLibrary.calc_trajectory() were to return x, x_dot instead of just x_dot. I'd love for that to be the case (see #182), but it was a design decision I wanted to kick back to the group in case there are objections.

Other issues:

  • There's still some weird conditional code in _process_multiple_trajectories() for when x_dot is explicitly given as an argument. It appears to handle some WeakPDELibrary cases where x_dot is given in a different form than if it were calculated from the data. I'm assuming that test case is a mistake?
  • Tests take a long time. In many cases (especially the PDE/WeakPDE cases) the tests generate 100x100 matrices of data. Maybe it's possible to reduce that?
  • The feature libraries may be the right place for validate_input, but maybe there's a more elegant solution?

@akaptano (Collaborator):

@Jacob-Stevens-Haas Great job with this. This is no longer my day job but I should be able to give this a solid look this long weekend!

Alan Kaptanoglu and others added 23 commits June 5, 2022 15:40
… WeakPDELibrary is similar, but haven't figured out the tweights issue to correctly integrate things. Replaced the SINDyPI example with an updated and better example. Waiting on modified KdV data from Kardy to try it out on PDE data. One other issue is generating polynomials of derivative terms. Going to talk to Zack about the easiest way to implement, since he does this somewhere.
…kPDELibrary, but now PDELibrary also has such a grid. So can just check for some WeakPDELibrary specific attribute like Hxt instead. There will definitely be future issues with the ensembling.
Add adapter code in new utils function adapt_to_multiple_trajectories()
Remove all internal code for handling single-trajectory data from pysindy.py,
but maintain interface in fit(), score(), predict(), and differentiate().

In cases when x and u had multiple trajectories but multiple_trajectories was,
false, tests previously expected a ValueError.  The desired ValueError was not
explicitly raised.  Make the error explicit.
Different libraries meant different things happened:
- self.differentiation_method
- FiniteDifference
- convert_udot_integral

These were all about calculating the left hand side of the SINDy objective.
That means that we have a common abstraction, and we can use that to remove a
lot of conditional behavior by assigning calculate_trajectory to the correct
method, once.

Add that assignment to BaseFeatureLibrary, PDELibrary, WeakPDELibrary, and
GeneralizedLibrary.  Trade off: GeneralizedLibrary now needs to import
PDELibrary, which seems an acceptable deepening of internal package dependency.

Because the feature libraries also deal with different shapes of inputs, we
can do the same with validate_input().  Maybe ideally there's a better
abstraction of "well formatted data" that works for all SINDy libraries, but in
the interim, handle this function within the class and then at runtime,
dispatch to the appropriate class method.
Sometimes it had been used purely for its reshaping into a tall array,
with time on the first axis and coordinate index on the second.
In the interest of refactoring to a common use of validate_input, breaking
it apart.
Introduces helper _zip_like_sequence()
Should have been in a previous commit
Move the conditional steps from _process_multiple_trajectories() inside each
class's calc_trajectory() method
In interest of flatness, move it to pysindy.py (not used in any other
module).
Also adds the has_type method for GeneralizedLibrary to deal with all the
checks to see if GeneralizedLibrary has a particular type.
Rather than having AxesArray know about libraries, have them know about
AxesArray and define their own assumptions about bad data shapes so they may
customize the ability to correct the data shape.  Since they can't import
pysindy without circular references, move AxesArray to them.

Adds PDEShapedMixin as a mixin for both PDELibrary and WeakPDELibrary, since
I think they both accept the same shape input data and can therefore use
the same `comprehend_axes()` method.

Also add validate_no_reshape to utils to break apart validate_inputs.  The old,
omnibus solution to validate_input + reshape left a lot of implicit behavior
all over pysindy, as different conditionals knew something about which axes
meant what.

This means we can replace code like:
    FiniteDifference(d=1, axis=-2)
with:
    FiniteDifference(d=1, axis=x.ax_time)
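As an illustration of the pattern in that commit, a bare-bones version of an array class carrying an ax_time attribute might look like the toy below. This is a stand-in of my own, not the real AxesArray, which tracks several named axes and has more careful slicing behavior:

```python
import numpy as np


class AxesArray(np.ndarray):
    """Toy ndarray subclass that remembers which axis is time."""

    def __new__(cls, arr, ax_time=0):
        obj = np.asarray(arr).view(cls)
        obj.ax_time = ax_time
        return obj

    def __array_finalize__(self, obj):
        # Propagate the attribute through views and slices.
        self.ax_time = getattr(obj, "ax_time", 0)


x = AxesArray(np.arange(12.0).reshape(3, 4), ax_time=0)
dx = np.gradient(x, axis=x.ax_time)  # differentiate along time, not a magic -2
```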
These should be TypeErrors.  Previously, test_fit_multiple_trajectories
asked that x not being a list raise a TypeError, but u not being a list
raise a ValueError.

In addition, move this kind of validation earlier, before derivatives are
computed, rather than after, which is when control inputs were previously
examined for the first time.
Previously passed a list of lists, rather than a list of arrays.
Modified the tests to pop out these warnings. I don't know why these warnings
are caught in the first place, given that I tried edits to pyproject.toml to
tell pytest to ignore them.  The AxesArray warning shouldn't matter to the
user, because PendingDeprecationWarnings are ignored in all cases except tests.

This may be the wrong warning superclass, since AxesWarning is really here just
for debugging during the refactor.
Also:
- Deprecate ambiguous default data shapes.
  Previously, passing an array with three axes to the default problem worked
  because extra axes were implicitly concatenated onto the time axis.  This
  meant that meaningless derivatives were calculated across the concatenation
  points.

  AFAIK, no client code uses this, and it is only tested in
  test_pysindy.test_data_shapes.  It is probably safe to remove these tests
  and the related code outright.

- Recreate AxesArray object when BaseFeatureLibrary runs calc_trajectory.
    Because differentiation methods come from outside SINDy, some do not
    return an AxesArray.

Breaking Change:
- Continued the quest to remove implicit shaping by removing some old calls to
    validate_input().  Differentiation methods now return the same shape
    as their inputs.  Previously, they implicitly created 2D arrays.
Removed from:
    _process_multiple_trajectories()
    validate_control_variables()

This param exists to reshape the data with a sample axis.  To move away
from implicit shaping, remove this param and in refactoring, let the
necessary callers do the reshaping.
Allow subclasses.  Now named test_differentiate_returns_compatible_data_type()
@Jacob-Stevens-Haas Jacob-Stevens-Haas self-assigned this Jun 28, 2022
@Jacob-Stevens-Haas (Member, Author) left a review comment:

Many of these can be turned into GH Issues/project todos.

Review comments were left on:
  • test/feature_library/test_feature_library.py (3 comments)
  • test/optimizers/test_optimizers.py (1 comment)
  • test/test_pysindy.py (1 comment)
  • pysindy/feature_library/weak_pde_library.py (2 comments)
  • pysindy/feature_library/pde_library.py (3 comments)
@Jacob-Stevens-Haas (Member, Author):

Hey @znicolaou , I finished reading through the PR and there were two questions I had for you on code that came from #195. Also, I didn't reread feature_libraries code too closely. When you review, can you give that part more focus than I did?

Move related code nearer, delete commented code, edited inc/obsolete docstrings
Discrete case:
Since validate_control_variables() relies on the shape of x, which
_process_multiple_trajectories() changes, its behavior needed to change
in order to move the call to _process_multiple_trajectories().
Removed unnecessary attributes.  Extra attributes lead to spooky
action at a distance (highly coupled code).
@znicolaou (Collaborator):

Thanks for the review @Jacob-Stevens-Haas! I plan to take a look in the next few days as well and respond to your questions.

Still can't be fixed in some slice cases, but will work for all
explicitly created AxesArrays.

Added test_n_elements
The separate DefaultShapedInputsMixin and PDEShapedInputsMixin supported
backwards compatibility with code that has been removed.  A module function
is better.
The culprit was the feature libraries transform using comprehend_axes(), which
assigns a certain axis to time rather than sample.
Accidentally removed copy(), which was ok, but still needed derivs=x.
return axes


def ax_time_to_ax_sample(x: AxesArray) -> AxesArray:
@znicolaou (Collaborator) commented Jul 1, 2022:

Is this function actually used for anything? It is applied in some but not all feature library transforms and in the SINDy score and/or predict. But the SampleConcatter implements a more general version of converting other axes to samples (which is why it doesn't matter that some feature libraries don't call ax_time_to_samples). I'd probably try to remove this, let feature libraries keep the axes unchanged, and use the SampleConcatter everywhere instead. But this is not pressing, so could be deferred.

@Jacob-Stevens-Haas (Member, Author) replied:

I'm happy to remove it - the function was created at a time when drop_nan_samples and drop_random_samples were called before the feature libraries transform(), and therefore before samples were concatted. Now it probably can be removed.

@Jacob-Stevens-Haas (Member, Author) replied:

Removed in new commit

@@ -232,7 +244,25 @@ def fit(self, x, y=None):

return self

def transform(self, x):
def has_type(self, libtype: Type, exclusively=False) -> bool:
@znicolaou (Collaborator) commented Jul 1, 2022:

This is not used yet, but implemented for future purposes? Should the ConcatLibrary and TensorLibrary also implement it, and should it be included in BaseFeatureLibrary? Also, what if self.libraries_ contains a GeneralizedLibrary, TensorLibrary, or ConcatLibrary? Should has_type recursively check the has_type of elements in self.libraries_? It would be easier if we never needed to check whether the library contains any particular type, so that we could remove this function...

@Jacob-Stevens-Haas (Member, Author) replied:

Also a vestige of refactoring, when pysindy.py had several nested loops and conditionals to check whether to treat the data like it went to a WeakPDELibrary.

Since there's no current use or test for the function, I'm happy to delete it rather than try to improve it.

@Jacob-Stevens-Haas (Member, Author) replied:

removed in new commit.

x = check_array(x, order="F", dtype=FLOAT_DTYPES, accept_sparse=("csr", "csc"))
xp_full = []
for x in x_full:
    if sparse.issparse(x) and x.format not in ["csr", "csc"]:
Collaborator comment:

The PolynomialLibrary is the only library that supports the sparse input data currently. I don't understand the use case for it--what sort of time series data is expected to be sparse? If there is a usage, should we make an issue to implement it more generally?

@Jacob-Stevens-Haas (Member, Author) replied:

The PolynomialLibrary is the only library that optimizes for sparse input. Sparse input may also work in other libraries via duck typing.

Running git blame, it looks like Kathleen added these lines. I had the same question. It was just easier for me to keep backwards compatibility than to try to find out the reason.

I'll create an issue and reach out to Kathleen to see why she added them.
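For reference, the guard in the diff above boils down to a sparse-format conversion, which can be demonstrated with plain scipy.sparse (a simplified sketch, not the pysindy code path):

```python
import numpy as np
from scipy import sparse

x = sparse.coo_matrix(np.eye(3))  # COO is not one of the accepted formats
if sparse.issparse(x) and x.format not in ["csr", "csc"]:
    x = x.tocsr()  # convert to a format the downstream code accepts
```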

Collaborator reply:

Cool, makes sense! Obviously, nothing required for this PR, but just an observation.

@@ -99,7 +159,7 @@ def _ensemble(self, xp):
)
inds = range(self.n_output_features_)
inds = np.delete(inds, self.ensemble_indices)
return xp[:, inds]
return [x[..., inds] for x in xp]
Collaborator comment:

The ellipses ... are used in many libraries to slice only along the ax_coords axis. For future usage where the axes aren't necessarily ordered with ax_coord at the end, it would be nice to include a utility function to slice along the ax_coord.

@Jacob-Stevens-Haas (Member, Author) commented Jul 3, 2022:

x.take(inds, axis=x.ax_coord)

There are a few cases where it doesn't work (not all types of indexing are supported). Since this is in _ensemble, which is only kept for BC, do you want me to go in and change it here?

@znicolaou (Collaborator) commented Jul 3, 2022:

Also, not sure this will work for assignment of slices along the ax_coord axis, like spatiotemporal_grid[..., -1] = temporal_grid. Would be easy to implement with a slice object, like

s = [slice(None)] * spatiotemporal_grid.ndim
s[spatiotemporal_grid.ax_coord] = -1
spatiotemporal_grid[tuple(s)] = temporal_grid

Not sure whether a utility function in AxesArray would simplify code like this or not, but maybe.
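Both patterns in this thread can be checked with plain NumPy. This demo hard-codes ax_coord as the last axis purely for illustration; in pysindy it would come from the AxesArray:

```python
import numpy as np

grid = np.zeros((2, 3, 4))
ax_coord = grid.ndim - 1  # assume coordinates live on the last axis

# Reading along ax_coord with take():
sel = grid.take([0, 2], axis=ax_coord)

# Assigning along ax_coord with a built slice tuple:
s = [slice(None)] * grid.ndim
s[ax_coord] = -1
grid[tuple(s)] = 7.0
```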

@Jacob-Stevens-Haas (Member, Author) replied:

I haven't used it, but np.put_along_axis looks like the counterpart to np.take.

Collaborator reply:

Cool, good to know, thanks!

@znicolaou (Collaborator) left a review:

I think I'm satisfied with the current state of this branch now, and am ready to draft some release notes, which I'll start thinking about this weekend. @Jacob-Stevens-Haas I left a few comments with questions and suggestions, which we may want to turn into issues for the future or address with some small adjustments before the merge. Let me know what you think.

pysindy.GeneralizedLibrary.has_type - not used
pysindy.utils.ax_time_to_ax_spatial - limited use in previous refactor.
@Jacob-Stevens-Haas Jacob-Stevens-Haas merged commit 632585a into dynamicslab:master Jul 3, 2022
@Jacob-Stevens-Haas (Member, Author):

Merged! I added a 1.x branch at the old position of master just in case something goes wrong.

@znicolaou (Collaborator):

Awesome, looks great! We just need to draft the release notes so we can make a new release soon.

@Jacob-Stevens-Haas Jacob-Stevens-Haas deleted the simplify-fitting branch July 25, 2022 19:24
jpcurbelo pushed a commit to jpcurbelo/pysindy_fork that referenced this pull request Apr 30, 2024
…itting

Simplify derivatives in pysindy.py