
Simplify derivatives in pysindy.py #185

Merged — 115 commits merged into dynamicslab:master on Jul 3, 2022

Conversation

@Jacob-Stevens-Haas (Member) commented May 24, 2022:

Removes a lot of nested conditionals and ~150 SLOC. All derivatives/integrals are calculated in a single line of code inside pysindy.py. Passes all tests locally, but I'm leaving it as a draft until I can check that the notebooks all run.

As the former version of pysindy/pysindy.py said:

 # Note: This block for calculating x_dot for weak and pde libs is identical
 # to the one in score. Should we redefine the differentiation_method to cover
 # the weak and pde library cases more generally?
 # The validate_input should probably explicity handle the weak and pde cases...

This PR addresses that concern with two abstractions:

  1. Feature libraries are responsible for how differentiation occurs, via their calc_trajectory() method. For BaseFeatureLibrary, that simply means applying the differentiation function passed in. For PDELibrary, it means ignoring the passed differentiation method and using FiniteDifference. For WeakPDELibrary, it means ignoring the passed differentiation method and using convert_udot_integral. GeneralizedLibrary chooses a method depending on which types of library it holds, in an existing loop. This is how SINDy.fit() (and other methods) previously worked, but it allows all of the conditionals to be replaced with self.feature_library.calc_trajectory(self.differentiation_method, x, t). I chose that name because it's called in an iterator within _process_multiple_trajectories, but I'm happy to change it.
  2. Feature libraries are responsible for how input validation occurs. For BaseFeatureLibrary, that simply means applying utils.validate_input(). PDELibrary and WeakPDELibrary (and, when applicable, GeneralizedLibrary) just return x.

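The dispatch in point 1 can be sketched roughly like this. This is a hypothetical simplification, not the actual pysindy code: the class and method bodies here (including _convert_u_dot_integral) are made-up stand-ins just to show the shape of the abstraction.

```python
import numpy as np


class BaseFeatureLibrary:
    def calc_trajectory(self, diff_method, x, t):
        # Default behavior: defer to the differentiation method SINDy was given.
        return diff_method(x, t)


class WeakPDELibrary(BaseFeatureLibrary):
    def calc_trajectory(self, diff_method, x, t):
        # Weak form: ignore the passed differentiation method entirely.
        return self._convert_u_dot_integral(x, t)

    def _convert_u_dot_integral(self, x, t):
        # Placeholder: the real method computes weak-form integrals.
        return np.cumsum(x) * (t[1] - t[0])


# SINDy.fit() can then replace its nested conditionals with a single call:
# x_dot = self.feature_library.calc_trajectory(self.differentiation_method, x, t)
```

The point is that the caller no longer inspects library types; each library decides for itself how the left-hand side of the SINDy objective gets computed.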
It also adds two more abstractions that assist code deletion and DRY:

  3. _adapt_to_multiple_trajectories() makes all trajectories lists, so code for handling single trajectories is removed.
  4. _zip_like_sequence() allows combining the many conditionals around whether t is a sequence or a scalar:

if isinstance(t, Sequence):
    [something(xi, ti) for xi, ti in zip(x, t)]
else:
    [something(xi, t) for xi in x]

becomes

[something(xi, ti) for xi, ti in _zip_like_sequence(x, t)]

The only immediately obvious con of this PR is a slight performance hit in the non-PDE cases. The PDE cases previously ran validate_input() after their differentiation/integration step; now the non-PDE cases do so as well, before differentiation. Avoiding that isn't compatible with the current form of calc_trajectory() unless BaseFeatureLibrary.calc_trajectory() were to return x, x_dot instead of just x_dot. I'd love for that to be the case (see #182), but it was a design decision I wanted to kick back to the group in case there are objections.

Other issues:

  • There's still some weird conditional code in _process_multiple_trajectories() for when x_dot is explicitly given as an argument. It appears to handle some WeakPDELibrary cases where x_dot is given in a different form than if it were calculated from the data. I'm assuming that test case is a mistake?
  • Tests take a long time. In many cases (especially the PDE/WeakPDE cases) the tests generate 100x100 matrices of data. Maybe it's possible to reduce that?
  • The feature libraries may be the right place for validate_input, but maybe there's a more elegant solution?

@akaptano (Collaborator):

@Jacob-Stevens-Haas Great job with this. This is no longer my day job but I should be able to give this a solid look this long weekend!

Alan Kaptanoglu and others added 23 commits June 5, 2022 15:40
… WeakPDELibrary is similar, but haven't figured out the tweights issue to correctly integrate things. Replaced the SINDyPI example with an updated and better example. Waiting on modified KdV data from Kardy to try it out on PDE data. One other issue is generating polynomials of derivative terms. Going to talk to Zack about the easiest way to implement, since he does this somewhere.
…kPDELibrary, but now PDELibrary also has such a grid. So can just check for some WeakPDELibrary specific attribute like Hxt instead. There will definitely be future issues with the ensembling.
Add adapter code in new utils function adapt_to_multiple_trajectories()
Remove all internal code for handling single-trajectory data from pysindy.py,
but maintain interface in fit(), score(), predict(), and differentiate().

In cases when x and u had multiple trajectories but multiple_trajectories was,
false, tests previously expected a ValueError.  The desired ValueError was not
explicitly raised.  Make the error explicit.
Different libraries meant different things happened:
- self.differentiation_method
- FiniteDifference
- convert_udot_integral

These were all about calculating the left hand side of the SINDy objective.
That means that we have a common abstraction, and we can use that to remove a
lot of conditional behavior by assigning calculate_trajectory to the correct
method, once.

Add that assignment to BaseFeatureLibrary, PDELibrary, WeakPDELibrary, and
GeneralizedLibrary.  Trade off: GeneralizedLibrary now needs to import
PDELibrary, which seems an acceptable deepening of internal package dependency.

Because the feature libraries also deal with different shapes of inputs, we
can do the same with validate_input().  Maybe ideally there's a better
abstraction of "well formatted data" that works for all SINDy libraries, but in
the interim, handle this function within the class and then at runtime,
dispatch to the appropriate class method.
Sometimes it had been used purely for its reshaping into a tall array,
with time on the first axis and coordinate index on the second.
In the interest of refactoring to a common use of validate_input, breaking
it apart.
Introduces helper _zip_like_sequence()
Should have been in a previous commit
Move the conditional steps from _process_multiple_trajectories() inside each
class's calc_trajectory() method
In interest of flatness, move it to pysindy.py (not used in any other
module).
Also adds the has_type method for GeneralizedLibrary to deal with all the
checks to see if GeneralizedLibrary has a particular type.
Rather than having AxesArray know about libraries, have them know about
AxesArray and define their own assumptions about bad data shapes so they may
customize the ability to correct the data shape.  Since they can't import
pysindy without circular references, move AxesArray to them.

Adds PDEShapedMixin as a mixin for both PDELibrary and WeakPDELibrary, since
I think they both accept the same shape input data and can therefore use
the same `comprehend_axes()` method.

Also add validate_no_reshape to utils to break apart validate_inputs.  The old,
omnibus solution to validate_input + reshape left a lot of implicit behavior
all over pysindy, as different conditionals knew something about which axes
meant what.

This means we can replace code like:
    FiniteDifference(d=1, axis=-2)
with:
    FiniteDifference(d=1, axis=x.ax_time)
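As an illustration of the pattern in that commit, a bare-bones version of an array class carrying an ax_time attribute might look like the toy below. This is a stand-in of my own, not the real AxesArray, which tracks several named axes and has more careful slicing behavior:

```python
import numpy as np


class AxesArray(np.ndarray):
    """Toy ndarray subclass that remembers which axis is time."""

    def __new__(cls, arr, ax_time=0):
        obj = np.asarray(arr).view(cls)
        obj.ax_time = ax_time
        return obj

    def __array_finalize__(self, obj):
        # Propagate the attribute through views and slices.
        self.ax_time = getattr(obj, "ax_time", 0)


x = AxesArray(np.arange(12.0).reshape(3, 4), ax_time=0)
dx = np.gradient(x, axis=x.ax_time)  # differentiate along time, not a magic -2
```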
These should be TypeErrors.  Previously, test_fit_multiple_trajectories
asked that x not being a list raise a TypeError, but u not being a list
raise a ValueError.

In addition, move this kind of validation earlier, before derivatives are
computed, rather than after, which is when control inputs were previously
examined for the first time.
Previously passed a list of lists, rather than a list of arrays.
Modified the tests to pop out these warnings. I don't know why these warnings
are caught in the first place, given that I tried edits to pyproject.toml to
tell pytest to ignore them.  The AxesArray warning shouldn't matter to the
user, because PendingDeprecationWarnings are ignored in all cases except tests.

This may be the wrong warning superclass, since AxesWarning is really here just
for debugging during the refactor.
Also:
- Deprecate ambiguous default data shapes.
  Previously, passing an array with three axes to the default problem worked
  because extra axes were implicitly concatenated onto the time axis.  This
  meant that meaningless derivatives were calculated across the concatenation
  points.

  AFAIK, no client code uses this, and it is only tested in
  test_pysindy.test_data_shapes.  It is probably safe to remove these tests
  and the related code outright.

- Recreate AxesArray object when BaseFeatureLibrary runs calc_trajectory.
    Because differentiation methods come from outside SINDy, some do not
    return an AxesArray.

Breaking Change:
- Continued the quest to remove implicit shaping by removing some old calls to
    validate_input().  Differentiation methods now return the same shape
    as their inputs.  Previously, they implicitly created 2D arrays.
Removed from:
    _process_multiple_trajectories()
    validate_control_variables()

This param exists to reshape the data with a sample axis.  To move away
from implicit shaping, remove this param and in refactoring, let the
necessary callers do the reshaping.
Allow subclasses.  Now named test_differentiate_returns_compatible_data_type()
@Jacob-Stevens-Haas Jacob-Stevens-Haas self-assigned this Jun 28, 2022
@Jacob-Stevens-Haas (Member, Author) left a review comment:

Many of these can be turned into GH Issues/project todos.

Review comments were left on:
  • test/feature_library/test_feature_library.py (3 comments)
  • test/optimizers/test_optimizers.py (1 comment)
  • test/test_pysindy.py (1 comment)
  • pysindy/feature_library/weak_pde_library.py (2 comments)
  • pysindy/feature_library/pde_library.py (3 comments)
@Jacob-Stevens-Haas (Member, Author):

Hey @znicolaou , I finished reading through the PR and there were two questions I had for you on code that came from #195. Also, I didn't reread feature_libraries code too closely. When you review, can you give that part more focus than I did?

Move related code nearer, delete commented code, edited inc/obsolete docstrings
Discrete case:
Since validate_control_variables() relies on the shape of x, which
_process_multiple_trajectories() changes, its behavior needed to change
in order to move the call to _process_multiple_trajectories().
Removed unnecessary attributes.  Extra attributes lead to spooky
action at a distance (highly coupled code).
@znicolaou (Collaborator):

Thanks for the review @Jacob-Stevens-Haas! I plan to take a look in the next few days as well and respond to your questions.

Still can't be fixed in some slice cases, but will work for all
explicitly created AxesArrays.

Added test_n_elements
The separate DefaultShapedInputsMixin and PDEShapedInputsMixin supported
backwards compatibility with code that has been removed.  A module function
is better.
The culprit was the feature libraries transform using comprehend_axes(), which
assigns a certain axis to time rather than sample.
Accidentally removed copy(), which was ok, but still needed derivs=x.
return axes


def ax_time_to_ax_sample(x: AxesArray) -> AxesArray:
@znicolaou (Collaborator) commented Jul 1, 2022:

Is this function actually used for anything? It is applied in some but not all feature library transforms and in the SINDy score and/or predict. But the SampleConcatter implements a more general version of converting other axes to samples (which is why it doesn't matter that some feature libraries don't call ax_time_to_samples). I'd probably try to remove this, let feature libraries keep the axes unchanged, and use the SampleConcatter everywhere instead. But this is not pressing, so could be deferred.

@Jacob-Stevens-Haas (Member, Author) replied:

I'm happy to remove it - the function was created at a time when drop_nan_samples and drop_random_samples were called before the feature libraries transform(), and therefore before samples were concatted. Now it probably can be removed.

@Jacob-Stevens-Haas (Member, Author) replied:

Removed in new commit

@@ -232,7 +244,25 @@ def fit(self, x, y=None):

return self

def transform(self, x):
def has_type(self, libtype: Type, exclusively=False) -> bool:
@znicolaou (Collaborator) commented Jul 1, 2022:

This is not used yet, but implemented for future purposes? Should the ConcatLibrary and TensorLibrary also implement it, and should it be included in BaseFeatureLibrary? Also, what if self.libraries_ contains a GeneralizedLibrary, TensorLibrary, or ConcatLibrary? Should has_type recursively check the has_type of elements in self.libraries_? It would be easier if we never needed to check whether the library contains any particular type, so that we could remove this function...

@Jacob-Stevens-Haas (Member, Author) replied:

Also a vestige of refactoring, when pysindy.py had several nested loops and conditionals to check whether to treat the data like it went to a WeakPDELibrary.

Since there's no current use or test for the function, I'm happy to delete it rather than try to improve it.

@Jacob-Stevens-Haas (Member, Author) replied:

removed in new commit.

x = check_array(x, order="F", dtype=FLOAT_DTYPES, accept_sparse=("csr", "csc"))
xp_full = []
for x in x_full:
    if sparse.issparse(x) and x.format not in ["csr", "csc"]:
Collaborator comment:

The PolynomialLibrary is the only library that supports the sparse input data currently. I don't understand the use case for it--what sort of time series data is expected to be sparse? If there is a usage, should we make an issue to implement it more generally?

@Jacob-Stevens-Haas (Member, Author) replied:

The PolynomialLibrary is the only library that optimizes for sparse input. Sparse input may also work in other libraries via duck typing.

Running git blame, it looks like Kathleen added these lines. I had the same question. It was just easier for me to keep backwards compatibility than to try to find out the reason.

I'll create an issue and reach out to Kathleen to see why she added them.
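For reference, the guard in the diff above boils down to a sparse-format conversion, which can be demonstrated with plain scipy.sparse (a simplified sketch, not the pysindy code path):

```python
import numpy as np
from scipy import sparse

x = sparse.coo_matrix(np.eye(3))  # COO is not one of the accepted formats
if sparse.issparse(x) and x.format not in ["csr", "csc"]:
    x = x.tocsr()  # convert to a format the downstream code accepts
```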

Collaborator reply:

Cool, makes sense! Obviously, nothing required for this PR, but just an observation.

@@ -99,7 +159,7 @@ def _ensemble(self, xp):
)
inds = range(self.n_output_features_)
inds = np.delete(inds, self.ensemble_indices)
return xp[:, inds]
return [x[..., inds] for x in xp]
Collaborator comment:

The ellipses ... are used in many libraries to slice only along the ax_coords axis. For future usage where the axes aren't necessarily ordered with ax_coord at the end, it would be nice to include a utility function to slice along the ax_coord.

@Jacob-Stevens-Haas (Member, Author) commented Jul 3, 2022:

x.take(inds, axis=x.ax_coord)

There are a few cases where it doesn't work (not all types of indexing are supported). Since this is in _ensemble, which is only kept for BC, do you want me to go in and change it here?

@znicolaou (Collaborator) commented Jul 3, 2022:

Also, not sure this will work for assignment of slices along the ax_coord axis, like spatiotemporal_grid[..., -1] = temporal_grid. Would be easy to implement with a slice object, like

s = [slice(None)] * spatiotemporal_grid.ndim
s[spatiotemporal_grid.ax_coord] = -1
spatiotemporal_grid[tuple(s)] = temporal_grid

Not sure whether a utility function in AxesArray would simplify code like this or not, but maybe.
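Both patterns in this thread can be checked with plain NumPy. This demo hard-codes ax_coord as the last axis purely for illustration; in pysindy it would come from the AxesArray:

```python
import numpy as np

grid = np.zeros((2, 3, 4))
ax_coord = grid.ndim - 1  # assume coordinates live on the last axis

# Reading along ax_coord with take():
sel = grid.take([0, 2], axis=ax_coord)

# Assigning along ax_coord with a built slice tuple:
s = [slice(None)] * grid.ndim
s[ax_coord] = -1
grid[tuple(s)] = 7.0
```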

@Jacob-Stevens-Haas (Member, Author) replied:

I haven't used it, but np.put_along_axis looks like the counterpart to np.take.

Collaborator reply:

Cool, good to know, thanks!

@znicolaou (Collaborator) left a review:

I think I'm satisfied with the current state of this branch now, and am ready to draft some release notes, which I'll start thinking about this weekend. @Jacob-Stevens-Haas I left a few comments with questions and suggestions, which we may want to turn into issues for the future or address with some small adjustments before the merge. Let me know what you think.

pysindy.GeneralizedLibrary.has_type - not used
pysindy.utils.ax_time_to_ax_spatial - limited use in previous refactor.
@Jacob-Stevens-Haas Jacob-Stevens-Haas merged commit 632585a into dynamicslab:master Jul 3, 2022
@Jacob-Stevens-Haas (Member, Author):

Merged! I added a 1.x branch at the old position of master just in case something goes wrong.

@znicolaou (Collaborator):

Awesome, looks great! We just need to draft the release notes so we can make a new release soon.

@Jacob-Stevens-Haas Jacob-Stevens-Haas deleted the simplify-fitting branch July 25, 2022 19:24
jpcurbelo pushed a commit to jpcurbelo/pysindy_fork that referenced this pull request Apr 30, 2024
…itting

Simplify derivatives in pysindy.py