Simplify derivatives in pysindy.py #185
Conversation
@Jacob-Stevens-Haas Great job with this. This is no longer my day job, but I should be able to give this a solid look this long weekend!
… WeakPDELibrary is similar, but I haven't figured out the tweights issue needed to correctly integrate things. Replaced the SINDyPI example with an updated and better one. Waiting on modified KdV data from Kardy to try it out on PDE data. One other issue is generating polynomials of derivative terms. Going to talk to Zack about the easiest way to implement this, since he does it somewhere already.
…kPDELibrary, but now PDELibrary also has such a grid, so we can just check for some WeakPDELibrary-specific attribute like Hxt instead. There will definitely be future issues with the ensembling.
Add adapter code in a new utils function, adapt_to_multiple_trajectories(). Remove all internal code for handling single-trajectory data from pysindy.py, but maintain the interface in fit(), score(), predict(), and differentiate(). In cases where x and u had multiple trajectories but multiple_trajectories was false, tests previously expected a ValueError that was never explicitly raised. Make the error explicit.
Different libraries meant different things happened:

- self.differentiation_method
- FiniteDifference
- convert_udot_integral

These were all about calculating the left-hand side of the SINDy objective. That means we have a common abstraction, and we can use it to remove a lot of conditional behavior by assigning calculate_trajectory to the correct method, once. Add that assignment to BaseFeatureLibrary, PDELibrary, WeakPDELibrary, and GeneralizedLibrary. Trade-off: GeneralizedLibrary now needs to import PDELibrary, which seems an acceptable deepening of internal package dependency. Because the feature libraries also deal with different shapes of inputs, we can do the same with validate_input(). Ideally there might be a better abstraction of "well-formatted data" that works for all SINDy libraries, but in the interim, handle this function within the class and, at runtime, dispatch to the appropriate class method.
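The dispatch described above can be sketched roughly as follows. The class and method names come from this PR, but the bodies and signatures are illustrative assumptions, not pysindy's actual implementation (`np.gradient` stands in for both the user's method and FiniteDifference):

```python
import numpy as np

class BaseFeatureLibrary:
    def calc_trajectory(self, diff_method, x, t):
        # ODE case: defer to whatever differentiation method the user chose.
        return diff_method(x, t)

class PDELibrary(BaseFeatureLibrary):
    def calc_trajectory(self, diff_method, x, t):
        # PDE case: ignore the user's method and always differentiate
        # in time with finite differences (np.gradient as a stand-in).
        return np.gradient(x, t, axis=0)

def user_diff(x, t):
    # Stand-in for a user-supplied differentiation method.
    return np.gradient(x, t, axis=0)

t = np.linspace(0, 1, 11)
x = np.stack([t, t ** 2], axis=1)  # two state variables over time
x_dot = BaseFeatureLibrary().calc_trajectory(user_diff, x, t)
```

The caller never branches on library type; it just invokes `calc_trajectory` and lets each class decide how the left-hand side is computed.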
WIP for further consolidation.
Sometimes it had been used purely for its reshaping into a tall array, with time in the first axis and coordinate index in the second. In the interest of refactoring toward a common use of validate_input, break it apart.
Introduces helper _zip_like_sequence()
Should have been in a previous commit
Move the conditional steps from _process_multiple_trajectories() inside each class's calc_trajectory() method
In the interest of flatness, move it to pysindy.py (it is not used in any other module).
Also adds a has_type method to GeneralizedLibrary to handle all the checks of whether GeneralizedLibrary contains a particular library type.
Rather than having AxesArray know about libraries, have the libraries know about AxesArray and define their own assumptions about bad data shapes, so they can customize how data shapes are corrected. Since they can't import pysindy without circular references, move AxesArray to them. Adds PDEShapedMixin as a mixin for both PDELibrary and WeakPDELibrary, since I think they both accept the same input data shape and can therefore use the same `comprehend_axes()` method. Also adds validate_no_reshape to utils, breaking apart validate_input. The old, omnibus validate_input + reshape left a lot of implicit behavior all over pysindy, as different conditionals knew something about which axes meant what. This means we can replace code like `FiniteDifference(d=1, axis=-2)` with `FiniteDifference(d=1, axis=x.ax_time)`.
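The named-axes idea can be illustrated with a toy ndarray subclass; this is an illustrative sketch only, not pysindy's AxesArray:

```python
import numpy as np

class NamedAxesArray(np.ndarray):
    """Toy ndarray subclass whose axes can be addressed by name."""

    def __new__(cls, arr, axes):
        obj = np.asarray(arr).view(cls)
        obj._axes = dict(axes)
        return obj

    def __getattr__(self, name):
        # Resolve attributes like ax_time / ax_coord from the axes map.
        if name.startswith("ax_"):
            return self._axes[name]
        raise AttributeError(name)

x = NamedAxesArray(np.random.rand(100, 3), {"ax_time": 0, "ax_coord": 1})
# Callers differentiate "along time" without hard-coding axis=-2 or axis=0:
x_dot = np.gradient(np.asarray(x), axis=x.ax_time)
```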
These should be TypeErrors. Previously, test_fit_multiple_trajectories required that x not being a list raise a TypeError, but u not being a list raise a ValueError. In addition, move this kind of validation earlier, before derivatives are calculated, rather than after, when control inputs were previously examined for the first time.
Previously passed a list of lists, rather than a list of arrays.
Modified the tests to pop out these warnings. I don't know why these warnings are caught in the first place, given that I tried edits to pyproject.toml telling pytest to ignore them. The AxesArray warning shouldn't matter to the user, because PendingDeprecationWarning is ignored in all cases except tests. AxesWarning may be the wrong warning superclass, since it is really here just for debugging during the refactor.
Also:

- Deprecate ambiguous default data shapes. Previously, passing an array with three axes to the default problem worked because extra axes were implicitly concatenated onto the time axis. This meant meaningless derivatives were calculated across the concatenation points. AFAIK, no client code should use this, and it is only tested in test_pysindy.test_data_shapes. It is probably safe to remove these tests and the related code outright.
- Recreate the AxesArray object when BaseFeatureLibrary runs calc_trajectory. Because differentiation methods come from outside SINDy, some do not return an AxesArray.

Breaking change:

- Continued the quest to remove implicit shaping by removing some old calls to validate_input(). Differentiation methods now return the same shape as their inputs. Previously, they implicitly created 2D arrays.
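The new shape contract can be illustrated with `np.gradient` as a stand-in differentiation method (the axis layout below is an assumed example):

```python
import numpy as np

# Differentiation should preserve the input's shape rather than
# implicitly flattening extra axes into a 2D array.
x = np.random.rand(10, 8, 3)        # e.g. (time, space, coordinate)
t = np.linspace(0.0, 1.0, 10)
x_dot = np.gradient(x, t, axis=0)   # differentiate along time only
```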
Removed from _process_multiple_trajectories() and validate_control_variables(). This param existed to reshape the data with a sample axis. To move away from implicit shaping, remove the param and, in refactoring, let the callers that need it do the reshaping.
Allow other functions to use it.
Allow subclasses. Now named test_differentiate_returns_compatible_data_type()
Many of these can be turned into GH Issues/project todos.
Hey @znicolaou, I finished reading through the PR, and there were two questions I had for you on code that came from #195. Also, I didn't reread the feature_libraries code too closely. When you review, can you give that part more focus than I did?
Move related code nearer, delete commented-out code, and edit incorrect/obsolete docstrings. Discrete case: since validate_control_variables() relies on the shape of x, which _process_multiple_trajectories() changes, its behavior needed to change in order to move the call to _process_multiple_trajectories().
Removed unnecessary attributes. Extra attributes lead to spooky action at a distance (highly coupled code).
Thanks for the review @Jacob-Stevens-Haas! I plan to take a look in the next few days as well and respond to your questions.
Still can't be fixed in some slice cases, but will work for all explicitly created AxesArrays. Added test_n_elements
wrap_axes
The separate DefaultShapedInputsMixin and PDEShapedInputsMixin supported backwards compatibility with code that has since been removed. A module function is better.
The culprit was the feature libraries' transform() using comprehend_axes(), which assigns a certain axis to time rather than sample.
Accidentally removed copy(), which was ok, but still needed derivs=x.
pysindy/utils/axes.py
```python
    return axes


def ax_time_to_ax_sample(x: AxesArray) -> AxesArray:
```
Is this function actually used for anything? It is applied in some but not all feature library transforms, and in the SINDy `score` and/or `predict`. But the `SampleConcatter` implements a more general version of converting other axes to samples (which is why it doesn't matter that some feature libraries don't call `ax_time_to_samples`). I'd probably try to remove this, let feature libraries keep the axes unchanged, and use the `SampleConcatter` everywhere instead. But this is not pressing, so it could be deferred.
I'm happy to remove it. The function was created at a time when `drop_nan_samples` and `drop_random_samples` were called before the feature libraries' `transform()`, and therefore before samples were concatted. Now it probably can be removed.
Removed in new commit
```diff
@@ -232,7 +244,25 @@ def fit(self, x, y=None):
         return self

-    def transform(self, x):
+    def has_type(self, libtype: Type, exclusively=False) -> bool:
```
This is not used yet, but implemented for future purposes? Should `ConcatLibrary` and `TensorLibrary` also implement it, and should it be included in `BaseFeatureLibrary`? Also, what if `self.libraries_` contains a `GeneralizedLibrary`, `TensorLibrary`, or `ConcatLibrary`? Should `has_type` recursively check the `has_type` of elements in `self.libraries_`? It would be easier if we never needed to check whether the library contains any particular type, so we could remove this function...
Also a vestige of refactoring, from when pysindy.py had several nested loops and conditionals to check whether to treat the data as if it went to a `WeakPDELibrary`. Since there's no current use or test for the function, I'm happy to delete it rather than try to improve it.
removed in new commit.
```python
x = check_array(x, order="F", dtype=FLOAT_DTYPES, accept_sparse=("csr", "csc"))
xp_full = []
for x in x_full:
    if sparse.issparse(x) and x.format not in ["csr", "csc"]:
```
The `PolynomialLibrary` is the only library that currently supports sparse input data. I don't understand the use case for it: what sort of time series data is expected to be sparse? If there is a usage, should we make an issue to implement it more generally?
The `PolynomialLibrary` is the only library that optimizes for sparse input. Sparse input may also work in other libraries via duck typing. When I ran git blame, it looked like Kathleen added these lines. I had the same question; it was just easier for me to build backwards compatibility than to find out the reason. I'll create an issue and reach out to Kathleen to see why she added them.
Cool, makes sense! Obviously, nothing required for this PR, but just an observation.
```diff
@@ -99,7 +159,7 @@ def _ensemble(self, xp):
         )
         inds = range(self.n_output_features_)
         inds = np.delete(inds, self.ensemble_indices)
-        return xp[:, inds]
+        return [x[..., inds] for x in xp]
```
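The new return line operates per trajectory. A small self-contained illustration of what it does (the array contents and index values here are made up):

```python
import numpy as np

# xp is now a list of feature arrays, one per trajectory.
xp = [np.arange(12).reshape(3, 4), np.arange(8).reshape(2, 4)]
ensemble_indices = [1]                        # feature columns to drop
inds = np.delete(range(4), ensemble_indices)  # keep columns 0, 2, 3
out = [x[..., inds] for x in xp]
```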
The ellipses `...` are used in many libraries to slice only along the `ax_coord` axis. For future usage where the axes aren't necessarily ordered with `ax_coord` at the end, it would be nice to include a utility function to slice along `ax_coord`.
`x.take(inds, axis=x.ax_coord)` works. There are a few cases where it doesn't (not all types of indexing are supported). Since this is in `_ensemble`, which is only kept for backwards compatibility, do you want me to go in and change it here?
Also, not sure this will work for assignment of slices along the `ax_coord` axis, like `spatiotemporal_grid[..., -1] = temporal_grid`. It would be easy to implement with a slice object, like

```python
s = [slice(None)] * spatiotemporal_grid.ndim
s[spatiotemporal_grid.ax_coord] = -1
spatiotemporal_grid[tuple(s)] = temporal_grid
```

Not sure whether a utility function in AxesArray would simplify code like this or not, but maybe.
I haven't used it, but it looks like `np.put_along_axis` is the counterpart to `np.take`.
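Putting the two suggestions together in a small sketch; `ax_coord` is assumed to be the last axis here, and the array shapes are made up for illustration:

```python
import numpy as np

grid = np.zeros((4, 5, 3))
ax_coord = grid.ndim - 1  # assume coordinates live on the last axis

# Reading along ax_coord with take (taken before any assignment below):
last_component = grid.take(-1, axis=ax_coord)

# Assigning along ax_coord with a slice tuple, as in the snippet above:
temporal = np.ones((4, 5))
s = [slice(None)] * grid.ndim
s[ax_coord] = -1
grid[tuple(s)] = temporal
```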
Cool, good to know, thanks!
I think I'm satisfied with the current state of this branch now and am ready to draft some release notes, which I'll start thinking about this weekend. @Jacob-Stevens-Haas, I left a few comments with questions and suggestions for the future; we may want to file them as issues or make some small adjustments before the merge. Let me know what you think.
- pysindy.GeneralizedLibrary.has_type: not used.
- pysindy.utils.ax_time_to_ax_spatial: limited use in a previous refactor.
Merged! I added a
Awesome, looks great! We just need to draft the release notes so we can make a new release soon.
…itting Simplify derivatives in pysindy.py
Removes a lot of nested conditionals and ~150 SLOC. All derivatives/integrals are now calculated in a single line of code inside pysindy.py. Passes all tests locally, but I'm leaving this as a draft until I can check that the notebooks all run.
As the former version of pysindy/pysindy.py said:
This PR addresses that concern with two abstractions:

1. A `calc_trajectory()` method. For `BaseFeatureLibrary`, that simply means applying the differentiation function passed. For `PDELibrary`, that means ignoring the differentiation library and using FiniteDifference. For `WeakPDELibrary`, that means ignoring the differentiation library passed and using `convert_udot_integral`. `GeneralizedLibrary` chooses which method depending on which types of library it has, in an existing loop. These match how `SINDy.fit()` (and other methods) previously worked, but they allow all of the conditionals to be replaced with `self.feature_library.calc_trajectory(self.differentiation_method, x, t)`. I chose that name because it's called in an iterator within `_process_multiple_trajectories`, but happy to change it.
2. A per-library input validation step. For `BaseFeatureLibrary`, that simply means applying `utils.validate_input()`. `PDELibrary` and `WeakPDELibrary` (and, when applicable, `GeneralizedLibrary`) just return `x`.

It also adds two more abstractions that assist code deletion and DRY:

3. `_adapt_to_multiple_trajectories()` makes all trajectories lists, so code handling single trajectories is removed.
4. `_zip_like_sequence()` allows combining the many conditionals around whether t is a sequence or a scalar.
The only immediately obvious con of this PR is a slight performance hit in the non-PDE cases. The PDE cases previously ran `validate_input()` after their differentiation/integration step; now the non-PDE cases do so as well, before differentiation. Using the current form of `calc_trajectory()` isn't compatible with this method unless `BaseFeatureLibrary.calc_trajectory()` were to return `x, x_dot` instead of just `x_dot`. I'd love for that to be the case (see #182), but it was a design decision I wanted to kick back to the group in case of objections.

Other issues:

- `_process_multiple_trajectories()`, for when `x_dot` is explicitly given as an argument, assumes it must handle some `WeakPDELibrary` cases where `x_dot` is given in a different form than if calculated from the data. I'm assuming that test case is a mistake?
- `validate_input`, but maybe there's a more elegant solution?