Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: a simple cov #10

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open

ENH: a simple cov #10

wants to merge 5 commits into from

Conversation

lucascolley
Copy link
Collaborator

This would replace a private utility in SciPy.

@lucascolley lucascolley changed the title ENH: a simple cov ENH: a simple cov and complex mean Sep 25, 2024
@lucascolley lucascolley added enhancement New feature or request new function labels Sep 25, 2024
@@ -46,3 +47,48 @@ def atleast_nd(x: Array, *, ndim: int, xp: ModuleType) -> Array:
x = xp.expand_dims(x, axis=0)
x = atleast_nd(x, ndim=ndim, xp=xp)
return x


def cov(x: Array, *, xp: ModuleType) -> Array:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this implementation and API is constrained enough, I would exercise caution when adding such APIs to ensure that the design of whatever is included in this library is consistent with Array API design principles and does not impose limitations on future standardization.

While not applicable in this specific instance, I would like to avoid a scenario where array_api_extra provides a backdoor to de facto standardized APIs which have design choices that we wouldn't want to endorse were we standardizing directly in the standard. Meaning, I'd like to avoid getting backed into a corner.

In this case, the particular design aspect which comes to mind is the behavior of a correction argument, similar to what we have in std and var.

In general, my current thinking is that, so long as this is under the Consortium umbrella, potential APIs need to go through a somewhat similar "standardization" proposal and inclusion process. That means comparative library analysis, discussion of design choices, and whether inclusion presents an opportunity to address limitations in prior art.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very good point, and we need to think more about API stability here. At the same time, this could to some degree defeat the purpose of the library if we have to do a full standardization process for every function. I wonder if there's a way we could organize things here so that API stability isn't as big of a concern. For instance, we could make it so that array API extra is only supported as a vendored library (or requiring people to just copy-paste the implementations). Or we could organize the code in a way so that different versions of APIs are possible.

Copy link
Collaborator Author

@lucascolley lucascolley Sep 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback @kgryte ! Agreed that we want to avoid creating de facto standardized APIs here with less care than would be put into the standard. I can see how it would be bad to guide consumer libraries towards whatever APIs are here, just to tell them that they need to change again when a more considered API is standardized.

I do think that this library should still be able to move more quickly than the standard though - the primary motivation being providing functionality to consumer libraries via some reasonable API, rather than convincing array libraries that this is the "correct" API (there is no need since everything is implemented in terms of the standard). Having a non-ideal API here is (arguably) still better than making consumer library maintainers search through private modules of other libraries like SciPy or implement the function themselves, provided that we can sufficiently communicate that an API being a part of this library does not mean that it is standardized.

If we were to go through a very similar process to the standard, and the result was consensus on an API, then it should probably just be added to the standard itself!

I think one way we could try to avoid creating de facto standardized APIs is being quite liberal in adding kwargs to vary behaviour when there are competing ideas (i.e. competing use-cases in consumer libraries). Then, while functions live in this library, consumer libraries can pick and choose the functionality they need without any particular mode being "standard". If at some point some "extra" functionality is considered for standardization, then the normal process would apply, but with the additional information of how the API from this library has been used.

That means comparative library analysis, discussion of design choices, and whether inclusion presents an opportunity to address limitations in prior art.

This would be fantastic to see, but given that I'm heading back to university soon, I am unlikely to be able to contribute the necessary time to carry out that process rigorously for the foreseeable future. So development here may be quite slow. Hopefully you can get others to fill that role while I'm too busy! I do think that this library will be an important part of making adoption of the standard more easy for consumer libraries.

EDIT: The same point about time applies to other potential contributors. I think we may be able to create some nice collaboration where consumer libraries take from and give back to xpx by writing a function or two, which in turn will help other consumer libraries. I think we are less likely to convince consumer library maintainers to enter discussions about an ideal API (they may well not really care about the API at all, they just need something that works).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think one way we could try to avoid creating de facto standardized APIs is being quite liberal in adding kwargs to vary behaviour when there are competing ideas (i.e. competing use-cases in consumer libraries).

I'd go the opposite way: namely, being very conservative in terms of what behavior is supported.

For some background on Array API "extras", this idea has been discussed during workgroup meetings at various points for different reasons. One of those reasons is a potential place for APIs which have broad, but not ubiquitous, consensus among ecosystem libraries. E.g., a number of ecosystem libraries have other special functions, such as gamma, erf, beta, etc. Some of these are unlikely to be standardized directly in the standard due to the NumPy/SciPy split. Other libraries have APIs associated with deep learning applications (e.g., various loss functions), but these are unlikely to ever be directly added to more general purpose libraries such as NumPy. "Extras" was batted around as a way to provide consensus APIs outside of the standard, provided such APIs could either be implemented in terms of the standard or delegated to libraries exposing a compliant API.

With this in mind, similar to the standard, we'd be best served to be biased toward parsimony, not abundance.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This stated, this isn't to say that "extras" cannot have non-consensus APIs (i.e., APIs which have no precedent in array libraries). Those can be in scope. However, for APIs which do have precedent in array libraries, IMO, we should tread a bit more carefully to ensure what's implemented is aligned behavior.

@lucascolley
Copy link
Collaborator Author

lucascolley commented Sep 28, 2024

cov

This PR implements a small subset of numpy.cov, namely cov(m), to replace a private helper in SciPy.

  • NumPy, JAX and PyTorch all agree on the same default behaviour for single-argument cov (unbiased estimate)
  • No particular design choices for this API.
  • As mentioned above by @kgryte, if this function is extended to support more arguments of the existing implementations, there is potential to address some prior limitations of the NumPy API by using a correction argument like PyTorch and other statistical functions in the standard. Need for that hasn't arisen yet though, so that isn't attempted here.

Good to proceed? EDIT: edited to make mean private.

@rgommers
Copy link
Member

Why add mean? It seems from data-apis/array-api#846 that it's only an oversight that complex support wasn't added, and that can be done. I would just finish that discussion, and then likely nothing else has to change (except perhaps deleting a check in array-api-compat). Moving usages of mean over to this package and then back a little later sound like quite a bit of churn for no real reason.

@lucascolley
Copy link
Collaborator Author

We could keep complex mean private here until there is a reported use case for it, and remove it once it is supported in array-api-compat - I think that sounds like a better idea than exposing it just because we can.

@rgommers
Copy link
Member

That sounds good to me.

@lucascolley lucascolley changed the title ENH: a simple cov and complex mean ENH: a simple cov Sep 29, 2024
@lucascolley lucascolley marked this pull request as ready for review September 29, 2024 14:48
@asmeurer
Copy link
Member

and then likely nothing else has to change (except perhaps deleting a check in array-api-compat)

I guess you mean array-api-strict. Complex mean already works with the upstream libraries (at least NumPy and PyTorch, I didn't check the others), and will work with array-api-compat.

Also, since array-api-strict now supports multiple versions, we can easily add a preliminary 2024.12 version and disable the complex check in mean for that version, even before that version of the standard is released.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request new function
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants