ENH: a simple `cov` #10

lucascolley · 2024-09-25T20:51:25Z

This would replace a private utility in SciPy.

src/array_api_extra/_funcs.py

kgryte · 2024-09-25T21:39:14Z

src/array_api_extra/_funcs.py

@@ -46,3 +47,48 @@ def atleast_nd(x: Array, *, ndim: int, xp: ModuleType) -> Array:
        x = xp.expand_dims(x, axis=0)
        x = atleast_nd(x, ndim=ndim, xp=xp)
    return x
+
+
+def cov(x: Array, *, xp: ModuleType) -> Array:


While this implementation and API is constrained enough, I would exercise caution when adding such APIs to ensure that the design of whatever is included in this library is consistent with Array API design principles and does not impose limitations on future standardization.

While not applicable in this specific instance, I would like to avoid a scenario where array_api_extra provides a backdoor to de facto standardized APIs which have design choices that we wouldn't want to endorse were we standardizing directly in the standard. Meaning, I'd like to avoid getting backed into a corner.

In this case, the particular design aspect which comes to mind is the behavior of a correction argument, similar to what we have in std and var.

In general, my current thinking is that, so long as this is under the Consortium umbrella, potential APIs need to go through a somewhat similar "standardization" proposal and inclusion process. That means comparative library analysis, discussion of design choices, and whether inclusion presents an opportunity to address limitations in prior art.

This is a very good point, and we need to think more about API stability here. At the same time, this could to some degree defeat the purpose of the library if we have to do a full standardization process for every function. I wonder if there's a way we could organize things here so that API stability isn't as big of a concern. For instance, we could make it so that array API extra is only supported as a vendored library (or requiring people to just copy-paste the implementations). Or we could organize the code in a way so that different versions of APIs are possible.

Thanks for the feedback @kgryte ! Agreed that we want to avoid creating de facto standardized APIs here with less care than would be put into the standard. I can see how it would be bad to guide consumer libraries towards whatever APIs are here, just to tell them that they need to change again when a more considered API is standardized.

I do think that this library should still be able to move more quickly than the standard though - the primary motivation being providing functionality to consumer libraries via some reasonable API, rather than convincing array libraries that this is the "correct" API (there is no need since everything is implemented in terms of the standard). Having a non-ideal API here is (arguably) still better than making consumer library maintainers search through private modules of other libraries like SciPy or implement the function themselves, provided that we can sufficiently communicate that an API being a part of this library does not mean that it is standardized.

If we were to go through a very similar process to the standard, and the result was consensus on an API, then it should probably just be added to the standard itself!

I think one way we could try to avoid creating de facto standardized APIs is being quite liberal in adding kwargs to vary behaviour when there are competing ideas (i.e. competing use-cases in consumer libraries). Then, while functions live in this library, consumer libraries can pick and choose the functionality they need without any particular mode being "standard". If at some point some "extra" functionality is considered for standardization, then the normal process would apply, but with the additional information of how the API from this library has been used.

That means comparative library analysis, discussion of design choices, and whether inclusion presents an opportunity to address limitations in prior art.

This would be fantastic to see, but given that I'm heading back to university soon, I am unlikely to be able to contribute the necessary time to carry out that process rigorously for the foreseeable future. So development here may be quite slow. Hopefully you can get others to fill that role while I'm too busy! I do think that this library will be an important part of making adoption of the standard more easy for consumer libraries.

EDIT: The same point about time applies to other potential contributors. I think we may be able to create some nice collaboration where consumer libraries take from and give back to xpx by writing a function or two, which in turn will help other consumer libraries. I think we are less likely to convince consumer library maintainers to enter discussions about an ideal API (they may well not really care about the API at all, they just need something that works).

I think one way we could try to avoid creating de facto standardized APIs is being quite liberal in adding kwargs to vary behaviour when there are competing ideas (i.e. competing use-cases in consumer libraries).

I'd go the opposite way: namely, being very conservative in terms of what behavior is supported.

For some background on Array API "extras", this idea has been discussed during workgroup meetings at various points for different reasons. One of those reasons is a potential place for APIs which have broad, but not ubiquitous, consensus among ecosystem libraries. E.g., a number of ecosystem libraries have other special functions, such as gamma, erf, beta, etc. Some of these are unlikely to be standardized directly in the standard due to the NumPy/SciPy split. Other libraries have APIs associated with deep learning applications (e.g., various loss functions), but these are unlikely to ever be directly added to more general purpose libraries such as NumPy. "Extras" was batted around as a way to provide consensus APIs outside of the standard, provided such APIs could either be implemented in terms of the standard or delegated to libraries exposing a compliant API.

With this in mind, similar to the standard, we'd be best served to be biased toward parsimony, not abundance.

This stated, this isn't to say that "extras" cannot have non-consensus APIs (i.e., APIs which have no precedent in array libraries). Those can be in scope. However, for APIs which do have precedent in array libraries, IMO, we should tread a bit more carefully to ensure what's implemented is aligned behavior.

lucascolley · 2024-09-28T15:55:56Z

`cov`

This PR implements a small subset of numpy.cov, namely cov(m), to replace a private helper in SciPy.

NumPy, JAX and PyTorch all agree on the same default behaviour for single-argument cov (unbiased estimate)
No particular design choices for this API.
As mentioned above by @kgryte, if this function is extended to support more arguments of the existing implementations, there is potential to address some prior limitations of the NumPy API by using a correction argument like PyTorch and other statistical functions in the standard. Need for that hasn't arisen yet though, so that isn't attempted here.

Good to proceed? EDIT: edited to make mean private.

rgommers · 2024-09-28T18:57:49Z

Why add mean? It seems from data-apis/array-api#846 that it's only an oversight that complex support wasn't added, and that can be done. I would just finish that discussion, and then likely nothing else has to change (except perhaps deleting a check in array-api-compat). Moving usages of mean over to this package and then back a little later sound like quite a bit of churn for no real reason.

lucascolley · 2024-09-28T19:05:54Z

We could keep complex mean private here until there is a reported use case for it, and remove it once it is supported in array-api-compat - I think that sounds like a better idea than exposing it just because we can.

rgommers · 2024-09-28T19:08:48Z

That sounds good to me.

asmeurer · 2024-09-30T19:49:29Z

and then likely nothing else has to change (except perhaps deleting a check in array-api-compat)

I guess you mean array-api-strict. Complex mean already works with the upstream libraries (at least NumPy and PyTorch, I didn't check the others), and will work with array-api-compat.

Also, since array-api-strict now supports multiple versions, we can easily add a preliminary 2024.12 version and disable the complex check in mean for that version, even before that version of the standard is released.

ENH: a simple cov

ea98b18

lucascolley changed the title ~~ENH: a simple cov~~ ENH: a simple cov and complex mean Sep 25, 2024

lucascolley added enhancement New feature or request new function labels Sep 25, 2024

kgryte reviewed Sep 25, 2024

View reviewed changes

src/array_api_extra/_funcs.py Show resolved Hide resolved

kgryte reviewed Sep 25, 2024

View reviewed changes

lucascolley mentioned this pull request Sep 25, 2024

RFC: add support for complex input to mean data-apis/array-api#846

Open

TST: cov: add another test

0d05944

DOC: cov, mean: WIP docstrings

28e6f86

lucascolley added 2 commits September 29, 2024 15:34

API: make mean private

a53cfb8

DOC: cov: finish docstring

5b3b8d5

lucascolley changed the title ~~ENH: a simple cov and complex mean~~ ENH: a simple cov Sep 29, 2024

lucascolley marked this pull request as ready for review September 29, 2024 14:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH: a simple `cov` #10

ENH: a simple `cov` #10

lucascolley commented Sep 25, 2024

kgryte Sep 25, 2024

asmeurer Sep 25, 2024

lucascolley Sep 25, 2024 •

edited

Loading

kgryte Sep 25, 2024

kgryte Sep 25, 2024

lucascolley commented Sep 28, 2024 •

edited

Loading

rgommers commented Sep 28, 2024

lucascolley commented Sep 28, 2024

rgommers commented Sep 28, 2024

asmeurer commented Sep 30, 2024

ENH: a simple cov #10

Are you sure you want to change the base?

ENH: a simple cov #10

Conversation

lucascolley commented Sep 25, 2024

kgryte Sep 25, 2024

Choose a reason for hiding this comment

asmeurer Sep 25, 2024

Choose a reason for hiding this comment

lucascolley Sep 25, 2024 • edited Loading

Choose a reason for hiding this comment

kgryte Sep 25, 2024

Choose a reason for hiding this comment

kgryte Sep 25, 2024

Choose a reason for hiding this comment

lucascolley commented Sep 28, 2024 • edited Loading

cov

rgommers commented Sep 28, 2024

lucascolley commented Sep 28, 2024

rgommers commented Sep 28, 2024

asmeurer commented Sep 30, 2024

ENH: a simple `cov` #10

ENH: a simple `cov` #10

lucascolley Sep 25, 2024 •

edited

Loading

lucascolley commented Sep 28, 2024 •

edited

Loading

`cov`