Use ddof=1 for std & var #8566

Open
max-sixty opened this issue Dec 21, 2023 · 2 comments
Labels
topic-arrays related to flexible array support

Comments

@max-sixty
Collaborator

What is your issue?

I've discussed this a bunch with @dcherian (though I'm not sure he necessarily agrees; I'll let him comment).

Currently xarray uses ddof=0 for std & var. This is:

  • Rarely what someone actually wants — xarray data is almost always a sample of some underlying distribution, for which ddof=1 is correct
  • Inconsistent with pandas

OTOH:

  • It is consistent with numpy
  • It wouldn't be a painless change — folks who don't read deprecation messages would see values change very slightly
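
To make the trade-off above concrete, here is a minimal sketch of how the two conventions differ on a small array. The sample values and the helper name `std_ratio` are made up for illustration; only `np.std` and its `ddof` argument come from NumPy itself.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration.
sample = np.array([1.0, 2.0, 3.0])
n = sample.size

population_std = np.std(sample)       # NumPy default: ddof=0, divides by n
sample_std = np.std(sample, ddof=1)   # pandas-style default: divides by n - 1


def std_ratio(n_obs: int) -> float:
    """Ratio of the ddof=1 estimate to the ddof=0 estimate for n_obs points."""
    return (n_obs / (n_obs - 1)) ** 0.5


# The two estimates always differ by sqrt(n / (n - 1)), independent of the data.
ratio = sample_std / population_std
print(ratio, std_ratio(n))
```

The ratio depends only on the number of observations along the reduced dimension, so the change would be large for short dimensions and close to 1 for long ones (for 1000 points it is roughly 1.0005).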

Any thoughts?

@TomNicholas
Member

Note also that the ddof argument doesn't even appear in the array API standard for std (or for var), and it appears that standard also uses ddof=0 by default right now.

```python
In [1]: import numpy as np

In [2]: np_arr = np.array([1.0, 2.0, 3.0])

In [3]: np.std(np_arr, ddof=0)
Out[3]: 0.816496580927726

In [4]: np.std(np_arr, ddof=1)
Out[4]: 1.0

In [5]: from numpy import array_api
<ipython-input-5-54c229be796f>:1: UserWarning: The numpy.array_api submodule is still experimental. See NEP 47.
  from numpy import array_api

In [6]: arr = array_api.asarray([1.0, 2.0, 3.0])

In [7]: array_api.std(arr)
Out[7]: Array(0.81649658, dtype=float64)
```

This is actually currently a bug in xarray's support of the array API standard, because it seems xarray tries to pass the ddof argument explicitly. (see #6903)

I guess I would like to see ddof appear in the array API standard before we consider changing xarray's choice of it away from the default; otherwise we would break compatibility with the standard until it is included.

@TomNicholas TomNicholas added the topic-arrays related to flexible array support label Dec 21, 2023
@TomNicholas
Member

TomNicholas commented Dec 27, 2023

Actually it seems ddof was renamed to correction in the array API standard, and a default value of 0 was chosen (see data-apis/array-api#695).

We should therefore start supporting correction so that we can pass it down to array-API-compatible libraries, while still supporting ddof for backwards compatibility (until it hopefully gets deprecated at some point in the distant future, I guess). There should be a joint default for both, which I think should be 0, though we would be able to change this as you suggest, @max-sixty. What's the correct place in the codebase to put that logic, though, given that it requires checking which kwarg the array type supports before calling the function?
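
One possible shape for that logic, as a rough sketch only: merge the two spellings into a single value, then inspect the target function to see which keyword it accepts. Every name here (`resolve_correction`, `call_std`, and the two stand-in reductions) is hypothetical and not part of xarray, NumPy, or the array API standard. The stand-ins are plain Python so the example runs without any array library.

```python
import inspect
import math


def resolve_correction(ddof=None, correction=None, default=0):
    """Merge ddof and correction into one value (hypothetical helper)."""
    if ddof is not None and correction is not None and ddof != correction:
        raise ValueError("pass only one of 'ddof' or 'correction'")
    if correction is not None:
        return correction
    if ddof is not None:
        return ddof
    return default  # the joint default; 0 here, as suggested above


def call_std(func, values, ddof=None, correction=None):
    """Call func with whichever keyword its signature accepts."""
    value = resolve_correction(ddof, correction)
    params = inspect.signature(func).parameters
    if "correction" in params:
        return func(values, correction=value)
    if "ddof" in params:
        return func(values, ddof=value)
    if value != 0:
        raise TypeError("func accepts neither 'ddof' nor 'correction'")
    return func(values)


def _std(values, dof):
    # Shared arithmetic for the two stand-in reductions below.
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - dof))


def numpy_style_std(values, ddof=0):
    """Stand-in for a library that spells the argument 'ddof'."""
    return _std(values, ddof)


def array_api_style_std(values, correction=0):
    """Stand-in for a library that spells the argument 'correction'."""
    return _std(values, correction)


data = [1.0, 2.0, 3.0]
print(call_std(numpy_style_std, data, ddof=1))
print(call_std(array_api_style_std, data, ddof=1))
```

Signature inspection is only one way to do the check, and it does not work for every callable (some compiled functions expose no signature), so dispatching on the array's namespace might turn out to be more robust.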
