Default value for correction parameter (aka. ddof)? #695
Comments
@vnmabus Thanks for filing this issue. For context, the signatures for … Accordingly, when standardizing, we went with the default value that the majority of PyData libraries currently use, in order to best preserve backward compatibility. Compatibility with other languages or non-array libraries is not an explicit goal of the standard. It is something we consider more generally, but not to the degree that cross-language compatibility takes priority over other considerations, such as minimizing ecosystem breakage. Regardless, in this case, I'd argue that it is better for …
One more comment: now that the specification has standardized the default for …
I'll go ahead and close this. Due to backward compatibility concerns, I don't believe we could act on changing the default correction value even if we wanted to.
I see that in the standard the `ddof` parameter of the `var` and `std` functions has been standardized under the name `correction`, as the prior name was not very clear. However, I saw no discussion of the default value of that parameter. The standard has `0` as the default, corresponding to a biased estimator. This is at odds with the common behavior in other languages, such as R, MATLAB, or Julia. Even in Python there does not seem to be a consensus: NumPy (and some libraries inspired by it) uses a biased estimator, while pandas, Polars, and some libraries that evolved separately, such as PyTorch, use the unbiased estimator by default.

In statistics, normalizing by $N-1$ seems to be the usual preference, and nowadays the sample variance is sometimes even defined with that denominator.
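To make the difference concrete: with `correction` set to $c$, the sum of squared deviations is divided by $N-c$, so `correction=0` gives the biased (population) estimator and `correction=1` gives the unbiased one (Bessel's correction). Below is a minimal pure-Python sketch of that semantics. The function name `var` is hypothetical and only loosely modeled on the standard's keyword; it is not any library's implementation. The stdlib functions `statistics.pvariance` (divides by $N$) and `statistics.variance` (divides by $N-1$) serve as a cross-check.

```python
import math
import statistics


def var(x, correction=0):
    """Hypothetical sketch: variance with divisor N - correction."""
    n = len(x)
    if n - correction <= 0:
        raise ValueError("N - correction must be positive")
    mean = math.fsum(x) / n
    # Sum of squared deviations from the sample mean.
    ss = math.fsum((xi - mean) ** 2 for xi in x)
    # correction=0 -> divide by N (biased); correction=1 -> divide by N - 1.
    return ss / (n - correction)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(var(data))                  # 4.0
print(var(data, correction=1))    # 32/7, about 4.571
# Same two numbers from the standard library, for comparison.
print(statistics.pvariance(data), statistics.variance(data))
```

Only the divisor changes between the two conventions; the mean and the sum of squared deviations are identical.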
Can someone explain the advantages of using `0` as the default? Right now I only see potential for confusion and bugs for people who obtain different results in different languages. Is it too late to change it?
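The discrepancy the question anticipates is easy to reproduce: the two conventions differ only in the divisor, so on the same data their results differ by a factor of $N/(N-1)$. A small sketch, using the stdlib `statistics` module to stand in for the two defaults (`pvariance` divides by $N$, `variance` by $N-1$). The helper `convert_variance` is hypothetical and not part of any library.

```python
import statistics


def convert_variance(value, n, from_correction, to_correction):
    """Hypothetical helper: rescale a variance computed with divisor
    N - from_correction so that it uses divisor N - to_correction."""
    # value = SS / (n - from_correction), hence SS = value * (n - from_correction).
    return value * (n - from_correction) / (n - to_correction)


data = [1.0, 2.0, 3.0, 4.0]
biased = statistics.pvariance(data)    # divide by N (NumPy-style default)
unbiased = statistics.variance(data)   # divide by N - 1 (R/pandas-style default)
print(biased, unbiased)
print(convert_variance(biased, len(data), 0, 1))  # recovers the N - 1 result

# The ratio N / (N - 1) is large for small samples and negligible for large ones.
for n in (2, 10, 1000):
    print(n, n / (n - 1))
```

Passing the correction explicitly (for example `ddof=1` in NumPy) sidesteps the question of which default a given library uses.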