Whether should we follow pandas or numpy if they have different API? #1886

Closed
fujiisoup opened this issue Feb 4, 2018 · 4 comments

Comments

@fujiisoup
Member

While working on #1883, I noticed that our std and var (like numpy's) behave differently from pandas'.
pandas uses ddof=1 by default, while we use ddof=0.
Do we want to match pandas' behavior?

In [1]: import numpy as np
   ...: import xarray as xr
   ...: da = xr.DataArray([0, 1, 2], dims='x', name='da')

In [2]: da.std()
Out[2]: 
<xarray.DataArray 'da' ()>
array(0.816496580927726)

In [3]: da.to_dataframe().std()
Out[3]: 
da    1.0
dtype: float64

In [4]: da.std(ddof=1)
Out[4]: 
<xarray.DataArray 'da' ()>
array(1.0)
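
As an editorial illustration of what ddof changes, here is a minimal plain-Python sketch. The helper `std` below is hypothetical and is not the numpy, pandas, or xarray implementation; it only assumes the usual definition in which the sum of squared deviations is divided by n - ddof.

```python
import math


def std(values, ddof=0):
    """Standard deviation with a configurable delta degrees of freedom.

    Hypothetical helper for illustration only; not a library function.
    """
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((v - mean) ** 2 for v in values)
    # The divisor is n - ddof: ddof=0 gives the population estimate,
    # ddof=1 gives the sample (Bessel-corrected) estimate.
    return math.sqrt(squared_deviations / (n - ddof))


data = [0, 1, 2]
population_std = std(data, ddof=0)  # same convention as da.std() in the session above
sample_std = std(data, ddof=1)      # same convention as da.to_dataframe().std()
```

For the three-element array in the session, the divisor changes from 3 to 2, which is why the two results differ so visibly; the gap shrinks as the number of samples grows.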

@fujiisoup changed the title from "Default ddof for std and var" to "Whether should we follow pandas or numpy if they have different API?" on Feb 4, 2018

@fujiisoup
Member Author

fujiisoup commented Feb 4, 2018

I also noticed that pandas' argmin and argmax behave differently from numpy's.
I think we should follow pandas, because extending pandas is our core concept.
Currently, our argmin and argmax behave like numpy's.

@max-sixty
Collaborator

Re the argmin / argmax - I think pandas is changing: pandas-dev/pandas#16830

@shoyer
Member

shoyer commented Feb 4, 2018

I think a distinction between argmin/idxmin is useful -- and it seems that pandas is likely to restore this in the future as well.
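
As an editorial illustration of that distinction, here is a minimal plain-Python sketch. The helpers `argmin` and `idxmin` below are hypothetical stand-ins named after the methods under discussion, not the numpy or pandas implementations; the sketch assumes only that argmin means "integer position of the minimum" and idxmin means "index label of the minimum".

```python
def argmin(values):
    """Integer position of the smallest value (positional, numpy-style).

    Hypothetical helper for illustration only.
    """
    return min(range(len(values)), key=values.__getitem__)


def idxmin(values, labels):
    """Index label of the smallest value (label-based).

    Hypothetical helper for illustration only.
    """
    return labels[argmin(values)]


values = [3.0, 1.0, 2.0]
labels = ["a", "b", "c"]  # made-up index labels

position = argmin(values)        # position along the axis
label = idxmin(values, labels)   # label attached to that position
```

The two answers coincide only when the index happens to be 0, 1, 2, ...; with any other index they differ, which is why keeping both operations under separate names avoids ambiguity.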

The difference between ddof=0 and ddof=1 has never mattered to me. I suppose it probably does make more sense for us to follow pandas here (it's what we usually do), but I don't know if it's worth the trouble of changing it.

@fujiisoup
Member Author

Thanks.
I had not noticed the pandas issue. OK, I will follow numpy's API for argmin/argmax.

> The difference between ddof=0 and ddof=1 has never mattered to me

Me neither.

> but I don't know if it's worth the trouble of changing it.

OK, agreed. I will keep the current API.
