Question: Guaranteed zero-copy round-trip from numpy? #3077

Closed
amueller opened this issue Jul 3, 2019 · 2 comments

amueller commented Jul 3, 2019

This is a question about casting from and to numpy. I asked a similar question for pandas here: pandas-dev/pandas#27211

The question is whether we can rely on zero-copy wrapping and unwrapping of numpy arrays in a DataArray, i.e., is it future-proof to assume that something like

import xarray as xr
import numpy as np

X = np.random.uniform(size=(10000, 10))
X_xr = xr.DataArray(X)
X_again = np.asarray(X_xr)
print(X.__array_interface__['data'][0] == X_again.__array_interface__['data'][0])
# Output: True

will always be true and no copy is happening?
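
As a side note on the check itself: comparing the start address from `__array_interface__` only detects arrays that begin at the same byte, so it would miss a view that starts at an offset. A minimal sketch of a more general check, using only NumPy's `np.shares_memory` (the helper name `same_start_address` and the variables below are illustrative, not part of the original question):

```python
import numpy as np

def same_start_address(a, b):
    # The check used in the question: compare the addresses of the first element.
    return a.__array_interface__['data'][0] == b.__array_interface__['data'][0]

X = np.random.uniform(size=(10000, 10))
offset_view = X[1:]   # a view of X that starts one row in
copied = X.copy()     # an independent buffer

print(same_start_address(X, offset_view))  # False: same buffer, different start
print(np.shares_memory(X, offset_view))    # True: no copy was made
print(np.shares_memory(X, copied))         # False: a real copy
```

For the full round-trip in the question, both checks should agree, since the unwrapped array is expected to start at the same address as the original.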

Context: We want to attach some metadata to our numpy arrays; in particular, I'm interested in column names. Pandas is an obvious candidate for doing that, as our arrays are 2D most of the time. However, pandas might change its internal structure so that we can't do zero-copy wrapping and unwrapping anymore.

Xarray is another candidate, even though it's a bit unnatural given that our data is usually 2d.
This is a design decision that's very hard to undo, so I want to make sure that it's reasonably future-proof if we want to consider using DataArray as a possible output format.

shoyer (Member) commented Jul 4, 2019

Xarray currently converts NumPy arrays only for a few very particular dtypes:

  • object arrays will sometimes get converted to more specific dtypes (using pandas's rules)
  • datetime64 and timedelta64 arrays get converted to nanosecond (ns) precision

I imagine we might add special cases like this in the future for esoteric dtypes, but numeric arrays will always be guaranteed to use views, both when creating a DataArray and casting it into a NumPy array.

(Pandas not being able to guarantee this was one of my motivations for writing xarray in the first place...)
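
The behaviour described above for numeric arrays can be checked with a short sketch like the following. The dimension names `sample` and `feature` and the column labels are assumptions chosen for illustration; they are not from the thread.

```python
import numpy as np
import xarray as xr

X = np.random.uniform(size=(1000, 10))
feature_names = [f"f{i}" for i in range(X.shape[1])]  # hypothetical column names

# Wrap the array and attach column names as a coordinate on the second dimension.
X_xr = xr.DataArray(
    X,
    dims=("sample", "feature"),
    coords={"feature": feature_names},
)
X_again = np.asarray(X_xr)

# Both the wrapped data and the unwrapped array should share X's buffer.
print(np.shares_memory(X, X_xr.values))
print(np.shares_memory(X, X_again))

# A write to the original should be visible through the round-tripped array.
X[0, 0] = 42.0
print(X_again[0, 0])
```

This only exercises a float64 array; per the comment above, object, datetime64, and timedelta64 arrays may be converted and so are not covered by the same expectation.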

amueller (Author) commented Jul 5, 2019

Thank you @shoyer, that's very useful input. It seems that xarray would fulfill our requirements and so at least is a reasonable candidate for us.

amueller closed this as completed Jul 5, 2019