Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dataset.copy() drops encoding #1586

Closed
crusaderky opened this issue Sep 22, 2017 · 6 comments
Closed

Dataset.copy() drops encoding #1586

crusaderky opened this issue Sep 22, 2017 · 6 comments

Comments

@crusaderky
Copy link
Contributor

ds = Dataset()
ds.encoding = {"unlimited_dims": 'x'}
ds.copy().encoding
{}

By looking at dataset.py, there's a lot of calls to Dataset._construct_direct that omit the encoding. Is it correct to add it in all cases?

@crusaderky
Copy link
Contributor Author

By looking at the code, copy() is the only case where it is proper to pass the encoding to _construct_direct, but somebody please confirm

@shoyer
Copy link
Member

shoyer commented Sep 25, 2017

I think this was intentional at one point, but to be honest we never carefully defined the semantics for preserving encoding are.

My original thought (along the lines of some of my comments in #1297), was that we should not propagate encoding in cases where it might no longer be valid, so it should be dropped from most operations. Essentially, encoding should only stay with original files loaded from disk. Hence why it wasn't copied in .copy().

That said, I can see why this rule would be confusing and we haven't done a good job of enforcing it. Possibly a better policy would be "encoding is copied whenever attrs is copied".

@jhamman
Copy link
Member

jhamman commented Sep 25, 2017

I agree that we haven't done a good job defining how encoding should behave. I agree that any modification of the shape/type of an xarray object should probably drop the original encoding. I don't know if a copy should drop the encoding though. I don't see why we shouldn't be able rountrip datasets via a open/load/copy/write workflow.

@crusaderky
Copy link
Contributor Author

Can we reach a resolution on this? It's blocking #1551...

crusaderky pushed a commit to crusaderky/xarray that referenced this issue Oct 6, 2017
@jhamman
Copy link
Member

jhamman commented Oct 7, 2017

@crusaderky - After thinking about it, I'm still a 👍 on copying the encoding in this case.

@shoyer
Copy link
Member

shoyer commented Oct 7, 2017

I'm OK copying encoding, but we do still need to figure out general rules for propagating it.

crusaderky pushed a commit to crusaderky/xarray that referenced this issue Oct 9, 2017
jhamman pushed a commit that referenced this issue Oct 9, 2017
* Load non-index coords to memory ahead of concat

* Update unit test after #1522

* Minimise loads on concat. Extend new concat logic to data_vars.

* Trivial tweaks

* Added unit tests

Fix loads when vars are found different halfway through

* Add xfail for #1586

* Revert "Add xfail for #1586"

This reverts commit f99313c.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants