Dataset.copy() drops encoding #1586

crusaderky · 2017-09-22T20:58:30Z

ds = Dataset()
ds.encoding = {"unlimited_dims": 'x'}
ds.copy().encoding
{}

Looking at dataset.py, there are a lot of calls to Dataset._construct_direct that omit the encoding. Is it correct to add it in all cases?

crusaderky · 2017-09-24T17:36:21Z

From reading the code, copy() appears to be the only case where it is proper to pass the encoding to _construct_direct, but could somebody please confirm?

shoyer · 2017-09-25T01:20:08Z

I think this was intentional at one point, but to be honest we never carefully defined what the semantics for preserving encoding are.

My original thought (along the lines of some of my comments in #1297) was that we should not propagate encoding in cases where it might no longer be valid, so it should be dropped from most operations. Essentially, encoding should only stay with original files loaded from disk. That is why it wasn't copied in .copy().

That said, I can see why this rule would be confusing and we haven't done a good job of enforcing it. Possibly a better policy would be "encoding is copied whenever attrs is copied".
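A minimal sketch of what "encoding is copied whenever attrs is copied" could look like. `ToyDataset`, `_new`, and `reduce_all` are hypothetical names invented for this illustration; this is not xarray's Dataset or its `_construct_direct`, only a toy showing the idea of routing both pieces of metadata through one decision point.

```python
class ToyDataset:
    """Hypothetical stand-in for illustration only; not xarray's Dataset."""

    def __init__(self, variables=None, attrs=None, encoding=None):
        # Each argument is copied into a fresh dict so instances never share state.
        self.variables = dict(variables or {})
        self.attrs = dict(attrs or {})
        self.encoding = dict(encoding or {})

    def _new(self, variables, keep_metadata):
        # Single decision point: attrs and encoding travel together,
        # so no caller can forward one and forget the other.
        if keep_metadata:
            return ToyDataset(variables, self.attrs, self.encoding)
        return ToyDataset(variables)

    def copy(self):
        # A copy changes nothing about the data, so the metadata stays valid.
        return self._new(self.variables, keep_metadata=True)

    def reduce_all(self):
        # Toy reduction: every variable collapses to its sum, and both
        # attrs and encoding are dropped because they may no longer apply.
        summed = {name: sum(values) for name, values in self.variables.items()}
        return self._new(summed, keep_metadata=False)


ds = ToyDataset(
    {"a": [1, 2, 3]},
    attrs={"title": "demo"},
    encoding={"unlimited_dims": "x"},
)
copied = ds.copy()
reduced = ds.reduce_all()
```

Under this assumed policy, `copied` carries both `attrs` and `encoding`, while `reduced` carries neither. Whether a given operation should keep or drop metadata is exactly the open question in this thread; the sketch only shows that the two can be made to move in lockstep.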

jhamman · 2017-09-25T03:50:19Z

I agree that we haven't done a good job defining how encoding should behave. I agree that any modification of the shape/type of an xarray object should probably drop the original encoding. I don't know if a copy should drop the encoding though. I don't see why we shouldn't be able to round-trip datasets via an open/load/copy/write workflow.
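To make the round-trip concern concrete, here is a toy sketch using plain dicts and the standard library. `toy_write` and `copy_dataset` are hypothetical helpers invented for this illustration, and the "file format" is just JSON; none of this reflects how xarray actually writes files. The assumption is only that the writer consults the encoding when producing its output.

```python
import json


def toy_write(ds):
    """Serialize a toy dataset; the header depends on ds["encoding"]."""
    header = {"unlimited_dims": ds["encoding"].get("unlimited_dims", [])}
    return json.dumps({"header": header, "data": ds["data"]}, sort_keys=True)


def copy_dataset(ds, keep_encoding):
    """Shallow-copy a toy dataset, optionally dropping its encoding."""
    encoding = dict(ds["encoding"]) if keep_encoding else {}
    return {"data": dict(ds["data"]), "encoding": encoding}


# Stand-in for a dataset that was opened and loaded from disk.
loaded = {"data": {"x": [0, 1, 2]}, "encoding": {"unlimited_dims": ["x"]}}

original_bytes = toy_write(loaded)
kept = toy_write(copy_dataset(loaded, keep_encoding=True))
dropped = toy_write(copy_dataset(loaded, keep_encoding=False))

print(kept == original_bytes)     # copy that preserves encoding round-trips
print(dropped == original_bytes)  # copy that drops encoding does not
```

In this toy model, the copy that keeps its encoding writes output identical to the original, while the copy that drops it silently loses the unlimited-dimension setting.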

crusaderky · 2017-10-06T20:10:33Z

Can we reach a resolution on this? It's blocking #1551...

jhamman · 2017-10-07T03:29:48Z

@crusaderky - After thinking about it, I'm still a 👍 on copying the encoding in this case.

shoyer · 2017-10-07T17:52:22Z

I'm OK copying encoding, but we do still need to figure out general rules for propagating it.

* Load non-index coords to memory ahead of concat
* Update unit test after #1522
* Minimise loads on concat. Extend new concat logic to data_vars.
* Trivial tweaks
* Added unit tests
  Fix loads when vars are found different halfway through
* Add xfail for #1586
* Revert "Add xfail for #1586"
  This reverts commit f99313c.

crusaderky mentioned this issue Sep 22, 2017

Load nonindex coords ahead of concat() #1551

Merged

4 tasks

jhamman added the bug label Sep 24, 2017

crusaderky mentioned this issue Sep 24, 2017

Dataset.copy() to preserve encoding #1590

Merged

4 tasks

shoyer added the API design and design question labels and removed the bug label Sep 26, 2017

crusaderky pushed a commit to crusaderky/xarray that referenced this issue Oct 6, 2017

Add xfail for pydata#1586

f99313c

shoyer closed this as completed in #1590 Oct 8, 2017

crusaderky pushed a commit to crusaderky/xarray that referenced this issue Oct 9, 2017

Revert "Add xfail for pydata#1586"

2f80cef

This reverts commit f99313c.

jhamman mentioned this issue Oct 9, 2017

Rules for propagating attrs and encoding #1614

Open
