-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Automate interpretation of _Unsigned attribute #1453
Conversation
In addition to the included (basic) test I've also tested this with the real-world data that motivated the PR and #1444. While it's a working draft, I'd welcome comments on the basic approach and appropriateness of the test coverage. |
Instead of putting this alongside the |
I've created a new |
The CI fail is for 2.7/cdat/pynio in a couple of my new tests. In one, the fill value is not being applied, while in the other the unsigned conversion isn't happening. Are there any known differences in that cdat/pynio stack that would cause these to fail while others pass? |
fe541dd
to
31d13ac
Compare
…. Move _Unsigned between attributes and encoding
31d13ac
to
d6a90c7
Compare
Tests now pass after I realized I wasn't converting the _FillValue to unsigned. I also turned off PyNIO's internal support for masking, in keeping with the philosophy that Note that some of the CI builds are skipping most of their tests (e.g, py=3.4; you can tell by the run time). This is a problem in other PRs as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking good. Can you run flake8 again to make sure there aren't any PEP8 violations. I just had one comment of substance.
We intentionally skip some I/O and plotting tests on some of the test matrix so we can test xarray with/without some optional dependencies.
xarray/conventions.py
Outdated
@@ -637,6 +665,13 @@ def maybe_encode_dtype(var, name=None): | |||
'any _FillValue to use for NaNs' % name, | |||
RuntimeWarning, stacklevel=3) | |||
data = duck_array_ops.around(data)[...] | |||
if '_Unsigned' in encoding: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is _Unsigned = False
a valid attribute?
If so, we need this to be if encoding.get('_Unsigned', False)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the documentation I assembled as part of the same issue in NetCDF4-python, it looks like _Unsigned = "true"
is present if the data are unsigned, with no mention of _Unsigned = "false"
. I haven't seen an example of False
in the wild, but, your suggestion is consistent with the patch to NetCDF4-python for this issue.
I made the change you recommended on the encode side.
On decode there is a corresponding check of the attribute, so I made a change there, too.
xarray/tests/test_backends.py
Outdated
self.assertDatasetIdentical(encoded, | ||
create_encoded_unsigned_masked_scaled_data( | ||
) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's pep8 this:
self.assertDatasetIdentical(
encoded, create_encoded_unsigned_masked_scaled_data())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. That's much better.
xarray/tests/test_backends.py
Outdated
self.assertDatasetAllClose(encoded, actual) | ||
# make sure roundtrip encoding didn't change the | ||
# original dataset. | ||
self.assertDatasetIdentical(encoded, | ||
create_encoded_masked_and_scaled_data()) | ||
create_encoded_masked_and_scaled_data() | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pep8 / move last )
. This whole call should actually fit on one line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're good to go here. @shoyer - I'll let you have the final review/merge.
Sounds good, I'll take another look shortly.
…On Tue, Jul 18, 2017 at 10:46 AM, Joe Hamman ***@***.***> wrote:
***@***.**** approved this pull request.
I think we're good to go here. @shoyer <https://github.com/shoyer> - I'll
let you have the final review/merge.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#1453 (review)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABKS1nsL9Rg-WevNBgY8i30AFt5SJGuBks5sPO-FgaJpZM4N5YIf>
.
|
xarray/conventions.py
Outdated
@@ -637,6 +665,13 @@ def maybe_encode_dtype(var, name=None): | |||
'any _FillValue to use for NaNs' % name, | |||
RuntimeWarning, stacklevel=3) | |||
data = duck_array_ops.around(data)[...] | |||
if encoding.get('_Unsigned', False): | |||
unsigned_dtype = 'i%s' % dtype.itemsize | |||
old_fill = np.asarray(attrs['_FillValue']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This block should be guarded in a check to verify that _FillValue
is actually defined as an attribute.
xarray/conventions.py
Outdated
@@ -786,6 +822,16 @@ def decode_cf_variable(var, concat_characters=True, mask_and_scale=True, | |||
dimensions = dimensions[:-1] | |||
data = CharToStringArray(data) | |||
|
|||
pop_to(attributes, encoding, '_Unsigned') | |||
is_unsigned = encoding.get('_Unsigned', False) | |||
if (is_unsigned) and (mask_and_scale is True): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need the extra parentheses here, and use implicit boolean checks instead of is True
. So this should be: if is_unsigned and mask_and_scale
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
xarray/conventions.py
Outdated
# Need to convert the fill_value to unsigned, too | ||
# According to the CF spec, the fill value is of the same | ||
# type as its variable, i.e. its storage format on disk | ||
fill_value = np.asarray(fill_value, dtype=data.unsigned_dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we simply cast fill_value
to the dtype of data
unilaterally here? e.g., fill_value = np.asarray(fill_value, dtype=data.dtype)
without the if is_unsigned and has_fill
check?
That seems a little more robust to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I had to leave the has_fill
check because fill_value
will be None
in cases where there is no fill_value
attribute.
xarray/tests/test_backends.py
Outdated
self.assertDatasetAllClose(decoded, actual) | ||
with self.roundtrip(decoded, | ||
open_kwargs=dict(decode_cf=False)) as actual: | ||
# TODO: this assumes that all roundtrips will first |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please remove this redundant TODO note
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
xarray/tests/test_backends.py
Outdated
open_kwargs=dict(decode_cf=False)) as actual: | ||
# TODO: this assumes that all roundtrips will first | ||
# encode. Is that something we want to test for? | ||
self.assertDatasetAllClose(encoded, actual) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure assertDatasetAllClose
checks dtypes. It would be good to check explicitly here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Thanks @deeplycloudy ! |
git diff upstream/master | flake8 --diff
whats-new.rst
for all changes andapi.rst
for new API