Add support for byte order (endianness) #132
Conversation
netCDF4 variables are written with a specific byte order (endianness), typically the same as that of the machine writing the dataset. In netCDF4-python this shows up in the variable's dtype.byteorder, and may be '<' (little), '>' (big) or '=' (native).

For the local storage backend this generally "just works", since the numpy array returned by netCDF4-python has the correct byte ordering.

For files in an S3 store, kerchunk uses h5py to read the netCDF4 dataset when converting it to a Zarr array index. In this case the byte order of the dtype on the Zarr array is not always correct, and sometimes shows up as big endian even though little endian was specified. We work around this by capturing the dtype of the dataset when reading other attributes (filters, metadata, etc.).

Closes #76
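A minimal sketch of the idea using plain numpy (the helper apply_captured_dtype is hypothetical, not part of this codebase): interpreting raw chunk bytes with the byte order recorded in the file, rather than the possibly wrong dtype on the Zarr array, recovers the correct values.

```python
import numpy as np

# numpy dtypes carry an explicit byte order flag:
#   '<' little-endian, '>' big-endian, '=' native, '|' not applicable.
little = np.dtype('<f8')
big = np.dtype('>f8')
print(little.byteorder, big.byteorder)
# On a little-endian machine numpy canonicalises the first to '=' (native)
# and reports '>' for the second.

# Hypothetical helper illustrating the workaround: decode raw chunk
# bytes with the dtype captured from the netCDF file.
def apply_captured_dtype(raw_bytes: bytes, captured_dtype: np.dtype) -> np.ndarray:
    return np.frombuffer(raw_bytes, dtype=captured_dtype)

data = np.arange(4, dtype=big).tobytes()
print(apply_captured_dtype(data, big))     # [0. 1. 2. 3.] -- correct
print(apply_captured_dtype(data, little))  # misinterpreted values -- wrong byte order
```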
Codecov Report

Additional details and impacted files

@@          Coverage Diff           @@
##    s3-missing-data    #132  +/- ##
================================================
  Coverage       86.52%   86.53%
================================================
  Files               8        8
  Lines             579      594   +15
================================================
+ Hits              501      514   +13
- Misses             78       80    +2

☔ View full report in Codecov by Sentry.
looking good, many thanks @markgoddard 👍 🍺
#130 was merged so am merging this too 👍
Since byte order support was added in #132, we capture the dtype from the netCDF metadata and use it in preference to the Zarr dtype when using the S3 storage backend. When missing data values are passed into the Active constructor, we do not load the netCDF metadata, and therefore do not capture the dtype. This leads to the following error with S3 storage:

AttributeError: 'Active' object has no attribute '_dtype'

This change switches to always loading the metadata so that we can capture the dtype. The downside is that the load is unnecessary when using local storage with missing data specified.

This allows us to enable the test_daily_data_masked test for S3.

Closes #137
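A minimal sketch of the fix, under stated assumptions: the class and method names (Active, _load_metadata) follow the discussion above but are not copied from the real codebase.

```python
import numpy as np

class Active:
    """Hypothetical sketch of the constructor change described above."""

    def __init__(self, uri, ncvar, missing_values=None):
        self.uri = uri
        self.ncvar = ncvar
        self.missing_values = missing_values
        # Before the fix, metadata was only loaded when no missing-data
        # values were supplied, e.g.:
        #     if missing_values is None:
        #         self._load_metadata()
        # so self._dtype was sometimes undefined and S3 reads raised
        # AttributeError. After the fix, metadata is always loaded:
        self._load_metadata()

    def _load_metadata(self):
        # Placeholder for reading filters, fill values and the variable
        # dtype from the netCDF file; here we just record a dtype so
        # the attribute is always present.
        self._dtype = np.dtype('<f8')

active = Active('s3://bucket/file.nc', 'temp', missing_values=(-999.0,))
print(active._dtype)  # float64 -- attribute exists even with missing data
```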