
Add support for byte order (endianness) #132

Merged
valeriupredoi merged 1 commit into main from byte-order on Aug 11, 2023
Conversation

markgoddard

netCDF4 variables are written with a specific byte order (endianness),
typically the same as the machine writing to the dataset. In
netCDF4-python this shows up in the variable's dtype.byteorder, and may
be '<' (little), '>' (big) or '=' (native).
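
For illustration, this is visible on the numpy dtype that netCDF4-python exposes (a minimal sketch; the file and variable names are placeholders):

```python
import netCDF4

# Open a dataset and inspect a variable's byte order via its numpy
# dtype. "example.nc" and "temp" are placeholder names.
ds = netCDF4.Dataset("example.nc")
var = ds.variables["temp"]
print(var.dtype.byteorder)  # '<' (little), '>' (big) or '=' (native)
ds.close()
```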

For the local storage backend this generally "just works", since the
numpy array returned by netCDF4-python has the correct byte ordering.

For files in an S3 store, kerchunk uses h5py to read the netCDF4 dataset
when converting it to a Zarr array index. In this case the byte order of
the dtype on the Zarr array is not always correct, and sometimes shows
up as big endian even though little endian was specified. We work around
this by capturing the dtype of the dataset when reading other attributes
(filters, metadata, etc.).
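
The idea of the workaround, as a minimal sketch (the helper name and its direct use of netCDF4-python are assumptions for illustration, not the actual activestorage code):

```python
import netCDF4

# Illustrative sketch only: record the variable's dtype while reading
# its other attributes, so the correct byte order can be used in
# preference to the dtype recorded on the Zarr array.
# "capture_metadata" is a hypothetical helper.
def capture_metadata(path, varname):
    with netCDF4.Dataset(path) as ds:
        var = ds.variables[varname]
        dtype = var.dtype        # preserves the written byte order
        filters = var.filters()  # compression settings, etc.
    return dtype, filters
```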

Closes #76

@markgoddard markgoddard self-assigned this Aug 3, 2023

codecov bot commented Aug 3, 2023

Codecov Report

Patch coverage: 89.47% and no project coverage change.

Comparison is base (8aae749) 86.52% compared to head (bbcfe46) 86.53%.

Additional details and impacted files
@@               Coverage Diff                @@
##           s3-missing-data     #132   +/-   ##
================================================
  Coverage            86.52%   86.53%           
================================================
  Files                    8        8           
  Lines                  579      594   +15     
================================================
+ Hits                   501      514   +13     
- Misses                  78       80    +2     
Files Changed                   Coverage Δ
activestorage/reductionist.py   87.17% <75.00%> (-1.40%) ⬇️
activestorage/active.py         92.68% <100.00%> (+0.07%) ⬆️
activestorage/dummy_data.py     89.21% <100.00%> (+0.55%) ⬆️

☔ View full report in Codecov by Sentry.

@markgoddard (Author)

(Please merge #130 before merging this, since this PR depends on and merges into the s3-missing-data branch. We can change the target branch once #130 merges)

Base automatically changed from s3-missing-data to main August 8, 2023 15:22
@valeriupredoi (Collaborator) left a comment

looking good, many thanks @markgoddard 👍 🍺

@valeriupredoi (Collaborator)

#130 was merged so am merging this too 👍

@valeriupredoi valeriupredoi merged commit 8adf0f0 into main Aug 11, 2023
4 of 5 checks passed
@valeriupredoi valeriupredoi deleted the byte-order branch August 11, 2023 11:25
markgoddard added a commit that referenced this pull request Aug 15, 2023
Since byte order support was added in #132, we capture the dtype from
netCDF metadata and use it in preference to the Zarr dtype when using
the S3 storage backend.

In the case where missing data values are passed into the Active
constructor, we do not load netCDF metadata, and therefore do not
capture the dtype. This leads to the following error with S3 storage:

  AttributeError: 'Active' object has no attribute '_dtype'

This change switches to always loading the metadata so that we can
capture the dtype. The downside is that the metadata load is
unnecessary when using local storage with missing data specified.

This allows us to enable the test_daily_data_masked test for S3.

Closes #137
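
A minimal sketch of the shape of this fix (the constructor signature and method bodies are illustrative assumptions; the real Active class lives in activestorage/active.py and differs in detail):

```python
import netCDF4

class Active:
    def __init__(self, uri, ncvar, missing_values=None):
        self.uri = uri
        self.ncvar = ncvar
        self.missing_values = missing_values
        # Previously metadata was loaded only when missing values were
        # not supplied, so self._dtype could be left unset and later
        # raise AttributeError with the S3 backend. Now always load it.
        self._load_metadata()

    def _load_metadata(self):
        # Record the variable's dtype (including byte order) for use
        # by the S3 storage backend.
        with netCDF4.Dataset(self.uri) as ds:
            self._dtype = ds.variables[self.ncvar].dtype
```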