
Update pyhdf-based arrs to be manually tokenized #2886

Merged

merged 3 commits into pytroll:main on Aug 23, 2024

Conversation

@djhoese (Member) commented Aug 22, 2024

This works around a bug in dask or cloudpickle that alters the state of the pyhdf SDS object in some way, making it unusable. The dask PR that started triggering this was dask/dask#11320. My guess is that the repeated pickling/serialization loses some hidden state in the pyhdf SDS object, after which pyhdf or the HDF C library no longer knows how to work with it.

We could register SDS with normalize_token in dask, or we could do what this PR does and compute the token/name ourselves. Note this is the name for the dask array/task. This is similar to past work by @mraspaud on the HDF5 utility package to make sure names are consistent across loads of the same file variable.
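For illustration, the normalize_token alternative mentioned above could look roughly like the sketch below. This is not what the PR implements; the stand-in class and the choice of fields to tokenize on are assumptions, and a real registration would target pyhdf.SD.SDS itself.

```python
from dask.base import normalize_token, tokenize


class FakeSDS:
    """Stand-in for pyhdf.SD.SDS so this sketch runs without pyhdf installed."""

    def info(self):
        # pyhdf's SDS.info() returns (name, rank, dims, type, nattrs)
        return ("longwave_radiance", 2, [180, 360], 5, 3)


@normalize_token.register(FakeSDS)
def _normalize_sds(sds):
    # Tokenize on the variable metadata instead of the object itself, so
    # dask.base.tokenize() never has to pickle the SDS. A real version would
    # also need to mix in the source file path, which SDS alone does not
    # expose -- one reason computing the name manually may be simpler.
    return ("pyhdf.SD.SDS",) + tuple(sds.info())
```

With this registered, two handles to the same variable produce the same token, so dask task names stay stable across repeated loads.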

One alternative to the PR as it stands would be to make from_sds a method on the HDF utility class so it can use self.filename automatically for the tokenizing. I'm also realizing that src_path maybe shouldn't be required, since it is only used when the name kwarg isn't provided. Thoughts @mraspaud @gerritholl ?
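As a rough sketch of the manual-tokenization approach being discussed (the names from_sds, src_path, chunks, and name come from this thread, but the signature and body here are assumptions, not the actual Satpy implementation):

```python
import dask.array as da
from dask.base import tokenize


def from_sds(var, src_path, *, chunks="auto", name=None):
    """Wrap an SDS-like object in a dask array with a manually built task name.

    Passing an explicit ``name`` to ``da.from_array`` stops dask from
    tokenizing (and therefore serializing) the SDS object itself, which is
    what triggered the bug this PR works around.
    """
    if name is None:
        # SDS.info() returns (name, rank, dims, type, nattrs); combining the
        # variable name with the file path keeps the dask name consistent
        # across repeated loads of the same variable.
        var_name = var.info()[0]
        name = tokenize(src_path, var_name)
    return da.from_array(var, chunks=chunks, name=name)
```

Note that when ``name`` is passed explicitly, ``src_path`` goes unused, which matches the observation above that src_path is only needed when the name kwarg isn't provided.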

codecov bot commented Aug 22, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.05%. Comparing base (6860030) to head (5e27be4).
Report is 69 commits behind head on main.

@@            Coverage Diff             @@
##             main    #2886      +/-   ##
==========================================
+ Coverage   95.99%   96.05%   +0.06%     
==========================================
  Files         368      370       +2     
  Lines       53985    54320     +335     
==========================================
+ Hits        51821    52177     +356     
+ Misses       2164     2143      -21     
Flag            Coverage Δ
behaviourtests  3.99% <0.00%> (-0.03%) ⬇️
unittests       96.15% <100.00%> (+0.06%) ⬆️

Flags with carried forward coverage won't be shown.


@coveralls commented Aug 22, 2024

Pull Request Test Coverage Report for Build 10527846171


  • 19 of 19 (100.0%) changed or added relevant lines in 4 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.001%) to 96.151%

Totals Coverage Status
  • Change from base Build 10404388488: +0.001%
  • Covered Lines: 52408
  • Relevant Lines: 54506

💛 - Coveralls

@mraspaud (Member) left a comment

Looks good, thanks a lot for sorting this out.
I have a suggestion inline.

satpy/readers/hdf4_utils.py (suggestion resolved)
@mraspaud (Member) left a comment

LGTM

@djhoese djhoese merged commit 4a75b65 into pytroll:main Aug 23, 2024
17 of 18 checks passed
@djhoese djhoese deleted the bugfix-pyhdf-tokenize branch August 23, 2024 15:25
@maxrjones maxrjones mentioned this pull request Aug 27, 2024
Development

Successfully merging this pull request may close these issues:
  • MODIS and SEADAS test failures
4 participants