
Update pyhdf-based arrs to be manually tokenized #2886

Merged

merged 3 commits into pytroll:main on Aug 23, 2024

Conversation

@djhoese (Member) commented Aug 22, 2024

This works around a bug in dask or cloudpickle that alters the state of the pyhdf SDS object in some way, making it unusable. The dask PR that started triggering this was dask/dask#11320. My guess is that the repeated pickling/serialization loses some hidden state in the pyhdf SDS object, after which pyhdf or the HDF C library no longer knows how to work with it.

We could register SDS with normalize_token in dask, or we could do what this PR does and compute the token/name ourselves. Note this is the name for the dask array/task. This is similar to past work by @mraspaud on the HDF5 utility package to make sure names are consistent across loads of the same file variable.
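For illustration, the normalize_token alternative mentioned above could look roughly like the sketch below. This is not what the PR implements; the stand-in class and the choice of fields to tokenize on are assumptions, and a real registration would target pyhdf.SD.SDS itself.

```python
from dask.base import normalize_token, tokenize


class FakeSDS:
    """Stand-in for pyhdf.SD.SDS so this sketch runs without pyhdf installed."""

    def info(self):
        # pyhdf's SDS.info() returns (name, rank, dims, type, nattrs)
        return ("longwave_radiance", 2, [180, 360], 5, 3)


@normalize_token.register(FakeSDS)
def _normalize_sds(sds):
    # Tokenize on the variable metadata instead of the object itself, so
    # dask.base.tokenize() never has to pickle the SDS. A real version would
    # also need to mix in the source file path, which SDS alone does not
    # expose -- one reason computing the name manually may be simpler.
    return ("pyhdf.SD.SDS",) + tuple(sds.info())
```

With this registered, two handles to the same variable produce the same token, so dask task names stay stable across repeated loads.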

One alternative to the PR as it stands would be to make from_sds a method on the HDF utility class so it can use self.filename automatically for the tokenizing. I'm also realizing that src_path maybe shouldn't be required, since it is only used when the name kwarg isn't provided. Thoughts @mraspaud @gerritholl ?
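As a rough sketch of the manual-tokenization approach being discussed (the names from_sds, src_path, chunks, and name come from this thread, but the signature and body here are assumptions, not the actual Satpy implementation):

```python
import dask.array as da
from dask.base import tokenize


def from_sds(var, src_path, *, chunks="auto", name=None):
    """Wrap an SDS-like object in a dask array with a manually built task name.

    Passing an explicit ``name`` to ``da.from_array`` stops dask from
    tokenizing (and therefore serializing) the SDS object itself, which is
    what triggered the bug this PR works around.
    """
    if name is None:
        # SDS.info() returns (name, rank, dims, type, nattrs); combining the
        # variable name with the file path keeps the dask name consistent
        # across repeated loads of the same variable.
        var_name = var.info()[0]
        name = tokenize(src_path, var_name)
    return da.from_array(var, chunks=chunks, name=name)
```

Note that when ``name`` is passed explicitly, ``src_path`` goes unused, which matches the observation above that src_path is only needed when the name kwarg isn't provided.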

codecov bot commented Aug 22, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.05%. Comparing base (6860030) to head (5e27be4).
Report is 69 commits behind head on main.

@@            Coverage Diff             @@
##             main    #2886      +/-   ##
==========================================
+ Coverage   95.99%   96.05%   +0.06%     
==========================================
  Files         368      370       +2     
  Lines       53985    54320     +335     
==========================================
+ Hits        51821    52177     +356     
+ Misses       2164     2143      -21     
Flag            Coverage Δ
behaviourtests  3.99% <0.00%> (-0.03%) ⬇️
unittests       96.15% <100.00%> (+0.06%) ⬆️

Flags with carried forward coverage won't be shown.


@coveralls commented Aug 22, 2024

Pull Request Test Coverage Report for Build 10527846171


  • 19 of 19 (100.0%) changed or added relevant lines in 4 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.001%) to 96.151%

Totals Coverage Status
  • Change from base Build 10404388488: +0.001%
  • Covered Lines: 52408
  • Relevant Lines: 54506

💛 - Coveralls

@mraspaud (Member) left a comment

Looks good, thanks a lot for sorting this out.
I have a suggestion inline.

satpy/readers/hdf4_utils.py (suggestion resolved)
@mraspaud (Member) left a comment

LGTM

@djhoese djhoese merged commit 4a75b65 into pytroll:main Aug 23, 2024
17 of 18 checks passed
@djhoese djhoese deleted the bugfix-pyhdf-tokenize branch August 23, 2024 15:25
@maxrjones maxrjones mentioned this pull request Aug 27, 2024
Development

Successfully merging this pull request may close these issues:
  • MODIS and SEADAS test failures
4 participants