
Use the correct fits image dtype #531

Merged: 3 commits, Dec 5, 2024

Conversation

bamford
Contributor

@bamford bamford commented Nov 29, 2024

Currently, when kerchunk.fits.process_file is used on a FITS image containing floating-point data, the dtype is not identified as big-endian. This PR corrects the lookup table so it specifies big-endian float dtypes.

A test is provided to illustrate the issue and prevent regression.
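To illustrate why the missing byte-order prefix matters: the FITS standard stores all binary data big-endian, so the numpy dtypes in the BITPIX lookup need an explicit `>` prefix. The mapping below is a hypothetical sketch of such a table (not kerchunk's actual code), followed by a demonstration of what goes wrong when big-endian bytes are read with the wrong byte order:

```python
import numpy as np

# Hypothetical BITPIX -> dtype lookup, illustrating the fix described in
# this PR: float entries must be explicitly big-endian (">f4"/">f8"),
# not native-endian ("f4"/"f8").
BITPIX_TO_DTYPE = {
    8: ">u1",    # unsigned byte (endianness irrelevant for 1 byte)
    16: ">i2",   # 16-bit signed integer
    32: ">i4",   # 32-bit signed integer
    64: ">i8",   # 64-bit signed integer
    -32: ">f4",  # 32-bit IEEE float; the ">" prefix is the fix
    -64: ">f8",  # 64-bit IEEE float; the ">" prefix is the fix
}

# FITS data on disk is big-endian. Reading it with a little-endian dtype
# scrambles the value; reading it with ">f4" recovers it.
raw = np.array([1.5], dtype=">f4").tobytes()   # bytes as stored in a FITS file
wrong = np.frombuffer(raw, dtype="<f4")[0]     # wrong byte order: garbage
right = np.frombuffer(raw, dtype=">f4")[0]     # correct big-endian read
print(wrong, right)
```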

@bamford bamford changed the title Fits image dtype Use the correct fits image dtype Nov 29, 2024
@bamford bamford marked this pull request as ready for review November 29, 2024 16:32
@martindurant
Member

Thanks for submitting this. Please let me know when it's ready.

@bamford
Contributor Author

bamford commented Dec 3, 2024

Hi @martindurant, I think the PR is ready. It is a tiny change, and the relevant tests pass. I guess no one has been trying kerchunk on floating point FITS images yet. The failing tests seem to be due to something else.

Just to let you know my use case... In the past, I've converted FITS images to zarr so I can process them as an on-disk, dask-chunked, xarray Dataset, with all the benefits that implies. I'm now dealing with a large dataset (the Euclid survey) and I was looking for a way to do the same thing without doubling up on disk space. Fortunately, I discovered kerchunk and it is perfect! It took a little while to work out the details, but the documentation was very helpful. I've now got it working for a pile of fairly complicated, multi-extension fits files, and have run a quick test to show I can do efficient calculations over large numbers of files. Many thanks to you and the team.

@martindurant
Member

I've now got it working for a pile of fairly complicated, multi-extension fits files, and have run a quick test to show I can do efficient calculations over large numbers of files

Would love to see a notebook or any sharable material out of this!

@martindurant
Member

@emfdavid , any idea why "s3://noaahrrr-bdp-pds/hrrr.20220804/conus/hrrr.t01z.wrfsfcf01.grib2" would no longer be publicly accessible?

@akrherz

akrherz commented Dec 3, 2024

noaahrrr-bdp-pds

Just a typo and should be noaa-hrrr-bdp-pds ?

@martindurant
Member

Oh, I see the context now: the URL is intentionally wrong, but it now raises PermissionError rather than FileNotFound. I think for the purposes of the test, either will do:

    def test_parse_grib_idx_no_file():
        with pytest.raises((FileNotFoundError, PermissionError)):
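The suggestion above relies on `pytest.raises` accepting a tuple of exception types, so the test passes regardless of which of the two errors S3 returns. A minimal self-contained demonstration (the raised errors here are simulated, not real S3 calls):

```python
import pytest

# pytest.raises with a tuple succeeds if ANY listed exception is raised,
# so the same test tolerates both S3 behaviours.
with pytest.raises((FileNotFoundError, PermissionError)):
    raise PermissionError("access denied")   # new S3 behaviour

with pytest.raises((FileNotFoundError, PermissionError)):
    raise FileNotFoundError("no such key")   # old S3 behaviour

print("both accepted")
```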

@emfdavid
Contributor

emfdavid commented Dec 4, 2024

The behavior may be different depending on whether the bucket or the blob doesn't exist?

If you are making an intentional not found test, put some answer in your question for your future self to debug and make the key something like "s3://noaahrrr-bdp-pds/hrrr.20220804/definitely_doesnt_exist_test/hrrr.t01z.wrfsfcf01.grib2"

@martindurant
Member

It may be a behaviour change in S3, but I suspect that the bucket does now exist, but we can't read from it, whereas before it didn't exist. Either way, changing the line in the test as I indicated will fix this.

@bamford
Contributor Author

bamford commented Dec 4, 2024

I've now got it working for a pile of fairly complicated, multi-extension fits files, and have run a quick test to show I can do efficient calculations over large numbers of files

Would love to see a notebook or any sharable material out of this!

Here is a gist of an example notebook. You won't be able to run it, but it illustrates what I needed to do to get a particular set of FITS files into a meaningful structure. _rename is a bit hacky (as is _fix_byte_order) and would be unnecessary with a bit more flexibility in kerchunk.fits.process_file.
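As a self-contained illustration of the on-disk layout that makes the byte-order fix necessary, the sketch below builds a tiny single-HDU FITS image by hand and reads it back. The `make_fits` helper is hypothetical (it is not kerchunk's code); the layout follows the FITS standard, where headers are 80-character cards padded to 2880-byte blocks and data is always big-endian:

```python
import numpy as np

def make_fits(data: np.ndarray) -> bytes:
    """Build a minimal single-HDU FITS image in memory.
    Hypothetical helper for illustration; not part of kerchunk."""
    cards = [
        "SIMPLE  =                    T",
        "BITPIX  =                  -32",   # 32-bit IEEE float data
        "NAXIS   =                    2",
        f"NAXIS1  = {data.shape[1]:>20}",
        f"NAXIS2  = {data.shape[0]:>20}",
        "END",
    ]
    # Header: 80-char cards, padded to one 2880-byte block.
    header = "".join(c.ljust(80) for c in cards).ljust(2880).encode("ascii")
    payload = data.astype(">f4").tobytes()        # big-endian on disk, per FITS
    payload += b"\x00" * (-len(payload) % 2880)   # pad data to a 2880-byte block
    return header + payload

data = np.arange(6, dtype="f4").reshape(2, 3)
blob = make_fits(data)

# Reading back with ">f4" (the dtype this PR fixes) recovers the values;
# a native little-endian "f4" would scramble them.
round_trip = np.frombuffer(
    blob[2880:2880 + data.nbytes], dtype=">f4"
).reshape(2, 3)
print(round_trip)
```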

See discussion in otherwise unrelated PR fsspec#531.
@bamford
Contributor Author

bamford commented Dec 4, 2024

It may be a behaviour change in S3, but I suspect that the bucket does now exist, but we can't read from it, whereas before it didn't exist. Either way, changing the line in the test as I indicated will fix this.

I've made this change, but not sure if you wanted me to fix it here. If not, I can revert.

@martindurant martindurant merged commit 013798e into fsspec:main Dec 5, 2024
5 checks passed