-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix: Flip big endian arrays before concatenation #1068
base: main
Are you sure you want to change the base?
Fix: Flip big endian arrays before concatenation #1068
Conversation
…ilable in ESGF and in the CDS. Endianness between files is not consistent, and this raises an error when concatenating. The best way to solve this we saw is to look for the ">" that flags big endian datatypes and, if found, call numpy methods byteswap and newbyteorder to reverse the endianness of the underlying arrays."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cheers for the code changes! Please see a couple comments from me, also could I ask you to please write a unit test for the new function? Cheers 🍺
esmvalcore/preprocessor/_io.py
Outdated
if cube.dtype.byteorder == ">": | ||
logger.warning("Changing cube endianess to little. This may be " | ||
"memory intensive.") | ||
cube.data = cube.data.byteswap().newbyteorder() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a flat-out data realization in memory, could you try exploring the use of core_data()
instead of data
please? That way we keep things lazy
Oh an one more thing - could you please open a PR directly into the ESMValCore repo rather than using a fork to the repo? Testing is better that way 👍 |
Docstring fix Co-authored-by: Valeriu Predoi <valeriu.predoi@gmail.com>
Thank you @valeriupredoi Shall I write the test in tests/integration/._io? I don't see tests for this module in the tests/unit
OK, I did not know this was possible. |
Does the endianness extend to the coordinates as well? What about scalar coordinates or cell measures? |
Maybe you can put this utility function in |
Yes, it does extend. I did not take that into account since it is the variable the one that is raising the error. _io_concatenate seems to handle well the coordinates. Now, maybe the function should flip all the arrays in cube.coords() too, this would be more consistent, but I don't now how to do that, since I don't see a setter for the cube coords. I have it now solved for the data, also for the lazy case. |
Nice that you addressed the lazy thing. Could you push the related commit and we can continue the discussion from there? |
I think it is almost ready. Please have a look. I managed to fix the coordinates and bounds too. Apart from the unit test, I tested in with the CORDEX data that were causing the error and it worked fine. |
Nice and good dig! Depending on your ambition, you might want to consider adding this functionality to dask itself by addressing the issue you linked. Then we could do away with the type detection workaround that you had to put in place here. Do we need to take cell_measures into account as well? This would typically play a role for |
I think it would be good to open an issue about this in iris and see if can be solved there. |
Description
This fix is needed to support the CCLM CORDEX model data as it is available in ESGF and in the CDS. Endianness between files is not consistent, and this raises an error when concatenating. The best way to solve this we saw is to look for the ">" that flags big endian datatypes and, if found, call numpy methods byteswap and newbyteorder to reverse the endianness of the underlying arrays.
Link to documentation:
Before you get started
Checklist
It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the 🛠 Technical or 🧪 Scientific review.