Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: Flip big endian arrays before concatenation #1068

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

markelg
Copy link

@markelg markelg commented Apr 7, 2021

Description

This fix is needed to support the CCLM CORDEX model data as it is available in ESGF and in the CDS. Endianness between files is not consistent, and this raises an error when concatenating. The best way to solve this we saw is to look for the ">" that flags big endian datatypes and, if found, call numpy methods byteswap and newbyteorder to reverse the endianness of the underlying arrays.

Link to documentation:


Before you get started

Checklist

It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the 🛠 Technical or 🧪 Scientific review.


…ilable in ESGF and in the CDS. Endianness between files is not consistent, and this raises an error when concatenating. The best way to solve this we saw is to look for the ">" that flags big endian datatypes and, if found, call numpy methods byteswap and newbyteorder to reverse the endianness of the underlying arrays."
Copy link
Contributor

@valeriupredoi valeriupredoi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cheers for the code changes! Please see a couple comments from me, also could I ask you to please write a unit test for the new function? Cheers 🍺

esmvalcore/preprocessor/_io.py Outdated Show resolved Hide resolved
if cube.dtype.byteorder == ">":
logger.warning("Changing cube endianess to little. This may be "
"memory intensive.")
cube.data = cube.data.byteswap().newbyteorder()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a flat-out data realization in memory, could you try exploring the use of core_data() instead of data please? That way we keep things lazy

@valeriupredoi
Copy link
Contributor

Oh an one more thing - could you please open a PR directly into the ESMValCore repo rather than using a fork to the repo? Testing is better that way 👍

markelg and others added 2 commits May 26, 2021 08:46
Docstring fix

Co-authored-by: Valeriu Predoi <valeriu.predoi@gmail.com>
@markelg
Copy link
Author

markelg commented May 26, 2021

Thank you @valeriupredoi Shall I write the test in tests/integration/._io? I don't see tests for this module in the tests/unit

Oh an one more thing - could you please open a PR directly into the ESMValCore repo rather than using a fork to the repo? Testing is better that way +1

OK, I did not know this was possible.

@zklaus
Copy link

zklaus commented May 26, 2021

Does the endianness extend to the coordinates as well? What about scalar coordinates or cell measures?

@valeriupredoi
Copy link
Contributor

Shall I write the test in tests/integration/._io? I don't see tests for this module in the tests/unit

Maybe you can put this utility function in esmvalcore/preprocessor/_other.py and add the test to tests/unit/preprocessor/_other/test_other.py? I am not 100% sure it warrants itself inside the IO module since it's not something that gets used very often? Up to you -

@markelg
Copy link
Author

markelg commented May 26, 2021

Does the endianness extend to the coordinates as well? What about scalar coordinates or cell measures?

Yes, it does extend. I did not take that into account since it is the variable the one that is raising the error. _io_concatenate seems to handle well the coordinates. Now, maybe the function should flip all the arrays in cube.coords() too, this would be more consistent, but I don't now how to do that, since I don't see a setter for the cube coords. I have it now solved for the data, also for the lazy case.

@zklaus
Copy link

zklaus commented May 27, 2021

Nice that you addressed the lazy thing. Could you push the related commit and we can continue the discussion from there?

@markelg
Copy link
Author

markelg commented May 27, 2021

I think it is almost ready. Please have a look. I managed to fix the coordinates and bounds too. Apart from the unit test, I tested in with the CORDEX data that were causing the error and it worked fine.

@zklaus
Copy link

zklaus commented May 27, 2021

Nice and good dig! Depending on your ambition, you might want to consider adding this functionality to dask itself by addressing the issue you linked. Then we could do away with the type detection workaround that you had to put in place here.

Do we need to take cell_measures into account as well? This would typically play a role for areacella or similar and is quite important particularly for regional grids since the cell size calculations for projections are not trivial in general.

@bouweandela
Copy link
Member

Endianness between files is not consistent, and this raises an error when concatenating.

I think it would be good to open an issue about this in iris and see if can be solved there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants