Fix: Flip big endian arrays before concatenation #1068

markelg · 2021-04-07T16:23:38Z

Description

This fix is needed to support the CCLM CORDEX model data as available in ESGF and in the CDS. Endianness is not consistent between files, which raises an error when concatenating. The best solution we found is to look for the ">" that flags big-endian datatypes and, if found, call the numpy methods byteswap and newbyteorder to reverse the endianness of the underlying arrays.
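To illustrate the idea, here is a minimal numpy-only sketch, not the code from this PR. The helper name `to_native_byteorder` is hypothetical, and it checks `dtype.isnative` rather than looking for ">" as described above. It also uses `view` with `dtype.newbyteorder()`, because the `ndarray.newbyteorder` method was removed in NumPy 2.0.

```python
import numpy as np


def to_native_byteorder(array):
    """Return `array` in native byte order, leaving the values unchanged.

    Hypothetical helper for illustration; not part of ESMValCore.
    """
    if array.dtype.isnative:
        return array
    # Swap the bytes in memory, then relabel the dtype with the opposite
    # byte order so the values are interpreted exactly as before.
    return array.byteswap().view(array.dtype.newbyteorder())


big = np.array([1.0, 2.5, -3.0], dtype=">f4")
little = np.array([1.0, 2.5, -3.0], dtype="<f4")

# Same values, but the dtypes compare unequal, which is the kind of
# mismatch a strict dtype check would trip over.
dtypes_differ = big.dtype != little.dtype

fixed = to_native_byteorder(big)
```

After the call, `fixed` holds the same values as `big` but reports a native dtype, so arrays coming from files with different endianness end up with matching dtypes.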

Link to documentation:

Before you get started

☝ Create an issue to discuss what you are going to do

Checklist

It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the 🛠 Technical or 🧪 Scientific review.

🧪 The new functionality is relevant and scientifically sound
🛠 This pull request has a descriptive title and labels
🛠 Code is written according to the code quality guidelines
🧪 and 🛠 Documentation is available
🛠 Unit tests have been added
🛠 Changes are backward compatible
🛠 Any changed dependencies have been added or removed correctly
🛠 The list of authors is up to date
🛠 All checks below this pull request were successful

valeriupredoi

Cheers for the code changes! Please see a couple of comments from me. Could I also ask you to write a unit test for the new function? Cheers 🍺

valeriupredoi · 2021-04-19T11:09:50Z

esmvalcore/preprocessor/_io.py

+        if cube.dtype.byteorder == ">":
+            logger.warning("Changing cube endianess to little. This may be "
+                           "memory intensive.")
+            cube.data = cube.data.byteswap().newbyteorder()


This is a flat-out data realization in memory. Could you try exploring the use of core_data() instead of data, please? That way we keep things lazy.
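A minimal sketch of the lazy variant suggested here, assuming the lazy data is a dask array (which is what iris hands back from core_data() for unrealized data). The helper name `to_native_lazy` is hypothetical and not taken from the PR. The conversion is applied block by block with `map_blocks`, so nothing is loaded into memory until `compute()` is called.

```python
import dask.array as da
import numpy as np


def to_native_lazy(lazy):
    """Return a lazy array in native byte order without computing it.

    Hypothetical helper for illustration; not part of ESMValCore.
    """
    if lazy.dtype.isnative:
        return lazy
    native = lazy.dtype.newbyteorder("=")
    # Each block is converted only when the graph is eventually computed.
    return lazy.map_blocks(lambda block: block.astype(native), dtype=native)


source = np.arange(6, dtype=">f4")
lazy_big = da.from_array(source, chunks=3)
lazy_fixed = to_native_lazy(lazy_big)
```

In an iris setting one would presumably pass `cube.core_data()` to such a helper and assign the result back to `cube.data`, which keeps the cube lazy.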

valeriupredoi · 2021-04-19T12:14:30Z

Oh and one more thing - could you please open a PR directly into the ESMValCore repo rather than using a fork of the repo? Testing is better that way 👍

markelg · 2021-05-26T07:07:59Z

Thank you @valeriupredoi Shall I write the test in tests/integration/._io? I don't see tests for this module in the tests/unit

Oh and one more thing - could you please open a PR directly into the ESMValCore repo rather than using a fork of the repo? Testing is better that way 👍

OK, I did not know this was possible.

zklaus · 2021-05-26T11:14:55Z

Does the endianness extend to the coordinates as well? What about scalar coordinates or cell measures?

valeriupredoi · 2021-05-26T11:31:58Z

Shall I write the test in tests/integration/._io? I don't see tests for this module in the tests/unit

Maybe you can put this utility function in esmvalcore/preprocessor/_other.py and add the test to tests/unit/preprocessor/_other/test_other.py? I am not 100% sure it warrants a place in the IO module, since it's not something that gets used very often. Up to you.

markelg · 2021-05-26T18:01:27Z

Does the endianness extend to the coordinates as well? What about scalar coordinates or cell measures?

Yes, it does. I did not take that into account, since it is the variable that raises the error; _io_concatenate seems to handle the coordinates well. Maybe the function should flip all the arrays in cube.coords() too, which would be more consistent, but I don't know how to do that, since I don't see a setter for the cube coords. I have now solved it for the data, including the lazy case.
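For the coordinate question, a rough sketch of what flipping points and bounds could look like. To keep it self-contained it uses a hypothetical `StandInCoord` dataclass instead of a real iris coordinate; with iris one would presumably loop over `cube.coords()` and assign to each coordinate's `points` and `bounds` in the same way. All names below are assumptions for illustration, not the code in this PR.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class StandInCoord:
    """Hypothetical stand-in for a coordinate with points and optional bounds."""

    points: np.ndarray
    bounds: Optional[np.ndarray] = None


def to_native(array):
    """Convert an array to native byte order, keeping its values."""
    if array.dtype.isnative:
        return array
    return array.astype(array.dtype.newbyteorder("="))


def fix_coord_byteorder(coord):
    """Flip the points and, when present, the bounds of one coordinate."""
    coord.points = to_native(coord.points)
    if coord.bounds is not None:
        coord.bounds = to_native(coord.bounds)
    return coord


lat = StandInCoord(
    points=np.array([10.0, 20.0], dtype=">f8"),
    bounds=np.array([[5.0, 15.0], [15.0, 25.0]], dtype=">f8"),
)
# A scalar coordinate typically has a single point and no bounds.
height = StandInCoord(points=np.array([2.0], dtype=">f8"))

for coord in (lat, height):
    fix_coord_byteorder(coord)
```

The same loop would extend to cell measures if they expose their array through a settable attribute; that part is left out here because it depends on the iris API.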

zklaus · 2021-05-27T08:21:04Z

Nice that you addressed the lazy thing. Could you push the related commit and we can continue the discussion from there?

markelg · 2021-05-27T09:38:21Z

I think it is almost ready. Please have a look. I managed to fix the coordinates and bounds too. Apart from the unit test, I tested it with the CORDEX data that were causing the error and it worked fine.

zklaus · 2021-05-27T09:44:48Z

Nice and good dig! Depending on your ambition, you might want to consider adding this functionality to dask itself by addressing the issue you linked. Then we could do away with the type detection workaround that you had to put in place here.

Do we need to take cell_measures into account as well? This would typically play a role for areacella or similar and is quite important particularly for regional grids since the cell size calculations for projections are not trivial in general.

bouweandela · 2021-07-30T14:33:36Z

Endianness between files is not consistent, and this raises an error when concatenating.

I think it would be good to open an issue about this in iris and see if it can be solved there.

valeriupredoi requested changes Apr 19, 2021

markelg and others added 2 commits May 26, 2021 08:46

Update esmvalcore/preprocessor/_io.py

cc0a2b5

Docstring fix Co-authored-by: Valeriu Predoi <valeriu.predoi@gmail.com>

Merge branch 'master' into fix_endianness_before_concatenation

c13f3fe

moved endianness fix function to _other.py and added unit test.

d07e3d1

The function that fixes the endianness now takes also coords and boun…

35763b4

…ds into account.

francesco-cmcc mentioned this pull request Sep 9, 2021

ESMValCore Preprocessor: fixing unmatching attributes in pairwise concatenation. #1311

Closed

10 tasks

bouweandela mentioned this pull request Jan 17, 2022

Possible improvements in cube concatenation to be discussed with iris developers #1423

Open

bouweandela mentioned this pull request Jun 1, 2023

Dataset problem: CLMcom-CCLM4-8-17 has big endian data #2057

Open
