Possible improvements in cube concatenation to be discussed with `iris` developers #1423

valeriupredoi · 2022-01-17T12:33:28Z

Hey @ESMValGroup/esmvaltool-coreteam - I got a reply on a long-forgotten issue about cube concatenation that I had opened in the SciTools GH repo (see SciTools/iris#3696 (comment)) and was wondering if we need to float any ideas about possible improvements to this functionality. Shout out to the iris folk, who are always improving their package based on our suggestions BTW 🍺

bouweandela · 2022-01-17T14:22:32Z

SciTools/iris#4446 seems a relevant issue, and I recently opened SciTools/iris#4453. Another issue that could be interesting is #1068, though that might be an issue with Dask rather than iris, not sure.

zklaus · 2022-01-17T15:54:26Z

I am rather confused about the original issue mentioned by @valeriupredoi. Basically, this is about what to do when concatenating two cubes that have an overlap in the concatenation dimension. But there really is no sensible default. The first three options that come to my mind are:

1. Prefer the earlier cube and throw away the overlapping part in the later cube
2. Prefer the later cube and throw away the overlapping part in the earlier cube
3. Average the cubes on the overlap

Instead of a simple average, any number of interpolation and mixing options are imaginable.

Point is, all of these options make sense in different situations, so what should a poor library do about it? I think Iris's behavior of not doing the concatenation is quite sensible.
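The three options listed above can be illustrated with a minimal sketch. It assumes a hypothetical stand-in for a cube (a list of `(coordinate point, value)` pairs along the concatenation dimension) and a made-up function name and policy names; none of this is the Iris API.

```python
# Hypothetical stand-in for a cube: (coordinate point, value) pairs along
# the concatenation dimension. This is NOT the Iris API.
Series = list[tuple[int, float]]

POLICIES = ("prefer_earlier", "prefer_later", "average")


def concatenate_with_overlap(earlier: Series, later: Series, policy: str) -> Series:
    """Join two series, resolving shared coordinate points according to `policy`."""
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy!r}")
    first, second = dict(earlier), dict(later)
    merged = {}
    for point in first.keys() | second.keys():
        if point in first and point in second:
            # Overlap: the three options give three different answers.
            if policy == "prefer_earlier":
                merged[point] = first[point]
            elif policy == "prefer_later":
                merged[point] = second[point]
            else:  # "average"
                merged[point] = (first[point] + second[point]) / 2
        elif point in first:
            merged[point] = first[point]
        else:
            merged[point] = second[point]
    # Return the result ordered along the concatenation dimension.
    return sorted(merged.items())


# Illustrative data only: the two series share the point 2015.
historical = [(2013, 1.0), (2014, 2.0), (2015, 3.0)]
scenario = [(2015, 5.0), (2016, 6.0)]
```

Each policy yields a different value at the shared point, which is the reason no single choice works as a default.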

valeriupredoi · 2022-01-18T12:50:46Z

I agree with Klaus. Shall we give people more time to chime in if they want to, say until the end of this week, then close both issues?

@zklaus ->

> I am rather confused about the original issue mentioned by @valeriupredoi.

That's an old issue, and I honestly can't remember what I was trying to get out of it. I believe I wrote a fix for us, and that's been used since. What matters here is Will's will (heh pun intended!) to make the concatenation a bit more flexible, and whether we want to send feedback on that 👍

schlunma · 2022-01-18T12:52:16Z

> I agree with Klaus

+1

zklaus · 2022-01-18T17:43:06Z

Yeah, what we did in #615 is force option two from my little list above.

valeriupredoi · 2022-01-19T15:45:00Z

aha! here's a relevant concatenation issue #932

WilliamIngramAtmosphericPhysics · 2022-01-31T15:43:29Z

> there really is no sensible default. The first three options that come to my mind are:
>
> 1. Prefer the earlier cube and throw away the overlapping part in the later cube
> 2. Prefer the later cube and throw away the overlapping part in the earlier cube
> 3. Average the cubes on the overlap

To me the sensible default seems

* Check the earlier & later cube agree within rounding error for the overlapping part. If so, prefer whichever is more convenient to code because it doesn't matter. If not, fail telling the user clearly how the cubes contradict each other.

But am I missing something?
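A minimal sketch of that default, assuming a hypothetical stand-in for a cube (a list of `(coordinate point, value)` pairs) and a made-up function name and tolerance defaults; it is not the Iris API.

```python
import math


def concatenate_if_consistent(earlier, later, rel_tol=1e-9, abs_tol=0.0):
    """Join two series, failing if they disagree on any shared coordinate point.

    `earlier` and `later` are lists of (coordinate point, value) pairs, a
    hypothetical stand-in for cubes. The tolerances are illustrative only.
    """
    first, second = dict(earlier), dict(later)
    conflicts = [
        (point, first[point], second[point])
        for point in sorted(first.keys() & second.keys())
        if not math.isclose(first[point], second[point],
                            rel_tol=rel_tol, abs_tol=abs_tol)
    ]
    if conflicts:
        # Tell the user exactly where and how the inputs contradict each other.
        details = ", ".join(f"{p}: {a} != {b}" for p, a, b in conflicts)
        raise ValueError(f"inputs disagree on the overlap at {details}")
    # The overlap agrees within tolerance, so it does not matter which copy wins.
    return sorted({**first, **second}.items())
```

Note that the check has to compare every value on the overlap, so its cost grows with the size of the overlapping region.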

zklaus · 2022-01-31T16:08:33Z

> > there really is no sensible default. The first three options that come to my mind are:
> >
> > 1. Prefer the earlier cube and throw away the overlapping part in the later cube
> > 2. Prefer the later cube and throw away the overlapping part in the earlier cube
> > 3. Average the cubes on the overlap
>
> To me the sensible default seems
>
> * Check the earlier & later cube agree within rounding error for the overlapping part. If so, prefer whichever is more convenient to code because it doesn't matter. If not, fail telling the user clearly how the cubes contradict each other.
>
> But am I missing something?

That seems to be an extraordinarily costly operation that still would only hide a manifest error in the input data. Perhaps I should clarify a little bit the use-cases that I/we have in mind.

Generally speaking, they have to do with different experiments that are direct continuations of each other. The first that comes to mind is a historical simulation, i.e. one that uses observations as forcings for a past period, that serves as the starting point for a scenario, i.e. a climate simulation of a future time period that is driven by best-estimate forcings. In this situation, it is fairly common to later extend the historical experiment, which creates the overlap; it is then interesting to compare the extension with the earlier scenario.
Another situation arises at the beginning of these historical simulations. They frequently take their initial conditions from long-running, constant-forcing, so-called control simulations. These control simulations usually run alongside the historical simulations to provide a baseline comparison, again creating an overlap. In this situation, it is interesting to compare time series that are stitched together at different points in time.

Many different situations are imaginable, but I struggle to come up with one, where it is expected that the time series actually agree on the overlap.

WilliamIngramAtmosphericPhysics · 2022-01-31T17:42:46Z

> Perhaps I should clarify a little bit the use-cases that I/we have in mind.

Ah, thanks. They had not occurred to me because they are not concatenating the 2 cubes as I understand the words. What I would say you want to do is to concatenate the earlier part of the cube of data from the original run with the cube of data from the spun-off run.
So the transparent way to do what you want seemed immediately to me to take a slice in time (or in principle any dimension) to remove the data you do not want to concatenate, & concatenate what you do want.
But OK, while it is very easy to take a slice like that if you know which dimension is where, & how much you want to keep or to discard, it may not always be simple to establish that, so there may be good reason to allow users to leave it to the code.
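The slice-then-concatenate approach can be sketched as follows. This assumes a hypothetical stand-in for a cube (a list of `(coordinate point, value)` pairs), a coordinate that increases along the concatenation dimension, and made-up names and data; it is not the Iris API.

```python
def keep_before(series, cutoff):
    """Keep only the points strictly before `cutoff` (assumes increasing points)."""
    return [(point, value) for point, value in series if point < cutoff]


# Illustrative data only: the spun-off run branches from the control run at
# point 2, and the control run keeps going afterwards.
control = [(0, 10.0), (1, 10.1), (2, 10.2), (3, 10.3)]
spun_off = [(2, 10.2), (3, 11.0), (4, 12.0)]

# Explicitly discard the part of the control run that is not wanted,
# then join what remains with the spun-off run.
branch_point = spun_off[0][0]
joined = keep_before(control, branch_point) + spun_off
```

In Iris itself the slicing step would presumably be a `cube.extract(...)` with an `iris.Constraint` on the concatenation coordinate, followed by an ordinary concatenation; the exact constraint depends on the data, so it is not shown here.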

Still, IMO apparently-valid data should not be discarded without explicit instructions. So I'd suggest adding arguments "discard_from_1st" & "discard_from_2nd". If one was set, overlap would be accepted & data discarded as stated. If both, it would fail saying why. If neither, & it found overlap, it should fail or do what I previously suggested. (The latter sounds better to me, but I have absolutely no use case.)
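A minimal sketch of how those two arguments might behave, using a hypothetical stand-in for a cube (a list of `(coordinate point, value)` pairs). The function name is made up, and for the "neither flag set" case the sketch assumes the simpler of the two alternatives (fail on overlap); it is not the Iris API.

```python
def concatenate_pair(first, second, discard_from_1st=False, discard_from_2nd=False):
    """Join two series; overlapping data is only dropped on explicit request."""
    if discard_from_1st and discard_from_2nd:
        raise ValueError("set only one of discard_from_1st and discard_from_2nd")
    a, b = dict(first), dict(second)
    overlap = a.keys() & b.keys()
    if overlap and not (discard_from_1st or discard_from_2nd):
        # Assumption: fail rather than compare values on the overlap.
        raise ValueError(
            f"inputs overlap at {sorted(overlap)}; "
            "set discard_from_1st or discard_from_2nd to proceed"
        )
    if discard_from_1st:
        merged = {**a, **b}  # values from `second` win on the overlap
    else:
        merged = {**b, **a}  # values from `first` win on the overlap
    return sorted(merged.items())
```

Without overlap both flags are irrelevant and the call is a plain join, so existing non-overlapping use would be unaffected.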

> Many different situations are imaginable, but I struggle to come up with one, where it is expected that the time series actually agree on the overlap.

The ones I thought of in response to what you said do (apart possibly from rounding error) - & have only a "technical" overlap of 1 timelevel. I imagined e.g. breaking down big datasets to process a year/decade/century at a time, but doing each completely, so each cube had, say, 00Z on New Year's Day & 24:00Z on New Year's Eve; or spinning off an instantaneous CO2 doubling from a control run, creating its cubes including its starting-point as physically necessary for analysis, & then having reason to add earlier decades/centuries. I suppose anything longer would not have seemed "concatenation" to me.

WilliamIngramAtmosphericPhysics · 2022-02-01T15:18:42Z

> I'd suggest adding arguments "discard_from_1st" & "discard_from_2nd".

I was forgetting one can concatenate more than 2 cubes!

So "discard_from_earlier_in_list" & "discard_from_later_in_list" seem right to me - not earlier or later in time, as it should work for all coordinates, & also the user may not know or want to know if the coordinate values increase or decrease (they may not even have the physical direction they expect, e.g. pressure v height), while they will normally know what order they're specifying the cubes they want to combine. (OK, if they want the 2nd cube to take priority over both the 1st & 3rd, say, they'd have to concatenate twice.)

But as always, I may be missing something.

zklaus · 2022-02-01T15:40:26Z

The issue with that approach would be that the order of the cubes in the list can be changed on concatenation, precisely to line all the (e.g.) times up correctly. Overall, I think this is all more hassle than it's worth since, as you correctly mention, slicing the cubes beforehand appropriately is the right and explicit approach.

I do think that in downstream (from Iris) projects (such as ESMValTool) corresponding helper functions can evolve. If they turn into something sufficiently general, we might propose it for inclusion in a later version of Iris.

WilliamIngramAtmosphericPhysics · 2022-02-02T11:16:38Z

> The issue with that approach would be that the order of the cubes in the list can be changed on concatenation,

To me that seems an advantage - you can get any cube to over-ride any cube just by specifying them in order of importance, & only 1 of the 2 arguments I suggested is needed.
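As a rough sketch of that reading, with the list order used purely as a priority order: the stand-in for a cube (a list of `(coordinate point, value)` pairs) and the function name are hypothetical, and this is not the Iris API.

```python
def concatenate_by_priority(series_in_priority_order):
    """Join any number of series; on overlap, the series listed first wins."""
    merged = {}
    for series in series_in_priority_order:
        for point, value in series:
            # Only fill points that no higher-priority series has provided.
            merged.setdefault(point, value)
    # The output is ordered by coordinate, independent of the input order.
    return sorted(merged.items())


# Illustrative data only: `b` overlaps `a` at point 2 and `c` at point 3.
a = [(1, 1.0), (2, 2.0)]
b = [(2, 20.0), (3, 30.0)]
c = [(3, 300.0), (4, 400.0)]

# Listing `b` first lets it override both `a` and `c` in a single call.
result = concatenate_by_priority([b, a, c])
```

Because the result is sorted by coordinate at the end, the order in which the series are passed only decides who wins on an overlap, not where the data ends up.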

> Overall, I think this is all more hassle than it's worth

:-)

zklaus · 2022-02-02T11:20:25Z

Sorry for being unclear. What I meant is that Iris already re-orders the cubes on concatenate as it deems appropriate. As such, it is not clear that there is a good way to take any prior order into account.

WilliamIngramAtmosphericPhysics · 2022-02-02T13:23:56Z

I don't think you were at all unclear, but I must have been - what I was suggesting is that if the user knows or suspects there is overlap, & wants concatenation to go ahead, & knows which cubes they want data to be discarded from, they could specify the cubes in "priority" order & set the flag to say so.

But as you say, it may be best to forget the idea.

zklaus · 2022-02-02T19:25:22Z

Don't get me wrong, I think the topic deserves addressing, and users would benefit from help in realizing different use-cases. I just wouldn't bake it into the main `CubeList.concatenate` function at this point, but rather draw up a few concrete use-cases and implement support for them in another place; perhaps a separate function in Iris, perhaps in a tool like ESMValTool, perhaps somewhere else.

valeriupredoi · 2022-04-28T14:07:47Z

here's another facet to this general issue SciTools/iris#4720

valeriupredoi added the `question` label Jan 17, 2022

rcomer mentioned this issue Mar 14, 2022

Cube list concatenation oddity SciTools/iris#3696

Closed

zklaus mentioned this issue May 9, 2022

Tracing a file's ancestry #1572

Open

dhohn mentioned this issue Feb 23, 2024

concatenate experiments handles time ordering incorrectly #2342

Closed
