Time dimension in hybrid height #5369

Open
2 tasks
trexfeathers opened this issue Jul 6, 2023 · 9 comments

@trexfeathers
Contributor

trexfeathers commented Jul 6, 2023

✨ Feature Request

As described in the CF conventions.

Motivation

From @matthew-mizielinski. Considering how to represent hybrid height over glaciers, where the orography moves/changes over time.
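
For reference, the CF conventions (Appendix D) define the atmosphere hybrid height coordinate as below, where n is the time index. Because the orography term orog(n,j,i) is allowed to vary with n, a changing glacier surface fits the standard directly. Iris's HybridHeightFactory computes the equivalent quantity as delta + sigma * orography.

```latex
z(n,k,j,i) = a(k) + b(k)\,\mathrm{orog}(n,j,i)
```

Here a(k) and b(k) (delta and sigma in Iris terms) depend only on the model level k.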

Additional context

@pp-mo expects problems with FF or PP loading, as it would require a specific sequence of merge steps.


Tasks

  1. Type: Documentation
    pp-mo stephenworsley
  2. pp-mo stephenworsley
@trexfeathers
Contributor Author

@matthew-mizielinski has confirmed that data like this currently generates 1 Cube for each time point, rather than a single Cube with a time dimension.

@trexfeathers trexfeathers added this to the v3.11 milestone May 2, 2024
@trexfeathers
Contributor Author

trexfeathers commented May 2, 2024

It will be problematic for UK Met Office strategy (climate - IPCC) if this misses the 3.11 (October) release.

@stephenworsley
Contributor

I believe it is currently possible to construct a hybrid height coordinate that varies over time. What is not possible is to merge multiple 2D cubes with varying orographies together. This would require a substantial change to merge behaviour. I suspect this may be covered by #5375 which has been a particularly stubborn issue to untangle.
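
To illustrate the first point (a hybrid height that varies over time is representable once the orography itself carries a time dimension), the plain-Python sketch below applies altitude = delta + sigma * orography with a leading time axis. It does not use Iris; the function name hybrid_height and all the numbers are made up for illustration.

```python
# Toy illustration (plain Python, not Iris): derive hybrid-height altitude
# from per-level coefficients and a time-varying orography, following
# altitude = delta + sigma * orography.

def hybrid_height(delta, sigma, orography):
    """Return altitude[t][k][y][x] for orography[t][y][x].

    delta, sigma: per-level coefficients (one value per model level).
    orography: nested lists with a leading time dimension.
    """
    if len(delta) != len(sigma):
        raise ValueError("delta and sigma must have one value per level")
    return [
        [
            [[d + s * h for h in row] for row in orog_t]
            for d, s in zip(delta, sigma)
        ]
        for orog_t in orography
    ]


delta = [10.0, 100.0]  # level offsets in metres (made-up values)
sigma = [1.0, 0.5]     # terrain-following weights (made-up values)
# Orography for two years on a 1x2 grid: the glacier surface lowers in year 2.
orography = [
    [[1000.0, 500.0]],
    [[950.0, 500.0]],
]

altitude = hybrid_height(delta, sigma, orography)
# Shape is (time, level, y, x): the time dimension comes from the orography.
print(len(altitude), len(altitude[0]), len(altitude[0][0]), len(altitude[0][0][0]))
```

The result simply inherits the time dimension from the orography, which is the shape a single combined cube would need; the difficult part, as noted, is getting merge to build it.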

@trexfeathers
Contributor Author

@stephenworsley let us know what you need. If necessary we have a whole team of developers (given the strategic importance of this).

@matthew-mizielinski

Shout if a discussion on this would be useful -- I'm sure we can come up with a minimal test data set to work with.

@stephenworsley
Contributor

@matthew-mizielinski minimal test data would absolutely be appreciated, and yes, I think it would be good to set up a discussion when possible.

@stephenworsley
Contributor

One possible idea for resolving the merge issue:

Provide a keyword argument for the merge method, to which you can pass the name of an AuxCoord or a tuple of coord names. This tells merge which coordinates it ought to expand the dimensions of. Further information is likely to be required in the case where merge adds multiple dimensions, perhaps a tuple of dimension names along which to expand each AuxCoord. This keyword could also be passed down from the load function.

This approach shouldn't break existing functionality and should allow sufficient control of the merging process. I expect we may need to give some attention to AuxCoordFactory objects to make sure they behave sensibly during this process, since I'm not aware of any other functions which add a dimension to a coordinate that another coordinate is derived from, but I don't expect this to be too much of a problem.
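
To make the keyword idea more concrete, here is a toy model in plain Python, where cubes are dicts holding 2D data, a scalar time and named aux coords. The keyword name expand and the function toy_merge are hypothetical (the proposal does not fix a name), and the sketch ignores the AuxCoordFactory handling mentioned above.

```python
# Toy model of the proposed merge keyword (hypothetical; not the Iris API).
# A "cube" is a dict with 2D (y, x) data, a scalar time and named aux coords.

def toy_merge(cubes, expand=()):
    """Stack scalar-time cubes along a new leading time dimension.

    Aux coords named in `expand` are stacked along the new dimension too.
    Any other aux coord must be identical across cubes, otherwise we raise,
    mirroring the restriction that non-merged aux coords must match.
    """
    cubes = sorted(cubes, key=lambda c: c["time"])
    first = cubes[0]
    merged_aux = {}
    for name, value in first["aux"].items():
        values = [c["aux"][name] for c in cubes]
        if name in expand:
            # Hypothetical keyword: this coord gains the new time dimension.
            merged_aux[name] = (("time", "y", "x"), values)
        elif all(v == value for v in values):
            # Unexpanded coords must match exactly, as in a normal merge.
            merged_aux[name] = (("y", "x"), value)
        else:
            raise ValueError(f"aux coord {name!r} differs; cannot merge")
    return {
        "data": [c["data"] for c in cubes],
        "time": [c["time"] for c in cubes],
        "aux": merged_aux,
    }


def make_cube(time, orography):
    """Build a toy 2D cube with a scalar time and an orography coord."""
    return {"data": [[1.0, 2.0]], "time": time,
            "aux": {"surface_altitude": orography}}


cubes = [make_cube(12, [[950.0, 500.0]]), make_cube(0, [[1000.0, 500.0]])]

try:
    toy_merge(cubes)  # default behaviour: differing orography blocks merging
except ValueError as err:
    print("default merge failed:", err)

result = toy_merge(cubes, expand=("surface_altitude",))
print(result["time"], result["aux"]["surface_altitude"][0])
```

Without the keyword, the differing orography prevents the combination; naming surface_altitude lets it gain the new time dimension instead.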

@stephenworsley
Contributor

An alternative approach to explore could involve concatenating instead of merging, and using the new_axis utility to expand the dimensions of the orography coordinate appropriately. This ought to be enabled now via #4896, though I'm not sure how this handles derived coordinates.
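
The expansion step can be mimicked outside Iris. The toy below is modelled loosely on iris.util.new_axis, whose expand_extras argument (the facility this comment attributes to #4896) makes selected coords span the new length-1 dimension as well. The dict layout and the name toy_new_axis are hypothetical, and, as noted, derived-coordinate handling is not covered.

```python
# Toy analogue of new_axis with expand_extras, operating on plain dicts
# rather than Iris cubes (hypothetical layout; derived coords not handled).

def toy_new_axis(cube, scalar_coord="time", expand_extras=()):
    """Return a copy of `cube` with a new length-1 leading dimension.

    The scalar coord becomes a length-1 dimension coordinate, and each aux
    coord named in `expand_extras` is wrapped so it also spans that dimension.
    Other aux coords keep their original dimensions.
    """
    new_aux = {}
    for name, (dims, values) in cube["aux"].items():
        if name in expand_extras:
            new_aux[name] = ((scalar_coord,) + dims, [values])
        else:
            new_aux[name] = (dims, values)
    return {
        "dims": (scalar_coord,) + cube["dims"],
        "data": [cube["data"]],
        scalar_coord: [cube[scalar_coord]],
        "aux": new_aux,
    }


cube = {
    "dims": ("y", "x"),
    "data": [[1.0, 2.0]],
    "time": 0,
    "aux": {"surface_altitude": (("y", "x"), [[1000.0, 500.0]])},
}
promoted = toy_new_axis(cube, "time", expand_extras=("surface_altitude",))
print(promoted["dims"], promoted["aux"]["surface_altitude"][0])
```

Once every cube has been promoted like this, cubes that differ only in time (and in the expanded orography) become candidates for concatenation along the new dimension.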

@pp-mo
Member

pp-mo commented Sep 16, 2024

Some summary points from our offline discussion today (@pp-mo, @stephenworsley, @matthew-mizielinski)

Use case example

We investigated a specific use case which demonstrates the issue here.

  • monthly files spread across multiple years, so timepoints are monthly
  • each phenomenon (stash) has dimensions (time, model_level, y, x)
  • the orography is surface_altitude(time, y, x), and it changes each year (so it is the same for all 12 monthly points within a year)

We tried loading selected monthly files, e.g.
iris.load(['sep30', 'oct30', 'jan31'])  # imaginary monthly files (!)

  • The source PP fields (as seen from "load_raw") are, of course, 2D.
  • with normal load (i.e. not 'load_raw'), adjacent months produce a single phenomenon cube with a common 2D orography
  • a mixture of years produces a data-cube per year, and a single merged orography cube
  • in a normal (i.e. merge-processed) load which spans multiple years (but considering only one phenomenon for now)
    • there are multiple data cubes, each containing one year, with ...
      • a single scalar timepoint
      • an associated 2D orography ("surface_height") aux-coord (matching the year timepoint)
      • a 2D factory coord (not mapped to time)
    • a single orography cube, which has a time dimension, merged from all the timepoints

N.B. we have sample test data to demo this
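
The observed pattern can be reproduced with a toy grouping rule: data fields combine along time only when everything else matches, including the orography attached to each raw field, while the orography fields themselves carry no such reference. This is a simplified model for illustration, not Iris's merge implementation; the field names, month numbering (0 = January of the first year) and orography values are hypothetical.

```python
from collections import defaultdict


def toy_group_for_merge(fields):
    """Group raw 2D fields that a merge could combine along time.

    Fields only combine if everything except time matches, including the
    orography attached to them, so a different orography forces a new group.
    """
    groups = defaultdict(list)
    for field in fields:
        key = (field["name"], field["orography"])
        groups[key].append(field["time"])
    return {key: sorted(times) for key, times in groups.items()}


# Made-up orography per year on a 1x2 grid (year 1 has a lowered glacier).
orog_by_year = {0: (1000.0, 500.0), 1: (950.0, 500.0)}

fields = []
for month in (8, 9, 12):  # Sep and Oct of year 0, Jan of year 1
    year = month // 12
    # Each data field has its own year's orography attached at load time.
    fields.append({"name": "air_temperature", "time": month,
                   "orography": orog_by_year[year]})
    # Orography fields carry no orography reference, so they group together.
    fields.append({"name": "surface_altitude", "time": month,
                   "orography": None})

for (name, orog), times in toy_group_for_merge(fields).items():
    print(name, orog, times)
```

This yields one air_temperature group per year (September and October together, January on its own) and a single surface_altitude group spanning all three times.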

Solutions acceptable to the user

@matthew-mizielinski said, for his expected usage, it should be easy to identify what data suffers from the "missing merge" like this, and potentially add a specific load keyword as a "hint" (as suggested above), or call into a post-load adjustment utility.

Summary of findings regarding the existing code

  • we can see why it doesn't "just work", because merge cannot merge factories ...
  • ... and in any case, factory references are attached separately to each raw datacube,
    and always as a single, 2d field, since no merged orography is available at the "raw cube" stage
  • however, it appears that concatenate can now "merge factories": see here
  • likewise, the new_axis utility now has the ability to "promote" a scalar coord to a length-1 extra dimension, expanding a set of (user-specified) extra coords to span it: see here
  • contrary to @pp-mo's prior concerns, the relationship between raw orography fields and data fields is not obscure,
    since the orography info is all correctly labelled with timepoints matching the data.
    Hence, in the above use case, orography always loads as a single cube with a "complete" time dimension (unlike the data fields). Therefore it is not absolutely necessary to change the low-level loading mechanisms.
  • we are concerned that re-writing merge (or concatenate) to achieve this automatically would be very involved
    • although it seems logically feasible, since all the relevant metadata exists in the loaded data as we have it
    • ... however the code is very complex, and some previous attempts to extend it had to be abandoned due to unforeseen changes affecting backwards compatibility
    • so, it seems high-risk to propose a major overhaul which could make it even more complicated
  • it also seems hard to work out, automatically and in general, which coords should be merged to create an extra factory dimension
  • hence, a separate "additional" facility, with user-hint input, seems more likely to succeed

Possible solutions we can envisage

User presentation (API)

  1. a general, automatic fix to merge operations within loading (but see complexity objections, above)
  2. or a load (and/or merge/concatenate) keyword to enable the "extra" factory building on load
  3. or a post-load utility call.

In case (2) we might need to worry about selecting the correct cubes to work with in the 'additional' operation.
The general 'load+merge' behaviour can produce multiple cubes where one was expected if there is a small mismatch somewhere: in this case it could be hard to apply the 'additional' operation to the correct subsets.
But we can limit the expected results, e.g. only allow it in "load_cubes", where a single cube is expected from applying each provided constraint.
Likewise, a user-operated post-merge operation could be specified to work only with "suitable" data expected to produce a single result cube.

Calculation

(ignoring for now the "better general merge" approach, and looking for easy wins)

In general, we can solve merge/concat problems of this nature by

  1. either reducing all data to have a single point in the problem dimension, then merging everything
  2. or promoting single-point data to get a length-1 dimension, and concatenating everything

In this case, since we observe that concatenate can combine factories while merge cannot, it seems that (2) is probably easiest.

So it looks like a viable proof-of-concept solution could:

  1. accept a set of input cubes which (the user says) "ought" to merge into a single result,
    plus, probably, user-hints of which factory/coords to work on
  2. promote any cubes with scalar time to have a length-1 time dimension,
    - including the relevant factory and all the aux-coords which are its dependencies
  3. concatenate, expecting a single cube result
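
Steps 1 to 3 can be sketched as a plain-Python toy, with cubes as dicts and the names promote_time and concatenate_single made up for illustration; expand_extras plays the role of the user hint from step 1. In Iris, the corresponding building blocks would presumably be iris.util.new_axis (with expand_extras) for step 2 and CubeList.concatenate_cube(), which raises unless it can produce exactly one cube, for step 3. The toy does not rebuild the hybrid-height factory, which is the part concatenate is observed to handle.

```python
# Toy proof-of-concept of "promote scalar time, then concatenate"
# (plain Python on dict "cubes"; all names here are hypothetical).

def promote_time(cube, expand_extras):
    """Give a scalar-time cube a length-1 time dimension, expanding the
    named aux coords (e.g. the factory's orography dependency) to match."""
    if isinstance(cube["time"], list):
        # Assumed already promoted, with expand_extras coords spanning time.
        return cube
    aux = {
        name: ([values] if name in expand_extras else values)
        for name, values in cube["aux"].items()
    }
    return {"time": [cube["time"]], "data": [cube["data"]], "aux": aux}


def concatenate_single(cubes, expand_extras):
    """Join cubes along time and insist on exactly one result.

    Aux coords in `expand_extras` are joined along time; all others must be
    identical, otherwise we refuse rather than return several cubes.
    """
    cubes = sorted((promote_time(c, expand_extras) for c in cubes),
                   key=lambda c: c["time"][0])
    times = [t for c in cubes for t in c["time"]]
    if len(set(times)) != len(times):
        raise ValueError("overlapping time points; expected a single result")
    result = {"time": times,
              "data": [d for c in cubes for d in c["data"]],
              "aux": {}}
    for name in cubes[0]["aux"]:
        if name in expand_extras:
            result["aux"][name] = [v for c in cubes for v in c["aux"][name]]
        else:
            first = cubes[0]["aux"][name]
            if any(c["aux"][name] != first for c in cubes):
                raise ValueError(f"{name!r} differs; expected a single result")
            result["aux"][name] = first
    return result


cubes = [
    {"time": 12, "data": [[3.0, 4.0]],
     "aux": {"sigma": [0.5], "surface_altitude": [[950.0, 500.0]]}},
    {"time": 0, "data": [[1.0, 2.0]],
     "aux": {"sigma": [0.5], "surface_altitude": [[1000.0, 500.0]]}},
]
merged = concatenate_single(cubes, expand_extras=("surface_altitude",))
print(merged["time"], len(merged["aux"]["surface_altitude"]))
```

Refusing to return more than one result reflects the earlier point about limiting the operation to data that is expected to produce a single cube.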
