Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Case insensitive lenient special name support? #4453

Closed
bouweandela opened this issue Dec 7, 2021 · 11 comments
Closed

Case insensitive lenient special name support? #4453

bouweandela opened this issue Dec 7, 2021 · 11 comments

Comments

@bouweandela
Copy link
Member

✨ Feature Request

Motivation

It was recently reported to me by @axel-lauer that he experienced frustration with using the iris library via ESMValTool because it failed to merge some cubes because some had a longitude coordinate with long_name "Longitude" (the CMIP6 standard), while others had a longitude coordinate with long_name "longitude" (the CMIP5 standard).

Would this already be solved by #4446 or would some additional changes be needed, such as changing special lenient name behaviour such that is case insensitive (if that doesn't conflict with the CF Conventions)?

@rcomer
Copy link
Member

rcomer commented Dec 7, 2021

Hi @bouweandela, would those coordinates also have standard_names? If so, I think the lenient comparison would ignore the long_name clash.

@bouweandela
Copy link
Member Author

Yes, they would have standard_names, so that sounds promising!

@rcomer rcomer closed this as completed Dec 7, 2021
@rcomer rcomer reopened this Dec 7, 2021
@rcomer
Copy link
Member

rcomer commented Dec 7, 2021

Sorry, finger slipped!

@bjlittle bjlittle self-assigned this Dec 15, 2021
@rcomer
Copy link
Member

rcomer commented Dec 15, 2021

It seems I had misremembered in the above. The mismatched long_names do cause an error for lenient arithmetic (with Iris3.1):

import iris
import iris.cube
import iris.coords

coord1 = iris.coords.DimCoord(range(5), standard_name="longitude", long_name="Longitude")
coord2 = iris.coords.DimCoord(range(5), standard_name="longitude", long_name="longitude")

cube1 = iris.cube.Cube(range(5), dim_coords_and_dims=[(coord1, 0)])
cube2 = iris.cube.Cube(range(5), dim_coords_and_dims=[(coord2, 0)])

cube1 - cube2
ValueError: Insufficient matching coordinate metadata to resolve cubes, cannot map dimension (0,) of the RHS cube 'unknown' to the LHS cube 'unknown'.

You can have mismatched var_names though:

coord1.var_name = "foo"
coord3 = coord1.copy()
coord3.var_name = "bar"

cube3 = iris.cube.Cube(range(5), dim_coords_and_dims=[(coord3, 0)])

print(cube1 - cube3)
unknown / (unknown)                 (longitude: 5)
    Dimension coordinates:
        longitude                             x

@bjlittle
Copy link
Member

bjlittle commented Dec 15, 2021

For equality, lenient behaviour is really only your friend in the cases outlined here i.e., when you're comparing two things, and one thing has a value and the other not (i.e., "something" vs None)

The only other special treatment is with regards to var_name (currently).

That all said, for strict/lenient equality a long_name="latitude" compared with long_name="Latitude" will always be False, as you're comparing "something" vs "something".

Yes, we could say, well for lenient behaviours make str equality case insensitive, which is ideal to adhere to the CMIP6 standard, but on the flip side it then breaks someone else's workflow where they do actually care about long_name="longitude" vs long_name="LONGITUDE" being different (don't know why, but they might)... unfortunately, there is just no way for us to know this; it's incredibly local and specific to users, their data and their workflow i.e., different people care about different things (and that's totally okay).

IMHO adding baked in local exceptions to iris really is a slippy slope from a maintenance perspective, and I'd be keen to avoid it.

In summary, without thinking too deeply about this, the obvious options are:

  1. either iris opens up the ability for users to define generic behavious to be applied during lenient operations at the API level,
  2. users align metadata values beforehand in the knowledge of how iris strict and lenient operations behave, or
  3. allow users that want "special" lenient treatment, a hook to monkey-patch the local behaviour that they want and need for their workflow

Personally, knowing what I know, I'm attracted to option [3] from the perspective that it's completely achievable with very little development effort. This is compared to option [1] which is a HUGE development undertaking, and option [2] which is a HUGE inconvenience for users.

Also, going back to your question @bouweandela, as it stands, no #4446 wouldn't help you alone here... however, it certainly would help you along with option [3].

@bjlittle
Copy link
Member

bjlittle commented Dec 15, 2021

Okay, I'm going to proof-of-concept option [3] (above) in the context of your use case and see how it hangs together.

This'll hopefully give us a concrete strawman (and hopefully proposal) of how we might be able to move forwards with this issue.

@bjlittle bjlittle pinned this issue Dec 15, 2021
@rcomer
Copy link
Member

rcomer commented Dec 15, 2021

I like the idea of option 3 from the user POV too. Could it work similarly to the FUTURE flag? I.e. I can set it once at the top of my script so it modifies all behaviour in my script, or I can choose to use it as a context manager to just modify the behaviour in this one section.

@bjlittle
Copy link
Member

I like the idea of option 3 from the user POV too. Could it work similarly to the FUTURE flag? I.e. I can set it once at the top of my script so it modifies all behaviour in my script, or I can choose to use it as a context manager to just modify the behaviour in this one section.

That would be lovely, right? 🤔

I think there are at least two threads here:

  • does the option [3] mechanism work, and is it palatable to both developers and users alike
  • how will it be used, and how do we best exercise control of when these "special" behaviours kick-in, so to speak

@bouweandela
Copy link
Member Author

bouweandela commented Feb 14, 2022

Thanks for looking into this @bjlittle and @rcomer and my apologies for being slow to reply. It sounds like option 3 is as good as it gets. I personally feel that being super strict on things is what scares scientists off from using iris, but maybe the iris library is not the place to fix that and we need we need libraries like esmvalcore to carry out all the homogenising of metadata before passing things on to iris. I do try to report the more generic things back to iris, that's why i opened this issue.

Another example of super strict behaviour that only makes sense to programmers and not to climate scientists is refusing to merge floating point coordinates that only differ by 1 ULP.

@bouweandela
Copy link
Member Author

bouweandela commented Feb 14, 2022

Thinking since more about option 3: does having this option simplify the user's code in practice? It is fairly straightforward to write code to homogenise the capitalisation of coordinate names, it's just a bit unexpected that this is needed. Do the CF conventions have anything to say on whether or not variable names are case sensitive?

@ESadek-MO
Copy link
Contributor

We've discussed this as a team @SciTools/peloton and feel this is a case for user workaround. This is beyond the intended scope of lenient operations.

@ESadek-MO ESadek-MO closed this as not planned Won't fix, can't repro, duplicate, stale May 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

No branches or pull requests

6 participants