
combine_by_coords could use allclose instead of equal to compare coordinates #4465

Open
ghislainp opened this issue Sep 26, 2020 · 4 comments · May be fixed by #4467
Comments

@ghislainp (Contributor)

Is your feature request related to a problem? Please describe.

When a coordinate has slightly different values across datasets / netCDF files, combine_by_coords considers the coordinates to be different and attempts to concatenate along them.

Concretely, I produce netCDF files with (lat, lon, time) coordinates, one per year. Apparently lat is not the same in all the files (the difference is about 1e-14), which I suspect is due to different pyproj versions used to produce the lon/lat grid. Reprocessing all the annual netCDF files is not an option. When I use open_mfdataset on these files, the lat coordinate is concatenated, which leads to a MemoryError in my case.
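
For context, here is a minimal sketch of the kind of setup that triggers this (toy grid, variable names and sizes are made up, not the real data):

    import numpy as np
    import xarray as xr

    lat = np.linspace(-90, 90, 4)
    lon = np.linspace(0, 270, 4)

    ds_2018 = xr.Dataset(
        {"swe": (("lat", "lon"), np.zeros((4, 4)))},
        coords={"lat": lat, "lon": lon},
    )
    # "Same" grid computed with a different pyproj version: lat is off by ~1e-13.
    ds_2019 = xr.Dataset(
        {"swe": (("lat", "lon"), np.ones((4, 4)))},
        coords={"lat": lat + 1e-13, "lon": lon},
    )

    # The lat indexes are not exactly equal, so combine_by_coords treats them as
    # different coordinate values and tries to concatenate along lat instead of
    # merging. On this toy example that either fails or yields a doubled lat
    # dimension; on the real annual files opened with open_mfdataset it ends in
    # a MemoryError.
    xr.combine_by_coords([ds_2018, ds_2019])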

Describe the solution you'd like
Two options:

  • add a coord_tolerance argument to xr.combine_by_coords and use np.allclose to compare the coordinates. At line 69 of combine.py the comparison uses strict equality: "if not all(index.equals(indexes[0]) for index in indexes[1:]):". This does not break compatibility because coord_tolerance=0 would be the default. (A small sketch of the relaxed check follows this list.)

  • add an argument to explicitly list the coordinates NOT to concatenate. I tried to play with the coords argument to solve my problem, but was not successful.
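
To make the first option concrete, a quick comparison (toy values, outside of xarray) of the strict check versus an allclose-based one:

    import numpy as np
    import pandas as pd

    lat = np.linspace(-90, 90, 4)
    indexes = [pd.Index(lat), pd.Index(lat + 1e-13)]

    # Current check in combine.py: strict equality, so these count as different.
    all(index.equals(indexes[0]) for index in indexes[1:])    # False

    # Relaxed check: equal within an absolute tolerance.
    all(np.allclose(index, indexes[0], atol=1e-9, rtol=0)
        for index in indexes[1:])                             # True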

Describe alternatives you've considered

  • I could certainly find a workaround for this specific case, but I have often had issues with the magic in combine_by_coords, and in my opinion giving the user more control would be useful in general.

Additional context

@dcherian (Contributor)

You can totally skip this comparison by specifying join="override" (since lat, lon are "indexes"; the join kwarg controls how they are aligned; override will skip the comparison and just use the lat or lon from the first dataset).
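
For example (the file pattern here is made up), something along these lines should work today:

    import xarray as xr

    # join="override" copies lat/lon from the first file instead of comparing
    # them, so the ~1e-14 mismatch is ignored. "swe_*.nc" is a hypothetical
    # file pattern.
    ds = xr.open_mfdataset("swe_*.nc", combine="by_coords", join="override")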

Adding tolerance seems OK to me.

@shoyer (Member) commented Sep 26, 2020

See #2217 for related discussion about adding a tolerance parameter to alignment.

I think adding a tolerance parameter would be a welcome addition. It's not commutative, which is a little inelegant, but it's definitely a strict improvement over join='override'.

@ghislainp (Contributor, Author)

Interesting discussion in #2217. As far as I understand, it solves a wider problem than applying a tolerance in a member-by-member comparison... but it is more complex to implement.

Here is the simple solution that works for me (combine.py:70):

        if not (all(index.equals(indexes[0]) for index in indexes[1:])
                or (tolerance > 0
                    and all(index.is_numeric() for index in indexes)
                    and all(np.allclose(index, indexes[0], atol=tolerance, rtol=0)
                            for index in indexes[1:]))):

I haven't tested it in depth; I can make a PR with tests and docs if you agree with this solution.

Regarding the name of the argument: "tolerance" is nice, but allclose has atol and rtol. My solution above only sets atol, but both may be useful. Should we use:

  • 2 args: atol, rtol
  • 2 args: atolerance, rtolerance
  • 1 arg: tolerance, which could be a number (-> atol), a tuple interpreted as atol, rtol = tolerance, or a dict
  • ... ?

(A tiny sketch of the scalar-or-tuple option follows.)
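
Purely as an illustration of the single-argument option (none of these names exist in xarray), the scalar-or-tuple form could be unpacked like this:

    def _unpack_tolerance(tolerance):
        # Hypothetical helper: a bare number means atol only,
        # a 2-tuple means (atol, rtol).
        if isinstance(tolerance, tuple):
            atol, rtol = tolerance
        else:
            atol, rtol = tolerance, 0.0
        return atol, rtol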

@shoyer (Member) commented Sep 26, 2020

I would suggest only specifying tolerance as an absolute tolerance, which would match the argument name we have on the .sel() method.
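
For reference, the existing .sel() precedent (toy dataset):

    import numpy as np
    import xarray as xr

    ds = xr.Dataset(
        {"t": ("lat", np.zeros(3))},
        coords={"lat": [0.0, 45.0, 90.0]},
    )

    # .sel() already accepts a single absolute tolerance with method="nearest".
    ds.sel(lat=45.0 + 1e-9, method="nearest", tolerance=1e-6)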

Is there a case where you think rtol would be much better, or would let you do something that isn't possible with an absolute tolerance?

ghislainp linked a pull request (#4467) on Sep 27, 2020 that will close this issue.