Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change an == to an is. Fix tests so that this won't happen again. #2648

Merged
merged 2 commits into from
Jan 5, 2019

Conversation

WeatherGod
Copy link
Contributor

@WeatherGod WeatherGod changed the title Change an == to an is. Fix tests so that this won't happen again. WIP: Change an == to an is. Fix tests so that this won't happen again. Jan 4, 2019
@@ -606,7 +606,7 @@ def open_mfdataset(paths, chunks=None, concat_dim=_CONCAT_DIM_DEFAULT,
# Coerce 1D input into ND to maintain backwards-compatible API until API
# for N-D combine decided
# (see https://github.com/pydata/xarray/pull/2553/#issuecomment-445892746)
if concat_dim is None or concat_dim == _CONCAT_DIM_DEFAULT:
if concat_dim is None or concat_dim is _CONCAT_DIM_DEFAULT:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure we want this to be an is test? Are we sure that the keyword argument will default to that string object?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we use that as the default value here.

Possibly a better choice would be to use something like ReprObject('<inferred>') for the sentinel object, e.g., as done here:

NA = utils.ReprObject('<NA>')

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess what I am getting at is a matter of user-friendliness. A user looking up the call signature of this method will see the default as the string (and not an object), but, they'll never be able to explicitly force concat dimension inference, because passing the string themselves will not work (it'll be a different object).

@shoyer
Copy link
Member

shoyer commented Jan 4, 2019 via email

@WeatherGod
Copy link
Contributor Author

ok, so we use the ReprObject for the default, and then test if concat_dim is of type `ReprObject and then test its equivalance?

@shoyer
Copy link
Member

shoyer commented Jan 4, 2019

ok, so we use the ReprObject for the default, and then test if concat_dim is of type `ReprObject and then test its equivalance?

No, just check identity with the exact ReprObject used as the default value.

This is just a slightly more readable version of the common idiom of use object() as a default value, e.g., as shown here
http://effbot.org/zone/default-values.htm#what-to-do-instead

@WeatherGod
Copy link
Contributor Author

Is the following statement True or False: "The user should be allowed to explicitly declare that they want the concatenation dimension to be inferred by passing a keyword argument". If this is True, then you need to test equivalence. If it is False, then there is nothing more I need to do for the PR, as changing this to use a ReprObject is orthogonal to these changes.

@shoyer
Copy link
Member

shoyer commented Jan 4, 2019

Is the following statement True or False: "The user should be allowed to explicitly declare that they want the concatenation dimension to be inferred by passing a keyword argument".

This statement is False, but it looks like we have a related bug: we really should be importing _CONCAT_DIM_DEFAULT from xarray.core.combine, because we use identity comparison in that module. Right now this only works by virtue of the accident that identical string literals points to the same variable in CPython.

@WeatherGod WeatherGod changed the title WIP: Change an == to an is. Fix tests so that this won't happen again. Change an == to an is. Fix tests so that this won't happen again. Jan 5, 2019
@WeatherGod
Copy link
Contributor Author

I completely forgotten about that little quirk of cpython. I try to ignore implementation details like that. Heck, I still don't fully trust dictionaries to be ordered!

I removed the WIP. We can deal with the concat dim default object separately, including turning it into a ReprObject (not exactly sure what the advantage of it is over just using the string, but, meh).

@shoyer
Copy link
Member

shoyer commented Jan 5, 2019

I just pushed a commit to your branch that should fix the string identity issue (by not defining the constant multiple times).

@shoyer shoyer merged commit ee44478 into pydata:master Jan 5, 2019
@shoyer
Copy link
Member

shoyer commented Jan 5, 2019

Thanks

@shoyer shoyer mentioned this pull request Jan 6, 2019
3 tasks
@TomNicholas TomNicholas mentioned this pull request Jan 6, 2019
15 tasks
@WeatherGod WeatherGod deleted the fix_2647 branch January 7, 2019 15:42
dcherian pushed a commit to yohai/xarray that referenced this pull request Jan 24, 2019
* master:
  Remove broken Travis-CI builds (pydata#2661)
  Type checking with mypy (pydata#2655)
  Added Coarsen (pydata#2612)
  Improve test for GH 2649 (pydata#2654)
  revise top-level package description (pydata#2430)
  Convert ref_date to UTC in encode_cf_datetime (pydata#2651)
  Change an `==` to an `is`. Fix tests so that this won't happen again. (pydata#2648)
  ENH: switch Dataset and DataArray to use explicit indexes (pydata#2639)
  Use pycodestyle for lint checks. (pydata#2642)
  Switch whats-new for 0.11.2 -> 0.11.3
  DOC: document v0.11.2 release
  Use built-in interp for interpolation with resample (pydata#2640)
  BUG: pytest-runner no required for setup.py (pydata#2643)
shoyer pushed a commit that referenced this pull request Jan 24, 2019
…#2648)

* Change an `==` to an `is`. Fix tests so that this won't happen again.

Closes #2647 and re-affirms #1988.

* Reuse the same _CONCAT_DIM_DEFAULT object
shoyer pushed a commit that referenced this pull request Jun 25, 2019
* concatenates along a single dimension

* Wrote function to find correct tile_IDs from nested list of datasets

* Wrote function to check that combined_tile_ids structure is valid

* Added test of 2d-concatenation

* Tests now check that dataset ordering is correct

* Test concatentation along a new dimension

* Started generalising auto_combine to N-D by integrating the N-D concatentation algorithm

* All unit tests now passing

* Fixed a failing test which I didn't notice because I don't have pseudoNetCDF

* Began updating open_mfdataset to handle N-D input

* Refactored to remove duplicate logic in open_mfdataset & auto_combine

* Implemented Shoyers suggestion in #2553 to rewrite the recursive nested list traverser as an iterator

* --amend

* Now raises ValueError if input not ordered correctly before concatenation

* Added some more prototype tests defining desired behaviour more clearly

* Now raises informative errors on invalid forms of input

* Refactoring to alos merge along each dimension

* Refactored to literally just apply the old auto_combine along each dimension

* Added unit tests for open_mfdatset

* Removed TODOs

* Removed format strings

* test_get_new_tile_ids now doesn't assume dicts are ordered

* Fixed failing tests on python3.5 caused by accidentally assuming dict was ordered

* Test for getting new tile id

* Fixed itertoolz import so that it's compatible with older versions

* Increased test coverage

* Added toolz as an explicit dependency to pass tests on python2.7

* Updated 'what's new'

* No longer attempts to shortcut all concatenation at once if concat_dims=None

* Rewrote using itertools.groupby instead of toolz.itertoolz.groupby to remove hidden dependency on toolz

* Fixed erroneous removal of utils import

* Updated docstrings to include an example of multidimensional concatenation

* Clarified auto_combine docstring for N-D behaviour

* Added unit test for nested list of Datasets with different variables

* Minor spelling and pep8 fixes

* Started working on a new api with both auto_combine and manual_combine

* Wrote basic function to infer concatenation order from coords.

Needs better error handling though.

* Attempt at finalised version of public-facing API.

All the internals still need to be redone to match though.

* No longer uses entire old auto_combine internally, only concat or merge

* Updated what's new

* Removed uneeded addition to what's new for old release

* Fixed incomplete merge in docstring for open_mfdataset

* Tests for manual combine passing

* Tests for auto_combine now passing

* xfailed weird behaviour with manual_combine trying to determine concat_dim

* Add auto_combine and manual_combine to API page of docs

* Tests now passing for open_mfdataset

* Completed merge so that #2648 is respected, and added tests.

Also moved concat to it's own file to avoid a circular dependency

* Separated the tests for concat and both combines

* Some PEP8 fixes

* Pre-empting a test which will fail with opening uamiv format

* Satisfy pep8speaks bot

* Python 3.5 compatibile after changing some error string formatting

* Order coords using pandas.Index objects

* Fixed performance bug from GH #2662

* Removed ToDos about natural sorting of string coords

* Generalized auto_combine to handle monotonically-decreasing coords too

* Added more examples to docstring for manual_combine

* Added note about globbing aspect of open_mfdataset

* Removed auto-inferring of concatenation dimension in manual_combine

* Added example to docstring for auto_combine

* Minor correction to docstring

* Another very minor docstring correction

* Added test to guard against issue #2777

* Started deprecation cycle for auto_combine

* Fully reverted open_mfdataset tests

* Updated what's new to match deprecation cycle

* Reverted uamiv test

* Removed dependency on itertools

* Deprecation tests fixed

* Satisfy pycodestyle

* Started deprecation cycle of auto_combine

* Added specific error for edge case combine_manual can't handle

* Check that global coordinates are monotonic

* Highlighted weird behaviour when concatenating with no data variables

* Added test for impossible-to-auto-combine coordinates

* Removed uneeded test

* Satisfy linter

* Added airspeedvelocity benchmark for combining functions

* Benchmark will take longer now

* Updated version numbers in deprecation warnings to fit with recent release of 0.12

* Updated api docs for new function names

* Fixed docs build failure

* Revert "Fixed docs build failure"

This reverts commit ddfc6dd.

* Updated documentation with section explaining new functions

* Suppressed deprecation warnings in test suite

* Resolved ToDo by pointing to issue with concat, see #2975

* Various docs fixes

* Slightly renamed tests to match new name of tested function

* Included minor suggestions from shoyer

* Removed trailing whitespace

* Simplified error message for case combine_manual can't handle

* Removed filter for deprecation warnings, and added test for if user doesn't supply concat_dim

* Simple fixes suggested by shoyer

* Change deprecation warning behaviour

* linting
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants