Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce length of cftime resample tests #2879

Merged

Conversation

spencerkclark
Copy link
Member

@spencerkclark spencerkclark commented Apr 8, 2019

The main issue is that we were resampling the same time indexes across a large range of frequencies, in some cases producing very long results, e.g. resampling an index that spans 27 years to a frequency of 12 hours.

This modifies the primary test so that it constructs time indexes whose ranges are based on the frequencies we resample to. Now in total the tests in test_cftimeindex_resample.py take around 6 seconds.

@jwenfai I did some coverage analysis offline, and these tests produce the same coverage that we had before (I found it necessary to be sure to test cases where the reference index had either a shorter or longer frequency than the resample frequency). Do you think what I have here is sufficient? I think we could potentially shorten things even more, but I'm not sure if it's worth the effort.

See below for the new profiling results; now the longest cftime tests are no longer associated with resample.

$ pytest -k cftime --durations=50
...
0.18s call     xarray/tests/test_backends.py::TestScipyInMemoryData::test_roundtrip_cftime_datetime_data
0.11s call     xarray/tests/test_backends.py::TestScipyFilePath::test_roundtrip_cftime_datetime_data
0.10s call     xarray/tests/test_backends.py::TestNetCDF4Data::test_roundtrip_cftime_datetime_data
0.09s call     xarray/tests/test_backends.py::TestNetCDF4ClassicViaNetCDF4Data::test_roundtrip_cftime_datetime_data
0.09s call     xarray/tests/test_backends.py::TestNetCDF4ViaDaskData::test_roundtrip_cftime_datetime_data
0.08s teardown xarray/tests/test_cftime_offsets.py::test_add_year_end_onOffset[julian-(2, 12)-()-<YearEnd: n=-1, month=12>-(1, 12)-()]
0.06s call     xarray/tests/test_backends.py::TestNetCDF3ViaNetCDF4Data::test_roundtrip_cftime_datetime_data
0.06s call     xarray/tests/test_backends.py::TestGenericNetCDFData::test_roundtrip_cftime_datetime_data
0.05s call     xarray/tests/test_backends.py::TestScipyFileObject::test_roundtrip_cftime_datetime_data
0.04s call     xarray/tests/test_conventions.py::TestCFEncodedDataStore::test_roundtrip_cftime_datetime_data
0.03s call     xarray/tests/test_dataset.py::test_differentiate_cftime[True]
0.03s call     xarray/tests/test_dataset.py::test_trapz_datetime[cftime-True]
0.02s call     xarray/tests/test_coding_times.py::test_contains_cftime_datetimes_dask_3d[standard]
0.02s call     xarray/tests/test_backends.py::test_use_cftime_standard_calendar_default_out_of_range[2500-gregorian]
0.02s call     xarray/tests/test_dataset.py::test_differentiate_cftime[False]
0.02s call     xarray/tests/test_cftimeindex_resample.py::test_resample[longer_da_freq-24-right-None-4A-MAY]
0.02s call     xarray/tests/test_backends.py::test_use_cftime_standard_calendar_default_in_range[gregorian]
0.02s call     xarray/tests/test_cftimeindex_resample.py::test_resample[longer_da_freq-24-None-right-11Q-JUN]
0.02s call     xarray/tests/test_backends.py::test_use_cftime_standard_calendar_default_out_of_range[2500-proleptic_gregorian]
0.02s call     xarray/tests/test_backends.py::test_use_cftime_true[1500-gregorian]
0.02s call     xarray/tests/test_backends.py::test_use_cftime_true[2500-proleptic_gregorian]
0.01s call     xarray/tests/test_backends.py::test_use_cftime_true[2000-gregorian]
0.01s call     xarray/tests/test_cftimeindex_resample.py::test_resample[longer_da_freq-31-None-right-4A-MAY]
0.01s call     xarray/tests/test_backends.py::test_use_cftime_standard_calendar_default_out_of_range[2500-standard]
0.01s call     xarray/tests/test_backends.py::test_use_cftime_true[1500-julian]
0.01s call     xarray/tests/test_cftimeindex_resample.py::test_resample[longer_da_freq-31-left-right-4A-MAY]
0.01s call     xarray/tests/test_cftimeindex_resample.py::test_resample[longer_da_freq-24-right-None-11Q-JUN]
0.01s call     xarray/tests/test_cftimeindex_resample.py::test_resample[longer_da_freq-24-None-right-4A-MAY]
0.01s call     xarray/tests/test_cftimeindex_resample.py::test_resample[longer_da_freq-31-left-None-7M]
0.01s call     xarray/tests/test_cftimeindex_resample.py::test_resample[longer_da_freq-24-left-None-4A-MAY]
0.01s call     xarray/tests/test_cftimeindex_resample.py::test_resample[longer_da_freq-24-left-right-4A-MAY]
0.01s call     xarray/tests/test_cftimeindex_resample.py::test_resample[longer_da_freq-31-None-None-11Q-JUN]
0.01s call     xarray/tests/test_cftimeindex_resample.py::test_resample[longer_da_freq-24-right-right-4A-MAY]
0.01s call     xarray/tests/test_backends.py::test_use_cftime_standard_calendar_default_out_of_range[1500-proleptic_gregorian]
0.01s call     xarray/tests/test_cftimeindex_resample.py::test_resample[longer_da_freq-24-left-None-11Q-JUN]
0.01s call     xarray/tests/test_backends.py::test_use_cftime_true[2500-julian]
0.01s call     xarray/tests/test_backends.py::test_use_cftime_standard_calendar_default_out_of_range[1500-gregorian]
0.01s call     xarray/tests/test_backends.py::test_use_cftime_true[1500-proleptic_gregorian]
0.01s call     xarray/tests/test_cftimeindex_resample.py::test_resample[longer_da_freq-31-left-right-11Q-JUN]
0.01s call     xarray/tests/test_backends.py::test_use_cftime_true[2000-standard]
0.01s call     xarray/tests/test_backends.py::test_use_cftime_true[2500-standard]
0.01s call     xarray/tests/test_cftimeindex_resample.py::test_resample[longer_da_freq-31-None-None-4A-MAY]
0.01s call     xarray/tests/test_cftimeindex_resample.py::test_resample[longer_da_freq-31-right-right-11Q-JUN]
0.01s call     xarray/tests/test_cftimeindex_resample.py::test_resample[longer_da_freq-24-right-right-7M]
0.01s call     xarray/tests/test_backends.py::test_use_cftime_true[2500-gregorian]
0.01s call     xarray/tests/test_cftimeindex_resample.py::test_resample[longer_da_freq-24-left-right-7M]
0.01s call     xarray/tests/test_backends.py::test_use_cftime_true[2000-proleptic_gregorian]
0.01s call     xarray/tests/test_cftimeindex_resample.py::test_resample[longer_da_freq-31-right-None-7M]
0.01s call     xarray/tests/test_cftimeindex_resample.py::test_resample[longer_da_freq-31-None-right-7M]
0.01s call     xarray/tests/test_cftimeindex_resample.py::test_resample[longer_da_freq-31-left-None-11Q-JUN]

@shoyer
Copy link
Member

shoyer commented Apr 8, 2019

Looks good to me. Thank you for looking into this promptly!

Let's wait for @jwenfai to review before merging.

@jwenfai
Copy link
Contributor

jwenfai commented Apr 9, 2019

Thanks for taking on the task of shortening test times! If the coverage is the same, I think the rewritten tests should be good.

Just two things I feel I should mention:

  • Testing even and odd multiples for resampling frequencies for a frequency class (e.g., '11MS' and '12M' for monthlies)
    I don't quite remember what the issue was but there were tests that passed for even/odd resampling frequencies but fail for the other. Perhaps the tests could be rewritten to (1) switch da_freq and freq and (2) use odd frequencies for constructing the DataArray so that when * 2 and \\ 2 operations are performed, even and odd resampling frequencies are obtained. There were some issues with resampling to 12H so maybe use a frequency that multiplies or divides to 12H.
  • Testing resampling from one freq type to another (e.g., from '3Q' to '11MS')
    Again I don't remember the details other than that a problem exists/existed. Perhaps something to do with _get_range_edges since that's the part of the code where the original index of a DataArray interacts with the resampling frequency.

Both of the problems I mentioned might have been from the earliest iteration of CFTimeIndex resampling so they might have no relevance now.

@spencerkclark
Copy link
Member Author

Thanks for having a look @jwenfai! I updated my PR following your suggestions; this added some more tests so now the total time is around 9 seconds.

I opted for listing the pairs of initial and resample frequencies explicitly rather than use a function to dynamically set one based on the other (I feel like it makes the tests easier to understand; also writing a function that converts one type of frequency to another similar-length-but-different-type frequency is messy).

@jwenfai
Copy link
Contributor

jwenfai commented Apr 9, 2019

Wow, that's quick. The updated tests look fine to me so go ahead and merge it.

@spencerkclark
Copy link
Member Author

I'm going to go ahead and merge this; we can revisit things again if we want to continue to speed these tests up.

@spencerkclark spencerkclark merged commit b9a920e into pydata:master Apr 11, 2019
@spencerkclark spencerkclark deleted the shorten-cftime-resample-tests branch April 11, 2019 11:42
dcherian added a commit to yohai/xarray that referenced this pull request Apr 19, 2019
* master: (29 commits)
  Handle the character array dim name  (pydata#2896)
  Partial fix for pydata#2841 to improve formatting. (pydata#2906)
  docs: Move quick overview one level up (pydata#2890)
  Manually specify chunks in open_zarr (pydata#2530)
  Minor improvement of docstring for Dataset (pydata#2904)
  Fix minor typos in docstrings (pydata#2903)
  Added docs example for `xarray.Dataset.get()` (pydata#2894)
  Bugfix for docs build instructions (pydata#2897)
  Return correct count for scalar datetime64 arrays (pydata#2892)
  Indexing with an empty array (pydata#2883)
  BUG: Fix pydata#2864 by adding the missing vrt parameters (pydata#2865)
  Reduce length of cftime resample tests (pydata#2879)
  WIP: type annotations (pydata#2877)
  decreased pytest verbosity (pydata#2881)
  Fix mypy typing error in cftime_offsets.py (pydata#2878)
  update links to https (pydata#2872)
  revert to 0.12.2 dev
  0.12.1 release
  Various fixes for explicit Dataset.indexes (pydata#2858)
  Fix minor typo in docstring (pydata#2860)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants