Time series enhancements #2690

neutrinoceros · 2020-06-25T18:24:41Z

PR Summary

This is part of an effort to harmonise and simplify the data loading API.

This PR includes the following improvements to DatasetSeries:

deprecate DatasetSeries.from_filenames(), which has been redundant since DatasetSeries.__new__ was added and which I suspect is not used
fix a docstring inconsistency
add tests for pattern detection
refactor the pattern parsing function (simplify + add '~' token expansion, to improve consistency between DatasetSeries() and yt.load())
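As a rough illustration of the last item, tilde expansion followed by globbing can be sketched as below. This is not the code from this PR: the function name `expand_glob_pattern` and the error message are hypothetical, and only standard-library calls are used.

```python
import glob
import os


def expand_glob_pattern(pattern):
    """Return the sorted file names matching ``pattern``.

    Hypothetical helper, not yt's implementation: a leading '~' is
    expanded to the user's home directory before globbing.
    """
    epattern = os.path.expanduser(pattern)
    file_list = sorted(glob.glob(epattern))
    if not file_list:
        # Assumption: an empty match is reported as a missing file.
        raise FileNotFoundError(f"No file matched the pattern '{epattern}'")
    return file_list
```

Expanding '~' before globbing matters because `glob.glob` does not perform tilde expansion on its own.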

⚠️ minor backward incompatibility
I changed the error raised in the pattern parsing function to FileNotFoundError to better reflect the issue. This could break downstream code that relies on this function raising YTOutputNotIdentified, though that seems very unlikely.
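For downstream code, the practical consequence sits in the `except` clause. A minimal sketch, assuming a hypothetical `parse_pattern` function that stands in for the pattern parser and always fails: because `FileNotFoundError` is a built-in subclass of `OSError`, a handler for either one catches the new error, whereas a handler written only for `YTOutputNotIdentified` would no longer trigger.

```python
def parse_pattern(pattern):
    # Hypothetical stand-in that always fails, to exercise the handler.
    raise FileNotFoundError(f"No match found for pattern '{pattern}'")


def load_or_none(pattern):
    # Return None instead of propagating the "no match" error.
    try:
        return parse_pattern(pattern)
    except OSError:  # FileNotFoundError is a subclass of OSError
        return None
```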

PR Checklist

Code passes flake8 checker
New features are documented, with docstrings and narrative docs
Adds a test for any bugs fixed. Adds tests for new features.

neutrinoceros · 2020-06-25T19:48:36Z

So atm it looks like the failing tests are exactly the two I just added. This is because they are written for pytest and our CI is still running nose.
Note that the switch to pytest is being experimented with in #2676.
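For context, a pytest-style test for pattern detection might look like the sketch below. It is not one of the tests added here: `find_outputs` is a hypothetical stand-in for the pattern parser, and the test relies on pytest's built-in `tmp_path` fixture, which is the kind of feature a nose-based runner does not provide.

```python
import glob
import os


def find_outputs(pattern):
    # Hypothetical stand-in for the function under test.
    return sorted(glob.glob(os.path.expanduser(pattern)))


def test_pattern_detection(tmp_path):
    # tmp_path is a pathlib.Path pointing to a fresh temporary directory,
    # injected by pytest for each test.
    expected = []
    for i in range(3):
        fn = tmp_path / f"output_{i:04d}"
        fn.touch()
        expected.append(str(fn))
    assert find_outputs(str(tmp_path / "output_*")) == expected
    assert find_outputs(str(tmp_path / "missing_*")) == []
```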

munkm

Ok, I left a few inline comments. Somebody who knows the time series stuff may be able to offer some insight here.

The deprecation of from_filenames seems reasonable given the existence of __new__, especially because the docstring on DatasetSeries has the same kwargs available as from_filenames, so it seems like there's no added flexibility from the latter. I am curious why it is still around (since from_filenames is from ~2013 and __new__ is from ~2014).
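A common way to deprecate a redundant alternate constructor is to keep it as a thin wrapper that warns and forwards to the main one. The sketch below uses a hypothetical `Series` class and the standard `warnings` module. It is not the code from this PR, and yt may rely on its own deprecation helper instead.

```python
import warnings


class Series:
    # Hypothetical stand-in for a class with a redundant constructor.
    def __init__(self, outputs, **kwargs):
        self.outputs = list(outputs)
        self.kwargs = kwargs

    @classmethod
    def from_filenames(cls, filenames, **kwargs):
        # Warn, then forward everything to the regular constructor.
        warnings.warn(
            "from_filenames() is deprecated; call the class directly instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return cls(filenames, **kwargs)
```

Because the wrapper forwards the same keyword arguments, existing callers keep working while being pointed at the replacement.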

yt/data_objects/time_series.py

yt/data_objects/tests/test_time_series.py

munkm · 2020-06-25T20:35:19Z

yt/data_objects/time_series.py

-            file_list = td_filenames
-        else:
-            raise YTOutputNotIdentified(filenames, {})
+def get_filenames_from_glob_pattern(pattern):


In the docstring you name this arg outputs, not pattern. They should be consistent.

TBH I don't really understand why this needed to be changed from filenames? They should be output files after all.

Well, this function's name indicates that it's meant to be passed a pattern. It's true that, in practice, it receives the "outputs" param from DatasetSeries.__new__(), but only as special-case handling where the parameter doesn't actually hold "outputs" yet.

If you're not convinced I can change it; it's not that crucial. Let me know!

Ok, but the way I'm reading this there's no separation in functionality, right? outputs will always be sent to this function. There's no separation between a list and a pattern. In __new__ we see outputs = get_filenames_from_glob_pattern(outputs), so it isn't always going to be sent a pattern and not a list.

Actually, do we have a test verifying that a list works?

It doesn't show up in the lines I've touched, but there is a separation in DatasetSeries.__new__:

yt/yt/data_objects/time_series.py

Line 141 in 6befef2

if isinstance(outputs, str):
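To illustrate that separation, here is a sketch with hypothetical names (the real logic lives in DatasetSeries.__new__ and may differ): a single string is treated as a glob pattern and expanded, while anything else is assumed to already be a sequence of outputs and is used as given.

```python
import glob
import os


def resolve_outputs(outputs):
    # Hypothetical helper mirroring the isinstance check quoted above.
    if isinstance(outputs, str):
        # A single string is interpreted as a glob pattern.
        return sorted(glob.glob(os.path.expanduser(outputs)))
    # Assumption: anything else is already a sequence of outputs.
    return list(outputs)
```

With this shape, the glob helper only ever sees a string, even though the caller's parameter is named outputs.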

I have no idea if we have a test for lists. I considered adding one but I don't know of any sample data that'd be appropriate for this.

I just verified that I could use a list with enzo_tiny_cosmology.

That said, I know this is nitpicky, but I think it's good to be consistent between how we name args in definitions and how we name them where they're used in our code. I would prefer if the definition also used outputs so we're consistent, even if there's an implication it will be a pattern of some sort.

yt/data_objects/time_series.py

neutrinoceros · 2020-06-26T21:20:45Z

This can now be viewed as a companion PR to #2695.

matthewturk · 2020-06-29T21:00:21Z

@munkm I also don't remember why we had both. I think there may have been a time where we were anticipating making a DatasetSeries from non-filenames, i.e., multiple actual Dataset objects.

This looks good to me.

neutrinoceros · 2020-06-29T21:03:01Z

I'm happy to update this if needed once we resolve the above conversations @munkm 😄

munkm · 2020-06-29T21:52:18Z

Wait, I'm confused. The thing you just reverted was ok based on our discussions?

neutrinoceros · 2020-06-29T21:56:19Z

Wait, I'm confused. The thing you just reverted was ok based on our discussions?

Yes! Did I misinterpret the following

If this update makes it more inclusive, that's fine with me!

?

munkm · 2020-06-29T21:57:13Z

yt/data_objects/time_series.py

+def get_filenames_from_glob_pattern(pattern):
+    epattern = os.path.expanduser(pattern)


Suggested change

-def get_filenames_from_glob_pattern(pattern):
-    epattern = os.path.expanduser(pattern)
+def get_filenames_from_glob_pattern(outputs):
+    epattern = os.path.expanduser(outputs)

munkm · 2020-06-29T21:58:26Z

Did I misinterpret the following

I think maybe (or maybe I did?). I was trying to say I was ok with what you were saying.

ETA: sorry, my bad! You didn't misinterpret. I did! I thought the reversion had already happened and you were going back to pre-pre functionality!

neutrinoceros · 2020-06-29T22:10:22Z

Looks like we settled on something satisfying for everyone now ➡️ :partyparot:

neutrinoceros added the enhancement, backwards incompatible, and api-consistency labels Jun 25, 2020

neutrinoceros force-pushed the time_series_enhancements branch from 8529390 to e7ec520 Compare June 25, 2020 18:28

munkm reviewed Jun 25, 2020

neutrinoceros mentioned this pull request Jun 26, 2020

Simplify loader functions #2695

Merged

3 tasks

neutrinoceros and others added 11 commits June 29, 2020 14:46

tests: add tests for timeseries pattern parsing

0090b8b

doc: fix an inconsistency in DatasetSeries docstring

dabe8a5

refactor: simplify time series pattern expansion + expand '~' user tokens, raise FileNotFoundError if no match is found. Make new tests pass.

e6d8d77

deprecate DatasetSeries.from_filenames()

f7ae0f9

fix: catch new error in yt.load() when attempting to instanciate a time series

f8a0198

better deprecation message

c8b2966

add comment and revert second glob.glob search to previous behaviour

abb6e0a

string formatting

3934476

Co-authored-by: Madicken Munk <madicken.munk@gmail.com>

refactor: raise OSError instead of FileNotFoundError

f39b82f

docstring improvement for DatasetSeries

9813cc9

docstring improvement for DatasetSeries (add a note on weak references)

5c74be0

neutrinoceros force-pushed the time_series_enhancements branch from 5c058ef to 5c74be0 Compare June 29, 2020 12:48

neutrinoceros mentioned this pull request Jun 29, 2020

ensure release and development directions are consistent #2554

Closed

matthewturk approved these changes Jun 29, 2020

revert to previous proposal for glob pattern search

2b094f0

munkm reviewed Jun 29, 2020

revert to original argument name for get_filenames_from_glob_pattern, add a docstring for clarity

e9733dc

munkm approved these changes Jun 29, 2020

munkm merged commit cfdb582 into yt-project:master Jun 30, 2020

neutrinoceros mentioned this pull request Jul 3, 2020

[discussion] Meaningful errors #2721

Closed

neutrinoceros deleted the time_series_enhancements branch July 21, 2020 12:30
