Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Time series enhancements #2690

Merged
merged 13 commits into from
Jun 30, 2020

Conversation

neutrinoceros
Copy link
Member

@neutrinoceros neutrinoceros commented Jun 25, 2020

PR Summary

This is part of an effort to harmonise and simplify the data loading api.

This PR includes the following improvements to DatasetSeries:

  • deprecate DatasetSeries.from_filenames() because it's redundant since DatasetSeries.__new__ was added, and I suspect it is not used
  • fix a docstring inconsistency
  • add tests for pattern detection
  • refactor the pattern parsing function (simplify + add '~' token expansion, to improve consistency between DatasetSeries() and yt.load())

⚠️ minor backward incompatibility
I changed the error raised in the pattern parsing function to FileNotFoundError to better reflect the issue, this could potentially break downstream code that relies on this function throwing YTOutputNotIdentified, though it seems very unlikely.

PR Checklist

  • Code passes flake8 checker
  • New features are documented, with docstrings and narrative docs
  • Adds a test for any bugs fixed. Adds tests for new features.

@neutrinoceros neutrinoceros added enhancement Making something better backwards incompatible This change will change behavior api-consistency naming conventions, code deduplication, informative error messages, code smells... labels Jun 25, 2020
@neutrinoceros neutrinoceros force-pushed the time_series_enhancements branch from 8529390 to e7ec520 Compare June 25, 2020 18:28
@neutrinoceros
Copy link
Member Author

So atm it looks like the failing tests are exactly the two I just added. This is because they are written for pytest and our CI is still running nose.
Note that the switch to pytest is being experimented in #2676

Copy link
Member

@munkm munkm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I left a few inline comments. Somebody who knows the time series stuff may be able to offer some insight here.

The deprecation of from_filenames seems reasonable given the existence of __new__, especially because the docstring on DatasetSeries has the same kwargs available as from_filenames, so it seems like there's no added flexibility from the latter. I am curious why it is still around (since from_filenames is from ~2013 and __new__ is from ~2014).

yt/data_objects/time_series.py Show resolved Hide resolved
yt/data_objects/tests/test_time_series.py Outdated Show resolved Hide resolved
file_list = td_filenames
else:
raise YTOutputNotIdentified(filenames, {})
def get_filenames_from_glob_pattern(pattern):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the docstring you name this arg outputs, not pattern. They should be consistent.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH I don't really understand why this needed to be changed from filenames? They should be output files after all.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well this function's name indicate that it's meant to be passed a pattern. It's true that, in practice, it receives the "outputs" param from DatasetSeries.__new__(), but only in as a special-case handling where the parameter doesn't actually hold "outputs" yet.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you’re not convinced I can change it, it’s not that crucial. Let me know !

Copy link
Member

@munkm munkm Jun 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, but the way I'm reading this there's no separation in functionality right? outputs will always be sent to this function. There's no separation if it's a list or a pattern. In new we see outputs = get_filenames_from_glob_pattern(outputs), so it isn't always going to be sent a pattern and not a list.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, do we have a test verifying that a list works?

Copy link
Member Author

@neutrinoceros neutrinoceros Jun 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't show up in the lines I've touched but there is a separation in DatasetSeries.__new__

if isinstance(outputs, str):

I have no idea if we have a test for lists. I considered adding one but I don't know of any sample data that'd be appropriate for this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just verified that I could use a list with enzo_tiny_cosmology.

That said, I know this is nitpicky but I think it's good to be consistent with how we send args in defintions vs. in the place they're used in our code. I would prefer if the definition also used outputs so we're consistent, even if there's an implication it will be a pattern of some sort.

yt/data_objects/time_series.py Show resolved Hide resolved
yt/data_objects/time_series.py Show resolved Hide resolved
yt/data_objects/time_series.py Outdated Show resolved Hide resolved
@neutrinoceros neutrinoceros mentioned this pull request Jun 26, 2020
3 tasks
@neutrinoceros
Copy link
Member Author

This can now viewed as a companion PR to #2695

@matthewturk
Copy link
Member

@munkm I also don't remember why we had both. I think there may have been a time where we were anticipating making a DatasetSeries from non-filenames, i.e., multiple actual Dataset objects.

This looks good to me.

@neutrinoceros
Copy link
Member Author

I'm happy to update this if needed once we resolve the above conversations @munkm 😄

@munkm
Copy link
Member

munkm commented Jun 29, 2020

Wait, I'm confused. The thing you just reverted was ok based on our discussions?

@neutrinoceros
Copy link
Member Author

Wait, I'm confused. The thing you just reverted was ok based on our discussions?

yes ! Did I misinterpret the following

If this update makes it more inclusive, that's fine with me!

?

Comment on lines 60 to 61
def get_filenames_from_glob_pattern(pattern):
epattern = os.path.expanduser(pattern)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def get_filenames_from_glob_pattern(pattern):
epattern = os.path.expanduser(pattern)
def get_filenames_from_glob_pattern(outputs):
epattern = os.path.expanduser(outputs)

@munkm
Copy link
Member

munkm commented Jun 29, 2020

Did I misinterpret the following

I think maybe (or maybe I did?). I was trying to say I was ok with what you were saying.

Eta: sorry, my bad! You didn't misinterpret. I did! I thought the reversion had already happened and you were going back to pre-pre functionality!

@neutrinoceros
Copy link
Member Author

Looks like we settled on something satisfying for everyone now ➡️ :partyparot:

@munkm munkm merged commit cfdb582 into yt-project:master Jun 30, 2020
@neutrinoceros neutrinoceros deleted the time_series_enhancements branch July 21, 2020 12:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-consistency naming conventions, code deduplication, informative error messages, code smells... backwards incompatible This change will change behavior enhancement Making something better
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants