Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TYP: add static typing to Dataset._get_field_info_helper #4229

Conversation

neutrinoceros
Copy link
Member

@neutrinoceros neutrinoceros commented Nov 27, 2022

PR Summary

While working on #4227 I stumbled again on this method, which turns out to be extremely delicate to correctly type.

Because this method is private, it doesn't need the amount of type flexibility it currently has: being strict on input type would make it much easier to maintain, and wouldn't take anything away from users.
I'll use this PR as an experimental playground to see how much of the codebase actually falls apart if I take away type flexibility and how much work is needed to fix it. I'm hoping the answer to both questions is "not that much".

@neutrinoceros neutrinoceros added the do not merge this change should not be merged until later label Nov 27, 2022
@neutrinoceros
Copy link
Member Author

Turns out the situation is much worse than I thought, as this method also gets called with very different data types (I think this happens with region objects), and we might receive something like

ftype = (slice(None, None, None), slice(None, None, None), 0.25)

I don't know yet how to improve type consistency without breaking anything. Hopefully I can get back to this soon.

@matthewturk
Copy link
Member

So, it's also entirely possible that we can get rid of the _last_freq check now that we don't allow aliases. It was originally put in place so that if you made two calls in rapid succession:

obj['some_ftype', 'some_fname']
obj['some_fname']

the second would implicitly use the some_ftype. For instance, this was commonly an issue when just accessing things like particle fields. But now that we don't allow aliases to be accessed without being explicit, then perhaps we don't need the complexity.

@neutrinoceros
Copy link
Member Author

Thank you @matthewturk for pointing this out. This is slightly orthogonal to what I initially wanted to accomplish here, but indeed I agree that we can get rid of this complexity, and this may be an opportunity to do so. I'll explore that in this PR too if it doesn't cause too much scope inflation !

@neutrinoceros neutrinoceros force-pushed the exp_static_typing_get_field_info_helper branch 5 times, most recently from d244813 to 1eadd1e Compare December 4, 2022 17:00
@neutrinoceros
Copy link
Member Author

@matthewturk do you think there's any reason to keep the _last_freq and _last_finfo attributes at all (other than historical) ? I've started working on getting them out completely but I'm not sure this is what you meant to suggest.

@neutrinoceros neutrinoceros force-pushed the exp_static_typing_get_field_info_helper branch 2 times, most recently from 6610011 to d6b4b56 Compare December 4, 2022 19:40
@matthewturk
Copy link
Member

@neutrinoceros as of right now, I cannot think of any reason to retain them. That being said I think if we can verify that people's workflows won't change any more than they already had to with the deprecation of aliased fields, we should be okay, but if we can't, I dunno.

@neutrinoceros
Copy link
Member Author

Ok. I think if I can make the transition internally without having to change tests, it should be safe to assume downstream workflows will no be affected.

It's starting to look like we can remove quite a bit of complexity without changing the observable behavior, so I'm hopeful this can be done.

@neutrinoceros neutrinoceros force-pushed the exp_static_typing_get_field_info_helper branch from fc5ef50 to 240c58d Compare December 4, 2022 21:46
@neutrinoceros neutrinoceros changed the title EXP: tentative to add static typing to Dataset._get_field_info_helper TYP: add static typing to Dataset._get_field_info_helper Dec 4, 2022
@neutrinoceros neutrinoceros added enhancement Making something better dead code removing internal bits that have no effect refactor improve readability, maintainability, modularity and removed do not merge this change should not be merged until later labels Dec 4, 2022
@neutrinoceros neutrinoceros changed the title TYP: add static typing to Dataset._get_field_info_helper TYP: add static typing to Dataset._get_field_info_helper (⏰ wait for #4227) Dec 5, 2022
@neutrinoceros neutrinoceros force-pushed the exp_static_typing_get_field_info_helper branch from 868debe to 28bf32f Compare December 5, 2022 11:24
@neutrinoceros
Copy link
Member Author

This should be pretty stable now. For reference, I've checked that this doesn't break yt_astro_analysis (the only call to _get_field_info there is stable).
I'll wait for #4227 to go in so I can use more specific type hints defined therein, after what this PR should be ready !

@@ -37,7 +37,7 @@ def _strip_ftype(field):


np.random.seed(int(0x4D3D3D3))
units = [base_ds._get_field_info(*f).units for f in fields]
units = [base_ds._get_field_info(f).units for f in fields]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, this is because it's always a tuple now right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More exactly, it's because we don't have a second positional argument anymore. Passing a full tuple as the first argument already worked on main, I just made it the only way to do it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does this line use base_ds._get_field_info(f) at all instead of of base_ds.field_info[f].units?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea, probably no good reason, but this feels just slightly out of scope here. Do you mind if I leave the line as is ?

@neutrinoceros neutrinoceros force-pushed the exp_static_typing_get_field_info_helper branch from 72d2c96 to 08c6866 Compare December 5, 2022 21:59
@neutrinoceros neutrinoceros changed the title TYP: add static typing to Dataset._get_field_info_helper (⏰ wait for #4227) TYP: add static typing to Dataset._get_field_info_helper Dec 5, 2022
@neutrinoceros neutrinoceros marked this pull request as ready for review December 6, 2022 08:45
matthewturk
matthewturk previously approved these changes Dec 16, 2022
Copy link
Member

@matthewturk matthewturk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems fine -- can you tell me what is holding us off from merging?

@neutrinoceros
Copy link
Member Author

@matthewturk nothing, it's ready to go

@mtryan83
Copy link
Contributor

mtryan83 commented Dec 21, 2022

Funny how 1b1b242 (the cleanup commit) breaks tests that were just added in #4175 ... I don't think I can afford just dropping these tests (which of course crossed my mind for a sec), because the errors are pretty serious:

Failed: DID NOT RAISE <class 'yt.utilities.exceptions.YTFieldNotFound'>

This seems to come from a change in how _determine_fields or _tupleize_field works. I'm not really sure what the intended behavior is. Prior to 1b1b242, you'd get

dd = firefly_test_dataset.all_data()
dd._determine_fields("pt2only_field")
---------------------------------------------------------------------------
YTFieldNotFound                           Traceback (most recent call last)
Cell In [6], line 2
      1 dd = firefly_test_dataset.all_data()
----> 2 dd._determine_fields("pt2only_field")

File ~/workspace/yt/yt/data_objects/data_containers.py:1542, in YTDataContainer._determine_fields(self, fields)
   1539     continue
   1541 ftype, fname = self._tupleize_field(field)
-> 1542 finfo = self.ds._get_field_info(ftype, fname)
   1544 # really ugly check to ensure that this field really does exist somewhere,
   1545 # in some naming convention, before returning it as a possible field type
   1546 if (
   1547     (ftype, fname) not in self.ds.field_info
   1548     and (ftype, fname) not in self.ds.field_list
   (...)
   1552     and (ftype, fname) not in self._container_fields
   1553 ):

File ~/workspace/yt/yt/data_objects/static_output.py:867, in Dataset._get_field_info(self, ftype, fname)
    866 def _get_field_info(self, ftype, fname=None):
--> 867     field_info, candidates = self._get_field_info_helper(ftype, fname)
    869     if field_info.name[1] in ("px", "py", "pz", "pdx", "pdy", "pdz"):
    870         # escape early as a bandaid solution to
    871         # https://github.com/yt-project/yt/issues/3381
    872         return field_info

File ~/workspace/yt/yt/data_objects/static_output.py:987, in Dataset._get_field_info_helper(self, ftype, fname)
    985             self._last_finfo = self.field_info[(ftype, fname)]
    986             return self._last_finfo, candidates
--> 987 raise YTFieldNotFound(field=INPUT, ds=self)

YTFieldNotFound: Could not find field ('all', 'pt2only_field') in ParticleData.
Did you mean:
	('pt2', 'pt2only_field')

Now you get

dd = firefly_test_dataset.all_data()
dd._determine_fields("pt2only_field")

[('pt2', 'pt2only_field')]

Here firefly_test_dataset has two particle types (pt1 and pt2) and the field pt2only_field is only in pt2.

If the new behavior is intended, the two failing tests should be removed/ignored (and test_field_unique_string_specification should probably be tweaked as well) . I can do both of those, just let me know how.

@neutrinoceros
Copy link
Member Author

Now you get

dd = firefly_test_dataset.all_data()
dd._determine_fields("pt2only_field")

[('pt2', 'pt2only_field')]

Here firefly_test_dataset has two particle types (pt1 and pt2) and the field pt2only_field is only in pt2.

If the new behavior is intended, the two failing tests should be removed/ignored

Indeed it looks to me that the result on this branch is exactly what I expect, so I will adjust the tests accordingly.
Thank you so much @mtryan83 for spontaneously helping me out, you just saved me a couple hours of vain head scratching, I really appreciate it.

@neutrinoceros
Copy link
Member Author

@mtryan83 so for now I've just dropped the broken tests entirely, but it made me realise that match_any_particle_types is a very confusing flag name (I know it because I suggested it and I still don't really get what it's supposed to mean 🙉), so maybe we should re-iterate over that, but I'm thinking it'd be easier to do after the present PR goes in, if that sounds acceptable to you.

@neutrinoceros neutrinoceros force-pushed the exp_static_typing_get_field_info_helper branch from d9e6a4c to 183425c Compare December 28, 2022 18:58
@matthewturk
Copy link
Member

So what do we think, should this go ahead?

@neutrinoceros
Copy link
Member Author

neutrinoceros commented Dec 30, 2022

Looks to me like we have green light from everybody involved.

@matthewturk matthewturk merged commit 5d08a39 into yt-project:main Dec 30, 2022
@neutrinoceros neutrinoceros deleted the exp_static_typing_get_field_info_helper branch December 30, 2022 13:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dead code removing internal bits that have no effect enhancement Making something better refactor improve readability, maintainability, modularity
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants