TYP: add static typing to Dataset._get_field_info_helper #4229

neutrinoceros · 2022-11-27T17:22:51Z

PR Summary

While working on #4227 I stumbled again on this method, which turns out to be extremely delicate to correctly type.

Because this method is private, it doesn't need the amount of type flexibility it currently has: being strict on input type would make it much easier to maintain, and wouldn't take anything away from users.
I'll use this PR as an experimental playground to see how much of the codebase actually falls apart if I take away type flexibility and how much work is needed to fix it. I'm hoping the answer to both questions is "not that much".

neutrinoceros · 2022-11-28T09:22:44Z

Turns out the situation is much worse than I thought, as this method also gets called with very different data types (I think this happens with region objects), and we might receive something like

ftype = (slice(None, None, None), slice(None, None, None), 0.25)

I don't know yet how to improve type consistency without breaking anything. Hopefully I can get back to this soon.

matthewturk · 2022-12-01T14:58:31Z

So, it's also entirely possible that we can get rid of the _last_freq check now that we don't allow aliases. It was originally put in place so that if you made two calls in rapid succession:

obj['some_ftype', 'some_fname']
obj['some_fname']

the second would implicitly use the some_ftype. For instance, this was commonly an issue when just accessing things like particle fields. But now that we don't allow aliases to be accessed without being explicit, then perhaps we don't need the complexity.

neutrinoceros · 2022-12-04T13:07:26Z

Thank you @matthewturk for pointing this out. This is slightly orthogonal to what I initially wanted to accomplish here, but indeed I agree that we can get rid of this complexity, and this may be an opportunity to do so. I'll explore that in this PR too if it doesn't cause too much scope inflation !

neutrinoceros · 2022-12-04T18:16:01Z

@matthewturk do you think there's any reason to keep the _last_freq and _last_finfo attributes at all (other than historical) ? I've started working on getting them out completely but I'm not sure this is what you meant to suggest.

matthewturk · 2022-12-04T19:50:07Z

@neutrinoceros as of right now, I cannot think of any reason to retain them. That being said I think if we can verify that people's workflows won't change any more than they already had to with the deprecation of aliased fields, we should be okay, but if we can't, I dunno.

neutrinoceros · 2022-12-04T20:45:54Z

Ok. I think if I can make the transition internally without having to change tests, it should be safe to assume downstream workflows will no be affected.

It's starting to look like we can remove quite a bit of complexity without changing the observable behavior, so I'm hopeful this can be done.

neutrinoceros · 2022-12-05T13:59:48Z

This should be pretty stable now. For reference, I've checked that this doesn't break yt_astro_analysis (the only call to _get_field_info there is stable).
I'll wait for #4227 to go in so I can use more specific type hints defined therein, after what this PR should be ready !

matthewturk · 2022-12-05T15:08:43Z

doc/helper_scripts/show_fields.py

@@ -37,7 +37,7 @@ def _strip_ftype(field):


 np.random.seed(int(0x4D3D3D3))
-units = [base_ds._get_field_info(*f).units for f in fields]
+units = [base_ds._get_field_info(f).units for f in fields]


hm, this is because it's always a tuple now right?

More exactly, it's because we don't have a second positional argument anymore. Passing a full tuple as the first argument already worked on main, I just made it the only way to do it.

why does this line use base_ds._get_field_info(f) at all instead of of base_ds.field_info[f].units?

I have no idea, probably no good reason, but this feels just slightly out of scope here. Do you mind if I leave the line as is ?

matthewturk

This seems fine -- can you tell me what is holding us off from merging?

neutrinoceros · 2022-12-16T17:35:23Z

@matthewturk nothing, it's ready to go

mtryan83 · 2022-12-21T21:02:09Z

Funny how 1b1b242 (the cleanup commit) breaks tests that were just added in #4175 ... I don't think I can afford just dropping these tests (which of course crossed my mind for a sec), because the errors are pretty serious:
Failed: DID NOT RAISE <class 'yt.utilities.exceptions.YTFieldNotFound'>

This seems to come from a change in how _determine_fields or _tupleize_field works. I'm not really sure what the intended behavior is. Prior to 1b1b242, you'd get

dd = firefly_test_dataset.all_data()
dd._determine_fields("pt2only_field")
---------------------------------------------------------------------------
YTFieldNotFound                           Traceback (most recent call last)
Cell In [6], line 2
      1 dd = firefly_test_dataset.all_data()
----> 2 dd._determine_fields("pt2only_field")

File ~/workspace/yt/yt/data_objects/data_containers.py:1542, in YTDataContainer._determine_fields(self, fields)
   1539     continue
   1541 ftype, fname = self._tupleize_field(field)
-> 1542 finfo = self.ds._get_field_info(ftype, fname)
   1544 # really ugly check to ensure that this field really does exist somewhere,
   1545 # in some naming convention, before returning it as a possible field type
   1546 if (
   1547     (ftype, fname) not in self.ds.field_info
   1548     and (ftype, fname) not in self.ds.field_list
   (...)
   1552     and (ftype, fname) not in self._container_fields
   1553 ):

File ~/workspace/yt/yt/data_objects/static_output.py:867, in Dataset._get_field_info(self, ftype, fname)
    866 def _get_field_info(self, ftype, fname=None):
--> 867     field_info, candidates = self._get_field_info_helper(ftype, fname)
    869     if field_info.name[1] in ("px", "py", "pz", "pdx", "pdy", "pdz"):
    870         # escape early as a bandaid solution to
    871         # https://github.com/yt-project/yt/issues/3381
    872         return field_info

File ~/workspace/yt/yt/data_objects/static_output.py:987, in Dataset._get_field_info_helper(self, ftype, fname)
    985             self._last_finfo = self.field_info[(ftype, fname)]
    986             return self._last_finfo, candidates
--> 987 raise YTFieldNotFound(field=INPUT, ds=self)

YTFieldNotFound: Could not find field ('all', 'pt2only_field') in ParticleData.
Did you mean:
	('pt2', 'pt2only_field')

Now you get

dd = firefly_test_dataset.all_data()
dd._determine_fields("pt2only_field")

[('pt2', 'pt2only_field')]

Here firefly_test_dataset has two particle types (pt1 and pt2) and the field pt2only_field is only in pt2.

If the new behavior is intended, the two failing tests should be removed/ignored (and test_field_unique_string_specification should probably be tweaked as well) . I can do both of those, just let me know how.

neutrinoceros · 2022-12-22T10:11:24Z

Now you get
dd = firefly_test_dataset.all_data()
dd._determine_fields("pt2only_field")

[('pt2', 'pt2only_field')]
Here firefly_test_dataset has two particle types (pt1 and pt2) and the field pt2only_field is only in pt2.

If the new behavior is intended, the two failing tests should be removed/ignored

Indeed it looks to me that the result on this branch is exactly what I expect, so I will adjust the tests accordingly.
Thank you so much @mtryan83 for spontaneously helping me out, you just saved me a couple hours of vain head scratching, I really appreciate it.

neutrinoceros · 2022-12-22T10:23:44Z

@mtryan83 so for now I've just dropped the broken tests entirely, but it made me realise that match_any_particle_types is a very confusing flag name (I know it because I suggested it and I still don't really get what it's supposed to mean 🙉), so maybe we should re-iterate over that, but I'm thinking it'd be easier to do after the present PR goes in, if that sounds acceptable to you.

…field_info

…freq and Dataset._last_finfo)

…et_field_info

matthewturk · 2022-12-30T13:38:43Z

So what do we think, should this go ahead?

neutrinoceros · 2022-12-30T13:46:11Z

Looks to me like we have green light from everybody involved.

neutrinoceros added the do not merge this change should not be merged until later label Nov 27, 2022

neutrinoceros force-pushed the exp_static_typing_get_field_info_helper branch 5 times, most recently from d244813 to 1eadd1e Compare December 4, 2022 17:00

neutrinoceros force-pushed the exp_static_typing_get_field_info_helper branch 2 times, most recently from 6610011 to d6b4b56 Compare December 4, 2022 19:40

neutrinoceros force-pushed the exp_static_typing_get_field_info_helper branch from fc5ef50 to 240c58d Compare December 4, 2022 21:46

neutrinoceros changed the title ~~EXP: tentative to add static typing to Dataset._get_field_info_helper~~ TYP: add static typing to Dataset._get_field_info_helper Dec 4, 2022

neutrinoceros added enhancement Making something better dead code removing internal bits that have no effect refactor improve readability, maintainability, modularity and removed do not merge this change should not be merged until later labels Dec 4, 2022

neutrinoceros changed the title ~~TYP: add static typing to Dataset._get_field_info_helper~~ TYP: add static typing to Dataset._get_field_info_helper (⏰ wait for #4227) Dec 5, 2022

neutrinoceros force-pushed the exp_static_typing_get_field_info_helper branch from 868debe to 28bf32f Compare December 5, 2022 11:24

matthewturk reviewed Dec 5, 2022

View reviewed changes

neutrinoceros force-pushed the exp_static_typing_get_field_info_helper branch from 72d2c96 to 08c6866 Compare December 5, 2022 21:59

neutrinoceros changed the title ~~TYP: add static typing to Dataset._get_field_info_helper (⏰ wait for #4227)~~ TYP: add static typing to Dataset._get_field_info_helper Dec 5, 2022

neutrinoceros marked this pull request as ready for review December 6, 2022 08:45

matthewturk previously approved these changes Dec 16, 2022

View reviewed changes

neutrinoceros mentioned this pull request Dec 17, 2022

TYP: add missing type hints #4251

Merged

neutrinoceros dismissed chrishavlin’s stale review via fc1f614 December 21, 2022 20:08

neutrinoceros force-pushed the exp_static_typing_get_field_info_helper branch from b00377c to fc1f614 Compare December 21, 2022 20:08

neutrinoceros added 18 commits December 28, 2022 19:53

TYP: narrow down type of Dataset.particle_types

189c7d1

TYP: add static typing to Dataset._get_field_info_helper

93508b3

RFC: internal var renaming

e0dc8bc

RFC: cleanup calls to _get_field_info using the 2 args signature

71f5754

RFC: drop second positional argument in Dataset._get_field_info

ccd65c5

TYP: reduce runtime type flexibility in private routine Dataset._get_…

4764124

…field_info

RFC: drop unused caching from Dataset._get_field_info (Dataset._last_…

e84267a

…freq and Dataset._last_finfo)

TYP: improve exception specificity in Dataset._get_field_info

0e4b2a5

RFC: simplify one-liner

8f8875b

MNT: adjust YTFieldNotFound error message to evolutions in Dataset._g…

5c950af

…et_field_info

RFC: simplify unused branch

d7b5391

MNT: cleanup unused private method

ade1c28

RFC: simplify internal logic in Dataset._get_field_info_helper

85759e5

RFC: fix duck typing in Dataset._get_field_info

b7c622c

TYP: use yt._typing explicit type annotations more extensively

35e836f

RFC: avoid unpacking field tuples only to reassemble in the same scope

f9dc662

TYP: add type hints to YTDataContainer._first_matching_field

43c1004

TST: drop a couple badly defined tests

183425c

neutrinoceros force-pushed the exp_static_typing_get_field_info_helper branch from d9e6a4c to 183425c Compare December 28, 2022 18:58

matthewturk enabled auto-merge December 30, 2022 13:53

matthewturk approved these changes Dec 30, 2022

View reviewed changes

matthewturk merged commit 5d08a39 into yt-project:main Dec 30, 2022

neutrinoceros deleted the exp_static_typing_get_field_info_helper branch December 30, 2022 13:54

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

TYP: add static typing to Dataset._get_field_info_helper #4229

TYP: add static typing to Dataset._get_field_info_helper #4229

neutrinoceros commented Nov 27, 2022 •

edited

Loading

neutrinoceros commented Nov 28, 2022

matthewturk commented Dec 1, 2022

neutrinoceros commented Dec 4, 2022

neutrinoceros commented Dec 4, 2022

matthewturk commented Dec 4, 2022

neutrinoceros commented Dec 4, 2022

neutrinoceros commented Dec 5, 2022

matthewturk Dec 5, 2022

neutrinoceros Dec 5, 2022

chrishavlin Dec 21, 2022

neutrinoceros Dec 21, 2022

matthewturk left a comment

neutrinoceros commented Dec 16, 2022

mtryan83 commented Dec 21, 2022 •

edited

Loading

neutrinoceros commented Dec 22, 2022

neutrinoceros commented Dec 22, 2022

matthewturk commented Dec 30, 2022

neutrinoceros commented Dec 30, 2022 •

edited

Loading

TYP: add static typing to Dataset._get_field_info_helper #4229

TYP: add static typing to Dataset._get_field_info_helper #4229

Conversation

neutrinoceros commented Nov 27, 2022 • edited Loading

PR Summary

neutrinoceros commented Nov 28, 2022

matthewturk commented Dec 1, 2022

neutrinoceros commented Dec 4, 2022

neutrinoceros commented Dec 4, 2022

matthewturk commented Dec 4, 2022

neutrinoceros commented Dec 4, 2022

neutrinoceros commented Dec 5, 2022

matthewturk Dec 5, 2022

Choose a reason for hiding this comment

neutrinoceros Dec 5, 2022

Choose a reason for hiding this comment

chrishavlin Dec 21, 2022

Choose a reason for hiding this comment

neutrinoceros Dec 21, 2022

Choose a reason for hiding this comment

matthewturk left a comment

Choose a reason for hiding this comment

neutrinoceros commented Dec 16, 2022

mtryan83 commented Dec 21, 2022 • edited Loading

neutrinoceros commented Dec 22, 2022

neutrinoceros commented Dec 22, 2022

matthewturk commented Dec 30, 2022

neutrinoceros commented Dec 30, 2022 • edited Loading

neutrinoceros commented Nov 27, 2022 •

edited

Loading

mtryan83 commented Dec 21, 2022 •

edited

Loading

neutrinoceros commented Dec 30, 2022 •

edited

Loading