-
Notifications
You must be signed in to change notification settings - Fork 279
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TYP: add static typing to Dataset._get_field_info_helper #4229
TYP: add static typing to Dataset._get_field_info_helper #4229
Conversation
Turns out the situation is much worse than I thought, as this method also gets called with very different data types (I think this happens with region objects), and we might receive something like
I don't know yet how to improve type consistency without breaking anything. Hopefully I can get back to this soon. |
So, it's also entirely possible that we can get rid of the
the second would implicitly use the |
Thank you @matthewturk for pointing this out. This is slightly orthogonal to what I initially wanted to accomplish here, but indeed I agree that we can get rid of this complexity, and this may be an opportunity to do so. I'll explore that in this PR too if it doesn't cause too much scope inflation ! |
d244813
to
1eadd1e
Compare
@matthewturk do you think there's any reason to keep the |
6610011
to
d6b4b56
Compare
@neutrinoceros as of right now, I cannot think of any reason to retain them. That being said I think if we can verify that people's workflows won't change any more than they already had to with the deprecation of aliased fields, we should be okay, but if we can't, I dunno. |
Ok. I think if I can make the transition internally without having to change tests, it should be safe to assume downstream workflows will no be affected. It's starting to look like we can remove quite a bit of complexity without changing the observable behavior, so I'm hopeful this can be done. |
fc5ef50
to
240c58d
Compare
868debe
to
28bf32f
Compare
This should be pretty stable now. For reference, I've checked that this doesn't break yt_astro_analysis (the only call to |
@@ -37,7 +37,7 @@ def _strip_ftype(field): | |||
|
|||
|
|||
np.random.seed(int(0x4D3D3D3)) | |||
units = [base_ds._get_field_info(*f).units for f in fields] | |||
units = [base_ds._get_field_info(f).units for f in fields] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm, this is because it's always a tuple now right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More exactly, it's because we don't have a second positional argument anymore. Passing a full tuple as the first argument already worked on main, I just made it the only way to do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why does this line use base_ds._get_field_info(f)
at all instead of of base_ds.field_info[f].units
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no idea, probably no good reason, but this feels just slightly out of scope here. Do you mind if I leave the line as is ?
72d2c96
to
08c6866
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems fine -- can you tell me what is holding us off from merging?
@matthewturk nothing, it's ready to go |
b00377c
to
fc1f614
Compare
This seems to come from a change in how dd = firefly_test_dataset.all_data()
dd._determine_fields("pt2only_field")
---------------------------------------------------------------------------
YTFieldNotFound Traceback (most recent call last)
Cell In [6], line 2
1 dd = firefly_test_dataset.all_data()
----> 2 dd._determine_fields("pt2only_field")
File ~/workspace/yt/yt/data_objects/data_containers.py:1542, in YTDataContainer._determine_fields(self, fields)
1539 continue
1541 ftype, fname = self._tupleize_field(field)
-> 1542 finfo = self.ds._get_field_info(ftype, fname)
1544 # really ugly check to ensure that this field really does exist somewhere,
1545 # in some naming convention, before returning it as a possible field type
1546 if (
1547 (ftype, fname) not in self.ds.field_info
1548 and (ftype, fname) not in self.ds.field_list
(...)
1552 and (ftype, fname) not in self._container_fields
1553 ):
File ~/workspace/yt/yt/data_objects/static_output.py:867, in Dataset._get_field_info(self, ftype, fname)
866 def _get_field_info(self, ftype, fname=None):
--> 867 field_info, candidates = self._get_field_info_helper(ftype, fname)
869 if field_info.name[1] in ("px", "py", "pz", "pdx", "pdy", "pdz"):
870 # escape early as a bandaid solution to
871 # https://github.com/yt-project/yt/issues/3381
872 return field_info
File ~/workspace/yt/yt/data_objects/static_output.py:987, in Dataset._get_field_info_helper(self, ftype, fname)
985 self._last_finfo = self.field_info[(ftype, fname)]
986 return self._last_finfo, candidates
--> 987 raise YTFieldNotFound(field=INPUT, ds=self)
YTFieldNotFound: Could not find field ('all', 'pt2only_field') in ParticleData.
Did you mean:
('pt2', 'pt2only_field') Now you get dd = firefly_test_dataset.all_data()
dd._determine_fields("pt2only_field")
[('pt2', 'pt2only_field')] Here firefly_test_dataset has two particle types ( If the new behavior is intended, the two failing tests should be removed/ignored (and test_field_unique_string_specification should probably be tweaked as well) . I can do both of those, just let me know how. |
Indeed it looks to me that the result on this branch is exactly what I expect, so I will adjust the tests accordingly. |
@mtryan83 so for now I've just dropped the broken tests entirely, but it made me realise that |
…freq and Dataset._last_finfo)
d9e6a4c
to
183425c
Compare
So what do we think, should this go ahead? |
Looks to me like we have green light from everybody involved. |
PR Summary
While working on #4227 I stumbled again on this method, which turns out to be extremely delicate to correctly type.
Because this method is private, it doesn't need the amount of type flexibility it currently has: being strict on input type would make it much easier to maintain, and wouldn't take anything away from users.
I'll use this PR as an experimental playground to see how much of the codebase actually falls apart if I take away type flexibility and how much work is needed to fix it. I'm hoping the answer to both questions is "not that much".