Make sure positions are in code_length in get_radius function. #4091

brittonsmith · 2022-08-22T13:37:58Z

PR Summary

PR #4079 changed results for particle_radius calculation within a yt frontend embedded in ytree. After some digging, I discovered that positions for that frontend are returned in "unitary" units which, for that data, are not the same as "code_length". Unless we have a specification requiring positions to be returned in "code_length" for all frontends, this change is probably necessary. This may be quietly failing for other frontends.

PR Checklist

New features are documented, with docstrings and narrative docs
Adds a test for any bugs fixed. Adds tests for new features.

neutrinoceros · 2022-08-22T15:24:56Z

Good catch. I note that unyt_array.to will unconditionally create a copy, so maybe we could make it conditional to preserve optimal performances when input units are already compliant.

matthewturk · 2022-08-22T15:26:30Z

Once upon a time in the long past we talked about having some types of unit arrays that were "IndexArrays" (@jzuhone worked on this for a while) where the units were immutable internally to the array. Because of the amount of time we spend doing stuff like unit checks and unit conversions even internally in selection_routines.pyx (and its included files) this would likely be a huge speedup if we could sidestep 100% of those checks. (For instance, a fun profile is to look at how time is spent computing ghost zones for a FLASH dataset.)

brittonsmith · 2022-08-22T19:03:43Z

Good catch. I note that unyt_array.to will unconditionally create a copy, so maybe we could make it conditional to preserve optimal performances when input units are already compliant.

Good points. I'll do this.

brittonsmith · 2022-08-23T09:21:05Z

Is this test failure related?

neutrinoceros · 2022-08-23T09:44:54Z

It's not. It's just a random failure. I note that it's the third time I'm seeing it this week, so maybe we'll need to do something about it, but for now let's just try again
@yt-fido test this please

yipihey · 2022-08-23T20:02:46Z

I had commented on this on the closed PR but perhaps it can help the discussion here?:

Perhaps we are mixing two concepts?
One is "code_units" and one is the "original_units", i.e. which units are used in the file we are using. I think it is advantageous to require frontends to not modify the numbers it reads from disk.
I think of two different use cases of yt.

Debugging a code that generates the data we are reading in. It is desirable that yt attempts to not modify the data it reads so we can have some certainty we are debugging the code that generates the data.
If I use yt to convert from one data format to another. E.g. I read all particle positions in enzo and then store them as a simpler hdf5 file where the positions are a simple array. In this case I want to make sure that I'm just changing the shape of the data but do not change any of the numbers.

The positive side effect that it also avoid unnecessary computation which saves a little time.

Is there already the concept of "original_units" ? I.e. give us a simple way to check in what units the data was provided?

brittonsmith · 2022-08-24T10:54:47Z

Is there already the concept of "original_units" ? I.e. give us a simple way to check in what units the data was provided?

The closest we have to this is the field info container (e.g., ds.field_info). For example, for the Enzo particle positions, it shows:

>>> ds.field_info['all', 'particle_position_x']
On-Disk Field ('all', 'particle_position_x'): (units: 'code_length', particle field)

In theory, get_radius could check the field_info to see what units the data should be in, then convert the relevant variables (e.g., center, domain_width, radius2) to that. This would likely be the solution that would result in converting units for the data array least frequently, but there is still the potential to mess this up if units are not checked at all. For example, cached data may have been converted to a different unit by another operation.

neutrinoceros · 2022-08-25T10:36:18Z

yt/fields/field_functions.py

@@ -18,8 +18,11 @@ def get_radius(data, field_prefix, ftype):
    if any(data.ds.periodicity):
        rdw = radius2.v
    for i, ax in enumerate("xyz"):
+        pos = data[ftype, f"{field_prefix}{ax}"]
+        if str(pos.units) != "code_length":
+            pos.convert_to_units("code_length")


So now we're converting in place, changing the data outputted by ytree. Are we certain this isn't going to cause problems in ytree ?
Also I don't think I understand why it can't be changed in ytree.

Unless we can guarantee that every yt frontend is reading particle positions in "code_length", the problem will not be limited to ytree. It makes me nervous that we are not checking units before operating on memory views.

I'm happy to switch it back to making a copy. Would the solution I proposed above be better? That is, should we check the units of the position field using the field info container and the convert the other variables to that?

Unless we can guarantee that every yt frontend is reading particle positions in "code_length", the problem will not be limited to ytree. It makes me nervous that we are not checking units before operating on memory views.

ok that makes sense.

I'm happy to switch it back to making a copy.

I think it'd be the best option because it's the most conservative approach for these edge cases where we're not sure what to expect, and also an incentive for frontend development to use "code_lenght" more uniformly since it allows to use the fast track where no copy is performed.

Would the solution I proposed above be better? That is, should we check the units of the position field using the field info container and the convert the other variables to that?

I guess I don't have strong opinions there. I think the current solution is already good in that it is probably the simplest.

I agree with sticking with the simplest solution for now. I'll switch thing back to using a copy and forgo any further optimization.

neutrinoceros

I think this is now in an uncontroversial state that should preserve both backward compatibility for ytree and recent optimization when available.

Make sure positions are in code_length in get_radius function.

5a234f1

brittonsmith added the bug label Aug 22, 2022

neutrinoceros added this to the 4.1.0 milestone Aug 22, 2022

neutrinoceros added the release critical Highest priority (in a milestone) label Aug 22, 2022

Only convert units if we need to.

6239c55

neutrinoceros reviewed Aug 25, 2022

View reviewed changes

Make a copy when converting position units.

98cab40

neutrinoceros approved these changes Aug 25, 2022

View reviewed changes

neutrinoceros enabled auto-merge August 25, 2022 18:19

neutrinoceros merged commit 379cd1a into yt-project:main Aug 25, 2022

brittonsmith deleted the rfix branch August 26, 2022 08:49

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make sure positions are in code_length in get_radius function. #4091

Make sure positions are in code_length in get_radius function. #4091

brittonsmith commented Aug 22, 2022

neutrinoceros commented Aug 22, 2022

matthewturk commented Aug 22, 2022

brittonsmith commented Aug 22, 2022

brittonsmith commented Aug 23, 2022

neutrinoceros commented Aug 23, 2022

yipihey commented Aug 23, 2022

brittonsmith commented Aug 24, 2022

neutrinoceros Aug 25, 2022

brittonsmith Aug 25, 2022

neutrinoceros Aug 25, 2022

brittonsmith Aug 25, 2022

neutrinoceros left a comment

Make sure positions are in code_length in get_radius function. #4091

Make sure positions are in code_length in get_radius function. #4091

Conversation

brittonsmith commented Aug 22, 2022

PR Summary

PR Checklist

neutrinoceros commented Aug 22, 2022

matthewturk commented Aug 22, 2022

brittonsmith commented Aug 22, 2022

brittonsmith commented Aug 23, 2022

neutrinoceros commented Aug 23, 2022

yipihey commented Aug 23, 2022

brittonsmith commented Aug 24, 2022

neutrinoceros Aug 25, 2022

Choose a reason for hiding this comment

brittonsmith Aug 25, 2022

Choose a reason for hiding this comment

neutrinoceros Aug 25, 2022

Choose a reason for hiding this comment

brittonsmith Aug 25, 2022

Choose a reason for hiding this comment

neutrinoceros left a comment

Choose a reason for hiding this comment