ENH: Optimized radius calculation. #4079

yipihey · 2022-08-14T02:13:36Z

PR Summary

Changes get_radius in yt/fields/field_functions.py, one of the routines that calculate spherical radius.
The new version runs approximately twice as fast as the routine it replaces.
The performance benefits come from minimizing allocations and avoiding unit conversions.

The script I used for testing is here:

import yt

for i in range(10):
    ds = yt.frontends.enzo.EnzoDataset(fn)
    ds._periodicity = (False, False, False) # Checking periodicity is slow in yt
    readit = ds.all_data()  # already defines xyz

    r = readit[("index", "radius")]

yt.SlicePlot(ds, "z", ("index", "radius")).save()

where fn can be obtained in a separate script as

import yt
fn = yt.load_sample("HiresIsolatedGalaxy").filename

Executes approximately twice as fast and avoids unnecessary memory allocation and unyt calls.

neutrinoceros · 2022-08-14T06:13:41Z

pre-commit.ci autofix

for more information, see https://pre-commit.ci

neutrinoceros · 2022-08-14T06:23:23Z

I made minor adjustments to the original post (showed what fn is supposed to be, added import yt, made the loop run 10 times instead of 1).
I find that this reduces the runtime for the example script by about 10%.
I'm impressed by how much you're able to gain with such small edits. This is great!

neutrinoceros

I'd like at least one other maintainer to sign off given how sensitive this kind of computation is, but this looks sensible to me!


matthewturk

Thank you, @yipihey -- I really like where this is going. If we could get @jzuhone to take a quick look I'm happy with it going in.

matthewturk · 2022-08-14T09:37:52Z

yt/fields/field_functions.py

-    np.sqrt(radius2, radius2)
+    # Using the views into the array is not changing units and as such keeps
+    # from having to do symbolic manipulations
+    np.sqrt(radius2.d, radius2.d)


I like this.


brittonsmith · 2022-08-22T13:40:48Z

yt/fields/field_functions.py

        np.subtract(
-            data[ftype, f"{field_prefix}{ax}"].in_base(unit_system.name),


The removal of the unit conversion (i.e., in_base(unit_system.name)) has changed radius values for frontends where the default unit is not "code_length". I'm not sure how much performance is lost by adding it back, but I think it is necessary unless we mandate position units always come back in "code_length", which I don't think we do.

yipihey · 2022-08-22

Yes, perhaps we are mixing two concepts here.
One is "code_units" and the other is "original_units", i.e. the units used in the file we are reading. I think it is advantageous to require frontends not to modify the numbers they read from disk.
I can think of two different use cases of yt where this matters:

1. Debugging a code that generates the data we are reading in. It is desirable that yt does not modify the data it reads, so that we can be certain we are debugging the code that generated the data.

2. Using yt to convert from one data format to another. E.g. I read all particle positions in Enzo and then store them in a simpler HDF5 file where the positions are a plain array. In this case I want to make sure that I am only changing the shape of the data and not changing any of the numbers.

A positive side effect is that this also avoids unnecessary computation, which saves a little time.

Is there already a concept of "original_units", i.e. a simple way to check in what units the data was provided?

yipihey added 2 commits August 13, 2022 18:47

Optimized get_radius function.

e6d15b3

Executes approximately twice as fast and avoids unnecessary memory allocation and unyt calls.

Clean up comments

699c2ad

[pre-commit.ci] auto fixes from pre-commit.com hooks

3aaeac6

for more information, see https://pre-commit.ci

neutrinoceros added the enhancement and performance labels Aug 14, 2022

neutrinoceros previously approved these changes Aug 14, 2022


matthewturk reviewed Aug 14, 2022


ENH: save an unused array copy

81b0ac8

neutrinoceros dismissed their stale review via 81b0ac8 August 14, 2022 12:04

neutrinoceros enabled auto-merge (squash) August 14, 2022 12:13

neutrinoceros disabled auto-merge August 14, 2022 12:14

neutrinoceros enabled auto-merge (squash) August 14, 2022 12:15

jzuhone approved these changes Aug 16, 2022


neutrinoceros merged commit eb3e840 into yt-project:main Aug 16, 2022

yipihey deleted the optimized_radius branch August 16, 2022 18:47

brittonsmith mentioned this pull request Aug 22, 2022

Make sure positions are in code_length in get_radius function. #4091 (Merged)

brittonsmith reviewed Aug 22, 2022

yipihey Aug 22, 2022
