Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Optimized radius calculation. #4079

Merged
merged 4 commits into from
Aug 16, 2022

Conversation

yipihey
Copy link
Contributor

@yipihey yipihey commented Aug 14, 2022

PR Summary

Changes one of the routines that calculated spherical radius.
get_radius in fields/field_functions.py
It runs approximately twice as fast as the original routine it replaces.
Performance benefits come from minimizing allocations and triggering unit conversions.

The script I used for testing is here:

import yt

for i in range(10):
    ds = yt.frontends.enzo.EnzoDataset(fn)
    ds._periodicity = (False, False, False) # Checking periodicity is slow in yt
    readit = ds.all_data()  # already defines xyz

    r = readit[("index", "radius")]

yt.SlicePlot(ds, "z", ("index", "radius")).save()

where fn can be obtained in a separate script as

import yt
fn = yt.load_sample("HiresIsolatedGalaxy").filename

Executes approximately twice as fast and avoids unnecessary memory
allocation and unyt calls.
@neutrinoceros
Copy link
Member

pre-commit.ci autofix

@neutrinoceros
Copy link
Member

I made minor adjustments to the original post (showed what fn is supposed to be, added import yt, made the loop run 10 times instead of 1).
I find that this reduces the runtime for the example script by about 10%
I'm impressed how much you're able to gain with such small edits, this is great !

@neutrinoceros neutrinoceros added enhancement Making something better performance labels Aug 14, 2022
neutrinoceros
neutrinoceros previously approved these changes Aug 14, 2022
Copy link
Member

@neutrinoceros neutrinoceros left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like at least one other maintainer to sign off given how sensitive this kind of computation is, but this looks sensible to me !

yt/fields/field_functions.py Outdated Show resolved Hide resolved
Copy link
Member

@matthewturk matthewturk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @yipihey -- I really like where this is going. If we could get @jzuhone to take a quick look I'm happy with it going in.

np.sqrt(radius2, radius2)
# Using the views into the array is not changing units and as such keeps
# from having to do symbolic manipulations
np.sqrt(radius2.d, radius2.d)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this.

yt/fields/field_functions.py Outdated Show resolved Hide resolved
@neutrinoceros neutrinoceros enabled auto-merge (squash) August 14, 2022 12:13
@neutrinoceros neutrinoceros disabled auto-merge August 14, 2022 12:14
@neutrinoceros neutrinoceros enabled auto-merge (squash) August 14, 2022 12:15
@neutrinoceros neutrinoceros merged commit eb3e840 into yt-project:main Aug 16, 2022
@yipihey yipihey deleted the optimized_radius branch August 16, 2022 18:47
np.subtract(
data[ftype, f"{field_prefix}{ax}"].in_base(unit_system.name),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The removal of the unit conversion (i.e., in_base(unit_system.name)) has changed radius values for frontends where the default unit is not "code_length". I'm not sure how much performance is lost by adding it back, but I think it is necessary unless we mandate position units always come back in "code_length", which I don't think we do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes perhaps we are mixing two concepts.
One is "code_units" and one is the "original_units", i.e. which units are used in the file we are using. I think it is advantageous to require frontends to not modify the numbers it reads from disk.
I think of two different use cases of yt.

  1. Debugging a code that generates the data we are reading in. It is desirable that yt attempts to not modify the data it reads so we can have some certainty we are debugging the code that generates the data.
  2. If I use yt to convert from one data format to another. E.g. I read all particle positions in enzo and then store them as a simpler hdf5 file where the positions are a simple array. In this case I want to make sure that I'm just changing the shape of the data but do not change any of the numbers.

The positive side effect that it also avoid unnecessary computation which saves a little time.

Is there already the concept of "original_units" ? I.e. give us a simple way to check in what units the data was provided?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Making something better performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants