Speed up combine_wave_lists using new merge_sorted function #1243

Merged: rmjarvis merged 4 commits into main from merge_sorted on Aug 28, 2023

rmjarvis
Member

@jchiang87 found that a significant amount of time in some recent imSim runs was spent in the combine_wave_list function when adding the bulge+disk+knots galaxy components. This is partly because the SEDs probably have more wavelengths than we really need, but regardless, this function was not the most efficient bit of code. This PR significantly improves the algorithm: in the typical case it's more than 3x faster, and when all the inputs are the same (which was the case for imSim in Jim's test) it's almost 5x faster.

The underlying new functionality is a utility function, merge_sorted, which merges two or more numpy arrays that are each already sorted. This is a pretty simple function (familiar to those who know merge sort), but there is apparently no native numpy function that does this. I did find sortednp, which also implements this, but I just went ahead and wrote my own rather than add a new dependency.
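For readers who don't have merge sort fresh in mind, here is a minimal sketch of the merge step. It is not the code in this PR: the names merge_two_sorted and merge_sorted_sketch are hypothetical, dropping duplicates (to match np.unique) is an assumption, and NaNs are not handled. A pure-Python loop like this is far too slow for real use; it only illustrates the algorithm.

```python
import numpy as np
from functools import reduce


def merge_two_sorted(a, b):
    """Merge two already-sorted 1-D arrays into one sorted array.

    Illustrative only: duplicates are dropped (an assumption, chosen to
    mimic np.unique), and NaNs are not handled.
    """
    a = np.asarray(a)
    b = np.asarray(b)
    out = np.empty(len(a) + len(b), dtype=np.result_type(a, b))
    i = j = k = 0
    # Walk both inputs in step, always taking the smaller head element.
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            v = a[i]
            i += 1
        elif b[j] < a[i]:
            v = b[j]
            j += 1
        else:
            # Equal heads: take one copy and advance both inputs.
            v = a[i]
            i += 1
            j += 1
        if k == 0 or out[k - 1] != v:
            out[k] = v
            k += 1
    # At most one input still has elements left; copy them over.
    for v in np.concatenate([a[i:], b[j:]]):
        if k == 0 or out[k - 1] != v:
            out[k] = v
            k += 1
    return out[:k]


def merge_sorted_sketch(arrays):
    """Merge two or more sorted arrays by repeated pairwise merging."""
    return reduce(merge_two_sorted, arrays)
```

Each element is visited once per pairwise merge, so no re-sorting of the concatenated data is needed, which is where the savings over a sort-based approach would come from.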

The timing script I wrote compares the new version with the old one, and with Jim's suggested change of checking for equality in order to skip the merge entirely. (The new version also includes this check, btw.) Here are the timing results on my laptop:

Time for 10000 iterations of combine_wave_list
old time =  0.6298755840000001
jims time =  0.6358873750000003
new time =  0.18641429200000026
Time for 10000 iterations of combine_wave_list with identical wave_lists
old time =  0.7706999579999998
jims time =  0.1962956250000012
new time =  0.16020912499999973
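A comparison of this kind can be sketched with timeit. This is not the timing script from the PR: the function names, array sizes, and wavelength range below are made up for illustration, and np.unique(np.concatenate(...)) is only an assumed stand-in for the old code path. The second function adds the equality check that skips the merge when all inputs are identical.

```python
import timeit
import numpy as np


def combine_concat_unique(wave_lists):
    # Assumed baseline: concatenate everything, then sort and deduplicate.
    return np.unique(np.concatenate(wave_lists))


def combine_with_shortcut(wave_lists):
    # If every list equals the first, there is nothing to merge.
    # Assumes each input is already sorted and free of duplicates.
    first = wave_lists[0]
    if all(np.array_equal(first, w) for w in wave_lists[1:]):
        return first
    return np.unique(np.concatenate(wave_lists))


rng = np.random.default_rng(1234)
# Three hypothetical sorted wavelength grids, 100 points each.
different = [np.sort(rng.uniform(300.0, 1100.0, 100)) for _ in range(3)]
identical = [different[0].copy() for _ in range(3)]

for label, lists in [("different", different), ("identical", identical)]:
    for func in (combine_concat_unique, combine_with_shortcut):
        t = timeit.timeit(lambda: func(lists), number=1000)
        print(f"{label:9s} {func.__name__:22s} {t:.4f} s")
```

Absolute numbers depend on the machine and on the array sizes, so only the ratios between rows are meaningful.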

Contributor

@beckermr left a comment

I didn't take a close look yet. I can go through in detail if you feel it is needed.

I do think the tests are missing edge cases. Passing mixtures of length-zero, length-one, and identical vs. non-identical arrays might shake out a bug. Also NaNs?

@rmjarvis
Member Author

Good ideas, Matt. Thanks. The only tricky one was getting NaNs to work the same way that np.unique treats them (it puts them last). The rest were already fine. :)
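One way to get NaN-last behavior is sketched below. This is not the approach taken in the PR: the function name merge_sorted_nan_last is hypothetical, np.unique stands in for the actual merge step, and collapsing all NaNs into a single trailing NaN is a choice made here for illustration. It relies on the inputs being sorted the way np.sort leaves them, with any NaNs at the end.

```python
import numpy as np


def merge_sorted_nan_last(arrays):
    """Merge sorted float arrays, keeping a single NaN at the end if any
    input contains one.

    Assumes each input is sorted with its NaNs last (as np.sort does).
    """
    non_nan_parts = []
    saw_nan = False
    for arr in arrays:
        arr = np.asarray(arr, dtype=float)
        n_nan = np.count_nonzero(np.isnan(arr))
        saw_nan = saw_nan or n_nan > 0
        # NaNs sit at the end, so slicing them off leaves a clean sorted run.
        non_nan_parts.append(arr[:len(arr) - n_nan])
    # Stand-in for the real merge step: ordinary comparisons are safe here
    # because no NaNs remain in the data being merged.
    merged = np.unique(np.concatenate(non_nan_parts))
    if saw_nan:
        merged = np.append(merged, np.nan)
    return merged
```

Separating the NaNs first matters because every comparison against NaN is False, which would otherwise confuse a comparison-driven merge loop.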

@rmjarvis added this to the v2.5 milestone on Aug 26, 2023
@rmjarvis added the optimization/performance (related to the speed and/or memory consumption of some aspect of the code) and desc (of possible interest to LSST DESC members looking for a project) labels on Aug 26, 2023
@rmjarvis merged commit 8633357 into main on Aug 28, 2023
9 checks passed
@rmjarvis deleted the merge_sorted branch on August 28, 2023 02:53
3 participants