This repository has been archived by the owner on Oct 28, 2023. It is now read-only.
generated from EmbarkStudios/opensource-template
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Improve performance by about 50% (#14)
* Improve performance by about 50%

  I ran cargo flamegraph, and it turned out that a large share of the runtime was spent in find_match and find_better_match, which form a very hot loop. Almost all of the work done in the inner loop (find_better_match) is a function of two u8 values, so it can be precomputed. The alpha masks can also be folded into these precomputed cost functions, which removes all alpha computations from the loop. I ran all the examples and could not find any visible artifacts.

* Fix bugs: exp() numerical precision, and loop bounds bug

  I found two small bugs while looking at @zicklag's comments on #14.

  First, there was a numerical precision bug in the calculation of the distance Gaussians: the original code computed them with f64::exp(), but my change had switched to f32::exp().

  Second, the precomputed function table was missing entries. Range upper bounds are exclusive, so a loop ending at 255 never visits 255, and 256u8 cannot be written because 256 does not fit in a u8. The loop therefore needs the inclusive range 0..=255u8, which exists for exactly this situation.
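The precomputation idea and the two fixes can be illustrated with a minimal sketch. It assumes a per-channel cost that depends only on two u8 values; channel_cost, build_cost_table, lookup, and distance_gaussian are hypothetical names chosen for illustration, not the project's actual code, and the alpha-mask folding is not shown.

```rust
/// Hypothetical per-channel cost (an assumption, not the project's function):
/// squared difference of two 8-bit values, normalized to [0, 1].
fn channel_cost(a: u8, b: u8) -> f32 {
    let d = (a as f32 - b as f32) / 255.0;
    d * d
}

/// Precompute cost(a, b) for every pair of u8 values (256 * 256 entries),
/// stored row-major so that the entry for (a, b) sits at index a * 256 + b.
fn build_cost_table(cost: impl Fn(u8, u8) -> f32) -> Vec<f32> {
    let mut table = Vec::with_capacity(256 * 256);
    // 0..=255u8 visits every u8 value. 0..255u8 would skip 255, and
    // 0..256u8 is rejected by the compiler because 256 does not fit in a u8.
    for a in 0..=255u8 {
        for b in 0..=255u8 {
            table.push(cost(a, b));
        }
    }
    table
}

/// Replace the per-call arithmetic with a single indexed load.
#[inline]
fn lookup(table: &[f32], a: u8, b: u8) -> f32 {
    table[a as usize * 256 + b as usize]
}

/// Hypothetical distance weight: evaluate the Gaussian with f64::exp()
/// and narrow to f32 only at the end, instead of calling f32::exp().
fn distance_gaussian(dist_sq: f64, sigma: f64) -> f32 {
    (-dist_sq / (2.0 * sigma * sigma)).exp() as f32
}
```

With f32 entries the table occupies 256 × 256 × 4 bytes = 256 KiB, and each cost evaluation in the inner loop becomes a call like lookup(&table, a, b).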
1 parent d7b087b, commit fd1b0f0
Showing 1 changed file with 66 additions and 28 deletions.