I wasn't able to get better performance than the baseline (gin) for the ARGB version on Windows:
Routes tried:

- Various IPP methods such as separated convolution. These were faster, but broke down at higher radii, probably because under the hood they're applying a 2D kernel.
- The "rotated" vector implementation (where the queue is vertical) with FloatVectorOperations: this was consistently slow (roughly the shape sketched after this list).
- The rotated vector implementation in IPP: again, performance was worse than 4× the single-channel version no matter how it was structured (looping over the channels, allocating 4 channels' worth of queue/temp storage, etc.).
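For context, here's roughly the shape of the rotated/vertical pass I've been benchmarking. This is a hedged sketch, not the actual gin or melatonin code: a single vertical box pass over one deinterleaved float plane (you'd run it once per channel, and the real stack-blur queue logic is more involved). `verticalBoxPass` and its edge handling are placeholders.

```cpp
// Hedged sketch: one vertical box pass over a single deinterleaved float plane.
// Assumes src/dest each hold width * height floats and 0 < kernelSize <= height.
// Rows below the last full window are left untouched for brevity.
#include <vector>
#include <juce_audio_basics/juce_audio_basics.h>

static void verticalBoxPass (const float* src, float* dest, int width, int height, int kernelSize)
{
    std::vector<float> runningSum ((size_t) width, 0.0f);

    // Prime the running sum with the first kernelSize rows
    for (int y = 0; y < kernelSize; ++y)
        juce::FloatVectorOperations::add (runningSum.data(), src + y * width, width);

    const auto scale = 1.0f / (float) kernelSize;

    for (int y = 0; y < height - kernelSize; ++y)
    {
        // Average of the current window becomes the output row
        juce::FloatVectorOperations::copyWithMultiply (dest + y * width, runningSum.data(), scale, width);

        // Slide the window down one row: add the incoming row, drop the outgoing one
        juce::FloatVectorOperations::add (runningSum.data(), src + (y + kernelSize) * width, width);
        juce::FloatVectorOperations::subtract (runningSum.data(), src + y * width, width);
    }

    // Last full window
    juce::FloatVectorOperations::copyWithMultiply (dest + (height - kernelSize) * width, runningSum.data(), scale, width);
}
```

The appeal is that every per-row operation is a single vectorized call across the whole scanline, but as noted above, running this per channel for ARGB was still consistently slower in my tests.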
Things to try:

- Investigate whether the alpha channel needs to be blurred at all, given the pixels are premultiplied.
- A custom dot-product against the kernel (see the sketch after this list).
- The "journey of a pixel" sliding kernel idea from my blog.