Activations incorrectly applied in Wavenet models. #125
Comments
Well-spotted! First reaction: this may explain #101.
Real quick:
Actually, I don't think so: the "standard" WaveNet configuration uses NeuralAmpModelerCore/NAM/wavenet.cpp, line 36 (at cd92997), which is ok.
Check out #131 and lmk if you agree that this resolves the issue of incorrectness. (No guarantees on efficiency, but looking at the CPU usage of NeuralAmpModelerPlugin with this it's ballpark what I'd expect.) I'm happy to take a separate Issue & PR if you have suggestions to improve performance.
Yes, that resolves the issue.

Performance impact: I feel duty-bound to tell you that there is a significant performance risk, although I, personally, would be inclined to leave it until an actual performance issue comes up. This is not on the 99% code path. There is a potential fix, which might provide a 30-50% improvement in a model that actually turns on gating (which is going to run twice as slowly anyway). The fix would not improve performance in the 99% case.

This is hotspot code. For the non-gated case, ~13.8% of total execution time is spent in activations::ActivationFastTanh::apply, of which 7.4% is spent in the fastTanH function itself (which is properly vectorized). (80% of total execution time is spent in _Layer::process.) Since you asked, and I happened to know. :-) It didn't quite make my cut-list for optimization targets that are worth pursuing. But there's something odd going on there: it's hard to account for the 6.4% non-SIMD overhead. The profiling is for non-gated code, so we're talking about 8 x n_frames and 16 x n_frames matrices, yet there's still a LOT of non-SIMD code getting executed there. The risk is that the overhead will get much worse if WaveNet is passing 16x1 data.

The fix would be to add the following function to the activation interface: virtual void activation::apply(Eigen::Block block), which would allow compilers to vectorize much more efficiently. Currently there's a virtual function call in the loop (activation::apply), across which compilers cannot optimize. The "is it a multiple of 4" check would get run once for each 8 x n_frames or 16 x n_frames block, instead of once for each 8x1 or 16x1 column array.

If you have an actual model that actually uses gating, I can provide you with a quick answer as to whether there's a performance problem, if you like.
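For concreteness, here is a minimal sketch of the kind of whole-block entry point being suggested. This is not the actual NAM interface; the class and method names (including applyBlock) are illustrative stand-ins only:

```cpp
#include <Eigen/Dense>

// Illustrative stand-in for an activation interface; not the real NAM classes.
class Activation
{
public:
  virtual ~Activation() = default;

  // Per-column entry point: if this is the only hook, the caller loops over
  // columns and pays one virtual call (plus any size/alignment check) per
  // 8x1 or 16x1 column.
  virtual void apply(Eigen::Ref<Eigen::VectorXf> column) = 0;

  // Whole-block entry point: one virtual call per 8 x n_frames or
  // 16 x n_frames block. Default implementation falls back to per-column.
  virtual void applyBlock(Eigen::Ref<Eigen::MatrixXf> block)
  {
    for (Eigen::Index c = 0; c < block.cols(); c++)
    {
      Eigen::Ref<Eigen::VectorXf> col = block.col(c);
      apply(col);
    }
  }
};

class TanhActivation : public Activation
{
public:
  void apply(Eigen::Ref<Eigen::VectorXf> column) override
  {
    column.array() = column.array().tanh();
  }

  void applyBlock(Eigen::Ref<Eigen::MatrixXf> block) override
  {
    // No virtual dispatch inside the element loop; a single Eigen expression
    // the compiler can vectorize across the whole block.
    block.array() = block.array().tanh();
  }
};
```

In this sketch the block version gets its own name purely to avoid overload-resolution noise; the actual change could equally be an apply overload taking an Eigen::Block, as proposed above.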
Thanks; this is great info. I'd be happy to take PRs to improve the code if you (or I, or anyone else) has bandwidth, but it's not critical to me at the moment. Thanks again for the original issue; it gave me the hint that got me to resolve #101 🎉 |
Marking as resolved since they're no longer incorrect; tracking efficiency improvements in #133 |
I'm sorry I'm unable to push this change. I have an existing fork, which I'm using as a submodule in PiPedal, and the submodule is fragile, to say the least. Were it not for that, I'd push to a clean fork. My fork has diverged to the point where I'm not entirely comfortable pushing everything I have to you. But I think you need this commit. And GitHub won't let me create a second fork.
Here's the commit for the code. There may be some way to pull this specific commit, but I am unfortunately a git illiterate. If there's a better way to do this, let me know.
rerdavies@102968f
The issue: wavenet.cpp, at line 50, calls activation->apply on a row-based Eigen::Block (and there is a second, similar call a couple of lines further on). Because the block is row-based, its rows and columns are NOT contiguous in memory, so the existing overload (the first in the original code) must apply the activation column by column. As written, the pattern in which the activations are applied is pretty nonsensical.
There are six other occurrences of calls to activation->apply, but all of them use column-based blocks, whose data is stored in contiguous memory. As a result, they take the second overload of activation->apply (in the original code), so they all do the right thing. The revision therefore does not change the behaviour of code in other models.
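For anyone following along, here is a small self-contained sketch of the Eigen behaviour being described (not the NAM code itself; the matrix sizes are made up for illustration). A row-based block of a column-major matrix is not one contiguous run of memory, while a column-based block is, which is why the column-by-column path is the one that is always safe:

```cpp
#include <Eigen/Dense>
#include <iostream>

int main()
{
  // Column-major (Eigen's default) 8-channel x 4-frame matrix.
  Eigen::MatrixXf x = Eigen::MatrixXf::Random(8, 4);

  // Row-based block (a subset of channels): each column of the block is
  // contiguous, but the block as a whole is not -- column c+1 does not start
  // where column c ends (the outer stride is 8, the parent's column height).
  auto rowBlock = x.middleRows(0, 4); // 4 x 4 block of the 8 x 4 matrix
  std::cout << "row block contiguous? "
            << (rowBlock.outerStride() == rowBlock.rows()) << "\n"; // 0 (no)

  // Column-based block (a range of frames): one contiguous run of
  // rows * cols floats, so a flat walk over data() is legitimate here.
  auto colBlock = x.middleCols(0, 2); // 8 x 2 = 16 consecutive floats
  std::cout << "col block contiguous? "
            << (colBlock.outerStride() == colBlock.rows()) << "\n"; // 1 (yes)

  // For the row-based block, treating data() as a flat rows * cols array
  // would touch rows excluded from the block; the safe generic path is to
  // apply the activation column by column:
  for (Eigen::Index c = 0; c < rowBlock.cols(); c++)
    rowBlock.col(c).array() = rowBlock.col(c).array().tanh();

  return 0;
}
```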
I propose the change rather hesitantly. I think you need to check to see whether the training code for wavenet replicates the bug. If the models are TRAINED with this bug, then applying this fix would probably be a bad thing.
I did try various WaveNet models to see if the fix makes a difference. Curiously, it does not change the sound of the amp models significantly, as far as I can tell. The fix DOES apply to the WaveNet model used by most models on ToneHunt. So: no audible difference in the sound output, but the fix is definitely correct.