Optimizations to local inhibition #769
Conversation
Wow, a speedup to local inhibition has been long sought for! Thanks! 👍 #123 I'll review the PR and the linked discussion. I just hope the smaller neighborhood won't result in worse results (?)
Initial quick review; I'll have to look at the forum thread. I like some of the optimizations, but I'm not sure some of them are worth the trade-off in readability vs. speed.
src/htm/types/Types.hpp (Outdated)
@@ -156,6 +157,10 @@ static const htm::Real32 Epsilon = htm::Real(1e-6);
 #endif

+static const UInt SizeInt = sizeof(Int);
+static const UInt shftInt = (SizeInt*CHAR_BIT)-1;
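For context, a constant like shftInt = (SizeInt*CHAR_BIT)-1 is the index of the sign bit and is typically used for branchless sign-bit tricks. The sketch below is purely illustrative of that kind of use; it is my own guess at the intent, not code taken from this PR:

```cpp
// Illustrative only: a common use of a "bits minus one" shift constant is a
// branchless absolute value. Assumes arithmetic right shift of negative
// signed values (implementation-defined, but true on mainstream compilers),
// and is not valid for the most negative Int value.
#include <climits>
#include <cstdint>

using Int = std::int32_t;
static const unsigned shftInt = sizeof(Int) * CHAR_BIT - 1;  // 31 for a 32-bit Int

Int branchlessAbs(Int x) {
  const Int mask = x >> shftInt;   // 0 for non-negative x, -1 (all ones) for negative x
  return (x + mask) ^ mask;        // equals x when mask == 0, and -x when mask == -1
}
```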
Ideally this won't be needed; if it is, define it only in the file where it's needed.
Ok.
This is quite significant! 😮 Thank you. The C++ MNIST example is another good benchmark. We'll see the gains on more platforms here.
I'd appreciate it if you could let me know if the speed improvements I saw on my machine are reproducible.
There should be absolutely no change in results. Here's my reasoning (let me know if I'm wrong): The way the original code is structured, the program runs through each neighbor of a column, keeping count of how many neighbors that column has (numNeighbors) and how many of those neighbors have a larger overlap than the column being examined (numBigger). After having run through all the neighbors, the program checks whether numBigger < (numNeighbors + 1)*(C), where C is some predetermined number.

However, in a wrapAround neighborhood, the number of neighbors of a column (the final value of numNeighbors) is always predetermined by the radius (because the neighborhood wraps around, so a column is never "near an edge"). Thus we can know the final value of numNeighbors without counting neighbors. What the pull-request code does is calculate the final value of numNeighbors before checking the neighbors of a column; then, as it checks each neighbor, it verifies whether numBigger >= (numNeighbors + 1)*(C) (the negation of the previously mentioned condition) and stops counting neighbors if that condition is true. This produces results equivalent to those of the original code: if numBigger >= (numNeighbors + 1)*(C) is true at any point, it would still be true after counting all subsequent neighbors (numBigger never decreases); and if the condition is not true at a given point, the program continues to examine neighbors, as the original code would have.
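To make the equivalence concrete, here is a minimal sketch of that early-exit check under the wrapAround assumption. It is my own illustration, not the actual htm.core code; the helper name columnSurvivesInhibition and the rounding of density * (numNeighbors + 1) are assumptions.

```cpp
// Sketch of the early-exit idea: with wrapAround, every column has the same,
// known number of neighbors, so the activity threshold can be computed before
// the neighbor loop, and the loop can stop as soon as the column is ruled out.
#include <cstddef>
#include <vector>

using UInt = unsigned int;
using Real = float;

// Hypothetical helper: returns true if `column` stays active under local inhibition.
bool columnSurvivesInhibition(const std::vector<Real> &overlaps,
                              const std::vector<UInt> &neighbors, // wrap-around neighbors of `column`
                              UInt column, Real density) {
  const std::size_t numNeighbors = neighbors.size();               // predetermined by the radius
  const UInt numActive = static_cast<UInt>(0.5f + density * (numNeighbors + 1));
  UInt numBigger = 0;
  for (const UInt n : neighbors) {
    if (overlaps[n] > overlaps[column]) {
      ++numBigger;
      if (numBigger >= numActive) {
        return false;  // early exit: numBigger never decreases, so the outcome is settled
      }
    }
  }
  return numBigger < numActive;  // same final check as the original formulation
}
```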
I'm not familiar with that C++ MNIST one. Is that the preferred benchmark, then? I'll check to see whether the pull request has any effect on it. Thank you for your work, breznak.
Thank you for the confirmed reasoning; I really like what the PR does (however the optimizations turn out).
Perfect! Yes, the program behaves correctly.
You can run it in
I'd say both are; we figured the "benchmark_hotgym" is sometimes too artificial, and average results on real data are different. (MNIST takes quite a while to run, though.) I'm comparing the speeds on
Benchmarks of this PR (on my machine):
master: ~31 s
PR: ~24 s
I've confirmed your results from this PR on my machine; the avg speedup is about 20% (24s, down from 31s), which is pretty significant!! Great work, thank you 💯
Okay, turns out I'm a failure. Every single one of the "micro-optimizations" actually made the code slower without the DEBUG flag on. My fault. I tried to revert those changes by making two additional commits to restore the code to the original form (current master branch), leaving only the "fewer neighbors" change. I'm not sure I did that correctly (I've never used git or GitHub before). Should I instead close this request and open a new one, to make things cleaner? Thanks again for everything, breznak, including the clarifications. I'll look into mnist_sp.
What?! SP local is now around 12s! (compared to our master, and to the earlier state of this PR above: #769 (comment))
Yep, just remember "premature optimization is the root of all evil" :) Compilers are pushed to their limits by really smart people, so we mostly aim for the algorithmic improvements (as you did here) that yield huge changes. Code micro-optimizations are better left to the compiler.
No need, this PR is fine now as is. Really good job, thank you!
I'm pretty sure it isn't. The removal of the false "micro-optimizations" should result in a drop of just 1-3s at most.
Ok, that must have been an effect of the load on my machine. Now I'm getting 19-25s on master and 10-13s with this PR, which is about a 2x speedup.
@kineyev, can you find a toggle in the PR here like "Allow maintainers to push changes to your branch"? I've made some cleanups and would attach them to this PR.
"Allow edits from maintainers." ? It's checked. Is there anything else I need to do? |
@kineyev this is good as is, really good work, thank you! 👍
Tell me, would you like to try pushing the speed even further? Here are a few ideas:
- The code that computes predN only depends on inhibitionRadius_, which only changes at isUpdateRound(). You could use that info and update numNeighbors only then. I'm not sure if it's worth it or if it'd give any improvement.
- I'm not really a fan of the *Neighborhood classes; see if the code could use a Topology_t (DefaultTopology) class instead. (That still uses a Neighborhood, but it is a better API for changing topologies.) CC @ctrl-z-9000-times, what do you think of this?
- I'd like to try trading memory for speed by pre-computing Neighborhood(column) for each column and storing it in a map (see the sketch after this list).
  - For global inhibition this depends only on the column, so it should work.
  - For local inhibition it depends on (col, inh radius); see the first bullet.
  - This would depend on whether the function advance_() used in operator++ in Neighborhoods takes a considerable amount of time. Could you look at that?
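For the caching idea, here is a rough sketch of what pre-computing Neighborhood(column) could look like for a 1-D wrap-around topology. It is a hypothetical illustration, not htm.core API; precomputeNeighborhoods and the 1-D layout are my assumptions.

```cpp
// Sketch of idea 3: trade memory for speed by computing each column's
// wrap-around neighbor list once and reusing it on every compute() call.
// Assumes a 1-D column topology with 2*radius < numColumns so the
// neighborhood never wraps onto itself; the real SP supports N-D topologies.
#include <cstdint>
#include <vector>

using UInt = std::uint32_t;

std::vector<std::vector<UInt>> precomputeNeighborhoods(UInt numColumns, UInt radius) {
  std::vector<std::vector<UInt>> cache(numColumns);
  for (UInt col = 0; col < numColumns; ++col) {
    cache[col].reserve(2u * radius);
    for (UInt offset = 1; offset <= radius; ++offset) {
      cache[col].push_back((col + offset) % numColumns);               // neighbor to the "right"
      cache[col].push_back((col + numColumns - offset) % numColumns);  // neighbor to the "left"
    }
  }
  return cache;  // memory cost: numColumns * 2 * radius indices
}
```

For local inhibition the cache would have to be rebuilt whenever inhibitionRadius_ changes (i.e., at isUpdateRound()), as noted in the first bullet.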
CC @dkeeney @ctrl-z-9000-times: this PR does a great job of optimizing SP local inhibition performance for the wrapAround case. Do you think it's worth the trouble of breaking the API and dropping the wrapAround param?
I would like to help, sure, but I can't say for sure when I'll get around to doing it, since I won't have much free time in the next few weeks. I'll see what I can do though.
I'll check it out. Should be easy enough to verify.
I'll look into it. I'm not familiar with that Topology_t class.
I'll look into it as well, but I'll look at the Topology_t class first; otherwise it could be wasted effort. There is a significant amount of time spent on both the "operator*()" and "advance_()" functions of wrapping neighborhoods. I'll let you know as soon as I have something.
I'm really happy with the code and speed improvements now. Thank you @kineyev !
Possible follow-ups for discussion:
- drop the wrapAround param from SP (default to true) // cleanup, faster code
- explore the possibility of caching the Neighborhood results in memory
- use Topology instead of (directly) Neighborhood in SP
it's in the
Good to know, so that would be worth it!
The Topology class would not save us here. The time/gain ratio, imho:
Really good job on this PR.
Let's wait until someone has time to look into this before we decide if it is worth it.
Check far fewer neighbors in spatialpooler.cpp->local inhibition->wraparound;
Other miscellaneous optimizations resulting in fewer instructions in the compiled code.
As per https://discourse.numenta.org/t/spatial-pooler-local-inhibition-in-htm-core/7130