Dropout WIP #535
base: master
Conversation
@@ -471,10 +471,16 @@ class SparseDistributedRepresentation : public Serializable
   * @param rng The random number generator to draw from. If not given, this
   * makes one using the magic seed 0.
   */
-  void addNoise(Real fractionNoise);
+  void addNoise(Real fractionNoise); //TODO the name is confusing, rename to shuffle ?
is it OK to rename this to shuffle() and have addNoise for the new fn? @ctrl-z-9000-times
I looked at your new addNoise function and I think it will have issues with keeping the sparsity at a reasonable level. I think the sparsity of an SDR will always tend towards 50% as this method is called on it.
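A minimal standalone sketch (not code from the PR) of why repeated per-bit flipping drives the expected sparsity toward 50%: if every bit is flipped with probability p on each pass, the expected sparsity follows s' = s(1-p) + (1-s)p, whose only fixed point is 0.5.

```cpp
#include <cstdio>

int main() {
  double s = 0.02;        // typical SDR sparsity (2%)
  const double p = 0.01;  // per-bit flip probability
  for (int step = 0; step < 1000; ++step)
    s = s * (1.0 - p) + (1.0 - s) * p;  // expected sparsity after one more flip pass
  std::printf("expected sparsity after 1000 flip passes: %f\n", s);  // approaches 0.5
  return 0;
}
```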
That would indeed be wrong. What I intended:
- take the SDR of the current input
- flip 0.01% of its bits
- take the next, new SDR
- flip 0.01% of its bits
So the sparsity would remain roughly the same (actually grow slightly, because there are many more off bits, so flipping a bit on is more probable than flipping one off). It should stay around the x% (2%) + 0.001%.
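(A quick check of that intuition, assuming each bit is flipped independently with probability p on a fresh input: a single pass changes the expected sparsity from s to s(1-p) + (1-s)p. With s = 2% and p = 0.01% that is 0.02 x 0.9999 + 0.98 x 0.0001, roughly 2.01%, so one pass per input only nudges the sparsity up slightly; the drift towards 50% only appears when the same SDR is flipped over and over.)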
void SparseDistributedRepresentation::addNoise2(const Real probability, Random& rng) {
  NTA_ASSERT( probability >= 0.0f and probability <= 1.0f );
  const ElemSparse numFlip = static_cast<ElemSparse>(size * probability);
  if (numFlip == 0) return;
I'm trying to write an efficient implementation, but this has a problem when the probability is so small relative to the SDR size that numFlip rounds to 0. Should we bother with such cases? Return, or assert?
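One possible way around the numFlip == 0 case, sketched here as standalone code rather than as the PR's SparseDistributedRepresentation method (std::mt19937 stands in for the project's Random class, which is an assumption): flip each bit independently with a Bernoulli draw, so arbitrarily small probabilities still have the correct expected effect, at the cost of a full pass over the dense SDR.

```cpp
#include <cstdint>
#include <random>
#include <vector>

// Flip each bit of a dense SDR independently with the given probability.
// Also works when probability * size < 1, where a precomputed flip count
// would round down to zero and do nothing.
void addNoiseBernoulli(std::vector<uint8_t>& dense, float probability,
                       std::mt19937& rng) {
  std::bernoulli_distribution flip(probability);
  for (auto& bit : dense)
    if (flip(rng)) bit = bit ? 0u : 1u;
}
```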
input.addNoise2(0.01f, rng_); //TODO apply at synapse level in Conn?
//TODO fix for probability << input.size
//TODO apply killCells to active output?
//TODO apply dropout to segments? (so all are: synapse, segment, cell/column)
Proof of concept: dropout applied to the input (as noise) and to the output (as killCells).
- I'd prefer this to be applied in Connections (in adaptSegment?)
- Where to apply it?
  - ideally to all of: SP, TM, and synapse, segment, cell/column
  - but that would be computationally infeasible, so..?

Deterministic tests are still expected to fail until we decide on the values and update the exact outputs.
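For illustration of the killCells idea mentioned above, here is one way output dropout could look, as a standalone sketch rather than the PR's actual API: the function name, the list-of-active-indices signature, and the use of std::mt19937 are all assumptions.

```cpp
#include <algorithm>
#include <cstdint>
#include <random>
#include <vector>

// Silence a random fraction of the active output cells for one timestep and
// return the surviving indices, kept sorted like an SDR's sparse index list.
std::vector<uint32_t> dropoutActiveCells(std::vector<uint32_t> active,
                                         float dropFraction,
                                         std::mt19937& rng) {
  std::shuffle(active.begin(), active.end(), rng);
  const auto keep =
      active.size() - static_cast<size_t>(active.size() * dropFraction);
  active.resize(keep);                      // drop the last dropFraction of cells
  std::sort(active.begin(), active.end());  // restore sorted order
  return active;
}
```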
Maybe I don't understand this change, but it seems this will make the HTM perform worse. While it's interesting that the HTM keeps working even when some of its components are disabled, I don't think this belongs in the mainline. Maybe instead you could make an example/demonstration of these fault-tolerance properties (like Numenta did in their SP paper).
It's commonly used in deep learning, where it improves results a lot; to be exact, dropout helps to prevent overfitting. While HTM is already more robust to that (sparse SDR for the output, stimulus threshold on input segments), I want to see whether this helps and by how much. I am looking for biological confirmation and for datasets to prove whether this works better. (It does slow things down a bit, but that is an implementation detail.)

Umm, no components are disabled permanently; this temporarily flips bits, adding noise to the input.
The Hotgym example internally uses dropout.
WIP dropout implementation
EDIT:
Motivation: I believe this change can be considered biological (noise on the signal during transfer) and also in line with deep learning practice (more robust representations). It should be supported by measurable SDR quality metrics, #155.