
Classifier: use map to allow sparse categories WIP #680

Open · wants to merge 1 commit into hotgym_predictor from classifier_map_wip

Conversation

@breznak (Member) commented Sep 20, 2019

Use a map internally for categories_, instead of a vector, which allows us to have sparse categories such as {1, 2, 999} (= 3 entries total). With a vector indexed by the raw label, we would instead have to allocate room for every index up to 999!

Do not merge; for review comments only. This is an attempt to switch the Classifier from contiguous label indices to non-contiguous labels.
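
To make the storage difference concrete, here is a minimal standalone sketch (illustrative names and sizes, not the actual htm.core Classifier internals):

```cpp
// Sketch: why a map suits sparse labels. With std::map, the labels {1, 2, 999}
// occupy exactly 3 entries; a vector indexed by the raw label value would have
// to allocate a slot for every index up to 999.
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

int main() {
  const std::size_t sdrSize = 2048; // illustrative SDR width

  // Map keyed by the raw label: only the labels actually seen get a row.
  std::map<uint32_t, std::vector<float>> categories;
  for (uint32_t label : {1u, 2u, 999u})
    categories.emplace(label, std::vector<float>(sdrSize, 0.0f));
  // categories.size() == 3

  // Vector indexed by label: must be sized to the largest label.
  std::vector<std::vector<float>> dense(999 + 1); // 1000 slots for 3 labels
  return 0;
}
```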

@breznak mentioned this pull request Sep 20, 2019
@ctrl-z-9000-times (Collaborator)

I'm not against you changing the classifier to use sparse categories. There are pros and cons to both the map and vector solutions. Most importantly, the map solution is easier to use than the vector.

@Thanh-Binh

@breznak did you get any better performance after changing to use map?

@breznak (Member, Author) commented Nov 5, 2019

better performance after changing to use map?

I haven't got back to this PR yet, and haven't tested performance extensively.

OT: Coincidentally, I'm running performance benchmarks right now; the master branch is faster than when I last measured it (a comparison with Martin's GPU version). And I have prepared another PR that simplifies the TM and makes it faster. I'm now tuning PGO (but the results seem worse than without it (??))
CC @marty1885

@marty1885 commented Nov 5, 2019

Generally, linear searching through a std::vector is faster than the O(log n) lookup in a map when you have, say, < 120 elements. I usually code up an STL-compatible linear_map for small look-up tables in performance-critical applications. Well... I can't share the code, as my implementation was for a commercial project.

Side note: The SDRClassifier in my library is in fact CLAClassifier (which I believe has been deprecated in this repo). I'm hesitant to pull in NN stuff. That is a steep, slippery slope.
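
Since marty1885's implementation isn't shareable, here is a generic sketch of what such a flat, STL-style linear_map typically looks like (an assumption about the design, not his code): a vector of key/value pairs searched linearly, which wins for small N thanks to cache locality and the absence of per-node allocations.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Flat map: O(n) lookup, but contiguous memory and no per-node allocations,
// so it usually beats std::map for small element counts.
template <typename K, typename V>
class linear_map {
  std::vector<std::pair<K, V>> data_;

public:
  // Returns the value for `key`, default-constructing it on first access,
  // mirroring std::map::operator[].
  V &operator[](const K &key) {
    auto it = std::find_if(data_.begin(), data_.end(),
                           [&](const auto &kv) { return kv.first == key; });
    if (it != data_.end())
      return it->second;
    data_.emplace_back(key, V{});
    return data_.back().second;
  }

  std::size_t size() const { return data_.size(); }
  auto begin() { return data_.begin(); }
  auto end() { return data_.end(); }
};
```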

@breznak (Member, Author) commented Nov 5, 2019

std::vector is faster than the O(log n) lookup in a map when you have, say, < 120 elements. I usually code up an STL-compatible linear_map for small look-up tables in performance-critical applications

speaking in pseudocode, this is an internal switch:

if small: use vector
else: use hashmap

?

We might try that for Connections, but I'm not sure I'll go for such micro-optimization (yet); a rough sketch of the switch follows below.
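
A rough sketch of that switch (the 120-element cutoff is only marty1885's ballpark figure; `CategoryIndex` and its method are hypothetical names, not existing htm.core API):

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Maps raw labels to dense indices: linear scan over a small vector while the
// category count is low, migrating to a hash map once it grows past a cutoff.
class CategoryIndex {
  static constexpr std::size_t kCutoff = 120; // switch point; tune by benchmark
  std::vector<std::pair<uint32_t, std::size_t>> small_; // linear scan
  std::unordered_map<uint32_t, std::size_t> large_;     // O(1) average lookup
  bool useMap_ = false;

public:
  std::size_t indexOf(uint32_t label) {
    if (!useMap_) {
      for (const auto &kv : small_)
        if (kv.first == label)
          return kv.second;
      small_.emplace_back(label, small_.size()); // new label -> next index
      if (small_.size() > kCutoff) {             // outgrew the vector: migrate
        large_.insert(small_.begin(), small_.end());
        small_.clear();
        useMap_ = true;
        return large_.at(label);
      }
      return small_.back().second;
    }
    auto it = large_.find(label);
    if (it == large_.end())
      it = large_.emplace(label, large_.size()).first; // next dense index
    return it->second;
  }
};
```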

Side note: The SDRClassifier in my library is in fact CLAClassifier (which I believe has been deprecated in this repo). I'm hesitant to pull in NN stuff. That is a steep, slippery slope.

I'm not sure what the CLAClassifier was, but does the "NN stuff" mean you don't want to use anything other than HTM "NN stuff"? The current SDRClassifier is a simple softmax regression mapping an SDR -> result; a minimal sketch of what that means is below.
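
A minimal sketch of "softmax regression over an SDR" (illustrative code, not the htm.core SDRClassifier source): each category keeps one weight per input bit, the score is the sum of weights at the active bits, and softmax turns the scores into a PDF.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Single-layer inference: score[c] = sum of w[c][bit] over the active bits,
// followed by a softmax normalization. No hidden layers involved.
std::vector<double> infer(const std::vector<uint32_t> &activeBits,       // sparse SDR
                          const std::vector<std::vector<double>> &w) {   // [category][bit]
  std::vector<double> pdf(w.size(), 0.0);
  for (std::size_t c = 0; c < w.size(); ++c)
    for (uint32_t bit : activeBits)
      pdf[c] += w[c][bit];                  // linear score for category c
  double sum = 0.0;
  for (double &p : pdf) { p = std::exp(p); sum += p; }
  for (double &p : pdf) p /= sum;           // softmax: probabilities sum to 1
  return pdf;
}
```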

@Thanh-Binh

Thanks all

@marty1885 commented Nov 6, 2019

@breznak

speaking in pseudocode, this is an internal switch:

Exactly. I wonder if we will ever need a hash map, though. Even DNNs rarely do > 120-class classification.

As the current SDRClassifier is a simple softmax regression mapping SDR -> result

Maybe I'm wrong; I thought the SDRClassifier in NuPIC/HTM.core was a simple MLP with softmax activation. I could easily go too far and make Etaler support advanced Deep Learning. (Is that a good thing?)

I'm not sure what CLAClassifier was

Described here: https://www.youtube.com/watch?v=QZBtaP_gcn0
Then I guess it became the KNN Classifier in NuPIC: https://github.com/numenta/nupic/blob/master/src/nupic/algorithms/knn_classifier.py

@Thanh-Binh

@marty1885 you are right. Currently, the SDRClassifier has only one hidden layer + softmax. Maybe by adding more (recurrent) layers we could get better classification quality.

@breznak (Member, Author) commented Nov 6, 2019

will [we] ever need a hash map, though? Even DNNs rarely do > 120-class classification

  • OK, that's another valid point. Maybe I got carried away and we're fine with the vectors.
    • My other motivation for the hashmap was that we wouldn't have to transform labels to indices, but could use the raw values ("cat").

the SDRClassifier in NuPIC/HTM.core is a simple MLP with softmax activation

just a single layer + softmax, but right...

SDRClassifier has only one hidden layer + softmax. Maybe by adding more (recurrent) layers we could get better classification quality

Adding more layers (and recurrent ones only make sense for sequences) should not make any difference if we think HTM works. It is fair to test whether it makes things better (and I hope it should not). But if HTM works correctly, all the information (incl. temporal) is in the SDR and well distributed for trivial classification. So, from the point of view of the theory, improved classifiers should not (need to) exist in the repo.

Actually, my plan is to go in the opposite direction and introduce a biological classifier. It works like 1-NN. I plan to test it on MNIST: create "etalons" for each digit (0-9) -> SDR_0-9. Classification would then be nearest-neighbor (= max overlap) between the etalons and the tested sample.

@Thanh-Binh

@breznak

biological classifier.

Can you tell us more about it?

@breznak (Member, Author) commented Nov 6, 2019

It works like 1-NN. I plan to test it on MNIST: create "etalons" for each digit (0-9) -> SDR_0-9

It should be really simple and builds on the principles of SDR representations.
For a finite set of labels (classification, not regression) we can train HTM's SP, then produce an etalon SDR for each of the categories. Classification is then reduced to a simple "which looks closest to the representation I have just produced?" (max overlap). If the SP works correctly as described, this should be all that is needed for recognizing the discriminative information in the patterns.

@marty1885 commented Nov 6, 2019

@breznak
Could you describe the algorithm in more detail? It sounds like my implementation of CLAClassifier.

@breznak (Member, Author) commented Nov 6, 2019

Could you describe the algorithm in more detail? It sounds like my implementation of CLAClassifier.

Let's demonstrate it on a simple example, MNIST:

  1. unsupervised training of HTM (SP) on the train data
  2. for each label (0-9):
    • pick some data point (here, simply 1)
    • get the SDR of this data, and note the relation label ~ SDR_{label}
    • this is all the "training" the classifier needs
  3. classification:
    • for any data d
    • get SDR_d
    • go through the SDRs stored in step 2 and find the closest match (= max over SDR.overlap(SDR_d, SDR_{label}))
    • return the label associated with the closest-match SDR from above

Basically, it assumes all the needed info is already in the SDR, and that overlap describes closeness of representations, which translates to closeness/semantic similarity of their origins (the raw data).
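
A sketch of the recipe above (self-contained; `SparseSDR`, `overlap`, and `classify` are hypothetical helpers, not existing htm.core API):

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <utility>
#include <vector>

using SparseSDR = std::vector<uint32_t>; // sorted indices of active bits

// Overlap = number of active bits the two SDRs share.
std::size_t overlap(const SparseSDR &a, const SparseSDR &b) {
  SparseSDR common;
  std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                        std::back_inserter(common));
  return common.size();
}

// Step 3: return the label whose stored etalon SDR (from step 2) has the
// maximum overlap with the SDR of the sample being classified.
int classify(const SparseSDR &sdr_d,
             const std::vector<std::pair<int, SparseSDR>> &etalons) {
  int bestLabel = -1;
  std::size_t bestOverlap = 0;
  for (const auto &e : etalons) {
    const std::size_t ov = overlap(sdr_d, e.second);
    if (ov >= bestOverlap) {
      bestOverlap = ov;
      bestLabel = e.first;
    }
  }
  return bestLabel;
}
```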

@marty1885

I am 60% sure they are the same algorithm.

Again, in an MNIST example:

  1. Train an SP unsupervised
    • assuming the SP generates an n-bit SDR
  2. Initialize 10 arrays a of size n
  3. For each data d in the training set (or a subset of it):
    • data d is associated with label l
    • add d to a[l] (vector addition)
  4. Classification:
    • for any data d' and a threshold th, 0 < th <= 1
    • th is a threshold to reduce the effect of noise
    • go through all the arrays from steps 2/3 and compute s[i] = SDR.overlap(d', a[i] > max(a[i])*th)
    • return argmax(s), which is the best match we can find
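
A sketch of that accumulate-then-threshold variant (hypothetical helper names; `SparseSDR` as in the sketch above):

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

using SparseSDR = std::vector<uint32_t>; // sorted indices of active bits

// Step 3: add one training SDR (d = sp.compute(x)) into the integer
// accumulator array of its label (element-wise vector addition).
void accumulate(const SparseSDR &d, std::vector<uint32_t> &a) {
  for (uint32_t bit : d)
    a[bit] += 1;
}

// Step 4: binarize a[] at max(a)*th to suppress rarely-seen (noisy) bits,
// then count how many of the sample's active bits survive the threshold.
std::size_t thresholdedOverlap(const SparseSDR &sample,
                               const std::vector<uint32_t> &a, double th) {
  const uint32_t maxCount = *std::max_element(a.begin(), a.end());
  const double cutoff = maxCount * th;
  std::size_t ov = 0;
  for (uint32_t bit : sample)
    if (a[bit] > cutoff)
      ++ov;
  return ov;
}
// Classification: argmax over labels l of thresholdedOverlap(sample, a[l], th).
```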

@Thanh-Binh

@breznak @marty1885 if I remember correctly, your algo looks like what htmresearch does for object classification using sensor and location information. During learning, it uses the union of all SDR representations of the same object. During inference, it calculates the overlap between the current SDR and the learned SDR representation.
I am very interested in your classification results vs. Numenta's SDRClassifier.

@breznak (Member, Author) commented Nov 7, 2019

Seems the same in principle to me.

add d to a[l] (vector addition)

Is d your raw data, or an SDR (sp.compute(d))?
And another detail: as I understood it, your vector a[l] has integer elements, while in my version it's binary, as it's still an SDR. What you're using sounds like sim-hashing (added and described in our new SimHashDocumentEncoder).

By learning, it use union of all SDR representation for the same object. By inference, it calculates the overlap between the current SDR and the learned SDR representation.

Yes, this is the principle. IMHO, it's the only plausible way to do classification with HTM.

@marty1885

Is d your raw data, or an SDR (sp.compute(d))?

Ahh, I forgot that. Yes, d is indeed sp.compute(x)

while in my version it's binary

I tried binary initially, but it turned out to be a bad idea, as noise ended up turning the SDR into a dense DR.

@breznak (Member, Author) commented Nov 7, 2019

I tried binary initially, but it turned out to be a bad idea, as noise ended up turning the SDR into a dense DR.

Interesting, indeed. It would suggest the SP had not learned properly (too small an SDR, not sparse enough, ...), or that the concepts (of "1", etc.) are learned as multiple entities.

Would your code be easily applicable to this codebase, or am I better off writing it from scratch?

@marty1885 commented Nov 8, 2019

Would your code be easily applicable to this codebase, or am I better off writing it from scratch?

It is easy to code from scratch. You can use my code from HTMHelper, although I'm using a dense array instead of an SDR there.

It would suggest the SP had not learned properly

I suspect it is a problem with MNIST itself. The images do not fulfill the properties of an SDR; e.g., 1 and 7 can have many overlapping bits even though they are different digits, so the SP will have a hard time separating them.

@dkeeney mentioned this pull request Nov 30, 2019
@Zbysekz closed this Jun 26, 2020
@Zbysekz deleted the classifier_map_wip branch Jun 26, 2020 06:57
@breznak restored the classifier_map_wip branch Jun 26, 2020 07:04
@breznak reopened this Jun 26, 2020