Comment #26
I don't really understand why you've decided to add this comment, but you're right. If you want to see applications, look at https://github.com/dnbaker/frp for orthogonal JL transforms and Gaussian Kernel Projections. FALCONN-LIB uses it here for an LSH, as you've mentioned.
This is the nearest thing to a WHT central station, so I'll discuss it a little. I'm interested in adding soft associative memory to neural networks, of a kind they can actually learn to use (ordinary RAM being very difficult for a network to learn to use.) You can also use locality sensitive hashing to switch different blocks of weight memory in and out of the associative memory, giving truly vast memory systems that are still fast yet have soft characteristics. Or use LSH to switch between modules in a deep neural network to boost efficiency. One simple thing you can try is to apply a WHT or random projection after the final layer of a neural network; that can help the output neurons cooperate more efficiently.
I recommend this paper for adaptive random spinners. There isn't support for an FHT layer in PyTorch yet, but it could be quite useful. The gradient is simply the FHT again; however, there is the issue of normalization, which this library does not perform. (It's smart to avoid it, though; if you apply the transform repeatedly, you can normalize post hoc.)
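For concreteness, here is a rough sketch of what such a layer could look like. The `fwht` below is a plain tensor implementation of the unnormalized transform, not this library's kernel, and the class name is purely illustrative:

```python
import torch

def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform over the last dimension.
    Assumes that dimension has power-of-two length."""
    shape = x.shape
    n = shape[-1]
    y = x.reshape(-1, n)
    h = 1
    while h < n:
        # Butterfly stage: split each block of 2h elements into two halves
        # and replace them with their sum and difference.
        y = y.reshape(-1, n // (2 * h), 2, h)
        a, b = y[..., 0, :], y[..., 1, :]
        y = torch.stack((a + b, a - b), dim=-2)
        h *= 2
    return y.reshape(shape)

class FWHTFunction(torch.autograd.Function):
    # The unnormalized Hadamard matrix is symmetric, so the backward pass
    # is just the same transform applied to the incoming gradient.
    @staticmethod
    def forward(ctx, x):
        return fwht(x)

    @staticmethod
    def backward(ctx, grad_output):
        return fwht(grad_output)

# Usage: y = FWHTFunction.apply(x). Dividing by sqrt(n) once at the end
# restores orthonormal scaling if you need it.
```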
Thanks for the random spinners link.
Then there are a few less useful things.

If you feed the output of a vector-to-vector random projection back to its input, you get a kind of oscillator in which information reverberates around. You can probably understand that with digital signal processing math, and you can use it in reservoir computing. If you connect several of those together, the stored information spreads out, much as entropy does in a physical system.

The intermediate calculations in the (out-of-place) WHT algorithm are wavelet-like. If you normalize each layer of the calculation, you can pick out the highest energy/magnitude entry in the entire system, note its value and location, remove it by setting its value to zero, propagate the effect of that zeroing to every other layer, and then repeat a number of times.

Anyway, the random projections are related to random point picking on a hypersphere (a small sketch follows this comment): http://mathworld.wolfram.com/HyperspherePointPicking.html

This website is slightly messed up due to injected advertising, however there is some information about using WHT random projections with smoothing for compressive sensing:

I don't know how useful that was. It is information largely in the form of hints that you can follow up or not, depending on your disposition. There is a book that has some information about the WHT:
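On the hypersphere point picking link above: the recipe is just a normalized Gaussian vector. A tiny NumPy sketch (the function name is mine, for illustration only):

```python
import numpy as np

def random_point_on_hypersphere(n, rng):
    # MathWorld's recipe: n independent Gaussians, normalized to unit
    # length, give a point uniformly distributed on the (n-1)-sphere.
    v = rng.standard_normal(n)
    return v / np.linalg.norm(v)

rng = np.random.default_rng(0)
print(random_point_on_hypersphere(4, rng))
```

Each (normalized) output direction of a Gaussian-like random projection behaves much like such a point, which is one way to see the connection.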
Edit (changed link): |
The central limit theorem applies not only to sums, it also applies to sums and differences (e.g. the WHT.) Especially where there is any kind of randomness involved, the output of the WHT will tend toward the Gaussian distribution. That can be quite awkward sometimes. You can get the uniform distribution by normalizing the vector length of the output of the WHT and applying the Gaussian CDF to each element; the vector elements then range between 0 and 1. Alternatively, with no need to normalize, you can apply atan2(x,y) to 2 elements at a time and get a uniform distribution over the range of that function (both tricks are sketched below.)

Sometimes the WHT, at a time cost of O(n log n), is too slow. You could use what I call the hstep function if you only need different mixtures of (windows onto) an input vector, say for associative memory.

The problem with the out-of-place algorithm is that it hammers memory bandwidth: you read data, you do +-, you write data. Even the CPU caches don't have enough bandwidth for that; the algorithm is all data movement. You can use the in-place algorithm to address that issue.
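A rough NumPy/SciPy sketch of those two Gaussian-to-uniform tricks (function names are mine, just for illustration):

```python
import numpy as np
from scipy.stats import norm

def to_uniform_cdf(y):
    # Scale so each element is roughly N(0, 1), then push it through the
    # Gaussian CDF; the results lie in (0, 1) and are roughly uniform.
    y = np.asarray(y, dtype=float)
    y = y * np.sqrt(len(y)) / np.linalg.norm(y)
    return norm.cdf(y)

def to_uniform_atan2(y):
    # No normalization needed: atan2 of consecutive element pairs is
    # uniform over (-pi, pi] when the elements are zero-mean Gaussian
    # with equal variance.
    y = np.asarray(y, dtype=float)
    return np.arctan2(y[0::2], y[1::2])
```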
Fixed Filter Bank Neural Networks: using the fast Walsh-Hadamard transform as the fixed filter bank.
ReLU is a literal switch. An electrical switch is n volts in, n volts out when on. Zero volts out when off. |
If you apply a predetermined random pattern of sign flipping to an input array, followed by the fast (Walsh) Hadamard transform, you get a random projection of the input data. You can use that for unbiased dimension reduction or increase. Repeat the sign-flip-and-transform process for a better quality projection.
The outputs of the random projection strongly follow the Gaussian distribution because the central limit theorem applies not only to sums but also to sums and differences.
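A minimal NumPy sketch of that sign-flip-then-transform projection (the `fwht` here is a plain textbook implementation, not this library's API, and the names are mine):

```python
import numpy as np

def fwht(x):
    # Textbook unnormalized fast Walsh-Hadamard transform
    # (power-of-two length).
    x = np.array(x, dtype=float)
    h = 1
    while h < len(x):
        for i in range(0, len(x), 2 * h):
            a = x[i:i + h].copy()
            b = x[i + h:i + 2 * h].copy()
            x[i:i + h], x[i + h:i + 2 * h] = a + b, a - b
        h *= 2
    return x

def random_projection(x, sign_patterns):
    # Each row of sign_patterns is a predetermined random +/-1 vector.
    # Dividing by sqrt(n) each round keeps the vector length unchanged.
    for signs in sign_patterns:
        x = fwht(signs * x) / np.sqrt(len(x))
    return x

rng = np.random.default_rng(0)
n = 16
sign_patterns = rng.choice([-1.0, 1.0], size=(2, n))  # two rounds
y = random_projection(rng.standard_normal(n), sign_patterns)
```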
Anyway, if you binarize the outputs of the random projection you have a locality sensitive hash. If you interpret the output bits as +1,-1 you can multiply each bit by a weight and sum to get a recalled value. To train: recall, calculate the error, divide it by the number of bits, then add or subtract that amount from each weight (according to the sign of its hash bit) to make the error zero. In that way you have created an associative memory.
Because the error has been distributed, a non-similar input will see the error fragments multiplied by arbitrary +1,-1 hash bit values. Again you can invoke the central limit theorem to see that the error fragments sum to zero-mean, low-level Gaussian noise.
The memory capacity is just short of the number of bits. If you use the system under capacity, you get some repetition-code error correction.
Basically, when you store a new memory in that system, all the previous memories get contaminated by a little Gaussian noise. However, for an under-capacity set of training data, you can drive that noise to zero by repeated presentation.
This also provides a means to understand extreme learning machines and reservoir computing as associative memory.
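Putting that whole recipe together, here is a rough sketch (class and function names are mine, and the `fwht` is the same plain textbook helper as above, not this library's API):

```python
import numpy as np

def fwht(x):
    # Textbook unnormalized fast Walsh-Hadamard transform (power-of-two length).
    x = np.array(x, dtype=float)
    h = 1
    while h < len(x):
        for i in range(0, len(x), 2 * h):
            a, b = x[i:i + h].copy(), x[i + h:i + 2 * h].copy()
            x[i:i + h], x[i + h:i + 2 * h] = a + b, a - b
        h *= 2
    return x

class AssociativeMemory:
    def __init__(self, n, rounds=2, seed=0):
        rng = np.random.default_rng(seed)
        self.signs = rng.choice([-1.0, 1.0], size=(rounds, n))
        self.weights = np.zeros(n)

    def _hash(self, x):
        # Sign-flip + WHT random projection, binarized to +/-1 hash bits.
        for s in self.signs:
            x = fwht(s * x)
        return np.where(x >= 0.0, 1.0, -1.0)

    def recall(self, x):
        return float(self._hash(x) @ self.weights)

    def store(self, x, target):
        bits = self._hash(x)
        error = target - bits @ self.weights
        # Spread the correction evenly over all the hash bits; this
        # (x, target) pair is now recalled exactly, while other stored
        # pairs only see a little zero-mean noise.
        self.weights += bits * (error / len(bits))

# Repeated presentation of an under-capacity training set drives the
# residual noise toward zero.
memory = AssociativeMemory(n=64)
rng = np.random.default_rng(1)
data = [(rng.standard_normal(64), float(t)) for t in range(8)]
for _ in range(20):
    for x, target in data:
        memory.store(x, target)
print([round(memory.recall(x), 3) for x, _ in data])
```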