Scale Invariant CNN (SICNN) #576
One can run a single net on a multi-scale pyramid by weight sharing, or run on whatever scale is desired by on-the-fly net reshaping in #594. A single extraction of a deep feature invariant to all scaling is not possible due to filter discretization, nonlinearities, and so on (although one can downsample and upsample features as they please).
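For concreteness, here is a minimal pycaffe sketch of that pyramid idea, assuming a fully-convolutional deploy net; `deploy.prototxt`, `weights.caffemodel`, and the blob names `data` and `conv5` are placeholders, not anything specific from #594.

```python
# Minimal sketch: extract features at several scales by reshaping the same
# weight-shared net for each pyramid level. File names and blob names below
# are placeholders; assumes a fully-convolutional deploy net.
import caffe
import numpy as np
from skimage.transform import resize

net = caffe.Net('deploy.prototxt', 'weights.caffemodel', caffe.TEST)

def pyramid_features(image, scales=(0.5, 1.0, 2.0), feat_blob='conv5'):
    """Run one net over an image pyramid, returning a feature map per scale.
    `image` is H x W x C, float."""
    feats = []
    for s in scales:
        h, w = int(image.shape[0] * s), int(image.shape[1] * s)
        scaled = resize(image, (h, w), preserve_range=True)
        # channels-first, add batch dimension
        blob = scaled.transpose(2, 0, 1)[np.newaxis, ...].astype(np.float32)
        # on-the-fly reshaping: conv layers accept any spatial size
        net.blobs['data'].reshape(*blob.shape)
        net.reshape()
        net.blobs['data'].data[...] = blob
        net.forward()
        feats.append(net.blobs[feat_blob].data.copy())
    return feats
```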
Angjoo Kanazawa, Abhishek Sharma, David Jacobs. Locally Scale-Invariant Convolutional Neural Network. Deep Learning and Representation Learning Workshop, NIPS 2014.
Hi,
Sounds great, but wouldn't that take much more time? I mean, you need to transform every blob several times for the max-pooling, right?
Yeah, it does take more memory and time. Now I recommend checking out this recent arXiv paper: http://arxiv.org/abs/1506.02025
The Spatial Pyramid Pooling net of #548 improves on the speed of Regions with Convolutional Neural Network Features (R-CNN) by extracting features for each image only once, while R-CNN does so for each region of interest in an image. The most important insight of SPP-net is that only the classifiers, i.e. the fully-connected layers, require a fixed-length vector; the convolution layers do not constrain the size of the input image. The experiments show that full images are better than cropped ones and that larger scales lead to higher accuracy.
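A rough NumPy sketch of the pooling step that gives SPP-net this property (an illustration of the idea, not the reference implementation): each pyramid level max-pools the conv feature map into a fixed grid of bins, so the concatenated vector length depends only on the bin configuration, never on the image size.

```python
import numpy as np

def spp_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a C x H x W conv feature map into fixed grids of bins.
    Output length is C * sum(l*l for l in levels), regardless of H and W."""
    c, h, w = feature_map.shape
    pooled = []
    for l in levels:
        # bin boundaries that cover the whole map for any H, W
        hs = np.linspace(0, h, l + 1).astype(int)
        ws = np.linspace(0, w, l + 1).astype(int)
        for i in range(l):
            for j in range(l):
                region = feature_map[:, hs[i]:max(hs[i + 1], hs[i] + 1),
                                        ws[j]:max(ws[j + 1], ws[j] + 1)]
                pooled.append(region.max(axis=(1, 2)))
    return np.concatenate(pooled)

# Two different input sizes yield the same fixed-length vector:
a = spp_pool(np.random.rand(256, 13, 13))
b = spp_pool(np.random.rand(256, 31, 17))
assert a.shape == b.shape  # (256 * (1 + 4 + 16),)
```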
SPP-net simulates multiple scales with fixed-size networks, so the "scale-mismatch" problem is not solved. In #308, multi-scale feature extraction is achieved by packing the multiple scales of an image into a single large image. Both can only handle pre-defined discrete scales.
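As a toy illustration of that packing idea (the actual tiling in #308 may differ), one could resize the image to each predefined scale and paste the copies onto one large canvas, so that a single forward pass covers all scales:

```python
import numpy as np
from skimage.transform import resize

def pack_scales(image, scales=(0.6, 0.8, 1.0)):
    """Place rescaled copies of an H x W x C image side by side in one
    canvas so one forward pass sees all predefined scales.
    Illustrative layout only; #308 may tile the scales differently."""
    copies = [resize(image,
                     (int(image.shape[0] * s), int(image.shape[1] * s)),
                     preserve_range=True)
              for s in scales]
    canvas_h = max(c.shape[0] for c in copies)
    canvas_w = sum(c.shape[1] for c in copies)
    canvas = np.zeros((canvas_h, canvas_w, image.shape[2]), dtype=np.float32)
    x = 0
    for c in copies:
        canvas[:c.shape[0], x:x + c.shape[1]] = c
        x += c.shape[1]
    return canvas
```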
An authentically scale-invariant CNN would mean that the extracted features can be scaled up or down to obtain the features of the image undergoing the same scaling, so the features of an image only have to be extracted once by the network.
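To make that property concrete, here is a small NumPy check one could run against any extractor: compare the features of a rescaled image with a rescaled version of the original features. `extract_features` is a placeholder for whatever net is being evaluated, not an existing function.

```python
import numpy as np
from skimage.transform import resize

def scale_equivariance_error(extract_features, image, s=0.5):
    """Measure how far an extractor is from the property described above:
    features of a rescaled image should match rescaled features.
    `extract_features` maps an H x W x C image to an h x w x c map and is
    a placeholder for the net under test."""
    small = resize(image, (int(image.shape[0] * s), int(image.shape[1] * s)),
                   preserve_range=True)
    f_scaled = extract_features(small)          # features of the rescaled image
    f = extract_features(image)                 # features of the original image
    f_resized = resize(f, f_scaled.shape[:2], preserve_range=True)
    return np.abs(f_scaled - f_resized).mean()  # 0 for a perfectly equivariant net
```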
Any ideas about existing work in this direction?