Implement the RBM layer to learn binary codes for large scale image retrieval #274

Closed
kloudkl wants to merge 3 commits

kloudkl
Contributor

@kloudkl kloudkl commented Mar 29, 2014

Using the Caffe reference ImageNet model, the features extracted for the training images of the ILSVRC 2013 image classification task occupy more than 25GB when stored in leveldb. This footprint must be reduced for efficient and scalable large-scale image retrieval. In recent years, the Restricted Boltzmann Machine (RBM) has been used successfully to embed real-valued features into binary codes for computation- and storage-efficient document retrieval [1] and image retrieval [2][3][4]. Deep Boltzmann Machines (DBM) consisting of stacked RBMs can map the floating-point features of similar images or texts into neighborhoods in Hamming space. Each 32-bit float is replaced by a single bit, so the feature size is reduced by at least 32 times while the similarities among the original features are preserved. The final binary features of the ILSVRC 2013 training images would take less than 1GB and fit comfortably into the memory of a single machine or GPU card.
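
As a rough check of the sizes quoted above, the arithmetic can be sketched as follows. The image count, the feature length, and the choice of one bit per feature dimension are assumptions made for illustration and are not stated in this thread; leveldb and serialization overhead are ignored.

```python
# Back-of-the-envelope storage estimate. All constants are illustrative assumptions.
NUM_IMAGES = 1281167   # assumed size of the classification training set
FEATURE_DIM = 4096     # assumed feature length per image
BYTES_PER_FLOAT = 4    # float32


def float_storage_bytes(num_images, dim):
    """Raw size of float32 features, without any database overhead."""
    return num_images * dim * BYTES_PER_FLOAT


def binary_storage_bytes(num_images, num_bits):
    """Size of binary codes packed eight bits to a byte."""
    return num_images * ((num_bits + 7) // 8)


float_bytes = float_storage_bytes(NUM_IMAGES, FEATURE_DIM)
binary_bytes = binary_storage_bytes(NUM_IMAGES, FEATURE_DIM)

print("float32 features: %.2f GB" % (float_bytes / 1e9))
print("binary codes:     %.2f GB" % (binary_bytes / 1e9))
print("reduction factor: %d" % (float_bytes // binary_bytes))
```

Under these assumptions the packed codes stay below 1GB; shorter codes than one bit per input dimension would shrink them further.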

This work is in progress.

[1] Salakhutdinov, R. R. and Hinton, G. E. Semantic Hashing. Proceedings of the SIGIR Workshop on Information Retrieval and Applications of Graphical Models, Amsterdam. 2007.
[2] Torralba, A., Fergus, R. and Weiss, Y. Small codes and large image databases for recognition. Proc. CVPR, pages 1–8. 2008.
[3] Ranzato, M., Krizhevsky, A. and Hinton, G. E. Factored 3-way restricted Boltzmann machines for modeling natural images. Proc. Thirteenth International Conference on Artificial Intelligence and Statistics. 2010.
[4] Krizhevsky, A. and Hinton, G. E. Using Very Deep Autoencoders for Content-Based Image Retrieval. European Symposium on Artificial Neural Networks ESANN-2011, Bruges, Belgium. 2011.

@shelhamer
Member

Are you sure you want to do this as a Caffe layer instead of training RBMs on Caffe-extracted features with an existing package like Theano?

@kloudkl kloudkl mentioned this pull request Apr 4, 2014
@kloudkl
Contributor Author

kloudkl commented Apr 4, 2014

Good idea! But the trained model still needs to be converted to Caffe format to facilitate online applications.
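
One part of such a conversion is getting the weight layout right. The sketch below uses only NumPy and toy sizes. It assumes the external package stores the weight matrix as (input_dim, output_dim) and that the target inner product layer expects (output_dim, input_dim); both layouts, and every name in the code, are assumptions for illustration. Writing the result into an actual Caffe model file is not shown.

```python
import numpy as np

rng = np.random.RandomState(0)
input_dim, output_dim = 8, 4  # toy sizes, chosen only for illustration

# Hypothetical trained parameters in (input_dim, output_dim) layout.
external_weights = rng.randn(input_dim, output_dim).astype(np.float32)
hidden_bias = rng.randn(output_dim).astype(np.float32)

# Assumed target layout: contiguous (output_dim, input_dim).
layer_weights = np.ascontiguousarray(external_weights.T)

# Both layouts must produce the same activations for the same inputs.
features = rng.randn(5, input_dim).astype(np.float32)
out_external = np.dot(features, external_weights) + hidden_bias
out_layer = np.dot(features, layer_weights.T) + hidden_bias

print(layer_weights.shape)
print(np.allclose(out_external, out_layer, atol=1e-5))
```

Comparing the activations before and after the transpose is a cheap way to catch a layout mix-up before the weights are deployed.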

@kloudkl
Contributor Author

kloudkl commented Apr 8, 2014

It is very straightforward to train a Gaussian Binary RBM using pylearn2 to convert the real-valued Caffe features into binary codes.

  1. Convert Caffe features into pylearn2 dataset
import sys

import leveldb
import numpy as np
from pylearn2.datasets.dense_design_matrix import DenseDesignMatrix
from pylearn2.utils import serial

# Make the generated protobuf module (caffe_pb2) importable
sys.path.append('../../python/caffe/proto')
from caffe_pb2 import Datum

db = leveldb.LevelDB('features_leveldb')
max_num_features = 1290
features = []
num_records = 0
# Keys are the decimal string of the record index: '0', '1', ...
for index in range(max_num_features):
    try:
        value = db.Get(str(index))
    except KeyError:
        # Skip indices that are missing from the database
        continue
    num_records += 1
    datum = Datum.FromString(value)
    features.append(list(datum.float_data))

matrix = DenseDesignMatrix(X=np.array(features, np.float32))
serial.save('features.joblib', matrix)
  2. Train a Gaussian Binary RBM model
PYLEARN_ROOT=/path/to/pylearn2
sudo PATH=/usr/local/cuda/bin:$PATH THEANO_FLAGS="device=gpu,floatX=float32" $PYLEARN_ROOT/pylearn2/scripts/train.py features_grbm_smd.yaml
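  3. Use the learned weight matrix
model = serial.load(model_file)
# weights is a numpy array with the shape (input_dim, output_dim)
weights = model.get_weights()
# Matrix product rather than elementwise `*`; features has shape (num_records, input_dim)
hidden_input = np.dot(np.asarray(features, dtype=np.float32), weights)
# Threshold at zero to get 0/1 codes (assumption: the hidden biases are ignored here)
binary_features = (hidden_input > 0).astype(np.float32)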

The binary_features are still stored as np.float32 and need to be further compacted into bits if you want to save more space.
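
A minimal sketch of that compaction, assuming binary_features is an (n, num_bits) array of 0/1 values. The helper names pack_codes and hamming_distances are hypothetical, and random data stands in for real codes. It packs eight bits per byte with np.packbits and ranks database entries by Hamming distance to a query.

```python
import numpy as np


def pack_codes(binary_features):
    """Pack an (n, num_bits) array of 0/1 values into uint8, eight bits per byte."""
    bits = (np.asarray(binary_features) > 0.5).astype(np.uint8)
    return np.packbits(bits, axis=1)


# Lookup table: number of set bits for every possible byte value.
POPCOUNT = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint8)


def hamming_distances(query_code, database_codes):
    """Hamming distance between one packed code and every row of a packed database."""
    xor = np.bitwise_xor(database_codes, query_code)
    return POPCOUNT[xor].sum(axis=1)


rng = np.random.RandomState(0)
# Toy stand-in for the real codes: 1000 items with 256 bits each, stored as float32.
binary_features = (rng.rand(1000, 256) > 0.5).astype(np.float32)

codes = pack_codes(binary_features)
distances = hamming_distances(codes[0], codes)
nearest = np.argsort(distances)[:5]  # indices of the five closest codes to item 0

print(codes.shape, codes.nbytes, binary_features.nbytes)
```

The packed array is 32 times smaller than the float32 version, and the XOR plus table lookup keeps the search in NumPy without unpacking the bits again.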

@kloudkl kloudkl closed this Apr 8, 2014
@shelhamer
Member

@kloudkl now that you have done this in pylearn2, it would be nice to compare it with a Caffe implementation. Will you resume your work on this?

@kloudkl
Contributor Author

kloudkl commented Apr 10, 2014

As #308 has almost solved #189, which is one of the issues in the 1.0 milestone, I will first finish #244 to solve #148 as soon as possible.

@shelhamer shelhamer mentioned this pull request Oct 4, 2014