
Contrastive loss layer for training siamese nets #959

Merged: 1 commit into BVLC:dev on Sep 19, 2014

Conversation

nickcarlevaris

Hi All,

I've started work on a contrastive loss layer that, when combined with weight sharing #546, can be used to train siamese nets.

The layer implements:
loss = 1/2 * (y * d + (1 - y) * max(margin - d, 0))
d = \sum_i (a_i - b_i)^2
where d is the squared Euclidean distance between the two features a and b, and y is a binary label indicating whether the pair is similar (y = 1) or dissimilar (y = 0).

This loss function was proposed in:
Raia Hadsell, Sumit Chopra, Yann LeCun "Dimensionality Reduction by Learning an Invariant Mapping"
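
For concreteness, here is a minimal, self-contained sketch of the per-pair loss in plain C++ (separate from the layer code; the feature values and margin below are just illustrative):

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

// Contrastive loss for a single pair of features:
//   d    = sum_i (a_i - b_i)^2                        (squared distance)
//   loss = 1/2 * (y * d + (1 - y) * max(margin - d, 0))
double contrastive_loss(const std::vector<double>& a,
                        const std::vector<double>& b,
                        bool similar, double margin) {
  double d = 0.0;
  for (size_t i = 0; i < a.size(); ++i) {
    const double diff = a[i] - b[i];
    d += diff * diff;
  }
  // Similar pairs are penalized by their distance; dissimilar pairs are
  // penalized only while their distance is still inside the margin.
  const double loss = similar ? d : std::max(margin - d, 0.0);
  return 0.5 * loss;
}

int main() {
  const std::vector<double> a = {0.2, -0.1, 0.4};
  const std::vector<double> b = {0.3,  0.0, 0.1};
  std::printf("similar:    %f\n", contrastive_loss(a, b, true, 1.0));
  std::printf("dissimilar: %f\n", contrastive_loss(a, b, false, 1.0));
  return 0;
}
```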

I still need to implement the GPU version and flesh out the tests, but before I go too far, is this something that you would be interested in merging? I can also add a small example based on MNIST for documentation if that is of interest.

Thanks,
Nick

@shelhamer
Member

A PR including this loss layer in CPU and GPU implementations, with tests, and an example siamese net model on MNIST would certainly be welcome!

It would be nice to bundle a use case of weight sharing to make it less of an experts-only feature.

@nickcarlevaris force-pushed the contrastive_loss branch 4 times, most recently from c9ba74e to b55b5c8 on August 25, 2014 17:52
@nickcarlevaris
Author

This is ready for review if you have a second. I've pushed the CPU and GPU implementations, tests, and an example in examples/siamese. I also rebased it against dev.

Let me know if there are any updates or changes you would like me to make.

Thanks,
Nick

diff_sq_.cpu_data(), // (a_i-b_i)^2
summer_vec_.cpu_data(),
Dtype(0.0),
dist_sq_.mutable_cpu_data()); // \Sum (a_i-b_i)^2
Contributor

Is there a reason you're not using the dot product as in the Euclidean Loss layer? I think the dot product method should be faster.

  // Elementwise difference: diff_ = bottom[0] - bottom[1]
  caffe_sub(
      count,
      bottom[0]->cpu_data(),
      bottom[1]->cpu_data(),
      diff_.mutable_cpu_data());
  // Total sum of squares in a single BLAS call: dot = diff_ . diff_
  Dtype dot = caffe_cpu_dot(count, diff_.cpu_data(), diff_.cpu_data());

@ashafaei
Contributor

Great job @nickcarlevaris.

@nickcarlevaris
Author

@ashafaei, thanks for taking a look.

Unlike the Euclidean loss layer, which only needs the total sum-of-squares difference between the bottom blobs, the contrastive loss layer needs the squared distance between each row of the bottom blobs. This is stored in dist_sq_ in the code. Originally, I didn't use the dot product because I thought calling it once per row would be slower.

Based on your comments I went back and tried a few things. It turns out that for cpu_forward it is faster to use the dot product and just call it once per row. However, for gpu_forward it was much faster not to call the dot product multiple times, and instead do the elementwise difference and square, followed by a matrix-vector multiplication to sum along the rows.

I've updated the PR accordingly, using the dot product for the cpu_forward and the matrix version for the gpu_forward.
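
Roughly, the two row-sum strategies look like this in a self-contained C++ sketch (in the layer itself, (a) corresponds to one caffe_cpu_dot call per row, while (b) corresponds to an elementwise square followed by a single gemv against summer_vec_, which I take here to be a vector of ones; N and C below are placeholder sizes):

```cpp
#include <vector>

// diff is an N x C matrix (row-major): diff[i*C + j] = a_ij - b_ij.

// (a) One dot product per row, as in the updated cpu_forward.
std::vector<double> row_dist_sq_dot(int N, int C,
                                    const std::vector<double>& diff) {
  std::vector<double> dist_sq(N, 0.0);
  for (int i = 0; i < N; ++i) {
    for (int j = 0; j < C; ++j) {
      dist_sq[i] += diff[i * C + j] * diff[i * C + j];
    }
  }
  return dist_sq;
}

// (b) Elementwise square, then one matrix-vector product with a vector of
// ones to sum along each row -- the shape of the gpu_forward path, where
// the row sum is a single BLAS call instead of N separate dot products.
std::vector<double> row_dist_sq_gemv(int N, int C,
                                     const std::vector<double>& diff) {
  std::vector<double> diff_sq(N * C);
  for (int k = 0; k < N * C; ++k) {
    diff_sq[k] = diff[k] * diff[k];
  }
  const std::vector<double> ones(C, 1.0);  // plays the role of summer_vec_
  std::vector<double> dist_sq(N, 0.0);
  for (int i = 0; i < N; ++i) {            // dist_sq = diff_sq * ones
    for (int j = 0; j < C; ++j) {
      dist_sq[i] += diff_sq[i * C + j] * ones[j];
    }
  }
  return dist_sq;
}
```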

@amiralush

@nickcarlevaris thanks for the siamese example. I've walked through it and it seems to converge nicely on the MNIST dataset. However, when I tried it on a subset of 200 categories from ImageNet, it doesn't converge. Could you speculate on the cause? I've tried different variations of networks, including the ImageNet architecture with pre-trained weights. Nothing seems to work.

Thanks again!
Amir A.

@shelhamer
Member

Thanks for the loss and the nice example! Please switch the paths to fit the new standard, adopted in #1003, of running from the Caffe project root before merge.

" labels = np.fromstring(f.read(n), dtype=np.uint8)\n",
" \n",
"# scale and reshape\n",
"images = images.reshape(n, 1, 28, 28).astype(np.float32) / 255. "
Member

Please configure the net with net.set_input_scale('data', 0.00390625) instead of dividing manually, to match the prototxt definition.

@nickcarlevaris
Author

@shelhamer, thanks for the review. I'm out of town for a week and don't have my dev machine but I'll update the PR right when I get back.

@amiralush, I haven't tried training with the ImageNet data, but in general you may need to increase the size of the output space. Also, if you are using a margin of 1.0, make sure the weight initialization produces values roughly spread over a unit sphere in the output space.

for (int i = 0; i < bottom[0]->num(); ++i) {
dist_sq_.mutable_cpu_data()[i] = caffe_cpu_dot(channels,
diff_.cpu_data() + (i*channels), diff_.cpu_data() + (i*channels));
if (bottom[2]->cpu_data()[i]) { // similar pairs
Contributor

@nickcarlevaris @shelhamer Isn't it safer to cast the data to an int before doing this comparison? I'm not really sure how the compiler will interpret it, but I think this is one of those situations where you should be explicit. This also applies to the other places where you compare based on the label. Look at the Softmax loss, for instance; it always does something like

bottom_diff[i * dim + static_cast<int>(label[i * spatial_dim + j])
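
Applied to the similar-pair check above, the explicit cast would read something like this (a standalone sketch with illustrative label values, not the merged layer code):

```cpp
#include <cstdio>

int main() {
  // The label blob stores Dtype values (float here); an explicit cast makes
  // the intended integer/boolean interpretation clear before branching.
  const float label[] = {1.0f, 0.0f};
  for (int i = 0; i < 2; ++i) {
    if (static_cast<int>(label[i])) {   // similar pair
      std::printf("pair %d: similar\n", i);
    } else {                            // dissimilar pair
      std::printf("pair %d: dissimilar\n", i);
    }
  }
  return 0;
}
```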

lelayf added a commit to crossgradient/caffe that referenced this pull request Sep 6, 2014
add numpy formatting option to display entire array (read_npy)
… example using shared weights and the contrastive loss.
@nickcarlevaris
Author

@shelhamer and @ashafaei, I've updated the PR based on the changes you suggested. I've also added Doxygen-style comments to the layer and rebased. Let me know if it needs anything else before merging.

Also, in the future, do you prefer that commits to a PR are made with --amend to squash multiple commits? Or should I leave them in so that the PR comments don't reference outdated diffs?

@okn2020

okn2020 commented Sep 13, 2014

I think there is some sort of issue with restarting siamese net training from a snapshot (with solver parameters) and with simple fine-tuning. When training is restarted, accuracy at the first iteration is OK but quickly drops. I think it is related to nets with shared weights, as I do not see the same behavior on nets without sharing. Any ideas?

@okn2020

okn2020 commented Sep 13, 2014

@nickcarlevaris I am using the latest dev branch and merged your contrastive loss function and examples into it.

@amiralush

@shelhamer, @okn2020 I have also experienced this when using a pre-trained network with weight sharing. It seems like the weight-sharing update (Net::Update) mechanism is flawed. I used identical pairs as input and computed the L2 distance between shared layers of the siamese network; the diff was not zero as I expected.

@shelhamer
Member

@nickcarlevaris thanks for the update. Squashing / rebasing is a nice last step before merge to tidy up. This is ready to merge once we double-check this resuming issue.

@jeffdonahue have you encountered #959 (comment) ?

@okn2020

okn2020 commented Sep 16, 2014

@nickcarlevaris What do you think about @chyh1990's siamese implementation https://github.com/chyh1990/caffe/tree/veri ? My understanding is that there a k-way softmax is used on top of the net and the contrastive loss is injected right below it. Is it essentially the same? In recent papers I see people first train a multi-class classification net with softmax, then replace the top layer with the contrastive loss and fine-tune it as a siamese net.

@shelhamer mentioned this pull request Sep 18, 2014
@shelhamer
Member

@okn2020 I have reproduced the divergence on resume issue when restoring the iteration 20,000 snapshot.

However, I haven't investigated the problem. I'm inclined to merge this as a useful example, whether or not it is troubled by resuming, and let a fix follow.

@shelhamer merged commit d149c9a into BVLC:dev Sep 19, 2014
shelhamer added a commit that referenced this pull request Sep 19, 2014
  Add contrastive loss layer, tests, and a siamese network example
mitmul pushed a commit to mitmul/caffe that referenced this pull request Sep 30, 2014
  Add contrastive loss layer, tests, and a siamese network example
@shelhamer mentioned this pull request Oct 2, 2014
@shelhamer
Member

@okn2020 @amiralush the divergence issue on resuming or fine-tuning was fixed by #594, since reshapes no longer trigger re-allocation in all cases.

@okn2020

okn2020 commented Oct 12, 2014

@shelhamer thank you, I will try it out! @nickcarlevaris @shelhamer not sure if this is the right place to ask, but could you give some tips on how to combine this contrastive loss layer with a k-way softmax in one net? If I train the feature layer separately with a k-way softmax and then fine-tune with the contrastive loss, the resulting accuracy is very low.

RazvanRanca pushed a commit to RazvanRanca/caffe that referenced this pull request Nov 4, 2014
  Add contrastive loss layer, tests, and a siamese network example
@xiaoyong

@okn2020 You may check out the DeepID2 paper:
Yi Sun, Xiaogang Wang, Xiaoou Tang. "Deep Learning Face Representation by Joint Identification-Verification".
