
Fixed contrastive loss layer to be the same as proposed in Hadsell et al 2006 #2321

Merged: 3 commits merged into BVLC:master on May 12, 2015

Conversation

nickcarlevaris

The current contrastive loss layer implements a slightly different loss than the one proposed in Hadsell et al. 2006. This PR updates it so that it matches the original paper. This is in reference to issue #2308.

If d is the distance between two vectors describing a non-matching pair, the current code implements max(margin - d^2, 0), while the loss proposed by Hadsell et al. is max(margin - d, 0)^2.
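To make the difference concrete, here is a minimal sketch of the two formulas for a single non-matching pair (standalone helper functions for illustration only, not the layer's actual code):

```cpp
#include <algorithm>

// Illustrative only: loss contribution of a single non-matching pair,
// where d is the Euclidean distance between the two feature vectors.
double legacy_loss(double d, double margin) {
  // current layer behavior: max(margin - d^2, 0)
  return std::max(margin - d * d, 0.0);
}

double hadsell_loss(double d, double margin) {
  // Hadsell et al. 2006: max(margin - d, 0)^2
  double m = std::max(margin - d, 0.0);
  return m * m;
}
```

Note that in the legacy form the margin compares against a squared distance, so its value has different units than in the Hadsell et al. form.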

@SlevinKelevra and @melgor, you guys can give this PR a try and see if it works better than the current version.

@seanbell

This is great -- I too was bothered that the implementation doesn't match, meaning that if you use caffe in a paper you need to clarify which version you're using. You also need to square the margin in the caffe version (to get the units to match), which can lead to subtle bugs.

A suggestion: rather than overwriting what's already there (which breaks already-saved models), why not add a prototxt parameter option to choose which version gets used?

@melgor

melgor commented Apr 20, 2015

It works similarly to the current version. But I think @nickcarlevaris's version should be merged and the old one deleted, mainly because there is no reference for the current loss function, which may raise a lot of questions: is this version better? who created it?
So I think only one version should exist, the one from Hadsell et al. 2006.

@seanbell

@melgor I understand that the two losses are similar. However, I've already trained networks with the current version, and I would rather not have to maintain a difference with the master branch just to avoid having old functionality deleted. I'm okay with the new default becoming [Hadsell 2006], but I think the old functionality should be obtainable with an option in the prototxt.

@norouzi
Contributor

norouzi commented Apr 21, 2015

Thanks @nickcarlevaris for the update!
I think there is a small problem in computing the gradient though, when the squared distance (dist) becomes very close to zero. This causes the gradient to explode. I suggest adding a small value, e.g. 1e-4, to the denominator in the gradient, i.e., dividing by (dist + 1e-4).
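To illustrate, here is a rough sketch (not the layer's actual code) of the dissimilar-pair gradient for max(margin - d, 0)^2, showing where the division by the distance blows up and where the epsilon would go:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative sketch only: gradient of 0.5 * max(margin - d, 0)^2 with
// respect to feature a for a dissimilar pair, where d = ||a - b||_2.
// The 1/d factor explodes as d -> 0, so a small eps is added to the
// denominator as suggested above.
std::vector<double> dissimilar_pair_gradient(const std::vector<double>& a,
                                             const std::vector<double>& b,
                                             double margin,
                                             double eps = 1e-4) {
  double dist_sq = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    const double diff = a[i] - b[i];
    dist_sq += diff * diff;
  }
  const double dist = std::sqrt(dist_sq);

  std::vector<double> grad(a.size(), 0.0);
  const double m = margin - dist;
  if (m > 0.0) {
    const double scale = -m / (dist + eps);  // eps prevents division by ~0
    for (std::size_t i = 0; i < a.size(); ++i) {
      grad[i] = scale * (a[i] - b[i]);
    }
  }
  return grad;
}
```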

@nickcarlevaris
Author

@norouzi, I pushed a commit that adds the epsilon to prevent the possibility of dividing by zero.

@seanbell, my inclination would be to overwrite the current cost function with the correct one. When I originally implemented this layer (#959), my intention was to make it match the Hadsell et al paper. It is only because I didn't double check the original paper that I ended up implementing this slightly different cost function. I see this more as fixing an error.

As far as breaking existing models goes, changing the loss layer shouldn't break the deployed version of any model, because it would only affect the training net. Also, it should be pretty easy to fine-tune an existing model from the old cost to the new cost, since a network that is good for one should be pretty good for the other.

That being said, I am not strongly against keeping both. Let's see what the caffe maintainers think would be better.

@seanbell

@nickcarlevaris Ah, I didn't realize you were the one who contributed the original layer as well. I understand that this feels like a "bugfix", but the old code is a valid loss function -- just different. They might be equivalent in performance, maybe not. I agree that the new layer definition makes more sense, and intuitively feels like it should work better.

However, the layer has been around through an entire publication cycle by now. So I think we should treat this as deprecation and not "bugfix". That means keeping the old version and making the new version the default. For example, I already have a public preprint on visual similarity ( http://www.seanbell.ca/tmp/siggraph2015-preprint-bell.pdf -- section 3) that uses the old layer definition. It would be nice if others could reproduce our results. I bet there will be many more at CVPR this year that also use the old layer.

Anyway, sure, let's see what the caffe maintainers think.

@shelhamer
Member

@nickcarlevaris @seanbell while this is a little tricky since the layer has already been adopted by existing work, I think it is best to

  1. switch the loss function to that published in Hadsell et al. 2006 as that was the original intention
  2. add an option to revert to the old behavior for "legacy" papers that made use of the other loss
  3. encourage authors to reproduce their results with the established Hadsell et al. 2006 loss

if you agree. @nickcarlevaris could you add a field to contrastive_loss_param to pick the original or your concave variant? I defer the naming to you, since you created the loss.

Sorry for the wait and more so my apologies for not spotting the discrepancy in the equation the first time around.

@nickcarlevaris
Author

@seanbell @shelhamer, I updated the PR as suggested. You can now get at the old behavior through a "legacy_version" parameter. Let me know if everything looks OK. I can rebase and squash if needed.
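For reference, a sketch of how the option might be selected in a training prototxt (the layer and blob names here are placeholders, not taken from this PR):

```
layer {
  name: "contrastive_loss"   # placeholder names for illustration
  type: "ContrastiveLoss"
  bottom: "feat_a"
  bottom: "feat_b"
  bottom: "sim"
  top: "loss"
  contrastive_loss_param {
    margin: 1.0
    # true selects the old max(margin - d^2, 0) behavior;
    # the default (false) gives the Hadsell et al. 2006 loss
    legacy_version: true
  }
}
```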

shelhamer added a commit that referenced this pull request May 12, 2015
Fixed contrastive loss layer to be the same as proposed in Hadsell et al 2006
@shelhamer shelhamer merged commit 2382b09 into BVLC:master May 12, 2015
@shelhamer
Member

Thanks @nickcarlevaris!

@cancan101

I think the docs need to be updated:

```cpp
/**
 * @brief Computes the contrastive loss @f$
 *          E = \frac{1}{2N} \sum\limits_{n=1}^N \left(y\right) d +
 *              \left(1-y\right) \max \left(margin-d, 0\right)
 *          @f$ where @f$
 *          d = \left| \left| a_n - b_n \right| \right|_2^2 @f$. This can be
 *          used to train siamese networks.
 *
 * @param bottom input Blob vector (length 3)
 *   -# @f$ (N \times C \times 1 \times 1) @f$
 *      the features @f$ a \in [-\infty, +\infty]@f$
 *   -# @f$ (N \times C \times 1 \times 1) @f$
 *      the features @f$ b \in [-\infty, +\infty]@f$
 *   -# @f$ (N \times 1 \times 1 \times 1) @f$
 *      the binary similarity @f$ s \in [0, 1]@f$
 * @param top output Blob vector (length 1)
 *   -# @f$ (1 \times 1 \times 1 \times 1) @f$
 *      the computed contrastive loss: @f$ E =
 *          \frac{1}{2N} \sum\limits_{n=1}^N \left(y\right) d +
 *          \left(1-y\right) \max \left(margin-d, 0\right)
 *          @f$ where @f$
 *          d = \left| \left| a_n - b_n \right| \right|_2^2 @f$.
 *          This can be used to train siamese networks.
 */
```
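Presumably the corrected doc comment would read along these lines (a sketch based on the merged loss, not the exact wording of the fix):

```cpp
/**
 * @brief Computes the contrastive loss @f$
 *          E = \frac{1}{2N} \sum\limits_{n=1}^N \left(y\right) d^2 +
 *              \left(1-y\right) \max \left(margin-d, 0\right)^2
 *          @f$ where @f$
 *          d = \left| \left| a_n - b_n \right| \right|_2 @f$.
 */
```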

shelhamer added a commit that referenced this pull request Jul 30, 2015
make documented equation match the correct implementation of the
`max(margin - d, 0)^2` term in the loss. see #2321
@shelhamer
Member

@cancan101 thanks for the report -- fixed in 7f70854.
