Fixed contrastive loss layer to be the same as proposed in Hadsell et al 2006 #2321
Conversation
This is great -- I too was bothered that the implementation doesn't match, meaning that if you use caffe in a paper you need to clarify which version you're using. You also need to square the margin in the caffe version (to get the units to match), which can lead to subtle bugs. A suggestion: rather than overwriting what's already there (which breaks already-saved models), why not add a prototxt parameter option to choose which version gets used?
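To make the units point concrete, here is the comparison in symbols (a sketch using `m` for the margin and `d` for the pair distance, per the formulas in the PR description below):

```latex
% Penalty for a non-matching pair; d is the Euclidean distance between the pair.
L_{\text{current}} = \max(m - d^2,\, 0)   % zero once d \ge \sqrt{m}
\qquad
L_{\text{Hadsell}} = \max(m - d,\, 0)^2   % zero once d \ge m
```

So for the two versions to stop penalizing a pair at the same distance, the current version's margin has to be the square of the Hadsell margin, which is the unit mismatch (and the subtle-bug risk) described above.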
It works similarly to the current version. But I think that @nickcarlevaris's version should be merged and the old one deleted, mainly because there is no reference for the current loss function, which may cause a lot of questions: is this version better? Who created it?
@melgor I understand that the two losses are similar. However, I've already trained networks with the current version, and I would rather not have to maintain a difference with the master branch just to avoid having old functionality deleted. I'm okay with the new default becoming [Hadsell 2006], but I think the old functionality should be obtainable with an option in the prototxt.
Thanks @nickcarlevaris for the update!
@norouzi, I pushed a commit that adds the epsilon to prevent the possibility of dividing by zero. @seanbell, my inclination would be to overwrite the current cost function with the correct one. When I originally implemented this layer (#959), my intention was to make it match the Hadsell et al. paper. It is only because I didn't double check the original paper that I ended up implementing this slightly different cost function. I see this more as fixing an error. As far as breaking existing models goes, changing the loss layer shouldn't break the deployed version of any models, because it would only affect the training net. Also, it should be pretty easy to fine tune an existing model from the old cost to this new cost, since a network that is good for one should be pretty good for the other. That being said, I am not strongly against keeping both. Let's see what the caffe maintainers think would be better.
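For readers wondering where the division by zero comes from: the gradient of the Hadsell non-matching-pair term `max(margin - d, 0)^2` contains a `1/d` factor, which blows up when the two features coincide. A minimal sketch of the guarded gradient (an illustration with assumed names and an assumed epsilon value, not the actual Caffe code):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Sketch only: gradient of the non-matching-pair term
// L = max(margin - d, 0)^2 with respect to feature a, where
// d = ||a - b||. The 1/d factor below is why an epsilon is
// needed when the two features coincide (d == 0).
void DissimilarPairGradient(const std::vector<float>& a,
                            const std::vector<float>& b,
                            float margin,
                            std::vector<float>* grad) {
  const float kEps = 1e-4f;  // assumed guard value, for illustration
  float dist_sq = 0.0f;
  for (std::size_t i = 0; i < a.size(); ++i) {
    const float diff = a[i] - b[i];
    dist_sq += diff * diff;
  }
  const float d = std::sqrt(dist_sq);
  const float mdist = std::max(margin - d, 0.0f);
  // dL/da = -2 * max(margin - d, 0) * (a - b) / d
  const float scale = -2.0f * mdist / (d + kEps);  // epsilon avoids 0/0
  grad->resize(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    (*grad)[i] = scale * (a[i] - b[i]);
  }
}
```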
@nickcarlevaris Ah, I didn't realize you were the one who contributed the original layer as well. I understand that this feels like a "bugfix", but the old code is a valid loss function -- just different. They might be equivalent in performance, maybe not. I agree that the new layer definition makes more sense, and intuitively feels like it should work better. However, the layer has been around through an entire publication cycle by now. So I think we should treat this as deprecation and not "bugfix". That means keeping the old version and making the new version the default. For example, I already have a public preprint on visual similarity ( http://www.seanbell.ca/tmp/siggraph2015-preprint-bell.pdf -- section 3) that uses the old layer definition. It would be nice if others can reproduce our results. I bet there will be many more at CVPR this year that also use the old layer. Anyway, sure, let's see what the caffe maintainers think. |
@nickcarlevaris @seanbell while this is a little tricky since the layer has already been adopted by existing work, I think it is best to make the corrected Hadsell et al. 2006 loss the default while keeping the old behavior available through an option, if you agree. @nickcarlevaris, could you add a field to the layer's proto parameters for this? Sorry for the wait and more so my apologies for not spotting the discrepancy in the equation the first time around.
…through the legacy_version parameter.
@seanbell @shelhamer, I updated the PR as suggested. You can now get at the old behavior through a "legacy_version" parameter. Let me know if everything looks OK. I can rebase and squash if needed.
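For reference, selecting the old behavior should look something like this in a prototxt (the blob names here are placeholders; `margin` and `legacy_version` are the fields in `ContrastiveLossParameter`):

```
layer {
  name: "loss"
  type: "ContrastiveLoss"
  bottom: "feat_a"   # first feature vector (placeholder name)
  bottom: "feat_b"   # second feature vector (placeholder name)
  bottom: "sim"      # 1 = matching pair, 0 = non-matching
  top: "loss"
  contrastive_loss_param {
    margin: 1.0
    legacy_version: true  # old max(margin - d^2, 0) term; default is false
  }
}
```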
Fixed contrastive loss layer to be the same as proposed in Hadsell et al 2006
Thanks @nickcarlevaris!
I think the docs need to be updated: `caffe/include/caffe/loss_layers.hpp`, lines 128 to 151 (at 50ab52c).
make documented equation match the correct implementation of the `max(margin - d, 0)^2` term in the loss. see #2321
@cancan101 thanks for the report -- fixed in 7f70854.
The current contrastive loss layer implements a slightly different loss than that proposed in Hadsell et al. 2006. This PR updates it so that it matches the original paper. This is in reference to issue #2308.
If `d` is the distance between two vectors describing a non-matching pair, the current code implements `max(margin - d^2, 0)`, while the loss proposed by Hadsell et al. is `max(margin - d, 0)^2`. @SlevinKelevra and @melgor, you guys can give this PR a try and see if it works better than the current version.
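To see the practical difference, here is a small standalone comparison (not Caffe code; a sketch of the two non-matching-pair penalties above) over a range of distances. Note the legacy term reaches zero at `d = sqrt(margin)` while the Hadsell term reaches zero at `d = margin`:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>

// Penalty for a NON-matching pair at distance d with margin m.
float LegacyLoss(float d, float m) {    // pre-#2321 Caffe version
  return std::max(m - d * d, 0.0f);     // max(margin - d^2, 0)
}
float HadsellLoss(float d, float m) {   // Hadsell et al. 2006
  const float t = std::max(m - d, 0.0f);
  return t * t;                         // max(margin - d, 0)^2
}

int main() {
  const float margin = 0.25f;  // chosen so the zero-crossings differ visibly
  for (int i = 0; i <= 6; ++i) {
    const float d = 0.1f * i;
    std::printf("d=%.1f  legacy=%.4f  hadsell=%.4f\n",
                d, LegacyLoss(d, margin), HadsellLoss(d, margin));
  }
  return 0;  // legacy hits 0 at d = 0.5 (= sqrt(m)); hadsell at d = 0.25 (= m)
}
```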