Loss layer for Siamese neural network #775

Closed
arntanguy opened this issue Jul 24, 2014 · 3 comments
@arntanguy
Contributor

Hi,
First, I should say that I have very little experience with neural networks and minimisation algorithms. I implemented a fully connected perceptron last year, but that's as far as my experience goes.

I've created a Siamese Convolutional Neural Network using the recently introduced weight sharing feature to try and recognise similarities between pairs of images.

I created my own LMDB dataset based on RGB(D) images, and an appropriate data layer for feeding the labelled pairs of images to the network. This works perfectly, and with a couple of small improvements it could probably be sent as a pull request.

Now, I am trying to write a loss function for this, based on [1].
The idea is to minimise the energy, defined as the L1 norm of the difference between the feature descriptors computed from two input images X1 and X2. To do so, the loss function from the paper has a contrastive term that makes sure that the energy is low when the inputs are similar, and high when they aren't.

The loss function is defined for a pair of input images X1, X2 with label Y (1 if similar, 0 otherwise) as

L(W, Y, X1, X2) = Y * Lg(Ew(X1, X2)) + (1 - Y) * Li(Ew(X1, X2))

Lg (for a genuine pair of images) and Li (for an impostor pair) are designed so that the minimisation of L decreases the energy of genuine pairs and increases the energy of impostors.

More specifically, in the paper they used

L(W, Y, X1, X2) = 2/Q * Y * (Ew)^2 + 2*Q * (1-Y) * exp(-2.77/Q * Ew)

Where Q is a constant set to the upper bound of Ew.
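
To make the formula concrete, here is a minimal NumPy sketch of the forward computation; the descriptor values, the value of Q and the variable names are illustrative assumptions, not taken from the linked code:

```python
import numpy as np

# Illustrative descriptors from the two weight-shared branches for one pair;
# the values, shapes and Q are assumptions, not taken from the actual network.
f1 = np.array([0.2, -1.3, 0.7])   # descriptor of X1
f2 = np.array([0.1, -0.9, 1.1])   # descriptor of X2
y = 1.0                           # label: 1 = genuine pair, 0 = impostor pair
Q = 10.0                          # assumed upper bound of Ew

Ew = np.sum(np.abs(f1 - f2))      # energy: L1 norm of the descriptor difference

# Loss from [1], using the label convention above (Y = 1 for genuine pairs)
loss = (2.0 / Q) * y * Ew**2 + 2.0 * Q * (1.0 - y) * np.exp(-2.77 / Q * Ew)
print(Ew, loss)
```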

I implemented the forward propagation of this loss function without any problems, but I am struggling to figure out what to do for the backpropagation. I am very confused as to what needs to be computed there.

Can someone give me some pointers on how to properly implement that?

  • What do I need to compute for the backward propagation? (see the gradient sketch after this list)
  • Also, how can I know the range in which the network's feature descriptors will lie (needed to compute the constant Q)?
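
For what it's worth, here is a hand-derived sketch of the gradients a backward pass would need under the formula above; it is a plain NumPy illustration using the same assumed names as the forward sketch, not a verified Caffe Backward() implementation:

```python
import numpy as np

def siamese_loss_backward(f1, f2, y, Q):
    """Gradients of L = 2/Q*Y*Ew^2 + 2*Q*(1-Y)*exp(-2.77/Q*Ew)
    with respect to the two descriptors, where Ew = ||f1 - f2||_1."""
    diff = f1 - f2
    Ew = np.sum(np.abs(diff))
    # dL/dEw: differentiate each term of the loss with respect to the energy
    dL_dEw = (4.0 / Q) * y * Ew - 2.0 * 2.77 * (1.0 - y) * np.exp(-2.77 / Q * Ew)
    # Chain through the L1 norm: dEw/df1 = sign(f1 - f2), dEw/df2 = -sign(f1 - f2)
    # (a subgradient is used where the difference is exactly zero)
    dL_df1 = dL_dEw * np.sign(diff)
    dL_df2 = -dL_df1
    return dL_df1, dL_df2
```

In a Caffe loss layer, values like these would be what the backward pass writes into the bottom blobs' diffs, scaled by the loss weight passed down in the top diff.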

I am open to other suggestions for the loss function, so your answer doesn't necessarily have to be based on the loss function from [1].

My current code is on my github: geenux/tum_siamese@afaeff85f793f0bbf9f73909d55c936ecb95a23e

Thanks a lot for your help!

[1] Chopra, Sumit, Raia Hadsell, and Yann LeCun. "Learning a similarity metric discriminatively, with application to face verification." Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Vol. 1, 2005. (http://yann.lecun.com/exdb/publis/pdf/chopra-05.pdf)

@shelhamer
Member

See #959 for a contrastive loss function for Siamese networks.
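
(For readers arriving later: the layer added in #959 implements the margin-based contrastive loss of Hadsell, Chopra & LeCun, 2006. A minimal NumPy sketch of that formulation, assuming y = 1 for similar pairs and an arbitrary margin, is below; it is an illustration, not the Caffe code itself.)

```python
import numpy as np

def contrastive_loss(f1, f2, y, margin=1.0):
    """Margin-based contrastive loss (Hadsell et al., 2006): similar pairs
    (y = 1) are pulled together; dissimilar pairs (y = 0) are pushed apart
    until their distance exceeds the margin."""
    d = np.linalg.norm(f1 - f2)   # Euclidean distance between the two descriptors
    return y * d**2 + (1.0 - y) * max(margin - d, 0.0)**2
```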

@aimaaonline

@shelhamer, can you please help: if I change my loss function to Mean Squared Error for regression, how can I verify the numerical and analytical gradients? I cannot get the gradients calculated by the Siamese network at the last layer.
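
A generic way to compare the two is a centred finite-difference check on the loss; the sketch below uses a stand-alone MSE example in NumPy, where the loss, shapes and names are assumptions for illustration, not the actual network:

```python
import numpy as np

def mse_loss(pred, target):
    # Mean squared error used as a regression loss (illustrative)
    return 0.5 * np.mean((pred - target) ** 2)

def analytical_grad(pred, target):
    # Hand-derived gradient of the MSE above with respect to pred
    return (pred - target) / pred.size

def numerical_grad(pred, target, eps=1e-5):
    # Centred finite differences, one coordinate at a time
    grad = np.zeros_like(pred)
    for i in range(pred.size):
        bump = np.zeros_like(pred)
        bump[i] = eps
        grad[i] = (mse_loss(pred + bump, target) - mse_loss(pred - bump, target)) / (2 * eps)
    return grad

pred, target = np.random.randn(4), np.random.randn(4)
diff = np.max(np.abs(analytical_grad(pred, target) - numerical_grad(pred, target)))
print(diff)  # should be on the order of 1e-9 or smaller if both gradients agree
```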

@Cluoyao

Cluoyao commented Apr 1, 2019

Hi @arntanguy, I want to ask you one question: when you train the model with this loss function, have you ever seen the loss be very large at the first iteration and then much smaller at the second? How did you solve it? Thanks.
