Dear authors,
I was trying to reproduce repllama/repmistral and to understand the logic of the code.
1. Why use cross-entropy loss in contrastive learning?
In this file, `/usa/dayu/Table_similarity/tevatron/src/tevatron/retriever/modeling/encoder.py`, I have questions about the `forward` function. Here a cross-entropy loss function is used. Cross-entropy loss focuses on the probabilities assigned to the true classes of the samples; it doesn't directly account for the probabilities assigned to the incorrect (negative) classes in the loss calculation.

However, based on some materials I have read, training a retriever needs a contrastive loss, which pushes the negatives away from the anchor and pulls the positive closer.

For example, triplet loss explicitly penalizes the distance between the anchor and the negatives, while cross-entropy seems to ignore this part.
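To make my question concrete, here is a toy version of how I understand the loss computation (the shapes and variable names are my own illustration, not code from `encoder.py`):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy batch: 4 queries, each grouped with 1 positive + 2 negatives,
# so 4 * 3 = 12 passages in total. Shapes and names are illustrative only.
num_queries, group_size, dim = 4, 3, 8
q_reps = F.normalize(torch.randn(num_queries, dim), dim=-1)
p_reps = F.normalize(torch.randn(num_queries * group_size, dim), dim=-1)

# Score every query against every passage in the batch.
scores = q_reps @ p_reps.T                          # shape (4, 12)

# The "true class" for query i is its own positive at column i * group_size.
target = torch.arange(num_queries) * group_size     # tensor([0, 3, 6, 9])

# Cross-entropy takes -log softmax(scores)[i, target[i]] for each query,
# where the softmax normalizes each row over all 12 columns.
loss = F.cross_entropy(scores, target)
print(loss)
```

Is this the intended formulation, and how should I relate it to the triplet-style contrastive losses described above?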
2. How "in-batch" negative is considered? What if we have >1 positive?
Based on the code I saw in the `forward` function and the code under `/usa/dayu/Table_similarity/tevatron/src/tevatron/retriever/dataset.py`: if I understand correctly, each query has one and only one positive passage (it must be exactly 1) plus `negative_size` negatives. Assuming `negative_size` = 2, the positive indices will be [0, 3, 6, 9, ...], and all other passages are negatives.
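In other words, my mental model of the batch layout is something like this (purely illustrative, not the repo's actual variable names):

```python
# Hypothetical flattened passage layout (my own sketch, not Tevatron's code):
# [q0_pos, q0_neg1, q0_neg2, q1_pos, q1_neg1, q1_neg2, ...]
num_queries, negative_size = 4, 2
group_size = 1 + negative_size                 # 1 positive + negative_size negatives

for i in range(num_queries):
    positive_col = i * group_size              # 0, 3, 6, 9, ...
    own_negatives = [positive_col + j for j in range(1, group_size)]
    # Everything else in the row belongs to some other query's group,
    # which is what I would call the "in-batch" part.
    other_cols = [c for c in range(num_queries * group_size)
                  if c != positive_col and c not in own_negatives]
    print(i, positive_col, own_negatives, other_cols)
```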
If that is correct, where do the in-batch negatives come in? And what if we have more than one positive passage per query?

3. Why use a temperature to "reshape" the similarity distribution?

The standard cosine similarity ranges from -1 to 1, and a higher temperature coefficient brings the model's score range closer to that standard range. When I use bge-en-v1.5, I find the cosine similarity scores typically range between 0.4 and 1 (so two random documents will have a similarity of about 0.4, which is kind of counter-intuitive). So why do people use a temperature to "reshape" the similarity distribution? Why not keep it in the range from -1 to 1, which seems more intuitive and clearer for revealing the negative/positive relationship between documents?

Thanks!
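P.S. To make question 3 concrete, here is what I mean by "reshaping": dividing the cosine scores by a small temperature before the softmax (the 0.02 below is just an illustrative value, not necessarily this repo's default):

```python
import torch

# Toy cosine scores for one query against 4 passages. With a real encoder
# these often sit in a narrow band (e.g. 0.4-0.9), so the plain softmax is
# almost uniform; dividing by a small temperature sharpens it considerably.
scores = torch.tensor([0.82, 0.78, 0.75, 0.70])

print(torch.softmax(scores, dim=-1))         # ~[0.26, 0.25, 0.25, 0.23]
print(torch.softmax(scores / 0.02, dim=-1))  # ~[0.86, 0.12, 0.03, 0.00]
```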