N-pair loss is a loss for supervised metric learning introduced by Sohn (2016). It is a natural progression of the triplet loss that tries to extract more information from a given batch.
Notation:

- $x$ -- anchor input
- $x^{+}$ -- positive input
- $x_i$ -- negative input to $x$
- $f$ -- normalized representation of $x$
- $f^{+}$ -- normalized representation of $x^{+}$
- $f_i$ -- normalized representation of $x_i$
Then the loss for one $(N+1)$-tuplet is:

$$ \mathcal{L}\left(\{x, x^{+}, \{x_i\}_{i=1}^{N-1}\}; f\right) = \log\left( 1 + \sum_{i=1}^{N-1} \exp\left(f^\top f_i - f^\top f^{+}\right) \right) $$
Note that this is identical in form to the classical softmax (cross-entropy) loss, with the positive playing the role of the correct class.
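Expanding the logarithm makes the equivalence explicit:

$$ \mathcal{L} = -\log \frac{\exp\left(f^\top f^{+}\right)}{\exp\left(f^\top f^{+}\right) + \sum_{i=1}^{N-1} \exp\left(f^\top f_i\right)} $$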
Instead of computing a single loss per batch of $N+1$ inputs ($N-1$ negatives, 1 positive, 1 anchor), the authors propose to build a batch of $2N$ inputs composed of $N$ anchor-positive pairs from $N$ different classes (whose embeddings the loss should pull apart). From the $2N$ inputs we can compute $N$ losses:
- take the $j$-th anchor and positive as $x$ and $x^{+}$, and the other $N-1$ positives as the negatives $x_i$ (see the sketch below)
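A minimal PyTorch sketch of this batched variant, assuming the $2N$ inputs arrive as two aligned `(N, D)` embedding tensors (the function name and tensor layout are my own choices, not from the paper):

```python
import torch
import torch.nn.functional as F

def n_pair_loss(anchors, positives):
    """N-pair loss over a batch of N (anchor, positive) pairs from N distinct classes.

    anchors, positives: (N, D) tensors of embeddings; row j of `positives`
    is the positive for row j of `anchors`, and the remaining N-1 positives
    act as negatives for that anchor.
    """
    # Similarity matrix: entry (j, k) = f_j^T f_k^+
    logits = anchors @ positives.t()                          # (N, N)
    # For the j-th row the "correct class" is column j, so the N per-anchor
    # losses reduce to a softmax cross-entropy over the rows.
    targets = torch.arange(anchors.size(0), device=anchors.device)
    return F.cross_entropy(logits, targets)
```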
The triplet loss relies on mining hard instances to speed up convergence. The authors of the N-pair loss propose to mine classes instead of instances:
- Choose a large number of classes $C$, with 2 randomly sampled instances from each.
- Get the sampled instances' embeddings.
- Greedily create a batch of $N$ classes by:
  - randomly taking a class $i$ (represented by a random instance of the chosen class $i$),
  - choosing the $j$-th class next if it violates the triplet constraint the most w.r.t. the already selected classes,
  - repeating until $N$ classes are selected (a sketch of this procedure follows the list).
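A rough sketch of one way this greedy hard-class mining could look, assuming two embedded instances per candidate class and scoring each candidate by how strongly it violates the triplet constraint $f^\top f^{+} \ge f^\top f^{-} + \text{margin}$ against the already selected classes (the function name, the `(C, 2, D)` layout, and the exact violation score are my assumptions, not the paper's exact procedure):

```python
import torch

def greedy_hard_class_mining(class_embs, n_classes, margin=0.0):
    """Greedily select N "hard" classes out of C candidates.

    class_embs: (C, 2, D) tensor -- two embedded instances per candidate class;
    instance 0 is treated as the anchor, instance 1 as the positive.
    Returns the indices of the N selected classes.
    """
    C = class_embs.size(0)
    anchors, positives = class_embs[:, 0], class_embs[:, 1]      # (C, D) each
    selected = [torch.randint(C, (1,)).item()]                    # start from a random class
    remaining = set(range(C)) - set(selected)
    while len(selected) < n_classes:
        sel_anchors = anchors[selected]                           # (S, D)
        sel_positives = positives[selected]                       # (S, D)
        pos_sim = (sel_anchors * sel_positives).sum(dim=1)        # (S,) anchor-positive similarities
        best_cls, best_violation = None, -float("inf")
        for j in remaining:
            # Triplet-constraint violation of class j w.r.t. every selected class:
            # larger means class j's instance is a harder negative.
            neg_sim = sel_anchors @ anchors[j]                    # (S,)
            violation = (neg_sim + margin - pos_sim).max().item()
            if violation > best_violation:
                best_cls, best_violation = j, violation
        selected.append(best_cls)
        remaining.remove(best_cls)
    return selected
```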
The authors report better results than:
- triplet loss w/ hard class mining (no hard instance mining)
- classical softmax loss
However, the best-performing version of the N-pair loss often uses hard class mining, which is undoubtedly the most costly of all the evaluated losses.