ThresholdChecker now computes improvement relative to the score recorded at the last improving step, instead of relative to the best metric received so far.
Example: ThresholdChecker(0.5) with scores 0.2 -> 0.4 -> 0.8
Previously ---> 0.4: no improvement, 0.8: no improvement
Now ---> 0.4: no improvement, 0.8: IMPROVED
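A minimal sketch of the two behaviors, to make the example concrete. It assumes the threshold is an absolute delta, that higher scores are better, and that the first score is only a baseline. `ThresholdCheckerSketch` and its `compare_to_best` flag are hypothetical names, not the real ThresholdChecker API. Under these assumptions the old rule compares 0.8 against the best score 0.4 (delta 0.4 < 0.5), while the new rule compares it against 0.2, the score at the last improving step (delta 0.6 >= 0.5).

```python
class ThresholdCheckerSketch:
    """Hypothetical illustration; not the library's actual ThresholdChecker.

    Assumes an absolute threshold and that higher scores are better.
    """

    def __init__(self, threshold: float, compare_to_best: bool = False):
        self.threshold = threshold
        # True  -> previous behavior: reference is the best score seen so far.
        # False -> new behavior: reference is the score at the last improving step.
        self.compare_to_best = compare_to_best
        self.reference = None  # score that new values are compared against
        self.best = None       # best score received so far

    def check(self, score: float) -> bool:
        """Return True if `score` counts as an improvement."""
        if self.reference is None:
            # First score only sets the baseline; it is not an improvement.
            self.reference = self.best = score
            return False
        improved = score - self.reference >= self.threshold
        self.best = max(self.best, score)
        if self.compare_to_best:
            # Old rule: always compare against the best-received metric.
            self.reference = self.best
        elif improved:
            # New rule: move the reference only when an improvement happens.
            self.reference = score
        return improved


scores = [0.2, 0.4, 0.8]
old = ThresholdCheckerSketch(0.5, compare_to_best=True)
new = ThresholdCheckerSketch(0.5)
print([old.check(s) for s in scores][1:])  # [False, False]
print([new.check(s) for s in scores][1:])  # [False, True]
```

Keeping the reference fixed until a real improvement occurs means several small gains can add up to one that clears the threshold, which is exactly what the 0.2 -> 0.4 -> 0.8 sequence shows.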