
DPOTrainer log metrics are not gathered and meaned across ranks #2468

Open
zhc7 opened this issue Dec 13, 2024 · 3 comments · May be fixed by #2474
Labels
🐛 bug (Something isn't working) · 🏋 DPO (Related to DPO)

Comments


zhc7 commented Dec 13, 2024

Feature request

Synchronize and average metrics across ranks.

Motivation

Currently, the reported metrics are only the numbers on rank 0.

        metrics[f"{prefix}rewards/chosen"] = chosen_rewards.mean().cpu()
        metrics[f"{prefix}rewards/rejected"] = rejected_rewards.mean().cpu()
        metrics[f"{prefix}rewards/accuracies"] = reward_accuracies.mean().cpu()
        metrics[f"{prefix}rewards/margins"] = (chosen_rewards - rejected_rewards).mean().cpu()
        metrics[f"{prefix}logps/chosen"] = model_output["chosen_logps"].detach().mean().cpu()
        metrics[f"{prefix}logps/rejected"] = model_output["rejected_logps"].detach().mean().cpu()
        metrics[f"{prefix}logits/chosen"] = model_output["mean_chosen_logits"].detach().cpu()
        metrics[f"{prefix}logits/rejected"] = model_output["mean_rejected_logits"].detach().cpu()

None of these are synced across ranks.
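
To make the gap concrete, here is a minimal standalone sketch (not TRL code) of the difference between the per-rank mean that currently gets logged and the mean taken after gathering across ranks. The script name, the random accuracy values, and the two-process launch are made up for illustration; it only assumes Accelerate's `Accelerator.gather`.

    # gather_demo.py -- illustrative only; run with: accelerate launch --num_processes 2 gather_demo.py
    import torch
    from accelerate import Accelerator

    accelerator = Accelerator()

    # Pretend these are the per-example reward accuracies on this rank (per_device_batch_size=2).
    local_acc = torch.rand(2, device=accelerator.device).round()

    # What gets logged today: each rank's own mean, and only rank 0's value reaches the logs.
    print(f"rank {accelerator.process_index}: local mean = {local_acc.mean().item():.2f}")

    # What the feature request asks for: gather across ranks first, then average.
    global_acc = accelerator.gather(local_acc).mean()
    if accelerator.is_main_process:
        print(f"mean across all ranks = {global_acc.item():.2f}")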

Your contribution

The current `log` function looks like:

    def log(self, logs: Dict[str, float]) -> None:
        """
        Log `logs` on the various objects watching training, including stored metrics.

        Args:
            logs (`Dict[str, float]`):
                The values to log.
        """
        # logs either has 'loss' or 'eval_loss'
        train_eval = "train" if "loss" in logs else "eval"
        # Add averaged stored metrics to logs
        for key, metrics in self._stored_metrics[train_eval].items():
            logs[key] = torch.tensor(metrics).mean().item()
        del self._stored_metrics[train_eval]
        return super().log(logs)

It would have this feature if it looked like this:

    def log(self, logs: Dict[str, float]) -> None:
        """
        Log `logs` on the various objects watching training, including stored metrics.

        Args:
            logs (`Dict[str, float]`):
                The values to log.
        """
        # logs either has 'loss' or 'eval_loss'
        train_eval = "train" if "loss" in logs else "eval"
        # Add averaged stored metrics to logs
        for key, metrics in self._stored_metrics[train_eval].items():
            if isinstance(metrics[0], torch.Tensor):
                # Move the stored CPU metrics back to the device and gather them across all ranks
                gathered = self._nested_gather([m.cuda() for m in metrics])
                # Average each gathered tensor so every rank's value contributes
                metrics = [g.mean() for g in gathered]
            meaned = torch.tensor(metrics).mean()
            logs[key] = meaned.item()
        del self._stored_metrics[train_eval]
        return super().log(logs)
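
Two notes on the sketch above: the `.cuda()` call moves the stored CPU metrics back to the device, since the metrics were stored with `.cpu()` and NCCL-based gathering needs device tensors; and because each stored entry is already a per-rank mean, averaging the gathered values matches the true global mean only when every rank contributes the same number of samples per step (the usual case with a fixed per-device batch size).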

I'm happy to submit a PR.

qgallouedec (Member) commented:

That's a good point! Feel free to open a PR to fix this. I don't think adding a unit test for this is relevant. If possible, add plots (e.g., with wandb) before/after to ensure that we aren't introducing a regression.

qgallouedec added the 🐛 bug and 🏋 DPO labels Dec 13, 2024
zhc7 (Author) commented Dec 13, 2024

Of course!
[Image: wandb curves for the same training run with and without the modification]
Here's a graph of the same training run with and without the modification. You can see the pink line is a lot smoother, especially in the accuracy graph. My per_device_batch_size is 2, so the per-device accuracy can only be 0, 0.5, or 1.

zhc7 added a commit to zhc7/trl that referenced this issue Dec 13, 2024
zhc7 linked a pull request Dec 13, 2024 that will close this issue
qgallouedec (Member) commented:

Perfect!
