Problem:
Transformers4Rec supports multi-GPU training for the next-item prediction task because it uses the HF Trainer (RMP #522), which supports DataParallel / DistributedDataParallel under the hood.
The binary classification / regression tasks are currently not supported by the HF Trainer; they are instead trained with a custom `model.fit()` method we provide, which does not support DataParallel / DistributedDataParallel.
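For reference, a minimal sketch of the existing multi-GPU path for next-item prediction, using the class names from the Transformers4Rec docs (exact imports and signatures vary across versions; the schema and data paths are placeholders):

```python
# Launch with, e.g.: torchrun --nproc_per_node=2 train_next_item.py
import transformers4rec.torch as tr
from transformers4rec.config.trainer import T4RecTrainingArguments
from transformers4rec.torch import Trainer
from merlin_standard_lib import Schema

schema = Schema().from_proto_text("schema.pbtxt")  # placeholder path

# Input block + transformer body + next-item prediction head
input_module = tr.TabularSequenceFeatures.from_schema(
    schema, max_sequence_length=20, masking="causal"
)
transformer_config = tr.XLNetConfig.build(
    d_model=64, n_head=4, n_layer=2, total_seq_length=20
)
model = transformer_config.to_torch_model(
    input_module, tr.NextItemPredictionTask(weight_tying=True)
)

# T4RecTrainingArguments extends the HF TrainingArguments, so the HF
# Trainer's DataParallel / DistributedDataParallel support applies as-is.
training_args = T4RecTrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=128,
    num_train_epochs=3,
)
trainer = Trainer(model=model, args=training_args, schema=schema)
trainer.train_dataset_or_path = "train.parquet"  # placeholder path
trainer.train()
```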
Goal:
Change the implementation of the binary classification / regression tasks so that they can be trained (with multi-GPU) using the HF Trainer.
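For illustration, one possible direction (not the actual implementation): wrap the task model so its `forward()` returns the loss in the format the HF Trainer expects, and let the Trainer handle DataParallel / DistributedDataParallel placement itself. `base_model` below is a hypothetical stand-in for the existing T4Rec binary-classification model:

```python
import torch
from transformers import Trainer, TrainingArguments


class HFTrainerCompatibleModel(torch.nn.Module):
    """Hypothetical adapter making a binary-classification model HF-Trainer compatible."""

    def __init__(self, base_model: torch.nn.Module):
        super().__init__()
        self.base_model = base_model  # assumed to return raw logits
        self.loss_fn = torch.nn.BCEWithLogitsLoss()

    def forward(self, labels=None, **features):
        logits = self.base_model(features)
        output = {"logits": logits}
        if labels is not None:
            # The HF Trainer's compute_loss reads outputs["loss"] when a dict
            # is returned, so DP/DDP training works without a custom fit()
            output["loss"] = self.loss_fn(logits.squeeze(-1), labels.float())
        return output


# Usage sketch: the HF Trainer then owns the training loop and the
# multi-GPU placement (DDP when launched with torchrun).
# trainer = Trainer(
#     model=HFTrainerCompatibleModel(base_model),
#     args=TrainingArguments(output_dir="./results", per_device_train_batch_size=128),
#     train_dataset=train_dataset,  # must yield dicts of features + "labels"
# )
# trainer.train()
```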
Starting Point:
We created a multi-GPU example in TF4Rec for the next-item prediction task. For the binary classification task we shared a multi-GPU training example PoC with the customer, and I will add the code snippet to the docs in the TF4Rec repo.
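The PoC itself is not reproduced here; one plausible shape for it, assuming the custom training loop is wrapped manually in DistributedDataParallel (all names illustrative):

```python
# Launch with: torchrun --nproc_per_node=2 poc_binary_ddp.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def train(model: torch.nn.Module, dataloader, epochs: int = 1):
    local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(local_rank)

    ddp_model = DDP(model.cuda(local_rank), device_ids=[local_rank])
    optimizer = torch.optim.Adam(ddp_model.parameters(), lr=1e-3)
    loss_fn = torch.nn.BCEWithLogitsLoss()

    for _ in range(epochs):
        for features, labels in dataloader:
            features = {k: v.cuda(local_rank) for k, v in features.items()}
            labels = labels.cuda(local_rank).float()
            optimizer.zero_grad()
            logits = ddp_model(features).squeeze(-1)
            loss = loss_fn(logits, labels)
            loss.backward()  # DDP synchronizes gradients across ranks here
            optimizer.step()

    dist.destroy_process_group()
```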