Cropped the table and addressed comments from previous PR #510
Conversation
Documentation preview: https://nvidia-merlin.github.io/Transformers4Rec/review/pr-510
These experiments used a machine with 2 <i>Tesla V100-SXM2-32GB-LS</i> GPUs.

<b>Note:</b> Using Data-Parallel training can impact metrics such as NDCG and recall. Therefore, Users are advised to tune the learning rate for better accuracy.
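As a rough illustration of why retuning can be needed (a sketch only; the per-GPU batch size below is a made-up value, not the benchmarked configuration): with data-parallel training the per-GPU batch size stays fixed, so the effective batch size per optimizer step grows with the number of GPUs, and the single-GPU learning rate is no longer guaranteed to give the same NDCG/recall.

```python
import torch

# Hypothetical per-GPU batch size; not the value used in the benchmark table.
per_gpu_batch_size = 256
n_gpus = max(torch.cuda.device_count(), 1)

# With data-parallel training each GPU processes its own batch, so one
# optimizer step effectively consumes n_gpus * per_gpu_batch_size samples.
effective_batch_size = per_gpu_batch_size * n_gpus
print(f"{n_gpus} GPU(s) -> effective batch size {effective_batch_size}")
# Because the effective batch size changes, the learning rate tuned for a
# single GPU should be re-tuned (see the discussion below).
```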
Can we add some recommendations/more clarification about that? For example, if we increase the GPU count 2x, do we expect the learning rate to be increased or reduced accordingly?
Can you make `Users` lower case? Like "Therefore, users are ..."

Can you please add a `learning_rate` column to the table and put what learning rate was used in the different configurations based on the number of GPUs? That's the critical part. Thanks.
In most cases we should decrease the learning rate as we increase the number of GPUs, but I did not observe any specific pattern in how the learning rate must be changed. It differs based on model type, batch_size, etc. For example, I saw a model where the same metrics could be achieved on 2 GPUs with the same learning rate, and I also saw an example where for 2 GPUs the learning rate had to be multiplied by 0.75.

I can add `learning_rate` to the table, but would that be enough? Should I mention any other parameters too? I have not modified all the different parameters when running the script, so I was not 100% sure whether the learning rate depends only on nr_gpu + model_type + batch_size or not.
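To make that point concrete, here is a minimal sketch (all values below are hypothetical placeholders, not the ones measured in these experiments) of keeping per-configuration tuned learning rates in a lookup table rather than assuming a single scaling rule:

```python
# Hypothetical tuned learning rates per (model_type, per_gpu_batch_size, n_gpus).
# The numbers are placeholders to illustrate the bookkeeping, not measured values.
TUNED_LR = {
    ("xlnet", 256, 1): 1e-3,
    ("xlnet", 256, 2): 7.5e-4,  # e.g. the "multiply by 0.75" case mentioned above
    ("gpt2",  256, 2): 1e-3,    # e.g. a case where the single-GPU LR still worked
}

def lookup_lr(model_type: str, per_gpu_batch_size: int, n_gpus: int) -> float:
    """Return the tuned learning rate for a configuration, or fail loudly if untuned."""
    try:
        return TUNED_LR[(model_type, per_gpu_batch_size, n_gpus)]
    except KeyError:
        raise ValueError(
            f"No tuned LR recorded for {model_type} / batch {per_gpu_batch_size} / "
            f"{n_gpus} GPU(s); re-tune instead of assuming a scaling rule."
        )

print(lookup_lr("xlnet", 256, 2))
```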
@nzarif I understand from this new statement that this unpredictable effect of LR applies to `DataParallel` and not to `DistributedDataParallel`, right? Can we just keep the LR we use with a single GPU when using `DistributedDataParallel` and the results (loss/accuracy) match?
Loss does not change much between single GPU, `DataParallel`, and `DistributedDataParallel`, but accuracy and NDCG change considerably. They change for both `DataParallel` and `DistributedDataParallel`, assuming that in each case every GPU receives the same batch size. This means each GPU processes the same batch size on each step, so when more than 1 GPU is used, the effective batch size processed at once is larger. But it does not translate into the same `learning_rate` producing the same accuracy/NDCG/recall for both `DataParallel` and `DistributedDataParallel`, and I believe the reason lies in the different approaches these modes take to aggregation after processing a batch.
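For readers less familiar with the two modes, a hedged sketch of how they are wrapped in PyTorch (the model and sizes are placeholders): `nn.DataParallel` runs in a single process, scatters each input batch across GPUs, and gathers outputs and gradients back on the default device, while `DistributedDataParallel` runs one process per GPU and averages gradients across processes with an all-reduce, which is the aggregation difference referred to above.

```python
import torch
import torch.nn as nn

# Placeholder model purely to show how the two wrappers are applied.
model = nn.Linear(128, 10)

# DataParallel: single process. Each forward pass splits the input batch across
# all visible GPUs and gathers outputs (and reduces gradients) on the default
# device. Falls back to plain single-device execution if no GPU is available.
dp_model = nn.DataParallel(model)

# DistributedDataParallel: one process per GPU, typically launched with e.g.
#   torchrun --nproc_per_node=2 train.py
# Each process loads its own per-GPU batch and gradients are averaged across
# processes with an all-reduce after backward. Shown commented out because it
# needs an initialized process group:
#
# import os
# import torch.distributed as dist
# dist.init_process_group(backend="nccl")
# local_rank = int(os.environ["LOCAL_RANK"])
# ddp_model = nn.parallel.DistributedDataParallel(
#     model.cuda(local_rank), device_ids=[local_rank]
# )
```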
I did play around with the `learning_rate` for a couple of models and I could find learning rates that gave the same metrics for each of the modes. I did not record those learning rates back then, but if you would like, I can retry and come up with some examples of learning rates that produce similar metrics for each of the training modes (single GPU, `DataParallel`, and `DistributedDataParallel`).
@nzarif I'd say adding the LR is important, and if you used WandB they should all be registered there in your experiments, so you can get them from there. In addition, you need to write down which dataset you used when preparing the table; it is missing. If you generated the data synthetically, you can add the generation script to the README as well; if you used the `yoochoose` dataset, please mention it.
CI Results: GitHub pull request #510 of commit e943edd2ff80c338fe63a2e17c3539c31d186289, no merge conflicts. Jenkins unit test run started (Python 3.8.10, pytest 7.1.3).
@rnyak @gabrielspmoreira Is this PR still relevant or has it been superseded by other work? Should we push it forward toward merging or close it as outdated?
Goals ⚽