cuDNN non-persistent bidirectional RNN dgrad sync fix #16391
Merged
Conversation
ptrendx reviewed on Oct 8, 2019
ptrendx reviewed on Oct 8, 2019
ptrendx approved these changes on Oct 8, 2019:
LGTM
DickJC123 force-pushed the gpu_rnn_dgrad_sync branch from 65246f6 to 57c8156 on October 10, 2019 at 01:08
I've rebased the PR's commits onto the latest master in an attempt to avoid pylint failures, as suggested by @reminisce. If this succeeds, does it suggest a bug in the way the CI creates the repo-under-test?
aaronmarkham pushed a commit to aaronmarkham/incubator-mxnet that referenced this pull request on Oct 16, 2019:
* Alter test_lstm_bidirectional to demo fast-fail with optional wgrad.
* Fix cuDNN RNN dgrad sync.
* Simplify gpu activity sync sequence.
* Remove repeated running of now-passing test.
* Trigger CI
haojin2 reviewed on Oct 20, 2019:
if (CUDNN_VERSION <= 7604 && dgrad_sync_needed_) {
  // Without blocking the CPU, create a synchronization point of all current GPU activity. No
  // need to call cudaStreamWaitEvent; cudaEventRecord on the legacy default stream suffices.
  CUDA_CALL(cudaEventRecord(dgrad_sync_event_, cudaStreamLegacy));
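For context, dgrad_sync_event_ would be created once elsewhere in the operator; a plausible sketch of that setup (the cudaEventDisableTiming flag is an assumption, not taken from this diff):

```cpp
// Sketch only: one-time creation of the sync event recorded above.
cudaEvent_t dgrad_sync_event_;
CUDA_CALL(cudaEventCreateWithFlags(&dgrad_sync_event_, cudaEventDisableTiming));
```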
Hi @DickJC123, I'm encountering a cudaErrorInvalidResourceHandle error here when trying to run this notebook and this notebook from the Dive into Deep Learning textbook. Could you help with a fix?
Description
Background: A non-deterministic failure of test_operator_gpu.py:test_lstm_bidirectional, observed just once in our own CI, was found to be due to cuDNN's RNN dgrad implementation. As part of RNN dgrad, cuDNN launches many kernels into auxiliary streams that are different from the primary user-settable stream of the cuDNN handle. A final kernel launched by cuDNN into one of these aux streams was not being synchronized (via events) back to the handle's stream. When MXNet's RNN::Backward() returns, the GPU worker can launch a gradient summation kernel into its main stream, and this kernel's execution could overlap, or even precede, that of the final cuDNN RNN dgrad kernel. Because MXNet calls cuDNN's wgrad immediately after dgrad, this data-race failure is exceedingly rare; in fact, we discovered the problem by code inspection, not by reproducing the original CI failure.
The first commit of this PR will demonstrate the failure. The approach is to have MXNet skip the RNN wgrad operation when grad_req = {'data':'add', 'parameters':'none'}, and to expand the test_lstm_bidirectional test to invoke this case. Skipping wgrad aggravates the data race by letting MXNet launch the gradient summation kernel sooner after the RNN dgrad kernels, with no intervening wgrad GPU activity. A backend-level sketch of this follows.
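As an illustration of the skip (all helper names here are hypothetical; only the gradient-request kinds mirror MXNet's real OpReqType values, onto which Python's grad_req strings map, with 'add' becoming kAddTo and 'none'/'null' becoming kNullOp):

```cpp
#include <cstddef>
#include <vector>

// Illustrative stand-ins, not MXNet's real code, for the cuDNN launches.
void LaunchCudnnRnnDgrad() { /* dgrad kernels go into cuDNN's aux streams */ }
void LaunchCudnnRnnWgrad() { /* wgrad kernels follow on the handle's stream */ }

// Simplified version of MXNet's gradient-request kinds.
enum OpReqType { kNullOp, kWriteTo, kAddTo };

// Sketch only: skip wgrad when the parameter gradient is not requested,
// mirroring grad_req = {'data': 'add', 'parameters': 'none'} in the test.
void RNNBackwardSketch(const std::vector<OpReqType>& req, std::size_t kParams) {
  LaunchCudnnRnnDgrad();
  if (req[kParams] != kNullOp) {
    LaunchCudnnRnnWgrad();
  }
  // With wgrad skipped, the worker's gradient-summation kernel can launch
  // immediately after the dgrad kernels, widening the race window.
}
```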
Once the failure is solidly demonstrated, a follow-up commit will supply the fix. The fix uses CUDA events recorded on the legacy default stream to ensure that all dgrad GPU activity is complete before wgrad or other kernels begin. Unlike a cudaDeviceSynchronize(), it does not block the CPU. nvprof-based timing analysis of the fix shows no measurable timing difference for the single-RNN case analyzed.
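To make the mechanism concrete, here is a sketch of the legacy-default-stream trick (illustrative names, not the PR's code; it assumes every stream involved was created without the cudaStreamNonBlocking flag, so each synchronizes with the legacy default stream):

```cpp
#include <cuda_runtime.h>

void SyncDgradSketch() {
  // Sketch only: a single event record on the legacy default stream orders
  // all prior and subsequent GPU work without blocking the host.
  cudaEvent_t sync_event;
  cudaEventCreateWithFlags(&sync_event, cudaEventDisableTiming);

  // This record cannot begin until all work previously launched into any
  // blocking stream (including cuDNN's aux dgrad streams) has finished, and
  // any kernel launched afterwards into a blocking stream must wait for it
  // to complete. Unlike cudaDeviceSynchronize(), the CPU never stalls here.
  cudaEventRecord(sync_event, cudaStreamLegacy);
}
```

This is also why the reviewed snippet needs only cudaEventRecord and no matching cudaStreamWaitEvent: the legacy default stream's implicit two-way synchronization with blocking streams supplies the cross-stream ordering.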
The fix is needed for all versions of cuDNN supported by MXNet (so up to the current v7.6.4) and is needed only for non-persistent bidirectional RNNs.