[WIP] Feature/patch/softmax cross entropy with logits #21
Tied to open TensorFlow issue #38185. Nondeterminism in the backprop of the fused implementation of `{sparse_,}softmax_cross_entropy_with_logits` has been reported. This work will provide a patch that routes calls to a deterministic workaround first described in the TensorFlow issue above.
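For reference, here is a minimal pure-Python sketch of the math that a non-fused formulation computes for a single row of logits: a numerically stable log-softmax, followed by the negative label-weighted sum. Building the loss from separate steps like this (in TensorFlow, presumably by composing ops such as `tf.nn.log_softmax` and `tf.reduce_sum`) is an assumption about the shape of the workaround, not a copy of the code in issue #38185, and the function names below are hypothetical.

```python
import math

def log_softmax(logits):
    # Subtract the largest logit before exponentiating so exp() cannot overflow.
    m = max(logits)
    shifted = [x - m for x in logits]
    log_sum_exp = math.log(sum(math.exp(x) for x in shifted))
    return [x - log_sum_exp for x in shifted]

def softmax_cross_entropy(labels, logits):
    # Non-fused: compute log-softmax first, then the negative label-weighted sum.
    return -sum(y * lp for y, lp in zip(labels, log_softmax(logits)))
```

With two equal logits and a one-hot label, the loss is log(2); the max subtraction keeps very large logits such as 1000.0 from overflowing.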
@@ -70,13 +70,20 @@ def _patch():
if re.match("(1\.(14|15)|2\.0)", tf_version):
os.environ['TF_CUDNN_DETERMINISTIC'] = '1'
_patch_bias_add()
_patch_fused_softmax_cross_entropy()
I'll push changes that rough out `enable_determinism`, and then you can call this from the appropriate part of that.
same as `logits` and its shape is the same as `labels` except that it does
not have the last dimension of `labels`.
"""
raise NotImplementedError()
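The docstring above says the loss has the shape of `labels` minus its last dimension. A hypothetical pure-Python sketch, with nested lists standing in for tensors, makes that concrete for the batched dense and sparse variants: each row of `[batch][num_classes]` logits reduces to one scalar, and sparse integer class labels give the same result as their one-hot dense equivalents.

```python
import math

def _log_softmax(row):
    # Numerically stable log-softmax of one row of logits.
    m = max(row)
    log_sum = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_sum for x in row]

def softmax_xent(labels, logits):
    # Dense labels [batch][num_classes] -> loss [batch] (last dimension dropped).
    return [-sum(y * lp for y, lp in zip(label_row, _log_softmax(logit_row)))
            for label_row, logit_row in zip(labels, logits)]

def sparse_softmax_xent(labels, logits):
    # Sparse labels [batch] of integer class indices -> loss [batch].
    return [-_log_softmax(logit_row)[k] for k, logit_row in zip(labels, logits)]
```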
I like the baby steps. Good job!
Comments above. Heads-up, before you're done, please remember to implement I'm going to work on adding
tfdeterminism/patch.py
tf.nn.softmax_cross_entropy_with_logits = _new_softmax_cross_entropy_with_logits_1_14 # access via public API
nn.softmax_cross_entropy_with_logits = _new_softmax_cross_entropy_with_logits_1_14 # called from tf.keras.layers.convolutional.Conv
nn_ops.softmax_cross_entropy_with_logits = _new_softmax_cross_entropy_with_logits_1_14 # called from tests
Looking at tensorflow/python/ops/nn_ops.py, it seems like it might be necessary to remap the bindings for `softmax_cross_entropy_with_logits_v2` as well. After a brief review, I'm not sure what the difference between this and the other versions is, though, apart from the deprecated `dim` parameter.
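Why each binding has to be remapped separately: in Python, `from module import name` (or assigning a function to another module's attribute) copies a reference to the function object, so replacing the attribute in one module leaves every other reference pointing at the original. The sketch below uses `types.ModuleType` to build hypothetical stand-ins for `nn_ops` and for a module that imported from it; it illustrates the rebinding problem only and does not mirror TensorFlow's actual module layout.

```python
import types

def fused_xent(labels, logits):
    return "fused"          # stand-in for the original (nondeterministic) op

def deterministic_xent(labels, logits):
    return "deterministic"  # stand-in for the patched op

# Hypothetical stand-in for nn_ops, exposing the op under two public names.
fake_nn_ops = types.ModuleType("fake_nn_ops")
fake_nn_ops.xent = fused_xent
fake_nn_ops.xent_v2 = fused_xent

# A module that did the equivalent of `from fake_nn_ops import xent` at import
# time: it holds its own reference to the original function object.
consumer = types.ModuleType("consumer")
consumer.xent = fake_nn_ops.xent

def apply_patch(rebind_everywhere):
    fake_nn_ops.xent = deterministic_xent
    if rebind_everywhere:
        # Every other binding must be replaced explicitly as well.
        fake_nn_ops.xent_v2 = deterministic_xent
        consumer.xent = deterministic_xent
```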
tfdeterminism/patch.py
# The original, pre-patched method can be viewed at
# https://github.com/tensorflow/tensorflow/blob/v1.14.0/tensorflow/python/ops/nn_ops.py#L3182
def _new_softmax_cross_entropy_with_logits_1_14(labels, logits, axis=-1, name=None):
BTW, the reason that `_new_bias_add_1_14` has the version number in the name is that before stock TensorFlow version 1.14, `tf.nn.bias_add` had a different API. I have a version of this patch for TensorFlow version 1.13 and earlier that I didn't release because it was not useful without `TF_CUDNN_DETERMINISTIC`, which was released in version 1.14 of stock TensorFlow. The deterministic `bias_add` was effectively released in the NVIDIA GPU Cloud (NGC) TensorFlow container version 19.06, which also implemented `TF_DETERMINISTIC_OPS`, which enabled the functionality.
I don't think we need the `_1_14` on the end of these method names. However, before enabling this patch for any given version of TensorFlow (as decided in `enable_determinism`), it should be confirmed that the patched version of the method has the same API as the original one in that version of TensorFlow.
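One way to do that confirmation is to compare signatures with the standard library's `inspect.signature`. The functions below are hypothetical stand-ins; with real TensorFlow functions, decorators may affect what `inspect.signature` reports (it follows `__wrapped__` by default), so treat this as a sketch of the idea rather than a complete API check.

```python
import inspect

# Hypothetical stand-ins for an original TensorFlow function and replacements.
def original_op(labels, logits, axis=-1, name=None):
    raise NotImplementedError

def patched_op(labels, logits, axis=-1, name=None):
    raise NotImplementedError

def patched_op_wrong(labels, logits, dim=-1, name=None):
    raise NotImplementedError

def has_same_api(original, patched):
    # Signature equality compares parameter names, kinds, defaults and
    # annotations, plus the return annotation.
    return inspect.signature(original) == inspect.signature(patched)
```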
tfdeterminism/patch.py
# Non-sparse
tf.nn.sparse_softmax_cross_entropy_with_logits = _new_sparse_softmax_cross_entropy_with_logits_1_14 # access via public API
nn.sparse_softmax_cross_entropy_with_logits = _new_sparse_softmax_cross_entropy_with_logits_1_14 # called from tf.keras.layers.convolutional.Conv
The comment on the end of this line is not relevant here (same for the non-sparse version). You should search for calls to `sparse_softmax_cross_entropy_with_logits` (and the non-sparse version) in `tensorflow/python` and do the re-binding as appropriate for the op (with similar, perhaps in some cases identical, comments about why you're doing it).
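A rough way to find those call sites is to scan the Python sources in a TensorFlow checkout. The hypothetical helpers below match only calls to the exact name given, so `sparse_softmax_cross_entropy_with_logits` and `softmax_cross_entropy_with_logits_v2` are not reported as calls to `softmax_cross_entropy_with_logits`. They also report function definitions with that name and cannot see aliases such as `f = nn.softmax_cross_entropy_with_logits`, so the results still need manual review.

```python
import os
import re

def find_calls_in_source(source, op_name):
    # "\b" before the name rejects prefixes like "sparse_"; requiring "(" right
    # after it rejects longer names like "<name>_v2". Definitions match too.
    pattern = re.compile(r"\b" + re.escape(op_name) + r"\s*\(")
    return [(lineno, line.strip())
            for lineno, line in enumerate(source.splitlines(), start=1)
            if pattern.search(line)]

def find_call_sites(root, op_name):
    # Walk a source tree (e.g. tensorflow/python in a checkout) and scan .py files.
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            if filename.endswith(".py"):
                path = os.path.join(dirpath, filename)
                with open(path, encoding="utf-8", errors="replace") as f:
                    for lineno, line in find_calls_in_source(f.read(), op_name):
                        hits.append((path, lineno, line))
    return hits
```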
I just added I'm going to reorganize the testing quite a lot before release. For the testing part, I would like you to focus on getting the following complete and working:
…ps://github.com/MFreidank/tensorflow-determinism into feature/patch/softmax_cross_entropy_with_logits
@@ -52,7 +52,7 @@ def _enable_determinism(seed=None):
_patch_bias_add()
if in_ngc_cont and ngc_vers.at_least('19.06') or tf_vers.at_least('2.1'):
os.environ['TF_DETERMINISTIC_OPS'] = '1'
# TODO: Add patch crossentropy here as well? Issue seems to still be present on tf 2.1, 2.2
No. This is the condition for setting `TF_DETERMINISTIC_OPS=1`: NGC containers with version >= 19.06 or stock TensorFlow with version >= 2.1.
The condition below will ensure that the fused softmax/cross-entropy patch is applied to NGC containers with version >= 19.06 or stock TensorFlow with version >= 1.14 (which includes versions 2.1 and 2.2).
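A minimal sketch of these two gating conditions, assuming versions are compared on their leading `major.minor` numbers. The helper names (`at_least`, `should_set_deterministic_ops`, `should_patch_softmax_xent`) are hypothetical and are not tfdeterminism's actual version-handling code.

```python
import re

def _parse(version):
    # Keep only the leading "major.minor": "2.2.0rc0" -> (2, 2), "19.06" -> (19, 6).
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    return int(match.group(1)), int(match.group(2))

def at_least(version, minimum):
    # Tuple comparison orders (2, 9) < (2, 10) correctly, which comparing the
    # strings "2.9" and "2.10" would not.
    return _parse(version) >= _parse(minimum)

def should_set_deterministic_ops(in_ngc, ngc_version, tf_version):
    # NGC container >= 19.06, or stock TensorFlow >= 2.1.
    return bool(in_ngc and at_least(ngc_version, "19.06")) or at_least(tf_version, "2.1")

def should_patch_softmax_xent(in_ngc, ngc_version, tf_version):
    # NGC container >= 19.06, or stock TensorFlow >= 1.14.
    return bool(in_ngc and at_least(ngc_version, "19.06")) or at_least(tf_version, "1.14")
```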
print("TensorFlow version %s has been patched "
"using tfdeterminism version %s" %
(tf_version, __version__), file=sys.stderr)
elif re.match("2\.1|2\.2", tf_version):
I want to deprecate `patch`, not make it work on newer versions of TensorFlow, and no longer advertise it in the documentation (will be updated before release). If folks want determinism in TensorFlow version 2.1 or 2.2, then they should be using `enable_determinism`. Not adding this does not break anything, because there was no patch available for TensorFlow versions 2.1 and 2.2 before.
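If `patch` is deprecated, Python's standard `warnings` module is one way to steer users toward `enable_determinism`. The stand-in `patch` below is hypothetical and only shows the warning mechanism; it does not reproduce what tfdeterminism's `patch` actually does.

```python
import warnings

def patch():
    """Hypothetical stand-in for a deprecated entry point."""
    # stacklevel=2 attributes the warning to the caller's line, not this one.
    warnings.warn(
        "patch() is deprecated; use enable_determinism() instead",
        DeprecationWarning,
        stacklevel=2,
    )
    # The existing patching logic would follow here.
```

`DeprecationWarning` is hidden by default outside of `__main__` and test runners, so it informs developers without cluttering end users' logs.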
Hi, my name is Thomas. I work on the Oracle Digital Assistant team, where we make extensive use of TensorFlow and tensorflow-determinism, and we really appreciate this wonderful library. I just wanted to ask: is this going to be merged anytime soon?
Hi Thomas (@phqtuyen), thanks for letting us know. These ops seem to be generally high priority for determinism in TensorFlow. @MFreidank: please can you give an update on this and/or move it forward. Most of the rest of the infrastructure is now in place for this.
Hi @phqtuyen, @duncanriach
Awesome. Just FYI, I'll be offline next week (2020-08-17 through 2020-08-21) and, since I'll need to be involved in reviews and iterations, that might slow things down a bit.
Hey @MFreidank, are you still planning on completing this PR?
Hi @MFreidank, I'm about to go offline again until 2020-09-16, but I wanted to check in with you again. I think that this op's nondeterminism is near the top of the priority/urgency list for TensorFlow, and I want to get it resolved ASAP. Can you commit to pushing this through from your end?
Hi Duncan,
Sorry for the delay on this.
Yes, I'll pick it up starting tomorrow my time.
Thank you, Moritz.
Hi again @duncanriach,
Thanks, @MFreidank. Sounds good. Just to reiterate, I'll be offline until 2020-09-16 and I'll review this after that.
Hey @MFreidank, please will you submit those changes so that I can review them?
@MFreidank, we will soon release version 0.4.0. We intend to include the fused softmax/cross-entropy patch in version 0.5.0 and I want to get started on it ASAP. If I don't hear from you within 24 hours, we will proceed to implement it without you.
Addressing previously reported TensorFlow issue #38185. Also tied to tfdeterminism issues #19, #14, and #9.
The fused implementation of `{sparse_,}softmax_cross_entropy_with_logits` exhibits nondeterminism in its backprop. This PR will provide a patch that keeps the same function interface but performs the computation as a sequence of non-fused steps, which is fully deterministic.
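Because the non-fused version leaves the backprop to autodiff, it helps to know what gradient to expect. For the loss L = -sum_i y_i * log(softmax(z)_i), the gradient with respect to the logits is softmax(z) * sum(y) - y, which reduces to the familiar softmax(z) - y when each label row sums to 1. The pure-Python sketch below (hypothetical helper names) checks that formula against central finite differences; it only verifies the math and says nothing about how any GPU kernel schedules the computation.

```python
import math

def softmax(z):
    m = max(z)
    exps = [math.exp(x - m) for x in z]
    total = sum(exps)
    return [e / total for e in exps]

def xent(labels, logits):
    # Non-fused loss: -sum(labels * log_softmax(logits)).
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return -sum(y * (z - log_sum) for y, z in zip(labels, logits))

def xent_grad(labels, logits):
    # Analytic gradient w.r.t. logits: softmax(z) * sum(y) - y.
    s = softmax(logits)
    total = sum(labels)
    return [p * total - y for p, y in zip(s, labels)]

def finite_diff_grad(labels, logits, eps=1e-6):
    # Central differences, one logit at a time.
    grad = []
    for k in range(len(logits)):
        up = list(logits)
        up[k] += eps
        down = list(logits)
        down[k] -= eps
        grad.append((xent(labels, up) - xent(labels, down)) / (2 * eps))
    return grad
```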