Lack of reproducibility when using Huggingface transformers library (TensorFlow version) #14
Comments
@dmitriydligach Did you ever get this resolved?
@MFreidank Nope. I switched to PyTorch, which has a more reliable way to enforce determinism.
@dmitriydligach Just to verify: does your code become fully reproducible with PyTorch?
PyTorch potentially has different non-deterministic ops than TensorFlow, and does not yet have a general mechanism to enable deterministic op functionality. Both PyTorch and TensorFlow now have the ability to enable deterministic cuDNN functionality. This code may use an op that happens to be non-deterministic in TensorFlow but deterministic in PyTorch. I'm hoping to look at this code in detail soon, hopefully today.
@MFreidank In most cases, I get the exact same results every time I run my PyTorch code (including loss and accuracy for each epoch). In some (relatively infrequent) cases there's still a difference, but it's not nearly as large as in the TensorFlow case.
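(For reference, a minimal sketch of the kind of seeding and deterministic-cuDNN setup being discussed for PyTorch; the function name and seed value here are illustrative and not taken from the original code.)

```python
import random
import numpy as np
import torch

def enable_determinism(seed=42):
    # Seed every pseudo-random number generator the training code touches.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Ask cuDNN to select deterministic algorithms and disable autotuning,
    # which can otherwise pick different kernels from run to run.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

enable_determinism(42)
```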
@duncanriach Thanks for your blazingly fast response! :) A helpful starting point could be my colab example. @dmitriydligach Thanks for those additional details; it sounds like there is still slight non-determinism in PyTorch as well, though it may not affect loss/accuracy as strongly. This is valuable information for me, thank you for sharing your experience :)
@dmitriydligach: I'm sorry that I didn't get to sorting this out for you in time to benefit from determinism in TensorFlow. @MFreidank: I'll prioritize taking a look at these issues. They could have the same underlying source, or there could be different sources. Often in these kinds of problems there is a setup issue that is easy to resolve; I intend to add better step-by-step instructions to the README for that. Sometimes a known (and not-yet-fixed) non-deterministic op is being used, and sometimes there is a new discovery: an op that is non-deterministic that we didn't know about. We'll figure this out.
@duncanriach Thanks a lot for taking the time to look into this and for your encouragement.
Hey @dmitriydligach, it looks like we have reproducibility on issue 19 (Huggingface Transformers BERT for TensorFlow). @MFreidank is confirming. Looking at your code, I don't see any reason for there to be non-determinism. I want to repro what you're seeing so that I can debug it. I have it running, but it looks like I have to specify …
@duncanriach The non-reproducibility of @dmitriydligach's code may be related to his training for multiple epochs; see my update on issue #19.
@duncanriach Thank you very much for looking into this issue. Unfortunately, I'm not able to provide the data (this is medical data that can only be distributed via a data use agreement). However, perhaps it would help you to know that the data consists of relatively short text fragments (max_len ~ 150 word pieces)... |
Dear developers,
I included in my code all of the steps listed in this repository, but I still could not achieve reproducibility with either TF 2.1 or TF 2.0. Here's the link to my code:
https://github.com/dmitriydligach/Thyme/blob/master/Keras/et.py
Please help.
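(For reference, a minimal sketch of the usual TF 2.x determinism steps this repository describes, assuming TF 2.1 with the TF_DETERMINISTIC_OPS environment variable and a single fixed seed; the exact steps in et.py may differ.)

```python
import os
# Request deterministic GPU/cuDNN op implementations (supported from TF 2.1),
# before TensorFlow executes any ops.
os.environ['TF_DETERMINISTIC_OPS'] = '1'

import random
import numpy as np
import tensorflow as tf

# Seed every pseudo-random number generator used during training.
SEED = 42
random.seed(SEED)
np.random.seed(SEED)
tf.random.set_seed(SEED)
```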