Frugally-deep cannot pass check_test_outputs() when both permute and attention layers are used #393
Thanks for the report! It's interesting since the
I can reproduce the problem locally, and now would like to create a minimal example to narrow down the cause. @roberttangswitch Could you help me by providing the Python code used to create the model architecture?
One observation I've made:

Maybe this results in some floating-point precision effects in downstream calculations. The inputs used by the automated tests are random values, which might not reflect what the model gets to "see" in reality. Does frugally-deep, when using
Hi Dobiasd, thanks for your quick response.

Regarding your question on model1 that cannot

Regarding your previous comments, I have the following questions:

To narrow down the root cause of this issue, I have created a minimal model to
Thanks for the additional examples and the great explanations. I can reproduce your results locally, i.e.:
However, they all work when I put `#define FDEEP_FLOAT_TYPE double` above the include. 🎉🎉

This shows that it's indeed likely "just" a floating-point precision problem. Is using
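For readers finding this later, here is a minimal sketch of that workaround (the model file name is just an example; the define has to appear before the first fdeep include to take effect):

```cpp
// Switch frugally-deep's internal float type from float to double.
// This must be defined before the first include of fdeep.
#define FDEEP_FLOAT_TYPE double
#include <fdeep/fdeep.hpp>

int main()
{
    // Loading runs the embedded verification test case, now in double precision.
    const auto model = fdeep::load_model("fdeep_model1.json");
}
```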
Regarding your other points:
Do you mean that running two predictions with the exact same input and model leads to different outputs?

So far, it's designed in a way that only one random test case is used, and this has sufficed. In case it turns out to be helpful to have more test cases, I could extend

In case the
Thanks for your quick testing and your finding that double precision instead of

My use case relies on cycle reduction, and this is the reason why we are evaluating

Could you help to figure out which layer or function creates the difference

Regarding the question: "Do you mean that running two predictions with the exact
Ah, got it, thanks. So you prefer float32 (

(On my machine, I just tested with
Wow, that would be really interesting for me to reproduce locally.
Can you provide an input tensor that makes it indeterministic?
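As a minimal sketch of such a determinism check (the tensor shape and fill values here are placeholders, not taken from the actual models), one could run the same input through the model twice and compare the printed outputs:

```cpp
#include <fdeep/fdeep.hpp>
#include <iostream>
#include <vector>

int main()
{
    const auto model = fdeep::load_model("fdeep_model1.json");

    // Placeholder input: the real shape and values depend on the model's input layer.
    const fdeep::tensor input(
        fdeep::tensor_shape(static_cast<std::size_t>(4), static_cast<std::size_t>(8)),
        std::vector<float>(4 * 8, 0.5f));

    // With a deterministic forward pass, both results should be identical.
    const auto result1 = model.predict({input});
    const auto result2 = model.predict({input});
    std::cout << fdeep::show_tensors(result1) << std::endl;
    std::cout << fdeep::show_tensors(result2) << std::endl;
}
```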
I'd print out the output values after each layer during a forward pass (once with

I'll give it a try now.
In

I replaced

```cpp
return fplus::transform(get_output, output_connections_);
```

with

```cpp
const auto result = fplus::transform(get_output, output_connections_);
for (auto const& x : output_cache)
{
    std::cout << x.first.first << ", " << x.first.second << ": " << show_tensors(x.second) << std::endl;
}
return result;
```

and it's the

I'll dig deeper into the
In

```cpp
std::cout << "scores: " << show_tensor(scores) << std::endl;
std::cout << "distribution: " << show_tensor(distribution) << std::endl;
```

So I'll compare the softmax implementation in frugally-deep with the one used in TensorFlow.
In the

So far, I've not yet found the right approach to fix it. The implementation already had quite some iterations related to similar problems:

If you have an idea, please let me know. 🙂
By looking at the TensorFlow softmax implementation, I noticed a bug in frugally-deep's softmax implementation. This commit fixes it.

With the latest frugally-deep version (just released), the automated tests for all your five provided models work fine with

Thanks a lot for your very good error report, pointing out this problem, and thus helping me to make frugally-deep better. ❤️
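For anyone debugging similar precision issues: the numerically stable way to compute softmax is to subtract the per-vector maximum before exponentiating, which is also what TensorFlow does. The sketch below only illustrates that general technique on a plain vector; it is not the actual frugally-deep code from the commit:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Numerically stable softmax over a non-empty vector.
// Subtracting the maximum leaves the mathematical result unchanged
// (the common factor exp(-max) cancels out), but keeps the arguments
// of exp() at or below zero, avoiding overflow and reducing the
// float32 precision loss for large score values.
std::vector<float> stable_softmax(const std::vector<float>& xs)
{
    const float max_val = *std::max_element(xs.begin(), xs.end());
    std::vector<float> result(xs.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < xs.size(); ++i)
    {
        result[i] = std::exp(xs[i] - max_val);
        sum += result[i];
    }
    for (float& value : result)
    {
        value /= sum;
    }
    return result;
}
```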
Hi @Dobiasd,

With the fix, it is definitely a great improvement for frugally-deep in terms

"On my machine, I just tested with fdeep_model1.json and the difference was

On your previous comment, my Visual Studio C++ does not run as fast as

BTW, although all five models (model0-model4) can pass check_test_outputs()
I don't use Intel MKL or anything like that. Here is a Dockerfile reproducing the performance results (on my Intel Core i5-6600 CPU @ 3.30GHz), so there is no uncertainty about libraries or compiler settings: https://gist.github.com/Dobiasd/20591b088e5c3451f0fcbaa0b4327997

In my experience, GCC does not produce significantly faster machine code than the Visual C++ compiler (if the compiler optimization flags are set well).
Sure, I can try, but it might be that I can't solve it until Sunday, and then I'm afk-ish for the next three weeks. Is this new model also a case that works with
Hi @Dobiasd,

The option using double does not help this model to pass check_test_outputs();

Thanks a lot for your technical support. Have a nice vacation!
Without the permute layer, the frugally-deep output can pass the verification with epsilon = 10^(-6). The trained model cnn_model0.h5 is the example, and the frugally-deep converted model is fdeep_model0.json. I am using the frugally-deep version from 04/29/2023 (commit 0efd941), in which the attention layer was first introduced. I am using Visual C++ 2017, TensorFlow 2.10.0, and Keras 2.10.0.
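For context, a minimal sketch of the loading code on the C++ side (file name taken from above): by default, fdeep::load_model runs the test case embedded in the converted JSON, and this verification step is where the check_test_outputs() failure reported here occurs.

```cpp
#include <fdeep/fdeep.hpp>

int main()
{
    // With the default arguments, loading verifies the model against the
    // embedded test case; a mismatch beyond the tolerance makes it fail.
    const auto model = fdeep::load_model("fdeep_model0.json");
}
```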
With the permute layer, the trained model is cnn_model1.h5, and the frugally-deep converted model is fdeep_model1.json. The frugally-deep output cannot pass check_test_outputs(); the error can be as large as 0.5 (>> 10^(-6)). The only difference in model1.py is the permute layers, which reduce the computation cycles for the matrix multiplication and softmax.
Could you help to resolve the above issue?
zipped_files.zip