
Frugally-deep cannot pass check_test_outputs() when both permute and attention layers are used #393

Closed
roberttangswitch opened this issue Sep 8, 2023 · 16 comments


@roberttangswitch

Without a permute layer, the frugally-deep output passes the verification with epsilon = 10^(-6). The
trained model cnn_model0.h5 is the example; the frugally-deep converted model is fdeep_model0.json.
I am using the frugally-deep version from 04/29/2023 (commit 0efd941), in which the attention layer
was first introduced. I am using Visual C++ 2017, TensorFlow 2.10.0, and Keras 2.10.0.

With the permute layers, the trained model is cnn_model1.h5 and the frugally-deep converted model is
fdeep_model1.json. The frugally-deep output cannot pass check_test_outputs(); the error can be as
large as 0.5 (>> 10^(-6)). The only difference in model1.py is the permute layers, which reduce the
computation cycles for the matrix multiplication and softmax.

Could you help to resolve the above issue?
zipped_files.zip
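
For reference, the failing verification is triggered simply by loading the converted models; a
minimal sketch using the file names above (loading runs the stored random-input self-test, i.e.,
check_test_outputs()):

#include "fdeep/fdeep.hpp"

int main()
{
    // Loading runs the embedded self-test against the outputs stored by convert_model.py.
    const auto model0 = fdeep::load_model("fdeep_model0.json"); // passes the check
    const auto model1 = fdeep::load_model("fdeep_model1.json"); // fails the check
}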

@Dobiasd
Owner

Dobiasd commented Sep 9, 2023

Thanks for the report! It's interesting since the Permute layer does not change the actual values in the tensor. I'll investigate and get back to you here.

@Dobiasd
Owner

Dobiasd commented Sep 13, 2023

I can reproduce the problem locally and would now like to create a minimal example to narrow down the cause.

@roberttangswitch Could you help me by providing the Python code used to create the model architecture?

@Dobiasd
Owner

Dobiasd commented Sep 13, 2023

One observation I've made:

cnn_model0 has its smallest weight in the conv1d_8 layer. Its value is -9.8351e-07.

cnn_model1 has its smallest weight in the conv1d_7 layer. Its value is -3.83643e-08.

Maybe this results in some floating-point precision effects in downstream calculations.

The inputs used by the automated tests are random values, which might not reflect what the model gets to "see" in reality. Does frugally-deep, when using cnn_model1, also produce significantly different outputs compared to TensorFlow when you use realistic model inputs?
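
To check that, one can run a prediction with a fixed, realistic sample and diff the output against
TensorFlow's result for the same sample. A minimal sketch; the input shape and values are
placeholders for a real sample, and the second load_model argument (which skips the load-time
self-test) is assumed to be available in the frugally-deep version in use:

#include "fdeep/fdeep.hpp"
#include <iostream>
#include <vector>

int main()
{
    // "false" skips the load-time self-test (parameter assumed available in this version),
    // so the comparison can run even though the random-input check currently fails.
    const auto model = fdeep::load_model("fdeep_model1.json", false);

    // Placeholder input: replace shape and values with a real sample that
    // matches the model's input layer.
    const fdeep::tensor input(
        fdeep::tensor_shape(static_cast<std::size_t>(4)),
        std::vector<float>{1.0f, 2.0f, 3.0f, 4.0f});

    const auto result = model.predict({input});

    // Print the output tensor(s) to compare against TensorFlow's output for the same sample.
    std::cout << fdeep::show_tensors(result) << std::endl;
}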

@roberttangswitch
Author

Hi Dobiasd,

Thanks for your quick response. Regarding your question about model1, which cannot pass
check_test_outputs(): during testing against real samples, frugally-deep shows nondeterministic
behavior and may generate much worse results than TensorFlow. So whether frugally-deep can pass
check_test_outputs() plays a major role.

Regarding your previous comments, I have the following questions:
a. How do I look at the model weights? How do I save the model weights to a file, layer by layer?
b. How do I generate ten or a hundred test cases with random samples for frugally-deep to run
during check_test_outputs()? Which parameters or variables do we need to change in convert_model.py
and the .hpp files?
c. Please share any debugging techniques you think are helpful for finding the root cause. If I
find a fix, I will check it in for you on a branch.

To narrow down the root cause of this issue, I have created minimal models to reproduce it:

  1. Model2 contains only three convolutional layers. The trained model is
    cnn_model2.h5; the frugally-deep converted model is fdeep_model2.json.
    Model2 cannot pass check_test_outputs(): the error is about 0.02 instead
    of the expected 10^(-6).
  2. Model3 is the same as model2, but without the maxpooling/upsampling
    layers. The trained model is cnn_model3.h5; the frugally-deep converted
    model is fdeep_model3.json. Model3 can pass check_test_outputs().
  3. Model4 is the same as model2, but without the permute layers. The
    trained model is cnn_model4.h5; the frugally-deep converted model is
    fdeep_model4.json. Model4 can pass check_test_outputs().

Please check your email editgym@gmail.com for the Python files.
zipped_files.zip

@Dobiasd
Owner

Dobiasd commented Sep 20, 2023

Thanks for the additional examples and the great explanations.

I can reproduce your results locally, i.e.:

  • fdeep::load_model("fdeep_model0.json");: succeeds
  • fdeep::load_model("fdeep_model1.json");: fails
  • fdeep::load_model("fdeep_model2.json");: fails
  • fdeep::load_model("fdeep_model3.json");: succeeds
  • fdeep::load_model("fdeep_model4.json");: succeeds

However, they all work when I put

#define FDEEP_FLOAT_TYPE double

above

#include "fdeep/fdeep.hpp"

🎉🎉

This shows that it's indeed likely "just" a floating-point precision problem.

Is using double instead of float a feasible solution for your use case?
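
For completeness, a minimal sketch of this workaround; the define has to come before the first
include of fdeep:

// Make all internal calculations use double instead of float.
#define FDEEP_FLOAT_TYPE double
#include "fdeep/fdeep.hpp"

int main()
{
    // With double precision, all five provided models pass the load-time verification.
    const auto model1 = fdeep::load_model("fdeep_model1.json");
    const auto model2 = fdeep::load_model("fdeep_model2.json");
}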

@Dobiasd
Owner

Dobiasd commented Sep 20, 2023

Regarding your other points:

frugally-deep shows nondeterministic behavior

Do you mean that running two predictions with the exact same input and model leads to different outputs?


a. How do I look at the model weights? How do I save the model weights to a file, layer by layer?

convert_model.py actually already does this. It just stores them base64-encoded in the resulting JSON file. You'll find, for example, trainable_params.conv1d.weights there.
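
If you want to inspect them outside of frugally-deep, the JSON can be read with any JSON library.
A minimal sketch using the nlohmann/json header (which frugally-deep already depends on); the layer
name conv1d is just the example from above:

#include <fstream>
#include <iostream>
#include <string>
#include <nlohmann/json.hpp>

int main()
{
    // Load the converted model file produced by convert_model.py.
    std::ifstream file("fdeep_model0.json");
    nlohmann::json j;
    file >> j;

    // The per-layer weights are stored as base64-encoded strings,
    // e.g. trainable_params.conv1d.weights (layer name assumed here).
    const std::string encoded = j["trainable_params"]["conv1d"]["weights"];
    std::cout << "conv1d weights: " << encoded.size()
              << " base64 characters" << std::endl;
}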


b. How do I generate ten or a hundred test cases with random samples for frugally-deep to run
during check_test_outputs()? Which parameters or variables do we need to change in convert_model.py
and the .hpp files?

So far, it's designed so that only one random test case is used, and this has sufficed. If it turns out to be helpful to have more test cases, I could extend convert_model.py with such an option.


c. Please share any debugging techniques you think are helpful for finding the root cause. If I
find a fix, I will check it in for you on a branch.

In case the "double instead of float" solution is not feasible for you, it might be worth trying to find which calculations inject the most significant difference between these two data types. 🕵️

@roberttangswitch
Author

Thanks for your quick testing and for finding that using double precision instead of single
precision solves the issue. I also appreciate your detailed explanations of my previous questions.

My use case relies on cycle reduction, which is why we are evaluating the frugally-deep solution,
so float32 is the maximum precision we can afford for real-time applications. Since the TensorFlow
model training and prediction use float32, and our goal is for the frugally-deep outputs to match
those from TensorFlow, it is reasonable to expect that frugally-deep can achieve the same accuracy
as TensorFlow using only float32.

Could you help figure out which layer or function creates the difference between TensorFlow and
frugally-deep, and how to fix the difference while keeping float32 in frugally-deep?

Regarding the question "Do you mean that running two predictions with the exact same input and
model leads to different outputs?": yes, that is the case for model1 and model2 during testing
against real samples.

@Dobiasd
Owner

Dobiasd commented Sep 20, 2023

Ah, got it, thanks. So you prefer float32 (float) over float64 (double) for performance reasons. How much faster is this in your tests?

(On my machine, I just tested with fdeep_model1.json and the difference was quite small, i.e., 0.0051 s (on average) per forward pass with float and 0.0057 s with double.)
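
That measurement is just an average over repeated forward passes; a rough timing sketch, with the
input tensor as a placeholder for a shape-compatible sample:

#include "fdeep/fdeep.hpp"
#include <chrono>
#include <cstddef>
#include <iostream>
#include <vector>

int main()
{
    const auto model = fdeep::load_model("fdeep_model1.json");

    // Placeholder input; must match the model's input shape.
    const fdeep::tensor input(
        fdeep::tensor_shape(static_cast<std::size_t>(4)),
        std::vector<float>{1.0f, 2.0f, 3.0f, 4.0f});

    const std::size_t runs = 100;
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < runs; ++i)
    {
        const auto result = model.predict({input});
        (void)result; // avoid unused-variable warnings
    }
    const auto stop = std::chrono::steady_clock::now();
    const double total_s = std::chrono::duration<double>(stop - start).count();
    std::cout << "average forward pass: " << (total_s / runs) << " s" << std::endl;
}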


Regarding the question "Do you mean that running two predictions with the exact same input and
model leads to different outputs?": yes, that is the case for model1 and model2 during testing
against real samples.

Wow, that would be really interesting for me to reproduce locally. fdeep::load_model("fdeep_model1.json"); (i.e., with random inputs) always fails with the exact same error (value=-0.117186):

 test failed: output=0 pos=0,0,0 value=-0.117186 target=-0.17968

Can you provide an input tensor that makes it nondeterministic?


Could you help figure out which layer or function creates the difference between TensorFlow and
frugally-deep, and how to fix the difference while keeping float32 in frugally-deep?

I'd print out the output values after each layer during a forward pass (once with float and once with double) and check the diff. Could be done in model_layer::apply_impl generically, I guess. And then, once the offending layer is found, narrow it down further like this.

I'll give it a try now.

@Dobiasd
Owner

Dobiasd commented Sep 20, 2023

In model_layer::apply_impl, I replaced

        return fplus::transform(get_output, output_connections_);

with

        const auto result = fplus::transform(get_output, output_connections_);

        for (auto const& x : output_cache)
        {
            std::cout << x.first.first << ", " << x.first.second  << ": " << show_tensors(x.second) << std::endl;
        }

        return result;

and it's the Attention layer (using fdeep_model2) that introduces the difference.

I'll dig deeper into the attention_layer implementation to find the cause.

@Dobiasd
Owner

Dobiasd commented Sep 20, 2023

In attention_layer::apply_impl (before the return) I printed the tensors:

std::cout << "scores: " << show_tensor(scores) << std::endl;
std::cout << "distribution: " << show_tensor(distribution) << std::endl;

So scores is still very similar between float and double, but distribution (distribution = softmax(scores)) is very different.

I'll compare the softmax implementation in frugally-deep with the one used in TensorFlow.

@Dobiasd
Owner

Dobiasd commented Sep 20, 2023

In the softmax implementation, when using float and fdeep_model2, std::isinf(result) is true (sum_shifted is 0, log_sum_shifted is -inf) for some values, resulting in 0.0 being written to the output tensor. This does not happen with double.

So far, I've not yet found the right approach to fix it. The implementation has already gone through quite a few iterations related to similar problems.

If you have an idea, please let me know. 🙂
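
For context, the standard way to avoid this is the max-shifted softmax formulation, sketched below;
this is not necessarily the exact form frugally-deep or TensorFlow uses:

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Numerically stable softmax: shift by the maximum before exponentiating,
// so the largest exponent is exp(0) = 1 and the sum can never underflow to 0.
std::vector<float> softmax_stable(const std::vector<float>& xs)
{
    const float m = *std::max_element(xs.begin(), xs.end());
    std::vector<float> result(xs.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < xs.size(); ++i)
    {
        result[i] = std::exp(xs[i] - m);
        sum += result[i];
    }
    for (auto& r : result)
    {
        r /= sum; // sum >= 1, so no division by zero and no -inf from log
    }
    return result;
}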

@Dobiasd
Owner

Dobiasd commented Sep 20, 2023

By looking at the TensorFlow softmax implementation, I noticed a bug in frugally-deep's softmax. This commit fixes it. With the latest frugally-deep version (just released), the automated tests for all five of your provided models pass with float precision. 🎉

Thanks a lot for your very good error report, pointing out this problem, and thus helping me to make frugally-deep better. ❤️

@Dobiasd Dobiasd closed this as completed Sep 20, 2023
@roberttangswitch
Author

Hi @Dobiasd,

The fix is definitely a great improvement to frugally-deep's stability.

"On my machine, I just tested with fdeep_model1.json and the difference was
quite small, i.e., 0.0051 s (on average) per forward pass with float and
0.0057 s with double."

Regarding your previous comment: my Visual Studio C++ build does not run as fast as your benchmark.
Could you share the detailed libraries or configurations I need to port to my Visual Studio IDE so
that I can get the same speed as yours? Do you use external libraries such as Intel MKL?

BTW, although all five models (model0-model4) pass check_test_outputs() with the fix, I still have
a bigger model that cannot pass this check. That means there are still some discrepancies between
TensorFlow and frugally-deep. Are you willing to take on the challenge of finding the root cause
again? Since this model is complicated, it may take some time for me to find a minimal model for
debugging.

@Dobiasd
Owner

Dobiasd commented Sep 22, 2023

Regarding your previous comment: my Visual Studio C++ build does not run as fast as your benchmark. Could you share the detailed libraries or configurations I need to port to my Visual Studio IDE so that I can get the same speed as yours? Do you use external libraries such as Intel MKL?

I don't use Intel MKL or anything like that. Here is a Dockerfile reproducing the performance results (on my Intel Core i5-6600 CPU @ 3.30GHz), so there is no uncertainty about libraries or compiler settings: https://gist.github.com/Dobiasd/20591b088e5c3451f0fcbaa0b4327997

In my experience, GCC does not produce significantly faster machine code than the Visual C++ compiler (if the compiler optimization flags are set well).

@Dobiasd
Owner

Dobiasd commented Sep 22, 2023

I still have a bigger model that cannot pass this check. That means there are still some discrepancies between TensorFlow and frugally-deep. Are you willing to take on the challenge of finding the root cause again? Since this model is complicated, it may take some time for me to find a minimal model for debugging.

Sure, I can try, but it might be that I can't solve it until Sunday, and then I'm afk-ish for the next three weeks.

Is this new model also a case that works with double but fails with float? If so, the same debugging tactics as shown above could work.

@Dobiasd Dobiasd reopened this Sep 22, 2023
@roberttangswitch
Author

Hi @Dobiasd,

Using double instead of float does not help this model pass check_test_outputs(), so this is not
the same precision problem as before. Please close this ticket. The new model may have a different
issue; once I find a minimal model that reproduces it, I will open a new ticket.

Thanks a lot for your technical support. Have a nice vacation!

@Dobiasd Dobiasd closed this as completed Sep 23, 2023