[Windows] Backpropagation does not work #93
Comments
I tested the The CPU mode seems to have some convergence issues in general. Maybe deep down it's somehow related to the Windows backprop issue mentioned above. @BachiLi Any idea what could be the cause of the discrepancy? |
Interesting. Looking into this. |
CPU mode runs fine on my mac...I'm really confused. |
It also runs fine on my linux machine. This seems like a Colab-specific issue? |
This issue exists on Colab for all redner versions I tested. I have no idea why there is a discrepancy between Colab and my Linux machine. |
This issue also exists on the Tensorflow side. Actually the tensorflow version crashes on Colab occasionally. |
Typical case of 'but it runs on my machine' :D That is really strange. Maybe it has something to do with the type of CPU? |
Yes, something is wrong on Colab. |
I have a deadline next week and have to work on something else now. Please let me know if you find anything suspicious. |
Tested the pose estimation with my (custom) Windows GPU branch, which is currently based on redner 0.2.3, and the backpropagation works without issues. However, the CPU mode fails as above. So it really seems to be a CPU-related issue that has existed at least since 0.2.3. I'll keep my eyes open. Good luck with your deadline for now! |
That's good news. At least in Colab it seems to work now. However, for Windows it still doesn't work. Same behavior on the CPU as before. There is a small chance that it's still due to some different initialization behavior of MSVC and GCC/Clang. I also just discovered that I didn't properly port one of the compiler intrinsics. My version of Is there a way to verify the integrity of the edge tree or another way to verify that the stuff relying on intrinsics works the same on all systems? |
Hmm. Maybe we can set up some unit tests for the intrinsic. I can potentially do it this weekend. Thank you so much for your time by the way. |
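A unit test for a ported intrinsic could compare the platform-specific wrapper against a slow, portable reference. The sketch below is only an illustration: the thread does not say which intrinsic was ported, so count-leading-zeros is used as a hypothetical stand-in, and the names `clz_reference`, `clz_platform` and `clz_matches_reference` are made up for this example and are not redner functions.

```cpp
#include <cassert>
#include <cstdint>
#if defined(_MSC_VER)
#include <intrin.h>
#endif

// Portable reference: count leading zero bits with a plain loop.
// Returns 32 for x == 0.
inline int clz_reference(std::uint32_t x) {
    int n = 0;
    for (std::uint32_t mask = 0x80000000u; mask != 0 && (x & mask) == 0; mask >>= 1) {
        ++n;
    }
    return n;
}

// Platform wrapper (hypothetical): MSVC and GCC/Clang expose different
// intrinsics, and both leave the result for 0 undefined, so 0 is handled here.
inline int clz_platform(std::uint32_t x) {
    if (x == 0) return 32;
#if defined(_MSC_VER)
    unsigned long index = 0;
    _BitScanReverse(&index, x);          // index of the highest set bit
    return 31 - static_cast<int>(index);
#else
    return __builtin_clz(x);
#endif
}

// Compare wrapper and reference on zero, every single-bit value,
// every "all ones below bit i" value, and the all-ones word.
inline bool clz_matches_reference() {
    if (clz_platform(0u) != clz_reference(0u)) return false;
    for (int i = 0; i < 32; ++i) {
        std::uint32_t single = std::uint32_t(1) << i;
        std::uint32_t filled = single | (single - 1u);
        if (clz_platform(single) != clz_reference(single)) return false;
        if (clz_platform(filled) != clz_reference(filled)) return false;
    }
    return clz_platform(0xFFFFFFFFu) == clz_reference(0xFFFFFFFFu);
}
```

Running the same check on Windows, Linux and macOS builds would at least show whether the wrapper itself behaves identically everywhere, before looking at the structures built on top of it.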
@mworchel Have you checked if the atomics are working properly on windows? |
Unfortunately, I didn't fully verify that the atomics work. The file My hope is that the CPU path is a little bit easier to debug. It would be great if we had some basic tests. Like I said before, it's a great research project and I'm happy I can contribute something :) |
Windows machines should be able to |
Great news! Did you find any new evidence why the CPU backprop didn't work? I suppose most people use the GPU version anyway, so it's not that urgent. |
Huh, I thought that was resolved. I'll test on CPU later. |
Unfortunately not. Your initialization fix fixed it for colab but not for Windows. I wasn't able to track the issue any further. Hope you'll find something. |
Pretty sure there is a bug in the atomic add on the Windows CPU path. Should be fixed soon. |
0.4.3 should fix this. It was a type conversion issue in the atomics: InterlockedCompareExchange takes integers as arguments and we passed float to them without reinterpret_cast. |
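To illustrate the kind of bug described above, here is a minimal sketch of a compare-and-swap float add. It is not redner's actual code: `std::atomic<std::uint32_t>` stands in for `InterlockedCompareExchange` so the example runs on any platform, and all function names are hypothetical. The point is that the float has to be reinterpreted bit-for-bit as an integer; a plain numeric conversion turns 1.0f into the integer 1 instead of the bit pattern 0x3F800000.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret the bits of a float as a 32-bit integer (no numeric conversion).
inline std::uint32_t float_to_bits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof(u));
    return u;
}

// Inverse of float_to_bits.
inline float bits_to_float(std::uint32_t u) {
    float f;
    std::memcpy(&f, &u, sizeof(f));
    return f;
}

// What goes wrong when a float is passed where an integer is expected:
// the VALUE is converted (1.0f -> 1, 0.5f -> 0), not the bit pattern.
// Only meaningful for small non-negative inputs.
inline std::uint32_t value_convert(float f) {
    return static_cast<std::uint32_t>(f);
}

// CAS loop: atomically add `value` to the float stored as bits in `target`.
// Returns the value held before the addition.
inline float atomic_add_float(std::atomic<std::uint32_t>& target, float value) {
    std::uint32_t old_bits = target.load();
    while (true) {
        std::uint32_t new_bits = float_to_bits(bits_to_float(old_bits) + value);
        // On failure, compare_exchange_weak reloads old_bits and we retry.
        if (target.compare_exchange_weak(old_bits, new_bits)) {
            return bits_to_float(old_bits);
        }
    }
}

// Tiny single-threaded sanity check of the helpers above.
inline void check_atomic_add_float() {
    std::atomic<std::uint32_t> cell(float_to_bits(0.5f));
    float previous = atomic_add_float(cell, 0.25f);
    assert(previous == 0.5f);
    assert(bits_to_float(cell.load()) == 0.75f);
}
```

With the value conversion, the compare step would be handed truncated integers rather than the stored bit patterns, which is consistent with the gradients accumulating garbage on the Windows CPU path.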
Nice catch! Will try it as soon as I can. |
Indeed fixed in 0.4.3. Thanks. |
When running the pose_estimation sample under Windows, the optimization part does not actually perform any optimization. The loss seems to vary randomly and the final estimate does not visually differ from the initial one. It seems as if the parameter updates are not correctly computed (or are way too small, since there is no visual difference between iterations) for whatever reason.
System: