Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New CPU lightmapper stuck at preparing data structures [MinGW GCC] #45097

Closed
mbrlabs opened this issue Jan 11, 2021 · 55 comments
Closed

New CPU lightmapper stuck at preparing data structures [MinGW GCC] #45097

mbrlabs opened this issue Jan 11, 2021 · 55 comments

Comments

@mbrlabs
Copy link
Contributor

mbrlabs commented Jan 11, 2021

Godot version:
3.2.4.beta5+lightmapper from #44628 (comment)

OS/device including version:
Windows 10, GeForce GTX 660/PCIe/SSE2, i7-3820 @3.6GHz

Issue description:
I tried the new lightmapper on a simple model using the lowest bake settings on my old PC and it hangs at: "Preparing data structures". It was not responding and after half an hour i had to kill the process. The UV2 coordinates were generated by Godot using "Generate Lightmaps" in the import settings. cc @JFonS

Capture

These are my bake settings:
settings

Steps to reproduce:
Open the project, select the BakedLightmap Node and bake it.

Minimal reproduction project:
LightmapperTester.zip

@JFonS
Copy link
Contributor

JFonS commented Jan 11, 2021

I tried to reproduce using the MRP on Linux and I wasn't able to. I will try to reproduce on Windows when I have some spare time.

@mbrlabs
Copy link
Contributor Author

mbrlabs commented Jan 11, 2021

I have a dual boot setup and Manajaro Linux on my other drive using the same hardware. I just tested it there and it bakes in like under 1 second without any issues, so it's definitely a Windows thing.

@mbrlabs
Copy link
Contributor Author

mbrlabs commented Jan 11, 2021

I opened the project on Windows with the -d debug flag and this is what i get in the console (not much; but a starting point):

Embree error: you have to wait for spawned subtasks
terminate called after throwing an instance of 'std::runtime_error'
  what():  you have to wait for spawned subtasks

@mbrlabs
Copy link
Contributor Author

mbrlabs commented Jan 11, 2021

I'm compiling https://github.com/JFonS/godot/tree/new_cpu_lightmapper_3.2 from source now and i get a bunch of warnings when compiling Embree: cl : Befehlszeile warning D9002 : Unbekannte Option "-msse2" wird ignoriert. (Means unkown option "-msse2" will be ignored). Not sure if related. I checked the CI builds and the logs contain the same warning.

Edit:
That's interesting. It actually works when building it myself (debug and release).

@JFonS
Copy link
Contributor

JFonS commented Jan 11, 2021

Ah, I see what's the issue: Embree and MinGW don't get along.

If I remember correctly @RandomShaper looked into this issue a while back, but I don't know if there ever was a conclusion.
Also, it seems like other users report similar issues: RenderKit/embree#279.

I would gather a bit more info about the problem here, and then open an issue in Embree's repo or in MinGW's issue tracker (or both) depending on what we find out.

@fire
Copy link
Member

fire commented Jan 11, 2021

I have the changes for fixing mingw.

https://gist.github.com/fire/00d1206bd95c9d6f047bffa053c9c44b

@RandomShaper
Copy link
Member

I remember the issue quite vaguely. When I researched it, I found that some intrinsic used to query CPU features was not returning the right value in MinGW. However, after working around that, it was still crashing for some people. Then, the whole fixing effort was abandoned since we switched to MSVC.

@RandomShaper
Copy link
Member

Maybe @fire's patch is able to get away with some/all the issues. I'll try in the meantime to remember what the exact problem was. (I think it was a MinGW-Windows exclusive issue.)

@RandomShaper
Copy link
Member

Nightly owl update: I've been investigating and I almost have a cure for this.

@JFonS
Copy link
Contributor

JFonS commented Jan 12, 2021

We were blessed by the nightly owl :)

@RandomShaper
Copy link
Member

This patch fixes the issue, at least for me: https://pastebin.com/xe85bK2c
@JFonS, you may want to integrate it in the PR and then we'll see if the issue really goes away.

Detailed info:

It seems there are issues on MinGW-Windows with two intrinsics.

Regarding _mm_pause(), it should just emit the PAUSE instruction, so I have no idea why it's causing the problem. I've tried replacing the call with direct assembly for PAUSE, REP; NOP (which is equivalent), even emitting directly the machine opcodes, and even just commenting out the calls. Every option was causing the same issue, so in the end I've just replaced the calls by 1 μs sleep. That fixes the freeze for me. (PAUSE is a more or less important optimization for busy waits, but I guess we can live without it until we find a better solution.)

Regarding _xgetbv(), it seems to return a wrong value, different from the one returned when built with MSVC. Fixing it fixes a crash that would occur if just the freeze was fixed.


An alternative would be switching to MinGW-LLVM for Windows, where Embree seems to work, but that would require a whole round of betas, etc. Maybe for the next release of Godot, so we can get rid of this patch? @akien-mga

@akien-mga
Copy link
Member

An alternative would be switching to MinGW-LLVM for Windows, where Embree seems to work, but that would require a whole round of betas, etc. Maybe for the next release of Godot, so we can get rid of this patch? @akien-mga

That's an option, but not for 3.2.4 indeed. And for later releases that would require some work compared to the current setup which relies on Fedora's packaging of MinGW-GCC.

In the meantime though, it would be good to try to contribute fixes for MinGW-GCC upstream to see if they can be integrated. (Possibly after we've tested them successfully.)

@RandomShaper
Copy link
Member

Not so fast... It still crashes for me on release. I'll keep trying.

@RandomShaper
Copy link
Member

...Especially if I had reverted the patch locally. 😅

Release with the patch works, too!

@JFonS
Copy link
Contributor

JFonS commented Jan 13, 2021

I added the patch to my local branch, I will push the changes together with the rest of fixes.

@mbrlabs
Copy link
Contributor Author

mbrlabs commented Jan 14, 2021

Good job figuring that out @RandomShaper and @JFonS 👍. I don't have MinGW setup, but i'm going to test this with the next beta or test build!

@RandomShaper
Copy link
Member

Thanks you, @mbrlabs, but let's not uncap the champagne so fast. 😃 The patch is a best effort thing rather than an analytical solution, so we don't know yet if it's good until multiple people try it.

That said, looking forward to reading your future comment telling it worked for you. 😁

@mbrlabs
Copy link
Contributor Author

mbrlabs commented Jan 16, 2021

@RandomShaper @JFonS I saw that @akien-mga made a new beta build which is not officially released yet and i tested it with the sample project. Well the freeze is gone, but unfortunally it crashes instead at the "Preparing data structures" step with the same error.

@NHodgesVFX
Copy link
Contributor

@RandomShaper @JFonS I saw that @akien-mga made a new beta build which is not officially released yet and i tested it with the sample project. Well the freeze is gone, but unfortunally it crashes instead at the "Preparing data structures" step with the same error.

I can confirm this in my own project as well

@RandomShaper
Copy link
Member

Hmm... I guess my test project was too simple to trigger the crash. I'll keep trying to improve the fix as soon as I get some time.

@DarkKilauea
Copy link
Contributor

Confirming that this issue occurs on AMD processors using the official build on windows, but disappears when Godot is built using MSVC. See #45260 for details.

@akien-mga akien-mga changed the title New CPU lightmapper stuck at preparing data structures New CPU lightmapper stuck at preparing data structures [MinGW GCC] Jan 18, 2021
@akien-mga
Copy link
Member

I'll try a build with mingw 7.0.0 / binutils 2.34 from Fedora 33, and do a test with and without LTO to see if it's involved in the issue.

To test the MinGW-GCC specific issue further, I've made two builds on Fedora 33 with mingw 7.0.0, GCC 10.2 and binutils 2.34, one with LTO (like official builds) and one without:

@mbrlabs
Copy link
Contributor Author

mbrlabs commented Jan 19, 2021

I tried to bake with both versions until something interesting happend (I'd say both versions around 5-10 times). After a freeze i just killed the process and tried again.

No-LTO version

  • Actually worked the first time i tried it with my MRP! But, all the other times i tried it, it froze at Preparing data structures
  • One time it crashed
  • The Embree error message was the usual one:
Embree error: you have to wait for spawned subtasks
terminate called after throwing an instance of 'std::runtime_error'
  what():  you have to wait for spawned subtasks

LTO version

  • Worked one time after several freezes and crashes with the usual Embree error.
  • One time it got to the Generate buffers step and it froze printing this message in an endless loop to the console: Embree error: scene not commited

This seems like a really nasty bug, which we don't really have control over. I fear that no matter what we do for some users it's always going to freeze/crash. Maybe we should consider building the official Windows binaries with MSVC instead 😅?

@Calinou
Copy link
Member

Calinou commented Jan 19, 2021

Maybe we should consider building the official Windows binaries with MSVC instead 😅?

This is what we did a long time ago, but we switched to MinGW + LTO in Godot 3.0 to get better performance. I would prefer not tying the official buildsystem to Windows as it makes everything more complicated…

@JFonS
Copy link
Contributor

JFonS commented Jan 19, 2021

Seeing the amount of issues Embree is causing, I will take a look at adapting some other ray tracing library. Maybe we can get acceptable performance without all the compiler and SIMD shenanigans.

@NHodgesVFX
Copy link
Contributor

Maybe we should consider building the official Windows binaries with MSVC instead 😅?

This is what we did a long time ago, but we switched to MinGW + LTO in Godot 3.0 to get better performance. I would prefer not tying the official buildsystem to Windows as it makes everything more complicated…

I would be happy to build the editor with MSVC for an official release if no one else can. The editor templates probaly work compiled with gcc and I dont think baking light maps is supported on 32bit. So we would only need to compile the 64 bit editor windows. It would be nice to fix the actual problem but its seeming less easy by the moment.

Seeing the amount of issues Embree is causing, I will take a look at adapting some other ray tracing library. Maybe we can get acceptable performance without all the compiler and SIMD shenanigans.

Maybe I'm wrong but haven't really seen any other raytracing library's about. Except for

https://gpuopen.com/radeon-rays/ 2.0 is open-source and 4.0 seem like it will be soon. For cpu is also based on embree.
https://github.com/szellmann/visionaray
Primarily for images, plus lots of dependcies
https://github.com/ands/lightmapper
Not sure if it has all the needed features but perhaps it could be extended.

@Calinou
Copy link
Member

Calinou commented Jan 20, 2021

I would be happy to build the editor with MSVC for an official release if no one else can. The editor templates probaly work compiled with gcc and I dont think baking light maps is supported on 32bit. So we would only need to compile the 64 bit editor windows. It would be nice to fix the actual problem but its seeming less easy by the moment.

Feel free to distribute your own binaries here, but we can't accept third-party binaries as "official" binaries for security reasons.

@akien-mga
Copy link
Member

akien-mga commented Jan 20, 2021

Yeah don't worry, if we need to build binaries with MSVC, that's not too hard to do. We already build UWP binaries with VS 2017 on Wine and it works well (need to look into upgrading to VS 2019), and worse case I could buy Windows binaries manually with MSVC on my machine, but that would suck.

Also, I'm not particularly motivated for swapping compiler mid-branch as it can have implications on the performance and size of builds, as well as possibly expose new MSVC-specific issues that were not affected official binaries now. And all builds should be done with the same compiler for a given platform, so it's not just the editor build.

Now, if that's what it takes, we might come to it, but either way this is going to introduce severe delay for the 3.2.4 release if this is a blocker. It would be much nicer if we could figure out what's missing for Embree to support MinGW-GCC properly - and have issues fixed upstream.

@esdinev
Copy link

esdinev commented Jan 20, 2021

Which branch should I build to test the lightmapper? Master was completely broken today when I built it. Couldn't even open my project.

@Calinou
Copy link
Member

Calinou commented Jan 20, 2021

Which branch should I build to test the lightmapper? Master was completely broken today when I built it. Couldn't even open my project.

The 3.2 branch contains the new CPU lightmapper.

Do not open 3.2.x projects with master yet as this will likely break their file structure! (Or make a backup before doing so.)

@esdinev
Copy link

esdinev commented Jan 20, 2021

Ah OK. I will keep that in mind. I'll do a build now and report back after I test it.

@esdinev
Copy link

esdinev commented Jan 20, 2021

The executable built with msvc seems to bake without problems. At least it looks to be after a fast test.
I'll test it more tomorrow but so far I think that is indeed the problem.

@NHodgesVFX
Copy link
Contributor

The executable built with msvc seems to bake without problems. At least it looks to be after a fast test.
I'll test it more tomorrow but so far I think that is indeed the problem.

Yeah msvc works well although you might run into #45296

@akien-mga
Copy link
Member

There's a new attempt to solve this in #45346, so if either of you is still willing to do some testing for me (thanks!), here are new builds with that PR:

I don't know yet if LTO has an impact so you don't necessarily need to test both builds, if we want to be safe (i.e. check if the potential fix worked) I'd advise testing the no-LTO build first, which might be a bit slower but less risky (as LTO can introduce bugs).

@esdinev
Copy link

esdinev commented Jan 21, 2021

I tried the non-LTO version. Same as before it freezes at 18% of preparing data structures.

@mbrlabs
Copy link
Contributor Author

mbrlabs commented Jan 21, 2021

I tried the non-LTO version. Same as before it freezes at 18% of preparing data structures.

Same for me with both versions: lots of freezes and sometimes crashes.

@akien-mga
Copy link
Member

Thanks for testing! Over to you @RandomShaper :P
I'll try to reproduce the issue too myself, and do a compilation with MinGW-GCC on Windows locally to see if it behaves differently.

@RandomShaper
Copy link
Member

I'll try to setup a similar build environment as the one used for official builds, to be able to iterate faster locally and still fight a bit more before giving up.

@akien-mga
Copy link
Member

akien-mga commented Jan 25, 2021

New builds with #45380! Is it our lucky day? Is this random shape the one that will cover MinGW-GCC's gaping wound of alignment issues? Only time (and amazing testers!) will tell

@NHodgesVFX
Copy link
Contributor

tried my own project and the ops project works fine for me on both lto and non lto. :)

@mbrlabs
Copy link
Contributor Author

mbrlabs commented Jan 25, 2021

Yay! I can confirm that this ideed fixed the issue. Great job all :)

@JFonS
Copy link
Contributor

JFonS commented Jan 25, 2021

Awesome! Thanks a lot @RandomShaper for looking into this :)

@RandomShaper
Copy link
Member

I'm super happy it worked. 🤠

@akien-mga
Copy link
Member

Closing as fixed then! I'll soon spin a new beta or RC1 for 3.2.4 including these fixes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants