Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ERROR: Condition "p_elem->_root != this" is true. #62970

Closed
thoraxe opened this issue Jul 13, 2022 · 21 comments
Closed

ERROR: Condition "p_elem->_root != this" is true. #62970

thoraxe opened this issue Jul 13, 2022 · 21 comments

Comments

@thoraxe
Copy link

thoraxe commented Jul 13, 2022

Godot version

v3.4.4.stable.mono.official.419e713a2

System information

Linux (Fedora), Intel Graphics

Issue description

I am getting this error, which appears to be a recurring problem now and again. It is not clear when it starts occurring or what causes it explicitly.

You can see my entire game client here:
https://github.com/redhat-gamedev/srt-godot-client

There are many closed issues related to this error:
https://github.com/godotengine/godot/issues?q=%22p_elem-%3E_root+%21%3D+this%22+is%3Aclosed

The two that are potentially the most closely related are the following:
#22565
#20085

These both mention threading and node manipulation. The second one explicitly mentions the use of "add_child".

In the game client, there are really only two places where add_child is meaningfully used:
https://github.com/redhat-gamedev/srt-godot-client/blob/main/Scenes/MainScenes/Game.cs#L134
https://github.com/redhat-gamedev/srt-godot-client/blob/main/Scenes/MainScenes/Game.cs#L165

Neither seems to reliably produce the error.

Steps to reproduce

Unfortunately I have no reliable reproducer.

Minimal reproduction project

N/A

@thoraxe
Copy link
Author

thoraxe commented Jul 13, 2022

I don't think add_child is related, because the game client was sitting and doing nothing (no ships or missiles were created) and it suddenly started spewing the error.

@thoraxe
Copy link
Author

thoraxe commented Nov 11, 2022

This is still present in 3.5.1.

I noticed that the issue cropped up while I was sending player move commands in the client. Here's the client send command method:
https://github.com/redhat-gamedev/srt-godot-client/blob/main/Networking/ServerConnection.cs#L123-L143

Here's some output from the logs:

[12:58:06 DBG] Game.cs: Got move command
ERROR: Condition "p_elem->_root != this" is true.
   at: remove (./core/self_list.h:80)

Not sure this is related, but this is when it happened.

@thoraxe
Copy link
Author

thoraxe commented Nov 14, 2022

Is there anything I could do to get additional debug data to help with testing, @Calinou ?

@Calinou
Copy link
Member

Calinou commented Nov 14, 2022

Is there anything I could do to get additional debug data to help with testing, @Calinou ?

We need a minimal reproduction project, with the code isolated as much as possible, and in GDScript (to rule out C# being the cause of the bug). We can't do anything about this otherwise.

@thoraxe
Copy link
Author

thoraxe commented Nov 14, 2022

@Calinou You are describing a bit of a chicken-egg troubleshooting challenge. I could create a reproducer if I knew what the problem was. But without additional debugging being spit out by Godot, I can't figure out what the problem is to create a reproducer. I don't even know where in the trace of my code path this error starts being generated or what part of Godot is causing it to continue to be generated.

Even if we could figure out more specifically where the problem lies, having to rewrite the reproducer in GDScript could potentially be impossible if the problem is being caused by C#. As this project uses various C# libraries that have no possible analogs in GDScript, this seems like a pretty huge uplift.

Put yourself in my shoes -- how would you begin to try to create a reproducer?

@raulsntos
Copy link
Member

Even if we could figure out more specifically where the problem lies, having to rewrite the reproducer in GDScript could potentially be impossible if the problem is being caused by C#.

If the issue is caused by C# there's no need to rewrite it in GDScript, an MRP in C# would suffice. We ask for MRPs in GDScript because it's easier for contributors to test (not every contributor uses C#) so you are more likely to get attention. Also, if creating the the equivalent MRP in GDScript does not reproduce the bug, that is a clear indication that the bug is specific to C# so it gives us a hint as to where the issue may come from.

I could create a reproducer if I knew what the problem was. But without additional debugging being spit out by Godot, I can't figure out what the problem is to create a reproducer. I don't even know where in the trace of my code path this error starts being generated or what part of Godot is causing it to continue to be generated.

You mentioned that you get ERROR: Condition "p_elem->_root != this" is true. The only assert I could find with that condition is in SelfList<T>::List::remove(SelfList<T>):

ERR_FAIL_COND(p_elem->_root != this);

Is this the only output you get in the console? Is there anything else that may indicate where the error comes from? Like previous messages that may indicate what was being executed immediately before the error.

You can try removing parts of your game until the error stops happening to try and find what causes it, you probably already have an idea of what part of the code is most likely the cause based on what you were doing when the error occurred so try with that first.

Since you seem to be using asynchronous methods, take a look at those because it's likely they are not synchronized with Godot's main thread. For example, you are calling AddChild as a response of some asynchronous event, see if adding the child with CallDeferred fixes your issue:

CallDeferred("add_child", missileInstance);

Alternatively, if you are willing to build Godot from source, you can try running the binary with gdb or lldb to get a full stacktrace that can indicate where the error comes from.

@thoraxe
Copy link
Author

thoraxe commented Nov 21, 2022

This PR switched everything to CallDeferred but apparently @RoddieKieley still saw the error.

Is this the only output you get in the console? Is there anything else that may indicate where the error comes from? Like previous messages that may indicate what was being executed immediately before the error.

Other than what I've shared above, there is no indication of what is spawning the error. But, once it starts, it seems to spew forever until the client is stopped.

It sounds like building Godot from the source may be the only option to troubleshoot further because I can't imagine we will make much more progress without a full stack trace.

@thoraxe
Copy link
Author

thoraxe commented Nov 21, 2022

I took a quick spin of the docs and didn't see anything special related to generating debugging symbols. Does this happen automatically when building?

@raulsntos
Copy link
Member

raulsntos commented Nov 21, 2022

To build with debug symbols use the debug_symbols=yes argument in the SCons command.

I believe the docs about compiling may not have been updated for 4.0 yet, take a look at this PR that changed some arguments recently: #66242

Also, take a look at the mono module README.md for instructions specific to C#.

Oops, totally missed that this issue was about 3.x. I believe in 3.x the debug symbols are generated by default.

@thoraxe
Copy link
Author

thoraxe commented Nov 22, 2022

OK well, I have it built, but I don't understand how to use gdb. It keeps pausing when nothing bad is happening.

What is the gdb magic so that it will only pause/backtrace/whatever when the ERROR: Condition "p_elem->_root != this" is true. happens?

@Zireael07
Copy link
Contributor

@thoraxe
Copy link
Author

thoraxe commented Nov 23, 2022

@raulsntos unfortunately, ERR_FAIL_COND is a macro, and is not part of self_list.h. Per this StackExchange post GDB is not capable of debugging/breaking macros. This leaves me very few choices.

If I set the breakpoint at line 80, I get a PILE of breakpoints, and Godot effectively gets broken/paused constantly because gdb is doing some bizarre macro expansion or something.

(gdb) break ./core/self_list.h:80
Breakpoint 1 at 0x8f500d: ./core/self_list.h:80. (116 locations)

The above makes Godot unusable.

Do you have any other suggestions on how I can use GDB to only break/pause/etc when it spits out this error? Or is there a way to make Godot spit out the backtrace when that error happens? Like can I add something to the macro code?

@thoraxe
Copy link
Author

thoraxe commented Dec 1, 2022

Would it be acceptable if I took the macro code and modified self_list.h in order to execute the same code? This would allow me to set a breakpoint on the actual condition/code in self_list. It wouldn't be exactly the same Godot, and there's a chance that a timing would be modified, but I'd at least be able to backtrace when I got to that point where the error is kicked.

WDYT?

@raulsntos
Copy link
Member

raulsntos commented Dec 1, 2022

AFAIK the macro is just a compile-time replacement so it should be the same, also feel free to make any modifications you need if they help you debug the issue such as using print_line or WARN_PRINT.

Once you locate the source of the issue, you may be able to reproduce it more easily without modifying Godot's source code.

@thoraxe
Copy link
Author

thoraxe commented Dec 1, 2022

Yes, I'm going to swap the macro call for the code from the macro itself and put that in self_list.h so that I can set a breakpoint.

Alternatively, I have not tried to use VScode to build the Godot project, which could let me debug/backtrace directly in VScode. That may be my next step if I continue to struggle with GDB.

@thoraxe
Copy link
Author

thoraxe commented Dec 6, 2022

I added CRASH_NOW() inside the method that checks for this error condition, and got this output:
https://gist.github.com/thoraxe/f56fd138a23f43be56d82eb9046093de

I'm not sure if this is helpful.

@thoraxe
Copy link
Author

thoraxe commented Dec 6, 2022

OK, I was finally able to get the debugger inside the game at the time the error condition is raised. Here's the stack at that moment:

SelfList<Node>::List::remove(SelfList<Node>*) (/home/thoraxe/Red_Hat/openshift/godot/core/self_list.h:82)
SceneTree::flush_transform_notifications() (/home/thoraxe/Red_Hat/openshift/godot/scene/main/scene_tree.cpp:187)
SceneTree::idle(float) (/home/thoraxe/Red_Hat/openshift/godot/scene/main/scene_tree.cpp:661)
Main::iteration() (/home/thoraxe/Red_Hat/openshift/godot/main/main.cpp:2298)
OS_X11::run() (/home/thoraxe/Red_Hat/openshift/godot/platform/x11/os_x11.cpp:3987)
main (/home/thoraxe/Red_Hat/openshift/godot/platform/x11/godot_x11.cpp:59)
__libc_start_call_main (@__libc_start_call_main:29)
__libc_start_main@@GLIBC_2.34 (@__libc_start_main@@GLIBC_2.34:43)
_start (@_start:15)

I don't know if this tells you anything.

@thoraxe
Copy link
Author

thoraxe commented Jan 4, 2023

@raulsntos FYI I took all of the async things that were happening when messages were received and put them into queues that are then processed during _Process on a node. This is what the server code already does and we've never seen this error on the server-side.

I did a quick test with a bunch of players and did not hit this error. We're going to do a deeper soak test later this week but this might have fixed it. Will report back accordingly.

@laingawbl
Copy link

Did it?

@choskyo
Copy link

choskyo commented Mar 22, 2023

Just to +1 thoraxe's possible solution, I'd been bashing my head against the same error for a while.
In my case (it sounds like similarly) I was spawning + transforming nodes based on a signal that fired when a network connection received an update. I peeked at his solution here and went for a similar Queue-based approach instead and I haven't seen the error since.
Possibly relevant is that I was doing it on multiple separate events and it was only an issue on handlers that would try to affect more than a dozen or so nodes at a time.

@thoraxe
Copy link
Author

thoraxe commented Mar 23, 2023

Since switching to using queues, we have apparently eliminated this problem. It no longer seems to be happening. I can't claim direct causation, but the correlation is strong. However, using standard C# Queue was also causing problems where sometimes threading would try to operate on elements already gone from the queue. We switched to ConcurrentQueue to resolve those issues.

I am closing this as no longer an issue due to a lack of other evidence to show that it's still a problem.

Thanks, all, for the tips.

@thoraxe thoraxe closed this as completed Mar 23, 2023
@Calinou Calinou closed this as not planned Won't fix, can't repro, duplicate, stale Mar 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants