Segfault in template_release build due to optimization level (GCC -O2) #85263

Closed
akien-mga opened this issue Nov 23, 2023 · 4 comments · Fixed by #85280

Comments

@akien-mga
Member

akien-mga commented Nov 23, 2023

Godot version

4.2.rc2 (ad72de5)

System information

Mageia 9 - Vulkan (Forward+) - dedicated AMD Radeon RX Vega M GL Graphics (RADV VEGAM) () - Intel(R) Core(TM) i7-8705G CPU @ 3.10GHz (8 Threads)

Issue description

Spin-off from #70910, where the MRP from #70910 (comment) seems to trigger a segfault (on Linux with GCC at least), but only with 4.2 builds, so it appears to be a regression of some sort.

The crash is reproduced when compiling Godot with scons p=linuxbsd target=template_release, to which debug_symbols=yes can be added to have a nice stacktrace.

Here's the log output and stacktrace:

ERROR: Condition "slot >= slot_max" is true. Returning: nullptr
   at: get_instance (./core/object/object.h:1033)
ERROR: Condition "slot >= slot_max" is true. Returning: nullptr
   at: get_instance (./core/object/object.h:1033)
ERROR: Condition "slot >= slot_max" is true. Returning: nullptr
   at: get_instance (./core/object/object.h:1033)
ERROR: Condition "slot >= slot_max" is true. Returning: nullptr
   at: get_instance (./core/object/object.h:1033)
ERROR: Condition "slot >= slot_max" is true. Returning: nullptr
   at: get_instance (./core/object/object.h:1033)
ERROR: Condition "slot >= slot_max" is true. Returning: nullptr
   at: get_instance (./core/object/object.h:1033)
ERROR: Condition "slot >= slot_max" is true. Returning: nullptr
   at: get_instance (./core/object/object.h:1033)
ERROR: Condition "slot >= slot_max" is true. Returning: nullptr
   at: get_instance (./core/object/object.h:1033)
ERROR: Condition "slot >= slot_max" is true. Returning: nullptr
   at: get_instance (./core/object/object.h:1033)
ERROR: Condition "slot >= slot_max" is true. Returning: nullptr
   at: get_instance (./core/object/object.h:1033)
ERROR: Condition "slot >= slot_max" is true. Returning: nullptr
   at: get_instance (./core/object/object.h:1033)
ERROR: Condition "slot >= slot_max" is true. Returning: nullptr
   at: get_instance (./core/object/object.h:1033)
ERROR: Parameter "canvas_item" is null.
   at: canvas_item_clear (./servers/rendering/renderer_canvas_cull.cpp:1558)

Thread 1 "godot.linuxbsd." received signal SIGSEGV, Segmentation fault.
0x0000000003c79105 in Object::notification (this=0x6159910, p_notification=30, p_reversed=<optimized out>) at ./core/object/object.cpp:837
837                     _notificationv(p_notification, p_reversed);
(gdb) bt
#0  0x0000000003c79105 in Object::notification (this=0x6159910, p_notification=30, p_reversed=<optimized out>) at ./core/object/object.cpp:837
#1  0x0000000001fa3557 in CanvasItem::_redraw_callback (this=0x6159910) at ./scene/main/canvas_item.cpp:140
#2  CanvasItem::_redraw_callback (this=0x6159910) at ./scene/main/canvas_item.cpp:129
#3  0x0000000003c7e72d in CallQueue::_call_function (this=this@entry=0x44c85d0, p_callable=..., p_args=p_args@entry=0x5fdc238, p_argcount=0, p_show_error=<optimized out>) at ./core/object/message_queue.cpp:219
#4  0x0000000003c97425 in CallQueue::flush (this=0x44c85d0) at ./core/object/message_queue.cpp:324
#5  0x0000000002052afc in SceneTree::physics_process (this=0x5feb600, p_time=0.016666666666666666) at ./scene/main/scene_tree.cpp:471
#6  0x000000000108f2f4 in Main::iteration () at main/main.cpp:3598
#7  0x00000000010244c1 in OS_LinuxBSD::run (this=this@entry=0x7fffffffcf80) at platform/linuxbsd/os_linuxbsd.cpp:933
#8  0x0000000001023afb in main (argc=<optimized out>, argv=0x7fffffffd538) at platform/linuxbsd/godot_linuxbsd.cpp:74
(gdb) 
(gdb) frame 0
#0  0x0000000003c79105 in Object::notification (this=0x6159910, p_notification=30, p_reversed=<optimized out>) at ./core/object/object.cpp:837
837                     _notificationv(p_notification, p_reversed);
(gdb) print p_notification
$1 = 30
(gdb) print p_reversed
$2 = <optimized out>
(gdb) print this
$3 = (Object * const) 0x6159910
(gdb) print _notificationv
$4 = {void (Object * const, int, bool)} 0x1033be0 <Object::_notificationv(int, bool)>

The main problem seems to be the optimization level.

With target=template_release, the default optimization level is optimize=speed_trace, which for GCC sets -O2.

I tested a custom build with -O1 (optimize=debug CCFLAGS=-O1), which also reproduces the bug.
That gives slightly more info in the stacktrace:

0x0000000003833083 in Object::notification (this=this@entry=0x5c4dbe0, p_notification=p_notification@entry=30, p_reversed=p_reversed@entry=false) at ./core/object/object.cpp:837
837                     _notificationv(p_notification, p_reversed);
(gdb) bt
#0  0x0000000003833083 in Object::notification (this=this@entry=0x5c4dbe0, p_notification=p_notification@entry=30, p_reversed=p_reversed@entry=false) at ./core/object/object.cpp:837
#1  0x0000000001ca06c4 in CanvasItem::_redraw_callback (this=0x5c4dbe0) at ./scene/main/canvas_item.cpp:140
#2  0x0000000001cc2129 in call_with_variant_args_helper<CanvasItem>(CanvasItem*, void (CanvasItem::*)(), Variant const**, Callable::CallError&, IndexSequence<>) (r_error=..., p_args=<optimized out>, p_method=<optimized out>, 
    p_instance=<optimized out>) at ./core/variant/binder_common.h:305
#3  call_with_variant_args<CanvasItem> (r_error=..., p_argcount=<optimized out>, p_args=<optimized out>, p_method=<optimized out>, p_instance=<optimized out>) at ./core/variant/binder_common.h:417
#4  CallableCustomMethodPointer<CanvasItem>::call (this=<optimized out>, p_arguments=<optimized out>, p_argcount=<optimized out>, r_return_value=..., r_call_error=...) at ./core/object/callable_method_pointer.h:104
#5  0x000000000363e2c1 in Callable::callp (this=this@entry=0x5ac8240, p_arguments=p_arguments@entry=0x0, p_argcount=p_argcount@entry=0, r_return_value=..., r_call_error=...) at ./core/variant/callable.cpp:57
#6  0x000000000383a2a8 in CallQueue::_call_function (this=this@entry=0x3fb35e0, p_callable=..., p_args=p_args@entry=0x5ac8258, p_argcount=0, p_show_error=<optimized out>) at ./core/object/message_queue.cpp:219
#7  0x000000000384e773 in CallQueue::flush (this=0x3fb35e0) at ./core/object/message_queue.cpp:324
#8  0x0000000001d53f34 in SceneTree::physics_process (this=0x5ad7400, p_time=0.016666666666666666) at ./scene/main/scene_tree.cpp:471
#9  0x000000000101d67a in Main::iteration () at main/main.cpp:3598
#10 0x0000000000fbe6a4 in OS_LinuxBSD::run (this=this@entry=0x7fffffffcf80) at platform/linuxbsd/os_linuxbsd.cpp:933
#11 0x0000000000fbe4a5 in main (argc=<optimized out>, argv=0x7fffffffd538) at platform/linuxbsd/godot_linuxbsd.cpp:74

With -O0 (optimize=none, or optimize=custom CCFLAGS=-O0), there's no crash, just endless error spam like in #70910.

Steps to reproduce

  • Download MRP
  • Comment out the return line in Control.gd as advised by the script
  • Run it with a template_release binary, compiled with scons p=linuxbsd target=template_release debug_symbols=yes or equivalent

Minimal reproduction project

Freebug.zip

@bruvzg
Member

bruvzg commented Nov 23, 2023

With template_release build on macOS, I'm getting a lot of ERROR: Condition "slot >= slot_max" is true. Returning: nullptr messages (but not endless spam) and no crash.

akien-mga changed the title from "Segfault in template_release build due to optimization level (-O2)" to "Segfault in template_release build due to optimization level (GCC -O2)" on Nov 23, 2023
@YuriSizov
Contributor

YuriSizov commented Nov 23, 2023

> With template_release build on macOS, I'm getting a lot of ERROR: Condition "slot >= slot_max" is true. Returning: nullptr messages (but not endless spam) and no crash.

Same for me on Windows (MSVC if that's important again).

Edit:
By the way, I added a custom message so there is at least some information in release builds, and the overflow is pretty impressive:

ERROR: Cannot get instance from ObjectDB, slot index 8957152 (id: 1254273625312) is exceeding max 2048
   at: (C:\Projects\godot-engine\master\core/object/object.h:1034)
ERROR: Cannot get instance from ObjectDB, slot index 8957152 (id: 1254273625312) is exceeding max 2048
   at: (C:\Projects\godot-engine\master\core/object/object.h:1034)
ERROR: Cannot get instance from ObjectDB, slot index 15307384 (id: -8646787037943721352) is exceeding max 2048
   at: (C:\Projects\godot-engine\master\core/object/object.h:1034)
ERROR: Cannot get instance from ObjectDB, slot index 15307384 (id: -8646787037943721352) is exceeding max 2048
   at: (C:\Projects\godot-engine\master\core/object/object.h:1034)
ERROR: Cannot get instance from ObjectDB, slot index 9782608 (id: 1254274450768) is exceeding max 2048
   at: (C:\Projects\godot-engine\master\core/object/object.h:1034)
ERROR: Cannot get instance from ObjectDB, slot index 9782608 (id: 1254274450768) is exceeding max 2048
   at: (C:\Projects\godot-engine\master\core/object/object.h:1034)
ERROR: Cannot get instance from ObjectDB, slot index 8776596 (id: -8646601220619375724) is exceeding max 2048
   at: (C:\Projects\godot-engine\master\core/object/object.h:1034)
ERROR: Cannot get instance from ObjectDB, slot index 8776596 (id: -8646601220619375724) is exceeding max 2048
   at: (C:\Projects\godot-engine\master\core/object/object.h:1034)

Edit2:
Actually, what the hell? How is id printed here, which is supposed to be a uint64 value, negative? Or is this just a bug/limitation with vformat?

@bruvzg
Member

bruvzg commented Nov 23, 2023

> overflow is pretty impressive

It's likely just random garbage from corrupted memory; the issue should be in the CallQueue.
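The numbers in the log above can be checked by hand. Below is a minimal Python sketch, assuming the slot index is simply the low 24 bits of the id. That mask width is inferred from the logged values rather than taken from the engine source, and `SLOT_BITS`, `SLOT_MAX` and `slot_of` are illustrative names:

```python
# Assumption: the slot index is the low 24 bits of the 64-bit id.
SLOT_BITS = 24
SLOT_MASK = (1 << SLOT_BITS) - 1
SLOT_MAX = 2048  # the "max" value reported in the log


def slot_of(object_id: int) -> int:
    """Return the low SLOT_BITS bits of an id.

    Python's & treats negative ints as two's complement, so the ids that
    were printed as negative numbers can be passed in unchanged.
    """
    return object_id & SLOT_MASK


# (id, slot index) pairs copied from the log output above.
LOGGED = [
    (1254273625312, 8957152),
    (-8646787037943721352, 15307384),
    (1254274450768, 9782608),
    (-8646601220619375724, 8776596),
]

for object_id, logged_slot in LOGGED:
    slot = slot_of(object_id)
    # Columns: id, decoded slot, matches the log?, exceeds the max?
    print(object_id, slot, slot == logged_slot, slot >= SLOT_MAX)
```

Under that assumption every decoded slot matches the logged one and is far above 2048, which is what you would expect if the ids are arbitrary bytes rather than real ObjectIDs.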

> Actually, what the hell? How is id printed here, which is supposed to be a uint64 value, negative?

GDScript int (and therefore vformat) is always a signed 64-bit int, so any uint64 value with the top bit set is printed as a negative number.
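To illustrate the signed/unsigned point with one of the ids from the log, here is a small Python sketch of the two's complement reinterpretation. `as_uint64` and `as_int64` are illustrative helpers, not engine functions:

```python
MASK64 = (1 << 64) - 1


def as_uint64(value: int) -> int:
    """Reinterpret a signed 64-bit integer as unsigned (two's complement)."""
    return value & MASK64


def as_int64(value: int) -> int:
    """Reinterpret an unsigned 64-bit integer as signed (two's complement)."""
    value &= MASK64
    return value - (1 << 64) if value >= (1 << 63) else value


logged_id = -8646787037943721352  # id as printed in the log above
unsigned_id = as_uint64(logged_id)

print(unsigned_id)            # 9799957035765830264
print(as_int64(unsigned_id))  # -8646787037943721352
print(unsigned_id >> 63)      # 1: the top bit is set, so the signed view is negative
```

The bit pattern is the same in both views; only the interpretation of the top bit changes.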

@bruvzg
Member

bruvzg commented Nov 23, 2023

The issue seems to be not in the CallQueue, but in CallableCustomMethodPointer:

  • In release mode, CallableCustomMethodPointer stores only a raw pointer to the object. If this object is freed during the CallQueue flush, subsequent Callable processing causes a read-after-free when it tries to get the ObjectID from this pointer.
  • In debug mode, CallableCustomMethodPointer stores the ObjectID directly, so it's not an issue: the queue just ignores calls if it can't get the ID.
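The difference between the two modes can be sketched with a toy model. This is Python, not engine code: `ToyObjectDB`, `Slot` and the (slot index, validator) id layout are hypothetical simplifications, and a bare slot index stands in for the raw pointer. In C++ the raw-pointer case is undefined behaviour and cannot be demonstrated this safely.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Slot:
    validator: int = 0
    obj: Optional[str] = None  # stand-in for the object stored in the slot


class ToyObjectDB:
    """Hypothetical slot table; ids are (slot index, validator) pairs."""

    def __init__(self) -> None:
        self.slots: list[Slot] = []

    def add(self, obj: str) -> tuple[int, int]:
        # Reuse a free slot if one exists, otherwise grow the table.
        for index, slot in enumerate(self.slots):
            if slot.obj is None:
                break
        else:
            self.slots.append(Slot())
            index = len(self.slots) - 1
        slot = self.slots[index]
        slot.validator += 1  # a new validator invalidates ids handed out earlier
        slot.obj = obj
        return (index, slot.validator)

    def free(self, object_id: tuple[int, int]) -> None:
        index, _ = object_id
        self.slots[index].obj = None

    def get_instance(self, object_id: tuple[int, int]) -> Optional[str]:
        index, validator = object_id
        if index >= len(self.slots):
            return None
        slot = self.slots[index]
        if slot.obj is None or slot.validator != validator:
            return None  # stale id: the queued call can simply be skipped
        return slot.obj


db = ToyObjectDB()
item_id = db.add("CanvasItem A")
raw_slot = item_id[0]       # "raw pointer" analogue: a location with no validity check
db.free(item_id)            # object freed while a call is still queued
db.add("Unrelated object")  # the storage gets reused

print(db.get_instance(item_id))  # None
print(db.slots[raw_slot].obj)    # Unrelated object
```

The id-based lookup notices that the object is gone, while the location-only reference silently reads whatever now occupies the storage. That matches the pattern described above, where the debug path ignores the call and the release path reads freed memory.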
