Use WorkerThreadPool for Server threads (enhanced) #90268

RandomShaper · 2024-04-05T15:49:28Z

This PR is the version of #77004 where my collaboration on top of @reduz's work has been added.

This is kept from the description of the original PR:

Servers now use WorkerThreadPool for background computation.

This helps keep the number of threads used fixed at all times.

It also ensures everything works on HTML5 with threads.

And makes it easier to support disabling threads for also HTML5.

This is new:

CommandQueueMT now syncs with the servers via the WorkerThreadPool yielding mechanism, which makes its classic main sync semaphore superfluous.
Also, some warnings about calls that kill performance when using threaded rendering are removed because there's a mechanism that warns about that in a more general fashion.

This is no longer true:

The only downside of this is that GLES3 threads likely no longer work because every time the render thread is called, the thread context is different. This needs to be researched how to fix. Maybe if GLES3 is used, threaded rendering server should be disabled for the time being?

Supersedes #77004.

2024-04-09: A way of easily testing this being sure all the physics and rendering are threaded is to apply this patch:
all_servers_mt.txt

core/config/engine.cpp

doc/classes/WorkerThreadPool.xml

tests/core/threads/test_worker_thread_pool.h

core/object/worker_thread_pool.cpp

bruvzg · 2024-04-05T20:30:16Z

platform/macos/display_server_macos.mm

+#if defined(GLES3_ENABLED)
+	if (gl_manager_angle) {
+		gl_manager_angle->release_current();
+	}
+	if (gl_manager_legacy) {
+		gl_manager_legacy->release_current();
+	}
+#endif
 }

 void DisplayServerMacOS::make_rendering_thread() {
+#if defined(GLES3_ENABLED)
+	if (gl_manager_angle) {
+		gl_manager_angle->make_current();
+	}
+	if (gl_manager_legacy) {
+		gl_manager_legacy->make_current();
+	}
+#endif


This seems to be called from the thread when it is started and set the current GL context (to the context of the active window), but GL context is already set from _draw_viewport() at the start of each window drawing, so it's likely redundant (probably part of the old 3.x single windows code).

That seems to be the case indeed. However, I'm not clear on what the right API would be because we now have the concepts of windows and threads somewhat conflated. I'm wondering if the right call would be the following:

Remove DisplayServer::make/release_rendering_thread() .

Remove make/release_current() from the various GL context classes.

Modify window_make_current() so that taking INVALID_WINDOW_ID means releasing the context.

(When I say remove, it may be needed to be understood as deprecate until we can remove.)

In the next push, I'm replacing the commit where I added those by one that simplifies the APIs (without breaking compat, I think). DisplayServer as well as the various GL managers can therefore now have a window-specific make current, a global release, and call it a day.

For the reference:

Multithreaded compatibility renderer was crashing (in the GLES3::Utilities::get_video_adapter_api_version()) before this PR. With it, it's no longer crashing, but still displaying only black screen. Multithreaded Vulkan renderer also not rendering anything (both with and without this PR), might be macOS specific issue, I'll check it later.

In both cases, it deadlocks inside blit_render_targets_to_screen.

Main thread: 🔻 Waiting for CommandQueueMT mutex. std::__1::recursive_mutex::lock() (@std::__1::recursive_mutex::lock():7) CommandQueueMT::Command1<RendererCanvasCull, void (RendererCanvasCull::*)(RID), RID>* CommandQueueMT::allocate_and_lock<CommandQueueMT::Command1<RendererCanvasCull, void (RendererCanvasCull::*)(RID), RID>>() (core/templates/command_queue_mt.h:362) void CommandQueueMT::push<RendererCanvasCull, void (RendererCanvasCull::*)(RID), RID>(RendererCanvasCull*, void (RendererCanvasCull::*)(RID), RID) (core/templates/command_queue_mt.h:407) RenderingServerDefault::canvas_item_clear(RID) (servers/rendering/rendering_server_default.h:916) CanvasItem::_redraw_callback() (scene/main/canvas_item.cpp:137) CallQueue::_call_function(Callable const&, Variant const*, int, bool) (core/object/message_queue.cpp:221) CallQueue::flush() (core/object/message_queue.cpp:326) SceneTree::process(double) (scene/main/scene_tree.cpp:537) 🔻 Inside main evnet loop. Main::iteration() (main/main.cpp:4013) OS_MacOS::run() (platform/macos/os_macos.mm:778) main (platform/macos/godot_main_macos.mm:84) start (@start:593)

Rendering thread: 🔻 Since nothing seems directy holding it in the Engine code, I assue it's waiting for the next main event loop iteration, but main thread is locked. _dlock_wait (@_dlock_wait:17) _dispatch_thread_event_wait_slow (@_dispatch_thread_event_wait_slow:17) -[CAMetalLayer(MoltenVK) screenMVK] (@-[CAMetalLayer(MoltenVK) screenMVK]:31) MVKSwapchain::initSurfaceImages(VkSwapchainCreateInfoKHR const*, unsigned int) (@MVKSwapchain::initSurfaceImages(VkSwapchainCreateInfoKHR const*, unsigned int):153) MVKSwapchain::MVKSwapchain(MVKDevice*, VkSwapchainCreateInfoKHR const*) (@MVKSwapchain::MVKSwapchain(MVKDevice*, VkSwapchainCreateInfoKHR const*):183) MVKDevice::createSwapchain(VkSwapchainCreateInfoKHR const*, VkAllocationCallbacks const*) (@MVKDevice::createSwapchain(VkSwapchainCreateInfoKHR const*, VkAllocationCallbacks const*):15) vkCreateSwapchainKHR (@vkCreateSwapchainKHR:31) RenderingDeviceDriverVulkan::swap_chain_resize(RenderingDeviceDriver::CommandQueueID, RenderingDeviceDriver::SwapChainID, unsigned int) (drivers/vulkan/rendering_device_driver_vulkan.cpp:2663) RenderingDevice::screen_prepare_for_drawing(int) (servers/rendering/rendering_device.cpp:3190) RendererCompositorRD::blit_render_targets_to_screen(int, BlitToScreen const*, int) (servers/rendering/renderer_rd/renderer_compositor_rd.cpp:37) RendererViewport::draw_viewports(bool) (servers/rendering/renderer_viewport.cpp:861) RenderingServerDefault::_draw(bool, double) (servers/rendering/rendering_server_default.cpp:89) 🔻 This lock CommandQueueMT mutex. CommandQueueMT::_flush() (core/templates/command_queue_mt.h:377) CommandQueueMT::flush_all() (core/templates/command_queue_mt.h:423) RenderingServerDefault::_thread_loop() (servers/rendering/rendering_server_default.cpp:347) ...

Bisected it to #87340 - Finish splitting functionality of the RenderingDevice backends into RenderingDeviceDriver (it's big, so probably won't be easy to detect what exactly is causing the issue).

@bruvzg I realized a missing unlock in my changes to MacOS. Would you please re-test? If it deadlocks, please test with #90470 as well.

Neither is working.

Since it was originally broken by #87340, which is not changing CommandQueueMT in any way, it's probably not directly related, and the issue is in RenderingDevice.

In some cases (resizing window for example), it still lockups with the aforementioned patch (and lockups only involve RenderingDevice not CommandQueueMT).

Alright. Thanks for your help! I'll delve.

The more I look into this, the more issues with multi-threading rendering (physics seem OK) I'm seeing. My suggestion (CC @akien-mga) is that we merge this as it is, given it seems it doesn't break single-threading, and I'll make a subsequent PR fixing MT rendering.

RandomShaper · 2024-04-09T15:31:27Z

I'm about to push with a change: yielding functions in WorkerThreadPool are not exposed anymore. I realized that, unless there's some sort of prevention mechanism, users can easily shoot themselves in the foot because a dameon-like task could starve legit tasks. The usage in this PR is safe because those dameons are launched when there are free slots for sure so each of them is the bottommost in the call stack. I've started implementing a mechanism to prevent that, but it started to get messy so I'll put it in a draft PR together with re-exposing the yielding functions.

- Adapt GL make/release API to the current architecture. - Fix DisplayServer being locked while dispatching input (prevent deadlocks).

* Servers now use WorkerThreadPool for background computation. * This helps keep the number of threads used fixed at all times. * It also ensures everything works on HTML5 with threads. * And makes it easier to support disabling threads for also HTML5. CommandQueueMT now syncs with the servers via the WorkerThreadPool yielding mechanism, which makes its classic main sync semaphore superfluous. Also, some warnings about calls that kill performance when using threaded rendering are removed because there's a mechanism that warns about that in a more general fashion. Co-authored-by: Pedro J. Estébanez <pedrojrulez@gmail.com>

AThousandShips

Can't vouch for the thread pool code but the rest looks clean to me

akien-mga · 2024-04-12T09:12:30Z

Tested against the GDQuest TPS demo with multithreaded rendering and physics. Rendering seems to work ok, but physics gave me a crash. It worked fine initially but after breaking a few crates things got laggy and eventually segfaulted.

ERROR: FATAL: Index p_index = 0 is out of bounds (shapes.size() = 0).
   at: get_shape_transform (./servers/physics_3d/godot_collision_object_3d.h:128)
ERROR: Caller thread can't call this function in this node (/root). Use call_deferred() or call_thread_group() instead.
   at: propagate_notification (./scene/main/node.cpp:2420)

================================================================
handle_crash: Program crashed with signal 4
Engine version: Godot Engine v4.3.dev.custom_build (67b5e1779070c61ac0595301ec1c5080e1765b05)
Dumping the backtrace. Please include this when reporting the bug to the project developer.
[1] /lib64/libc.so.6(+0x3e9a0) [0x7fa3307889a0] (??:0)
[2] GodotCollisionObject3D::get_shape_transform(int) const (/home/akien/Godot/godot/./servers/physics_3d/godot_collision_object_3d.h:128 (discriminator 3))
[3] GodotArea2Pair3D::setup(float) (/home/akien/Godot/godot/./servers/physics_3d/godot_area_pair_3d.cpp:126 (discriminator 3))
[4] GodotStep3D::_setup_constraint(unsigned int, void*) (/home/akien/Godot/godot/./servers/physics_3d/godot_step_3d.cpp:118)
[5] WorkerThreadPool::GroupUserData<GodotStep3D, void (GodotStep3D::*)(unsigned int, void*), decltype(nullptr)>::callback_indexed(unsigned int) (/home/akien/Godot/godot/./core/object/worker_thread_pool.h:185)
[6] WorkerThreadPool::_process_task(WorkerThreadPool::Task*) (/home/akien/Godot/godot/./core/object/worker_thread_pool.cpp:92)
[7] WorkerThreadPool::_thread_function(void*) (/home/akien/Godot/godot/./core/object/worker_thread_pool.cpp:195)
[8] Thread::callback(unsigned long, Thread::Settings const&, void (*)(void*), void*) (/home/akien/Godot/godot/./core/os/thread.cpp:66)
[9] void std::__invoke_impl<void, void (*)(unsigned long, Thread::Settings const&, void (*)(void*), void*), unsigned long, Thread::Settings, void (*)(void*), void*>(std::__invoke_other, void (*&&)(unsigned long, Thread::Settings const&, void (*)(void*), void*), unsigned long&&, Thread::Settings&&, void (*&&)(void*), void*&&) (/usr/include/c++/13/bits/invoke.h:61)
[10] std::__invoke_result<void (*)(unsigned long, Thread::Settings const&, void (*)(void*), void*), unsigned long, Thread::Settings, void (*)(void*), void*>::type std::__invoke<void (*)(unsigned long, Thread::Settings const&, void (*)(void*), void*), unsigned long, Thread::Settings, void (*)(void*), void*>(void (*&&)(unsigned long, Thread::Settings const&, void (*)(void*), void*), unsigned long&&, Thread::Settings&&, void (*&&)(void*), void*&&) (/usr/include/c++/13/bits/invoke.h:97)
[11] void std::thread::_Invoker<std::tuple<void (*)(unsigned long, Thread::Settings const&, void (*)(void*), void*), unsigned long, Thread::Settings, void (*)(void*), void*> >::_M_invoke<0ul, 1ul, 2ul, 3ul, 4ul>(std::_Index_tuple<0ul, 1ul, 2ul, 3ul, 4ul>) (/usr/include/c++/13/bits/std_thread.h:292)
[12] std::thread::_Invoker<std::tuple<void (*)(unsigned long, Thread::Settings const&, void (*)(void*), void*), unsigned long, Thread::Settings, void (*)(void*), void*> >::operator()() (/usr/include/c++/13/bits/std_thread.h:299)
[13] std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)(unsigned long, Thread::Settings const&, void (*)(void*), void*), unsigned long, Thread::Settings, void (*)(void*), void*> > >::_M_run() (/usr/include/c++/13/bits/std_thread.h:244)
[14] /home/akien/Godot/godot/bin/godot.linuxbsd.editor.dev.x86_64() [0x7f882f3] (thread.o:?)
[15] /lib64/libc.so.6(+0x8e897) [0x7fa3307d8897] (??:0)
[16] /lib64/libc.so.6(+0x115a5c) [0x7fa33085fa5c] (??:0)
-- END OF BACKTRACE --
================================================================

RandomShaper · 2024-04-12T15:57:23Z

Trying to reproduce, to no avail, so far. Maybe it's one of those obscure and unpredictable platform-specific things. I'm wondering, could you try reproducing on master (I mean, with threaded physics)?

akien-mga · 2024-04-15T08:10:21Z

Trying to reproduce, to no avail, so far. Maybe it's one of those obscure and unpredictable platform-specific things. I'm wondering, could you try reproducing on master (I mean, with threaded physics)?

It indeed seems to be reproducible on master with multi-threaded physics, with the same backtrace. So it's not caused by this PR. I'll open a separate bug report.

So I guess we can merge this PR now and get wider testing.

akien-mga · 2024-04-15T08:12:06Z

Thanks! Great work 🎉

RandomShaper added enhancement topic:core topic:rendering topic:physics labels Apr 5, 2024

RandomShaper added this to the 4.3 milestone Apr 5, 2024

RandomShaper requested review from a team as code owners April 5, 2024 15:49

RandomShaper force-pushed the wtp_servers branch from 0109e33 to 9a81679 Compare April 5, 2024 15:51

AThousandShips reviewed Apr 5, 2024

View reviewed changes

bruvzg reviewed Apr 5, 2024

View reviewed changes

RandomShaper force-pushed the wtp_servers branch from fb6a641 to 13ed92c Compare April 8, 2024 08:16

RandomShaper mentioned this pull request Apr 8, 2024

Add missing platform implementations of release/make_rendering_thread() #90256

Closed

RandomShaper force-pushed the wtp_servers branch 3 times, most recently from 4352c87 to 8e7a566 Compare April 8, 2024 15:41

clayjohn mentioned this pull request Apr 9, 2024

Crash when running project with error ERROR: Condition "flushing_cmd_queue != nullptr" is true. #90363

Closed

RandomShaper force-pushed the wtp_servers branch from 8e7a566 to 8ca91c5 Compare April 9, 2024 10:30

RandomShaper requested review from a team as code owners April 9, 2024 10:30

RandomShaper force-pushed the wtp_servers branch from 8ca91c5 to 2dd1585 Compare April 9, 2024 11:36

RandomShaper requested a review from a team as a code owner April 9, 2024 11:36

RandomShaper force-pushed the wtp_servers branch from 2dd1585 to 3884e7a Compare April 9, 2024 11:45

akien-mga mentioned this pull request Apr 9, 2024

Use WorkerThreadPool for Server threads #77004

Closed

RandomShaper mentioned this pull request Apr 9, 2024

WorkerThreadPool: Expose yield functions, add anti-starvation mechanism #90437

Draft

RandomShaper force-pushed the wtp_servers branch from 3884e7a to ebe731d Compare April 9, 2024 15:48

akien-mga requested a review from reduz April 10, 2024 08:43

RandomShaper force-pushed the wtp_servers branch 2 times, most recently from cf43f93 to db15eab Compare April 10, 2024 16:06

akien-mga mentioned this pull request Apr 10, 2024

OpenXR: Change timing of xrWaitFrame and fix XR multithreading issues #89734

Merged

5 tasks

RandomShaper and others added 4 commits April 10, 2024 18:47

WorkerThreadPool: Fix data race

71facba

WorkerThreadPool: Support daemon-like tasks (via yield semantics)

1b104ff

Polish interaction between windowing, input and rendering

c28f590

- Adapt GL make/release API to the current architecture. - Fix DisplayServer being locked while dispatching input (prevent deadlocks).

RandomShaper force-pushed the wtp_servers branch from db15eab to 65686de Compare April 10, 2024 16:47

AThousandShips approved these changes Apr 10, 2024

View reviewed changes

RandomShaper mentioned this pull request Apr 11, 2024

Rationalize single-safe server threading #90520

Draft

RandomShaper force-pushed the wtp_servers branch from 4d89aa6 to 65686de Compare April 12, 2024 08:55

bruvzg mentioned this pull request Apr 12, 2024

Finish splitting functionality of the RenderingDevice backends into RenderingDeviceDriver. #87340

Merged

14 tasks

akien-mga merged commit c951421 into godotengine:master Apr 15, 2024
31 of 32 checks passed

RandomShaper deleted the wtp_servers branch April 15, 2024 08:12

This was referenced Apr 15, 2024

GodotPhysics 3D crash with multi-threaded physics #90689

Closed

Crash when physics run on separate thread #70491

Open

RandomShaper mentioned this pull request Apr 19, 2024

Apply additional fixes to servers' threading #90913

Merged

akien-mga removed topic:rendering topic:physics labels May 1, 2024

Summersay415 mentioned this pull request May 2, 2024

Crash when Thread Model is set to Multi-Threaded #61650

Closed

AThousandShips mentioned this pull request May 13, 2024

Fix DisplayServer overrides #91900

Merged

clayjohn mentioned this pull request Jun 19, 2024

Fix typo in preprocessor symbol #93367

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use WorkerThreadPool for Server threads (enhanced) #90268

Use WorkerThreadPool for Server threads (enhanced) #90268

RandomShaper commented Apr 5, 2024 •

edited

Loading

bruvzg Apr 5, 2024

RandomShaper Apr 8, 2024 •

edited

Loading

RandomShaper Apr 9, 2024

bruvzg Apr 10, 2024

bruvzg Apr 10, 2024 •

edited

Loading

bruvzg Apr 10, 2024

RandomShaper Apr 10, 2024

bruvzg Apr 10, 2024

RandomShaper Apr 10, 2024

RandomShaper Apr 12, 2024

RandomShaper commented Apr 9, 2024

AThousandShips left a comment

akien-mga commented Apr 12, 2024

RandomShaper commented Apr 12, 2024 •

edited

Loading

akien-mga commented Apr 15, 2024

akien-mga commented Apr 15, 2024

Use WorkerThreadPool for Server threads (enhanced) #90268

Use WorkerThreadPool for Server threads (enhanced) #90268

Conversation

RandomShaper commented Apr 5, 2024 • edited Loading

Choose a reason for hiding this comment

RandomShaper Apr 8, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bruvzg Apr 10, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

RandomShaper commented Apr 9, 2024

AThousandShips left a comment

Choose a reason for hiding this comment

akien-mga commented Apr 12, 2024

RandomShaper commented Apr 12, 2024 • edited Loading

akien-mga commented Apr 15, 2024

akien-mga commented Apr 15, 2024

RandomShaper commented Apr 5, 2024 •

edited

Loading

RandomShaper Apr 8, 2024 •

edited

Loading

bruvzg Apr 10, 2024 •

edited

Loading

RandomShaper commented Apr 12, 2024 •

edited

Loading