Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use WorkerThreadPool for Server threads (enhanced) #90268

Merged
merged 4 commits into from
Apr 15, 2024

Conversation

RandomShaper
Copy link
Member

@RandomShaper RandomShaper commented Apr 5, 2024

This PR is the version of #77004 where my collaboration on top of @reduz's work has been added.

This is kept from the description of the original PR:

  • Servers now use WorkerThreadPool for background computation.
  • This helps keep the number of threads used fixed at all times.
  • It also ensures everything works on HTML5 with threads.
  • And makes it easier to support disabling threads for also HTML5.

This is new:

CommandQueueMT now syncs with the servers via the WorkerThreadPool yielding mechanism, which makes its classic main sync semaphore superfluous.
Also, some warnings about calls that kill performance when using threaded rendering are removed because there's a mechanism that warns about that in a more general fashion.

This is no longer true:

The only downside of this is that GLES3 threads likely no longer work because every time the render thread is called, the thread context is different. This needs to be researched how to fix. Maybe if GLES3 is used, threaded rendering server should be disabled for the time being?

Supersedes #77004.

2024-04-09: A way of easily testing this being sure all the physics and rendering are threaded is to apply this patch:
all_servers_mt.txt

core/config/engine.cpp Show resolved Hide resolved
doc/classes/WorkerThreadPool.xml Outdated Show resolved Hide resolved
tests/core/threads/test_worker_thread_pool.h Outdated Show resolved Hide resolved
tests/core/threads/test_worker_thread_pool.h Show resolved Hide resolved
tests/core/threads/test_worker_thread_pool.h Show resolved Hide resolved
core/object/worker_thread_pool.cpp Show resolved Hide resolved
core/object/worker_thread_pool.cpp Show resolved Hide resolved
core/object/worker_thread_pool.cpp Show resolved Hide resolved
core/object/worker_thread_pool.cpp Show resolved Hide resolved
core/object/worker_thread_pool.cpp Show resolved Hide resolved
Comment on lines 3059 to 3077
#if defined(GLES3_ENABLED)
if (gl_manager_angle) {
gl_manager_angle->release_current();
}
if (gl_manager_legacy) {
gl_manager_legacy->release_current();
}
#endif
}

void DisplayServerMacOS::make_rendering_thread() {
#if defined(GLES3_ENABLED)
if (gl_manager_angle) {
gl_manager_angle->make_current();
}
if (gl_manager_legacy) {
gl_manager_legacy->make_current();
}
#endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be called from the thread when it is started and set the current GL context (to the context of the active window), but GL context is already set from _draw_viewport() at the start of each window drawing, so it's likely redundant (probably part of the old 3.x single windows code).

Copy link
Member Author

@RandomShaper RandomShaper Apr 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems to be the case indeed. However, I'm not clear on what the right API would be because we now have the concepts of windows and threads somewhat conflated. I'm wondering if the right call would be the following:

  • Remove DisplayServer::make/release_rendering_thread() .
  • Remove make/release_current() from the various GL context classes.
  • Modify window_make_current() so that taking INVALID_WINDOW_ID means releasing the context.

(When I say remove, it may be needed to be understood as deprecate until we can remove.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the next push, I'm replacing the commit where I added those by one that simplifies the APIs (without breaking compat, I think). DisplayServer as well as the various GL managers can therefore now have a window-specific make current, a global release, and call it a day.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the reference:

Multithreaded compatibility renderer was crashing (in the GLES3::Utilities::get_video_adapter_api_version()) before this PR. With it, it's no longer crashing, but still displaying only black screen. Multithreaded Vulkan renderer also not rendering anything (both with and without this PR), might be macOS specific issue, I'll check it later.

Copy link
Member

@bruvzg bruvzg Apr 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In both cases, it deadlocks inside blit_render_targets_to_screen.

Main thread:

   🔻 Waiting for CommandQueueMT mutex.
std::__1::recursive_mutex::lock() (@std::__1::recursive_mutex::lock():7)
CommandQueueMT::Command1<RendererCanvasCull, void (RendererCanvasCull::*)(RID), RID>* CommandQueueMT::allocate_and_lock<CommandQueueMT::Command1<RendererCanvasCull, void (RendererCanvasCull::*)(RID), RID>>() (core/templates/command_queue_mt.h:362)
void CommandQueueMT::push<RendererCanvasCull, void (RendererCanvasCull::*)(RID), RID>(RendererCanvasCull*, void (RendererCanvasCull::*)(RID), RID) (core/templates/command_queue_mt.h:407)
RenderingServerDefault::canvas_item_clear(RID) (servers/rendering/rendering_server_default.h:916)
CanvasItem::_redraw_callback() (scene/main/canvas_item.cpp:137)
CallQueue::_call_function(Callable const&, Variant const*, int, bool) (core/object/message_queue.cpp:221)
CallQueue::flush() (core/object/message_queue.cpp:326)
SceneTree::process(double) (scene/main/scene_tree.cpp:537)
    🔻 Inside main evnet loop.
Main::iteration() (main/main.cpp:4013)
OS_MacOS::run() (platform/macos/os_macos.mm:778)
main (platform/macos/godot_main_macos.mm:84)
start (@start:593)
Rendering thread:

    🔻 Since nothing seems directy holding it in the Engine code, I assue it's waiting for the next main event loop iteration, but main thread is locked.
_dlock_wait (@_dlock_wait:17)
_dispatch_thread_event_wait_slow (@_dispatch_thread_event_wait_slow:17)
-[CAMetalLayer(MoltenVK) screenMVK] (@-[CAMetalLayer(MoltenVK) screenMVK]:31)
MVKSwapchain::initSurfaceImages(VkSwapchainCreateInfoKHR const*, unsigned int) (@MVKSwapchain::initSurfaceImages(VkSwapchainCreateInfoKHR const*, unsigned int):153)
MVKSwapchain::MVKSwapchain(MVKDevice*, VkSwapchainCreateInfoKHR const*) (@MVKSwapchain::MVKSwapchain(MVKDevice*, VkSwapchainCreateInfoKHR const*):183)
MVKDevice::createSwapchain(VkSwapchainCreateInfoKHR const*, VkAllocationCallbacks const*) (@MVKDevice::createSwapchain(VkSwapchainCreateInfoKHR const*, VkAllocationCallbacks const*):15)
vkCreateSwapchainKHR (@vkCreateSwapchainKHR:31)
RenderingDeviceDriverVulkan::swap_chain_resize(RenderingDeviceDriver::CommandQueueID, RenderingDeviceDriver::SwapChainID, unsigned int) (drivers/vulkan/rendering_device_driver_vulkan.cpp:2663)
RenderingDevice::screen_prepare_for_drawing(int) (servers/rendering/rendering_device.cpp:3190)
RendererCompositorRD::blit_render_targets_to_screen(int, BlitToScreen const*, int) (servers/rendering/renderer_rd/renderer_compositor_rd.cpp:37)
RendererViewport::draw_viewports(bool) (servers/rendering/renderer_viewport.cpp:861)
RenderingServerDefault::_draw(bool, double) (servers/rendering/rendering_server_default.cpp:89)
    🔻 This lock CommandQueueMT mutex.
CommandQueueMT::_flush() (core/templates/command_queue_mt.h:377)
CommandQueueMT::flush_all() (core/templates/command_queue_mt.h:423)
RenderingServerDefault::_thread_loop() (servers/rendering/rendering_server_default.cpp:347)
...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bisected it to #87340 - Finish splitting functionality of the RenderingDevice backends into RenderingDeviceDriver (it's big, so probably won't be easy to detect what exactly is causing the issue).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bruvzg I realized a missing unlock in my changes to MacOS. Would you please re-test? If it deadlocks, please test with #90470 as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neither is working.

Since it was originally broken by #87340, which is not changing CommandQueueMT in any way, it's probably not directly related, and the issue is in RenderingDevice.

In some cases (resizing window for example), it still lockups with the aforementioned patch (and lockups only involve RenderingDevice not CommandQueueMT).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright. Thanks for your help! I'll delve.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The more I look into this, the more issues with multi-threading rendering (physics seem OK) I'm seeing. My suggestion (CC @akien-mga) is that we merge this as it is, given it seems it doesn't break single-threading, and I'll make a subsequent PR fixing MT rendering.

@RandomShaper
Copy link
Member Author

I'm about to push with a change: yielding functions in WorkerThreadPool are not exposed anymore. I realized that, unless there's some sort of prevention mechanism, users can easily shoot themselves in the foot because a dameon-like task could starve legit tasks. The usage in this PR is safe because those dameons are launched when there are free slots for sure so each of them is the bottommost in the call stack. I've started implementing a mechanism to prevent that, but it started to get messy so I'll put it in a draft PR together with re-exposing the yielding functions.

RandomShaper and others added 4 commits April 10, 2024 18:47
- Adapt GL make/release API to the current architecture.
- Fix DisplayServer being locked while dispatching input (prevent deadlocks).
* Servers now use WorkerThreadPool for background computation.
* This helps keep the number of threads used fixed at all times.
* It also ensures everything works on HTML5 with threads.
* And makes it easier to support disabling threads for also HTML5.

CommandQueueMT now syncs with the servers via the WorkerThreadPool
yielding mechanism, which makes its classic main sync semaphore
superfluous.

Also, some warnings about calls that kill performance when using
threaded rendering are removed because there's a mechanism that
warns about that in a more general fashion.

Co-authored-by: Pedro J. Estébanez <pedrojrulez@gmail.com>
Copy link
Member

@AThousandShips AThousandShips left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't vouch for the thread pool code but the rest looks clean to me

@akien-mga
Copy link
Member

Tested against the GDQuest TPS demo with multithreaded rendering and physics. Rendering seems to work ok, but physics gave me a crash. It worked fine initially but after breaking a few crates things got laggy and eventually segfaulted.

ERROR: FATAL: Index p_index = 0 is out of bounds (shapes.size() = 0).
   at: get_shape_transform (./servers/physics_3d/godot_collision_object_3d.h:128)
ERROR: Caller thread can't call this function in this node (/root). Use call_deferred() or call_thread_group() instead.
   at: propagate_notification (./scene/main/node.cpp:2420)

================================================================
handle_crash: Program crashed with signal 4
Engine version: Godot Engine v4.3.dev.custom_build (67b5e1779070c61ac0595301ec1c5080e1765b05)
Dumping the backtrace. Please include this when reporting the bug to the project developer.
[1] /lib64/libc.so.6(+0x3e9a0) [0x7fa3307889a0] (??:0)
[2] GodotCollisionObject3D::get_shape_transform(int) const (/home/akien/Godot/godot/./servers/physics_3d/godot_collision_object_3d.h:128 (discriminator 3))
[3] GodotArea2Pair3D::setup(float) (/home/akien/Godot/godot/./servers/physics_3d/godot_area_pair_3d.cpp:126 (discriminator 3))
[4] GodotStep3D::_setup_constraint(unsigned int, void*) (/home/akien/Godot/godot/./servers/physics_3d/godot_step_3d.cpp:118)
[5] WorkerThreadPool::GroupUserData<GodotStep3D, void (GodotStep3D::*)(unsigned int, void*), decltype(nullptr)>::callback_indexed(unsigned int) (/home/akien/Godot/godot/./core/object/worker_thread_pool.h:185)
[6] WorkerThreadPool::_process_task(WorkerThreadPool::Task*) (/home/akien/Godot/godot/./core/object/worker_thread_pool.cpp:92)
[7] WorkerThreadPool::_thread_function(void*) (/home/akien/Godot/godot/./core/object/worker_thread_pool.cpp:195)
[8] Thread::callback(unsigned long, Thread::Settings const&, void (*)(void*), void*) (/home/akien/Godot/godot/./core/os/thread.cpp:66)
[9] void std::__invoke_impl<void, void (*)(unsigned long, Thread::Settings const&, void (*)(void*), void*), unsigned long, Thread::Settings, void (*)(void*), void*>(std::__invoke_other, void (*&&)(unsigned long, Thread::Settings const&, void (*)(void*), void*), unsigned long&&, Thread::Settings&&, void (*&&)(void*), void*&&) (/usr/include/c++/13/bits/invoke.h:61)
[10] std::__invoke_result<void (*)(unsigned long, Thread::Settings const&, void (*)(void*), void*), unsigned long, Thread::Settings, void (*)(void*), void*>::type std::__invoke<void (*)(unsigned long, Thread::Settings const&, void (*)(void*), void*), unsigned long, Thread::Settings, void (*)(void*), void*>(void (*&&)(unsigned long, Thread::Settings const&, void (*)(void*), void*), unsigned long&&, Thread::Settings&&, void (*&&)(void*), void*&&) (/usr/include/c++/13/bits/invoke.h:97)
[11] void std::thread::_Invoker<std::tuple<void (*)(unsigned long, Thread::Settings const&, void (*)(void*), void*), unsigned long, Thread::Settings, void (*)(void*), void*> >::_M_invoke<0ul, 1ul, 2ul, 3ul, 4ul>(std::_Index_tuple<0ul, 1ul, 2ul, 3ul, 4ul>) (/usr/include/c++/13/bits/std_thread.h:292)
[12] std::thread::_Invoker<std::tuple<void (*)(unsigned long, Thread::Settings const&, void (*)(void*), void*), unsigned long, Thread::Settings, void (*)(void*), void*> >::operator()() (/usr/include/c++/13/bits/std_thread.h:299)
[13] std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)(unsigned long, Thread::Settings const&, void (*)(void*), void*), unsigned long, Thread::Settings, void (*)(void*), void*> > >::_M_run() (/usr/include/c++/13/bits/std_thread.h:244)
[14] /home/akien/Godot/godot/bin/godot.linuxbsd.editor.dev.x86_64() [0x7f882f3] (thread.o:?)
[15] /lib64/libc.so.6(+0x8e897) [0x7fa3307d8897] (??:0)
[16] /lib64/libc.so.6(+0x115a5c) [0x7fa33085fa5c] (??:0)
-- END OF BACKTRACE --
================================================================

@RandomShaper
Copy link
Member Author

RandomShaper commented Apr 12, 2024

Trying to reproduce, to no avail, so far. Maybe it's one of those obscure and unpredictable platform-specific things. I'm wondering, could you try reproducing on master (I mean, with threaded physics)?

@akien-mga
Copy link
Member

Trying to reproduce, to no avail, so far. Maybe it's one of those obscure and unpredictable platform-specific things. I'm wondering, could you try reproducing on master (I mean, with threaded physics)?

It indeed seems to be reproducible on master with multi-threaded physics, with the same backtrace. So it's not caused by this PR. I'll open a separate bug report.


So I guess we can merge this PR now and get wider testing.

@akien-mga akien-mga merged commit c951421 into godotengine:master Apr 15, 2024
31 of 32 checks passed
@akien-mga
Copy link
Member

Thanks! Great work 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants