Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve thread safety of ScriptServer #82578

Closed

Conversation

piiertho
Copy link
Contributor

@piiertho piiertho commented Sep 30, 2023

When working on godot-jvm multi-threading, we implemented ScriptLanguage::thread_enter and ScriptLanguage::thread_exit to attach and detach JVM according to thread lifecycle.
Those implementations were not called. So after looking to ScriptServer code I found those methods:

void ScriptServer::finish_languages() {
	for (int i = 0; i < _language_count; i++) {
		_languages[i]->finish();
	}
	global_classes_clear();
	languages_finished.set();
}

void ScriptServer::thread_enter() {
	if (!languages_finished.is_set()) {
		return;
	}
	for (int i = 0; i < _language_count; i++) {
		_languages[i]->thread_enter();
	}
}

void ScriptServer::thread_exit() {
	if (!languages_finished.is_set()) {
		return;
	}
	for (int i = 0; i < _language_count; i++) {
		_languages[i]->thread_exit();
	}
}

The problem here is that languages_finished is set only when killing languages (C# is using ScriptLanguage::finish to kill language too).

So the problem here is the condition inversion on

	if (!languages_finished.is_set()) {
		return;
	}

This PR fixes this by inverting the condition.
It also adds a pre-processor on this if, as when engine runs language servers are up most of the time.

@piiertho piiertho requested a review from a team as a code owner September 30, 2023 08:39
@piiertho piiertho force-pushed the bugfix/call-languages-thread-enter branch from 882bc36 to 4f695d6 Compare September 30, 2023 08:53
@piiertho piiertho changed the title fix(ScriptServer): Call ScriptLanguage::thread_enter when languages are not finished Call ScriptLanguage::thread_enter and ScriptLanguage::thread_exit when languages are not finished Sep 30, 2023
@Chaosus Chaosus added this to the 4.2 milestone Sep 30, 2023
@piiertho piiertho force-pushed the bugfix/call-languages-thread-enter branch from 4f695d6 to 5f4344b Compare October 1, 2023 08:44
@piiertho
Copy link
Contributor Author

piiertho commented Oct 1, 2023

I modified this PR to match the expected behaviour of #76586
When discussing on contributors chat, @AThousandShips mentionned that AFAIK languages can be added and removed at runtime, not during init or closing, and so that ScriptServer should be thread safe.
Is adding and removing languages at runtime something we want ?
If so we would need to add lock on all methods using _languages property.

@piiertho piiertho force-pushed the bugfix/call-languages-thread-enter branch from 5f4344b to d90e320 Compare October 1, 2023 08:51
@RandomShaper
Copy link
Member

Is adding and removing languages at runtime something we want ?

Yes, because a GDExtension providing a language can be loaded at runtime.

Aside, please have a look at #74501, plus all the related PRs linked there, for some insightful context on the situation of all this.

@piiertho
Copy link
Contributor Author

piiertho commented Oct 5, 2023

So I guess main question now is: Should it be done in #49362 ?

@RandomShaper
Copy link
Member

I think I need to go back to the basics to be sure I can assess this well. There are some issues that will surely require a more comprehensive fix (mutex-protecting the languages array indeed, calling init() on dynamically loaded languages, etc.). However, just to fix the very specific issue here, I'd say that the condition inversion would do the trick. So, what I quite don't get is the reasoning behing the current changes.

@AThousandShips
Copy link
Member

I'd say that the condition inversion would do the trick

If you mean the condition in the current codebase that won't work, it causes a race condition to reappear

@RandomShaper
Copy link
Member

By "the very specific issue," I mean the fact that the thread enter/exit callbacks are not being called at all. But I see that may not be enough a fix. Maybe it's not a bad idea to go full out to do fix the threading issues and protect the languages array, count and the flag (that can become a plain boolean again) with a mutex in this very PR or a subsequent one. It's up to @piiertho.

If not, at least I'd like some help to understand why the current changes (beyond fixing the condition) prevent the race.

@AThousandShips
Copy link
Member

The current changes prevents the entered/exited methods being called while initializing the language, which appears to have been the intent of the PR introducing this condition

Which I think is an incomplete solution

@RandomShaper
Copy link
Member

Oh, I see. Then... Well, I'd suggest embracing the mutexed approach and have it fixed for good.

@AThousandShips
Copy link
Member

I tested out a build with some mutexes a week or so ago but don't really have the time to fix with it now so I'll leave it for the OP to work on it if they want

@piiertho
Copy link
Contributor Author

piiertho commented Oct 10, 2023

Hello, sorry for delay.
@RandomShaper I'd also like to know if this mutex is useful in other cases than hot reloading extensions ?
I'm asking this because if there is no other case, this mutex could be avoided if not TOOL.

@piiertho
Copy link
Contributor Author

Just added some mutex.
I kept new flag to know if languages are initialized since we need main loop rework to not have thread_enter called before init.

core/object/script_language.cpp Outdated Show resolved Hide resolved
core/object/script_language.cpp Outdated Show resolved Hide resolved
core/object/script_language.cpp Outdated Show resolved Hide resolved
@piiertho piiertho force-pushed the bugfix/call-languages-thread-enter branch from df998b6 to f12358b Compare October 12, 2023 11:18
@piiertho
Copy link
Contributor Author

Forgot clang format, let me fix that

@piiertho piiertho force-pushed the bugfix/call-languages-thread-enter branch from f12358b to 73c291c Compare October 12, 2023 11:26
@AThousandShips
Copy link
Member

Please use clang-format to fix your style

core/object/script_language.h Outdated Show resolved Hide resolved
core/object/script_language.cpp Outdated Show resolved Hide resolved
@piiertho piiertho force-pushed the bugfix/call-languages-thread-enter branch from 73c291c to faaf49e Compare October 12, 2023 11:37
@piiertho
Copy link
Contributor Author

Please use clang-format to fix your style

funny thing is that clang format does not find any problem, I had to fix them manually (replacing spaces by tabs).

@AThousandShips
Copy link
Member

Then that's because your clang-format isn't correctly set up, as you still have errors in your format

@piiertho piiertho force-pushed the bugfix/call-languages-thread-enter branch from faaf49e to 88f414d Compare October 12, 2023 11:41
@piiertho
Copy link
Contributor Author

piiertho commented Oct 12, 2023

Then that's because your clang-format isn't correctly set up, as you still have errors in your format

I'm not suppos to just run sh .\misc\scripts\clang_format.sh to find troubles ?

core/object/script_language.cpp Outdated Show resolved Hide resolved
core/object/script_language.cpp Outdated Show resolved Hide resolved
core/object/script_language.cpp Outdated Show resolved Hide resolved
core/object/script_language.cpp Outdated Show resolved Hide resolved
@AThousandShips
Copy link
Member

No, that's for automation, see here

@AThousandShips AThousandShips changed the title Call ScriptLanguage::thread_enter and ScriptLanguage::thread_exit when languages are not finished Improve thread safety of ScriptServer Oct 12, 2023
@AThousandShips
Copy link
Member

What is the reason the languages_initialized is needed? It was added to prevent thread unsafe behaviour when initializing languages at the same time as adding threads, but if the iteration and modification of languages is made thread safe I don't see that it is needed any longer

Further as confirmed by RandomShaper languages can still be added after initialization, so the original approach to fix this wasn't necessarily sound

We can stay safe and keep it, but I'd suggest removing it entirely and rely on the mutex alone

@piiertho
Copy link
Contributor Author

We can stay safe and keep it, but I'd suggest removin

It is because thread_enter is still called before languages are initialized (I think because WorkerThreadPools are started before ScriptServer) as Mutex does not ensure call order, only that we cannot call the same time.
I have the problem with godot kotlin: https://github.com/utopia-rise/godot-kotlin-jvm/blob/bd2c3fd6cb50892f5c6cfb50210dae32c7d12c71/src/kotlin_language.cpp#L289
If I remove the flag it crash because JVM is not initialized.

@piiertho
Copy link
Contributor Author

Any idea on CI failure ?

@AThousandShips
Copy link
Member

It's unrelated, dependency error, being investigated

@AThousandShips
Copy link
Member

Try rebasing your branch, it's quite behind, and some attempts to fix it has been done

@piiertho piiertho force-pushed the bugfix/call-languages-thread-enter branch from 88f414d to 37651d0 Compare October 12, 2023 17:10
@piiertho
Copy link
Contributor Author

Try rebasing your branch, it's quite behind, and some attempts to fix it has been done

Seems it does not solve the issue.

@AThousandShips
Copy link
Member

Let's see when we re-run it after all the other CI has passed

@piiertho piiertho force-pushed the bugfix/call-languages-thread-enter branch from 37651d0 to d636701 Compare October 13, 2023 06:22
@RandomShaper
Copy link
Member

This is getting better! I think we can replace the languages_initialized hack by another, more robust one, until the initialization order is maybe fixed for good (mabye in #49632). I have made a quick patch that may help.
p.txt

@piiertho
Copy link
Contributor Author

This is getting better! I think we can replace the languages_initialized hack by another, more robust one, until the initialization order is maybe fixed for good (mabye in #49632). I have made a quick patch that may help.

p.txt

Not sure but I think this patch is not linked to this PR. Or I really don't get why modifying map iterator to add map as a friend class would help.

@RandomShaper
Copy link
Member

I attached the wrong patch. My apologies. This is the right one:
d.txt

@piiertho
Copy link
Contributor Author

I attached the wrong patch. My apologies. This is the right one: d.txt

Sorry for late reply, I've been busy past weeks.
That's not that easy:

  • ScriptServer is not a singleton. This class is here only to namespace static methods.
  • Even is script server was a singleton it would not be that easy, we would still need to check ScriptServer::init_languages has been called. Also, I feel transforming ScriptServer to a singleton would be too much design change for a "hack fix".

I agree this would be more robust to do the hack in WorkerThreadPool (and it could be done really clean with a thread_local variable). But as it does not seem as easy at it appears, should we spend more time on a hack that would be removed when initialisation order is fixed ?

@RandomShaper
Copy link
Member

Taking the patch as a proof of concept, do you think you could apply it to your PR? The check for the theoretical singleton would be replaced by some equivalent check in the real codebase. If you can do that, I'd just have another request, and this PR would be perfect! Instead of having two booleans, use an enum:

enum LanguagesState {
    LANGUAGES_STATE_UNINITIALIZED,
    LANGUAGES_STATE_INITIALIZED,
    LANGUAGES_STATE_TERMINATED,
};

There would be a getter for the status instead of one for any of the booleans.

@piiertho
Copy link
Contributor Author

piiertho commented Nov 4, 2023

Taking the patch as a proof of concept, do you think you could apply it to your PR? The check for the theoretical singleton would be replaced by some equivalent check in the real codebase. If you can do that, I'd just have another request, and this PR would be perfect! Instead of having two booleans, use an enum:

enum LanguagesState {
    LANGUAGES_STATE_UNINITIALIZED,
    LANGUAGES_STATE_INITIALIZED,
    LANGUAGES_STATE_TERMINATED,
};

There would be a getter for the status instead of one for any of the booleans.

Just tried to apply your diff using enums instead of two booleans.
If I remove the checks in ScriptServer::thread_enter and ScriptServer::thread_exit engine crashes.
So it seems fix in WorkerThreadPool::_process_task does not have the expected effect.

@RandomShaper
Copy link
Member

Can you please share the patch with enums, etc. so I can debug locally and try to polish it?

@piiertho
Copy link
Contributor Author

piiertho commented Nov 4, 2023

Can you please share the patch with enums, etc. so I can debug locally and try to polish it?

Here it is patch.txt

@RandomShaper
Copy link
Member

Please see (and test) #84657.

@akien-mga
Copy link
Member

Superseded by #84657. Thanks for the contribution!

@akien-mga akien-mga closed this Nov 9, 2023
@YuriSizov YuriSizov removed this from the 4.2 milestone Nov 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants