Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Errors when firing a signal just after awaiting on that signal #89439

Closed
BimDav opened this issue Mar 13, 2024 · 15 comments · Fixed by #89451
Closed

Errors when firing a signal just after awaiting on that signal #89439

BimDav opened this issue Mar 13, 2024 · 15 comments · Fixed by #89451

Comments

@BimDav
Copy link
Contributor

BimDav commented Mar 13, 2024

Tested versions

4.2.stable

System information

MacOS, M2

Issue description

Errors are logged when firing a signal just after awaiting on that signal. It seems that the await object is not properly cleaned up and is still partly waiting on the signal.

extends Node2D

signal my_signal()

func _ready() -> void:
	await my_signal
	my_signal.emit()

func _process(_delta: float) -> void:
	my_signal.emit()
	set_process(false)

gives

E 0:00:00:0589   Node2D.gd:8 @ _ready(): Resumed function '_ready()' after await, but script is gone. At script: res://Node2D.gd:7
  <Erreur C++>   Method/function failed. Returning: Variant()
  <Source C++>   modules/gdscript/gdscript_function.cpp:197 @ resume()
  <Pile des appels>Node2D.gd:8 @ _ready()
                 Node2D.gd:13 @ _process()

E 0:00:00:0589   Node2D.gd:13 @ _process(): Attempt to disconnect a nonexistent connection from 'Node2D:<Node2D#26793215177>'. Signal: 'my_signal', callable: 'GDScriptFunctionState::_signal_callback'.
  <Erreur C++>   Condition "!s->slot_map.has(*p_callable.get_base_comparator())" is true. Returning: false
  <Source C++>   core/object/object.cpp:1420 @ _disconnect()
  <Pile des appels>Node2D.gd:13 @ _process()

Steps to reproduce

Run the project below or the code sample above.

Minimal reproduction project (MRP)

await_signal_twice_fired.zip

@ivoiv
Copy link

ivoiv commented Mar 13, 2024

Aren't you doing something "illegal"?

You're blocking your _ready from running before _process has emitted the signal.
set_process docs say and probably for good reason

Any calls to this before _ready will be ignored.

Whatever called the _ready is probably stuck waiting on the await as well so maybe it can't clean up or assign things properly?

@BimDav
Copy link
Contributor Author

BimDav commented Mar 13, 2024

It's just a minimal code example to reproduce the bug. The bug is not linked to this happening in _ready or not. The bug is also not linked to the set_process(false), which I added to remove confusion due to the signal being emitted each frame. This way, it is only emitted once.

@kleonc
Copy link
Member

kleonc commented Mar 13, 2024

The direct cause is that ONE_SHOT connections to a signal are disconnected after the connected method is called, not before. Meaning e.g. this results in an infinite loop:

extends Node2D

signal my_signal

func _ready():
	my_signal.connect(foo, CONNECT_ONE_SHOT)
	my_signal.emit()

func foo():
	my_signal.emit()

Related issue: #28857.

How does it affect await? When awaiting on a signal GDScriptFunctionState::_signal_callback which calls GDScriptFunctionState::resume is ONE_SHOT-connected to such awaited signal:

Error err = sig.connect(Callable(gdfs.ptr(), "_signal_callback").bind(retvalue), Object::CONNECT_ONE_SHOT);

Variant GDScriptFunctionState::_signal_callback(const Variant **p_args, int p_argcount, Callable::CallError &r_error) {
Variant arg;
r_error.error = Callable::CallError::CALL_OK;
if (p_argcount == 0) {
r_error.error = Callable::CallError::CALL_ERROR_TOO_FEW_ARGUMENTS;
r_error.expected = 1;
return Variant();
} else if (p_argcount == 1) {
//noooneee
} else if (p_argcount == 2) {
arg = *p_args[0];
} else {
Array extra_args;
for (int i = 0; i < p_argcount - 1; i++) {
extra_args.push_back(*p_args[i]);
}
arg = extra_args;
}
Ref<GDScriptFunctionState> self = *p_args[p_argcount - 1];
if (self.is_null()) {
r_error.error = Callable::CallError::CALL_ERROR_INVALID_ARGUMENT;
r_error.argument = p_argcount - 1;
r_error.expected = Variant::OBJECT;
return Variant();
}
return resume(arg);
}

Meaning if the awaited signal is emitted again during such resume() then it results in calling resume() again because the ONE_SHOT connection is not yet disconnected.


extends Node2D

signal my_signal

func _ready() -> void:
	foo()
	my_signal.emit() # A

func foo():
	await my_signal
	my_signal.emit() # B

Execution goes something like:

`_ready()`
	`foo()`
		`await my_signal`
			`ONE_SHOT` connect resuming `foo` on `my_signal`
	`my_signal.emit() # A`
		Resume `foo`.
			`my_signal.emit() # B`
				Resume `foo`.
					Error (already removed from `scripts_list` within the first resume).
				Disconnect resuming `foo` on `my_signal` signal (it's ONE_SHOT and `my_signal.emit() # B` handled all connections).
			foo "completed" signal
		Error trying to disconnect resuming `foo` on `my_signal` signal (`my_signal.emit() # A` handled all connections).

The in GDScriptFunctionState::resume error part:

Variant GDScriptFunctionState::resume(const Variant &p_arg) {
ERR_FAIL_NULL_V(function, Variant());
{
MutexLock lock(GDScriptLanguage::singleton->mutex);
if (!scripts_list.in_list()) {
#ifdef DEBUG_ENABLED
ERR_FAIL_V_MSG(Variant(), "Resumed function '" + state.function_name + "()' after await, but script is gone. At script: " + state.script_path + ":" + itos(state.line));
#else
return Variant();
#endif
}
if (state.instance && !instances_list.in_list()) {
#ifdef DEBUG_ENABLED
ERR_FAIL_V_MSG(Variant(), "Resumed function '" + state.function_name + "()' after await, but class instance is gone. At script: " + state.script_path + ":" + itos(state.line));
#else
return Variant();
#endif
}
// Do these now to avoid locking again after the call
scripts_list.remove_from_list();
instances_list.remove_from_list();
}


How disconnecting ONE_SHOT connections is done:

godot/core/object/object.cpp

Lines 1121 to 1175 in 6128206

for (const Connection &c : slot_conns) {
if (!c.callable.is_valid()) {
// Target might have been deleted during signal callback, this is expected and OK.
continue;
}
const Variant **args = p_args;
int argc = p_argcount;
if (c.flags & CONNECT_DEFERRED) {
MessageQueue::get_singleton()->push_callablep(c.callable, args, argc, true);
} else {
Callable::CallError ce;
_emitting = true;
Variant ret;
c.callable.callp(args, argc, ret, ce);
_emitting = false;
if (ce.error != Callable::CallError::CALL_OK) {
#ifdef DEBUG_ENABLED
if (c.flags & CONNECT_PERSIST && Engine::get_singleton()->is_editor_hint() && (script.is_null() || !Ref<Script>(script)->is_tool())) {
continue;
}
#endif
Object *target = c.callable.get_object();
if (ce.error == Callable::CallError::CALL_ERROR_INVALID_METHOD && target && !ClassDB::class_exists(target->get_class_name())) {
//most likely object is not initialized yet, do not throw error.
} else {
ERR_PRINT("Error calling from signal '" + String(p_name) + "' to callable: " + Variant::get_callable_error_text(c.callable, args, argc, ce) + ".");
err = ERR_METHOD_NOT_FOUND;
}
}
}
bool disconnect = c.flags & CONNECT_ONE_SHOT;
#ifdef TOOLS_ENABLED
if (disconnect && (c.flags & CONNECT_PERSIST) && Engine::get_singleton()->is_editor_hint()) {
//this signal was connected from the editor, and is being edited. just don't disconnect for now
disconnect = false;
}
#endif
if (disconnect) {
_ObjectSignalDisconnectData dd;
dd.signal = p_name;
dd.callable = c.callable;
disconnect_data.push_back(dd);
}
}
while (!disconnect_data.is_empty()) {
const _ObjectSignalDisconnectData &dd = disconnect_data.front()->get();
_disconnect(dd.signal, dd.callable);
disconnect_data.pop_front();
}

@BimDav
Copy link
Contributor Author

BimDav commented Mar 13, 2024

@kleonc Thanks for investigating! @AThousandShips it is still a bug, though, right? Why did you remove the bug tag?

@AThousandShips
Copy link
Member

AThousandShips commented Mar 13, 2024

The infinite recursion should maybe be considered one, but I'd say emitting a signal from the handler for that same signal should be considered recursion and not allowed, I'd say that's expected behavior

I'd say the behavior should either be documented, or a blocking check should be added to emitting signals to ensure you don't loop, but allowing this doesn't make sense to me, regardless of one-shot

I'd say this is the same limitation as:

func foo() -> void:
    foo.call_deferred()

func _ready() -> void:
    foo.call_deferred()

I'm not even sure how this could be "fixed" in any other way than adding a check that errors if you try to emit a signal from within its own handler

@BimDav
Copy link
Contributor Author

BimDav commented Mar 13, 2024

That's what I get for giving a minimal reproducing example :).

Consider this: I want to wait for the state to get back to Idle before updating it to State1. There is a set_state() setter and a state_changed signal:

func set_state(p_state: State) -> void:
   state = p_state
   state_changed.emit()

now in my updating function:

func wait_and_update_to_state1() -> void:
   while state != Idle:
      await state_changed
   state = State1

This seems reasonable, right? It gives the above errors.

@AThousandShips
Copy link
Member

AThousandShips commented Mar 13, 2024

That's what I get for giving a minimal reproducing example

What do you mean?

As explained above awaiting uses connections, so that's an unfortunate consequence of the same limitation, but I don't think it's solvable to handle this as such, try using deferred calls to update the data perhaps, otherwise I'd say this is a limitation just like call deferred

But I'd really say that awaiting something happening and then triggering the same is the same issue and would be error prone

@BimDav
Copy link
Contributor Author

BimDav commented Mar 13, 2024

The problem of giving minimal reproducing example is that it can often be answered by "why would even do that anyway?". I should have given more realistic context from the start.

In my opinion, it's not because something is complicated to fix that it's not a bug: I think the above example is a perfectly reasonable usecase for await, and it not working without weird errors being logged constitutes a bug.

@AThousandShips
Copy link
Member

AThousandShips commented Mar 13, 2024

The problem of giving minimal reproducing example is that it can often be answered by "why would even do that anyway?"

That's not the reason for requesting an MRP, as you can see from the issue template :)

In my opinion, it's not because something is complicated to fix that it's not a bug

Agreed, I never said it wasn't a bug because of that, please read what I say instead of putting words in my moth, act professionally and please read the code of conduct

I'm going to stop engaging with you and I'll wish you a good day

To be clear, using await means:

func my_func():
    foo_1()
    foo_2()
    await bar
    baz_1()
    baz_2()

Is equivalent to:

func my_func():
    foo_1()
    foo_2()
    bar.connect(my_func_2, CONNECT_ONE_SHOT)

func my_func_2():
    baz_1()
    baz_2()

@kleonc
Copy link
Member

kleonc commented Mar 13, 2024

@AThousandShips
To clarify: I've only pointed out the direct cause of the reported issue for the current implementation.

Whether how ONE_SHOT connections are being disconnected is intentional or it is a bug itself I don't know, hence why I've asked about this in the contributors chat (I could have mentioned my doubts in my previous comment).

But regardless of the current ONE_SHOT disconnecting behavior, what's reported here i.e. await producing errors was already triaged as a bug by a GDScript team member (dalexeev) so please respect this more / be more thoughful about changing the labels in the future (I'll leave fixing them to you).

The behavior of await seems already correct / as expected, what's reported here is some superfluous/unneeded errors.


So:

  • If the current ONE_SHOT disconnecting behavior is a bug then it needs to be fixed. The issue with await erroring would then be automatically fixed as a result.
  • If the current ONE_SHOT disconnecting behavior is not a bug then await's implementation needs to be tweaked to fix the reported issue accordingly.

@BimDav Your original MRP is fine/sufficient on its own, no need for a "real world" example to classify some easily reproducible behavior as a bug / not a bug.

@AThousandShips
Copy link
Member

AThousandShips commented Mar 13, 2024

And I gave my arguments in chat too, and explained here, but I'll restore the label, but I thought this through and presented my arguments, so please take that into account (and don't misrepresent my actions) :) also note that the original labelling of this was made without the knowledge of these details, so it's not irrelevant

I'd say the solution should be using something else for await, and enforcing protection against recursion in signals

@dalexeev
Copy link
Member

dalexeev commented Mar 13, 2024

It's just a minimal code example to reproduce the bug.

Just for information, you can call an async function without waiting for it. So the following example is simpler since it doesn't involve node processing:

extends Node

signal my_signal()

func _ready():
    async_func()
    my_signal.emit()

func async_func():
    await my_signal
    my_signal.emit()

@kleonc

This comment was marked as off-topic.

@kleonc
Copy link
Member

kleonc commented Mar 14, 2024

enforcing protection against recursion in signals

I'm absolutely against this, there's nothing wrong with the recursive signals on its own. Sure an infinite recursion is unwanted/problematic but if it happens then that would be a user error (assuming no in-engine bugs). Just because recursion can be misued doesn't mean it should be disallowed.

BTW it's already allowed hence changing it I'd consider as a breaking change (as I don't think it's a bug / needs a bugfix).

And that's also kinda a separate discussion, even possibly a material for a proposal.

@BimDav
Copy link
Contributor Author

BimDav commented Mar 14, 2024

I have a second, similar but different enough case. Here is a sample project:
resume_after_await.zip

extends Node2D

signal signal1
signal signal2


func _ready() -> void:
	signal1.connect(_on_signal1_emitted)
	var coroutine: = Coroutine.new()  # a RefCounted
	coroutine.add_future(signal1)   # coroutine awaits on signal1
	coroutine.add_future(signal2)   # coroutine awaits on signal2
	await coroutine.join_either()   # signal1 will resume execution here and coroutine will be deleted
	print("after")


func _process(delta: float) -> void:
	signal1.emit()


func _on_signal1_emitted() -> void:
	signal2.emit()  # this gives an error as coroutine is deleted

The issue here seems to be that the receivers list stills contains coroutine's await even though it was just deleted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants