Fix Besu StacklessClosedChannelException errors and resulted Timeout errors in CL clients #4410
Conversation
…e API requests are executed sequentially. Signed-off-by: Ameziane H <ameziane.hamlat@consensys.net>
I installed a Besu patched with this PR and it has had little effect on my missed attestations. Correctly voted head percentage is still floating between 58% and 81%; before the Merge I was at 98%. But maybe this helps fix other problems. Good luck!
@kayagoban thanks for your feedback. What is your hardware setup and your CL client? Do you see timeout errors on the CL side, and could you share Besu logs so we can check block processing time?
LGTM, as long as the message ordering property is maintained for forkChoiceUpdated as per https://github.com/ethereum/execution-apis/blob/main/src/engine/specification.md#message-ordering
Confirmed that this fixes the eth_ concurrency issue and keeps the engine_ api calls ordered and synchronous 👍
As suggested by @jflo, we could/should do additional WebSocket Engine API testing with Nimbus as well, to ensure that concurrent WebSocket requests are handled correctly when Nimbus uses WebSockets for the Engine API.
I will update with nimbus websocket testing results.
I’m running Besu/Prysm. No timeout errors on the CL side. Maybe I can open a new issue with what logs I do have and my symptoms.
@kayagoban Yes, please; we think we have different use cases around missed attestations. Please provide your hardware setup and your Besu and Prysm logs.
@garyschulte, running this patch on mainnet with Nimbus WebSockets; just missed an attestation after about two hours of running the patched version.
Thanks for testing. Other than continuing to miss attestations, did Nimbus/Besu work normally otherwise? What percentage of attestations were missed before and after?
Yes, other than the missed attestation it’s working normally. I can’t tell if there is a difference yet, as it was also missing roughly 1–2 attestations/hr before. It will be interesting to see if effectiveness improves; before the patch it would hover between 90–95% due to the occasional suboptimal inclusion distance. Hoping to see improvement after 12 hrs.
I'm seeing missed attestations with a Prysm/Besu configuration. Possibly relevant from Prysm is the repeating message: `time="2022-09-19 07:44:23" level=info msg="Subscribed to topic" prefix=sync topic="/eth2/4a26c58b/beacon_attestation_2/ssz_snappy"`
Sadly, I can report that this PR did not fix missed attestations with Nimbus; roughly the same number of attestations are missed, and effectiveness won’t go beyond 94–95% (it was 97–98% pre-Merge).
Missed attestations despite this fix are being tracked in #4400.
…e API requests are executed sequentially. (hyperledger#4410) Signed-off-by: Ameziane H <ameziane.hamlat@consensys.net>
Signed-off-by: Ameziane H ameziane.hamlat@consensys.net
PR description
StacklessClosedChannelException errors are thrown when Eth calls such as eth_syncing or eth_getBlockByHash have to wait for an engine_newPayloadV1 call to finish. Since the latter sometimes takes more than 1 second to execute, depending on block size, transaction types, and hardware setup, the CL (Lighthouse, in this case) closes the connection because its timeout on these calls is set to 1 second.
In the screenshot below, we can see that 1 eth_getBlockByHash call and 2 eth_syncing calls are waiting for engine_newPayloadV1 call to finish.
The Eth calls shouldn't be executed sequentially with the Engine API calls; this fix allows several requests on port 8551 to execute concurrently.
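The intended behavior can be pictured as routing requests to separate executors: engine_* methods stay on a single-threaded executor (preserving the message ordering the Engine API spec requires), while eth_* methods run on a pool so they are not blocked behind a slow engine_newPayloadV1. This is only an illustrative Python sketch of the pattern, not Besu's actual Java implementation; all names here are hypothetical.

```python
# Sketch (assumed pattern, not Besu code): serialize engine_* calls on one
# worker, let eth_* calls run concurrently on a separate pool.
import time
from concurrent.futures import ThreadPoolExecutor

engine_executor = ThreadPoolExecutor(max_workers=1)  # serializes engine_* calls
eth_executor = ThreadPoolExecutor(max_workers=8)     # eth_* calls run concurrently


def dispatch(method, handler):
    """Route a JSON-RPC request to an executor based on its method prefix."""
    if method.startswith("engine_"):
        return engine_executor.submit(handler)
    return eth_executor.submit(handler)


def slow_new_payload():
    time.sleep(0.5)  # stands in for an engine_newPayloadV1 taking ~1s
    return "payload processed"


def fast_eth_syncing():
    return False  # eth_syncing should answer immediately


start = time.monotonic()
f_engine = dispatch("engine_newPayloadV1", slow_new_payload)
f_eth = dispatch("eth_syncing", fast_eth_syncing)

eth_result = f_eth.result()                  # returns without waiting for newPayload
eth_latency = time.monotonic() - start
engine_result = f_engine.result()
```

With a single shared executor, `f_eth.result()` would only return after the slow payload finished and the CL's 1-second timeout could trip; with the split executors, `eth_latency` stays near zero.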
@garyschulte please confirm that the Engine API calls will still execute sequentially even with this new configuration.
You can find lighthouse logs before and after the fix.
Fixed Issue(s)
fixes #4398 and #4400
Documentation
Add the `doc-change-required` label to this PR if updates are required.
Changelog