
Beacon API events SSE stream unexpected disconnections #4245

Closed
Savid opened this issue Apr 28, 2023 · 4 comments
Labels
bug Something isn't working HTTP-API

Comments


Savid commented Apr 28, 2023

Description

When connecting to the /eth/v1/events event stream, clients get unexpected disconnections from the Lighthouse beacon node, especially on the attestation topic on mainnet when subscribing to all subnets.

Version

Rust - 1.69.0 (84c898d65 2023-04-16)
unstable - commit 7456e1e8faac7a581705f8e71b0dc4f09a36ee5c
stable - commit 693886b94176faa4cb450f024696cb69cda2fe58

Present Behaviour

To replicate the current behaviour, connect via curl:

curl "http://127.0.0.1:5052/eth/v1/events?topics=attestation" -H "accept: text/event-stream"

Eventually you'll be disconnected with the libcurl error:

curl: (18) transfer closed with outstanding read data remaining

To speed this up, run the beacon node with the --subscribe-all-subnets --import-all-attestations flags on mainnet so that more attestation events are published. With a "fast" machine you will be disconnected multiple times per slot.

Expected Behaviour

The stream should not disconnect unexpectedly.

@michaelsproul michaelsproul added bug Something isn't working HTTP-API labels Apr 28, 2023
@michaelsproul (Member) commented:

I think this could be due to the channel filling up, and the stream being terminated by our error handling here:

Err(e) => Err(warp_utils::reject::server_sent_event_error(
format!("{:?}", e),
)),

The broadcast receiver will return "lag errors" when the channel fills up, as described in the tokio docs:

If a value is sent when the channel is at capacity, the oldest value currently held by the channel is released. This frees up space for the new value. Any receiver that has not yet seen the released value will return RecvError::Lagged the next time recv is called.

I think we could map these RecvError::Lagged errors into no-op events, maybe a comment that says ":dropped some events".
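A minimal, self-contained sketch of that mapping. The names here are hypothetical stand-ins: `RecvError` mirrors tokio's `broadcast::error::RecvError`, and `SseEvent` stands in for whatever event type the HTTP API streams; the real implementation would operate on those types instead.

```rust
// Hypothetical sketch: convert lag errors into no-op SSE comment events
// so the stream survives, while a closed channel remains a hard error.

#[derive(Debug, PartialEq)]
enum RecvError {
    Lagged(u64), // number of messages skipped by this receiver
    Closed,
}

#[derive(Debug, PartialEq)]
enum SseEvent {
    Data(String),
    Comment(String),
}

/// Map a receive result so that lagging produces a comment event
/// instead of terminating the stream.
fn map_recv_result(res: Result<String, RecvError>) -> Result<SseEvent, RecvError> {
    match res {
        Ok(data) => Ok(SseEvent::Data(data)),
        Err(RecvError::Lagged(n)) => Ok(SseEvent::Comment(format!("dropped {n} events"))),
        Err(RecvError::Closed) => Err(RecvError::Closed),
    }
}

fn main() {
    assert_eq!(
        map_recv_result(Ok("head".into())),
        Ok(SseEvent::Data("head".into()))
    );
    assert_eq!(
        map_recv_result(Err(RecvError::Lagged(3))),
        Ok(SseEvent::Comment("dropped 3 events".into()))
    );
}
```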

We could also set some more generous defaults for the channel capacity, which is currently hardcoded at 16 here:

const DEFAULT_CHANNEL_CAPACITY: usize = 16;

Different capacities for different events would probably make sense, and a way to adjust them via the CLI (maybe an integer multiplier that applies to all channel capacities).
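One way the per-topic capacities plus a CLI multiplier could fit together, as a rough sketch. The topic names and base capacities below are illustrative assumptions, not values from the codebase; only `DEFAULT_CHANNEL_CAPACITY = 16` comes from the source.

```rust
// Hypothetical sketch: per-topic base capacities scaled by a single
// CLI-provided multiplier, defaulting busier topics to larger buffers.

const DEFAULT_CHANNEL_CAPACITY: usize = 16;

fn channel_capacity(topic: &str, multiplier: usize) -> usize {
    // Illustrative base values; the attestation topic is by far the
    // busiest on mainnet with --subscribe-all-subnets.
    let base = match topic {
        "attestation" => 64,
        _ => DEFAULT_CHANNEL_CAPACITY,
    };
    // Treat a multiplier of 0 as 1 so capacity never collapses to zero.
    base * multiplier.max(1)
}

fn main() {
    assert_eq!(channel_capacity("attestation", 1), 64);
    assert_eq!(channel_capacity("block", 4), 64);
    assert_eq!(channel_capacity("finalized_checkpoint", 0), 16);
}
```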

@michaelsproul (Member) commented:

Implementation of this fix should happen on top of #4462

@michaelsproul (Member) commented:

WIP PR for initial testing here: #4500

I haven't been able to get it to produce the :error - dropped N messages comment yet, but I'll run it for a few hours and see what happens.


Savid commented Jul 13, 2023

Awesome, thanks. Just built an image. Will roll it out to one of our mainnet sentries and let you know how it looks.

bors bot pushed a commit that referenced this issue Aug 17, 2023
## Issue Addressed

Closes #4245

## Proposed Changes

- If an SSE channel fills up, send a comment instead of terminating the stream.
- Add a CLI flag for scaling up the SSE buffer: `--http-sse-capacity-multiplier N`.

## Additional Info

~~Blocked on #4462. I haven't rebased on that PR yet for initial testing, because it still needs some more work to handle long-running HTTP threads.~~

- [x] Add CLI flag tests.
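The reason sending a comment works: in the text/event-stream format, any line beginning with a colon is a comment that conforming clients must ignore, so it keeps the connection alive without delivering a spurious event. A tiny sketch of that rule; `is_sse_comment` is a hypothetical helper, not a function from the codebase.

```rust
// Per the text/event-stream format, a line starting with ':' is a
// comment and is silently discarded by SSE clients.

fn is_sse_comment(line: &str) -> bool {
    line.starts_with(':')
}

fn main() {
    // The kind of comment the fix emits when the channel overflows:
    assert!(is_sse_comment(":error - dropped N messages"));
    // An ordinary data line is not a comment:
    assert!(!is_sse_comment("data: {\"slot\":\"12345\"}"));
}
```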
Woodpile37 pushed a commit to Woodpile37/lighthouse that referenced this issue Jan 6, 2024
Closes sigp#4245

- If an SSE channel fills up, send a comment instead of terminating the stream.
- Add a CLI flag for scaling up the SSE buffer: `--http-sse-capacity-multiplier N`.

~~Blocked on sigp#4462. I haven't rebased on that PR yet for initial testing, because it still needs some more work to handle long-running HTTP threads.~~

- [x] Add CLI flag tests.