Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spurious Error whilst producing block when using block relays #4473

Closed
michaelsproul opened this issue Jul 6, 2023 · 6 comments
Closed

Spurious Error whilst producing block when using block relays #4473

michaelsproul opened this issue Jul 6, 2023 · 6 comments
Assignees
Labels
bug Something isn't working UX-and-logs v4.4.1 ETA August 2023

Comments

@michaelsproul
Copy link
Member

Description

The Lighthouse v4.3.0 validator client will sometimes log an error message when publishing a block that came from an MEV relay:

Jul 06 00:18:23.968 ERRO Error whilst producing block info: this error may or may not result in a missed block, block_slot: XXX, error: "Some endpoints failed, num_failed: 4 https://node1 => RequestFailed(Irrecoverable(\"Error from beacon node when publishing block: ServerMessage(ErrorMessage { code: 400, message: \\\"BAD_REQUEST: Unsupported endpoint version: v1\\\", stacktraces: [] })\")) ...

This strange error is the result of several subtleties:

  1. In Lighthouse v4.3.0 we changed the block publishing API to properly check gossip validity before publishing (No gossip validation before block broadcast #4264). This includes ruling out duplicate blocks. When publishing a builder block, the relay publishes it to the network before returning it back to the proposer, so many blocks are detected as duplicates and rejected.
  2. Due to another bug in our HTTP route handlers (Counterintuitive warp error for Unsupported endpoint version #3404) the true error message is lost when conveying it to the validator client. We know how to fix this class of error now, and just need to roll out the fix over the whole HTTP API (this might interact with [Merged by Bors] - Use BeaconProcessor for API requests #4462).

Version

Lighthouse v4.3.0

Present Behaviour

The aforementioned error will be logged by the VC. The block should still be integrated into the chain (assuming the builder does their job and there's no re-org).

The beacon node will log this warning, which can be used to quickly check that a block proposal didn't fail for another reason:

WARN Not publishing block, not gossip verified slot => xxx, error => BlockIsAlreadyKnown

Expected Behaviour

We are still discussing the best way to handle this, but our hope is that we'll be able to distinguish valid duplicate blocks, blocks that are still in the process of being imported, and invalid duplicate blocks. The HTTP return codes in each case might end up being 200, 202 and 400 respectively. The DuplicateCache will be helpful in identifying the "in progress" blocks.

@mcdee
Copy link
Contributor

mcdee commented Jul 18, 2023

It would certainly help if valid duplicate blocks returned a suitable success code, given that this is now a relatively common scenario with MEV relays publishing blocks in advance of returning them to validator clients.

@michaelsproul
Copy link
Member Author

@mcdee Indeed. The troublesome case to handle is the one where we know the block is a duplicate (because it hit the duplicate filter), but it hasn't finished being processed yet, so we don't know whether it's valid or invalid. I think our options are:

  • Just return a 200: the block has already been broadcast and is probably valid. If it's not valid, there's not much the caller can do anyway, as the block has already been broadcast from elsewhere.
  • Just return a 206: by similar logic to above, but erring on the side of caution by declaring the block to be probably invalid. The caller will know the block has been broadcast, and then can check out of band whether it was invalid.
  • Wait for the other block to finish processing. This is probably the "most correct" option but the most fiddly to implement. We need to suspend validation of the API block and have a mechanism by which it can wait. This would require extra queues/channels, which although not impossible to implement, definitely adds to the complexity.
  • Add a new status code for duplicate blocks, which doesn't imply anything about their validity. An HTTP 409 might be suitable for this, and could be added to the new v3 block endpoint.

I think I'm leaning towards the 2nd option for now (return 206), with a bit of extra handling to try to return a 200 if: the block is in the observed_block_producers cache, is not in the DuplicateCache and is in fork choice.

@mcdee
Copy link
Contributor

mcdee commented Jul 19, 2023

Yeah option 3 would be the best from a client's point of view, but I understand the complexity.

Option 2 is workable, although I think the reality is that a validator client isn't going to do anything with a 206 above what it would do with a 200, so unsure if differentiating the situations makes sense. Given it's a duplicate of a block received over P2P, and the validator client is assumedly happy enough with it to send via the REST API (plus it can't change its mind anyway having signed the block), I think that a 200 is reasonable.

So I'd say option 3, then option 1 as my preferred results.

@michaelsproul michaelsproul self-assigned this Aug 17, 2023
@paulhauner
Copy link
Member

paulhauner commented Aug 21, 2023

@ethDreamer are you still planning to work on this? I recall you putting your hand up for it in a meeting.

@michaelsproul
Copy link
Member Author

@paulhauner I'm working on it, about to raise a PR. Mark confirmed he wasn't working on it via DM

bors bot pushed a commit that referenced this issue Aug 28, 2023
## Issue Addressed

Closes #4473 (take 3)

## Proposed Changes

- Send a 202 status code by default for duplicate blocks, instead of 400. This conveys to the caller that the block was published, but makes no guarantees about its validity. Block relays can count this as a success or a failure as they wish.
- For users wanting finer-grained control over which status is returned for duplicates, a flag `--http-duplicate-block-status` can be used to adjust the behaviour. A 400 status can be supplied to restore the old (spec-compliant) behaviour, or a 200 status can be used to silence VCs that warn loudly for non-200 codes (e.g. Lighthouse prior to v4.4.0).
- Update the Lighthouse VC to gracefully handle success codes other than 200. The info message isn't the nicest thing to read, but it covers all bases and isn't a nasty `ERRO`/`CRIT` that will wake anyone up.

## Additional Info

I'm planning to raise a PR to `beacon-APIs` to specify that clients may return 202 for duplicate blocks. Really it would be nice to use some 2xx code that _isn't_ the same as the code for "published but invalid". I think unfortunately there aren't any suitable codes, and maybe the best fit is `409 CONFLICT`. Given that we need to fix this promptly for our release, I think using the 202 code temporarily with configuration strikes a nice compromise.
@paulhauner
Copy link
Member

Resolved by #4655

jxs pushed a commit to jxs/lighthouse that referenced this issue Aug 28, 2023
## Issue Addressed

Closes sigp#4473 (take 3)

## Proposed Changes

- Send a 202 status code by default for duplicate blocks, instead of 400. This conveys to the caller that the block was published, but makes no guarantees about its validity. Block relays can count this as a success or a failure as they wish.
- For users wanting finer-grained control over which status is returned for duplicates, a flag `--http-duplicate-block-status` can be used to adjust the behaviour. A 400 status can be supplied to restore the old (spec-compliant) behaviour, or a 200 status can be used to silence VCs that warn loudly for non-200 codes (e.g. Lighthouse prior to v4.4.0).
- Update the Lighthouse VC to gracefully handle success codes other than 200. The info message isn't the nicest thing to read, but it covers all bases and isn't a nasty `ERRO`/`CRIT` that will wake anyone up.

## Additional Info

I'm planning to raise a PR to `beacon-APIs` to specify that clients may return 202 for duplicate blocks. Really it would be nice to use some 2xx code that _isn't_ the same as the code for "published but invalid". I think unfortunately there aren't any suitable codes, and maybe the best fit is `409 CONFLICT`. Given that we need to fix this promptly for our release, I think using the 202 code temporarily with configuration strikes a nice compromise.
Woodpile37 pushed a commit to Woodpile37/lighthouse that referenced this issue Jan 6, 2024
Closes sigp#4473 (take 3)

- Send a 202 status code by default for duplicate blocks, instead of 400. This conveys to the caller that the block was published, but makes no guarantees about its validity. Block relays can count this as a success or a failure as they wish.
- For users wanting finer-grained control over which status is returned for duplicates, a flag `--http-duplicate-block-status` can be used to adjust the behaviour. A 400 status can be supplied to restore the old (spec-compliant) behaviour, or a 200 status can be used to silence VCs that warn loudly for non-200 codes (e.g. Lighthouse prior to v4.4.0).
- Update the Lighthouse VC to gracefully handle success codes other than 200. The info message isn't the nicest thing to read, but it covers all bases and isn't a nasty `ERRO`/`CRIT` that will wake anyone up.

I'm planning to raise a PR to `beacon-APIs` to specify that clients may return 202 for duplicate blocks. Really it would be nice to use some 2xx code that _isn't_ the same as the code for "published but invalid". I think unfortunately there aren't any suitable codes, and maybe the best fit is `409 CONFLICT`. Given that we need to fix this promptly for our release, I think using the 202 code temporarily with configuration strikes a nice compromise.
Woodpile37 pushed a commit to Woodpile37/lighthouse that referenced this issue Jan 6, 2024
Closes sigp#4473 (take 3)

- Send a 202 status code by default for duplicate blocks, instead of 400. This conveys to the caller that the block was published, but makes no guarantees about its validity. Block relays can count this as a success or a failure as they wish.
- For users wanting finer-grained control over which status is returned for duplicates, a flag `--http-duplicate-block-status` can be used to adjust the behaviour. A 400 status can be supplied to restore the old (spec-compliant) behaviour, or a 200 status can be used to silence VCs that warn loudly for non-200 codes (e.g. Lighthouse prior to v4.4.0).
- Update the Lighthouse VC to gracefully handle success codes other than 200. The info message isn't the nicest thing to read, but it covers all bases and isn't a nasty `ERRO`/`CRIT` that will wake anyone up.

I'm planning to raise a PR to `beacon-APIs` to specify that clients may return 202 for duplicate blocks. Really it would be nice to use some 2xx code that _isn't_ the same as the code for "published but invalid". I think unfortunately there aren't any suitable codes, and maybe the best fit is `409 CONFLICT`. Given that we need to fix this promptly for our release, I think using the 202 code temporarily with configuration strikes a nice compromise.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working UX-and-logs v4.4.1 ETA August 2023
Projects
None yet
Development

No branches or pull requests

3 participants