Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Merged by Bors] - Add a flag for storing invalid blocks #4194

Closed
wants to merge 4 commits into from

Conversation

paulhauner
Copy link
Member

@paulhauner paulhauner commented Apr 14, 2023

Issue Addressed

NA

Proposed Changes

Adds a flag to store invalid blocks on disk for teh debugz. Only some invalid blocks are stored, those which:

  • Were received via gossip (rather than RPC, for instance)
    • This keeps things simple to start with and should capture most blocks.
  • Passed gossip verification
    • This reduces the ability for random people to fill up our disk. A proposer signature is required to write something to disk.

Additional Info

It's possible that we'll store blocks that aren't necessarily invalid, but we had an internal error during verification. Those blocks seem like they might be useful sometimes.

@paulhauner paulhauner added work-in-progress PR is a work-in-progress v4.2.0 Q2 2023 labels Apr 14, 2023
@paulhauner paulhauner added ready-for-review The code is ready for review and removed work-in-progress PR is a work-in-progress labels Apr 19, 2023
@paulhauner paulhauner marked this pull request as ready for review April 26, 2023 07:24
@jimmygchen
Copy link
Member

sorry - deleted my earlier comment, it was for another issue 😅

@@ -1015,6 +1025,10 @@ impl<T: BeaconChainTypes> Worker<T> {
);
}
};

if result.is_err() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if storing the BlockError as well would help with debugging?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO we don't need to store the block error because it's already logged above. Although it could be convenient.

My concern is that this PR currently counts EL timeouts as invalid blocks and will store the block SSZ unnecessarily. I guess this is OK if we don't mind the invalid block storage containing a few false positives, although it would hamper an effort to analyze every block in the directory. I suspect in practice we'll use the invalid block storage to look up specific blocks based on Invalid gossip beacon block log messages or manual reports. In this case, it doesn't matter if there's some junk in the storage, apart from it being a waste of space.

Maybe we could just filter out the most common false positive (TimedOut) and allow the rest? I think this would be quite pragmatic using a function like is_timeout on ExecutionPayloadError

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I figured out how to reply to this comment. The same comment appears a second time, but without the reply field 🤷

Perhaps writing out the error message is a nice happy medium about the timeout stuff mentioned in #4194 (comment). Super easy access to the error message might make those timeouts easier to ignore. I'll add it in.

Copy link
Member

@jimmygchen jimmygchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great 🎉 I only have a minor question above.

@@ -1015,6 +1025,10 @@ impl<T: BeaconChainTypes> Worker<T> {
);
}
};

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a trace! just above this that tries to log the invalid block as SSZ. I think we may as well remove that (I don't think anybody likes running with trace logs).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AgeManning IIRC you've used this in the past. Are you OK if we remove it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SSZ will still be accessible on the filesystem, but you'd have to run with the flag provided to get it.

@@ -1015,6 +1025,10 @@ impl<T: BeaconChainTypes> Worker<T> {
);
}
};

if result.is_err() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO we don't need to store the block error because it's already logged above. Although it could be convenient.

My concern is that this PR currently counts EL timeouts as invalid blocks and will store the block SSZ unnecessarily. I guess this is OK if we don't mind the invalid block storage containing a few false positives, although it would hamper an effort to analyze every block in the directory. I suspect in practice we'll use the invalid block storage to look up specific blocks based on Invalid gossip beacon block log messages or manual reports. In this case, it doesn't matter if there's some junk in the storage, apart from it being a waste of space.

Maybe we could just filter out the most common false positive (TimedOut) and allow the rest? I think this would be quite pragmatic using a function like is_timeout on ExecutionPayloadError

@paulhauner
Copy link
Member Author

paulhauner commented May 15, 2023

I can't seem to reply to your comment about timeouts @michaelsproul, but perhaps we may be interested in blocks that cause the EL to timeout?

EDIT: I hit send accidentally. I can definitely see that accidental timeouts might be annoying, but perhaps it would be good for identifying "bomb blocks" in testnets (and mainnet). It's hard to say what's going to be most useful, I don't feel that strongly either way.

@michaelsproul
Copy link
Member

@paulhauner Yeah good point, lets leave them in and if we find they're annoying we can revisit

@paulhauner
Copy link
Member Author

Alrighty @jimmygchen and/or @michaelsproul, we're now storing the errors in a .error file alongside the .ssz file.

Copy link
Member

@jimmygchen jimmygchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks!!

@paulhauner paulhauner added ready-for-merge This PR is ready to merge. and removed ready-for-review The code is ready for review labels May 15, 2023
@paulhauner
Copy link
Member Author

bors r+

bors bot pushed a commit that referenced this pull request May 15, 2023
## Issue Addressed

NA

## Proposed Changes

Adds a flag to store invalid blocks on disk for teh debugz. Only *some* invalid blocks are stored, those which:

- Were received via gossip (rather than RPC, for instance)
    - This keeps things simple to start with and should capture most blocks.
- Passed gossip verification
    - This reduces the ability for random people to fill up our disk. A proposer signature is required to write something to disk.

## Additional Info

It's possible that we'll store blocks that aren't necessarily invalid, but we had an internal error during verification. Those blocks seem like they might be useful sometimes.
@bors
Copy link

bors bot commented May 15, 2023

@bors bors bot changed the title Add a flag for storing invalid blocks [Merged by Bors] - Add a flag for storing invalid blocks May 15, 2023
@bors bors bot closed this May 15, 2023
ghost pushed a commit to oone-world/lighthouse that referenced this pull request Jul 13, 2023
## Issue Addressed

NA

## Proposed Changes

Adds a flag to store invalid blocks on disk for teh debugz. Only *some* invalid blocks are stored, those which:

- Were received via gossip (rather than RPC, for instance)
    - This keeps things simple to start with and should capture most blocks.
- Passed gossip verification
    - This reduces the ability for random people to fill up our disk. A proposer signature is required to write something to disk.

## Additional Info

It's possible that we'll store blocks that aren't necessarily invalid, but we had an internal error during verification. Those blocks seem like they might be useful sometimes.
Woodpile37 pushed a commit to Woodpile37/lighthouse that referenced this pull request Jan 6, 2024
NA

Adds a flag to store invalid blocks on disk for teh debugz. Only *some* invalid blocks are stored, those which:

- Were received via gossip (rather than RPC, for instance)
    - This keeps things simple to start with and should capture most blocks.
- Passed gossip verification
    - This reduces the ability for random people to fill up our disk. A proposer signature is required to write something to disk.

It's possible that we'll store blocks that aren't necessarily invalid, but we had an internal error during verification. Those blocks seem like they might be useful sometimes.
Woodpile37 pushed a commit to Woodpile37/lighthouse that referenced this pull request Jan 6, 2024
NA

Adds a flag to store invalid blocks on disk for teh debugz. Only *some* invalid blocks are stored, those which:

- Were received via gossip (rather than RPC, for instance)
    - This keeps things simple to start with and should capture most blocks.
- Passed gossip verification
    - This reduces the ability for random people to fill up our disk. A proposer signature is required to write something to disk.

It's possible that we'll store blocks that aren't necessarily invalid, but we had an internal error during verification. Those blocks seem like they might be useful sometimes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready-for-merge This PR is ready to merge. v4.2.0 Q2 2023
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants