I still worry a little about the design around bad blocks. Using RPC methods doesn't really feel elegant, and in some cases, under attack, it may not work well.
@holiman would it work if those bad blocks are printed out as tracing info instead?
- We are now limited to 8MB of bad blocks. Sophisticated consensus attackers may find vulnerabilities in networking/importing first, and flood the 8MB struct with non-consensus-critical bad blocks, forcing the consensus-critical bad blocks to be dropped out of the 8MB cache.
- For cases where you actually want to debug bad blocks, you always have direct node access, so this RPC indirection seems a little unnecessary.
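To make the eviction concern concrete, here is a minimal, self-contained sketch of a byte-budgeted bad-block cache. This is an illustrative stand-in for Parity's `MemoryLruCache`, not its real implementation; the type and method names (`BadBlockCache`, `report`, `contains`) are hypothetical, and `u64` stands in for a real `H256` hash. Once the byte budget is exceeded, the oldest entries are dropped, which is exactly how a flood of cheap junk blocks could push out the one consensus-critical entry:

```rust
use std::collections::VecDeque;

/// Hypothetical byte-budgeted bad-block cache (stand-in for MemoryLruCache).
struct BadBlockCache {
    max_bytes: usize,
    used_bytes: usize,
    /// (block hash, raw RLP, error message), oldest first.
    entries: VecDeque<(u64, Vec<u8>, String)>,
}

impl BadBlockCache {
    fn new(max_bytes: usize) -> Self {
        BadBlockCache { max_bytes, used_bytes: 0, entries: VecDeque::new() }
    }

    /// Record a bad block; evict oldest entries once over the byte budget.
    fn report(&mut self, hash: u64, raw: Vec<u8>, message: String) {
        self.used_bytes += raw.len();
        self.entries.push_back((hash, raw, message));
        // Eviction is purely by age, so junk entries can displace a
        // consensus-critical one -- this is the concern described above.
        while self.used_bytes > self.max_bytes {
            if let Some((_, old_raw, _)) = self.entries.pop_front() {
                self.used_bytes -= old_raw.len();
            } else {
                break;
            }
        }
    }

    fn contains(&self, hash: u64) -> bool {
        self.entries.iter().any(|e| e.0 == hash)
    }
}
```

A real mitigation would score entries (for example, preferring blocks that passed PoW checks) rather than evicting purely by age.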
ethcore/src/client/bad_blocks.rs (outdated)

    impl BadBlocks {
        /// Reports given RLP as invalid block.
        pub fn report(&self, raw: Bytes, message: String) {
Extra whitespace.
            }
        },
        Err(err) => {
            self.bad_blocks.report(raw, format!("{:?}", err));
Can we make this configurable? For example, only enable bad block reporting when the `debug` RPC namespace is enabled, or use another CLI flag.
I think the whole point is to have it running by default, so that anyone (who has seen such a message) can be asked to get bad blocks from RPC and then report them for analysis.
    use v1::types::{Block, Bytes, RichBlock, BlockTransactions, Transaction};

    /// Debug rpc implementation.
    pub struct DebugClient<C> {
I think this mod needs to be enabled in parity/rpc_apis.rs? Or did I miss that?
Oh, that's 100% true. Haven't tested it properly, will add to CLI and actually check if it's queryable through RPC.
The reason to have this functionality is so that we can run one geth node and one parity node in parallel, and detect immediately if a consensus fork has occurred (one rejecting a block that the other has accepted). The later steps of analyzing the error are not very difficult, although it definitely helps if the client that rejected the block also prints out the RLP for analysis.

So, of the cases where real-world consensus issues have happened, none of them have been at that level of sophistication -- afaik all of them have been accidental. But even so -- how could they be flooded with non-consensus-critical bad blocks? Those would be rejected due to low difficulty even before going into actual block execution.
I think the general idea is that logs are easily lost, and accessing over RPC makes it easy for anyone to fetch the details and report back for analysis.
> For sophisticated consensus attackers, they may find vulnerabilities in networking/importing first, and flood the 8MB struct with non-consensus-critical bad blocks, forcing the consensus-critical bad blocks to be dropped out of the 8MB cache.
That would require a targeted attack on your particular machine; other bad blocks would most likely be propagated by consensus-affected peers anyway and re-imported in the future, so they would show up in the cache again.
Another idea is to move this outside of
@holiman What's the definition of a "bad block" then? In this PR we are reporting any block that fails validation (even before execution) - so even low-difficulty, invalid-seal, etc. blocks would be reported here.
I don't understand how a small cache of blocks can be a potential attack target. Am I missing something?
Actually, I'm not quite certain -- it's possible we do too. However, as you've pointed out, an attacker would need to specifically target the node in order to get their bad blocks propagated.
We report bad blocks here: https://github.com/ethereum/go-ethereum/blob/master/core/blockchain.go#L1136 (pre-check) and here: https://github.com/ethereum/go-ethereum/blob/master/core/blockchain.go#L1154 (processing)
Some more context: we've had 'reporting' of bad blocks for quite some time now. By 'reporting' I mean printing a warning about a bad block being encountered, with some details, in the log stream. This has triggered only a couple of times over the last couple of years, and each time it has been very good info. IIRC, that's what made people notice when Parity mined bad blocks on Ropsten due to the consensus bug of having EIP-86 enabled.

The added ability to actually get the same info over RPC will make it possible to get alerts immediately if a fork occurs. If the fork monitor needs to parse log output for the same info, it will be quite messy. If you see fit to limit the bad-block cache even more, then please do so, but I'd really love to have this feature in, even if it's somewhat limited. Right now, our fork monitor (http://mon04.ethdevops.io/forkmon) would immediately detect if the parity head contains something that geth rejects, but the opposite scenario would not be detected until later.
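The fork-monitor check described above (compare what each client reports at the same height) reduces to a small pure function. The following is a hedged sketch, not the actual forkmon code: the JSON-RPC fetching (e.g. `eth_getBlockByNumber` against each node) is omitted, and all names (`ForkStatus`, `check_fork`) are hypothetical:

```rust
/// Outcome of comparing two clients' views of the chain at one height.
#[derive(Debug, PartialEq)]
enum ForkStatus {
    /// Both clients report the same block hash.
    Agree,
    /// Both clients have a block at this height, but the hashes differ.
    Diverged { height: u64 },
    /// At least one client has not reached this height yet.
    Unknown,
}

/// Compare the block hash each client reports at the same height.
/// `None` means the client has no block at that height (yet).
fn check_fork(height: u64, geth_hash: Option<&str>, parity_hash: Option<&str>) -> ForkStatus {
    match (geth_hash, parity_hash) {
        (Some(g), Some(p)) if g == p => ForkStatus::Agree,
        (Some(_), Some(_)) => ForkStatus::Diverged { height },
        _ => ForkStatus::Unknown,
    }
}
```

This also shows why the bad-blocks RPC helps: without it, the `Unknown` case where one client silently rejected a block only resolves by scraping logs on that node.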
How does this relate to #9207?
@5chdn #9207 has been around for more than a month and it's nowhere close to being either correct or finished - it stores
LGTM
    @@ -647,6 +658,7 @@ impl ApiSet {
                public_list
            },
            ApiSet::SafeContext => {
                public_list.insert(Api::Debug);
Why enable this only for safe context (given that the bad blocks cache is enabled by default)? I understand that the `debug` namespace might have other methods in the future which are unsafe to expose.
Didn't enable it for two reasons:
- `debug` is potentially unsafe - it should never be used by regular dapps.
- We have this implemented to support detecting forks; I don't want to make it a first-class namespace where we implement all the methods, because that most likely won't happen - the methods are not going through EIPs, and they are pretty much geth-specific debug RPCs.
util/memory_cache/src/lib.rs (outdated)

    pub fn backstore(&self) -> &LruCache<K, V> {
        &self.inner
    }
Unnecessary newline?
Fixed
    /// Recently seen bad blocks.
    pub struct BadBlocks {
        last_blocks: RwLock<MemoryLruCache<H256, (Unverified, String)>>,
How about adding the raw block data (`Vec<u8>`) to the tuple as well, to be able to save "undecodable" bad blocks? Does that make any sense?
Yeah, I actually thought about it. It doesn't allow us to return a detailed block in the RPC response, though. I think it currently makes sense to have something with the same results as geth, so we can start detecting issues asap.
That said, feel free to log it as a separate issue.
I also thought about this and checked the geth implementation, and it seems they also don't store block RLPs that aren't decodable.
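For reference, the proposal to also keep undecodable blocks could be modeled as an entry type with two variants. This is a hypothetical sketch only: Parity's actual cache entry is the `(Unverified, String)` tuple shown earlier, the names `BadBlockEntry`/`classify` are invented, and the "decode" step here is a trivial stand-in, not real RLP decoding:

```rust
/// Hypothetical cache entry that keeps undecodable blocks as raw bytes.
enum BadBlockEntry {
    /// Block that decoded fine but failed import, plus the import error.
    Decoded { summary: String, error: String },
    /// Raw bytes we could not decode, plus the decode error.
    Undecodable { raw: Vec<u8>, error: String },
}

/// Stand-in decode step: treats empty input as undecodable.
/// (A real implementation would attempt RLP decoding here.)
fn classify(raw: Vec<u8>) -> BadBlockEntry {
    if raw.is_empty() {
        BadBlockEntry::Undecodable { raw, error: "rlp: empty input".to_string() }
    } else {
        BadBlockEntry::Decoded {
            summary: format!("block of {} bytes", raw.len()),
            error: "failed validation".to_string(),
        }
    }
}
```

The trade-off discussed above is visible in the types: only the `Decoded` variant can back a detailed block in the RPC response, while `Undecodable` can at best return the hex bytes.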
To follow up on @sorpaas's comment: we spoke through a different channel about this and agreed that it doesn't necessarily suit the core client. The idea that came up was to expose structured logs via RPC (we already do that for recent logs; we could add a subscription), where one would be able to subscribe to particular events happening in the core client.
LGTM, but CI is failing..
Closes #8761

Adds `reason` to the RPC response to see the internal error message that occurred during import.