Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Queue failed reports for later insertion into blocks #98

Merged
merged 18 commits into from
Mar 5, 2019

Conversation

DemiMarie
Copy link

If we fail to send a transaction that reports a validator as malicious,
store it on a stack. When we next seal a block, included the failed
transactions.

Copy link
Collaborator

@afck afck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed in #97, let's make sure we retry sending the transactions until they are committed and finalized. When retrying a whole batch of reports, let's use consecutive nonces for them, so that they all have a chance to get committed.

ethcore/src/engines/validator_set/safe_contract.rs Outdated Show resolved Hide resolved
ethcore/src/engines/validator_set/safe_contract.rs Outdated Show resolved Hide resolved
Copy link
Collaborator

@afck afck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!
What's still missing is detecting committed reports in finalized blocks, I think: Those can be removed from the queue.

ethcore/src/client/client.rs Outdated Show resolved Hide resolved
@DemiMarie
Copy link
Author

@afck can you review the changed request?

ethcore/src/engines/validator_set/safe_contract.rs Outdated Show resolved Hide resolved
ethcore/src/engines/validator_set/safe_contract.rs Outdated Show resolved Hide resolved
ethcore/src/engines/validator_set/safe_contract.rs Outdated Show resolved Hide resolved
ethcore/src/engines/validator_set/safe_contract.rs Outdated Show resolved Hide resolved
afck
afck previously requested changes Feb 25, 2019
ethcore/src/engines/validator_set/safe_contract.rs Outdated Show resolved Hide resolved
ethcore/src/engines/validator_set/safe_contract.rs Outdated Show resolved Hide resolved
ethcore/src/engines/validator_set/safe_contract.rs Outdated Show resolved Hide resolved
ethcore/src/engines/validator_set/safe_contract.rs Outdated Show resolved Hide resolved
@DemiMarie
Copy link
Author

@afck what would you set the limit to?

Copy link

@vkomenda vkomenda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To keep the ABI sensible you could add declarations only for the methods that are new in Parity code. The complete ABI is not informative and bears the responsibility to update it every time it updates in the contracts even if that does not affect methods called from Parity.

Copy link
Collaborator

@afck afck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! But let's make sure this doesn't spam malice report calls, and use a timeout and some maximum number of calls per block.

@varasev: I wonder whether we should even enforce some maximum number of reports per block in the permission contract?

scripts/get-abi Outdated
if len(sys.argv) != 4:
print('must have 3 arguments, not {}'.format(len(sys.argv) - 1), file=sys.stderr)
sys.exit(1)
main(*sys.argv[1:])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea! But let's document this script. If found this I'd have no idea what it's for and how to use it.

@@ -0,0 +1,6 @@
getValidators

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parity already suffers from inconsistent naming of contract ABI JSONs and the associated Rust variables. Can we at least have an unambiguous connection between this file and validator_set_aura.json? Maybe call it validator_set_aura.used.txt or something like that - with the base name of the ABI JSON?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That works.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@varasev
Copy link

varasev commented Feb 27, 2019

Looks good! But let's make sure this doesn't spam malice report calls, and use a timeout and some maximum number of calls per block.

Let's limit this to 15 calls per block and remove successfully calls from the list according to FIFO principle. That would be enough to not to spam the block with 100 000 000 gas limit by 20 validators (20 validators * 15 calls * 250 000 gas/call = 75 million gas in total).

@varasev: I wonder whether we should even enforce some maximum number of reports per block in the permission contract?

I think we shouldn't limit this for a block.

We have such a limit per staking epoch (to resist spammer validator): if some validator reports too often unlike the rest validators, this is treated as spam attempt by the contracts.

However, this doesn't apply to the honest validators. This is a protection against malicious validator for the case when they want to use their ability to send fake transactions with 0 gas price to ValidatorSet contract. TxPermission checks whether reportMalicious can be called at the moment and doesn't allow to call that if reportMaliciousCallable returns false. This helps us to not to mine the transactions which shouldn't be created due to requirements (and to not to consume block's gas that way).

@afck
Copy link
Collaborator

afck commented Feb 27, 2019

But the problem is that with the current retry logic, we are creating transactions for the same malice report multiple times, with multiple nonces, which would be spam in that sense.

We should either:

  • Only retry if a long time has passed since we sent the report last time. Or:
  • Only retry if if the nonce we used for our last attempt has been used otherwise.

@varasev
Copy link

varasev commented Feb 27, 2019

But the problem is that with the current retry logic, we are creating transactions for the same malice report multiple times, with multiple nonces, which would be spam in that sense.

This won't be treated as spam by the contracts until the transactions are mined. If some transaction with the same tuple (maliciousValidator, blockNumber) is already mined, the next same transactions on the next blocks will be canceled by TxPermission and won't be treated as spam that way.

@afck
Copy link
Collaborator

afck commented Feb 27, 2019

Ah, okay, thanks! Then that should be fine.
I'd add some minimum time before retry anyway. (I'd even say several minutes.)

@varasev
Copy link

varasev commented Feb 27, 2019

Ah, okay, thanks! Then that should be fine.
I'd add some minimum time before retry anyway. (I'd even say several minutes.)

Several minutes is too long. Maybe you meant a few blocks? Malicious reports should be sent as soon as possible, especially at the end of staking epoch. Imagine that some validator misbehaved at the end of the epoch: honest validators don't have much time to send their reports until they stop being validators because isReportValidatorValid allows sending report during maximum 20 blocks after the validator is no longer a validator. Also, reportMaliciousCallable allows reporting only 100 blocks behind.

@afck
Copy link
Collaborator

afck commented Feb 28, 2019

Malicious reports should be sent as soon as possible

Yes, but this is only for the (hopefully exceptional) case where the report fails to get mined for some reason. Maybe we should use exponential backoff: Retry after 20 seconds, then 40, then 80, then 160, …?

Also, reportMaliciousCallable allows reporting only 100 blocks behind.

Right, I hadn't realized that!
@DemiMarie: That means we should definitely always delete reports from the queue that are more than 100 blocks old, I guess?

@varasev
Copy link

varasev commented Feb 28, 2019

The reportMalicious function doesn't contain any code which could revert, so if the transaction has a correct nonce, it will be mined or canceled by TxPermission.

For example:

If some tuple (maliciousValidator, blockNumber) is sent in transaction A, the transaction B with the same tuple won't revert and will succeed, but this tuple won't be saved in the state because it has already been saved before by the transaction A.

Such a situation with mining those two transactions is only possible if they are mined in the same block. If the transaction B is called in some later block, it will be canceled due to the rules in TxPermission and thus won't be included into the block.

@varasev
Copy link

varasev commented Mar 6, 2019

I think it's too early to merge this into aura-pos branch because we didn't try to run and test it yet.

@afck
Copy link
Collaborator

afck commented Mar 6, 2019

Too late. 😬 Let's test it thoroughly and revert if we see problems.
Or do you prefer reverting it right away?

@varasev
Copy link

varasev commented Mar 6, 2019

I'd restore the removed branch no-consume-nonces and revert this PR right away because later it will be harder to do due to other possible changes in the code (e.g., after merging PR #103 or other changes).

@varasev
Copy link

varasev commented Mar 6, 2019

GitHub allows restoring removed branch and reverting PR's changes easily if you have enough rights for the repo.

@afck
Copy link
Collaborator

afck commented Mar 6, 2019

OK, I'm fine with it. Feel free to revert.

@varasev
Copy link

varasev commented Mar 6, 2019

I have no rights for this repo :) And you?

@afck
Copy link
Collaborator

afck commented Mar 6, 2019

Not admin permissions. It looks like I can't even see the deleted branch.

@phahulin phahulin restored the no-consume-nonces branch March 6, 2019 10:56
@phahulin
Copy link

phahulin commented Mar 6, 2019

@varasev
Copy link

varasev commented Mar 6, 2019

@phahulin Can you also revert this PR? We could re-open it and then test before merging into aura-pos.

@varasev
Copy link

varasev commented Mar 6, 2019

@phahulin Can you also revert this PR? We could re-open it and then test before merging into aura-pos.

I mean there should be Revert button like this:

image

@DemiMarie
Copy link
Author

@varasev I did test it locally. It works fine.

@varasev
Copy link

varasev commented Mar 6, 2019

@DemiMarie could you please write here the steps you performed for testing, so we could repeat them.

@DemiMarie
Copy link
Author

@varasev cargo build and then running npm run all in posdao-test-setup

@varasev
Copy link

varasev commented Mar 6, 2019

Got it, but those are the general tests which don't test the reportMalicious feature.

@DemiMarie
Copy link
Author

@varasev We don’t have any tests for reportMalicious (yet), other than @afck’s branch (which is not to be merged). That was true before and it is true now. That said, I will write some.

@afck would you be okay with merging afck-sim-malice, but with the code to misbehave not included unless:

  1. Parity Ethereum is built without optimizations, and
  2. A Cargo feature unsafe-enable-malice-tests-do-not-use-this-in-production is enabled?

@afck
Copy link
Collaborator

afck commented Mar 6, 2019

I feel really uncomfortable about merging code into aura-pos that makes a node misbehave…
Can't we just keep rebasing afck-sim-malice, so we can keep using it for manual testing?

@DemiMarie
Copy link
Author

@afck We can, though it would be a significant amount of maintainance effort. That’s why I suggested using Cargo features to ensure that it is disabled under any normal circumstances.

In particular, we could ensure that the when the test feature is enabled, the node refuses to connect to anything but loopback.

@afck
Copy link
Collaborator

afck commented Mar 6, 2019

Hmm… okay, maybe it's better this way. It still makes me uncomfortable, but if the team prefers it, feel free to merge it (with the precautions you're suggesting).

@DemiMarie
Copy link
Author

@varasev @vkomenda @phahulin what do you think?

@phahulin
Copy link

phahulin commented Mar 6, 2019

@varasev I don't see this option unfortunately. I see it in some other PRs e.g. #90, but not in this one 🤔

@DemiMarie
Copy link
Author

@phahulin That might be because of the way I merged the PR (by fast-forward).

@DemiMarie
Copy link
Author

@phahulin I could do a git reset --hard and a git push --force.

@phahulin
Copy link

phahulin commented Mar 6, 2019

@DemiMarie yes, I think that's the way to do it.
Here is the backup just in case 😬 https://github.com/poanetwork/parity-ethereum/tree/aura-pos-with-pr98

@DemiMarie DemiMarie mentioned this pull request Mar 6, 2019
@varasev
Copy link

varasev commented Mar 6, 2019

I feel really uncomfortable about merging code into aura-pos that makes a node misbehave…

I agree with this. Let's keep that feature in a separate branch.

@vkomenda
Copy link

vkomenda commented Mar 6, 2019

I'm actually OK with merging afck-sim-malice under a Cargo feature. The more I think about this the more I get convinced that simplified maintenance outweighs separation from specific code for specific tests. Anyone who wants a Parity node with "malicious" behaviour could compile that branch as easily. So, keeping it separate only increases the maintenance burden, and those Parity version switches are no joke.

@phahulin
Copy link

phahulin commented Mar 6, 2019

I agree with @VladimirNovgorodov . We can remove this "feature" before the release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants