Failover script for a backed up node #41

vkomenda · 2019-03-19T14:03:31Z

See #39. This PR doesn't contain any tests for the failover script.

.editorconfig

package.json

scripts/watchBackedUpNode.js

phahulin · 2019-03-28T13:03:30Z

Seems to be working correctly 👍
But this is not the final script that we want for #39, because it connects to localhost. Still, I think it's OK to merge this PR as a proof of concept and then do the final script in a separate pull request.
Or we could continue to update this PR and get a final version here. What do you think, @vkomenda?

vkomenda · 2019-03-28T13:23:27Z

Can it call the secondary node locally and the primary node remotely? In that case we can just change the primary node HTTP URL to stop calling it through localhost. That is assuming it is OK to run the script on the secondary node.

phahulin · 2019-03-28T14:14:33Z

I don't think it can call primary, it's best to close all ports except SSH on validator node for security reasons.
So probably only scanBlocks can be used to verify if primary is down.
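
A minimal sketch of how scanning blocks could indicate that the primary is down, assuming round-robin sealing where each validator is expected to author at least one block per round. The helper name, the `rounds` threshold, and the input shape (a list of block `author` fields, oldest first) are illustrative assumptions, not the PR's actual scanBlocks code.

```javascript
// Hypothetical helper: returns true when `miningAddress` authored none of the
// most recent `validatorCount * rounds` blocks.
// `recentAuthors` holds the `author` field of recent blocks, oldest first.
function isValidatorMissingBlocks(recentAuthors, miningAddress, validatorCount, rounds = 2) {
  const windowSize = validatorCount * rounds;
  if (recentAuthors.length < windowSize) {
    return false; // not enough history to decide either way
  }
  const target = miningAddress.toLowerCase();
  const recent = recentAuthors.slice(-windowSize);
  // Addresses are compared case-insensitively (checksummed vs. lowercase hex).
  return !recent.some((author) => author.toLowerCase() === target);
}
```

Requiring more than one full round before reporting a miss is a deliberate choice here: a single skipped step may be a transient network delay rather than a dead node.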

phahulin · 2019-03-28T14:23:39Z

In that case it seems we'll need similar scripts on both the primary and the secondary.

varasev · 2019-03-28T14:49:57Z

Could we add an extra RPC call like parity_clearEngineSigner to the Parity code, which would remove the engine_signer option instead of switching it to DUMMY_SIGNER_ADDRESS? This way a node wouldn't need to have the dummy address. Would it be difficult?

vkomenda · 2019-03-28T15:10:14Z

@phahulin That's a good plan. Would you like me to continue with that in this PR or merge this and open another for splitting the failover script in two?

@varasev It's not difficult, I think, to add parity_clearEngineSigner and it should work better than a dummy signer address. I'll do that.
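
For illustration, the two approaches differ only in the JSON-RPC request the script sends. This is a sketch that builds the request bodies without sending them. parity_setEngineSigner takes an address and a password; parity_clearEngineSigner is the method proposed in this thread, and the assumption that it takes no parameters is mine. The helper names are hypothetical.

```javascript
// Build a JSON-RPC 2.0 request body as a string.
function rpcRequest(method, params = [], id = 1) {
  return JSON.stringify({ jsonrpc: '2.0', method, params, id });
}

// Dummy-signer approach: repoint the engine signer at another account.
function setEngineSignerRequest(address, password) {
  return rpcRequest('parity_setEngineSigner', [address, password]);
}

// Proposed approach: clear the engine signer outright.
// Assumption: the proposed method takes no parameters.
function clearEngineSignerRequest() {
  return rpcRequest('parity_clearEngineSigner');
}
```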

phahulin · 2019-03-28T15:12:06Z

IMO it's best to get a final script in this PR

vkomenda · 2019-04-01T12:01:56Z

> So probably only scanBlocks can be used to verify if primary is down.

OK. In that case how do you check whether the primary is back up?

phahulin · 2019-04-01T12:33:05Z

Maybe we could have on-chain storage for that in a special smart contract.
The contract would contain a mapping address => uint256, i.e. mining key => unique non-zero node id. All nodes with the same mining key should have different ids.
When the script on one of the nodes detects that the validator is missing blocks, it would send a tx to that smart contract replacing the current id with its own id (unless the current id already matches its own id), wait for the tx to be mined, and then switch the engine signer.
When scripts on other nodes detect that the id has changed, they would clear the engine signer on their nodes.
In this case there is no distinction between primaries and secondaries.

What do you think? Also cc @varasev
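
A rough sketch of the per-node decision logic that the mapping scheme above implies. All names are hypothetical, ids are plain numbers for brevity (an on-chain uint256 would need BigInt or strings), and treating id 0 as "unclaimed" is an assumption that follows from ids being non-zero.

```javascript
// Decide the next step for the watcher script on one node.
//   currentId: id stored on-chain for this validator's mining key (0 = unclaimed, assumed)
//   ownId: this node's unique non-zero id
//   validatorMissingBlocks: result of scanning recent blocks
//   signerSet: whether this node currently has its engine signer set
function decideAction({ currentId, ownId, validatorMissingBlocks, signerSet }) {
  if (currentId === ownId) {
    // This node holds the slot (e.g. its claim tx was mined): make sure it signs.
    return signerSet ? 'NONE' : 'SET_SIGNER';
  }
  if (signerSet && currentId !== 0) {
    // Another node has claimed the slot: stop signing here.
    return 'CLEAR_SIGNER';
  }
  if (validatorMissingBlocks) {
    // Nobody is sealing: send the tx that writes ownId into the mapping.
    return 'CLAIM_ID';
  }
  return 'NONE';
}
```

Splitting the claim and the signer switch into separate steps mirrors the "wait for tx to be mined" requirement: the node only sets its signer once it observes its own id on-chain.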

vkomenda · 2019-04-01T13:19:37Z

OK. Let's use smart contract storage. Does this now fit into the old notion of benign misbehaviour? If a miner address failed to produce expected blocks,

should this be reported (rather than simply logged locally) as benign misbehaviour and
should there be an event MisbehaviourHandledBy(replica_id) that leads to the change of the value at the mining address in the above mapping?

varasev · 2019-04-01T15:08:44Z

> Does this now fit into the old notion of benign misbehaviour?

We can't use reportBenign because it runs into the same difficulty as reportMalicious (poanetwork/parity-ethereum#97): it would require a lot of changes like those in poanetwork/parity-ethereum#107 (which we haven't tested yet).

> Maybe we could have on-chain storage for that in a special smart contract.

It is a possible solution if the validator has a switcher address (or watcher address?) bound to the mining address (like the staking address). A separate switcher address is needed to have a separate nonce (because of the nonce issue with the mining address).

So, when a candidate creates their pool, they will need to have three addresses:

mining address
staking address
switcher address

The switcher address would only be allowed to send the described transaction (changing the node id in the mapping), with zero gas price; this would be enforced through the TxPermission contract.

This approach would require several changes in the contracts, so I'm thinking of a different solution:

> it's best to close all ports except SSH on validator node for security reasons.

I think, ideally, we could open one TCP port on each node (say, 2019) and set firewall rules so that the inbound traffic would only be allowed between the two nodes since we know their IP addresses and they are constant, I guess.

For example, we have two nodes for the same validator:

Node A (IP = 192.168.10.101, engine_signer = mining address)
Node B (IP = 192.168.10.102, engine_signer is not set)

Each node has the watchguard script listening on port 2019.

The firewall on node A allows inbound connections on port 22 and connections from the remote 192.168.10.102:2019 to local port 2019.

The firewall on node B allows inbound connections on port 22 and connections from the remote 192.168.10.101:2019 to local port 2019.

That way the scripts on the nodes would connect to each other, and other unwanted inbound connections are restricted (except SSH).

This would be simpler than the scheme with node's id and the mapping in a contract.
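
A minimal sketch of the listening side of such a script, using Node's built-in net module. The firewall rules described above would do the real filtering; the address check here is only an extra application-level guard. The function names, the WATCH_PORT constant, and the 'alive' reply are illustrative assumptions, not part of the PR.

```javascript
const net = require('net');

const WATCH_PORT = 2019; // port suggested in the discussion

// On dual-stack sockets Node reports IPv4 peers as "::ffff:a.b.c.d".
function normalizeAddress(address) {
  return address.startsWith('::ffff:') ? address.slice(7) : address;
}

// True only when the connection comes from the other node of the pair.
function isAllowedPeer(remoteAddress, peerIp) {
  return normalizeAddress(remoteAddress || '') === peerIp;
}

// Create (but do not start) a server that only talks to the peer node.
function createWatchServer(peerIp, onPeer) {
  return net.createServer((socket) => {
    if (!isAllowedPeer(socket.remoteAddress, peerIp)) {
      socket.destroy(); // drop anything that slipped past the firewall
      return;
    }
    onPeer(socket);
  });
}

// Usage on node A, whose peer is node B:
// createWatchServer('192.168.10.102', (socket) => socket.end('alive\n'))
//   .listen(WATCH_PORT);
```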

vkomenda · 2019-04-09T12:37:14Z

Ready for review. All tests pass for me.

phahulin · 2019-04-16T16:28:54Z

For me this test fails.
I also double-checked with a JSON-RPC call, and eth_mining actually returns true for the secondary (and for the primary):

➜  curl --data '{"method":"eth_mining","params":[],"id":1,"jsonrpc":"2.0"}' -H "Content-Type: application/json" -X POST localhost:8544
{"jsonrpc":"2.0","result":true,"id":1}

vkomenda requested review from phahulin and varasev March 19, 2019 14:03

phahulin reviewed Mar 19, 2019

.editorconfig (resolved)

phahulin self-requested a review March 21, 2019 16:54

phahulin suggested changes Mar 21, 2019

package.json (outdated, resolved)

scripts/watchBackedUpNode.js (outdated, resolved)

scripts/watchBackedUpNode.js (outdated, resolved)

phahulin self-requested a review March 22, 2019 11:27

phahulin reviewed Mar 22, 2019
