Improve wait for activation #13448

Merged
merged 26 commits into from
Jan 16, 2024

Contributor

@james-prysm james-prysm commented Jan 10, 2024

What type of PR is this?

Other

What does this PR do? Why is it needed?

Reported by the Figment team - "On startup, there is a check for keys, and if there are no keys it sleeps for 30 seconds. During this time we are using the key manager API to try and load the keys but it will hang for the full 30 seconds till the check happens again."

A closer review of the code turned up several inefficiencies: it runs multiple loops and is not event-driven, even though the accountChanged channel is available and should be used. This function only matters when we have 0 validating keys and need at least 1 active one. Outside of this function, a recheck-keys function handles later keystore updates. That architecture should be revisited in the future to find further improvements.

Baseline: the recheck takes 30 seconds.

[2024-01-11 15:02:54]  WARN validator: You are using an insecure gRPC connection. If you are running your beacon node and validator on the same machines, you can ignore this message. If you want to know how to enable secure connections, see: https://docs.prylabs.network/docs/prysm-usage/secure-grpc
[2024-01-11 15:02:54]  INFO rpc: Initialized REST API routes
[2024-01-11 15:02:54]  INFO node: Starting Prysm web UI on address, open in browser to access address=http://127.0.0.1:7500
[2024-01-11 15:02:54]  INFO node: Starting validator node version=Prysm/Unknown/Local build. Built at: Moments ago
[2024-01-11 15:02:54]  WARN validator: You are using an insecure gRPC connection. If you are running your beacon node and validator on the same machines, you can ignore this message. If you want to know how to enable secure connections, see: https://docs.prylabs.network/docs/prysm-usage/secure-grpc
[2024-01-11 15:02:54]  INFO gateway: Starting gRPC gateway address=127.0.0.1:7500
[2024-01-11 15:02:54]  WARN prometheus: Port already in use; cannot start prometheus service address=127.0.0.1:8081
[2024-01-11 15:02:54]  INFO rpc: gRPC server listening on address address=127.0.0.1:7000
[2024-01-11 15:02:54]  INFO rpc: Once your validator process is running, navigate to the link below to authenticate with the Prysm web interface
[2024-01-11 15:02:54]  INFO rpc: http://127.0.0.1:7500/initialize?token=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.e30.vgs336GzjwJZaYV7lrmUS1swwVXQqsA1_TB42-7ewIE
[2024-01-11 15:02:54]  INFO rpc: Validator CLient JWT for RPC and REST authentication set at:/Users/james/Library/Eth2Validators/prysm-wallet-v2/auth-token
[2024-01-11 15:02:54]  INFO validator: Syncing with beacon node to align on chain genesis info
[2024-01-11 15:02:54]  INFO validator: Beacon chain started genesisTime=2021-03-23 09:00:00 -0500 CDT
[2024-01-11 15:02:54]  INFO validator: Wait for Activation function called
[2024-01-11 15:02:54]  WARN validator: No validating keys fetched. Trying again
Importing accounts... 100% [==========================================]  [0s:0s]
[2024-01-11 15:02:58]  INFO local-keymanager: Successfully imported validator key(s) publicKeys=0xaa0ef7404c3a
[2024-01-11 15:02:59]  INFO local-keymanager: Reloaded validator keys into keymanager
[2024-01-11 15:03:24]  INFO validator: Wait for Activation function called
[2024-01-11 15:03:24]  INFO validator: Validator activated index=392138 publicKey=0xaa0ef7404c3a
[2024-01-11 15:03:24]  INFO validator: Attestation schedule attesterDutiesAtSlot=1 pubKeys=[0xaa0ef7404c3a] slot=7374930 slotInEpoch=18 timeTillDuty=2m36s totalAttestersInEpoch=1

New approach: the validator waits for an account to be populated before continuing, and that can happen immediately.

[2024-01-11 15:53:58]  INFO node: Checking DB databasePath=/Users/james/Library/Eth2
Adding optimizations for validator slashing protection 100% [=========]  [0s:0s]
[2024-01-11 15:53:58]  WARN validator: You are using an insecure gRPC connection. If you are running your beacon node and validator on the same machines, you can ignore this message. If you want to know how to enable secure connections, see: https://docs.prylabs.network/docs/prysm-usage/secure-grpc
[2024-01-11 15:53:58]  INFO rpc: Initialized REST API routes
[2024-01-11 15:53:58]  INFO node: Starting Prysm web UI on address, open in browser to access address=http://127.0.0.1:7500
[2024-01-11 15:53:58]  INFO node: Starting validator node version=Prysm/Unknown/Local build. Built at: Moments ago
[2024-01-11 15:53:58]  INFO gateway: Starting gRPC gateway address=127.0.0.1:7500
[2024-01-11 15:53:58]  WARN validator: You are using an insecure gRPC connection. If you are running your beacon node and validator on the same machines, you can ignore this message. If you want to know how to enable secure connections, see: https://docs.prylabs.network/docs/prysm-usage/secure-grpc
[2024-01-11 15:53:58]  WARN prometheus: Port already in use; cannot start prometheus service address=127.0.0.1:8081
[2024-01-11 15:53:58]  INFO rpc: gRPC server listening on address address=127.0.0.1:7000
[2024-01-11 15:53:58]  INFO rpc: Once your validator process is running, navigate to the link below to authenticate with the Prysm web interface
[2024-01-11 15:53:58]  INFO rpc: http://127.0.0.1:7500/initialize?token=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.e30.vgs336GzjwJZaYV7lrmUS1swwVXQqsA1_TB42-7ewIE
[2024-01-11 15:53:58]  INFO rpc: Validator CLient JWT for RPC and REST authentication set at:/Users/james/Library/Eth2Validators/prysm-wallet-v2/auth-token
[2024-01-11 15:53:58]  INFO validator: Syncing with beacon node to align on chain genesis info
[2024-01-11 15:53:58]  INFO validator: Beacon chain started genesisTime=2021-03-23 09:00:00 -0500 CDT
[2024-01-11 15:53:58]  WARN validator: No validating keys fetched. Waiting for keys...
Importing accounts... 100% [==========================================]  [0s:0s]
[2024-01-11 15:54:07]  INFO local-keymanager: Successfully imported validator key(s) publicKeys=0xaa0ef7404c3a
[2024-01-11 15:54:08]  INFO local-keymanager: Reloaded validator keys into keymanager
[2024-01-11 15:54:08]  INFO validator: Validator activated index=392138 pubKey=0xaa0ef7404c3a publicKey=0xaa0ef7404c3a status=ACTIVE
[2024-01-11 15:54:08]  INFO validator: Attestation schedule attesterDutiesAtSlot=1 pubKeys=[0xaa0ef7404c3a] slot=7375175 slotInEpoch=7 timeTillDuty=52s totalAttestersInEpoch=1

Which issues(s) does this PR fix?

Fixes #

Other notes for review

@james-prysm james-prysm added UX cosmetic / user experience related Validator Client labels Jan 10, 2024
Comment on lines 72 to 74
if len(validatingKeys) == 0 {
	continue
}

Do we maybe need a small sleep here? We are in an endless for loop, and if the channels are not filled for a longer period, I think we could see a CPU spike.

@@ -387,6 +386,12 @@ func (v *validator) checkAndLogValidatorStatus(statuses []*validatorStatus, acti
}
case ethpb.ValidatorStatus_ACTIVE, ethpb.ValidatorStatus_EXITING:
validatorActivated = true
if status.status.Status == ethpb.ValidatorStatus_ACTIVE {
Contributor

Maybe separate the two cases ethpb.ValidatorStatus_ACTIVE and ethpb.ValidatorStatus_EXITING to avoid the

if status.status.Status == ethpb.ValidatorStatus_ACTIVE

in the switch/case?

Contributor Author

It probably makes more sense to drop the if statement altogether. Activated should mean that the account is being used.
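
For illustration, the reviewer's suggestion of splitting the combined case could look roughly like the sketch below. The `validatorStatus` type, its constants, and `checkStatus` are hypothetical stand-ins, not the `ethpb` types or the function in the diff; only the idea of one case per status comes from the thread.

```go
package main

import "fmt"

// validatorStatus is a hypothetical stand-in for the ethpb.ValidatorStatus enum.
type validatorStatus int

const (
	statusPending validatorStatus = iota
	statusActive
	statusExiting
)

// checkStatus reports whether a validator counts as activated and which
// message, if any, should be logged. Giving each status its own case removes
// the need for a nested if inside a combined ACTIVE/EXITING case.
func checkStatus(s validatorStatus) (activated bool, logMsg string) {
	switch s {
	case statusActive:
		return true, "Validator activated"
	case statusExiting:
		// Treated as activated, as in the diff above, but without the activation log.
		return true, ""
	default:
		return false, ""
	}
}

func main() {
	for _, s := range []validatorStatus{statusPending, statusActive, statusExiting} {
		activated, msg := checkStatus(s)
		fmt.Printf("status=%d activated=%t log=%q\n", s, activated, msg)
	}
}
```

The author's alternative, dropping the if entirely, would make both cases log the same message, in which case the combined case can stay as it is.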

return v.internalWaitForActivation(ctx, accountsChangedChan)
default:
res, err := (*stream).Recv()
if len(validatingKeys) == 0 {
Contributor

validatingKeys is updated only once, before the for loop.
Should it not be updated inside the for loop as well?

Contributor Author

When the accounts change, the function calls itself recursively. My understanding is that this loop only really needs to run while you have 0 active keys. Once you have 1 active key, it exits and proceeds, and subsequent key changes are handled by the HandleKeyReload call in the loop in runner.go.

Contributor

You're right, I missed the recursion.

case <-accountsChangedChan:
	// Accounts (keys) changed, restart the process.
	// if the accounts changed try it again
	return v.internalWaitForActivation(ctx, accountsChangedChan)
default:
Contributor

This default case doesn't do anything; we can remove it.

Contributor Author

This actually improved efficiency; I believe Go handles the loop efficiently when the default case is present.

Comment on lines 136 to 137
// reset the ticker when they are all active
v.ticker = slots.NewSlotTicker(time.Unix(int64(v.genesisTime), 0), params.BeaconConfig().SecondsPerSlot)
Contributor

What's the point of this code? The ticker is already set here:

v.ticker = slots.NewSlotTicker(time.Unix(int64(v.genesisTime), 0), params.BeaconConfig().SecondsPerSlot)

Contributor Author

I removed it. I don't think it has a use case if it's set elsewhere, though I wish it were clearer where it is set.

@james-prysm james-prysm marked this pull request as ready for review January 11, 2024 22:48
@james-prysm james-prysm requested a review from a team as a code owner January 11, 2024 22:48
Contributor

nalepae commented Jan 12, 2024

I did this test:

  1. Empty the prysm-wallet-v2/direct/accounts directory.
  2. Run the VC with the --rpc flag.
  3. Run POST http://localhost:7500/eth/v1/keystores with a body like:
{
    "keystores": [
      "{\"crypto\": {\"kdf\": {\"function\": \"scrypt\", \"params\": {\"dklen\": 32, \"n\": 262144, \"r\": 8, \"p\": 1, \"salt\": \"xxx\"}, \"message\": \"\"}, \"checksum\": {\"function\": \"sha256\", \"params\": {}, \"message\": \"xxx\"}, \"cipher\": {\"function\": \"aes-128-ctr\", \"params\": {\"iv\": \"xxx\"}, \"message\": \"xxx\"}}, \"description\": \"\", \"pubkey\": \"xxx\", \"path\": \"m/12381/3600/0/0/0\", \"uuid\": \"xxx\", \"version\": 4}",
      "{\"crypto\": {\"kdf\": {\"function\": \"scrypt\", \"params\": {\"dklen\": 32, \"n\": 262144, \"r\": 8, \"p\": 1, \"salt\": \"xxx\"}, \"message\": \"\"}, \"checksum\": {\"function\": \"sha256\", \"params\": {}, \"message\": \"xxx\"}, \"cipher\": {\"function\": \"aes-128-ctr\", \"params\": {\"iv\": \"xxx\"}, \"message\": \"xxx\"}}, \"description\": \"\", \"pubkey\": \"xxx\", \"path\": \"m/12381/3600/1/0/0\", \"uuid\": \"xxx\", \"version\": 4}",
      "{\"crypto\": {\"kdf\": {\"function\": \"scrypt\", \"params\": {\"dklen\": 32, \"n\": 262144, \"r\": 8, \"p\": 1, \"salt\": \"xxx\"}, \"message\": \"\"}, \"checksum\": {\"function\": \"sha256\", \"params\": {}, \"message\": \"xxx\"}, \"cipher\": {\"function\": \"aes-128-ctr\", \"params\": {\"iv\": \"xxx\"}, \"message\": \"xxx\"}}, \"description\": \"\", \"pubkey\": \"xxx\", \"path\": \"m/12381/3600/2/0/0\", \"uuid\": \"xxx\", \"version\": 4}"
    ],
    "passwords": [
      "password",
      "password",
      "password"
    ]
}

Get this response:

{
  "data": [
    {
      "status": "IMPORTED",
      "message": ""
    },
    {
      "status": "IMPORTED",
      "message": ""
    },
    {
      "status": "IMPORTED",
      "message": ""
    }
  ]
}

And this log:

time="2024-01-12 13:38:01" level=info msg="Successfully imported validator key(s)" prefix=local-keymanager publicKeys="0x80f408a41341,0xa17524bcf06d,0x84d7fa31925c"

The file all-accounts.keystore.json is written in prysm-wallet-v2/direct/accounts directory and has consistent data.

Then... nothing (no validation).


If I restart the validator, everything is fine.

@james-prysm
Contributor Author

This broke e2e; converting to draft while I resolve it.

@james-prysm james-prysm marked this pull request as draft January 12, 2024 23:38
@james-prysm james-prysm marked this pull request as ready for review January 13, 2024 04:51
@james-prysm james-prysm requested a review from nalepae January 13, 2024 04:51
@james-prysm
Contributor Author

Ready again as tests have been fixed

@james-prysm james-prysm added this pull request to the merge queue Jan 16, 2024
Merged via the queue into develop with commit 790a09f Jan 16, 2024
17 checks passed
@james-prysm james-prysm deleted the improve-wait-for-activation branch January 16, 2024 17:11
Labels
UX cosmetic / user experience related Validator Client
4 participants