Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Weird Doppelganger issues with validator client on Sepolia #4394

Closed
jakubgs opened this issue Dec 6, 2022 · 6 comments
Closed

Weird Doppelganger issues with validator client on Sepolia #4394

jakubgs opened this issue Dec 6, 2022 · 6 comments

Comments

@jakubgs
Copy link
Member

jakubgs commented Dec 6, 2022

Describe the bug
For a several days now the validator client on our linux-01.he-eu-hel1.nimbus.sepolia has been shutting down due to doppelganger detection triggering for various validators. Here's serveral examples:

{"lvl":"WRN","ts":"2022-11-25 00:14:24.128+00:00","msg":"Validator's activity has been seen","validator_index":614,"epoch":35420,"service":"doppelganger_service"}
{"lvl":"WRN","ts":"2022-11-30 10:09:36.121+00:00","msg":"Validator's activity has been seen","validator_index":638,"epoch":36638,"service":"doppelganger_service"}
{"lvl":"WRN","ts":"2022-12-01 00:14:24.509+00:00","msg":"Validator's activity has been seen","validator_index":614,"epoch":36770,"service":"doppelganger_service"}
{"lvl":"WRN","ts":"2022-12-03 00:14:24.093+00:00","msg":"Validator's activity has been seen","validator_index":614,"epoch":37220,"service":"doppelganger_service"}
{"lvl":"WRN","ts":"2022-12-06 00:14:24.107+00:00","msg":"Validator's activity has been seen","validator_index":679,"epoch":37895,"service":"doppelganger_service"}

The thing is those validators exist only on one host(since we have only one), and are only used by the validator client:

admin@linux-01.he-eu-hel1.nimbus.sepolia:~ % sudo find /data -name 0x80822499f96a1a8c0048f01f389dfcaaa5d8269c332dbb507fe46f270bcfd5f67c53f827fd867221592dbde77b6b37ab
/data/validator-client-sepolia-unstable-01/data/secrets/0x80822499f96a1a8c0048f01f389dfcaaa5d8269c332dbb507fe46f270bcfd5f67c53f827fd867221592dbde77b6b37ab
/data/validator-client-sepolia-unstable-01/data/validators/0x80822499f96a1a8c0048f01f389dfcaaa5d8269c332dbb507fe46f270bcfd5f67c53f827fd867221592dbde77b6b37ab

But if we look at a validator causing issues at a time it appears offline:

image

To Reproduce
Steps to reproduce the behavior:

  1. Platform details (OS, architecture): Ubuntu 22.04
  2. Branch/commit used: unstable

Additional context
This is only affecting the validator client. One thing worth noting is that the time this hapopens is usually around 00:14:24.

@jakubgs
Copy link
Member Author

jakubgs commented Dec 6, 2022

If we look at when the build finished today it was at 00:11:27:

admin@linux-01.he-eu-hel1.nimbus.sepolia:~ % j -n 2 -u build-validator-client-sepolia-unstable-01.service
Dec 06 00:11:27 linux-01.he-eu-hel1.nimbus.sepolia systemd[1]: Finished Build validator-client-sepolia-unstable-01.
Dec 06 00:11:27 linux-01.he-eu-hel1.nimbus.sepolia systemd[1]: build-validator-client-sepolia-unstable-01.service: Consumed 12min 37.014s CPU time.

Which doesn't exactly match the failure. The restart did happen at that time as expected:

admin@linux-01.he-eu-hel1.nimbus.sepolia:/var/log/service/validator-client-sepolia-unstable-01 % zgrep -A2 -e Starting -e Shutting -e activity service.20221206_0100.log.gz
{"lvl":"INF","ts":"2022-12-06 00:11:27.196+00:00","msg":"Shutting down metrics HTTP server"}
{"lvl":"INF","ts":"2022-12-06 00:11:27.197+00:00","msg":"Closing slashing protection","path":"/data/validator-client-sepolia-unstable-01/data/validators"}
{"lvl":"NOT","ts":"2022-12-06 00:11:27.215+00:00","msg":"REST service stopped","address":"127.0.0.1:5053"}
--
{"lvl":"INF","ts":"2022-12-06 00:11:27.281+00:00","msg":"Starting metrics HTTP server","url":"http://0.0.0.0:8109/metrics"}
{"lvl":"INF","ts":"2022-12-06 00:11:27.289+00:00","msg":"Initializing slashing protection","path":"/data/validator-client-sepolia-unstable-01/data/validators"}
{"lvl":"INF","ts":"2022-12-06 00:11:27.307+00:00","msg":"Initializaing validators","path":"/data/validator-client-sepolia-unstable-01/data/validators"}
--
{"lvl":"NOT","ts":"2022-12-06 00:11:27.328+00:00","msg":"Starting REST HTTP server","url":"http://127.0.0.1:5053/"}
{"lvl":"INF","ts":"2022-12-06 00:11:27.329+00:00","msg":"Beacon node has been identified","agent":"Nimbus/v22.11.0-38827d-stateofus","service":"fallback_service","node":"127.0.0.1:9311","node_index":0,"node_roles":"AGBSD"}
{"lvl":"INF","ts":"2022-12-06 00:11:27.330+00:00","msg":"Beacon node has compatible configuration","service":"fallback_service","node":"127.0.0.1:9311[Nimbus/v22.11.0-38827d-stateofus]","node_index":0,"node_roles":"AGBSD"}
--
{"lvl":"WRN","ts":"2022-12-06 00:14:24.107+00:00","msg":"Validator's activity has been seen","validator_index":679,"epoch":37895,"service":"doppelganger_service"}
{"lvl":"NOT","ts":"2022-12-06 00:14:24.107+00:00","msg":"Received shutdown event, exiting"}
{"lvl":"INF","ts":"2022-12-06 00:14:24.107+00:00","msg":"Shutting down metrics HTTP server"}
{"lvl":"INF","ts":"2022-12-06 00:14:24.107+00:00","msg":"Closing slashing protection","path":"/data/validator-client-sepolia-unstable-01/data/validators"}
{"lvl":"NOT","ts":"2022-12-06 00:14:24.119+00:00","msg":"REST service stopped","address":"127.0.0.1:5053"}

So it got restarted, ran for 3 minutes and then doppelganger kicked in.

@jakubgs
Copy link
Member Author

jakubgs commented Dec 7, 2022

Looks like it didn't happen after today's restart.

@jakubgs
Copy link
Member Author

jakubgs commented Dec 9, 2022

Happened again today, same time as always: 00:14:24

{"lvl":"INF","ts":"2022-12-09 00:14:12.001+00:00","msg":"Slot start","slot":1234271,"attestationIn":"4s","blockIn":"<unknown>","validators":25,"delay":"936us345ns"}
{"lvl":"ERR","ts":"2022-12-09 00:14:12.321+00:00","msg":"Failed to subscribe validators to sync committee subnets","slot":1234271,"epoch":38570,"subscriptions_count":6,"service":"duties_service"}
{"lvl":"INF","ts":"2022-12-09 00:14:16.001+00:00","msg":"Doppelganger protection disabled validator duties","validators":1,"service":"attestation_service"}
{"lvl":"INF","ts":"2022-12-09 00:14:24.001+00:00","msg":"Slot start","slot":1234272,"attestationIn":"1m4s","blockIn":"<unknown>","validators":25,"delay":"1ms60us982ns"}
{"lvl":"WRN","ts":"2022-12-09 00:14:24.351+00:00","msg":"Validator's activity has been seen","validator_index":638,"epoch":38570,"service":"doppelganger_service"}
{"lvl":"NOT","ts":"2022-12-09 00:14:24.351+00:00","msg":"Received shutdown event, exiting"}
{"lvl":"INF","ts":"2022-12-09 00:14:24.351+00:00","msg":"Shutting down metrics HTTP server"}
{"lvl":"INF","ts":"2022-12-09 00:14:24.352+00:00","msg":"Closing slashing protection","path":"/data/validator-client-sepolia-unstable-01/data/validators"}
{"lvl":"NOT","ts":"2022-12-09 00:14:24.363+00:00","msg":"REST service stopped","address":"127.0.0.1:5053"}

@jakubgs
Copy link
Member Author

jakubgs commented Dec 9, 2022

Potential fix for this according to @zah:

@arnetheduck
Copy link
Member

@jakubgs I'll presume this is no longer happening / doppel works as expected?

@jakubgs
Copy link
Member Author

jakubgs commented Jan 5, 2023

I see no signs of it happening recently:

admin@linux-01.he-eu-hel1.nimbus.sepolia:~ % zcat /data/validator-client-*/logs/service.*.log.gz | grep 'activity has been seen'                  
admin@linux-01.he-eu-hel1.nimbus.sepolia:~ % cat /data/validator-client-*/logs/service.log | grep 'activity has been seen' 
admin@linux-01.he-eu-hel1.nimbus.sepolia:~ % 

@jakubgs jakubgs closed this as completed Jan 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants