Pebble compaction causes intermittent, but significant performance impacts #29575
Comments
Could you share some logs from Geth when this happens?
I usually run with
Seems to have magically stopped on its own, haven't seen the characteristic 5-6 block pattern for a while now. Will keep an eye out and reopen if it returns.
Did some more digging; the issue seems to be caused by pebble's database compaction. Not sure how or why it gets triggered, but the result is ~1 min of slow block processing. Some recent examples where I've timed it:
Found the culprit by profiling during the "bad" times, and confirmed that compaction is indeed being triggered by adding logging here: go-ethereum/ethdb/pebble/pebble.go, line 97 (commit 44a50c9).
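For context, the kind of logging that surfaces this is a hook on pebble's compaction events. Below is a minimal standalone sketch, not geth's actual wiring, assuming the cockroachdb/pebble EventListener API; field names such as JobID, Reason, and TotalDuration, and whether Options.EventListener is a pointer or a value, vary between pebble versions:

```go
package main

import (
	"log"

	"github.com/cockroachdb/pebble"
)

func main() {
	// Log compaction start/end so slow periods can be correlated with
	// block-processing stalls. Illustrative only; not the actual geth patch.
	listener := pebble.EventListener{
		CompactionBegin: func(info pebble.CompactionInfo) {
			log.Printf("compaction begin: job=%d reason=%s", info.JobID, info.Reason)
		},
		CompactionEnd: func(info pebble.CompactionInfo) {
			log.Printf("compaction end: job=%d duration=%s", info.JobID, info.TotalDuration)
		},
	}

	db, err := pebble.Open("testdb", &pebble.Options{EventListener: &listener})
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}
```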
Saw some earlier optimizations around compaction (#20130), not sure if there's anything more that can be done to smooth these out as well.
I think I also face this issue at a larger scale while trying to sync a full archive node. I collected a pprof CPU profile, and during the slow period most of the time is spent on pebble's compaction. In my case, I wonder if there is any way to make compactions less aggressive? I'd rather sacrifice some disk space on a less-compacted DB for longer if it achieves a faster sync.
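As far as I know there is no geth flag for this, but pebble itself exposes knobs that trade disk space and read amplification for less eager compaction. A hedged sketch of the kind of options one could experiment with in the pebble wrapper; option names are from the cockroachdb/pebble Options struct, and defaults and exact fields vary by pebble version, so treat this as illustrative rather than a recommendation:

```go
package main

import (
	"log"

	"github.com/cockroachdb/pebble"
)

func main() {
	// Raising these thresholds defers compaction work, trading read
	// amplification and on-disk space for fewer interruptions (illustrative values).
	opts := &pebble.Options{
		L0CompactionThreshold: 8,         // allow more L0 files before scheduling a compaction
		L0StopWritesThreshold: 24,        // backpressure limit for L0 files
		LBaseMaxBytes:         512 << 20, // let Lbase grow larger before pushing data down
	}

	db, err := pebble.Open("testdb", opts)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}
```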
Update for those facing the same issue:
There's not much we can do about performance degradation during compaction. Pebble already has much better compaction than LevelDB, but there is a significant performance hit inherent in the operation (shuffling the data around on disk). The thing that's curious to me is that the compaction seems to be single-threaded, which is odd since Pebble claims to support concurrent database compaction, as far as I recall. cc @rjl493456442
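For reference, pebble's compaction parallelism is controlled by an option on pebble.Options; whether geth's wrapper raises it above a single compaction at a time depends on the version. A hedged sketch, assuming a recent cockroachdb/pebble where the option is a callback (older versions expose a plain int field):

```go
package main

import (
	"log"
	"runtime"

	"github.com/cockroachdb/pebble"
)

func main() {
	opts := &pebble.Options{
		// Allow several compactions to run in parallel instead of the default
		// single compaction goroutine (assumed callback form of the option).
		MaxConcurrentCompactions: func() int { return runtime.NumCPU() },
	}

	db, err := pebble.Open("testdb", opts)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}
```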
Sharing a typical pprof profile from when pebble appears to be CPU-bound in a single thread.
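For anyone who wants to capture the same kind of profile: geth exposes the standard Go pprof endpoints when started with --pprof (default localhost:6060). A small sketch that pulls a 30-second CPU profile from that endpoint; the address and flag are geth defaults, so adjust if your node exposes pprof elsewhere:

```go
package main

import (
	"io"
	"log"
	"net/http"
	"os"
)

func main() {
	// Requires geth to be running with --pprof (default: localhost:6060).
	resp, err := http.Get("http://localhost:6060/debug/pprof/profile?seconds=30")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	out, err := os.Create("geth-cpu.pprof")
	if err != nil {
		log.Fatal(err)
	}
	defer out.Close()

	if _, err := io.Copy(out, resp.Body); err != nil {
		log.Fatal(err)
	}
	log.Println("wrote geth-cpu.pprof; inspect with: go tool pprof geth-cpu.pprof")
}
```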
System information
Geth version:
v1.13.14
CL client & version:
teku@24.1.3
OS & Version: Linux
Expected behaviour
Geth receives/processes blocks in a timely manner.
Actual behaviour
I run a number of geth/teku nodes and have recently noticed an infrequent (roughly daily) pattern occurring across all of them, where geth receives a burst of ~6 blocks at around the same time, the oldest of which is, of course, ~72s stale.
It could just be a temporary network issue, but I keep seeing this same number of blocks across multiple machines in multiple locations.
It could also be a Teku issue, but that seems a bit unlikely given the logs below.
Steps to reproduce the behaviour
I've added some custom logging in forkchoiceUpdated(): go-ethereum/eth/catalyst/api.go, line 313 (commit 823719b).
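The logging is essentially a timer around the handler. Here is a hedged, standalone reconstruction of the idea rather than the exact patch; the real change lives at the referenced line and uses geth's own log package, while this sketch uses the standard library so it runs on its own:

```go
package main

import (
	"log"
	"time"
)

// timed wraps a call and logs how long it took; this is the essence of the
// custom logging used to measure the ~1 min stalls around forkchoiceUpdated.
func timed(name string, fn func()) {
	start := time.Now()
	fn()
	log.Printf("%s took %s", name, time.Since(start))
}

func main() {
	timed("forkchoiceUpdated", func() {
		time.Sleep(100 * time.Millisecond) // stand-in for the real handler body
	})
}
```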
With this code in place, yesterday I got the output:
and corresponding Teku logs:
I'm interpreting this as something infrequently hanging geth for ~1 min.
Some other recent incidents occurred at block 19678501 and 19675309. The signature is always a 6ish block pileup on geth and a late block message on Teku. These late block messages were a bit different from the above though:
and
I was running an older (and less verbose) version of Teku at the time; Lucas Saldanha from the Teku Discord told me that both of these blocks were late because of blob data unavailability.