-
Notifications
You must be signed in to change notification settings - Fork 130
Single threaded world state persistence #950
Single threaded world state persistence #950
Conversation
…cksDB timeouts due to contention on the write lock.
…hread-world-state-persistence
…hread-world-state-persistence
…hread-world-state-persistence
…hread-world-state-persistence
…hread-world-state-persistence
} catch (final InterruptedException ignore) { | ||
Thread.currentThread().interrupt(); | ||
} catch (final RuntimeException e) { | ||
LOG.error("Unexpected error while persisting world state", e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider logging some context for this exception so that it is easier to track down (eg header, etc)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't really have any context here - there's only one world state download so the header is useless and we're attempting to persist about 394 data points - far too much to include in a log.
...th/src/main/java/tech/pegasys/pantheon/ethereum/eth/sync/worldstate/TrieNodeDataRequest.java
Outdated
Show resolved
Hide resolved
requestNodeData(header); | ||
if (error != null | ||
&& !(ExceptionUtils.rootCause(error) instanceof RejectedExecutionException)) { | ||
LOG.error("World state data request failed", error); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider adding some context to the log message to make debugging easier (eg peer, toRequest, header, etc)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again there's no useful context we can include here beyond the exception. This is something completely unexpected that happened somewhere through the entire create, send and process request. It shouldn't ever happen but if it does we want to see the stack trace. Most errors are already handled in the details - this is the equivalent of the uncaught exception handler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we had what eric asked for at debug or trace it would help to re-create a reproducible state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What log message are you actually expecting to see here? I've been working in this code a fair bit and can't imagine what we could add here that would actually be useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
PR description
Previously the world state downloader used multiple different threads to persist the data into RocksDB. This often led to timeouts acquiring the write lock from RocksDB and the data would be lost. Now a single service thread is used to do all persistence, avoiding lock contention.
Also includes a fix for
RocksDbTaskQueue
to ensure it handles data that may have existed on disk before it was created correctly (by ignoring it consistently).