Skip to content
This repository has been archived by the owner on Sep 26, 2019. It is now read-only.

Single threaded world state persistence #950

Merged

Conversation

ajsutton
Copy link
Contributor

PR description

Previously the world state downloader used multiple different threads to persist the data into RocksDB. This often led to timeouts acquiring the write lock from RocksDB and the data would be lost. Now a single service thread is used to do all persistence, avoiding lock contention.

Also includes a fix for RocksDbTaskQueue to ensure it handles data that may have existed on disk before it was created correctly (by ignoring it consistently).

} catch (final InterruptedException ignore) {
Thread.currentThread().interrupt();
} catch (final RuntimeException e) {
LOG.error("Unexpected error while persisting world state", e);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider logging some context for this exception so that it is easier to track down (eg header, etc)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't really have any context here - there's only one world state download so the header is useless and we're attempting to persist about 394 data points - far too much to include in a log.

requestNodeData(header);
if (error != null
&& !(ExceptionUtils.rootCause(error) instanceof RejectedExecutionException)) {
LOG.error("World state data request failed", error);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider adding some context to the log message to make debugging easier (eg peer, toRequest, header, etc)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again there's no useful context we can include here beyond the exception. This is something completely unexpected that happened somewhere through the entire create, send and process request. It shouldn't ever happen but if it does we want to see the stack trace. Most errors are already handled in the details - this is the equivalent of the uncaught exception handler.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we had what eric asked for at debug or trace it would help to re-create a reproducible state.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What log message are you actually expecting to see here? I've been working in this code a fair bit and can't imagine what we could add here that would actually be useful.

Copy link
Contributor

@ekellstrand ekellstrand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ajsutton ajsutton merged commit 700896a into PegaSysEng:master Feb 26, 2019
@ajsutton ajsutton deleted the single-thread-world-state-persistence branch February 26, 2019 00:31
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants