Propagate light node sync errors to RPC caller #1783
Merged
Overview
Currently, when the light node sync layer encounters an error (e.g. an invalid proof), it does not report the error to the RPC caller but silently retries. The caller will eventually time out, but there is no way to tell what caused the timeout.
This PR allows the sync layer to propagate errors to the RPC layer. Note that even if an RPC call fails, the sync layer will still retry the query; this way, a repeated query can succeed with reduced latency.
Architecture
Currently, light node modules use two separate runtimes. The sync layer runs on top of our mio event loop, similar to full nodes. Query modules (RPC, `QueryService`), on the other hand, run on a tokio future executor.

In terms of query workflow, RPC requests notify the sync layer to retrieve the corresponding item. When the item has been received and verified, all futures waiting for it are notified (`wake`). This latter notification is done using a shared-memory cache. For instance, the transactions sync handler has a field:
Originally, `PendingItem` only had two states: `Pending` and `Ready`. In this PR, it is extended with a third state, `Error`. The corresponding future, `FutureItem`, will now resolve to `Result<T, Err>` instead of just `T`.

Implementation challenges
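A self-contained sketch of the extended state machine, using simplified stand-ins for `PendingItem`, the sync error type, and the polling behavior of `FutureItem` (none of this is copied from the codebase):

```rust
// Simplified stand-in for the sync layer's error type.
#[derive(Clone, Debug, PartialEq)]
enum SyncError {
    InvalidProof,
}

// Cache slot for a requested item, extended with the new Error state.
enum PendingItem<T> {
    // Requested but not yet received and verified.
    Pending,
    // Received and verified successfully.
    Ready(T),
    // New in this PR: verification failed; the error is stored so that
    // waiting futures can resolve to Err instead of timing out silently.
    Error(SyncError),
}

impl<T: Clone> PendingItem<T> {
    // What a waiting future sees when polling this slot:
    // None means "keep waiting"; Some(result) resolves the future.
    fn poll(&self) -> Option<Result<T, SyncError>> {
        match self {
            PendingItem::Pending => None,
            PendingItem::Ready(t) => Some(Ok(t.clone())),
            PendingItem::Error(e) => Some(Err(e.clone())),
        }
    }
}

fn main() {
    let pending: PendingItem<u64> = PendingItem::Pending;
    assert_eq!(pending.poll(), None);

    let ready = PendingItem::Ready(42u64);
    assert_eq!(ready.poll(), Some(Ok(42)));

    let failed: PendingItem<u64> = PendingItem::Error(SyncError::InvalidProof);
    assert_eq!(failed.poll(), Some(Err(SyncError::InvalidProof)));
}
```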
`error-chain` errors are neither `Clone` [1] [2] nor `Sync` [3] [4]. Unfortunately, we need both of these. We need `Clone` as the same error will be consumed by the sync handler (for disconnecting the peer) and by one or more RPC handlers (to report the error or retry). We need `Sync` as futures can be scheduled on different threads at different points in time when using a tokio executor.

As a hacky workaround, I use a wrapper when I need clonable errors:
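The wrapper itself is not shown above; one way such a wrapper can work is to put the error behind `Arc<Mutex<..>>`, which yields both `Clone` and `Sync`. The `ClonableError` name and the stand-in `Error` type below are assumptions for illustration:

```rust
use std::cell::Cell;
use std::sync::{Arc, Mutex};

// Stand-in for an error-chain error: Send but not Sync (Cell is !Sync)
// and not Clone, mirroring the constraints described above.
struct Error {
    msg: String,
    retries: Cell<u32>,
}

// Cloning the wrapper clones the Arc, not the error itself.
#[derive(Clone)]
struct ClonableError(Arc<Mutex<Error>>);

impl From<Error> for ClonableError {
    fn from(e: Error) -> Self {
        ClonableError(Arc::new(Mutex::new(e)))
    }
}

// Compile-time check that a type is Sync.
fn assert_sync<T: Sync>(_: &T) {}

fn main() {
    let err: ClonableError = Error {
        msg: "invalid proof".into(),
        retries: Cell::new(0),
    }
    .into();

    // One handle for the sync handler, one for an RPC handler.
    let for_sync = err.clone();
    let for_rpc = err.clone();
    assert_eq!(for_sync.0.lock().unwrap().msg, "invalid proof");
    assert_eq!(for_rpc.0.lock().unwrap().msg, "invalid proof");

    // Mutex<Error> is Sync because Error is Send, so the wrapper can be
    // shared across tokio worker threads.
    assert_sync(&err);
}
```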
... and when I need to use this as a normal `Error`, I wrap it into a special error kind:

The other changes in this PR are mostly adding more details to various error types and some refactoring.
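The round-trip from the clonable wrapper back into an ordinary error could be sketched as follows; the `ErrorKind::ClonableErrorWrapper` variant name is illustrative, not the actual one:

```rust
use std::sync::{Arc, Mutex};

// Simplified clonable wrapper, as in the workaround described above.
#[derive(Clone, Debug)]
struct ClonableError(Arc<Mutex<String>>);

// Hypothetical error-kind enum with a special variant that carries the
// clonable wrapper, for APIs that expect the ordinary error type.
#[derive(Debug)]
enum ErrorKind {
    InvalidProof,
    ClonableErrorWrapper(ClonableError),
}

impl From<ClonableError> for ErrorKind {
    fn from(e: ClonableError) -> Self {
        ErrorKind::ClonableErrorWrapper(e)
    }
}

fn main() {
    let wrapped = ClonableError(Arc::new(Mutex::new("invalid proof".to_string())));

    // Convert the shared, clonable error back into a normal error kind.
    let kind: ErrorKind = wrapped.clone().into();
    match kind {
        ErrorKind::ClonableErrorWrapper(e) => {
            assert_eq!(*e.0.lock().unwrap(), "invalid proof")
        }
        ErrorKind::InvalidProof => panic!("expected the wrapper kind"),
    }
}
```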