You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Currently, our speculative pool's garbage collection is conducted at regular intervals, which may potentially result in command execution disorder under certain extreme conditions. (Refer to #159 for details)
Background
In order to limit memory usage and increase the number of fast_path requests, after the command has been synchronized with backups, it is necessary to perform a GC on the follower's speculative pools to remove these commands.
Typical GC path
In our implementation, the leader advances the commitIndex and clean up commands in the speculative pool only after the log has been synchronized with the majority of nodes. Subsequently, an append entries request with the latest commitIndex propagates the update to the state machines in followers. At this stage, followers can clean up commands within their speculative pools.
Challenge
However, there are still some situations in which the command will not be cleared:
If the client only sends proposals to a follower and then crashes, the command will keep in the follower forever.
A proposal request is delayed significantly and arrives after the follower finished apply the log.
A minority of followers have been offline for an extended period, and their logs have already been compacted so that the leader will no longer send append entries requests for these logs.
...
During the leadership transfer, the new leader will recover the SP from the cluster. This may recover some unexecuted commands and append them into log and finally going through the typical GC path. For commands that have already been executed, they will not be restored.
But it seems that it will still recover commands that have already been compacted? (This may result in a particular command being executed twice, and it is unrelated to the SP GC issue discussed here -- It will go the typical GC path.)
The CURP paper mentions one potential solution to this issue.
Witnesses detect such uncollected records and ask masters
to retry garbage collection for them. When it rejects a record,
a witness recognizes the existing record as uncollected
garbage if there have been many garbage collections since
the record was written (three is a good number if a master
performs only one gc RPC at a time). Witnesses notify
masters of the requests that are suspected as uncollected
garbage through the response messages of gc RPCs; then the
masters retry the requests (most likely filtered by RIFL), sync
to backups, and thus include them in the next gc requests.
We do not have a structure similar to RIFL, and retrying may result in the execution of a command twice. (Perhaps we can deduplicate during the apply process?)
We can add a suspected commands collection to the response of append_entries RPC, and the leader adds these commands to the log upon receiving them, and then performs the typical GC path.
Solution 2
Introducing an additional RPC mechanism to synchronize the SP of followers and the leader. The details and security of this approach require further discussion.
Code of Conduct
I agree to follow this project's Code of Conduct
The text was updated successfully, but these errors were encountered:
Description about the feature
Currently, our speculative pool's garbage collection is conducted at regular intervals, which may potentially result in command execution disorder under certain extreme conditions. (Refer to #159 for details)
Background
In order to limit memory usage and increase the number of
fast_path
requests, after the command has been synchronized with backups, it is necessary to perform a GC on the follower's speculative pools to remove these commands.Typical GC path
In our implementation, the leader advances the
commitIndex
and clean up commands in the speculative pool only after the log has been synchronized with the majority of nodes. Subsequently, an append entries request with the latestcommitIndex
propagates the update to the state machines in followers. At this stage, followers can clean up commands within their speculative pools.Challenge
However, there are still some situations in which the
command
will not be cleared:During the leadership transfer, the new leader will recover the SP from the cluster. This may recover some unexecuted commands and append them into log and finally going through the typical GC path. For commands that have already been executed, they will not be restored.
Xline/curp/src/server/raw_curp/mod.rs
Lines 939 to 950 in 2a034e9
Solution
The CURP paper mentions one potential solution to this issue.
We do not have a structure similar to RIFL, and retrying may result in the execution of a command twice. (Perhaps we can deduplicate during the
apply
process?)We can add a
suspected commands
collection to the response ofappend_entries
RPC, and the leader adds these commands to the log upon receiving them, and then performs the typical GC path.Introducing an additional RPC mechanism to synchronize the SP of followers and the leader. The details and security of this approach require further discussion.
Code of Conduct
The text was updated successfully, but these errors were encountered: