Massive unnecessary Raft log rollbacks cause latency spikes #5881
Labels
affects/none
severity/none
type/bug
Describe the bug (required)
I've hit a scenario with latency spikes lasting several seconds: graphd's log shows that requests to a certain storaged host time out. The storaged's log shows that many partitions (possibly every partition for which it is a follower) have gone through a RaftLog rollback.
However, there are no logs indicating a leader re-election or leader change, which suggests that no log inconsistency is involved and the rollbacks should not have been necessary.
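For context, the Raft rule for followers is to truncate only on a real conflict: an existing entry is deleted (together with everything after it) only when a new entry has the same index but a different term. Without a leader change there is no new term, so no such conflict should exist. The sketch below is illustrative only and is not NebulaGraph's `RaftPart` or WAL code; `Entry`, `Log`, `appendNaive` and `appendConflictOnly` are hypothetical names. It assumes, as a hypothesis rather than a confirmed diagnosis, that under heavy load a delayed or retried append request can reach a follower after a newer one. A follower that always truncates back to the request's position then discards valid entries, while a conflict-only follower keeps them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified follower log: slot i (0-based) holds the entry
// with log id i + 1. Not NebulaGraph's actual data structures.
struct Entry {
    int64_t term;
};

using Log = std::vector<Entry>;

// Naive follower (hypothetical): once prevLogId is known to match, drop
// everything after it and append the request's entries. A stale request that
// covers fewer entries than the follower already has rolls back valid entries.
// Returns how many existing entries were discarded.
std::size_t appendNaive(Log& log, std::size_t prevLogId, const Log& entries) {
    assert(prevLogId <= log.size());  // precondition: prev log already matched
    std::size_t removed = log.size() - prevLogId;
    log.resize(prevLogId);
    log.insert(log.end(), entries.begin(), entries.end());
    return removed;
}

// Conflict-only follower: skip entries it already holds with the same term and
// truncate only at the first index whose term differs from the new entry.
// Returns how many existing entries were discarded (0 if nothing conflicted).
std::size_t appendConflictOnly(Log& log, std::size_t prevLogId, const Log& entries) {
    assert(prevLogId <= log.size());  // precondition: prev log already matched
    std::size_t removed = 0;
    for (std::size_t i = 0; i < entries.size(); ++i) {
        std::size_t idx = prevLogId + i;  // slot for log id prevLogId + i + 1
        if (idx < log.size()) {
            if (log[idx].term == entries[i].term) {
                continue;  // identical entry already present: keep it
            }
            removed = log.size() - idx;  // real conflict: drop this and the rest
            log.resize(idx);
        }
        log.push_back(entries[i]);
    }
    return removed;
}
```

With a follower holding log ids 1 to 5 in term 1, a stale request with `prevLogId = 2` carrying only log id 3 (term 1) shrinks the naive log to 3 entries, whereas `appendConflictOnly` discards nothing and keeps all 5. Only an entry with a different term, such as one from a new leader, makes it truncate.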
Your Environments (required)
A private branch forked from master a long time ago, but the related code looks the same as on the master branch.
How To Reproduce (required)
No reliable way to reproduce it; it happens occasionally.
In my case, it happens when the storaged is under heavy load from a write stress test.
Expected behavior
The storaged should not trigger massive RaftLog rollbacks that leave it unresponsive for seconds.
Additional context
I've taken a look at the RaftPart implementation and have some thoughts about the issue.
The corresponding code is:
If this is the case, a simple solution might be: