CQ: Fix shared store scanner missing messages
It was still possible, although rare, to have message store files lose message data, when the following conditions were met:

* the message data contains byte values 255 (255 is used as an OK marker after a message)
* the message is located after a 0-filled hole in the file
* the length of the data is at least 4096 bytes and, if we misread it (as detailed below), we encounter a 255 byte where we expect the OK marker

How the code could previously misread the length can be explained as follows. A message is stored in the following format:

    <<Len:64, MsgIdAndMsg:Len/unit:8, 255>>

MsgId is always 16 bytes in length, so Len would be exactly 16 if the message data Msg were empty; in practice Msg never is, so Len is always greater than 16.

Now if we have a zero-filled hole just before this message, we may end up with this:

    <<0, Len:64, MsgIdAndMsg:Len/unit:8, 255>>

When scanning, we test bytes to see whether there is a message there or not. We look for a Len that gives us byte 255 after MsgIdAndMsg.

A Len of value 4096 looks like this in binary:

    <<0:48, 16, 0>>

The problem is that if we have leading zeroes, Len may look like this:

    <<0, 0:48, 16, 0>>

If we take the first 64 bits we get a potential length of 16. We look at the byte after the next 16 bytes. If it is 255, we think this is a message, skip ahead by that many bytes, and mistakenly miss the real message.

Solving this by changing the file format would be simple enough, but we don't have the luxury to afford that. A different solution was found, which is to combine file scanning with checking that the message exists in the message store index (populated from queues at startup, and kept up to date over the lifetime of the store). Then we know for sure that the message above doesn't exist, because the MsgId won't be found in the index. If it is found, then the file number and offset will not match, and the check will fail.

There remains a small chance that we get it wrong during dirty recovery. Only a better file format would improve that.
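The misread and the index check can be reproduced with a small Python sketch. This is an illustration only, not the store's Erlang code: `encode`, `scan`, the dict-based `index`, and the 32-byte hole are hypothetical, and the sketch assumes a scanner that advances one byte at a time when no message is recognised. It uses a Len of 4352 (0x1100), which reads as 17 when shifted by one zero byte.

```python
MSG_ID_SIZE = 16   # MsgId is always 16 bytes
OK_MARKER = 255    # byte written after every message

def encode(msg_id, payload):
    """Build <<Len:64, MsgIdAndMsg:Len/unit:8, 255>> (big-endian Len)."""
    body = msg_id + payload
    return len(body).to_bytes(8, "big") + body + bytes([OK_MARKER])

def scan(data, file_no=0, index=None):
    """Return [(msg_id, offset)] for every message the scanner accepts.

    With index=None this is the naive scan: any Len >= 16 followed by a
    255 byte at the expected position counts as a message.
    With an index (hypothetical shape: msg_id -> (file_no, offset)), a
    candidate is accepted only if the index agrees on file and offset.
    """
    found = []
    pos = 0
    while pos + 8 < len(data):
        length = int.from_bytes(data[pos:pos + 8], "big")
        end = pos + 8 + length          # where the OK marker should be
        msg_id = data[pos + 8:pos + 8 + MSG_ID_SIZE]
        ok = (length >= MSG_ID_SIZE
              and end < len(data)
              and data[end] == OK_MARKER)
        if ok and index is not None:
            ok = index.get(msg_id) == (file_no, pos)
        if ok:
            found.append((msg_id, pos))
            pos = end + 1               # skip past the whole message
        else:
            pos += 1                    # try the next byte
    return found

# A 32-byte zero-filled hole followed by one real message at offset 32.
msg_id = bytes(range(16))
payload = b"\xff" * 4336                # Len = 16 + 4336 = 4352 = 0x1100
data = bytes(32) + encode(msg_id, payload)

# Naive scan: one byte early, the first 64 bits read as 17, a 255 byte
# sits where the marker is expected, and the real message is skipped.
naive = scan(data)

# Index-checked scan: the bogus MsgId is not in the index, so the
# scanner moves on and finds the real message at offset 32.
checked = scan(data, index={msg_id: (0, 32)})

print(naive)
print(checked)
```

In this sketch the naive scan reports a single bogus entry at offset 31 and never sees the message at offset 32, while the index-checked scan reports only the real one. As noted above, the check is only as good as the index, so an incomplete index during dirty recovery can still lead to a wrong answer.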
Showing 2 changed files with 108 additions and 94 deletions.