Stop checkpoint validation when encountering a valid checkpoint #463

Merged (3 commits) Aug 14, 2024

Conversation

@the-mikedavis (Member) commented Jul 25, 2024

@mkuratczyk noticed that with many QQs on the qq-v4 branch, each with many checkpoints, we spend a fair amount of effort reading the checkpoints during recovery. This is because `ra_snapshot:find_checkpoints/1` uses the `ra_snapshot:validate/1` callback to ensure that each checkpoint is valid, and `validate/1` is somewhat expensive in `ra_log_snapshot` since it fully reads and decodes the checkpoint, discarding the result.

Not all of this validation is necessary: we can stop validating checkpoints once we find the latest checkpoint which is valid. This is likely to be good enough. I've also updated `find_checkpoints/1` to stop its search when it finds a checkpoint with a lower index than the current snapshot, as any checkpoints below the snapshot index won't be used for promotion and should be removed. For many QQs with many checkpoints each, this should save some I/O and memory.
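The "stop validating at the first valid checkpoint" scan can be sketched roughly like this. This is NOT the actual `ra_snapshot` code: the `make_name/2` helper, the `{Idx, Term}` list shape, and the module layout are hypothetical; only the `Module:validate/1` callback comes from the PR description.

```erlang
-module(checkpoint_scan).
-export([find_checkpoints/3]).

%% Scan checkpoints from newest to oldest, calling the (possibly
%% expensive) Module:validate/1 callback only until the first valid
%% checkpoint is found.
find_checkpoints(Module, Dir, IdxTerms) ->
    scan(Module, Dir, IdxTerms, false, []).

scan(_Module, _Dir, [], _FoundValid, Acc) ->
    lists:reverse(Acc);
scan(Module, Dir, [{Idx, Term} | Rest], true, Acc) ->
    %% A newer checkpoint already validated: assume older ones are fine too.
    scan(Module, Dir, Rest, true, [{Idx, Term} | Acc]);
scan(Module, Dir, [{Idx, Term} | Rest], false, Acc) ->
    case Module:validate(filename:join(Dir, make_name(Idx, Term))) of
        ok ->
            scan(Module, Dir, Rest, true, [{Idx, Term} | Acc]);
        {error, _} ->
            %% Corrupt or partially-written checkpoint: skip it, keep looking.
            scan(Module, Dir, Rest, false, Acc)
    end.

%% Hypothetical file-name scheme; ra uses its own encoding.
make_name(Idx, Term) ->
    integer_to_list(Idx) ++ "_" ++ integer_to_list(Term).
```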

@the-mikedavis force-pushed the md/checkpoint-defer-validation branch from 2cb471c to 0b45ffc on July 25, 2024 21:19
@the-mikedavis (Member, Author) commented Jul 29, 2024

I took some rough measurements with tprof from OTP 27. The gist is that the time and memory savings look pretty good: 1.62s down to 0.28s, and 178 million words of memory down to ~20 million, for `ra_snapshot:init/6` on a QQ's checkpoint directory (from the qq-v4 branch) with 5 million messages.

Results...

Queue created with `perf-test -qq -u qq -x 1 -y 0 -C 5000000 -c 3000`

Measured with:

`tprof:profile(fun() -> ra_snapshot:init(<<"uuid">>, ra_log_snapshot, "./snapshots", "./checkpoints", undefined, 3) end, #{type => call_time}).`

and #{type => call_memory} for the memory breakdowns.

This branch:

FUNCTION                                        CALLS  TIME (μs)  PER CALL  [    %]
...
erlang:universaltime_to_localtime/1                 6         69     11.50  [ 0.02]
prim_file:close_nif/1                              17         91      5.35  [ 0.03]
prim_file:list_dir_nif/1                            2         92     46.00  [ 0.03]
prim_file:read_nif/2                               34        155      4.56  [ 0.05]
file:file_name_1/2                               1037        192      0.19  [ 0.07]
filename:join1/4                                 1914        203      0.11  [ 0.07]
prim_file:open_nif/2                               17        291     17.12  [ 0.10]
erlang:crc32/1                                      1      13109  13109.00  [ 4.54]
prim_file:read_file_nif/1                           1      52475  52475.00  [18.18]
ra_log_snapshot:parse_snapshot/1                    1      93138  93138.00  [32.26]
erlang:binary_to_term/1                            19     128266   6750.84  [44.43]
                                                          288697            [100.0]

0.28s

FUNCTION                                CALLS     WORDS    PER CALL  [    %]
...
prim_file:internal_native2name/1           17      1122       66.00  [ 0.01]
file:file_name_1/2                       1037      2040        1.97  [ 0.01]
lists:reverse/2                            42      3726       88.71  [ 0.02]
filename:join1/4                         1914      3758        1.96  [ 0.02]
erlang:crc32/1                              1      7450     7450.00  [ 0.04]
erlang:binary_to_term/1                    19  19871694  1045878.63  [99.89]
                                               19892656              [100.0]

main:

FUNCTION                                        CALLS  TIME (μs)  PER CALL  [    %]
...
erlang:universaltime_to_localtime/1                 6         71     11.83  [ 0.00]
prim_file:list_dir_nif/1                            2         93     46.50  [ 0.01]
prim_file:close_nif/1                              17        165      9.71  [ 0.01]
file:file_name_1/2                               1037        168      0.16  [ 0.01]
prim_file:read_nif/2                               34        190      5.59  [ 0.01]
filename:join1/4                                 2890        249      0.09  [ 0.02]
prim_file:open_nif/2                               17        704     41.41  [ 0.04]
erlang:crc32/1                                     17     141498   8323.41  [ 8.72]
prim_file:read_file_nif/1                          17     154438   9084.59  [ 9.52]
ra_log_snapshot:parse_snapshot/1                   17     520271  30604.18  [32.08]
erlang:binary_to_term/1                            51     803040  15745.88  [49.51]
                                                         1621975            [100.0]

1.62s
FUNCTION                                    CALLS      WORDS    PER CALL  [    %]
...
prim_file:internal_native2name/1               17       1122       66.00  [ 0.00]
file:file_name_1/2                           1037       2040        1.97  [ 0.00]
lists:reverse/2                                57       5552       97.40  [ 0.00]
filename:join1/4                             2890       5678        1.96  [ 0.00]
erlang:crc32/1                                 17      66600     3917.65  [ 0.04]
erlang:binary_to_term/1                        51  177721626  3484737.76  [99.95]
                                                   177806402              [100.0]

@kjnilsson (Contributor) commented:

Other checkpoints we can validate during promotion and discard ones that fail.

For a quorum queue where consumers keep up with ingress, checkpoints are promoted very often. It would be nice not to have to do the validation work every time just because we optimised recovery. My thought was that once we'd found a valid checkpoint during recovery we'd assume all prior checkpoints are also valid. That should be roughly as good as promoting any other checkpoint.

The most likely way a checkpoint would become corrupted is if the server stopped hard during a write or fsync. Sure, there are other ways checkpoints could become corrupted, but at least we guard against the most likely one.

This refactors `ra_snapshot:find_checkpoints/1` to cut down on some
work when there are many checkpoints. We scan through the checkpoint
directories to find the first (latest) valid checkpoint we can use for
recovery. Then we can defer using the `ra_snapshot:validate/1` callback
(which can be somewhat expensive) for any older checkpoints.

We assume that any checkpoints older than the latest valid checkpoint
are valid. We expect that invalid checkpoints would be created when a
machine terminates hard and unexpectedly and may stop an in-progress
write or leave a checkpoint file unsynced. This should only affect some
number of the latest checkpoints though. Once we've found a checkpoint
file that is valid, checkpoints older than that should be fully written
and synchronized too.

We also bail out of the search when we find a checkpoint that has a
lower index than the current snapshot index. Those checkpoints cannot
be promoted and should be deleted. We scan through the checkpoints from
most recent to least recent, so when we find a checkpoint with an older
index than the snapshot, we delete that checkpoint and any older
checkpoints.
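The bail-out described in the last paragraph amounts to splitting the newest-first checkpoint list at the snapshot index. A minimal sketch, with hypothetical function and directory names (the real code's naming and deletion handling differ):

```erlang
%% Checkpoints is a newest-first list of {Idx, Term} pairs and SnapIdx is
%% the current snapshot index. The first checkpoint at or below SnapIdx,
%% and everything older, can never be promoted, so those directories are
%% deleted instead of validated.
prune_stale(SnapIdx, Dir, Checkpoints) ->
    {Live, Stale} =
        lists:splitwith(fun({Idx, _Term}) -> Idx > SnapIdx end, Checkpoints),
    [ok = file:del_dir_r(filename:join(Dir, checkpoint_dir(Idx, Term)))
     || {Idx, Term} <- Stale],
    Live.

%% Hypothetical directory-name scheme.
checkpoint_dir(Idx, Term) ->
    integer_to_list(Idx) ++ "_" ++ integer_to_list(Term).
```

Because the list is sorted newest-first, `lists:splitwith/2` stops scanning at the first stale entry, which is exactly the early exit the commit message describes.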
@the-mikedavis force-pushed the md/checkpoint-defer-validation branch from 0b45ffc to 48ecb89 on August 5, 2024 14:35
@the-mikedavis marked this pull request as ready for review on August 5, 2024 14:36
@kjnilsson (Contributor) left a comment:


a few minor things

Review threads on src/ra_snapshot.erl (resolved)
@kjnilsson changed the title from "Defer some checkpoint validation until promotion" to "Stop checkpoint validation when encountering a valid checkpoint" on Aug 14, 2024
@kjnilsson merged commit 4c5b409 into main on Aug 14, 2024
9 checks passed
@dumbbell added this to the 2.13.6 milestone on Aug 14, 2024
@michaelklishin deleted the md/checkpoint-defer-validation branch on August 14, 2024 15:23