fix: dramatically reduce checkpoint memory consumption #2956

Merged: 2 commits merged into delta-io:main on Oct 23, 2024

@rtyler rtyler commented Oct 22, 2024

Both commits describe the specific fixes, but basically our checkpoint code was collecting too much into memory when it could iterate! 🎢

With a test table:

Before

Maximum resident set size (kbytes): 19964728

After

Maximum resident set size (kbytes): 4017132

Sponsored-by: Scribd Inc

…oint

For excessively large tables which have too few checkpoints, or an
excessive volume of `action`s between checkpoints, the
checkpoint code can consume an unreasonable amount of memory.

In the case evaluated there were over a thousand transactions between
checkpoints, which resulted in over 2M actions that needed to be
persisted to the checkpoint. This scenario led to 19.9GB of memory
being used to produce checkpoints for a table which needed only 3.8GB to
open.

By iteratively processing the buffers which need to be serialized, the
memory is dramatically reduced:

**Before**:

    Maximum resident set size (kbytes): 19964728

**After:**

    Maximum resident set size (kbytes): 4621648
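
To illustrate the difference between collecting serialized buffers and processing them iteratively, here is a minimal sketch. It is not the delta-rs checkpoint code: `Action`, `serialize`, `write_collected`, `write_iteratively` and `sample_actions` are hypothetical names invented for this example, and the tab-separated output stands in for the real checkpoint format.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for a log action; not a delta-rs type.
struct Action {
    path: String,
    size: u64,
}

/// Hypothetical serializer: renders one action as a single line of text.
fn serialize(action: &Action) -> Vec<u8> {
    format!("{}\t{}\n", action.path, action.size).into_bytes()
}

/// Eager variant: every serialized buffer is held in memory before any of
/// them is written, so peak memory grows with the number of actions.
fn write_collected<W: Write>(
    actions: impl Iterator<Item = Action>,
    out: &mut W,
) -> io::Result<usize> {
    let buffers: Vec<Vec<u8>> = actions.map(|a| serialize(&a)).collect();
    let mut written = 0;
    for buf in &buffers {
        out.write_all(buf)?;
        written += buf.len();
    }
    Ok(written)
}

/// Iterative variant: one action is serialized and written at a time, and
/// its buffer is dropped before the next one is created.
fn write_iteratively<W: Write>(
    actions: impl Iterator<Item = Action>,
    out: &mut W,
) -> io::Result<usize> {
    let mut written = 0;
    for action in actions {
        let buf = serialize(&action);
        out.write_all(&buf)?;
        written += buf.len();
    }
    Ok(written)
}

/// Generates `n` made-up actions lazily.
fn sample_actions(n: u64) -> impl Iterator<Item = Action> {
    (0..n).map(|i| Action {
        path: format!("part-{i:05}.parquet"),
        size: i * 1024,
    })
}

fn main() -> io::Result<()> {
    let mut eager = Vec::new();
    let mut lazy = Vec::new();
    let a = write_collected(sample_actions(1_000), &mut eager)?;
    let b = write_iteratively(sample_actions(1_000), &mut lazy)?;
    // Both strategies produce identical bytes; only peak memory differs.
    assert_eq!(a, b);
    assert_eq!(eager, lazy);
    println!("both strategies wrote {a} bytes");
    Ok(())
}
```

Both functions emit the same output. The iterative one simply never holds more than one serialized buffer at once, which is the general shape of the saving described above.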

Sponsored-by: Scribd Inc
Signed-off-by: R. Tyler Croy <rtyler@brokenco.de>

Similar to the prior commit, this improves the iterative nature of
checkpoint creation.

**Before:**

    Maximum resident set size (kbytes): 4621648

**After:**

    Maximum resident set size (kbytes): 4017132

Sponsored-by: Scribd Inc
Signed-off-by: R. Tyler Croy <rtyler@brokenco.de>
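
The peak RSS figures quoted in the two commit messages can be put side by side with a little arithmetic. The sketch below uses only the numbers reported above; `reduction_percent` is a helper invented for this example. It works out to roughly 77% for the first commit, roughly 13% more for the second, and roughly 80% overall.

```rust
/// Percentage reduction going from `before` to `after` (same unit for both).
fn reduction_percent(before: u64, after: u64) -> f64 {
    (before as f64 - after as f64) / before as f64 * 100.0
}

fn main() {
    // Maximum resident set size values from the commit messages, in kbytes.
    let baseline: u64 = 19_964_728;
    let after_first: u64 = 4_621_648;
    let after_second: u64 = 4_017_132;

    println!(
        "first commit:  {:.1}% lower",
        reduction_percent(baseline, after_first)
    );
    println!(
        "second commit: {:.1}% lower",
        reduction_percent(after_first, after_second)
    );
    println!(
        "overall:       {:.1}% lower",
        reduction_percent(baseline, after_second)
    );
}
```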
@github-actions github-actions bot added the binding/rust Issues for the Rust crate label Oct 22, 2024

codecov bot commented Oct 22, 2024

Codecov Report

Attention: Patch coverage is 77.77778% with 2 lines in your changes missing coverage. Please review.

Project coverage is 72.35%. Comparing base (10c6b5c) to head (ad83562).
Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| crates/core/src/protocol/checkpoints.rs | 77.77% | 0 Missing and 2 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2956      +/-   ##
==========================================
- Coverage   72.38%   72.35%   -0.03%     
==========================================
  Files         131      131              
  Lines       40599    40602       +3     
  Branches    40599    40602       +3     
==========================================
- Hits        29387    29378       -9     
- Misses       9335     9336       +1     
- Partials     1877     1888      +11     


@rtyler rtyler marked this pull request as ready for review October 22, 2024 23:20
@rtyler rtyler enabled auto-merge October 22, 2024 23:20

@ion-elgreco ion-elgreco left a comment

Nice catch!

@rtyler rtyler added this pull request to the merge queue Oct 23, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to no response for status checks Oct 23, 2024
@ion-elgreco ion-elgreco changed the title fix: dramatically reduce checkpoint memory consumption for high volume tables fix: dramatically reduce checkpoint memory consumption Oct 23, 2024
@ion-elgreco ion-elgreco added this pull request to the merge queue Oct 23, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to no response for status checks Oct 23, 2024
@rtyler rtyler added this pull request to the merge queue Oct 23, 2024
@rtyler rtyler removed this pull request from the merge queue due to a manual request Oct 23, 2024
@rtyler rtyler merged commit c05931a into delta-io:main Oct 23, 2024
26 checks passed
@rtyler rtyler deleted the checkjpoint-optimization branch October 23, 2024 13:39
alexwilcoxson-rel pushed a commit to relativityone/delta-rs that referenced this pull request Nov 8, 2024
(cherry picked from commit c05931a)