Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Implement delete operation for WAL based on local storage #1566

Merged
merged 18 commits into from
Sep 14, 2024

Conversation

dracoooooo
Copy link
Contributor

Rationale

Currently the WAL based on the local disk does not support the delete function. This PR implements that functionality.

This is a follow-up task of #1552 and #1556.

Detailed Changes

  1. For each Segment, add a hashmap to record the minimum and maximum sequence numbers of all tables within that segment. During delete and write operations, this hashmap will be updated. During read operations, logs will be filtered based on this hashmap.

  2. During the delete operation, based on the aforementioned hashmap, if all logs of all tables in a read-only segment (a segment that is not currently being written to) are marked as deleted, the segment file will be physically deleted from the disk.

Test Plan

Unit test, TSBS and running a script locally that repeatedly inserts data, forcibly kills, and restarts the database process to test persistence.

@github-actions github-actions bot added the feature New feature or request label Sep 8, 2024
src/wal/src/local_storage_impl/segment.rs Show resolved Hide resolved
src/wal/src/local_storage_impl/segment.rs Outdated Show resolved Hide resolved
}

// Delete segments in all_segments
for (segment_id, _) in segments_to_remove.iter() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keys()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

segments_to_remove is Vec<(u64, Arc<...>)>, so there is no keys() for this type. Do I need to change it to a hashmap?

drop(all_segments);

// Delete segments on disk
for (_, segment) in segments_to_remove.iter() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

values()

guard.mark_deleted(location.table_id, sequence_num);

// Delete this segment if it is empty
if guard.is_empty() && guard.id != current_segment_id {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since current segment may changed, the check here should be guard.id < current_segment_id.

@@ -383,10 +465,13 @@ pub struct SegmentManager {
all_segments: Mutex<HashMap<u64, Arc<Mutex<Segment>>>>,

/// Cache for opened segments
cache: Mutex<VecDeque<u64>>,
cache: Mutex<VecDeque<(u64, Arc<Mutex<Segment>>)>>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why add seq here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The u64 in this tuple represents the segment ID. The reason for storing it is to determine if a segment exists in the cache without needing to acquire the lock for that segment. The newly added Arc<Mutex<Segment>> is to remove the original SegmentManager::get_segment method, which is redundant and may cause deadlock.

Copy link
Contributor

@jiacai2050 jiacai2050 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jiacai2050 jiacai2050 merged commit 645a8b3 into apache:main Sep 14, 2024
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants