-
Notifications
You must be signed in to change notification settings - Fork 203
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Implement delete operation for WAL based on local storage #1566
Conversation
…uired before acquiring the lock for a segment.
} | ||
|
||
// Delete segments in all_segments | ||
for (segment_id, _) in segments_to_remove.iter() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
keys()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
segments_to_remove
is Vec<(u64, Arc<...>)>
, so there is no keys()
for this type. Do I need to change it to a hashmap?
drop(all_segments); | ||
|
||
// Delete segments on disk | ||
for (_, segment) in segments_to_remove.iter() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
values()
guard.mark_deleted(location.table_id, sequence_num); | ||
|
||
// Delete this segment if it is empty | ||
if guard.is_empty() && guard.id != current_segment_id { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since current segment may changed, the check here should be guard.id < current_segment_id
.
@@ -383,10 +465,13 @@ pub struct SegmentManager { | |||
all_segments: Mutex<HashMap<u64, Arc<Mutex<Segment>>>>, | |||
|
|||
/// Cache for opened segments | |||
cache: Mutex<VecDeque<u64>>, | |||
cache: Mutex<VecDeque<(u64, Arc<Mutex<Segment>>)>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why add seq here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The u64
in this tuple represents the segment ID. The reason for storing it is to determine if a segment exists in the cache without needing to acquire the lock for that segment. The newly added Arc<Mutex<Segment>>
is to remove the original SegmentManager::get_segment
method, which is redundant and may cause deadlock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Rationale
Currently the WAL based on the local disk does not support the delete function. This PR implements that functionality.
This is a follow-up task of #1552 and #1556.
Detailed Changes
For each
Segment
, add a hashmap to record the minimum and maximum sequence numbers of all tables within that segment. Duringdelete
andwrite
operations, this hashmap will be updated. During read operations, logs will be filtered based on this hashmap.During the
delete
operation, based on the aforementioned hashmap, if all logs of all tables in a read-only segment (a segment that is not currently being written to) are marked as deleted, the segment file will be physically deleted from the disk.Test Plan
Unit test, TSBS and running a script locally that repeatedly inserts data, forcibly kills, and restarts the database process to test persistence.