Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix datanode panic due to concurrent compaction and delete processing #27167

Merged

Conversation

bigsheeper
Copy link
Contributor

@bigsheeper bigsheeper commented Sep 18, 2023

fix #27145

the datanode processing delete while compaction happened so flush can not succeed.
For fixing:

  1. bring back blockall, ,make sure compaction sync segments and delete processing are mutexed.
  2. change ttNode handle updateCheckPoint async, make sure all the flowgraph node is processed fast.
  3. change some error handling

see also: #27158

@sre-ci-robot sre-ci-robot added the size/L Denotes a PR that changes 100-499 lines. label Sep 18, 2023
@mergify mergify bot added the dco-passed DCO check passed. label Sep 18, 2023
internal/datanode/flow_graph_time_tick_node.go Outdated Show resolved Hide resolved
@@ -678,7 +678,7 @@ func (c *ChannelMeta) getCollectionSchema(collID UniqueID, ts Timestamp) (*schem
return c.collSchema, nil
}

func (c *ChannelMeta) mergeFlushedSegments(ctx context.Context, seg *Segment, planID UniqueID, compactedFrom []UniqueID) error {
func (c *ChannelMeta) mergeFlushedSegments(ctx context.Context, seg *Segment, planID UniqueID, compactedFrom []UniqueID) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ctx can be safely removed here since we don't check it anymore

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ctx can be used for tracing

internal/util/flowgraph/node.go Outdated Show resolved Hide resolved
Signed-off-by: xiaofan-luan <xiaofan.luan@zilliz.com>

Co-authored-by: bigsheeper <yihao.dai@zilliz.com>
@bigsheeper bigsheeper force-pushed the 2309-FixPanicDueToCompaction-master branch from 207a3de to 80b0fd7 Compare September 18, 2023 07:22
@codecov
Copy link

codecov bot commented Sep 18, 2023

Codecov Report

Merging #27167 (80b0fd7) into master (8e17bc3) will decrease coverage by 0.03%.
The diff coverage is 69.35%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #27167      +/-   ##
==========================================
- Coverage   82.51%   82.49%   -0.03%     
==========================================
  Files         796      796              
  Lines      112550   112576      +26     
==========================================
- Hits        92873    92866       -7     
- Misses      16472    16496      +24     
- Partials     3205     3214       +9     
Files Changed Coverage Δ
internal/datanode/data_node.go 74.08% <ø> (ø)
internal/datanode/flow_graph_delete_node.go 89.34% <0.00%> (-0.74%) ⬇️
internal/datanode/flow_graph_insert_buffer_node.go 77.39% <0.00%> (-1.24%) ⬇️
internal/util/flowgraph/node.go 80.80% <16.66%> (+5.27%) ⬆️
internal/datanode/services.go 82.45% <66.66%> (-0.25%) ⬇️
internal/datanode/channel_meta.go 94.28% <100.00%> (+0.21%) ⬆️
internal/datanode/flow_graph_time_tick_node.go 92.39% <100.00%> (+1.60%) ⬆️
internal/datanode/flush_manager.go 91.14% <100.00%> (ø)

... and 21 files with indirect coverage changes

@XuanYang-cn
Copy link
Contributor

LGTM

@xiaofan-luan xiaofan-luan added ci-passed manual-pass manually set pass before ci-passed labeled labels Sep 18, 2023
@xiaofan-luan
Copy link
Collaborator

/lgtm
/approve

@sre-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bigsheeper, xiaofan-luan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved ci-passed dco-passed DCO check passed. lgtm manual-pass manually set pass before ci-passed labeled size/L Denotes a PR that changes 100-499 lines.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: Standalone restarted once during test
4 participants