Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Delegator may becomes unserviceable after querycoord restart #37055

Merged

Conversation

weiliu1031
Copy link
Contributor

issue: #37054
after querycoord restart, segment_checker may release segment by mistake due to next target isn't ready yet.

This PR requires release segment must happens after next target is ready.

after querycoord restart, segment_checker may release segment by mistake
due to next target isn't ready yet.

This PR requires release segment must happens after next target is
ready.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
@sre-ci-robot sre-ci-robot added the size/S Denotes a PR that changes 10-29 lines. label Oct 22, 2024
@mergify mergify bot added dco-passed DCO check passed. kind/bug Issues or changes related a bug ci-passed labels Oct 22, 2024
Copy link
Contributor

@congqixia congqixia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this PR may cause segment checker unable to release segment if current/next target is the same

@@ -192,13 +192,14 @@ func (c *SegmentChecker) getGrowingSegmentDiff(collectionID int64,
continue
}

nextTargetExist := c.targetMgr.IsNextTargetExist(collectionID)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. why don't we just skip this round if nextTarget not Exist?
  2. targetVersion := c.targetMgr.GetCollectionTargetVersion(collectionID, meta.CurrentTarget)
    if view.TargetVersion != targetVersion {
    // before shard delegator update it's readable version, skip release segment
    log.RatedInfo(20, "before shard delegator update it's readable version, skip release segment",
    zap.String("channelName", channelName),
    zap.Int64("nodeID", node),
    zap.Int64("leaderVersion", view.TargetVersion),
    zap.Int64("currentVersion", targetVersion),
    )
    continue
    }
    can we handle by this logic above but not introduce another nextTargetExist check?
    the code itself is very redundant

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. we can't use same condition to skip this round, cause load segment just require next target exist, but release segment require both current target and next target exists.
  2. target version can't block releasing segment in next target, cause it only works for current target

@weiliu1031
Copy link
Contributor Author

this PR may cause segment checker unable to release segment if current/next target is the same

target_observer will update current target first, then update next target. so current target and next target will both survive even if they have same content.

@xiaofan-luan
Copy link
Collaborator

/lgtm
/approve

@sre-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: weiliu1031, xiaofan-luan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sre-ci-robot sre-ci-robot merged commit 39a91eb into milvus-io:master Oct 24, 2024
16 of 17 checks passed
weiliu1031 added a commit to weiliu1031/milvus that referenced this pull request Oct 24, 2024
…lvus-io#37055)

issue: milvus-io#37054
after querycoord restart, segment_checker may release segment by mistake
due to next target isn't ready yet.

This PR requires release segment must happens after next target is
ready.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
sre-ci-robot pushed a commit that referenced this pull request Oct 25, 2024
…7055) (#37100)

issue: #37054
pr: #37055
after querycoord restart, segment_checker may release segment by mistake
due to next target isn't ready yet.

This PR requires release segment must happens after next target is
ready.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved ci-passed dco-passed DCO check passed. kind/bug Issues or changes related a bug lgtm size/S Denotes a PR that changes 10-29 lines.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants