fix: frequent flush cause minio rate limit #28625

Merged
merged 1 commit into from
Dec 20, 2023

Conversation

xiaofan-luan
Collaborator

@xiaofan-luan xiaofan-luan commented Nov 21, 2023

related to #28549
pr: #28626

  1. Avoid generating duplicate sync tasks for segments that are already syncing.
  2. Add jitter so that segments are not all synced at the same time.
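
Item 2 can be illustrated with a small sketch. This is not the code from this PR: the helper name jitteredInterval, the fraction parameter, and the 10-minute base interval are assumptions chosen for illustration. The idea is that each segment adds a random extra delay to the periodic sync interval, so segments that started buffering together do not all hit object storage at the same moment.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// jitteredInterval returns base plus a random extra delay in [0, base*fraction).
// The name and the fraction parameter are hypothetical, not taken from the PR.
func jitteredInterval(base time.Duration, fraction float64) time.Duration {
	if base <= 0 || fraction <= 0 {
		return base
	}
	maxJitter := time.Duration(float64(base) * fraction)
	if maxJitter <= 0 {
		// rand.Int63n panics on a non-positive argument, so bail out early.
		return base
	}
	return base + time.Duration(rand.Int63n(int64(maxJitter)))
}

func main() {
	// Assumed base interval; the real periodic sync interval is configured elsewhere.
	base := 10 * time.Minute
	for segmentID := 1; segmentID <= 3; segmentID++ {
		next := jitteredInterval(base, 0.1)
		fmt.Printf("segment %d: next periodic sync in %v\n", segmentID, next)
	}
}
```

With a fraction of 0.1, each segment waits between 10 and 11 minutes, which spreads the writes over a one-minute window instead of a single burst.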

@sre-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: xiaofan-luan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sre-ci-robot sre-ci-robot added size/S Denotes a PR that changes 10-29 lines. approved labels Nov 21, 2023
Contributor

mergify bot commented Nov 21, 2023

@xiaofan-luan Please associate the related pr of master to the body of your Pull Request. (eg. “pr: #”)

Contributor

mergify bot commented Nov 21, 2023

@xiaofan-luan

Invalid PR Title Format Detected

Your PR submission does not adhere to our required standards. To ensure clarity and consistency, please meet the following criteria:

  1. Title Format: The PR title must begin with one of these prefixes:
     • feat: for introducing a new feature.
     • fix: for bug fixes.
     • enhance: for improvements to existing functionality.
     • test: for adding tests to existing functionality.
     • doc: for modifying documentation.
  2. Description Requirement: The PR must include a non-empty description, detailing the changes and their impact.

Required Title Structure:

[Type]: [Description of the PR]

Where Type is one of feat, fix, enhance, test or doc.

Example:

enhance: improve search performance significantly 

Please review and update your PR to comply with these guidelines.

Contributor

mergify bot commented Nov 21, 2023

@xiaofan-luan E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@xiaofan-luan xiaofan-luan changed the title Fix: frequent flush cause minio rate limit fix: frequent flush cause minio rate limit Nov 21, 2023
@mergify mergify bot added do-not-merge/missing-related-pr kind/bug Issues or changes related a bug labels Nov 21, 2023
@xiaofan-luan
Collaborator Author

/run-cpu-e2e

@bigsheeper
Contributor

Actually, there is already logic to avoid duplicate segment syncs:

if !task.dropped && !task.flushed && segment.isSyncing() {

@XuanYang-cn
Contributor

Actually, there is already logic to avoid duplicate segment syncs:

if !task.dropped && !task.flushed && segment.isSyncing() {

@bigsheeper That check runs when the tasks are executed, which is too late. We're generating far too many sync tasks, logging them and merging them, but never executing them, which is very confusing. It's better and more efficient to ignore segments that are already syncing when generating sync tasks.
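
A minimal sketch of that suggestion, assuming a simplified segment type: the struct, its fields, and selectSegmentsToSync are hypothetical stand-ins rather than the DataNode's actual code, and only the isSyncing() name is borrowed from the snippet quoted above. The filter runs where the candidate list is built, so no task is created, logged, or merged for a segment that is already syncing.

```go
package main

import "fmt"

// segment is a hypothetical stand-in for the DataNode's segment type.
type segment struct {
	id      int64
	syncing bool
}

// isSyncing mirrors the method name used in the quoted snippet.
func (s *segment) isSyncing() bool { return s.syncing }

// selectSegmentsToSync returns the IDs of candidate segments that are not
// already syncing. Filtering here, at task-generation time, means no
// redundant task is produced for a segment with a sync in flight.
func selectSegmentsToSync(candidates []*segment) []int64 {
	ids := make([]int64, 0, len(candidates))
	for _, seg := range candidates {
		if seg.isSyncing() {
			continue // a sync is already in flight; do not queue another
		}
		ids = append(ids, seg.id)
	}
	return ids
}

func main() {
	segs := []*segment{{id: 1}, {id: 2, syncing: true}, {id: 3}}
	fmt.Println(selectSegmentsToSync(segs)) // prints [1 3]
}
```

The execution-time check can stay as a safety net, since a segment may start syncing between task generation and task execution.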

@@ -683,6 +683,7 @@ func (t *flushBufferInsertTask) flushInsertData() error {
err := group.Wait()
metrics.DataNodeSave2StorageLatency.WithLabelValues(fmt.Sprint(paramtable.GetNodeID()), metrics.InsertLabel).Observe(float64(tr.ElapseSpan().Milliseconds()))
if err == nil {
log.Warn("failed to flush insert data", zap.Error(err))
Contributor

I don't think this is the right place to log "failed to flush"; it sits inside the err == nil branch above.
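
One way to address this, assuming the intent is to warn only when the flush actually fails: the sketch below moves the warning into the err != nil branch. The names flushAndReport, logWarn, and errStorage are hypothetical and stand in for the real task method, the zap logger call, and a storage error. This is not the change that was eventually merged.

```go
package main

import (
	"errors"
	"fmt"
)

// errStorage is a hypothetical error used only to exercise the failure path.
var errStorage = errors.New("storage unavailable")

// flushAndReport runs flush and emits the warning only when flush fails.
// logWarn stands in for a structured logger call such as log.Warn.
func flushAndReport(flush func() error, logWarn func(msg string, err error)) error {
	err := flush()
	if err != nil {
		logWarn("failed to flush insert data", err)
		return err
	}
	// Success path: bookkeeping would go here, with no warning emitted.
	return nil
}

func main() {
	warn := func(msg string, err error) { fmt.Println("WARN:", msg, "error:", err) }

	_ = flushAndReport(func() error { return nil }, warn)        // logs nothing
	_ = flushAndReport(func() error { return errStorage }, warn) // logs one warning
}
```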

@@ -707,6 +708,7 @@ func (t *flushBufferDeleteTask) flushDeleteData() error {
metrics.DataNodeSave2StorageLatency.WithLabelValues(fmt.Sprint(paramtable.GetNodeID()), metrics.DeleteLabel).Observe(float64(tr.ElapseSpan().Milliseconds()))
if err == nil {
for _, d := range t.data {
log.Warn("failed to flush delete data", zap.Error(err))
Contributor

ditto

Contributor

mergify bot commented Nov 23, 2023

@xiaofan-luan E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@czs007 czs007 added this to the 2.3.4 milestone Dec 6, 2023
Contributor

mergify bot commented Dec 8, 2023

@xiaofan-luan E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@xiaofan-luan
Collaborator Author

/run-cpu-e2e

Contributor

mergify bot commented Dec 8, 2023

@xiaofan-luan E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

Contributor

mergify bot commented Dec 10, 2023

@xiaofan-luan E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

Contributor

mergify bot commented Dec 14, 2023

@xiaofan-luan E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@xiaofan-luan
Collaborator Author

/run-cpu-e2e

@xiaofan-luan
Collaborator Author

/run-cpu-e2e

@xiaofan-luan
Collaborator Author

/run-cpu-e2e

Contributor

mergify bot commented Dec 14, 2023

@xiaofan-luan E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@xiaofan-luan
Collaborator Author

/run-cpu-e2e

Contributor

mergify bot commented Dec 15, 2023

@xiaofan-luan E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

Contributor

mergify bot commented Dec 15, 2023

@xiaofan-luan E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@xiaofan-luan
Collaborator Author

/run-cpu-e2e

@xiaofan-luan
Collaborator Author

/run-cpu-e2e

@xiaofan-luan
Collaborator Author

/run-cpu-e2e

Contributor

mergify bot commented Dec 18, 2023

@xiaofan-luan E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

Contributor

mergify bot commented Dec 18, 2023

@xiaofan-luan E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

Contributor

mergify bot commented Dec 19, 2023

@xiaofan-luan E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@xiaofan-luan
Collaborator Author

/run-cpu-e2e

Contributor

mergify bot commented Dec 19, 2023

@xiaofan-luan E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@xiaofan-luan
Collaborator Author

/run-cpu-e2e

Contributor

mergify bot commented Dec 19, 2023

@xiaofan-luan E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

Contributor

mergify bot commented Dec 19, 2023

@xiaofan-luan E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

Contributor

mergify bot commented Dec 19, 2023

@xiaofan-luan E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

Add jitter to the periodic flush policy to avoid flushing a large number of segments in a short time

Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>
@mergify mergify bot added the ci-passed label Dec 20, 2023
@czs007 czs007 added the lgtm label Dec 20, 2023
@czs007 czs007 merged commit 8e13199 into milvus-io:2.3 Dec 20, 2023
13 of 14 checks passed
Labels
approved ci-passed dco-passed DCO check passed. kind/bug Issues or changes related a bug lgtm size/S Denotes a PR that changes 10-29 lines.
5 participants