[fix] [auto-recovery] Improve the ReplicationWorker performance by deleting invalid underreplication nodes #21059

Conversation

horizonzy
Member

@horizonzy horizonzy commented Aug 24, 2023

Fixes #21058

Main Issue: #21058

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository:

@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Aug 24, 2023
Contributor

@hangc0276 hangc0276 left a comment

Please add a test to cover this change

@hangc0276 hangc0276 added this to the 3.2.0 milestone Aug 24, 2023
@hangc0276 hangc0276 added the category/reliability The function does not work properly in certain specific environments or failures. e.g. data lost label Aug 24, 2023
@horizonzy
Member Author

> Please add a test to cover this change

Addressed

@horizonzy
Member Author

Although ZooKeeper's ContainerManager automatically deletes empty container nodes, it does not do so in real time: it runs periodically, so empty containers created while it is working are not deleted promptly.

In one user's production environment, the ContainerManager deletes empty containers more slowly than new empty containers are created. As a result, many empty containers accumulate.
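A minimal sketch of the alternative to waiting for the ContainerManager: after removing a leaf node, walk up the path and delete each ancestor that has become empty, stopping at a fixed base path. It models the znode tree in memory; the class `EagerPruneSketch` and its methods `create`, `exists`, and `deleteAndPrune` are hypothetical names for illustration. This is not the ZooKeeper client API and not the code merged in this PR. Against a real ensemble, each delete can race with a concurrent create, so a real implementation would need to treat `KeeperException.NotEmptyException` and `KeeperException.NoNodeException` as "stop pruning here" rather than as failures.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Hypothetical in-memory znode tree used only to illustrate eager pruning of
 * empty parent nodes. NOT the ZooKeeper client API and NOT this PR's code.
 */
final class EagerPruneSketch {
    // Maps each existing path to the names of its direct children.
    private final Map<String, Set<String>> children = new TreeMap<>();

    /** Creates the path and any missing ancestors (similar to `mkdir -p`). */
    void create(String path) {
        String current = "";
        for (String part : path.substring(1).split("/")) {
            String parent = current.isEmpty() ? "/" : current;
            current = current + "/" + part;
            children.computeIfAbsent(parent, k -> new TreeSet<>()).add(part);
            children.computeIfAbsent(current, k -> new TreeSet<>());
        }
    }

    boolean exists(String path) {
        return children.containsKey(path);
    }

    /**
     * Deletes {@code leaf}, then deletes each ancestor that has become empty,
     * stopping at {@code base} (never deleted), at the root, or at the first
     * ancestor that still has children.
     */
    void deleteAndPrune(String leaf, String base) {
        String path = leaf;
        while (!path.equals(base) && !path.equals("/")
                && exists(path) && children.get(path).isEmpty()) {
            children.remove(path);
            int slash = path.lastIndexOf('/');
            String parent = slash == 0 ? "/" : path.substring(0, slash);
            children.get(parent).remove(path.substring(slash + 1));
            path = parent;
        }
    }

    public static void main(String[] args) {
        String base = "/ledgers/underreplication/ledgers";
        EagerPruneSketch tree = new EagerPruneSketch();
        // Hypothetical leaf names; only the parent layout mirrors the log below.
        tree.create(base + "/0000/0000/0128/80b5/leafA");
        tree.create(base + "/0000/0000/0128/80b6/leafB");

        tree.deleteAndPrune(base + "/0000/0000/0128/80b5/leafA", base);
        // 80b5 became empty and was removed; 0128 still holds 80b6.
        System.out.println(tree.exists(base + "/0000/0000/0128/80b5")); // false
        System.out.println(tree.exists(base + "/0000/0000/0128"));      // true
    }
}
```

The design choice is to clean up in the same code path that removes the leaf, so empty parents never wait for a background sweep whose deletion rate can fall behind the creation rate.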

The ZooKeeper leader logs from that customer show that the ContainerManager does delete empty nodes, but a significant number of empty nodes still remain:

00:04:43.060 [ContainerManagerTask] INFO  org.apache.zookeeper.server.ContainerManager - Attempting to delete candidate container: /ledgers/underreplication/ledgers/0000/0000/0128/80b5
00:04:43.066 [ContainerManagerTask] INFO  org.apache.zookeeper.server.ContainerManager - Attempting to delete candidate container: /ledgers/underreplication/ledgers/0000/0000/0167/0af7
00:04:43.072 [ContainerManagerTask] INFO  org.apache.zookeeper.server.ContainerManager - Attempting to delete candidate container: /ledgers/underreplication/ledgers/0000/0000/00be/02d5
00:04:43.078 [ContainerManagerTask] INFO  org.apache.zookeeper.server.ContainerManager - Attempting to delete candidate container: /ledgers/underreplication/ledgers/0000/0000/016a/0375
00:04:43.084 [ContainerManagerTask] INFO  org.apache.zookeeper.server.ContainerManager - Attempting to delete candidate container: /ledgers/underreplication/ledgers/0000/0000/016e/0282
00:04:43.091 [ContainerManagerTask] INFO  org.apache.zookeeper.server.ContainerManager - Attempting to delete candidate container: /ledgers/underreplication/ledgers/0000/0000/016a/8da8
00:04:43.097 [ContainerManagerTask] INFO  org.apache.zookeeper.server.ContainerManager - Attempting to delete candidate container: /ledgers/underreplication/ledgers/0000/0000/0130/0920
00:04:43.103 [ContainerManagerTask] INFO  org.apache.zookeeper.server.ContainerManager - Attempting to delete candidate container: /ledgers/underreplication/ledgers/0000/0000/0170/82b0
00:04:43.109 [ContainerManagerTask] INFO  org.apache.zookeeper.server.ContainerManager - Attempting to delete candidate container: /ledgers/underreplication/ledgers/0000/0000/0126/8591
00:04:43.115 [ContainerManagerTask] INFO  org.apache.zookeeper.server.ContainerManager - Attempting to delete candidate container: /ledgers/underreplication/ledgers/0000/0000/0170/07ba
00:04:43.121 [ContainerManagerTask] INFO  org.apache.zookeeper.server.ContainerManager - Attempting to delete candidate container: /ledgers/underreplication/ledgers/0000/0000/0166/8952
00:04:43.127 [ContainerManagerTask] INFO  org.apache.zookeeper.server.ContainerManager - Attempting to delete candidate container: /ledgers/underreplication/ledgers/0000/0000/0168/8b7b
00:04:43.134 [ContainerManagerTask] INFO  org.apache.zookeeper.server.ContainerManager - Attempting to delete candidate container: /ledgers/underreplication/ledgers/0000/0000/0171/8610
00:04:43.140 [ContainerManagerTask] INFO  org.apache.zookeeper.server.ContainerManager - Attempting to delete candidate container: /ledgers/underreplication/ledgers/0000/0000/0170/0047
00:04:43.146 [ContainerManagerTask] INFO  org.apache.zookeeper.server.ContainerManager - Attempting to delete candidate container: /ledgers/underreplication/ledgers/0000/0000/0122/843e
00:04:43.152 [ContainerManagerTask] INFO  org.apache.zookeeper.server.ContainerManager - Attempting to delete candidate container: /ledgers/underreplication/ledgers/0000/0000/0136/84ed
00:04:43.158 [ContainerManagerTask] INFO  org.apache.zookeeper.server.ContainerManager - Attempting to delete candidate container: /ledgers/underreplication/ledgers/0000/0000/016a/059e
00:04:43.164 [ContainerManagerTask] INFO  org.apache.zookeeper.server.ContainerManager - Attempting to delete candidate container: /ledgers/underreplication/ledgers/0000/0000/014a/82cf
00:04:43.171 [ContainerManagerTask] INFO  org.apache.zookeeper.server.ContainerManager - Attempting to delete candidate container: /ledgers/underreplication/ledgers/0000/0000/00be/02c5
00:04:43.176 [ContainerManagerTask] INFO  org.apache.zookeeper.server.ContainerManager - Attempting to delete candidate container: /ledgers/underreplication/ledgers/0000/0000/016f/80e5
00:04:43.182 [ContainerManagerTask] INFO  org.apache.zookeeper.server.ContainerManager - Attempting to delete candidate container: /ledgers/underreplication/ledgers/0000/0000/014a/84fb
00:04:43.188 [ContainerManagerTask] INFO  org.apache.zookeeper.server.ContainerManager - Attempting to delete candidate container: /ledgers/underreplication/ledgers/0000/0000/00ef/805e
00:04:43.194 [ContainerManagerTask] INFO  org.apache.zookeeper.server.ContainerManager - Attempting to delete candidate container: /ledgers/underreplication/ledgers/0000/0000/0166/02b8
00:04:43.200 [ContainerManagerTask] INFO  org.apache.zookeeper.server.ContainerManager - Attempting to delete candidate container: /ledgers/underreplication/ledgers/0000/0000/016e/84f7
00:04:43.206 [ContainerManagerTask] INFO  org.apache.zookeeper.server.ContainerManager - Attempting to delete candidate container: /ledgers/underreplication/ledgers/0000/0000/00e2/053c
00:04:43.212 [ContainerManagerTask] INFO  org.apache.zookeeper.server.ContainerManager - Attempting to delete candidate container: /ledgers/underreplication/ledgers/0000/0000/016e/82b5
00:04:43.218 [ContainerManagerTask] INFO  org.apache.zookeeper.server.ContainerManager - Attempting to delete candidate container: /ledgers/underreplication/ledgers/0000/0000/016b/04b6
00:04:43.224 [ContainerManagerTask] INFO  org.apache.zookeeper.server.ContainerManager - Attempting to delete candidate container: /ledgers/underreplication/ledgers/0000/0000/0167/8337
00:04:43.230 [ContainerManagerTask] INFO  org.apache.zookeeper.server.ContainerManager - Attempting to delete candidate container: /ledgers/underreplication/ledgers/0000/0000/016b/04b5
00:04:43.236 [ContainerManagerTask] INFO  org.apache.zookeeper.server.ContainerManager - Attempting to delete candidate container: /ledgers/underreplication/ledgers/0000/0000/00ef/805d
00:04:43.242 [ContainerManagerTask] INFO  org.apache.zookeeper.server.ContainerManager - Attempting to delete candidate container: /ledgers/underreplication/ledgers/0000/0000/014a/82c6
00:04:43.248 [ContainerManagerTask] INFO  org.apache.zookeeper.server.ContainerManager - Attempting to delete candidate container: /ledgers/underreplication/ledgers/0000/0000/0178/80c3
00:04:43.255 [ContainerManagerTask] INFO  org.apache.zookeeper.server.ContainerManager - Attempting to delete candidate container: /ledgers/underreplication/ledgers/0000/0000/0128/0fbc
00:04:43.261 [ContainerManagerTask] INFO  org.apache.zookeeper.server.ContainerManager - Attempting to delete candidate container: /ledgers/underreplication/ledgers/0000/0000/0170/0034
00:04:43.267 [ContainerManagerTask] INFO  org.apache.zookeeper.server.ContainerManager - Attempting to delete candidate container: /ledgers/underreplication/ledgers/0000/0000/00be/02ab
00:04:43.273 [ContainerManagerTask] INFO  org.apache.zookeeper.server.ContainerManager - Attempting to delete candidate container: /ledgers/underreplication/ledgers/0000/0000/010a/0711
00:04:43.279 [ContainerManagerTask] INFO  org.apache.zookeeper.server.ContainerManager - Attempting to delete candidate container: /ledgers/underreplication/ledgers/0000/0000/014a/82d3
00:04:43.285 [ContainerManagerTask] INFO  org.apache.zookeeper.server.ContainerManager - Attempting to delete candidate container: /ledgers/underreplication/ledgers/0000/0000/016d/8165
00:04:43.291 [ContainerManagerTask] INFO  org.apache.zookeeper.server.ContainerManager - Attempting to delete candidate container: /ledgers/underreplication/ledgers/0000/0000/0130/8864
00:04:43.297 [ContainerManagerTask] INFO  org.apache.zookeeper.server.ContainerManager - Attempting to delete candidate container: /ledgers/underreplication/ledgers/0000/0000/00e2/052b
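The candidate paths in the log above appear to spread each ledger ID across four 4-digit hex path segments (for example `.../0000/0000/0128/80b5`). Assuming that layout, which is inferred from the logged paths rather than taken from BookKeeper's source, the following sketch maps between a ledger ID and its container path, which can help when matching log entries to ledgers. The class `UnderreplicationPathSketch` and its methods are hypothetical.

```java
/**
 * Hypothetical helpers mapping a ledger ID to the container path layout seen
 * in the log (four 16-bit groups, each as 4 lowercase hex digits).
 * The layout is inferred from the logged paths, not taken from BookKeeper.
 */
final class UnderreplicationPathSketch {

    /** Builds the container path for a ledger ID under {@code base}. */
    static String parentPath(String base, long ledgerId) {
        return String.format("%s/%04x/%04x/%04x/%04x", base,
                (ledgerId >>> 48) & 0xffff,
                (ledgerId >>> 32) & 0xffff,
                (ledgerId >>> 16) & 0xffff,
                ledgerId & 0xffff);
    }

    /** Reverses parentPath: concatenates the four hex groups into a ledger ID. */
    static long ledgerIdFromPath(String base, String path) {
        String[] groups = path.substring(base.length() + 1).split("/");
        if (groups.length != 4) {
            throw new IllegalArgumentException("expected 4 hex groups: " + path);
        }
        long id = 0;
        for (String group : groups) {
            id = (id << 16) | Long.parseLong(group, 16);
        }
        return id;
    }

    public static void main(String[] args) {
        String base = "/ledgers/underreplication/ledgers";
        long id = ledgerIdFromPath(base, base + "/0000/0000/0128/80b5");
        System.out.println(id);                   // 19431605
        System.out.println(parentPath(base, id)); // /ledgers/underreplication/ledgers/0000/0000/0128/80b5
    }
}
```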

@Technoboy- Technoboy- merged commit ba0f2ba into apache:master Aug 31, 2023
50 checks passed
liangyepianzhou pushed a commit that referenced this pull request Sep 4, 2023
…deleting invalid underreplication nodes (#21059)

(cherry picked from commit ba0f2ba)
Technoboy- pushed a commit that referenced this pull request Sep 5, 2023
horizonzy added a commit to horizonzy/pulsar that referenced this pull request Sep 11, 2023
Technoboy- pushed a commit to horizonzy/pulsar that referenced this pull request Sep 14, 2023
Technoboy- pushed a commit to horizonzy/pulsar that referenced this pull request Sep 14, 2023
Labels
area/metadata category/reliability The function does not work properly in certain specific environments or failures. e.g. data lost cherry-picked/branch-2.11 cherry-picked/branch-3.0 cherry-picked/branch-3.1 doc-not-needed Your PR changes do not impact docs ready-to-test release/2.10.7 release/2.11.3 release/3.0.2 release/3.1.1 type/bug The PR fixed a bug or issue reported a bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] [Auto-Recovery] ReplicationWorker low performance problem.
7 participants