[fix][broker] Add timeout for unload namespace bundle. #15719

Merged
merged 1 commit into from
May 25, 2022

Conversation

Technoboy-
Contributor

Motivation

We found that broker shutdown sometimes blocks at BrokerService#unloadNamespaceBundlesGracefully, where the main thread waits on a CompletableFuture.get() call that has no timeout:

2022-05-20T03:37:05.4960249Z "main" #1 prio=5 os_prio=0 cpu=32274.29ms elapsed=2566.54s tid=0x00007fd108024380 nid=0x1af8f waiting on condition  [0x00007fd10fcd0000]
2022-05-20T03:37:05.4960659Z    java.lang.Thread.State: WAITING (parking)
2022-05-20T03:37:05.4961114Z 	at jdk.internal.misc.Unsafe.park(java.base@17.0.3/Native Method)
2022-05-20T03:37:05.4961875Z 	- parking to wait for  <0x00000000cdf00010> (a java.util.concurrent.CompletableFuture$Signaller)
2022-05-20T03:37:05.4962343Z 	at java.util.concurrent.locks.LockSupport.park(java.base@17.0.3/LockSupport.java:211)
2022-05-20T03:37:05.4963171Z 	at java.util.concurrent.CompletableFuture$Signaller.block(java.base@17.0.3/CompletableFuture.java:1864)
2022-05-20T03:37:05.4963683Z 	at java.util.concurrent.ForkJoinPool.unmanagedBlock(java.base@17.0.3/ForkJoinPool.java:3463)
2022-05-20T03:37:05.4964169Z 	at java.util.concurrent.ForkJoinPool.managedBlock(java.base@17.0.3/ForkJoinPool.java:3434)
2022-05-20T03:37:05.4964660Z 	at java.util.concurrent.CompletableFuture.waitingGet(java.base@17.0.3/CompletableFuture.java:1898)
2022-05-20T03:37:05.4965158Z 	at java.util.concurrent.CompletableFuture.get(java.base@17.0.3/CompletableFuture.java:2072)
2022-05-20T03:37:05.4965715Z 	at org.apache.pulsar.broker.service.BrokerService.lambda$unloadNamespaceBundlesGracefully$21(BrokerService.java:919)
2022-05-20T03:37:05.4966467Z 	at org.apache.pulsar.broker.service.BrokerService$$Lambda$1164/0x0000000801527c70.accept(Unknown Source)
2022-05-20T03:37:05.4966882Z 	at java.lang.Iterable.forEach(java.base@17.0.3/Iterable.java:75)
2022-05-20T03:37:05.4967408Z 	at org.apache.pulsar.broker.service.BrokerService.unloadNamespaceBundlesGracefully(BrokerService.java:911)
2022-05-20T03:37:05.4968078Z 	at org.apache.pulsar.broker.service.BrokerService.unloadNamespaceBundlesGracefully(BrokerService.java:887)
2022-05-20T03:37:05.4968664Z 	at org.apache.pulsar.broker.service.BrokerService.closeAsync(BrokerService.java:732)
2022-05-20T03:37:05.4969579Z 	at org.apache.pulsar.broker.PulsarService.closeAsync(PulsarService.java:450)
2022-05-20T03:37:05.4970123Z 	at org.apache.pulsar.broker.PulsarService.close(PulsarService.java:372)
2022-05-20T03:37:05.4970720Z 	at org.apache.pulsar.functions.worker.PulsarFunctionTlsTest.tearDown(PulsarFunctionTlsTest.java:182)
2022-05-20T03:37:05.4971338Z 	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@17.0.3/Native Method)
2022-05-20T03:37:05.4971951Z 	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base@17.0.3/NativeMethodAccessorImpl.java:77)
2022-05-20T03:37:05.4972615Z 	at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@17.0.3/DelegatingMethodAccessorImpl.java:43)
2022-05-20T03:37:05.4973196Z 	at java.lang.reflect.Method.invoke(java.base@17.0.3/Method.java:568)
2022-05-20T03:37:05.4974028Z 	at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:132)
2022-05-20T03:37:05.4974709Z 	at org.testng.internal.MethodInvocationHelper.invokeMethodConsideringTimeout(MethodInvocationHelper.java:61)
2022-05-20T03:37:05.4975404Z 	at org.testng.internal.ConfigInvoker.invokeConfigurationMethod(ConfigInvoker.java:366)
2022-05-20T03:37:05.4976160Z 	at org.testng.internal.ConfigInvoker.invokeConfigurations(ConfigInvoker.java:320)
2022-05-20T03:37:05.4976700Z 	at org.testng.internal.TestInvoker.runConfigMethods(TestInvoker.java:701)
2022-05-20T03:37:05.4977278Z 	at org.testng.internal.TestInvoker.runAfterGroupsConfigurations(TestInvoker.java:677)
2022-05-20T03:37:05.4977835Z 	at org.testng.internal.TestInvoker.invokeMethod(TestInvoker.java:661)
2022-05-20T03:37:05.4978285Z 	at org.testng.internal.TestInvoker.invokeTestMethod(TestInvoker.java:174)
2022-05-20T03:37:05.4978810Z 	at org.testng.internal.MethodRunner.runInSequence(MethodRunner.java:46)
2022-05-20T03:37:05.4979341Z 	at org.testng.internal.TestInvoker$MethodInvocationAgent.invoke(TestInvoker.java:822)
2022-05-20T03:37:05.4979863Z 	at org.testng.internal.TestInvoker.invokeTestMethods(TestInvoker.java:147)
2022-05-20T03:37:05.4980424Z 	at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:146)
2022-05-20T03:37:05.4980961Z 	at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:128)
2022-05-20T03:37:05.4981462Z 	at org.testng.TestRunner$$Lambda$167/0x0000000800d9e540.accept(Unknown Source)
2022-05-20T03:37:05.4982018Z 	at java.util.ArrayList.forEach(java.base@17.0.3/ArrayList.java:1511)
2022-05-20T03:37:05.4982483Z 	at org.testng.TestRunner.privateRun(TestRunner.java:764)
2022-05-20T03:37:05.4982908Z 	at org.testng.TestRunner.run(TestRunner.java:585)
2022-05-20T03:37:05.4983341Z 	at org.testng.SuiteRunner.runTest(SuiteRunner.java:384)
2022-05-20T03:37:05.4983784Z 	at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:378)
2022-05-20T03:37:05.4984558Z 	at org.testng.SuiteRunner.privateRun(SuiteRunner.java:337)
2022-05-20T03:37:05.4985005Z 	at org.testng.SuiteRunner.run(SuiteRunner.java:286)
2022-05-20T03:37:05.4985984Z 	at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:53)
2022-05-20T03:37:05.4986529Z 	at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:96)
2022-05-20T03:37:05.4986974Z 	at org.testng.TestNG.runSuitesSequentially(TestNG.java:1218)
2022-05-20T03:37:05.4987336Z 	at org.testng.TestNG.runSuitesLocally(TestNG.java:1140)
2022-05-20T03:37:05.4987682Z 	at org.testng.TestNG.runSuites(TestNG.java:1069)
2022-05-20T03:37:05.4987975Z 	at org.testng.TestNG.run(TestNG.java:1037)
2022-05-20T03:37:05.4988374Z 	at org.apache.maven.surefire.testng.TestNGExecutor.run(TestNGExecutor.java:135)
2022-05-20T03:37:05.4988977Z 	at org.apache.maven.surefire.testng.TestNGDirectoryTestSuite.executeSingleClass(TestNGDirectoryTestSuite.java:112)
2022-05-20T03:37:05.4989654Z 	at org.apache.maven.surefire.testng.TestNGDirectoryTestSuite.executeLazy(TestNGDirectoryTestSuite.java:123)
2022-05-20T03:37:05.4990278Z 	at org.apache.maven.surefire.testng.TestNGDirectoryTestSuite.execute(TestNGDirectoryTestSuite.java:90)
2022-05-20T03:37:05.4990835Z 	at org.apache.maven.surefire.testng.TestNGProvider.invoke(TestNGProvider.java:146)
2022-05-20T03:37:05.4991415Z 	at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
2022-05-20T03:37:05.4992001Z 	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
2022-05-20T03:37:05.4992518Z 	at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
2022-05-20T03:37:05.4992985Z 	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
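
The trace above shows the main thread parked inside an untimed CompletableFuture.get(). As a minimal sketch of the general idea (not the actual diff of this PR), the wait can be bounded with the get(timeout, unit) overload. The class name BoundedUnloadWait, the method awaitUnload, and the 100 ms value are hypothetical and chosen only for illustration:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedUnloadWait {

    /**
     * Waits for a (hypothetical) bundle-unload future, giving up after timeoutMs.
     * Returns true only if the future completed normally within the timeout.
     */
    public static boolean awaitUnload(CompletableFuture<Void> unloadFuture, long timeoutMs) {
        try {
            // Unlike get(), this overload throws TimeoutException instead of parking forever.
            unloadFuture.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // The unload did not finish in time; the caller can continue shutting down.
            return false;
        } catch (ExecutionException e) {
            // The unload itself failed.
            return false;
        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers can observe it.
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // A future that is never completed stands in for a stuck unload.
        CompletableFuture<Void> stuck = new CompletableFuture<>();
        System.out.println("stuck unload finished: " + awaitUnload(stuck, 100));

        CompletableFuture<Void> done = CompletableFuture.<Void>completedFuture(null);
        System.out.println("normal unload finished: " + awaitUnload(done, 100));
    }
}
```

With a bounded wait, a stuck unload delays shutdown by at most the timeout instead of blocking it indefinitely. It does not explain why the unload got stuck.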

Documentation

  • no-need-doc
    (Please explain why)

@Technoboy- Technoboy- self-assigned this May 23, 2022
@Technoboy- Technoboy- added type/bug The PR fixed a bug or issue reported a bug area/broker doc-not-needed Your PR changes do not impact docs labels May 23, 2022
@Technoboy- Technoboy- added this to the 2.11.0 milestone May 23, 2022
Member

@shibd shibd left a comment

/LGTM

@Technoboy- Technoboy- force-pushed the fix-unloadbundle-timeout-1 branch from e4fe311 to 15e8e79 Compare May 24, 2022 01:40
@lhotari
Member

lhotari commented May 24, 2022

I reported #15753 about the unloading hanging. I don't think that this PR currently addresses the root cause.
The cause of the unloading getting stuck should be investigated and fixed. It is possible that the root cause is a shutdown race condition such as #15643.

@Technoboy- Technoboy- force-pushed the fix-unloadbundle-timeout-1 branch from 15e8e79 to d899e61 Compare May 24, 2022 09:42
@Technoboy-
Contributor Author

I reported #15753 about the unloading hanging. I don't think that this PR currently addresses the root cause. The cause of the unloading getting stuck should be investigated and fixed. It is possible that the root cause is a shutdown race condition such as #15643.

@lhotari
Yes, I know. Since we already have namespaceBundleUnloadingTimeoutMs, I think we can add a timeout here.
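
A rough sketch of how a single configured timeout could bound every bundle unload during shutdown, assuming Java 9+ for CompletableFuture.orTimeout. The class UnloadAllWithTimeout and the method unloadAll are hypothetical, the timeoutMs parameter merely stands in for a value such as namespaceBundleUnloadingTimeoutMs, and how the broker reads that setting is not shown in this thread:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class UnloadAllWithTimeout {

    /**
     * Applies the same timeout to every (hypothetical) unload future and waits for all of them.
     * Returns the number of unloads that failed or did not finish within timeoutMs.
     */
    public static int unloadAll(List<CompletableFuture<Void>> unloadFutures, long timeoutMs) {
        AtomicInteger notUnloaded = new AtomicInteger();
        CompletableFuture<?>[] bounded = unloadFutures.stream()
                // orTimeout completes the future itself with a TimeoutException
                // if it is still pending when the timeout elapses.
                .map(f -> f.orTimeout(timeoutMs, TimeUnit.MILLISECONDS)
                        // Count the failure and swallow it so one bundle cannot block the rest.
                        .exceptionally(ex -> {
                            notUnloaded.incrementAndGet();
                            return null;
                        }))
                .toArray(CompletableFuture[]::new);
        // Every element now completes normally, so join() returns after at most about timeoutMs.
        CompletableFuture.allOf(bounded).join();
        return notUnloaded.get();
    }

    public static void main(String[] args) {
        List<CompletableFuture<Void>> futures = List.of(
                CompletableFuture.<Void>completedFuture(null),   // unload succeeded
                new CompletableFuture<Void>(),                   // unload never completes
                CompletableFuture.<Void>failedFuture(new IllegalStateException("unload failed")));
        System.out.println("bundles not unloaded cleanly: " + unloadAll(futures, 100));
    }
}
```

As noted in the review above, a timeout like this only limits how long shutdown waits; the reason an unload hangs would still need to be investigated separately.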

Member

@horizonzy horizonzy left a comment

lgtm.

@Technoboy- Technoboy- added type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages and removed type/bug The PR fixed a bug or issue reported a bug labels May 25, 2022
@Technoboy- Technoboy- merged commit 74b108d into apache:master May 25, 2022
@Technoboy- Technoboy- deleted the fix-unloadbundle-timeout-1 branch August 10, 2022 05:50
Labels
area/broker doc-not-needed Your PR changes do not impact docs type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages
10 participants