Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed the Auditor test end time. #4081

Closed

Conversation

horizonzy
Copy link
Member

Motivation

In some tests, the execution time can be quite long, even though the test methods have already finished running. For example, in the AuditorRollingRestartTest#testAuditingDuringRollingRestart() test.

I printed out the stack trace and found that it is stuck at executor.awaitTermination(30, TimeUnit.SECONDS) i
n the Auditor. In some tests, the executor in the Auditor is getting stuck, causing the test to wait for an additional 30 seconds.

"Time-limited test" #18 daemon prio=5 os_prio=31 cpu=762.48ms elapsed=69.85s tid=0x000000011fb87200 nid=0x871b waiting on condition  [0x0000000173b39000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at jdk.internal.misc.Unsafe.park(java.base@17.0.3.1/Native Method)
	- parking to wait for  <0x00002000063a18d0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.parkNanos(java.base@17.0.3.1/LockSupport.java:252)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(java.base@17.0.3.1/AbstractQueuedSynchronizer.java:1672)
	at java.util.concurrent.ThreadPoolExecutor.awaitTermination(java.base@17.0.3.1/ThreadPoolExecutor.java:1464)
	at java.util.concurrent.Executors$DelegatedExecutorService.awaitTermination(java.base@17.0.3.1/Executors.java:743)
	at org.apache.bookkeeper.replication.Auditor.shutdown(Auditor.java:623)
	at org.apache.bookkeeper.replication.AuditorElector.shutdown(AuditorElector.java:246)
	at org.apache.bookkeeper.replication.AutoRecoveryMain.shutdown(AutoRecoveryMain.java:157)
	at org.apache.bookkeeper.replication.AutoRecoveryMain.shutdown(AutoRecoveryMain.java:143)
	at org.apache.bookkeeper.test.BookKeeperClusterTestCase$ServerTester.stopAutoRecovery(BookKeeperClusterTestCase.java:916)
	at org.apache.bookkeeper.test.BookKeeperClusterTestCase.stopReplicationService(BookKeeperClusterTestCase.java:766)
	at org.apache.bookkeeper.test.BookKeeperClusterTestCase.stopBKCluster(BookKeeperClusterTestCase.java:278)
	at org.apache.bookkeeper.test.BookKeeperClusterTestCase.tearDown(BookKeeperClusterTestCase.java:203)
	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@17.0.3.1/Native Method)
	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base@17.0.3.1/NativeMethodAccessorImpl.java:77)
	at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@17.0.3.1/DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(java.base@17.0.3.1/Method.java:568)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
	at org.junit.internal.runners.statements.RunAfters.invokeMethod(RunAfters.java:46)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
	at java.util.concurrent.FutureTask.run(java.base@17.0.3.1/FutureTask.java:264)
	at java.lang.Thread.run(java.base@17.0.3.1/Thread.java:833)

"AuditorBookie-127.0.0.1:59740" #306 daemon prio=5 os_prio=31 cpu=0.24ms elapsed=50.97s tid=0x000000011f1a1400 nid=0x13407 waiting on condition  [0x00000004e8a36000]
   java.lang.Thread.State: WAITING (parking)
	at jdk.internal.misc.Unsafe.park(java.base@17.0.3.1/Native Method)
	- parking to wait for  <0x00002000050dcff0> (a java.util.concurrent.CountDownLatch$Sync)
	at java.util.concurrent.locks.LockSupport.park(java.base@17.0.3.1/LockSupport.java:211)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base@17.0.3.1/AbstractQueuedSynchronizer.java:715)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(java.base@17.0.3.1/AbstractQueuedSynchronizer.java:1047)
	at java.util.concurrent.CountDownLatch.await(java.base@17.0.3.1/CountDownLatch.java:230)
	at org.apache.bookkeeper.replication.ReplicationEnableCb.await(ReplicationEnableCb.java:56)
	at org.apache.bookkeeper.replication.AuditorBookieCheckTask.waitIfLedgerReplicationDisabled(AuditorBookieCheckTask.java:181)
	at org.apache.bookkeeper.replication.AuditorBookieCheckTask.auditBookies(AuditorBookieCheckTask.java:104)
	at org.apache.bookkeeper.replication.AuditorBookieCheckTask.startAudit(AuditorBookieCheckTask.java:86)
	at org.apache.bookkeeper.replication.AuditorBookieCheckTask.runTask(AuditorBookieCheckTask.java:64)
	at org.apache.bookkeeper.replication.AuditorTask.run(AuditorTask.java:72)
	at org.apache.bookkeeper.replication.AuditorBookieCheckTask.run(AuditorBookieCheckTask.java:40)
	at java.util.concurrent.Executors$RunnableAdapter.call(java.base@17.0.3.1/Executors.java:539)
	at java.util.concurrent.FutureTask.runAndReset(java.base@17.0.3.1/FutureTask.java:305)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(java.base@17.0.3.1/ScheduledThreadPoolExecutor.java:305)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@17.0.3.1/ThreadPoolExecutor.java:1136)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@17.0.3.1/ThreadPoolExecutor.java:635)
	at java.lang.Thread.run(java.base@17.0.3.1/Thread.java:833)

Therefore, we can modify the behavior of awaitTermination using a system variable, so that it doesn't wait for 30 seconds when running in a single-sided mode.

@hangc0276
Copy link
Contributor

@horizonzy Would you please address the check style issue?

@hangc0276
Copy link
Contributor

My question is why the auditor is blocked for 30 seconds?

@horizonzy
Copy link
Member Author

`

My question is why the auditor is blocked for 30 seconds?

"AuditorBookie-127.0.0.1:59740" #306 daemon prio=5 os_prio=31 cpu=0.24ms elapsed=50.97s tid=0x000000011f1a1400 nid=0x13407 waiting on condition  [0x00000004e8a36000]
   java.lang.Thread.State: WAITING (parking)
	at jdk.internal.misc.Unsafe.park(java.base@17.0.3.1/Native Method)
	- parking to wait for  <0x00002000050dcff0> (a java.util.concurrent.CountDownLatch$Sync)
	at java.util.concurrent.locks.LockSupport.park(java.base@17.0.3.1/LockSupport.java:211)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base@17.0.3.1/AbstractQueuedSynchronizer.java:715)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(java.base@17.0.3.1/AbstractQueuedSynchronizer.java:1047)
	at java.util.concurrent.CountDownLatch.await(java.base@17.0.3.1/CountDownLatch.java:230)
	at org.apache.bookkeeper.replication.ReplicationEnableCb.await(ReplicationEnableCb.java:56)
	at org.apache.bookkeeper.replication.AuditorBookieCheckTask.waitIfLedgerReplicationDisabled(AuditorBookieCheckTask.java:181)
	at org.apache.bookkeeper.replication.AuditorBookieCheckTask.auditBookies(AuditorBookieCheckTask.java:104)
	at org.apache.bookkeeper.replication.AuditorBookieCheckTask.startAudit(AuditorBookieCheckTask.java:86)
	at org.apache.bookkeeper.replication.AuditorBookieCheckTask.runTask(AuditorBookieCheckTask.java:64)
	at org.apache.bookkeeper.replication.AuditorTask.run(AuditorTask.java:72)
	at org.apache.bookkeeper.replication.AuditorBookieCheckTask.run(AuditorBookieCheckTask.java:40)
	at java.util.concurrent.Executors$RunnableAdapter.call(java.base@17.0.3.1/Executors.java:539)
	at java.util.concurrent.FutureTask.runAndReset(java.base@17.0.3.1/FutureTask.java:305)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(java.base@17.0.3.1/ScheduledThreadPoolExecutor.java:305)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@17.0.3.1/ThreadPoolExecutor.java:1136)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@17.0.3.1/ThreadPoolExecutor.java:635)
	at java.lang.Thread.run(java.base@17.0.3.1/Thread.java:833)

The AuditorBookie thread is blocked.

@horizonzy
Copy link
Member Author

This PR is a little customized, close it.

@@ -620,7 +622,7 @@ public void shutdown() {
LOG.info("Shutting down auditor");
executor.shutdown();
try {
while (!executor.awaitTermination(30, TimeUnit.SECONDS)) {
while (!executor.awaitTermination(awaitTermination, TimeUnit.SECONDS)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will wait at most 30s before enter the shutdownNow logic, not wait at least 30s.

@horizonzy horizonzy closed this Jan 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants