-
Notifications
You must be signed in to change notification settings - Fork 902
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issue 1626: Add tests for LedgerUnderredplicationManager#markLedgerUnderreplicatedAsync #1629
base: master
Are you sure you want to change the base?
Conversation
@reddycharan I have added the tests for markLedgerUnderreplicatedAsync. Can you review it? |
@Test | ||
public void testMarkLedgerUnderreplicatedWhenSessionExpired() throws Exception { | ||
final long ledgerId = 0xabbd00; | ||
@Cleanup |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is much better than having nested try-with-resources blocks
* test methods. It is more suitable for running tests that don't require restarting bookies. | ||
*/ | ||
@Slf4j | ||
public abstract class BookKeeperClusterTestSuite { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you come up with better name, it would be confusing to have 'BookKeeperClusterTestCase' and 'BookKeeperClusterTestSuite' base classes
public class ZkLedgerUnderreplicationManagerTest extends BookKeeperClusterTestSuite { | ||
|
||
@BeforeClass | ||
public static void setUpCluster() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i see that BookKeeperClusterTestSuite has static method with same signature/annotation. In java there is nothing like static method overriding, but this is junit quirk - "in JUnit BeforeClass methods should be named differently, otherwise will result in only the subclass method being run because the parent will be shadowed."
For a regular user it is little confusing to see method with same name and @BeforeClass annotation, probably we should remove @BeforeClass annotation for setUpCluster method in BookKeeperClusterTestSuite or may be pass numBookies as argument to the constructor just like in BookKeeperClusterTestCase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also in the new testsuite, if the user named the BeforeClass method little differently (knowingly and unknowingly) then both the BeforeClass methods would be executed with the risk of setting up 2 clusters.
|
||
protected static ServerConfiguration newServerConfiguration() throws Exception { | ||
File f = createTempDir("bookie", "test"); | ||
int port = PortManager.nextFreePort(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in BookKeeperClusterTestCase.newServerConfiguration, it does
if (baseConf.isEnableLocalTransport() || !baseConf.getAllowEphemeralPorts()) {
port = PortManager.nextFreePort();
} else {
port = 0;
}
return newServerConfiguration(port, f, new File[] { f }); | ||
} | ||
|
||
protected static ServerConfiguration newServerConfiguration(int port, File journalDir, File[] ledgerDirs) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm..we can reuse BookKeeperClusterTestCase.newServerConfiguration. Probably make this as static method and move it to some test utility class, so that it can be used in both the places
|
||
private static void stopBookies() { | ||
synchronized (BOOKIES) { | ||
BOOKIES.forEach(BookieServer::shutdown); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: after stop method, is it not necessary to clear BOOKIES list?
log.info("Stopped all the bookies."); | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in general I would suggest you to consider moving common code to some utility classes and reuse them in BookKeeperClusterTestCase and BookKeeperClusterTestSuite.
public void testMarkLedgerUnderreplicatedWhenSessionExpired() throws Exception { | ||
final long ledgerId = 0xabbd00; | ||
@Cleanup | ||
ZooKeeper zk = new ZooKeeper(getZooKeeperConnectString(), 60000, null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is instance variable with 'zk' name, better name it differently
long ledgerId = 0xabcdef; | ||
Collection<String> missingBookies = Lists.newArrayList("bookie-1"); | ||
|
||
long prevCtime = -1L; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here probably you can assign current system time to prevCtime, instead of -1
FutureUtils.result(urMgr.markLedgerUnderreplicatedAsync(ledgerId, missingBookies)); | ||
urLedgerFormat = urMgr.getLedgerUnreplicationInfo(ledgerId); | ||
assertEquals(missingBookies, urLedgerFormat.getReplicaList()); | ||
assertTrue(urLedgerFormat.getCtime() >= prevCtime); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are you checking ">=", it should be '==', ctime of marked underreplicated ledger should not change by adding/removing replicaslist to urledger. This would be nice test case to validate that statement.
FutureUtils.result(urMgr.markLedgerUnderreplicatedAsync(ledgerId, missingBookies)); | ||
fail("Should fail to mark ledger underreplicated if there is already corrupted data on zookeeper"); | ||
} catch (ReplicationException re) { | ||
assertTrue(re.getMessage().contains("Invalid underreplicated ledger data for ledger " + ledgerPath)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm not sure how valid it is to rely on exception message. Probably getting ReplicationException should be enoungh
@Test | ||
public void testMarkLedgerUnderreplicatedConcurrently() throws Exception { | ||
final int numLedgers = 20; | ||
List<CompletableFuture<Void>> futures = Lists.newArrayListWithExpectedSize(numLedgers); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is soon to be deprecated method - https://google.github.io/guava/releases/22.0/api/docs/com/google/common/collect/Lists.html#newArrayListWithExpectedSize-int-
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new ArrayList<>(numLedgers() ?
|
||
@Test | ||
public void testMarkLedgerUnderreplicatedConcurrently() throws Exception { | ||
final int numLedgers = 20; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: incorrect variable name. supposed to be numofbookies or something
// open another zookeeper client to expire current session | ||
@Cleanup | ||
ZooKeeper newZk = new ZooKeeper( | ||
getZooKeeperConnectString(), 60000, null, zk.getSessionId(), zk.getSessionPasswd()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm..forgot the reason, for why would it cause the existing zkc session to expire
couple of negative / error scenario testcases for Auditor.checkAllLedgers in AuditorPeriodicCheckTest would be helpful. |
One PR for one thing. As what the caption said, this task only focuses on ledgerunderrplicationmanager. Other tests should be done in a separated PR. |
This seems the only blocker for 4.8. |
Signed-off-by: ZhangJian He <shoothzj@gmail.com>
…pache#4432) Signed-off-by: zhiheng123 <903292776@qq.com>
### Changes - Replaced `docker build` with `docker buildx build` to enhance cross-platform compatibility. - Modified Dockerfile commands to conditionally create user and group only if they do not already exist, preventing errors during the build process.
…4435) Signed-off-by: ZhangJian He <shoothzj@gmail.com>
…pache#4426) ### Motivation [Netty 4.1.111.Final](https://netty.io/news/2024/06/11/4-1-111-Final.html) contains important fixes: - netty/netty#14086 - netty/netty#14093 On the Pulsar side these address Broker stability when TLS with Bookkeeper V2 protocol is used between Broker and Bookies. The Bookkeeper client didn't contain a workaround for the Netty SslHandler "feature" like Pulsar did. That's why Bookkeeper client was impacted. Netty 4.1.111.Final will address those stability issues with the Bookkeeper client when using V2 protocol over TLS. ### Changes - Upgrade Netty to 4.1.111.Final - Switch to use grpc-netty-shaded instead of grpc-netty - grpc-java is not compatible with Netty 4.1.111.Final. [grpc-java supports only specific Netty versions](https://github.com/grpc/grpc-java/blob/master/SECURITY.md#netty). The solution is to use grpc-netty-shaded instead of grpc-netty dependency. - Add module to shade jetcd-core partially so that it can work with grpc-netty-shaded. - jetcd-core library and it's transient dependency vertx-grpc has a direct dependency on grpc-netty and it doesn't work with grpc-netty-shaded without this solution. - this solution avoids the need to shade grpc libraries completely which would cause a lot of duplication and increase build times.
…pache#4436) Signed-off-by: ZhangJian He <shoothzj@gmail.com>
…sword (apache#4437) Signed-off-by: ZhangJian He <shoothzj@gmail.com>
Co-authored-by: qunzhong <qunzhong@tencent.com>
…#4443) Fix apache#1413 ## Changes change ledger state to close when close metadata
Signed-off-by: ZhangJian He <shoothzj@gmail.com>
Signed-off-by: ZhangJian He <shoothzj@gmail.com>
Signed-off-by: ZhangJian He <shoothzj@gmail.com>
…pache#4445) Signed-off-by: ZhangJian He <shoothzj@gmail.com>
Signed-off-by: ZhangJian He <shoothzj@gmail.com>
Signed-off-by: ZhangJian He <shoothzj@gmail.com>
…che#4458) ### Motivation This PR updates the release scripts to ensure more robust and consistent release processes. The changes include modifying the git tagging instructions to prevent potential mis-tags and updating the Python scripts to use Python 3 explicitly, which aligns with best practices for Python development. ### Changes 1. **Release Guide Modification**: Using ^{} ensures that the tag is created for the commit itself, not rc tag to be deleted. 2. **Python Publishing Script**: Changed all instances of `python` to `python3`, modern operation systems default has `python3`, but not all has `python` Signed-off-by: ZhangJian He <shoothzj@gmail.com>
### Motivation Reference: https://adoptium.net/blog/2021/08/using-jlink-in-dockerfiles Reduce size: 398.23 MB -> 245.23 MB New image:https://hub.docker.com/layers/nodece/bk/4.17.1-jlink/images/sha256-73c8354a981d7dfefcb1247cf84f6c3e91db2342431749e528ba8468c65be1f5?context=explore Old image: https://hub.docker.com/layers/apache/bookkeeper/4.17.1/images/sha256-342f384bc21b4d31d5349f741cf831f3c5bb47dfea34557104528ccd9597810f?context=explore ### Changes - Using jlink to compress the JDK and then copy that to the base image
…kie is not available. (apache#4439) When the bookie is not available, the RackawareEnsemblePlacementPolicyImpl default rack will be `/default-region/default-rack`, it should be `/default-rack` for RackawareEnsemblePlacementPolicyImpl. There are some logs. ``` 2024-06-17T05:22:46,591+0000 [ReplicationWorker] ERROR org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy - Cannot resolve bookieId `test-bk-3:3181` to a network address, resolving as /default-region/default-rack org.apache.bookkeeper.proto.BookieAddressResolver$BookieIdNotResolvedException: Cannot resolve bookieId test-bk-3:3181, bookie does not exist or it is not running at org.apache.bookkeeper.client.DefaultBookieAddressResolver.resolve(DefaultBookieAddressResolver.java:66) ~[io.streamnative-bookkeeper-server-4.16.5.2.jar:4.16.5.2] at org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy.resolveNetworkLocation(TopologyAwareEnsemblePlacementPolicy.java:821) ~[io.streamnative-bookkeeper-server-4.16.5.2.jar:4.16.5.2] at org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy.createBookieNode(TopologyAwareEnsemblePlacementPolicy.java:811) ~[io.streamnative-bookkeeper-server-4.16.5.2.jar:4.16.5.2] at org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy.convertBookieToNode(TopologyAwareEnsemblePlacementPolicy.java:845) ~[io.streamnative-bookkeeper-server-4.16.5.2.jar:4.16.5.2] at org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy.convertBookiesToNodes(TopologyAwareEnsemblePlacementPolicy.java:837) ~[io.streamnative-bookkeeper-server-4.16.5.2.jar:4.16.5.2] at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl.replaceBookie(RackawareEnsemblePlacementPolicyImpl.java:474) ~[io.streamnative-bookkeeper-server-4.16.5.2.jar:4.16.5.2] at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicy.replaceBookie(RackawareEnsemblePlacementPolicy.java:119) ~[io.streamnative-bookkeeper-server-4.16.5.2.jar:4.16.5.2] at org.apache.bookkeeper.client.BookKeeperAdmin.getReplacementBookiesByIndexes(BookKeeperAdmin.java:993) ~[io.streamnative-bookkeeper-server-4.16.5.2.jar:4.16.5.2] at org.apache.bookkeeper.client.BookKeeperAdmin.replicateLedgerFragment(BookKeeperAdmin.java:1025) ~[io.streamnative-bookkeeper-server-4.16.5.2.jar:4.16.5.2] at org.apache.bookkeeper.replication.ReplicationWorker.rereplicate(ReplicationWorker.java:473) ~[io.streamnative-bookkeeper-server-4.16.5.2.jar:4.16.5.2] at org.apache.bookkeeper.replication.ReplicationWorker.rereplicate(ReplicationWorker.java:301) ~[io.streamnative-bookkeeper-server-4.16.5.2.jar:4.16.5.2] at org.apache.bookkeeper.replication.ReplicationWorker.run(ReplicationWorker.java:249) ~[io.streamnative-bookkeeper-server-4.16.5.2.jar:4.16.5.2] at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty-netty-common-4.1.108.Final.jar:4.1.108.Final] at java.lang.Thread.run(Thread.java:840) ~[?:?] Caused by: org.apache.bookkeeper.client.BKException$BKBookieHandleNotAvailableException: Bookie handle is not available at org.apache.bookkeeper.discover.ZKRegistrationClient.getBookieServiceInfo(ZKRegistrationClient.java:226) ~[io.streamnative-bookkeeper-server-4.16.5.2.jar:4.16.5.2] at org.apache.bookkeeper.client.DefaultBookieAddressResolver.resolve(DefaultBookieAddressResolver.java:45) ~[io.streamnative-bookkeeper-server-4.16.5.2.jar:4.16.5.2] ... 13 more ```
…ache#4244) ### Motivation If the system property `readonlymode.enabled` is set to true on a ZooKeeper server, read-only mode is enabled. Data can be read from the server in read-only mode even if that server is split from the quorum. https://zookeeper.apache.org/doc/current/zookeeperAdmin.html#Experimental+Options%2FFeatures To connect to the server in read-only mode, the client must also allow read-only mode. The `ZooKeeperClient` class in the bookkeeper repository also has an option called `allowReadOnlyMode`. https://github.com/apache/bookkeeper/blob/15171e1904f7196d8e9f4116ab2aecdf582e0032/bookkeeper-server/src/main/java/org/apache/bookkeeper/zookeeper/ZooKeeperClient.java#L219-L222 However, even if this option is set to true, the connection to the server in read-only mode will actually fail. The cause is in the `ZooKeeperWatcherBase` class. When the `ZooKeeperWatcherBase` class receives the `SyncConnected` event, it releases `clientConnectLatch` and assumes that the connection is complete. https://github.com/apache/bookkeeper/blob/15171e1904f7196d8e9f4116ab2aecdf582e0032/bookkeeper-server/src/main/java/org/apache/bookkeeper/zookeeper/ZooKeeperWatcherBase.java#L128-L144 However, if the server is in read-only mode, it will receive `ConnectedReadOnly` instead of `SyncConnected`. This causes the connection to time out without being completed. ### Changes Modified the switch statement in the `ZooKeeperWatcherBase` class to release `clientConnectLatch` when `ConnectedReadOnly` is received if the `allowReadOnlyMode` option is true. By the way, `allowReadOnlyMode` is never set to true in BookKeeper. So this change would be useless for BookKeeper. However, it is useful for Pulsar. Because Pulsar also uses `ZooKeeperWatcherBase` and needs to be able to connect to ZooKeeper in read-only mode. https://github.com/apache/pulsar/blob/cba1600d0f6a82f1ea194f3214a80f283fe8dc27/pulsar-metadata/src/main/java/org/apache/pulsar/metadata/impl/PulsarZooKeeperClient.java#L242-L244
The PR here was defunct as master has diverged significantly. |
Descriptions of the changes in this PR:
Motivation
We introduced LedgerUnderredplicationManager#markLedgerUnderreplicatedAsync in #1619. This exposes the async
api to the public. Let's make sure we have enough test coverage for this new async api, including negative tests
and concurrent tests.
Changes
Master Issue: #1626