Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 1626: Add tests for LedgerUnderredplicationManager#markLedgerUnderreplicatedAsync #1629

Open
wants to merge 1,642 commits into
base: master
Choose a base branch
from

Conversation

sijie
Copy link
Member

@sijie sijie commented Aug 27, 2018

Descriptions of the changes in this PR:

Motivation

We introduced LedgerUnderredplicationManager#markLedgerUnderreplicatedAsync in #1619. This exposes the async
api to the public. Let's make sure we have enough test coverage for this new async api, including negative tests
and concurrent tests.

Changes

  • Add basic tests
  • Add negative tests
  • Add concurrent tests on resolving conflicts

Master Issue: #1626


In order to uphold a high standard for quality for code contributions, Apache BookKeeper runs various precommit
checks for pull requests. A pull request can only be merged when it passes precommit checks. However running all
the precommit checks can take a long time, some trivial changes don't need to run all the precommit checks. You
can check following list to skip the tests that don't need to run for your pull request. Leave them unchecked if
you are not sure, committers will help you:

  • [skip bookkeeper-server bookie tests]: skip testing org.apache.bookkeeper.bookie in bookkeeper-server module.
  • [skip bookkeeper-server client tests]: skip testing org.apache.bookkeeper.client in bookkeeper-server module.
  • [skip bookkeeper-server replication tests]: skip testing org.apache.bookkeeper.replication in bookkeeper-server module.
  • [skip bookkeeper-server tls tests]: skip testing org.apache.bookkeeper.tls in bookkeeper-server module.
  • [skip bookkeeper-server remaining tests]: skip testing all other tests in bookkeeper-server module.
  • [skip integration tests]: skip docker based integration tests. if you make java code changes, you shouldn't skip integration tests.
  • [skip build java8]: skip build on java8. ONLY skip this when ONLY changing files under documentation under site.
  • [skip build java9]: skip build on java9. ONLY skip this when ONLY changing files under documentation under site.


Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

If this PR is a BookKeeper Proposal (BP):

  • Make sure the PR title is formatted like:
    <BP-#>: Description of bookkeeper proposal
    e.g. BP-1: 64 bits ledger is support
  • Attach the master issue link in the description of this PR.
  • Attach the google doc link if the BP is written in Google Doc.

Otherwise:

  • Make sure the PR title is formatted like:
    <Issue #>: Description of pull request
    e.g. Issue 123: Description ...
  • Make sure tests pass via mvn clean apache-rat:check install spotbugs:check.
  • Replace <Issue #> in the title with the actual Issue number.

@sijie
Copy link
Member Author

sijie commented Aug 27, 2018

@reddycharan I have added the tests for markLedgerUnderreplicatedAsync. Can you review it?

@Test
public void testMarkLedgerUnderreplicatedWhenSessionExpired() throws Exception {
final long ledgerId = 0xabbd00;
@Cleanup
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is much better than having nested try-with-resources blocks

* test methods. It is more suitable for running tests that don't require restarting bookies.
*/
@Slf4j
public abstract class BookKeeperClusterTestSuite {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you come up with better name, it would be confusing to have 'BookKeeperClusterTestCase' and 'BookKeeperClusterTestSuite' base classes

public class ZkLedgerUnderreplicationManagerTest extends BookKeeperClusterTestSuite {

@BeforeClass
public static void setUpCluster() throws Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i see that BookKeeperClusterTestSuite has static method with same signature/annotation. In java there is nothing like static method overriding, but this is junit quirk - "in JUnit BeforeClass methods should be named differently, otherwise will result in only the subclass method being run because the parent will be shadowed."

For a regular user it is little confusing to see method with same name and @BeforeClass annotation, probably we should remove @BeforeClass annotation for setUpCluster method in BookKeeperClusterTestSuite or may be pass numBookies as argument to the constructor just like in BookKeeperClusterTestCase.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also in the new testsuite, if the user named the BeforeClass method little differently (knowingly and unknowingly) then both the BeforeClass methods would be executed with the risk of setting up 2 clusters.


protected static ServerConfiguration newServerConfiguration() throws Exception {
File f = createTempDir("bookie", "test");
int port = PortManager.nextFreePort();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in BookKeeperClusterTestCase.newServerConfiguration, it does

       if (baseConf.isEnableLocalTransport() || !baseConf.getAllowEphemeralPorts()) {
            port = PortManager.nextFreePort();
        } else {
            port = 0;
        }

return newServerConfiguration(port, f, new File[] { f });
}

protected static ServerConfiguration newServerConfiguration(int port, File journalDir, File[] ledgerDirs) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm..we can reuse BookKeeperClusterTestCase.newServerConfiguration. Probably make this as static method and move it to some test utility class, so that it can be used in both the places


private static void stopBookies() {
synchronized (BOOKIES) {
BOOKIES.forEach(BookieServer::shutdown);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: after stop method, is it not necessary to clear BOOKIES list?

log.info("Stopped all the bookies.");
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in general I would suggest you to consider moving common code to some utility classes and reuse them in BookKeeperClusterTestCase and BookKeeperClusterTestSuite.

public void testMarkLedgerUnderreplicatedWhenSessionExpired() throws Exception {
final long ledgerId = 0xabbd00;
@Cleanup
ZooKeeper zk = new ZooKeeper(getZooKeeperConnectString(), 60000, null);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is instance variable with 'zk' name, better name it differently

long ledgerId = 0xabcdef;
Collection<String> missingBookies = Lists.newArrayList("bookie-1");

long prevCtime = -1L;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here probably you can assign current system time to prevCtime, instead of -1

FutureUtils.result(urMgr.markLedgerUnderreplicatedAsync(ledgerId, missingBookies));
urLedgerFormat = urMgr.getLedgerUnreplicationInfo(ledgerId);
assertEquals(missingBookies, urLedgerFormat.getReplicaList());
assertTrue(urLedgerFormat.getCtime() >= prevCtime);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are you checking ">=", it should be '==', ctime of marked underreplicated ledger should not change by adding/removing replicaslist to urledger. This would be nice test case to validate that statement.

FutureUtils.result(urMgr.markLedgerUnderreplicatedAsync(ledgerId, missingBookies));
fail("Should fail to mark ledger underreplicated if there is already corrupted data on zookeeper");
} catch (ReplicationException re) {
assertTrue(re.getMessage().contains("Invalid underreplicated ledger data for ledger " + ledgerPath));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm not sure how valid it is to rely on exception message. Probably getting ReplicationException should be enoungh

@Test
public void testMarkLedgerUnderreplicatedConcurrently() throws Exception {
final int numLedgers = 20;
List<CompletableFuture<Void>> futures = Lists.newArrayListWithExpectedSize(numLedgers);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new ArrayList<>(numLedgers() ?


@Test
public void testMarkLedgerUnderreplicatedConcurrently() throws Exception {
final int numLedgers = 20;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: incorrect variable name. supposed to be numofbookies or something

// open another zookeeper client to expire current session
@Cleanup
ZooKeeper newZk = new ZooKeeper(
getZooKeeperConnectString(), 60000, null, zk.getSessionId(), zk.getSessionPasswd());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm..forgot the reason, for why would it cause the existing zkc session to expire

@reddycharan
Copy link
Contributor

couple of negative / error scenario testcases for Auditor.checkAllLedgers in AuditorPeriodicCheckTest would be helpful.

@sijie
Copy link
Member Author

sijie commented Aug 29, 2018

couple of negative / error scenario testcases for Auditor.checkAllLedgers in AuditorPeriodicCheckTest would be helpful.

One PR for one thing. As what the caption said, this task only focuses on ledgerunderrplicationmanager. Other tests should be done in a separated PR.

@eolivelli
Copy link
Contributor

This seems the only blocker for 4.8.

@sijie sijie modified the milestones: 4.9.0, 4.10.0 Dec 27, 2018
shoothzj and others added 29 commits July 12, 2024 00:07
Signed-off-by: ZhangJian He <shoothzj@gmail.com>
### Changes

- Replaced `docker build` with `docker buildx build` to enhance cross-platform compatibility.
- Modified Dockerfile commands to conditionally create user and group only if they do not already exist, preventing errors during the build process.
…pache#4426)

### Motivation

[Netty 4.1.111.Final](https://netty.io/news/2024/06/11/4-1-111-Final.html) contains important fixes:
- netty/netty#14086
- netty/netty#14093

On the Pulsar side these address Broker stability when TLS with Bookkeeper V2 protocol is used between Broker and Bookies. The Bookkeeper client didn't contain a workaround for the Netty SslHandler "feature" like Pulsar did. That's why Bookkeeper client was impacted. Netty 4.1.111.Final will address those stability issues with the Bookkeeper client when using V2 protocol over TLS.

### Changes

- Upgrade Netty to 4.1.111.Final
- Switch to use grpc-netty-shaded instead of grpc-netty
  - grpc-java is not compatible with Netty 4.1.111.Final. [grpc-java supports only specific Netty versions](https://github.com/grpc/grpc-java/blob/master/SECURITY.md#netty). The solution is to use grpc-netty-shaded instead of grpc-netty dependency.
- Add module to shade jetcd-core partially so that it can work with grpc-netty-shaded. 
  - jetcd-core library and it's transient dependency vertx-grpc has a direct dependency on grpc-netty and it doesn't work with grpc-netty-shaded without this solution.
  - this solution avoids the need to shade grpc libraries completely which would cause a lot of duplication and increase build times.
…sword (apache#4437)

Signed-off-by: ZhangJian He <shoothzj@gmail.com>
Co-authored-by: qunzhong <qunzhong@tencent.com>
…#4443)

Fix apache#1413

## Changes

change ledger state to close when close metadata
Signed-off-by: ZhangJian He <shoothzj@gmail.com>
Signed-off-by: ZhangJian He <shoothzj@gmail.com>
Signed-off-by: ZhangJian He <shoothzj@gmail.com>
Signed-off-by: ZhangJian He <shoothzj@gmail.com>
Signed-off-by: ZhangJian He <shoothzj@gmail.com>
…che#4458)

### Motivation

This PR updates the release scripts to ensure more robust and consistent release processes. The changes include modifying the git tagging instructions to prevent potential mis-tags and updating the Python scripts to use Python 3 explicitly, which aligns with best practices for Python development.

### Changes

1. **Release Guide Modification**: Using ^{} ensures that the tag is created for the commit itself, not rc tag to be deleted. 

2. **Python Publishing Script**: Changed all instances of `python` to `python3`, modern operation systems default has `python3`, but not all has `python`

Signed-off-by: ZhangJian He <shoothzj@gmail.com>
…kie is not available. (apache#4439)

When the bookie is not available, the RackawareEnsemblePlacementPolicyImpl default rack will be `/default-region/default-rack`, it should be `/default-rack` for RackawareEnsemblePlacementPolicyImpl.


There are some logs.
```
2024-06-17T05:22:46,591+0000 [ReplicationWorker] ERROR org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy - Cannot resolve bookieId `test-bk-3:3181` to a network address, resolving as /default-region/default-rack
org.apache.bookkeeper.proto.BookieAddressResolver$BookieIdNotResolvedException: Cannot resolve bookieId test-bk-3:3181, bookie does not exist or it is not running
	at org.apache.bookkeeper.client.DefaultBookieAddressResolver.resolve(DefaultBookieAddressResolver.java:66) ~[io.streamnative-bookkeeper-server-4.16.5.2.jar:4.16.5.2]
	at org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy.resolveNetworkLocation(TopologyAwareEnsemblePlacementPolicy.java:821) ~[io.streamnative-bookkeeper-server-4.16.5.2.jar:4.16.5.2]
	at org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy.createBookieNode(TopologyAwareEnsemblePlacementPolicy.java:811) ~[io.streamnative-bookkeeper-server-4.16.5.2.jar:4.16.5.2]
	at org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy.convertBookieToNode(TopologyAwareEnsemblePlacementPolicy.java:845) ~[io.streamnative-bookkeeper-server-4.16.5.2.jar:4.16.5.2]
	at org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy.convertBookiesToNodes(TopologyAwareEnsemblePlacementPolicy.java:837) ~[io.streamnative-bookkeeper-server-4.16.5.2.jar:4.16.5.2]
	at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl.replaceBookie(RackawareEnsemblePlacementPolicyImpl.java:474) ~[io.streamnative-bookkeeper-server-4.16.5.2.jar:4.16.5.2]
	at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicy.replaceBookie(RackawareEnsemblePlacementPolicy.java:119) ~[io.streamnative-bookkeeper-server-4.16.5.2.jar:4.16.5.2]
	at org.apache.bookkeeper.client.BookKeeperAdmin.getReplacementBookiesByIndexes(BookKeeperAdmin.java:993) ~[io.streamnative-bookkeeper-server-4.16.5.2.jar:4.16.5.2]
	at org.apache.bookkeeper.client.BookKeeperAdmin.replicateLedgerFragment(BookKeeperAdmin.java:1025) ~[io.streamnative-bookkeeper-server-4.16.5.2.jar:4.16.5.2]
	at org.apache.bookkeeper.replication.ReplicationWorker.rereplicate(ReplicationWorker.java:473) ~[io.streamnative-bookkeeper-server-4.16.5.2.jar:4.16.5.2]
	at org.apache.bookkeeper.replication.ReplicationWorker.rereplicate(ReplicationWorker.java:301) ~[io.streamnative-bookkeeper-server-4.16.5.2.jar:4.16.5.2]
	at org.apache.bookkeeper.replication.ReplicationWorker.run(ReplicationWorker.java:249) ~[io.streamnative-bookkeeper-server-4.16.5.2.jar:4.16.5.2]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty-netty-common-4.1.108.Final.jar:4.1.108.Final]
	at java.lang.Thread.run(Thread.java:840) ~[?:?]
Caused by: org.apache.bookkeeper.client.BKException$BKBookieHandleNotAvailableException: Bookie handle is not available
	at org.apache.bookkeeper.discover.ZKRegistrationClient.getBookieServiceInfo(ZKRegistrationClient.java:226) ~[io.streamnative-bookkeeper-server-4.16.5.2.jar:4.16.5.2]
	at org.apache.bookkeeper.client.DefaultBookieAddressResolver.resolve(DefaultBookieAddressResolver.java:45) ~[io.streamnative-bookkeeper-server-4.16.5.2.jar:4.16.5.2]
	... 13 more
```
…ache#4244)

### Motivation

If the system property `readonlymode.enabled` is set to true on a ZooKeeper server, read-only mode is enabled. Data can be read from the server in read-only mode even if that server is split from the quorum.
https://zookeeper.apache.org/doc/current/zookeeperAdmin.html#Experimental+Options%2FFeatures

To connect to the server in read-only mode, the client must also allow read-only mode. The `ZooKeeperClient` class in the bookkeeper repository also has an option called `allowReadOnlyMode`.
https://github.com/apache/bookkeeper/blob/15171e1904f7196d8e9f4116ab2aecdf582e0032/bookkeeper-server/src/main/java/org/apache/bookkeeper/zookeeper/ZooKeeperClient.java#L219-L222

However, even if this option is set to true, the connection to the server in read-only mode will actually fail. The cause is in the `ZooKeeperWatcherBase` class. When the `ZooKeeperWatcherBase` class receives the `SyncConnected` event, it releases `clientConnectLatch` and assumes that the connection is complete.
https://github.com/apache/bookkeeper/blob/15171e1904f7196d8e9f4116ab2aecdf582e0032/bookkeeper-server/src/main/java/org/apache/bookkeeper/zookeeper/ZooKeeperWatcherBase.java#L128-L144

However, if the server is in read-only mode, it will receive `ConnectedReadOnly` instead of `SyncConnected`. This causes the connection to time out without being completed.

### Changes

Modified the switch statement in the `ZooKeeperWatcherBase` class to release `clientConnectLatch` when `ConnectedReadOnly` is received if the `allowReadOnlyMode` option is true.

By the way, `allowReadOnlyMode` is never set to true in BookKeeper. So this change would be useless for BookKeeper. However, it is useful for Pulsar. Because Pulsar also uses `ZooKeeperWatcherBase` and needs to be able to connect to ZooKeeper in read-only mode.
https://github.com/apache/pulsar/blob/cba1600d0f6a82f1ea194f3214a80f283fe8dc27/pulsar-metadata/src/main/java/org/apache/pulsar/metadata/impl/PulsarZooKeeperClient.java#L242-L244
@Ghatage
Copy link
Contributor

Ghatage commented Jul 12, 2024

The PR here was defunct as master has diverged significantly.
I added a commit to rebase it to master and make it work with the updated codebase.
Will take a look at Charan's review comments and push the changes once the Github checks pass.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.