
close binary-proto-lookup connection after connection-lifetime #252

Closed
wants to merge 2 commits

Conversation

rdhabalia
Contributor

Motivation

PulsarClient creates a separate, dedicated connection to a broker for a given service-lookup URL. In case of a cold restart of the brokers, all clients may end up creating their lookup connection to the same broker, and that connection persists for the entire lifetime of the client, which can overload that one broker with lookups. Therefore, the lookup connection should not be persistent and should be closed after some time.

Modifications

  • disconnect the lookup connection after a configurable lifetime (default: 10 minutes)

Result

The lookup connection is no longer persistent, which prevents a single broker from being overloaded with lookup requests.
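The mechanism is small enough to sketch in plain Java. The snippet below is illustrative only (the class and method names are mine, not the PR's); the actual change schedules `cnx.channel().disconnect()` on the client's Netty event loop, while this sketch uses a stdlib `ScheduledExecutorService`:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Minimal sketch: a lookup connection that schedules its own disconnect
// after a configurable lifetime, forcing the next lookup to re-resolve a
// (possibly different) broker.
public class LookupConnectionLifetime {
    private final AtomicBoolean open = new AtomicBoolean(true);
    private final CountDownLatch closed = new CountDownLatch(1);

    void disconnect() {
        if (open.compareAndSet(true, false)) {
            closed.countDown(); // in the real client: cnx.channel().disconnect()
        }
    }

    boolean isOpen() {
        return open.get();
    }

    /** Schedules a one-shot close after the configured connection lifetime. */
    void scheduleClose(ScheduledExecutorService timer, long lifetime, TimeUnit unit) {
        timer.schedule(this::disconnect, lifetime, unit);
    }

    /** Waits until the scheduled close has fired, up to the given timeout. */
    boolean awaitClose(long timeout, TimeUnit unit) throws InterruptedException {
        return closed.await(timeout, unit);
    }

    public static void main(String[] args) throws Exception {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        LookupConnectionLifetime cnx = new LookupConnectionLifetime();
        cnx.scheduleClose(timer, 100, TimeUnit.MILLISECONDS); // PR default is 10 minutes
        boolean closedInTime = cnx.awaitClose(5, TimeUnit.SECONDS);
        System.out.println("closed=" + closedInTime + " open=" + cnx.isOpen());
        timer.shutdown();
    }
}
```

After the timer fires, the next lookup has to open a fresh connection, which is exactly the behavior debated below (extra TCP connect plus re-authentication after a failover).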

@rdhabalia rdhabalia added area/client type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages labels Feb 28, 2017
@rdhabalia rdhabalia added this to the 1.17 milestone Feb 28, 2017
@rdhabalia rdhabalia self-assigned this Feb 28, 2017
@merlimat
Contributor

So, in case of broker's cold-restart, all clients may end-up creating lookup-connection to same one broker

Uhm, all the lookups from a single client will end up on the same broker/discovery-service instance, but there is no reason why lookups from multiple clients should do the same.

@rdhabalia
Contributor Author

but there is no reason why lookups from multiple clients should do the same.

In case of a cold restart, as soon as the first broker comes up, all the clients try to do lookups and end up creating their lookup connection with that one broker. Since the lookup connection is persistent, each client will use it forever, so that one broker ends up serving all the lookup requests. So, does it make sense to not keep the lookup connection persistent and close it after a fixed time?

@rdhabalia rdhabalia requested a review from saandrews March 1, 2017 00:46
/**
*
* @param address
* remote client {@link InetSocketAddress} of the to connect
Contributor

sentence seems to be incomplete?

@@ -377,6 +378,7 @@ SocketAddress serverAddrees() {
ctx.writeAndFlush(cmd).addListener(writeFuture -> {
if (!writeFuture.isSuccess()) {
log.warn("{} Failed to send request to broker: {}", ctx.channel(), writeFuture.cause().getMessage());
pendingRequests.remove(requestId);
Contributor

Shouldn't it be getAndRemovePendingLookupRequest?

Contributor Author

No, actually we had a bug here: if we fail to publish the message on the socket, we should remove the request from the queue. Line 341 removes it from the lookup pending-requests map, and this one removes it from pendingRequests.

cnx.channel().disconnect();
}
}, connectionLifetimeInSecond, TimeUnit.SECONDS);
}
Contributor

Are we closing the connection irrespective of whether it's active or not? We should close it only if it has been inactive for the last n minutes.

Contributor Author

I think our goal is to not keep the lookup connection persistent forever and to close it after n minutes. However, we can add a check to close the connection only if it has not been active for the last n minutes. I will add another commit.
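For illustration, the inactivity check being discussed could look roughly like the following. This is a hypothetical stand-alone sketch, not the PR's code; `recordLookup`, the class name, and the millisecond lifetimes are my own:

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the refinement: only close the lookup connection if it has seen
// no requests for the whole lifetime window; otherwise re-arm the timer and
// check again later.
public class IdleAwareClose {
    private final AtomicLong lastRequestNanos = new AtomicLong(System.nanoTime());
    private final AtomicBoolean open = new AtomicBoolean(true);

    /** Called on every lookup request sent over this connection. */
    void recordLookup() {
        lastRequestNanos.set(System.nanoTime());
    }

    boolean isOpen() {
        return open.get();
    }

    void scheduleIdleCheck(ScheduledExecutorService timer, long lifetimeMs) {
        timer.schedule(() -> {
            long idleMs = TimeUnit.NANOSECONDS.toMillis(
                    System.nanoTime() - lastRequestNanos.get());
            if (idleMs >= lifetimeMs) {
                open.set(false);                      // idle long enough: disconnect
            } else {
                scheduleIdleCheck(timer, lifetimeMs); // still in use: check again later
            }
        }, lifetimeMs, TimeUnit.MILLISECONDS);
    }
}
```

The design choice here is that an actively used connection keeps re-arming the timer, so only a connection that has gone quiet for the full window is torn down.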

Contributor Author

@saandrews updated PR with the change.

@merlimat
Contributor

merlimat commented Mar 1, 2017

In case of a cold restart, as soon as the first broker comes up, all the clients try to do lookups and end up creating their lookup connection with that one broker.

Good point. In that case all the connections would stick to the same broker.

My only concern is that closing the connection after X time will make every lookup after a failover open a new TCP connection (one per client) and redo authentication, whereas if we keep the connection open it would be ready for use. I'm not sure that would be a big deal, though.

@saandrews
Contributor

In that case, it would behave similarly to our HTTP lookup for the first request. Should be OK, I think.

@saandrews
Contributor

There are a few more scenarios to address: handling failure in case the broker is unable to serve the lookup (similar to an HTTP 500), and closing the connection only if it has been inactive for the configured duration. These will be addressed in a separate PR.

@merlimat
Contributor

merlimat commented Mar 1, 2017

In that case, it would behave similarly to our HTTP lookup for the first request. Should be OK, I think.

True, it would be kind of the same. One other option could be to react to lookups being rejected by the broker (from #181) and close the connection only if errors are received within a certain timeframe.

@rdhabalia
Contributor Author

One other option could be to react to lookups being rejected by the broker (from #181) and close the connection only if errors are received within a certain timeframe.

Does it mean the client should close the connection when the broker returns TooManyRequestError? That implies:

  1. If the client doesn't receive TooManyRequestError, it will keep this persistent connection forever. So there is still a chance that all clients go to the same broker and create load on it, as long as lookup traffic stays below 20K (maxConcurrentLookupRequest=20000).

But I agree with the idea that if the client gets TooManyRequestError within a certain timeframe, it should disconnect and try to connect to a different broker (as that broker is already overloaded).

As in #265, where we close the HTTP connection on an internal server error, we need to do the same for the binary lookup. So I will combine two changes in a new PR:

  1. close the binary-proto-lookup connection on a 500 error
  2. close the connection if the client receives TooManyLookupRequestError within a certain timeframe.

And if we decide we don't need this PR's change (closing the persistent connection), we can close this one.

Any thoughts?
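A rough sketch of what the second item could look like, assuming a simple count-in-window policy (the class name, threshold, and window values here are hypothetical, not from the PR):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch: track TooManyRequests-style lookup rejections in a sliding time
// window and signal that the connection should be dropped once a threshold
// is hit, so the client can re-resolve and spread load to other brokers.
public class LookupErrorWindow {
    private final Deque<Long> errorTimesMs = new ArrayDeque<>();
    private final long windowMs;
    private final int threshold;

    LookupErrorWindow(long windowMs, int threshold) {
        this.windowMs = windowMs;
        this.threshold = threshold;
    }

    /** Records a rejected lookup; returns true when the connection should be closed. */
    synchronized boolean onLookupError(long nowMs) {
        errorTimesMs.addLast(nowMs);
        // Drop errors that have fallen out of the window.
        while (!errorTimesMs.isEmpty() && nowMs - errorTimesMs.peekFirst() > windowMs) {
            errorTimesMs.removeFirst();
        }
        return errorTimesMs.size() >= threshold;
    }
}
```

With this policy a healthy connection is never torn down, and a burst of rejections within the window is what triggers the reconnect, matching the reasoning in the comments above.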

@merlimat
Contributor

merlimat commented Mar 2, 2017

My reasoning is:

  • All clients will end up on a single broker only if it's the only one left in the cluster
  • At that point the broker could be overwhelmed by lookups; if it is not... then we should be good anyway
  • If we force-close the connections, we give the clients a chance to reconnect to the other brokers as they are brought back up

  1. close the binary-proto-lookup connection on a 500 error
  2. close the connection if the client receives TooManyLookupRequestError within a certain timeframe.

Yep, that's what I was thinking.

@merlimat
Contributor

@rdhabalia Should this PR be closed in favor of #274?

@rdhabalia
Contributor Author

Yes, let's close this one.

@rdhabalia rdhabalia closed this Mar 10, 2017
@merlimat merlimat removed this from the 1.17 milestone Mar 10, 2017
sijie pushed a commit to sijie/pulsar that referenced this pull request Mar 4, 2018
hangc0276 added a commit to hangc0276/pulsar that referenced this pull request May 26, 2021
Fix apache#252 

### Motivation
When a bundle unload is triggered, the `consumerTopicManagers` cache isn't evicted, so it returns the old `KafkaTopicConsumerManager` instance when handling the next fetch request. However, after a bundle unload, the producers/consumers/managedLedgers of the topics in the related bundle are closed. If we use the old `KafkaTopicConsumerManager` instance to read messages, it returns a `managedLedger has been closed` exception.

### Changes
1. Change the `consumerTopicManagers`, `topics`, and `references` maps to static `ConcurrentHashMap` attributes.
2. Evict the related cache entries for topics whose bundle triggered an unload.
3. Turn on `DistributedClusterTest`.