close binary-proto-lookup connection after connection-lifetime #252
Conversation
Uhm, all the lookups from a single client will end up on the same broker/discovery-service instance, but there is no reason why lookups from multiple clients should do the same.
In case of a cold restart, as soon as the first broker comes up, all the clients try to do lookups and they will end up creating their lookup connection with that one broker. Also, the lookup connection is persistent, so it will be used forever by that client, and that one broker will end up serving all the lookup requests. So, does it make sense to not keep the lookup connection persistent, and instead close it after a fixed time?
/**
 *
 * @param address
 *            remote client {@link InetSocketAddress} of the to connect
sentence seems to be incomplete?
@@ -377,6 +378,7 @@ SocketAddress serverAddrees() {
    ctx.writeAndFlush(cmd).addListener(writeFuture -> {
        if (!writeFuture.isSuccess()) {
            log.warn("{} Failed to send request to broker: {}", ctx.channel(), writeFuture.cause().getMessage());
            pendingRequests.remove(requestId);
Shouldn't it be getAndRemovePendingLookupRequest?
No, actually we had a bug: if we fail to publish the message on the socket, we should remove the request from the queue. Line 341 removes it from the lookup pending-requests map, and this one removes it from pendingRequests.
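(For context, here is a minimal sketch of the cleanup pattern being discussed. The surrounding class, the `Channel` parameter, and the future type are hypothetical; only `pendingRequests`, `requestId`, and the write-listener shape come from the diff above.)

```java
import io.netty.channel.Channel;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

class LookupRequestTracker {
    // requestId -> future that completes when the broker responds
    private final ConcurrentHashMap<Long, CompletableFuture<String>> pendingRequests =
            new ConcurrentHashMap<>();

    CompletableFuture<String> sendRequest(long requestId, Channel channel, Object cmd) {
        CompletableFuture<String> future = new CompletableFuture<>();
        pendingRequests.put(requestId, future);
        channel.writeAndFlush(cmd).addListener(writeFuture -> {
            if (!writeFuture.isSuccess()) {
                // Without this removal the entry leaks and the caller's future
                // never completes -- the bug described in the comment above.
                pendingRequests.remove(requestId);
                future.completeExceptionally(writeFuture.cause());
            }
        });
        return future;
    }
}
```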
            cnx.channel().disconnect();
        }
    }, connectionLifetimeInSecond, TimeUnit.SECONDS);
}
Are we closing the connection irrespective of whether it's active or not? We should close it only if it has been inactive for the last n minutes.
I think our goal is to not keep the lookup connection persistent forever, and to close it after n minutes. However, we can add a check to close the connection only if it has not been active for the last n minutes. I will add another commit.
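(A rough sketch of what such an inactivity check could look like. `connectionLifetimeInSecond` and the `disconnect()` call mirror the diff; the `LookupConnection` wrapper, `lastRequestTimeNs`, and the rescheduling logic are assumptions for illustration, not the PR's actual code.)

```java
import io.netty.channel.Channel;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

class LookupConnection {
    private final Channel channel;
    private final long connectionLifetimeInSecond;
    private final AtomicLong lastRequestTimeNs = new AtomicLong(System.nanoTime());

    LookupConnection(Channel channel, long connectionLifetimeInSecond) {
        this.channel = channel;
        this.connectionLifetimeInSecond = connectionLifetimeInSecond;
        scheduleLifetimeCheck();
    }

    void onLookupRequest() {
        // Record activity so the lifetime check can tell idle from busy.
        lastRequestTimeNs.set(System.nanoTime());
    }

    private void scheduleLifetimeCheck() {
        channel.eventLoop().schedule(() -> {
            long idleNs = System.nanoTime() - lastRequestTimeNs.get();
            if (TimeUnit.NANOSECONDS.toSeconds(idleNs) >= connectionLifetimeInSecond) {
                channel.disconnect();    // idle past the configured lifetime: close
            } else {
                scheduleLifetimeCheck(); // still active: check again later
            }
        }, connectionLifetimeInSecond, TimeUnit.SECONDS);
    }
}
```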
@saandrews I updated the PR with the change.
Good point. In that case all the connections would stick to the same broker. My only concern is that closing the connection after X time will make every lookup after a failover open a new TCP connection (one per client) and redo authentication, whereas if we keep the connection open it would be ready to use. I'm not sure that would be a big deal, though.
In that case, it would behave similarly to our HTTP lookup for the first request. Should be OK, I think.
There are a few more scenarios to address: handling failure when the broker is unable to serve the lookup (similar to an HTTP 500), and closing the connection only if it is inactive for the configured duration. These will be addressed in a separate PR.
True, it would be roughly the same. One other option could be to react to lookups being rejected by the broker (from #181) and close the connection only if errors are received within a certain timeframe.
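(To illustrate that option, a sketch that counts rejected lookups in a sliding window and disconnects only past a threshold. The threshold, window size, and all names here are assumptions, not from this PR or #181.)

```java
import io.netty.channel.Channel;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.TimeUnit;

class LookupErrorTracker {
    private final Deque<Long> errorTimesNs = new ArrayDeque<>();
    private final int maxErrors = 5;                            // hypothetical threshold
    private final long windowNs = TimeUnit.SECONDS.toNanos(30); // hypothetical window

    // Invoked when the broker rejects a lookup on this connection.
    synchronized void onLookupError(Channel channel) {
        long now = System.nanoTime();
        errorTimesNs.addLast(now);
        // Drop errors that have fallen out of the window.
        while (!errorTimesNs.isEmpty() && now - errorTimesNs.peekFirst() > windowNs) {
            errorTimesNs.removeFirst();
        }
        if (errorTimesNs.size() >= maxErrors) {
            channel.disconnect(); // too many rejections: force a fresh lookup connection
        }
    }
}
```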
Does it mean the client should close the connection when the broker rejects the lookup?
I think I agree with that idea. Same as #265, where we close the HTTP connection on an internal server error, we need to do the same for the binary lookup. So, I will combine the two changes in a new PR:
And if we decide we don't need this PR's change (closing the persistent connection), then we may close this one. Any thoughts?
My reasoning is:
Yep, that's what I was thinking.
@rdhabalia Should this PR be closed in favor of #274?
Yes, let's close this one.
Motivation
PulsarClient creates a separate, dedicated connection to the broker for a given service-lookup URL. So, in the case of a cold restart of the brokers, all clients may end up creating their lookup connection to the same broker, and this connection will persist for the entire lifetime of the client, which may create lookup overload on that one broker. Therefore, the lookup connection should not be persistent, and it should be closed after some time.
Modifications
Result
The lookup connection will not be persistent, which will avoid one broker being overloaded with lookup requests.