5.4.0 transport client failed to get local cluster state while using 5.3.0 to connect to 5.4.0 servers works #24575
Comments
It looks like a bug to me. Is sniffing enabled on your transport client? |
Yes it is enabled. |
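For context, sniffing on the 5.x transport client is controlled by the `client.transport.sniff` setting. A minimal sketch of such a setup (cluster name and address are placeholders, assuming the standard PreBuiltTransportClient), roughly matching what is being discussed here:

```java
import java.net.InetAddress;

import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.transport.client.PreBuiltTransportClient;

public class SniffingClient {
    public static void main(String[] args) throws Exception {
        // client.transport.sniff makes the client sample the cluster state of the
        // seed node(s) and add the other data nodes it discovers to its own list.
        Settings settings = Settings.builder()
                .put("cluster.name", "my-cluster")   // placeholder cluster name
                .put("client.transport.sniff", true)
                .build();

        TransportClient client = new PreBuiltTransportClient(settings)
                .addTransportAddress(new InetSocketTransportAddress(
                        InetAddress.getByName("localhost"), 9300));

        System.out.println("connected nodes: " + client.connectedNodes());
        client.close();
    }
}
```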
Same issue here. We have:
Client and Elasticsearch both on the same machine, connecting through localhost:
The exception we have on startup:
|
I am seeing this issue as well on some nodes connecting to ES. We run a service with multiple machines that each connect to ES; some of them connect successfully and others do not. |
Thanks for reporting, I think I know where the issue is. |
With the current implementation, SniffNodesSampler might close the current connection right after a request is sent but before the response is correctly handled. This causes timeouts in the transport client when sniffing is activated. closes elastic#24575 closes elastic#24557
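To make the race a bit more concrete, here is a plain-Java illustration (not Elasticsearch code, just the shape of the failure): if the connection that owns a pending request is torn down before the response arrives, the registered handler is never notified and the caller only sees a timeout, like the ReceiveTimeoutTransportException reported in this issue.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class AbandonedResponseHandler {
    public static void main(String[] args) throws Exception {
        // Stands in for the response handler registered when a request is sent.
        CompletableFuture<String> pendingResponse = new CompletableFuture<>();

        // The connection is closed right after the request goes out, so nothing
        // ever calls pendingResponse.complete(...) or completeExceptionally(...).

        try {
            // The caller waits for the cluster-state response and gives up after 5s,
            // mirroring the ~5000ms timeout in the logs above.
            pendingResponse.get(5, TimeUnit.SECONDS);
        } catch (TimeoutException e) {
            System.out.println("request timed out: handler was never notified of the closed connection");
        }
    }
}
```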
Thanks @tlrx. I'm not sure if you are also aware, but I also saw errors that looked like the following when I disabled sniffing.
If it helps, we use a service discovery framework to discover services (https://medium.com/airbnb-engineering/smartstack-service-discovery-in-the-cloud-4b8a080de619). We "randomly" pick an ES box to connect to and then use sniffing (if enabled) to discover the rest. Even though ES is running on 9200/9300, we use a different port on our client machines because the service discovery framework does the correct routing. Both the service discovery port and the "direct access" port are reachable over the network. I am rolling back our transport client version to 5.3.2 and will report back on the results. |
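For reference, with sniffing disabled the 5.x transport client only talks to the addresses registered explicitly. A rough sketch of that configuration (cluster name, hostnames, and the service-discovery port below are placeholders, not values from this report):

```java
import java.net.InetAddress;

import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.transport.client.PreBuiltTransportClient;

public class NoSniffClient {
    public static void main(String[] args) throws Exception {
        Settings settings = Settings.builder()
                .put("cluster.name", "my-cluster")     // placeholder cluster name
                .put("client.transport.sniff", false)  // rely only on the addresses below
                .build();

        // In a setup like the one described, each address would come from the
        // service discovery layer rather than being hard-coded.
        TransportClient client = new PreBuiltTransportClient(settings)
                .addTransportAddress(new InetSocketTransportAddress(
                        InetAddress.getByName("es-proxy-1.internal"), 19300))  // placeholder host/port
                .addTransportAddress(new InetSocketTransportAddress(
                        InetAddress.getByName("es-proxy-2.internal"), 19300)); // placeholder host/port

        System.out.println("connected nodes: " + client.connectedNodes());
        client.close();
    }
}
```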
…port handlers Today we prune transport handlers in TransportService when a node is disconnected. This can cause connections to starve in the TransportService if the connection is opened as a short-lived connection, i.e. without sharing the connection to a node via registering it in the transport itself. This change moves to pruning based on the connection's cache key to ensure we notify handlers as soon as the connection is closed, for all connections, not just for registered connections. Relates to elastic#24632 Relates to elastic#24575 Relates to elastic#24557
Same here... 5.4.0 to 5.4.0 fails.... but 5.3.0 to 5.4.0 works |
…port handlers (#24639) Today we prune transport handlers in TransportService when a node is disconnected. This can cause connections to starve in the TransportService if the connection is opened as a short-lived connection, i.e. without sharing the connection to a node via registering it in the transport itself. This change moves to pruning based on the connection's cache key to ensure we notify handlers as soon as the connection is closed, for all connections, not just for registered connections. Relates to #24632 Relates to #24575 Relates to #24557
I am seeing a similar exception in 2.3.1. Below is the exception:
Is the issue not fixed in 2.3.1? |
I didn't test 2.3.1, since the fix addressed a bug introduced in #22828 for 5.4.0. It's possible that this bug exists in 2.3.1, but that version is EOL and no longer supported. |
ok thanks for the update.
|
Hi, I have a REST service built on Netty that connects to an Elasticsearch backend via the Java transport client API.
It worked very well with Netty 4.1.8 and ES 5.3.0.
Now I have tried to upgrade the ES backend and transport client to 5.4.0, and also Netty to 4.1.9. Then the following problem happens:
10 May 2017;17:01:59.645 Developer linux-68qh [elasticsearch[client][generic][T#3]] INFO o.e.c.t.TransportClientNodesService - failed to get local cluster state for {#transport#-1}{WlTQjgcGQ1uqyNNsw4ZnAw}{127.0.0.1}{127.0.0.1:9300}, disconnecting...
org.elasticsearch.transport.ReceiveTimeoutTransportException: [][127.0.0.1:9300][cluster:monitor/state] request_id [7] timed out after [5001ms]
at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:925)
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
I rolled back the transport client to 5.3.0 but kept the backend at 5.4.0.
Then it is able to connect to the ES backend.
I use SBT and the relevant build dependencies are:
"io.netty" % "netty-all" % "4.1.9.Final",
"org.elasticsearch" % "elasticsearch" % "5.4.0",
"org.elasticsearch.client" % "transport" % "5.4.0",
"io.netty" % "netty-transport-native-epoll" % "4.1.9.Final" classifier "linux-x86_64"
Environment:
openjdk version "1.8.0_121"
OpenJDK Runtime Environment (IcedTea 3.3.0) (suse-3.3-x86_64)
OpenJDK 64-Bit Server VM (build 25.121-b13, mixed mode)
Linux linux-68qh 4.10.13-1-default #1 SMP PREEMPT Thu Apr 27 12:23:31 UTC 2017 (e5d11ce) x86_64 x86_64 x86_64 GNU/Linux
Thanks