cannot reach clickhouse host? #92

Open
sunny19930321 opened this issue May 14, 2020 · 20 comments

@sunny19930321

When using agents, what causes the following error?
Caused by: java.lang.Throwable: [ Id: 160857F337DC563A; User "tmplarge"(1) proxying as "default"(1) to "d085126100.aliyun.com:8123"(6); RemoteAddr: "10.13.56.73:51080"; LocalAddr: "10.85.129.101:9090"; Duration: 825 μs]: cannot reach d085126100.aliyun.com:8123; query: "select timezone()\nFORMAT TabSeparatedWithNamesAndTypes;"

@VitoLiao

We are hitting the same problem. Our chproxy version is v1.14.0. Does anybody know how to fix it?

@hagen1778
Contributor

Can you verify if this request works without using agent, maybe via curl or any other http client?

@sidanasparsh

Facing the same issue intermittently? Are there updates on this?

@apetrov88

apetrov88 commented Jul 11, 2020

+1

[502] [ Id: 161FB3A842CE4135; User "default"(1) proxying as "default"(1) to "CHCluster03:8123"(6); RemoteAddr: "10.0.0.24:51324"; LocalAddr: "10.0.0.12:9090"; Duration: 16798 μs]: cannot reach CHCluster03:8123; query: ..

CHProxy 1.14

@sunny19930321
Author

> Can you verify if this request works without using agent, maybe via curl or any other http client?

@hagen1778, I verified that the ClickHouse service responds fine when it is queried directly.

@sunny19930321
Author

> Facing the same issue intermittently? Are there updates on this?

@sidanasparsh I tried adjusting the timeout, but it doesn't seem to help.

@sunny19930321
Author

> Facing the same issue intermittently? Are there updates on this?

Have you found the cause of this problem?

@sunny19930321
Author

> +1
>
> [502] [ Id: 161FB3A842CE4135; User "default"(1) proxying as "default"(1) to "CHCluster03:8123"(6); RemoteAddr: "10.0.0.24:51324"; LocalAddr: "10.0.0.12:9090"; Duration: 16798 μs]: cannot reach CHCluster03:8123; query: ..
>
> CHProxy 1.14

Have you found the cause of this problem?

@karas2015

+1, how to fix it?

@hagen1778
Contributor

hagen1778 commented Jan 16, 2021

The proxy returns the "cannot reach" error when it is unable to establish a connection to the given address (https://github.com/Vertamedia/chproxy/blob/1758e7399fe57c97aeec8e55dd13c6300399969b/proxy.go#L186-L193). I do not know what causes this, because many things can affect reachability along the "your_application" <=> "chproxy" <=> "clickhouse" path.
By the way, the proxy exposes the host_health metric (and plenty of others), which shows whether a configured CH host is reachable. Can you check the state of this metric at the moments when a query from the agent fails? And if you send queries without the agent, do they work?

@JustHarris

We are experiencing the same kind of connection issues

ERROR: 2021/02/03 06:36:39 proxy.go:192: [ Id: 165F9CDFD12DDA70; User "compass-insert-hits"(1) proxying as "admin"(1) to "myclusternode02.io"(6); RemoteAddr: "100.64.4.2:50028"; LocalAddr: "100.65.205.183:80"; Duration: 60133561 μs]: cannot reach myclusternode02.io:8123; query: "INSERT INTO hits ........."
...
ERROR: 2021/02/03 13:05:11 scope.go:643: error while health-checking "myclusternode01.io:8123" host: cannot send request in 3.00015176s: Get "http://myclusternode01.io.io:8123/?query=SELECT%201": context deadline exceeded
.....
ERROR: 2021/02/04 11:46:26 proxy.go:192: [ Id: 165F9CDFD1355353; User "newsroom-cache"(1) proxying as "default"(1) to "asinglechnode.io:8123"(6); RemoteAddr: "100.65.205.170:47344"; LocalAddr: "100.65.205.183:80"; Duration: 1097 μs]: cannot reach asinglechnode.io:8123; query: "SELECT maxMerge(Hit.event_time) as maxEventTime......"

We have done a full network check, simulating the health checks with curl and checking for packet loss with mtr, and found nothing. The network is working flawlessly, with not a single lost packet or failed curl request.

@akimrx

akimrx commented Mar 12, 2021

We have the same problem with three ClickHouse nodes.
The ClickHouse backends are reachable directly at the times the errors occur.

@bzed

bzed commented Jul 7, 2021

The default keepalive-timeout in CH is 3 seconds, while the default keepalive time used by net.Dialer is 15 seconds. That won't work.

@bzed

bzed commented Jul 7, 2021

Using

diff --git a/proxy.go b/proxy.go
index 11684b6..356cd2c 100644
--- a/proxy.go
+++ b/proxy.go
@@ -3,6 +3,7 @@ package main
 import (
        "context"
        "fmt"
+       "net"
        "net/http"
        "net/http/httputil"
        "net/url"
@@ -43,6 +44,24 @@ func newReverseProxy() *reverseProxy {
                        // Suppress error logging in ReverseProxy, since all the errors
                        // are handled and logged in the code below.
                        ErrorLog: log.NilLogger,
+                       ErrorHandler: func(rw http.ResponseWriter, req *http.Request, err error) {
+                               log.Errorf("http: proxy error: %v", err)
+                               rw.WriteHeader(http.StatusBadGateway)
+                       },
+                       Transport: &http.Transport{
+                               // DisableKeepAlives: false,
+                               // Proxy: http.ProxyFromEnvironment,
+                               DialContext: (&net.Dialer{
+                                       Timeout:   2 * time.Second,
+                                       KeepAlive: 2 * time.Second,
+                                       DualStack: true,
+                               }).DialContext,
+                               // ForceAttemptHTTP2:     true,
+                               // MaxIdleConns:          100,
+                               // IdleConnTimeout:       90 * time.Second,
+                               // TLSHandshakeTimeout:   10 * time.Second,
+                               // ExpectContinueTimeout: 1 * time.Second,
+                       },
                },
                reloadSignal: make(chan struct{}),
                reloadWG:     sync.WaitGroup{},

seems to work fine so far. Based on #121 - but just lowering keepalive time/timeouts.

@bzed

bzed commented Jul 7, 2021

Seems to make things better, but doesn't fix them unfortunately.

@bzed

bzed commented Jul 7, 2021

Jul 07 11:48:00 mon01 chproxy[40652]: ERROR: 2021/07/07 09:48:00 proxy.go:48: http: proxy error: net/http: HTTP/1.x transport connection broken: write tcp 127.0.0.1:58570->127.0.0.1:8123: write: broken pipe

@mchades

mchades commented Aug 17, 2021

> The default keepalive-timeout in CH is 3 seconds, while the default keepalive time used by net.Dialer is 15 seconds. That won't work.

@bzed I observed something similar, but I think the HTTP parameter <keep_alive_timeout>3</keep_alive_timeout> in CH corresponds to the IdleConnTimeout of the Transport in CHProxy.

So I raised keep_alive_timeout in CH to 90, the same value as in CHProxy. That works! The "cannot reach host" error is gone.

@sunny19930321
Author

@mchades very good

@den-crane

den-crane commented Oct 16, 2023

Client keepalive should be substantially bigger than server keepalive: ClickHouse/ClickHouse#52571 (comment), ClickHouse/ClickHouse#53068

--
update: bigger not smaller

@egplat

egplat commented Nov 13, 2023

> > The default keepalive-timeout in CH is 3 seconds, while the default keepalive time used by net.Dialer is 15 seconds. That won't work.
>
> @bzed I observed something similar, but I think the HTTP parameter <keep_alive_timeout>3</keep_alive_timeout> in CH corresponds to the IdleConnTimeout of the Transport in CHProxy.
>
> So I raised keep_alive_timeout in CH to 90, the same value as in CHProxy. That works! The "cannot reach host" error is gone.

That did work! The following code shows the related settings for the HTTP Transport in the CH proxy:

    transport := &http.Transport{
        Proxy: http.ProxyFromEnvironment,
        DialContext: func(ctx context.Context, network, addr string) (net.Conn, error) {
            dialer := &net.Dialer{
                Timeout:   30 * time.Second,
                KeepAlive: 30 * time.Second,
            }
            return dialer.DialContext(ctx, network, addr)
        },
        ForceAttemptHTTP2:     true,
        MaxIdleConns:          cfgCp.MaxIdleConns,
        MaxIdleConnsPerHost:   cfgCp.MaxIdleConnsPerHost,
        IdleConnTimeout:       9 * time.Second,
        TLSHandshakeTimeout:   10 * time.Second,
        ExpectContinueTimeout: 1 * time.Second,
    }
