After running the deployment script (curl https://raw.githubusercontent.com/kubeflow/kubeflow/v0.2.2/scripts/deploy.sh | bash), Ambassador failed to start on one node.
kubectl logs --namespace kubeflow ambassador-849fb9c8c5-kgrkb ambassador
./entrypoint.sh: set: line 65: can't access tty; job control turned off
2018-07-31 05:46:50 kubewatch 0.30.1 INFO: generating config with gencount 1 (4 changes)
2018-07-31 05:46:56 kubewatch 0.30.1 WARNING: Scout: could not post report: HTTPSConnectionPool(host='kubernaut.io', port=443): Max retries exceeded with url: /scout (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f7383625940>: Failed to establish a new connection: [Errno -3] Try again',))
2018-07-31 05:46:56 kubewatch 0.30.1 INFO: Scout reports {"latest_version": "0.30.1", "exception": "could not post report: HTTPSConnectionPool(host='kubernaut.io', port=443): Max retries exceeded with url: /scout (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f7383625940>: Failed to establish a new connection: [Errno -3] Try again',))", "cached": false, "timestamp": 1533016011.063859}
[2018-07-31 05:46:56.133][10][info][upstream] source/common/upstream/cluster_manager_impl.cc:132] cm init: all clusters initialized
[2018-07-31 05:46:56.133][10][info][config] source/server/configuration_impl.cc:55] loading 1 listener(s)
[2018-07-31 05:46:56.150][10][info][config] source/server/configuration_impl.cc:95] loading tracing configuration
[2018-07-31 05:46:56.150][10][info][config] source/server/configuration_impl.cc:122] loading stats sink configuration
AMBASSADOR: starting diagd
AMBASSADOR: starting Envoy
AMBASSADOR: waiting
PIDS: 11:diagd 12:envoy 13:kubewatch
[2018-07-31 05:46:56.556][14][info][main] source/server/server.cc:184] initializing epoch 0 (hot restart version=9.200.16384.127.options=capacity=16384, num_slots=8209 hash=228984379728933363)
[2018-07-31 05:46:57.574][14][info][config] source/server/configuration_impl.cc:55] loading 1 listener(s)
[2018-07-31 05:46:57.767][14][info][config] source/server/configuration_impl.cc:95] loading tracing configuration
[2018-07-31 05:46:57.767][14][info][config] source/server/configuration_impl.cc:122] loading stats sink configuration
[2018-07-31 05:46:57.769][14][info][main] source/server/server.cc:359] starting main dispatch loop
2018-07-31 05:47:04 diagd 0.30.1 WARNING: Scout: could not post report: HTTPSConnectionPool(host='kubernaut.io', port=443): Max retries exceeded with url: /scout (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f0bee6d95f8>: Failed to establish a new connection: [Errno -3] Try again',))
2018-07-31 05:47:04 diagd 0.30.1 INFO: Scout reports {"latest_version": "0.30.1", "exception": "could not post report: HTTPSConnectionPool(host='kubernaut.io', port=443): Max retries exceeded with url: /scout (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f0bee6d95f8>: Failed to establish a new connection: [Errno -3] Try again',))", "cached": false, "timestamp": 1533016019.808133}
2018-07-31 05:47:14 kubewatch 0.30.1 INFO: generating config with gencount 2 (4 changes)
2018-07-31 05:47:19 kubewatch 0.30.1 WARNING: Scout: could not post report: HTTPSConnectionPool(host='kubernaut.io', port=443): Max retries exceeded with url: /scout (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f6fbb8468d0>: Failed to establish a new connection: [Errno -3] Try again',))
2018-07-31 05:47:19 kubewatch 0.30.1 INFO: Scout reports {"latest_version": "0.30.1", "exception": "could not post report: HTTPSConnectionPool(host='kubernaut.io', port=443): Max retries exceeded with url: /scout (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f6fbb8468d0>: Failed to establish a new connection: [Errno -3] Try again',))", "cached": false, "timestamp": 1533016034.702365}
[2018-07-31 05:47:19.770][26][info][upstream] source/common/upstream/cluster_manager_impl.cc:132] cm init: all clusters initialized
[2018-07-31 05:47:19.771][26][info][config] source/server/configuration_impl.cc:55] loading 1 listener(s)
[2018-07-31 05:47:19.788][26][info][config] source/server/configuration_impl.cc:95] loading tracing configuration
[2018-07-31 05:47:19.788][26][info][config] source/server/configuration_impl.cc:122] loading stats sink configuration
unable to initialize hot restart: previous envoy process is still initializing
starting hot-restarter with target: /application/start-envoy.sh
forking and execing new child process at epoch 0
forked new child process with PID=14
got SIGHUP
forking and execing new child process at epoch 1
forked new child process with PID=27
got SIGCHLD
PID=27 exited with code=1
Due to abnormal exit, force killing all child processes and exiting
force killing PID=14
exiting due to lack of child processes
AMBASSADOR: envoy exited with status 1
Here's the envoy.json we were trying to run with:
{
"listeners": [
{
"address": "tcp://0.0.0.0:80",
"filters": [
{
"type": "read",
"name": "http_connection_manager",
"config": {"codec_type": "auto",
"stat_prefix": "ingress_http",
"access_log": [
{
"format": "ACCESS [%START_TIME%] \"%REQ(:METHOD)% %REQ(X-ENVOY-ORIGINAL-PATH?:PATH)% %PROTOCOL%\" %RESPONSE_CODE% %RESPONSE_FLAGS% %BYTES_RECEIVED% %BYTES_SENT% %DURATION% %RESP(X-ENVOY-UPSTREAM-SERVICE-TIME)% \"%REQ(X-FORWARDED-FOR)%\" \"%REQ(USER-AGENT)%\" \"%REQ(X-REQUEST-ID)%\" \"%REQ(:AUTHORITY)%\" \"%UPSTREAM_HOST%\"\n",
"path": "/dev/fd/1"
}
],
"route_config": {
"virtual_hosts": [
{
"name": "backend",
"domains": ["*"],"routes": [
{
"timeout_ms": 3000,"prefix": "/ambassador/v0/check_ready","prefix_rewrite": "/ambassador/v0/check_ready",
"weighted_clusters": {
"clusters": [
{ "name": "cluster_127_0_0_1_8877", "weight": 100.0 }
]
}
}
,
{
"timeout_ms": 3000,"prefix": "/ambassador/v0/check_alive","prefix_rewrite": "/ambassador/v0/check_alive",
"weighted_clusters": {
"clusters": [
{ "name": "cluster_127_0_0_1_8877", "weight": 100.0 }
]
}
}
,
{
"timeout_ms": 3000,"prefix": "/ambassador/v0/","prefix_rewrite": "/ambassador/v0/",
"weighted_clusters": {
"clusters": [
{ "name": "cluster_127_0_0_1_8877", "weight": 100.0 }
]
}
}
,
{
"timeout_ms": 3000,"prefix": "/tfjobs/","prefix_rewrite": "/tfjobs/",
"weighted_clusters": {
"clusters": [
{ "name": "cluster_tf_job_dashboard_default", "weight": 100.0 }
]
}
}
,
{
"timeout_ms": 3000,"prefix": "/k8s/ui/","prefix_rewrite": "/",
"weighted_clusters": {
"clusters": [
{ "name": "cluster_kubernetes_dashboard_kube_system_otls", "weight": 100.0 }
]
}
}
,
{
"timeout_ms": 300000,"prefix": "/user/","prefix_rewrite": "/user/",
"weighted_clusters": {
"clusters": [
{ "name": "cluster_tf_hub_lb_default", "weight": 100.0 }
]
}
}
,
{
"timeout_ms": 300000,"prefix": "/hub/","prefix_rewrite": "/hub/",
"weighted_clusters": {
"clusters": [
{ "name": "cluster_tf_hub_lb_default", "weight": 100.0 }
]
}
}
,
{
"timeout_ms": 3000,"prefix": "/","prefix_rewrite": "/",
"weighted_clusters": {
"clusters": [
{ "name": "cluster_centraldashboard_default", "weight": 100.0 }
]
}
}
]
}
]
},
"filters": [
{
"name": "cors",
"config": {}
},{"type": "decoder",
"name": "router",
"config": {}
}
]
}
}
]
}
],
"admin": {
"address": "tcp://127.0.0.1:8001",
"access_log_path": "/tmp/admin_access_log"
},
"cluster_manager": {
"clusters": [
{
"name": "cluster_127_0_0_1_8877",
"connect_timeout_ms": 3000,
"type": "strict_dns",
"lb_type": "round_robin",
"hosts": [
{
"url": "tcp://127.0.0.1:8877"
}
]},
{
"name": "cluster_centraldashboard_default",
"connect_timeout_ms": 3000,
"type": "strict_dns",
"lb_type": "round_robin",
"hosts": [
{
"url": "tcp://centraldashboard.default:80"
}
]},
{
"name": "cluster_kubernetes_dashboard_kube_system_otls",
"connect_timeout_ms": 3000,
"type": "strict_dns",
"lb_type": "round_robin",
"hosts": [
{
"url": "tcp://kubernetes-dashboard.kube-system:443"
}
],
"ssl_context": {
}},
{
"name": "cluster_tf_hub_lb_default",
"connect_timeout_ms": 3000,
"type": "strict_dns",
"lb_type": "round_robin",
"hosts": [
{
"url": "tcp://tf-hub-lb.default:80"
}
]},
{
"name": "cluster_tf_job_dashboard_default",
"connect_timeout_ms": 3000,
"type": "strict_dns",
"lb_type": "round_robin",
"hosts": [
{
"url": "tcp://tf-job-dashboard.default:80"
}
]}
]
},
"statsd_udp_ip_address": "127.0.0.1:8125",
"stats_flush_interval_ms": 1000
}
AMBASSADOR: shutting down
@jlewi Thanks for your kind reply. I checked the DNS service on every node by running nslookup kubernetes in a busybox pod, and found that the node where Ambassador crashed had the wrong DNS resolver address. The root cause was the kubelet configuration, which used an erroneous --cluster-dns value.
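The per-node check described above could be scripted roughly as follows. This is only a sketch: it assumes the pod's /etc/resolv.conf is piped in on stdin and that the expected resolver is the ClusterIP of the cluster DNS service. The helper names first_nameserver and check_resolver, the pod name busybox, and the service name kube-dns are illustrative assumptions, not taken from this issue.

```shell
#!/bin/sh
# first_nameserver: print the first "nameserver" address found in
# resolv.conf-formatted text read from stdin (hypothetical helper).
first_nameserver() {
    awk '$1 == "nameserver" { print $2; exit }'
}

# check_resolver EXPECTED_IP: read resolv.conf text from stdin and succeed
# only if its first nameserver equals EXPECTED_IP (hypothetical helper).
check_resolver() {
    expected="$1"
    actual="$(first_nameserver)"
    if [ "$actual" = "$expected" ]; then
        echo "ok: resolver is $actual"
    else
        echo "mismatch: resolver is ${actual:-<none>}, expected $expected"
        return 1
    fi
}

# Possible usage against a cluster (pod and service names are assumptions):
#   expected="$(kubectl get svc kube-dns -n kube-system -o jsonpath='{.spec.clusterIP}')"
#   kubectl exec busybox -- cat /etc/resolv.conf | check_resolver "$expected"
```

Running this against one test pod per node should flag any node whose kubelet hands pods a resolver address that differs from the cluster DNS service, which is the kind of --cluster-dns misconfiguration found here.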
jlewi changed the title from "Ambassador failed to start using the 0.2.2 deploy script" to "ambassador crashing on node with wrong DNS resolver address due to misconfigured kubelet" on Aug 2, 2018.
surajkota pushed a commit to surajkota/kubeflow that referenced this issue on Jun 13, 2022.