Implement check for resolving localhost #1046

aelkugia · 2020-01-25T01:03:55Z

Describe the error

A customer reported that the watcher container in the influxdb pod was constantly failing because it could not connect to InfluxDB. After upgrading from 1.4.5 to 1.4.54, they started to experience issues with InfluxDB: the pod kept restarting with status CrashLoopBackOff.

Cause of the error

The error was caused by localhost resolving to an invalid IP address instead of 127.0.0.1.

Expected behavior

A precheck should verify that localhost resolves correctly in the cluster. This ensures the user fixes DNS before continuing with the installation.
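For illustration, such a precheck could be sketched roughly as follows. This is only a minimal sketch, not the Gravity implementation: the function names `validateLoopback` and `checkLocalhost` are hypothetical, and it assumes that any loopback address (127.0.0.0/8 or ::1) is acceptable rather than strictly 127.0.0.1.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// validateLoopback (hypothetical helper) returns an error unless every
// address in addrs is a loopback address (127.0.0.0/8 or ::1).
// An empty result is also treated as an error.
func validateLoopback(addrs []string) error {
	if len(addrs) == 0 {
		return fmt.Errorf("localhost did not resolve to any address")
	}
	for _, addr := range addrs {
		ip := net.ParseIP(addr)
		if ip == nil || !ip.IsLoopback() {
			return fmt.Errorf("localhost resolved to non-loopback address %q", addr)
		}
	}
	return nil
}

// checkLocalhost (hypothetical helper) resolves "localhost" with the
// default resolver and validates the returned addresses.
func checkLocalhost(ctx context.Context) error {
	addrs, err := net.DefaultResolver.LookupHost(ctx, "localhost")
	if err != nil {
		return fmt.Errorf("failed to resolve localhost: %w", err)
	}
	return validateLoopback(addrs)
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// A real precheck would abort the installation on failure;
	// this sketch only reports the result.
	if err := checkLocalhost(ctx); err != nil {
		fmt.Println("precheck failed:", err)
		return
	}
	fmt.Println("precheck passed: localhost resolves to a loopback address")
}
```

Keeping the validation separate from the lookup makes it possible to exercise the check with fixed address lists, without depending on the DNS setup of the machine running it.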

aelkugia · 2020-01-27T23:17:56Z

It was found that Go resolves names through the DNS server instead of /etc/hosts if it cannot find the /etc/nsswitch file in the container.

A fix is expected in the Go 1.15 release. The discussion can be found here: golang/go#35305

terrywang · 2020-01-31T00:43:17Z

Just to correct, it's the /etc/nsswitch.conf file, which is provided by the glibc package on Fedora / EL and by libc-bin on Debian/Ubuntu.

NOTE: On Arch Linux and its variants (e.g. Manjaro), which I use as my workstation, the file is provided by the filesystem package.
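Since the behavior discussed above depends on whether /etc/nsswitch.conf is present in the container, a diagnostic along these lines might help when debugging. It is only a sketch: `hostsSources` and `filesBeforeDNS` are hypothetical helper names, the parsing is deliberately simplified, and it does not claim to mirror how Go or glibc actually interpret the file.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// hostsSources (hypothetical helper) returns the lookup sources on the
// "hosts:" line of an nsswitch.conf-style document, in order. Bracketed
// action items such as [NOTFOUND=return] are skipped. It returns nil if
// no hosts line is found.
func hostsSources(conf string) []string {
	scanner := bufio.NewScanner(strings.NewReader(conf))
	for scanner.Scan() {
		line := scanner.Text()
		if i := strings.Index(line, "#"); i >= 0 {
			line = line[:i] // drop trailing comment
		}
		line = strings.TrimSpace(line)
		if !strings.HasPrefix(line, "hosts:") {
			continue
		}
		var sources []string
		inAction := false
		for _, f := range strings.Fields(strings.TrimPrefix(line, "hosts:")) {
			if strings.HasPrefix(f, "[") {
				inAction = true
			}
			if !inAction {
				sources = append(sources, f)
			}
			if strings.HasSuffix(f, "]") {
				inAction = false
			}
		}
		return sources
	}
	return nil
}

// filesBeforeDNS reports whether "files" (i.e. /etc/hosts) is consulted
// before "dns" in the given source list.
func filesBeforeDNS(sources []string) bool {
	for _, s := range sources {
		switch s {
		case "files":
			return true
		case "dns":
			return false
		}
	}
	return false
}

func main() {
	data, err := os.ReadFile("/etc/nsswitch.conf")
	if os.IsNotExist(err) {
		fmt.Println("warning: /etc/nsswitch.conf is missing; localhost may be resolved via DNS")
		return
	}
	if err != nil {
		fmt.Println("could not read /etc/nsswitch.conf:", err)
		return
	}
	sources := hostsSources(string(data))
	fmt.Println("hosts sources:", sources, "files before dns:", filesBeforeDNS(sources))
}
```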

knisbet · 2020-02-28T19:16:06Z

It might be possible to just hardcode localhost in the coredns configuration. It depends a bit on whether the query is for "localhost" or "localhost." (with a trailing dot).

knisbet · 2020-03-06T20:34:54Z

This is fixed in 5.5.x by #1214 / gravitational/monitoring-app#145 by fixing the cause of leaking localhost queries, as opposed to implementing a check for a capturing DNS resolver which is a valid customer deployment option.

aelkugia added this to the Gravity Reliability Engineering Improvements milestone Jan 25, 2020

aelkugia assigned gravitational-jenkins Jan 25, 2020

r0mant changed the title ~~InfluxDB failed to start (CrashLoopBackOff) - improve pre-check~~ Implement check for resolving localhost Feb 19, 2020

r0mant added the port/5.5 (Requires port to version/5.5.x), port/6.1 (Requires port to version/6.1.x), port/6.3 (Requires port to version/6.3.x), kind/enhancement (New feature or request), priority/1 (Medium priority), and support-load (Mark issues that increase support load) labels Feb 19, 2020

aelkugia closed this as completed Mar 11, 2020

wadells unassigned gravitational-jenkins Apr 16, 2021
