Logspout stops posting logs from some containers #377

Open
janantala opened this issue Jan 31, 2018 · 15 comments

@janantala

Hello there,

I use logspout in Docker Cloud, where I have multiple app containers on a single server. However, logspout sometimes stops sending logs to CloudWatch for some of them. I see no errors; the logs are just missing. Restarting the logspout container helps. This is my configuration:

logspout:
  autoredeploy: true
  command: 'cloudwatch://eu-west-1?DEBUG=1'
  deployment_strategy: every_node
  environment:
    - ALLOW_TTY=true
    - AWS_ACCESS_KEY_ID=*redacted*
    - AWS_SECRET_ACCESS_KEY=*redacted*
    - INACTIVITY_TIMEOUT=1m
    - LOGSPOUT=ignore
    - LOGSPOUT_GROUP=$DOCKERCLOUD_NODE_HOSTNAME
  image: 'mdsol/logspout:latest'
  restart: on-failure
  volumes:
    - '/var/run/docker.sock:/tmp/docker.sock'
@tf42src

tf42src commented Mar 14, 2018

Same issue, similar config. I can see quite a few error messages like these:

2018/03/14 01:07:36 pump.pumpLogs(): d33949a9fd1d stopped with error: inactivity time exceeded timeout

for various containers. Is that actually an error or not? Does it mean it didn't receive logs within a certain timeframe, or did something actually break?

Running latest logspout including the #374 fix from our forked repo.

cat /src/VERSION
v3.3-dev
docker --version
Docker version 17.12.1-ce, build 7390fc6

@tf42src

tf42src commented Mar 29, 2018

I tried switching to TCP because I was also seeing some "message too long" errors. Now I am not getting any logs in my ELK stack anymore, only pump.Run() event: 19757wcrb4zq network:disconnect messages instead (from the logspout container, via docker service logs <logspout service>).
I can successfully connect to logstash using telnet <ELK IP> 5000.

Apparently 19757wcrb4zq is a container ID. Why am I getting disconnect messages for container IDs? Is logspout maintaining one TCP connection to logstash per container?

Please shed some light.

@nehresma

nehresma commented May 2, 2018

I experience a similar issue with streaming to a remote syslog at papertrailapp. Restarting the logspout container fixes it. Came here looking to see if I'm the only one; apparently I'm not. ☹️

@shengsh

shengsh commented May 11, 2018

I had the same problem: some containers' logs can't be sent to logstash. Restarting logspout fixes it, but after a few minutes the problem arises again. There was no error message from logspout. Some information:

logspout version: v3.3-dev
docker version: 17.12.0-ce, build c97c6d6
log-driver: json-file
log-opts: max_size:1g

@michaelshobbs

@richardARPANET

richardARPANET commented Jul 2, 2018

I've also run into this issue multiple times. Before, it was happening due to losing the connection to Papertrail, and the logspout container logs did show that.

But now it's happening and there is no interesting log output from the logspout container.

@gbolo
Collaborator

gbolo commented Aug 16, 2018

can someone provide full log output from logspout container when this happens?

do you see the following message? pump.pumpLogs(): <container_id> dead

@lionheart2016

nothing more?

@richardARPANET

@lionheart2016 Setting the env var RETRY_COUNT to something like 30 helped for me (it controls how many times to retry a broken socket; the default is 10).
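
For anyone trying this, in a stackfile like the one in the opening post it is just one more environment entry (the value 30 comes from the comment above, not an official recommendation):

  environment:
    - RETRY_COUNT=30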

@forestjohnsonpeoplenet

I might be having the same problem. I believe it is related to this commit, but I am not sure:

977986a

  • Use InactivityTimeout to work around a bug in docker.

Log rotation causes 'docker logs --follow' to stop outputting logs
(moby/moby#23913: "BUG REPORT: docker logs follow gets stuck when the json-file log driver rotates files").

@forestjohnsonpeoplenet

forestjohnsonpeoplenet commented Sep 18, 2019

I'm looking into this, what I found is this loop inside pump.go:

for {

It looks to me like that loop is intended to spin relatively slowly: basically, it is supposed to re-create the long-running streaming HTTP request to the Docker API whose response body contains the log stream from the desired container.

However, in my environment, that loop spins many times per second, consuming half of the CPU cycles on the host! Therefore, as an experiment, I tried setting INACTIVITY_TIMEOUT=10s and changing

			sinceTime = time.Now()
			if err == docker.ErrInactivityTimeout {
				sinceTime = sinceTime.Add(-inactivityTimeout)
			}

to


			sinceTime = time.Now()
			if err == docker.ErrInactivityTimeout {
				sinceTime = sinceTime.Add(-inactivityTimeout)
			} else {
				time.Sleep(inactivityTimeout)
			}

This way at least this silly loop will not spin any faster than once every 10 seconds. Also, due to the way sinceTime is handled, it should not miss any logs either.
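
To illustrate the shape of that loop, here is a rough, self-contained sketch against the fsouza/go-dockerclient API that logspout builds on (an approximation for illustration only, not the actual pump.go; followLogs and the wiring around it are made up):

package pumpsketch

import (
	"io"
	"log"
	"time"

	docker "github.com/fsouza/go-dockerclient"
)

// followLogs keeps re-creating the streaming Logs request for one container.
// On an inactivity timeout it rewinds sinceTime so nothing is missed; on any
// other exit it sleeps, so the loop can never spin faster than the timeout.
func followLogs(client *docker.Client, id string, out, errw io.Writer, inactivityTimeout time.Duration) {
	sinceTime := time.Now()
	for {
		err := client.Logs(docker.LogsOptions{
			Container:         id,
			OutputStream:      out,
			ErrorStream:       errw,
			Stdout:            true,
			Stderr:            true,
			Follow:            true,
			Tail:              "all",
			Since:             sinceTime.Unix(),
			InactivityTimeout: inactivityTimeout,
		})
		if err != nil {
			log.Println("pump stopped with error:", err)
		}
		sinceTime = time.Now()
		if err == docker.ErrInactivityTimeout {
			// expected case: the timeout fired, so rewind to re-read anything
			// emitted during the timeout window
			sinceTime = sinceTime.Add(-inactivityTimeout)
		} else {
			// anything else (EOF, API error): throttle the retry instead of spinning
			time.Sleep(inactivityTimeout)
		}
	}
}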

I've also started looking at collecting metrics from the log pumps and then configuring them to be automatically re-booted when they stop reporting log lines being pumped.

I have been testing it using this docker-compose:

version: '2'
services:
  logspout-test:
    image: logspout:test
    mem_reservation: 50428672
    environment:
      BACKLOG: 'false'
      DEBUG: '1'
      LOGSPOUT: ignore
      ROUTE_URIS: raw://output-tester:8080?filter.labels=forest.test.logger:*
      PUMP_REQUIRE_LABEL: forest.test.logger:*
    volumes:
    - /var/run/docker.sock:/var/run/docker.sock
    logging:
      driver: json-file
      options:
        max-file: '2'
        max-size: 10m
    command:
    - ./app
  logger:
    image: ubuntu
    entrypoint: /bin/bash
    command: ["-c", "while true; do cat /dev/urandom | tr -dc \"a-zA-Z0-9\" | fold -w 128 | head -n 100; sleep 0.1; done"]
    logging:
      driver: json-file
      options:
        max-file: '2'
        max-size: 1m
    labels:
      forest.test.logger: true
  output-tester:
    image: udp-byte-counter:1
    logging:
      driver: json-file
      options:
        max-file: '2'
        max-size: 10m

Due to the aforementioned bug in docker, with this setup the stream of logs will cease when the log rotates:

9/10/2019 9:53:06 AM2019/09/10 14:53:06 bytes read: 0
9/10/2019 9:53:07 AM2019/09/10 14:53:07 bytes read: 0
9/10/2019 9:53:08 AM2019/09/10 14:53:08 bytes read: 0
9/10/2019 9:53:09 AM2019/09/10 14:53:09 bytes read: 0
9/10/2019 9:53:10 AM2019/09/10 14:53:10 bytes read: 0
9/10/2019 9:53:11 AM2019/09/10 14:53:11 bytes read: 0
9/10/2019 9:53:12 AM2019/09/10 14:53:12 bytes read: 0
9/10/2019 9:53:13 AM2019/09/10 14:53:13 bytes read: 75465
9/10/2019 9:53:14 AM2019/09/10 14:53:14 bytes read: 150285
9/10/2019 9:53:15 AM2019/09/10 14:53:15 bytes read: 217365
9/10/2019 9:53:16 AM2019/09/10 14:53:16 bytes read: 294765
9/10/2019 9:53:17 AM2019/09/10 14:53:17 bytes read: 359265
9/10/2019 9:53:18 AM2019/09/10 14:53:18 bytes read: 423249
9/10/2019 9:53:19 AM2019/09/10 14:53:19 bytes read: 423249
9/10/2019 9:53:20 AM2019/09/10 14:53:20 bytes read: 423249
9/10/2019 9:53:21 AM2019/09/10 14:53:21 bytes read: 423249
9/10/2019 9:53:22 AM2019/09/10 14:53:22 bytes read: 432284
9/10/2019 9:53:23 AM2019/09/10 14:53:23 bytes read: 436330
9/10/2019 9:53:24 AM2019/09/10 14:53:24 bytes read: 436330
9/10/2019 9:53:25 AM2019/09/10 14:53:25 bytes read: 436330
9/10/2019 9:53:26 AM2019/09/10 14:53:26 bytes read: 436330
9/10/2019 9:53:27 AM2019/09/10 14:53:27 bytes read: 436330
9/10/2019 9:53:28 AM2019/09/10 14:53:28 bytes read: 436330
9/10/2019 9:53:29 AM2019/09/10 14:53:29 bytes read: 436330
9/10/2019 9:53:30 AM2019/09/10 14:53:30 bytes read: 436330
9/10/2019 9:53:31 AM2019/09/10 14:53:31 bytes read: 436330
9/10/2019 9:53:32 AM2019/09/10 14:53:32 bytes read: 436330
9/10/2019 9:53:33 AM2019/09/10 14:53:33 bytes read: 436330
9/10/2019 9:53:34 AM2019/09/10 14:53:34 bytes read: 436330
9/10/2019 9:53:35 AM2019/09/10 14:53:35 bytes read: 436330
9/10/2019 9:53:36 AM2019/09/10 14:53:36 bytes read: 436330
9/10/2019 9:53:37 AM2019/09/10 14:53:37 bytes read: 436330
9/10/2019 9:53:38 AM2019/09/10 14:53:38 bytes read: 436330
9/10/2019 9:53:39 AM2019/09/10 14:53:39 bytes read: 436330

I am hoping to use this setup to test my metrics-based pump-restarter solution.

@forestjohnsonpeoplenet

forestjohnsonpeoplenet commented Sep 18, 2019

@gbolo

can someone provide full log output from logspout container when this happens?

It doesn't log anything. We see the log stream drop off in our log aggregation system (one container at a time, not all at once), and there is zilch in the logspout log around the time it drops off, even with DEBUG=1 logging on.

Now, this statement is true (for us), but it might be a bit misleading for everyone else. Let me explain: as I mentioned before, in our environment that loop inside pumpLogs() spins like crazy, so I created another log level called trace for the following lines:

debug("pump.pumpLogs():", id, "started, tail:", tail)

debug("pump.pumpLogs():", id, "stopped")

debug("pump.pumpLogs():", id, "stopped with error:", err)

Also, line 245 was changed; I changed

			if err != nil {
				debug("pump.pumpLogs():", id, "stopped with error:", err)
			} else {
				debug("pump.pumpLogs():", id, "stopped")
			}

TO

			if err != nil {
				if err.Error() == "inactivity time exceeded timeout" {
					traceLog("pump.pumpLogs():", normalName(container.Name), "stopped with error:", err)
				} else {
					debugLog("pump.pumpLogs():", normalName(container.Name), "stopped with error:", err)
				}
			} else {
				traceLog("pump.pumpLogs():", normalName(container.Name), "stopped")
			}

So, besides those things which are called traceLog in my code, it does not log anything at all when it dies.

I was able to get my metrics-based "kick the tv to get the antenna to work" solution to at least "go off" in my test environment.

9/18/2019 6:38:25 PMtagValuesCSV: b6bc6e69659f,r-forest-test-logspout-logspout-test-logger-1-8e33059e, numberOfSamplesInARowThatHadZeroLogs: 0
9/18/2019 6:38:35 PMtagValuesCSV: b6bc6e69659f,r-forest-test-logspout-logspout-test-logger-1-8e33059e, numberOfSamplesInARowThatHadZeroLogs: 0
9/18/2019 6:38:45 PMtagValuesCSV: b6bc6e69659f,r-forest-test-logspout-logspout-test-logger-1-8e33059e, numberOfSamplesInARowThatHadZeroLogs: 1
9/18/2019 6:38:45 PM2019/09/18 23:38:45 tagValuesCSV: b6bc6e69659f,r-forest-test-logspout-logspout-test-logger-1-8e33059e, numberOfSamplesInARowThatHadZeroLogs: 1, distanceFromAverageToZeroInStandardDeviations: 1.59, logStreamIsDead: true
9/18/2019 6:38:45 PM2019/09/18 23:38:45 consumeDeadPumpAlerts() pumpId: b6bc6e69659f
9/18/2019 6:38:45 PM2019/09/18 23:38:45 pump.newContainerPump(): r-forest-test-logspout-logspout-test-logger-1-8e33059e stderr: EOF
9/18/2019 6:38:45 PM2019/09/18 23:38:45 pump.newContainerPump(): r-forest-test-logspout-logspout-test-logger-1-8e33059e stdout: EOF
9/18/2019 6:38:45 PM2019/09/18 23:38:45 pump.pumpLogs(): now tailing logs for: r-forest-test-logspout-logspout-test-logger-1-8e33059e

If it helps us solve our problem for real, I will see what I can do to contribute it back. However, it's quite a hack at this point :(

@forestjohnsonpeoplenet

forestjohnsonpeoplenet commented Sep 23, 2019

I was able to solve this by re-creating the pumps when metrics detect that the log stream has halted unexpectedly.

This is a graph of time intervals and whether or not they contained at least 1 log message from each of our container instances. As you can see, prior to my changes the logs from certain containers would drop off over time.

When I get time (when I'm not at work), I can push a slightly-cleaned-up version of my code to a fork for you guys.

(Edit: over longer intervals of time my metrics-based alert and re-create system proved to be imperfect, usually on containers which log infrequently. It's possible for the alert to get confused about the difference between a container that logs infrequently and a container that died; when this happens it does not restart the pump and things can still die permanently.)
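
To make the idea concrete, here is a minimal illustrative sketch of such a watchdog (hypothetical names; not the actual code from the fork referenced later in this thread): count log lines per container per interval and re-create the pump for any container that stays at zero for too many consecutive intervals. As the edit above notes, a fixed threshold like this cannot reliably tell a quiet container from a dead one.

package watchdog

import (
	"sync"
	"time"
)

// Watchdog counts pumped log lines per container and flags streams that
// stay silent for too many consecutive intervals.
type Watchdog struct {
	mu        sync.Mutex
	counts    map[string]int // lines seen in the current interval, per container ID
	zeroRuns  map[string]int // consecutive empty intervals, per container ID
	threshold int            // empty intervals before the pump is declared dead
}

func New(threshold int) *Watchdog {
	return &Watchdog{counts: map[string]int{}, zeroRuns: map[string]int{}, threshold: threshold}
}

// Record is called once for every log line pumped for a container.
func (w *Watchdog) Record(id string) {
	w.mu.Lock()
	w.counts[id]++
	w.mu.Unlock()
}

// Run checks once per interval and calls restart(id) for any container whose
// stream looks dead; restart would tear down and re-create that pump.
func (w *Watchdog) Run(interval time.Duration, restart func(id string)) {
	for range time.Tick(interval) {
		w.mu.Lock()
		for id, n := range w.counts {
			if n == 0 {
				w.zeroRuns[id]++
			} else {
				w.zeroRuns[id] = 0
			}
			if w.zeroRuns[id] >= w.threshold {
				w.zeroRuns[id] = 0
				go restart(id) // caveat: cannot distinguish a quiet container from a dead one
			}
			w.counts[id] = 0
		}
		w.mu.Unlock()
	}
}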

@forestjohnsonpeoplenet

forestjohnsonpeoplenet commented Oct 2, 2019

Please recognize that this is not meant to be a perfect contribution ready for merge. I am just sharing what I did in the hope that someone else can learn from it, take it and run with it, or use it as a stepping stone to discover a better solution.

Note that I did not test this at all beyond running make, so it's possible I jacked something up when I was patching this in.

forestjohnsonpeoplenet@d927709

My code will also attempt to send InfluxDB-formatted metrics to a configurable endpoint (Telegraf or InfluxDB). Remove this yourself if you wish; otherwise you can leave it on in UDP mode and it will just scream into the void.

Note that I also changed the default value for the inactivity timeout to 10s, and I added this sleep to prevent accidental DoS attacks against the Docker API when it is being buggy: forestjohnsonpeoplenet@d927709#diff-c035a4a8cbe5cb4d97ac075d044b8b84R303

Also, note that my code will adjust sinceTime, pushing it far enough into the past to ensure that no log messages are skipped when the pump is restarted. As a side effect, some log messages may be duplicated when this happens. Keep that in mind if you choose to use this code. 🤷‍♂️

Alternatively, you may be able to fix this issue by updating to the latest version of docker. I say that because I was not able to reproduce the issue on my local machine when using the latest version of Docker CE.

@jackbenny

Has this been fixed yet (or has @forestjohnsonpeoplenet's solution been implemented)?

I'm having the same issue, and not even @richardARPANET's solution with RETRY_COUNT=30 seems to help for me. It usually occurs after 18-24 hours of silence; after that, no more logs are sent (tcp+syslog). The only workaround I've found so far is using udp+syslog instead. The problem has never occurred with udp+syslog as far as I know.
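
For reference, that workaround amounts to switching the route URI scheme from TCP to UDP syslog in the logspout command; the host and port below are placeholders:

  # before (stalls after long periods of silence)
  command: 'syslog+tcp://logs.example.com:514'
  # workaround
  command: 'syslog+udp://logs.example.com:514'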

@pongnguy

pongnguy commented Aug 6, 2021

Does logspout send logs only from currently running containers? I am not seeing logs from containers which execute and then terminate. Logs are sent if the same container is kept running (and is run without the -t option).
