oomparser: update to use kmsg based parser #1544

euank · 2016-11-20T07:01:51Z

My comment for PR kubernetes/node-problem-detector#41 (comment) applies here too.

This would fix kubernetes/kubernetes#34965, probably fix #1484, and work-around/fix #645.

Things that should be considered before merging:

- Anywhere the kubelet or cadvisor runs in a container now needs access to /dev/kmsg. This should be documented as a breaking change, and various examples and docs might need changing.
- This slightly reduces how far back we can see: after a system reboot, we don't get anything logged before it. This implies that a "kubelet crash -> container OOM -> node reboot" chain of events can lose that container OOM.
- This works everywhere the kernel is modern enough to run docker! Fewer dependencies!

Testing note: I removed the log-file formatted files. The parsing of log-files is now the responsibility of the kmsgparser package which lives elsewhere, so we can trust it to do the right thing. We also get to assume the timestamp etc.
I have not done manual testing of this change, but I have done decent manual testing of the kmsgparser.

timstclair · 2016-11-21T19:45:28Z

Nice! This LGTM, but I'd like to get input from a few more people who understand the OOM pathways better than I do.

timstclair · 2016-11-21T19:46:12Z

@vishh - Could you take a look at this?

timstclair · 2016-11-21T19:25:57Z

utils/oomparser/oomparser.go

-	}()
+// StreamOoms writes to a provided a stream of OomInstance objects representing
+// OOM events that are found in the logs.
+// It will blocks and should be called from a goroutine.


nit: s/blocks/block/

timstclair · 2016-11-21T19:28:11Z

utils/oomparser/oomparser.go

-		ioreader: bufio.NewReader(tail),
-	}, nil
+	// Should not happen
+	glog.Warningf("exiting analyzeLines. OOM events will not be reported.")


nit: If this really should not happen it should be error level

Before it was info and I thought it should be error, so I picked the middle as a compromise 😇

timstclair · 2016-11-21T19:37:59Z

utils/oomparser/oomparser_test.go

+					"[ 1532]  1020  1532   410347   398810     788        0             0 badsysprogram",
+					"Out of memory: Kill process 1532 (badsysprogram) score 919 or sacrifice child",
+					"Killed process 1532 (badsysprogram) total-vm:1641388kB, anon-rss:1595164kB, file-rss:76kB",
+				},


Add a test case with multiple OOM events (note: I think writeAll needs to run in a goroutine to prevent deadlock)
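
The deadlock concern can be illustrated with a small sketch. It assumes the test feeds messages over an unbuffered channel; the `writeAll` signature and the `collect` helper below are assumptions for illustration and may differ from the helpers in the PR's test file.

```go
package main

import "fmt"

// writeAll sends every message on the channel and then closes it.
// The signature is assumed for illustration.
func writeAll(msgs []string, out chan<- string) {
	for _, m := range msgs {
		out <- m // blocks until a receiver is ready
	}
	close(out)
}

// collect drains the channel and returns everything it received.
func collect(in <-chan string) []string {
	var got []string
	for m := range in {
		got = append(got, m)
	}
	return got
}

func main() {
	msgs := []string{"first OOM trace", "second OOM trace"}
	ch := make(chan string)

	// Calling writeAll(msgs, ch) directly here would block forever on the
	// first send, because nothing is receiving yet. Running it in a
	// goroutine lets the reader below make progress.
	go writeAll(msgs, ch)

	fmt.Println(len(collect(ch)), "messages received")
}
```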

timstclair · 2016-11-21T19:44:40Z

utils/oomparser/oomparser.go

-	parser, err = tryLogFile()
-	if err == nil {
-		return parser, nil
+	parser, err := kmsgparser.NewParser()


nit: I like the logger pattern, consider setting the logger to use glog for consistency?

euank · 2016-11-21T19:57:47Z

utils/oomparser/oomparser.go

+// OOM events that are found in the logs.
+// It will blocks and should be called from a goroutine.
+func (self *OomParser) StreamOoms(outStream chan<- *OomInstance) {
+	kmsgEntries := self.parser.Parse()


Question: Should we call self.parser.SeekEnd() here or not?

I think SeekEnd more closely matches the journalctl .. -f behavior of before (start at the end of kmsg), but I think that looking a little into the past makes the kubelet behave more correctly after a restart.

Will OOM events that happen while the kubelet is down, but are detected on kubelet startup actually do the right thing?
Are they idempotent so that power-cycling the kubelet having bunches of duplicate events is okay?

I think reporting recent OOM events is desirable, and it's also consistent with the non-journald behavior.
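
The trade-off can be sketched with a stand-in for the parser. Only the method names `Parse` and `SeekEnd` come from the discussion above; the `entrySource` interface, its signatures, and `fakeSource` are assumptions made so the example is self-contained. A real kernel log source would keep streaming, whereas the fake closes its channel so the example terminates.

```go
package main

import "fmt"

// entrySource is a hypothetical stand-in for the kernel log parser.
type entrySource interface {
	SeekEnd() error
	Parse() <-chan string
}

// fakeSource replays a fixed backlog, the way messages logged before
// startup would be replayed when reading from the beginning.
type fakeSource struct {
	backlog []string
	skip    bool
}

// SeekEnd marks the backlog as skipped.
func (f *fakeSource) SeekEnd() error {
	f.skip = true
	return nil
}

// Parse streams the backlog unless SeekEnd was called first.
func (f *fakeSource) Parse() <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		if f.skip {
			return
		}
		for _, m := range f.backlog {
			out <- m
		}
	}()
	return out
}

// readAll optionally seeks to the end, then returns everything read.
func readAll(src entrySource, seekEnd bool) ([]string, error) {
	if seekEnd {
		if err := src.SeekEnd(); err != nil {
			return nil, err
		}
	}
	var got []string
	for m := range src.Parse() {
		got = append(got, m)
	}
	return got, nil
}

func main() {
	backlog := []string{"OOM logged while the kubelet was down"}
	replayed, _ := readAll(&fakeSource{backlog: backlog}, false)
	skipped, _ := readAll(&fakeSource{backlog: backlog}, true)
	fmt.Println("without SeekEnd:", len(replayed), "with SeekEnd:", len(skipped))
}
```

Without the seek, events from before startup are reported again after every restart, so consumers would need to tolerate duplicates.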

euank · 2016-11-21T21:43:24Z

@timstclair I addressed your comments and fixed a bug where OOM events whose process name contains a - would break the parser (the end line for that OOM wouldn't match). Lucky me that my test program has a dash 😄

Happy to break that one-line fix out into its own PR if you'd like

euank · 2016-12-09T00:16:20Z

Rebased and squashed the commit addressing comments from back whenever.

goettl79 · 2017-01-23T08:26:01Z

This fix would also make the packaging issue of cadvisor / journalctl inside a docker container obsolete. #1313

euank · 2017-01-23T17:34:49Z

Bump on this, did a rote rebase and afaict there aren't open concerns (other than waiting on whether @vishh feels like reviewing)

euank · 2017-01-31T23:14:29Z

@k8s-bot test this

goettl79

Looks great to me. Most of the essential work is implemented in the dependency https://github.com/euank/go-kmsg-parser.

From my point of view this change should be integrated, although I'm not an official reviewer.

dchen1107 · 2017-03-11T00:49:38Z

cc/ @timstclair @dashpole @vishh

The oomparser logic would end up stuck, unable to detect the end of a given oom trace, for any process with a name that didn't match \w+. This includes processes like 'python3.4' due to the '.', or 'docker-containerd' due to the '-'. This fix was included in pr google#1544 last year, but since that PR seems dead it seems like a good idea to break this more important fix out. I've updated the tests such that they would have caught this issue.
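
The effect of restricting the process name to \w+ can be reproduced with Go's regexp package. The two patterns below are illustrative and are not copied from cadvisor; `processName` is a hypothetical helper, and the sample lines are adapted from the test data quoted earlier in this thread.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// wordOnly only accepts names made of letters, digits, and underscores.
	wordOnly = regexp.MustCompile(`Killed process ([0-9]+) \((\w+)\)`)
	// anyName accepts any characters between the parentheses.
	anyName = regexp.MustCompile(`Killed process ([0-9]+) \((.+)\)`)
)

// processName returns the captured process name, if the line matches.
func processName(re *regexp.Regexp, line string) (string, bool) {
	m := re.FindStringSubmatch(line)
	if m == nil {
		return "", false
	}
	return m[2], true
}

func main() {
	lines := []string{
		"Killed process 1532 (badsysprogram) total-vm:1641388kB, anon-rss:1595164kB, file-rss:76kB",
		"Killed process 1532 (python3.4) total-vm:1641388kB, anon-rss:1595164kB, file-rss:76kB",
		"Killed process 1532 (docker-containerd) total-vm:1641388kB, anon-rss:1595164kB, file-rss:76kB",
	}
	for _, l := range lines {
		_, strict := processName(wordOnly, l)
		name, _ := processName(anyName, l)
		fmt.Printf("%-18s matched by \\w+: %v\n", name, strict)
	}
}
```

With the stricter pattern, the end-of-trace line for the second and third samples never matches, which is how a parser waiting for that line ends up stuck.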

euank · 2017-08-30T21:06:12Z

Rebased for probably the 10th or so time.

Manually tested via the following:

$ sudo ./cadvisor -v 3 -logtostderr
...
# cgroup OOM
$ docker run -m 15M euank/gunpowder-memhog:latest 30M
# cadvisor logs:
I0830 20:24:01.875221    3264 manager.go:1142] Created an OOM event in container "/docker/6da85dc9675db12149e970c6ef6835240bab712e5499963d13e88a446248e82b" at 2017-08-30 20:24:01.583571597 +0000 UTC

# system OOM
$ docker run -d euank/gunpowder-memhog 500M
$ sleep 5
$ echo f | sudo tee /proc/sysrq-trigger
# cadvisor logs
I0830 20:59:27.831699    1611 manager.go:1142] Created an OOM event in container "/" at 2017-08-30 20:59:27.717686391 +0000 UTC

@timstclair if this is going to land, can we land it?
Having to come by and rebase it once a month is ridiculous.

vishh · 2017-08-30T22:20:35Z

@euank I can help review this. is this ready for a review?

euank · 2017-08-30T22:36:15Z

@vishh yup, it's been ready for review for 9 months now.

euank · 2017-08-30T22:48:49Z

🎉

euank force-pushed the kmsg-parser branch from 6d6b3a7 to 1f66337 Compare November 20, 2016 07:03

timstclair reviewed Nov 21, 2016

euank commented Nov 21, 2016

euank force-pushed the kmsg-parser branch 2 times, most recently from b79ade5 to 5b2e38a Compare November 23, 2016 00:45

euank force-pushed the kmsg-parser branch from 5b2e38a to c4fe0b7 Compare December 9, 2016 00:15

euank force-pushed the kmsg-parser branch from c4fe0b7 to 3451c86 Compare January 23, 2017 17:33

goettl79 reviewed Mar 7, 2017

dchen1107 mentioned this pull request Mar 14, 2017

cAdvisor leaking journalctl processes kubernetes/kubernetes#34965

Closed

dashpole requested a review from vishh March 15, 2017 05:10

euank force-pushed the kmsg-parser branch from 3451c86 to e60c091 Compare July 27, 2017 01:41

euank mentioned this pull request Jul 27, 2017

oomparser: don't get stuck for certain processes #1706

Merged

sjenning mentioned this pull request Jul 27, 2017

use sdjournal instead of journalctl for oom parsing #1707

Closed

euank force-pushed the kmsg-parser branch from e60c091 to 3c2f498 Compare August 1, 2017 20:49

euank force-pushed the kmsg-parser branch from 3c2f498 to 5287626 Compare August 1, 2017 23:43

euank force-pushed the kmsg-parser branch from 5287626 to 5e8b3a2 Compare August 30, 2017 21:03

euank added 2 commits August 30, 2017 15:35

godeps: vendor in kmsgparser

cfb16f1

oomparser: update to use kmsg based parser

95f9c8c

This provides much more robust support for kernel logs via accessing the `/dev/kmsg` interface to them directly.

euank force-pushed the kmsg-parser branch from 5e8b3a2 to 95f9c8c Compare August 30, 2017 22:35

vishh approved these changes Aug 30, 2017

vishh merged commit 03d7288 into google:master Aug 30, 2017

euank deleted the kmsg-parser branch August 30, 2017 22:45
