
Add support for pushing logs to loki #1576

Merged: 24 commits into master on Aug 12, 2020

Conversation

mstoykov (Contributor, author)

No description provided.

@mstoykov mstoykov added this to the v0.28.0 milestone Jul 29, 2020
@mstoykov mstoykov requested review from imiric and na-- July 29, 2020 10:15
codecov-commenter commented Jul 29, 2020

Codecov Report

Merging #1576 into master will decrease coverage by 1.08%.
The diff coverage is 26.24%.


@@            Coverage Diff             @@
##           master    #1576      +/-   ##
==========================================
- Coverage   77.14%   76.05%   -1.09%     
==========================================
  Files         162      163       +1     
  Lines       13255    13541     +286     
==========================================
+ Hits        10225    10299      +74     
- Misses       2509     2715     +206     
- Partials      521      527       +6     
Impacted Files               Coverage Δ
cmd/root.go                  26.19% <7.89%> (-6.60%) ⬇️
log/loki.go                  28.89% <28.89%> (ø)
lib/executor/vu_handle.go    93.69% <0.00%> (-1.81%) ⬇️
js/runner.go                 83.04% <0.00%> (-0.70%) ⬇️

Continue to review the full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8f4797d...abd5d93.

na-- (Member) left a comment:

I still haven't completely reviewed the loop() and push() logic, but I'm pushing what I currently have as comments.

cmd/root.go (outdated), comment on lines 173 to 175:
// RawFormater it does nothing with the message just prints it
type RawFormater struct{}
Member:

There is a typo in the variable name; it should have 2 t's: RawFormatter. No idea how the comment ended up right while the variable name is wrong, but please change the variable, not the comment.
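
A minimal sketch of the corrected type, assuming it implements the logrus.Formatter interface by passing the message through unchanged (the PR's actual implementation may differ):

package main

import "github.com/sirupsen/logrus"

// RawFormatter does nothing with the message, it just prints it.
type RawFormatter struct{}

// Format returns the entry's message as-is, with a trailing newline.
func (RawFormatter) Format(entry *logrus.Entry) ([]byte, error) {
	return append([]byte(entry.Message), '\n'), nil
}

func main() {
	l := logrus.New()
	l.SetFormatter(RawFormatter{})
	l.Info("hello loki")
}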

mstoykov (Contributor, author):

This is gofumpt 🤦

Member:

😕 Not sure what gofumpt has to do with this - the current variable name is wrong.

Comment on lines +73 to +72
// TODO use something better ... maybe
// https://godoc.org/github.com/kubernetes/helm/pkg/strvals
// atleast until https://github.com/loadimpact/k6/issues/926?
Member:

Ugh, yeah... strvals or something like that sounds better than rolling our own parsing logic everywhere... Even URL-encoded values might work: https://github.com/google/go-querystring

Member:

Yeah, looking at the confusing mess of ',', ';' and '=' signs that come from additionalParams, using a query string might not be a bad idea... https://github.com/gorilla/schema is another library that does it.

Member:

Though strvals is probably going to be the most readable of these...

loki=somewhere:1233,labels.something=else,labels.foo=bar,limit=32,level=debug
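
A rough sketch of how such a string could be parsed with helm's strvals package (the import path and the nested-map handling are assumptions for illustration, not code from this PR):

package main

import (
	"fmt"

	"k8s.io/helm/pkg/strvals"
)

func main() {
	conf := "loki=somewhere:1233,labels.something=else,labels.foo=bar,limit=32,level=debug"

	// strvals.Parse turns "a=b,c.d=e" style strings into a nested map, roughly
	// map[labels:map[foo:bar something:else] level:debug limit:32 loki:somewhere:1233],
	// which could then be unmarshaled into a config struct (e.g. via mapstructure).
	parsed, err := strvals.Parse(conf)
	if err != nil {
		panic(err)
	}
	fmt.Println(parsed)
}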

Member:

hmm btw it's probably going to be easy to write something like strvals, but which instead converts its format to JSON, which you then can easily unmarshal into a struct without bothering with reflection... but that can go in #883, for now strvals+mapstructure is probably the way to go... and I'll stop this conversation with myself now, so I can push the review 😅

Contributor:

I don't mind the current parsing, but agree that having a consistent format for options like this and reusing the parsing code would be good. But it can also wait for #883, so not sure if this is a blocker.

log/loki.go (outdated):
key := paramParts[0]
value := paramParts[1]
switch key {
case "additionalParams":
Member:

additionalParams is confusing; looking at the code, this is just for adding extra key=value labels to each log message, right? So wouldn't something like extraLabels or simply labels be better?

log/loki.go (outdated):
}

// fill one of two equally sized slices with entries and then push it while filling the other one
// TODO clean old entries after push?
Member:

Is this still relevant?

mstoykov (Contributor, author):

I don't think so ... I am pretty sure this was meant for when I was keeping logrus.Entry around for longer

pushCh = make(chan chan struct{})
)

defer close(pushCh)
Member:

Is this ever going to happen? I don't see anything closing the h.ch channel? 😕

mstoykov (Contributor, author):

The idea was that:

  1. if we ever get out, it should be closed ;)
  2. if we have a context (sometime in the future, maybe?) we can push one last time when it's closed (a rough sketch of this follows below)
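
A rough, self-contained sketch of point 2 (illustrative names, not code from this PR): closing the entry channel acts as the stop signal, and the collector drains whatever is left and pushes one final time.

package main

import "fmt"

func main() {
	entries := make(chan string, 16)
	done := make(chan struct{})

	// collector: buffers entries until the channel is closed, then does a final push
	go func() {
		defer close(done)
		var batch []string
		for e := range entries {
			batch = append(batch, e)
		}
		fmt.Printf("final push of %d entries\n", len(batch)) // stand-in for the real push to loki
	}()

	entries <- "log line 1"
	entries <- "log line 2"
	close(entries) // stop signal: no more log entries will be produced
	<-done
}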

log/loki.go (outdated):
}

func (h *lokiHook) start() {
h.ch = make(chan *logrus.Entry, 1000)
Member:

If there's a better (i.e. non-manually-parsed) configuration, this should probably be configurable as a bufferSize parameter or something like that.
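
A rough sketch of that idea (the struct here is a stripped-down stand-in, and the field names are assumptions, not the PR's actual ones):

package main

import "github.com/sirupsen/logrus"

// lokiHook is a minimal stand-in for the PR's type, just to show a configurable buffer size.
type lokiHook struct {
	ch         chan *logrus.Entry
	bufferSize int
}

func (h *lokiHook) start() {
	if h.bufferSize <= 0 {
		h.bufferSize = 1000 // keep the current hard-coded value as the default
	}
	h.ch = make(chan *logrus.Entry, h.bufferSize)
	// the PR also starts the collecting goroutine here (omitted in this sketch)
}

func main() {
	h := &lokiHook{bufferSize: 5000} // e.g. parsed from a bufferSize=5000 option
	h.start()
	_ = h
}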

mstoykov (Contributor, author):

We can have it even now :D

@mstoykov changed the title from "Add POC support for pushign metrics to loki" to "Add POC support for pushing logs to loki" on Jul 30, 2020

@mstoykov changed the title from "Add POC support for pushing logs to loki" to "Add support for pushing logs to loki" on Jul 30, 2020

log/loki.go:

for {
	select {
	case entry := <-h.ch:
		if count == h.limit {
Member:

I think the check here should be if h.limit > 0 && count == h.limit, to more explicitly allow for unlimited messages when limit is 0, though I guess you can do it even now with -1...

mstoykov (Contributor, author):

The way the code is currently written, I am pretty sure h.limit < 1 will panic ;) Maybe that should be an error during the parsing of the config.

mstoykov (Contributor, author):

Tested:
1 seems to work
0 means nothing is sent (including the dropped-messages notice)
-1 (and below, I would imagine) panics

na-- (Member), Jul 31, 2020:

> the way the code is currently written I am pretty sure h.limit<1 will panic

That should probably be considered a bug, so 👍 for some validation. And 👍 for supporting 0 (i.e. no limit), since it seems reasonable to me that if I'm running my own loki instance, I'd like to get all of my log messages.

As I said before, I understand why the current limit per second exists, but I don't like it one bit. I can't think of any other piece of software that omits log messages from the middle... And while I can live with it as a form of abuse prevention / DoS protection in the cloud, for people self-hosting k6 and loki, it doesn't make any sense to have a limit that can eat your logs. Especially because, when loki is enabled, we set the regular output of the log to io.Discard (which I'm not sure is the correct action anymore...).

mstoykov (Contributor, author):

I will leave "unlimited" for another PR - the current implementation is fast(er) because it relies on having an upper limit on the number of messages. Arguably anyone can set the limit to 1000000 and ... not hit it; even if they do, everything else in the current implementation will mean that that push to loki takes forever.

na-- (Member), Jul 31, 2020:

Hmm, if you've set the limit to 1000000, you'd needlessly allocate a couple of slices with 1000000 elements, right? This seems like a premature optimization that might not always work, now used as a justification for always requiring a limit 🤷‍♂️ But with 1000000 or even 10000, you'd likely never reach that many messages per second, so it'd probably have the opposite effect...

I like that the msgs buffers are reused, but wouldn't it be better to not have a limit and just append() stuff to the buffer, allowing it to grow naturally to its optimal size? After the first few extensions, it's likely to reach some equilibrium size if there's no limit (or the limit itself, if there's a low limit).

Btw, not that it matters all that much, but with the current strategy of dropping log messages from the middle of the stream, and with messages arriving somewhat out of order, it will lead to the following strange situation. Say the limit is 100 messages per second, and the user is sending ~150. Normally, they'd see the first 100 messages every second, followed by a "k6 dropped some packages because they were above the limit of 100/1s" message for the remaining ~50. But, because the sorting happens after the dropping, some of these dropped messages could have arrived out of order and technically have timestamps from before the "k6 dropped some..." message 😞

mstoykov (Contributor, author):

Well yes ... but now imagine that the user thought they would get 10000 messages and they got 10000 times more ... they just ran out of memory ... the process died, and they not only didn't get 10000 messages, they got 0 messages and a big stacktrace.

Organically growing and unlimited buffers are things that just lead to the software running out of memory and dying from it.

In a normal (perfectly running) and well-configured k6, none of this matters - we might have allocated 10x/100x/1000x more space for eventual logs than we end up needing. But at least in the (IMO more likely) scenario where something isn't perfect and generates 1000x more logs, we don't suddenly have an ever-expanding list of logs ... which just kills k6 for no good reason and with no explanation.

Member:

I didn't request that you remove the limit altogether, or grow the msgs buffer infinitely if there's a limit; I even said "I understand why the current limit per second exists", so don't become snarky please 😛

To be even more clear, I'm not suggesting that you remove this code completely: https://github.com/loadimpact/k6/blob/5f39dc2ef5c070b74c6b7d0fd7959ee323af071d/log/loki.go#L213-L216

Just replace it with if h.limit > 0 && count == h.limit, ditch the fixed buffer sizes, and use append(msgs, tmpMsg{ ... }) instead of msgs[count] = tmpMsg{...}.

This way, if you have a limit (which would still be the case by default), the msgs buffer will never be able to grow above it, right? Or whatever power of 2 is above it, I guess, given how Go grows slices. But, if someone deliberately chooses to remove the limit completely, yes, they can potentially shoot themselves in the foot. Though, to be fair, there are probably a ton of other easier ways to run out of memory with k6 😉
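
A self-contained sketch of that suggestion (illustrative names and a simplified loop, not the PR's actual code): a limit of 0 means "no limit", and the batch grows with append instead of indexing into a fixed-size slice.

package main

import "fmt"

type tmpMsg struct {
	msg string
	t   int64
}

// collect drains the channel into a batch; a positive limit caps the batch size
// and counts everything above it as dropped, while limit == 0 keeps everything.
func collect(ch <-chan tmpMsg, limit int) (msgs []tmpMsg, dropped int) {
	for m := range ch {
		if limit > 0 && len(msgs) == limit {
			dropped++
			continue
		}
		msgs = append(msgs, m)
	}
	return msgs, dropped
}

func main() {
	ch := make(chan tmpMsg, 5)
	for i := 0; i < 5; i++ {
		ch <- tmpMsg{msg: fmt.Sprintf("line %d", i)}
	}
	close(ch)

	msgs, dropped := collect(ch, 3)
	fmt.Println(len(msgs), dropped) // prints: 3 2
}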

Contributor:

Just read through this discussion + the one on Slack, and you both make valid points, so not sure who I agree more with, but here are my €0.02. 😅

The current limit is essentially a throttling feature, right? It tries to accomplish two things: don't overload Loki, and don't run out of memory in k6.

I agree with Ned's concerns about the current implementation dropping logs if the limit is reached, which you definitely never want (if it can be avoided), especially if it's statically defined for the entire process runtime. Consider that Loki's performance might vary over time, so this package artificially limiting the throughput even if Loki itself could handle more messages, or vice versa, sending more messages than Loki can handle because the limit was set too high and Loki is temporarily struggling with other clients... is not good.

If we do want to introduce something like this, then I would argue it should be more sophisticated:

  1. React either to current RAM usage (this probably has to wait for #888, "New CPU and memory usage metrics") and/or to some type of Loki health check or heuristic we can use to determine it's becoming "overloaded" (e.g. response times), and factor that into calculating what the next payload size should be and when to push the next request. Sort of like a smart backoff.
  2. Don't just drop logs, but save them in a third "missed" buffer and gradually feed them into subsequent requests (a rough sketch of this follows below). I think this should be fixed even if we don't implement the more advanced throttling.
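
A rough, self-contained sketch of the "missed" buffer idea from point 2 (names and structure are illustrative, not code from this PR): over-limit entries are parked instead of dropped, and later batches are topped up from the parked ones.

package main

import "fmt"

type batcher struct {
	limit  int
	missed []string
}

// nextBatch takes up to limit fresh entries and tops the batch up with previously
// missed ones; anything still over the limit is parked for later, not dropped.
func (b *batcher) nextBatch(fresh []string) []string {
	batch := fresh
	if len(batch) > b.limit {
		b.missed = append(b.missed, batch[b.limit:]...)
		batch = batch[:b.limit]
	}
	for len(batch) < b.limit && len(b.missed) > 0 {
		batch = append(batch, b.missed[0])
		b.missed = b.missed[1:]
	}
	return batch
}

func main() {
	b := &batcher{limit: 3}
	fmt.Println(b.nextBatch([]string{"a", "b", "c", "d", "e"})) // [a b c], "d" and "e" are parked
	fmt.Println(b.nextBatch([]string{"f"}))                     // [f d e], the parked entries catch up
}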

I'm torn on being able to specify "unlimited" as an option. On the one hand, we need some type of hard limit in the Cloud to prevent one test from flooding the Loki instance (BTW, how does this work in the backend exactly? Is it one central instance for all tests?), but we don't care about it during development. I wouldn't mind having to set e.g. limit=1000000 to make that work, though.

Member:

Even after a week of vacation, I still think the most sensible choice is for the default limit to be 0 (i.e. "unlimited") and that the current fixed-length buffer approach is a case of way premature optimization 😉 It even makes more sense when you reverse the question, instead of asking "why not 0", to ask "why 100 and not 57 or 114?"... or "why not 50 or 64", if we're fans of round numbers 😉

That said, it's a minor issue in a feature most people probably won't use, so I don't really care if you're so determined to keep the current defaults. This is not a blocking issue from my side, in contrast to https://github.com/loadimpact/k6/pull/1576/files#r463614996

oldCount, oldDropped := count, dropped
count, dropped = 0, 0
cutOff := <-ch
close(ch) // signal that more buffering can continue
Contributor:

I didn't mention it before, but I'm loving this double buffering approach, nice work! 👍
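
For readers following along, a rough, self-contained sketch of the double-buffering idea (illustrative only, not the PR's actual loop()/push() code, and with the goroutine coordination omitted): the collector fills one buffer while the other, just handed over, gets pushed, and the two are swapped and reused on every flush.

package main

import "fmt"

func main() {
	var bufs [2][]string
	active := 0

	push := func(batch []string) {
		fmt.Printf("pushed %d entries\n", len(batch)) // stand-in for the real HTTP push to loki
	}

	collect := func(entry string) {
		bufs[active] = append(bufs[active], entry)
	}

	flush := func() {
		full := bufs[active]
		active ^= 1                     // switch collection to the other buffer
		bufs[active] = bufs[active][:0] // reuse the previously pushed buffer's capacity
		push(full)
	}

	for i := 0; i < 3; i++ {
		collect(fmt.Sprintf("log line %d", i))
	}
	flush() // pushed 3 entries
	for i := 3; i < 8; i++ {
		collect(fmt.Sprintf("log line %d", i))
	}
	flush() // pushed 5 entries
}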

na-- (Member) left a comment:

Didn't find anything else objectionable, but the printing to stdout is an issue I think should be fixed


mstoykov and others added 3 commits August 11, 2020 15:52
Co-authored-by: na-- <n@andreev.sh>
@mstoykov mstoykov requested review from na-- and imiric August 11, 2020 13:16
na-- previously approved these changes Aug 11, 2020
@mstoykov mstoykov merged commit 5ee709d into master Aug 12, 2020
@mstoykov mstoykov deleted the lokiSupport branch August 12, 2020 15:09