
Add support for pushing logs to loki #1576

Merged: 24 commits into master on Aug 12, 2020

Conversation

mstoykov (Contributor, author)

No description provided.

@mstoykov mstoykov added this to the v0.28.0 milestone Jul 29, 2020
@mstoykov mstoykov requested review from imiric and na-- July 29, 2020 10:15
codecov-commenter commented Jul 29, 2020

Codecov Report

Merging #1576 into master will decrease coverage by 1.08%.
The diff coverage is 26.24%.


@@            Coverage Diff             @@
##           master    #1576      +/-   ##
==========================================
- Coverage   77.14%   76.05%   -1.09%     
==========================================
  Files         162      163       +1     
  Lines       13255    13541     +286     
==========================================
+ Hits        10225    10299      +74     
- Misses       2509     2715     +206     
- Partials      521      527       +6     
Impacted Files               Coverage Δ
cmd/root.go                  26.19% <7.89%> (-6.60%) ⬇️
log/loki.go                  28.89% <28.89%> (ø)
lib/executor/vu_handle.go    93.69% <0.00%> (-1.81%) ⬇️
js/runner.go                 83.04% <0.00%> (-0.70%) ⬇️

Continue to review the full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8f4797d...abd5d93.

na-- (Member) left a comment:

I still haven't completely reviewed the loop() and push() logic, but I'm pushing what I currently have as comments.

cmd/root.go (outdated), comment on lines 173 to 175:
// RawFormater it does nothing with the message just prints it
type RawFormater struct{}
Member:

There is a typo in the variable name; it should have 2 t's: RawFormatter. No idea how the comment ended up right while the variable name is wrong, but please change the variable, not the comment.
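
A minimal sketch of the corrected type, assuming it implements the logrus.Formatter interface by passing the message through unchanged (the PR's actual implementation may differ):

package main

import "github.com/sirupsen/logrus"

// RawFormatter does nothing with the message, it just prints it.
type RawFormatter struct{}

// Format returns the entry's message as-is, with a trailing newline.
func (RawFormatter) Format(entry *logrus.Entry) ([]byte, error) {
	return append([]byte(entry.Message), '\n'), nil
}

func main() {
	l := logrus.New()
	l.SetFormatter(RawFormatter{})
	l.Info("hello loki")
}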

mstoykov (Contributor, author):

This is gofumpt 🤦

Member:

😕 Not sure what gofumpt has to do with this - the current variable name is wrong.

Comment on lines +73 to +72
// TODO use something better ... maybe
// https://godoc.org/github.com/kubernetes/helm/pkg/strvals
// atleast until https://github.com/loadimpact/k6/issues/926?
Member:

Ugh, yeah... strvals or something like that sounds better than rolling our own parsing logic everywhere... Even URL-encoded values might work: https://github.com/google/go-querystring

Member:

Yeah, looking at the confusing mess of ',', ';' and '=' signs that come from additionalParams, using a query string might not be a bad idea... https://github.com/gorilla/schema is another library that does it.

Member:

Though strvals is probably going to be the most readable of these...

loki=somewhere:1233,labels.something=else,labels.foo=bar,limit=32,level=debug
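
A rough sketch of how such a string could be parsed with helm's strvals package (the import path and the nested-map handling are assumptions for illustration, not code from this PR):

package main

import (
	"fmt"

	"k8s.io/helm/pkg/strvals"
)

func main() {
	conf := "loki=somewhere:1233,labels.something=else,labels.foo=bar,limit=32,level=debug"

	// strvals.Parse turns "a=b,c.d=e" style strings into a nested map, roughly
	// map[labels:map[foo:bar something:else] level:debug limit:32 loki:somewhere:1233],
	// which could then be unmarshaled into a config struct (e.g. via mapstructure).
	parsed, err := strvals.Parse(conf)
	if err != nil {
		panic(err)
	}
	fmt.Println(parsed)
}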

Member:

hmm btw it's probably going to be easy to write something like strvals, but which instead converts its format to JSON, which you then can easily unmarshal into a struct without bothering with reflection... but that can go in #883, for now strvals+mapstructure is probably the way to go... and I'll stop this conversation with myself now, so I can push the review 😅

Contributor:

I don't mind the current parsing, but agree that having a consistent format for options like this and reusing the parsing code would be good. But it can also wait for #883, so not sure if this is a blocker.

log/loki.go (outdated):
key := paramParts[0]
value := paramParts[1]
switch key {
case "additionalParams":
Member:

additionalParams is confusing; looking at the code, this is just for adding extra key=value labels to each log message, right? So wouldn't something like extraLabels or simply labels be better?

log/loki.go (outdated):
}

// fill one of two equally sized slices with entries and then push it while filling the other one
// TODO clean old entries after push?
Member:

Is this still relevant?

mstoykov (Contributor, author):

I don't think so ... I am pretty sure this was meant for when I was keeping logrus.Entry around for longer

pushCh = make(chan chan struct{})
)

defer close(pushCh)
Member:

Is this ever going to happen? I don't see anything closing the h.ch channel? 😕

mstoykov (Contributor, author):

The idea was that:

  1. if we ever get out, it should be closed ;)
  2. if we have a context (sometime in the future, maybe?) we can push one last time when it's closed (a rough sketch of this follows below)
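
A rough, self-contained sketch of point 2 (illustrative names, not code from this PR): closing the entry channel acts as the stop signal, and the collector drains whatever is left and pushes one final time.

package main

import "fmt"

func main() {
	entries := make(chan string, 16)
	done := make(chan struct{})

	// collector: buffers entries until the channel is closed, then does a final push
	go func() {
		defer close(done)
		var batch []string
		for e := range entries {
			batch = append(batch, e)
		}
		fmt.Printf("final push of %d entries\n", len(batch)) // stand-in for the real push to loki
	}()

	entries <- "log line 1"
	entries <- "log line 2"
	close(entries) // stop signal: no more log entries will be produced
	<-done
}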

log/loki.go (outdated):
}

func (h *lokiHook) start() {
h.ch = make(chan *logrus.Entry, 1000)
Member:

If there's a better (i.e. non-manually-parsed) configuration, this should probably be configurable as a bufferSize parameter or something like that.
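
A rough sketch of that idea (the struct here is a stripped-down stand-in, and the field names are assumptions, not the PR's actual ones):

package main

import "github.com/sirupsen/logrus"

// lokiHook is a minimal stand-in for the PR's type, just to show a configurable buffer size.
type lokiHook struct {
	ch         chan *logrus.Entry
	bufferSize int
}

func (h *lokiHook) start() {
	if h.bufferSize <= 0 {
		h.bufferSize = 1000 // keep the current hard-coded value as the default
	}
	h.ch = make(chan *logrus.Entry, h.bufferSize)
	// the PR also starts the collecting goroutine here (omitted in this sketch)
}

func main() {
	h := &lokiHook{bufferSize: 5000} // e.g. parsed from a bufferSize=5000 option
	h.start()
	_ = h
}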

mstoykov (Contributor, author):

We can have it even now :D

@mstoykov changed the title from "Add POC support for pushign metrics to loki" to "Add POC support for pushing logs to loki" on Jul 30, 2020

@mstoykov changed the title from "Add POC support for pushing logs to loki" to "Add support for pushing logs to loki" on Jul 30, 2020

log/loki.go:

for {
	select {
	case entry := <-h.ch:
		if count == h.limit {
Member:

I think the check here should be if h.limit > 0 && count == h.limit, to more explicitly allow for unlimited messages when limit is 0, though I guess you can do it even now with -1...

mstoykov (Contributor, author):

The way the code is currently written, I am pretty sure h.limit < 1 will panic ;) Maybe that should be an error during the parsing of the config.

mstoykov (Contributor, author):

Tested:
1 seems to work
0 means nothing is sent (including the dropped-messages notice)
-1 (and below, I would imagine) panics

na-- (Member), Jul 31, 2020:

> the way the code is currently written I am pretty sure h.limit<1 will panic

That should probably be considered a bug, so 👍 for some validation. And 👍 for supporting 0 (i.e. no limit), since it seems reasonable to me that if I'm running my own loki instance, I'd like to get all of my log messages.

As I said before, I understand why the current limit per second exists, but I don't like it one bit. I can't think of any other piece of software that omits log messages from the middle... And while I can live with it as a form of abuse prevention / DoS protection in the cloud, for people self-hosting k6 and loki, it doesn't make any sense to have a limit that can eat your logs. Especially because, when loki is enabled, we set the regular output of the log to io.Discard (which I'm not sure is the correct action anymore...).

mstoykov (Contributor, author):

I will leave "unlimited" for another PR - the current implementation is fast(er) because it relies on having an upper limit on the number of messages. Arguably anyone can set the limit to 1000000 and ... not hit it; even if they do, everything else in the current implementation will mean that that push to loki takes forever.

na-- (Member), Jul 31, 2020:

Hmm, if you've set the limit to 1000000, you'd needlessly allocate a couple of slices with 1000000 elements, right? This seems like a premature optimization that might not always work, now used as a justification for always requiring a limit 🤷‍♂️ But with 1000000 or even 10000, you'd likely never reach that many messages per second, so it'd probably have the opposite effect...

I like that the msgs buffers are reused, but wouldn't it be better to not have a limit and just append() stuff to the buffer, allowing it to grow naturally to its optimal size? After the first few extensions, it's likely to reach some equilibrium size if there's no limit (or the limit itself, if there's a low limit).

Btw, not that it matters all that much, but with the current strategy of dropping log messages from the middle of the stream, and with messages arriving somewhat out of order, it will lead to the following strange situation. Say the limit is 100 messages per second, and the user is sending ~150. Normally, they'd see the first 100 messages every second, followed by a "k6 dropped some packages because they were above the limit of 100/1s" message for the remaining ~50. But, because the sorting happens after the dropping, some of these dropped messages could have arrived out of order and technically have timestamps from before the "k6 dropped some..." message 😞

mstoykov (Contributor, author):

Well yes ... but now imagine that the user thought they would get 10000 messages and they got 10000 times more ... they just ran out of memory ... the process died, and they not only didn't get 10000 messages, they got 0 messages and a big stacktrace.

Organically growing and unlimited buffers are things that just lead to the software running out of memory and dying from it.

In a normal (perfectly running) and well-configured k6, none of this matters - we might have allocated 10x/100x/1000x more space for eventual logs than we end up needing. But at least in the (IMO more likely) scenario where something isn't perfect and generates 1000x more logs, we don't suddenly have an ever-expanding list of logs ... which just kills k6 for no good reason and with no explanation.

Member:

I didn't request that you remove the limit altogether, or grow the msgs buffer infinitely if there's a limit; I even said "I understand why the current limit per second exists", so don't become snarky please 😛

To be even more clear, I'm not suggesting that you remove this code completely: https://github.com/loadimpact/k6/blob/5f39dc2ef5c070b74c6b7d0fd7959ee323af071d/log/loki.go#L213-L216

Just replace it with if h.limit > 0 && count == h.limit, ditch the fixed buffer sizes, and use append(msgs, tmpMsg{ ... }) instead of msgs[count] = tmpMsg{...}.

This way, if you have a limit (which would still be the case by default), the msgs buffer will never be able to grow above it, right? Or whatever power of 2 is above it, I guess, given how Go grows slices. But, if someone deliberately chooses to remove the limit completely, yes, they can potentially shoot themselves in the foot. Though, to be fair, there are probably a ton of other easier ways to run out of memory with k6 😉
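
A self-contained sketch of that suggestion (illustrative names and a simplified loop, not the PR's actual code): a limit of 0 means "no limit", and the batch grows with append instead of indexing into a fixed-size slice.

package main

import "fmt"

type tmpMsg struct {
	msg string
	t   int64
}

// collect drains the channel into a batch; a positive limit caps the batch size
// and counts everything above it as dropped, while limit == 0 keeps everything.
func collect(ch <-chan tmpMsg, limit int) (msgs []tmpMsg, dropped int) {
	for m := range ch {
		if limit > 0 && len(msgs) == limit {
			dropped++
			continue
		}
		msgs = append(msgs, m)
	}
	return msgs, dropped
}

func main() {
	ch := make(chan tmpMsg, 5)
	for i := 0; i < 5; i++ {
		ch <- tmpMsg{msg: fmt.Sprintf("line %d", i)}
	}
	close(ch)

	msgs, dropped := collect(ch, 3)
	fmt.Println(len(msgs), dropped) // prints: 3 2
}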

Contributor:

Just read through this discussion + the one on Slack, and you both make valid points, so not sure who I agree more with, but here are my €0.02. 😅

The current limit is essentially a throttling feature, right? It tries to accomplish two things: don't overload Loki, and don't run out of memory in k6.

I agree with Ned's concerns about the current implementation dropping logs if the limit is reached, which you definitely never want (if it can be avoided), especially if it's statically defined for the entire process runtime. Consider that Loki's performance might vary over time, so this package artificially limiting the throughput even if Loki itself could handle more messages, or vice versa, sending more messages than Loki can handle because the limit was set too high and Loki is temporarily struggling with other clients... is not good.

If we do want to introduce something like this, then I would argue it should be more sophisticated:

  1. React either to current RAM usage (this probably has to wait for #888, "New CPU and memory usage metrics") and/or to some type of Loki health check or heuristic we can use to determine it's becoming "overloaded" (e.g. response times), and factor that into calculating what the next payload size should be and when to push the next request. Sort of like a smart backoff.
  2. Don't just drop logs, but save them in a third "missed" buffer and gradually feed them into subsequent requests (a rough sketch of this follows below). I think this should be fixed even if we don't implement the more advanced throttling.
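
A rough, self-contained sketch of the "missed" buffer idea from point 2 (names and structure are illustrative, not code from this PR): over-limit entries are parked instead of dropped, and later batches are topped up from the parked ones.

package main

import "fmt"

type batcher struct {
	limit  int
	missed []string
}

// nextBatch takes up to limit fresh entries and tops the batch up with previously
// missed ones; anything still over the limit is parked for later, not dropped.
func (b *batcher) nextBatch(fresh []string) []string {
	batch := fresh
	if len(batch) > b.limit {
		b.missed = append(b.missed, batch[b.limit:]...)
		batch = batch[:b.limit]
	}
	for len(batch) < b.limit && len(b.missed) > 0 {
		batch = append(batch, b.missed[0])
		b.missed = b.missed[1:]
	}
	return batch
}

func main() {
	b := &batcher{limit: 3}
	fmt.Println(b.nextBatch([]string{"a", "b", "c", "d", "e"})) // [a b c], "d" and "e" are parked
	fmt.Println(b.nextBatch([]string{"f"}))                     // [f d e], the parked entries catch up
}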

I'm torn on being able to specify "unlimited" as an option. On the one hand, we need some type of hard limit in the Cloud to prevent one test from flooding the Loki instance (BTW, how does this work in the backend exactly? Is it one central instance for all tests?), but we don't care about it during development. I wouldn't mind having to set e.g. limit=1000000 to make that work, though.

Member:

Even after a week of vacation, I still think the most sensible choice is for the default limit to be 0 (i.e. "unlimited") and that the current fixed-length buffer approach is a case of way premature optimization 😉 It even makes more sense when you reverse the question, instead of asking "why not 0", to ask "why 100 and not 57 or 114?"... or "why not 50 or 64", if we're fans of round numbers 😉

That said, it's a minor issue in a feature most people probably won't use, so I don't really care if you're so determined to keep the current defaults. This is not a blocking issue from my side, in contrast to https://github.com/loadimpact/k6/pull/1576/files#r463614996

oldCount, oldDropped := count, dropped
count, dropped = 0, 0
cutOff := <-ch
close(ch) // signal that more buffering can continue
Contributor:

I didn't mention it before, but I'm loving this double buffering approach, nice work! 👍
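
For readers following along, a rough, self-contained sketch of the double-buffering idea (illustrative only, not the PR's actual loop()/push() code, and with the goroutine coordination omitted): the collector fills one buffer while the other, just handed over, gets pushed, and the two are swapped and reused on every flush.

package main

import "fmt"

func main() {
	var bufs [2][]string
	active := 0

	push := func(batch []string) {
		fmt.Printf("pushed %d entries\n", len(batch)) // stand-in for the real HTTP push to loki
	}

	collect := func(entry string) {
		bufs[active] = append(bufs[active], entry)
	}

	flush := func() {
		full := bufs[active]
		active ^= 1                     // switch collection to the other buffer
		bufs[active] = bufs[active][:0] // reuse the previously pushed buffer's capacity
		push(full)
	}

	for i := 0; i < 3; i++ {
		collect(fmt.Sprintf("log line %d", i))
	}
	flush() // pushed 3 entries
	for i := 3; i < 8; i++ {
		collect(fmt.Sprintf("log line %d", i))
	}
	flush() // pushed 5 entries
}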

na-- (Member) left a comment:

Didn't find anything else objectionable, but the printing to stdout is an issue I think should be fixed


mstoykov and others added 3 commits August 11, 2020 15:52
Co-authored-by: na-- <n@andreev.sh>
@mstoykov mstoykov requested review from na-- and imiric August 11, 2020 13:16
na-- previously approved these changes Aug 11, 2020
@mstoykov mstoykov merged commit 5ee709d into master Aug 12, 2020
@mstoykov mstoykov deleted the lokiSupport branch August 12, 2020 15:09