-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement support to collect Usage dynamically #3917
Conversation
usage/usage.go
Outdated
// Package usage implements usage tracking for k6 in order to figure what is being used within a given execution | ||
package usage | ||
|
||
import ( | ||
"strings" | ||
"sync" | ||
) | ||
|
||
// Usage is a way to collect usage data for within k6 | ||
type Usage struct { | ||
l *sync.Mutex | ||
m map[string]any | ||
} | ||
|
||
// New returns a new empty Usage ready to be used | ||
func New() *Usage { | ||
return &Usage{ | ||
l: new(sync.Mutex), | ||
m: make(map[string]any), | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I started with the idea of either reusing expvar or the new golang telemetry , but both of those are global and I didn't really want to add global state to k6.
btw it seems at least loki does use expvar , but I am not certain how much this can apply to us as well
9f371dc
to
63175cd
Compare
usage/usage.go
Outdated
} | ||
|
||
// String appends the provided value to a slice of strings that is the value. | ||
// If called only a single time, the value will be just a string not a slice |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we drop this part - a lot of the code becomes a lot nicer as we only have 2 types.
Also in theory we can go the more typesafe way and actually have separate counters and strings which might help as well ... it will just mean they need to be merged at a later point.
I specifically stopped trying to polish this part as I did a couple of tries and all of them had problems and decided on going with reproducing the current report as well as possible.
output/cloud/output.go
Outdated
@@ -341,6 +345,8 @@ func (out *Output) startVersionedOutput() error { | |||
} | |||
var err error | |||
|
|||
out.usage.String("output.cloud.test_run_id", out.testRunID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
out.usage.String("output.cloud.test_run_id", out.testRunID) | |
out.usage.String("cloud/test_run_id", out.testRunID) |
This likely is the better name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good improvement, thanks for raising the topic.
I haven't reviewed the pull request yet, but there are two things I would like to see before:
- A comment on why we can't reuse the already available OpenTelmetry dependency. Of course, we need to abstract it, but isn't possible to have the implementation mostly based on open telemetry primitives?
- Any new additional package added as
internal
. If we plan to have it as a public API, we might graduate it after, but I would start with it only internally available.
Maybe it is possible, I looked into the APIs now .... but for example stuff such as In particular how are you going to get a slice of strings as a value from either a metric, a trace or a log(which arguably are the best candidate and also the one that isn't stable). In theory we can have Arguably having traces, logs and or metrics that will not be useful to users but just for usage will either have to:
I am in theory not against this, but part of the whole idea here was for this to be also usable in extensions. And especially with the possibility of some of what we consider core functionality moving to extensions or at separate repos, or even the ones that are in separate repos - browser for example. The idea behind this change is that instead of k6 core and the report itself knowing and caring about all the things we want to report on usage and then pull them. It will get things pushed to it and then it will use it. I am not strictly against putting this in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks good to me! Left a few non-blocking questions/suggestions
usage/usage.go
Outdated
case []string: | ||
u.m[k] = append(oldV, v) | ||
default: | ||
// TODO: error, panic?, nothing, log? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do prefer log, but agree that it can be tuned later
Hey @mstoykov, thanks for the proposal! I haven't reviewed the PR in-depth yet, cause I guess there's some polishing still remaining. But here are my 2cts on the proposal:
That said,
I agree , I think we should clearly differentiate what's k6 observability (logs, metrics and traces), meant to be used for whoever that operates k6 - either us (cloud) or users (on their computers/infra); from usage data collection, only used by us, to back our decisions on data. Despite the partial overlap, I think long-term it will be better to keep both separate, to make our lives easier.
I can see both of your arguments - my only concern here is the common tradeoff between breaking changes on a exposed API vs endless iterations towards a perfect API, so I'd suggest to try to ship a decently good, exposed, API for usage data collection asap and iterate over the next couple of releases before v1.0. Perhaps we should try to use it from some of the extensions, so we can try to early identify potential breaking changes and/or unsatisfied needs from the API. |
Seems like there is agreement on the general idea, so let's try to make this change happen with as little future regrets as possible. Currently I identify 3 API questions that can't be fixed with "we will add it later": String and StringsThe current API having Strings and String, exist in order to support making the whole report only through the methods and have the same layout and types. Removing String is possible if we decide that.
Or if we are okay with changing the layout: We can remove it have the top level string fields also be slices. or We can have the backend support things being either slices or a single string. For example modules or outputs might have only a single value, and without Strings it will be a single value, but if there are multipel both Strings and String will end up with it being a slice. I personally feel like 1 seems like the best option - it has the least amount of trouble and removes 1 of the methods here while allowing us to return it if we want. Three of the fields are actually constants we have access to either way and hte last one is from the executionState which might not be the worst thing to keep around. Count currently takes int64This does allow someone to potentially decrease a counter ... which seems strange on second iteration. It might be better to Rename it to Uint64 and change the type. The map that is returned currently supports only 1 level of grouping with
|
Here's my take:
I agree, in fact, we could always introduce something to store single strings later, right?
Also agree 👍🏻 Moreover, I'm also wondering if it would make sense to have a similar method but just to
No strong opinions here. Ideally I'd prefer to avoid breaking changes on the resulting report, of course, but I guess that in the worse case we can have some values to be added to the report (map) without explicitly being exposed through the API 🤔
I have contradictory feelings here, because partly I feel like the errors that may happen here should theoretically never happen and if so be captured as soon as possible, and so I'd suggest to So, I guess I'd probably suggest to return errors from the API, and perhaps panicking on our code when calling those, if any error.
I'd be happy to add it to https://github.com/joanlopez/xk6-aws, for instance to analyze which services are being used more. However, I feel like nobody is using that extension yet (didn't exposed much except for listing it in the docs site, and it's "competing" with our official jslib), so not very useful. But, perhaps we could ask browser folks? Finally, one extra thought, partly related to the errors topic, is: should we reserve some keys? So, for instance nobody can re-write the `k6_version` attribute? For instance, by forbidding it at the API level, and setting it "manually" in k6, at report creation time. |
cmd/report.go
Outdated
for _, ec := range execScheduler.GetExecutorConfigs() { | ||
executors[ec.GetType()]++ | ||
err := u.Uint64("executors/"+ec.GetType(), 1) | ||
if err != nil { // TODO report it as an error but still send the report? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't we panic
here? 🤔
As I stated before, I think it's better to make Uint64
to return the error, because it's like a library, and especially if we have the idea of exposing that report/usage package to extensions.
But, from the consumer perspective, at this point, this should technically never fail, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really prefer that k6 never panics - especially not on usage reporting.
But, from the consumer perspective, at this point, this should technically never fail, right?
There is none zero chance someone will use this API and add a string with exucutors/ramping-arrival-rate
- is this likely, no. I would argue it will likely be a try at breaking something. But I still don't think it is justified to panic.
Panicking in this case just means that we will get a users test stopped, potentially with a lot of confusion.
I would prefer if this usage accumulation also happens outside of here for all cases - this likely won't be one of them.
I can remove the error handling as a whole for this case as well, so we again get a report even if this happens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, yeah... That's why I also suggested to have some reserved keywords - of course it should never panic, and even less for a non-critical feature like usage reporting, but having reports with basic information k6_version
faked will also be confusing - as you said will likely be a try at breaking something.
Overall, and to be clear, my intention when suggesting to panic
here, is to early capture any programming mistake, if the panic
can be "forced" from "outside", then I agree is a no-go.
I can remove the error handling as a whole for this case as well, so we again get a report even if this happens.
Perhaps a good idea, yeah. Although, to be honest, I don't feel really attracted by any approach. I wrote the comment because you left the TODO, but any approach that looks good for you will most likely do it for me as well 👍🏻
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some reserved keywords -
There in practice are - all of the ones we add directly to the map.
For me having reserved words is strange as ... well it just means we now centralize the usage reporting by having some names codefied in a central place ...
I will drop the error reporting on this and a different TODO, will also try to move the reporting away from thsi function for the cases where this is possible.
usage/usage_test.go
Outdated
require.NoError(t, u.Strings("test3", "some")) | ||
require.NoError(t, u.Strings("test3/one", "one")) | ||
|
||
m, err := u.Map() | ||
require.ErrorContains(t, err, | ||
"key test3 was expected to be a map[string]any but was []string") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, don't consider this message blocking, because I don't have any immediate suggestion/alternative.
But, from my perspective, this is a bit confusing, cause I'd expect from u.Strings("test3/one", "one")
point of view, it to fail if later won't be included in the final report. Otherwise, how I'm supposed to know it? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know how to fix this without remaking the map on each call - which might be better 🤷 Just wasn't particularly nice when we had 3 types (number, string or slice, slice of strings)
Will see if now it works slightly better.
Threading usage through k6 turned out to not be so easy ... due to tests. I am not certain I like the particular way it ended up being, but arguably I prefer this than having to export module resolver for a runner to the cmd package ... 😬 The above can be removed in a later PR, I really don't think I should keep adding changes to this one, I might even be for taking back 0b49b3f, although it is probably a pretty important we do that. I will try to make a refactoring PR on actually not needing to do all the changes the next time we add a field like this. The predominant reasons was rerunning tests with an archive in |
Previously Usage collection happened in one place in a pull way. The usage report needed to get access to the given data and then pull the info from it and put it in. This reverses the pattern and adds (if available) the cloud test run id to the usage report. Future work can pull a bunch of the other parts of it out. For example: 1. used modules can now be reported from the modules 2. outputs can also report their usage 3. same for executors Currently all the above are still done in the usage report code, but that is not necessary. This also will allow additional usage reporting without the need to propagate this data through getters to the usage report, and instead just push it from the place it is used. Allowing potentially reporting usages that we are interested to remove in a more generic and easy way.
Co-authored-by: Oleg Bespalov <oleg.bespalov@grafana.com>
Co-authored-by: Oleg Bespalov <oleg.bespalov@grafana.com>
Co-authored-by: Joan López de la Franca Beltran <5459617+joanlopez@users.noreply.github.com>
c155d50
to
9ddef76
Compare
What?
Implement support to collect Usage dynamically
Why?
Previously Usage collection happened in one place in a pull way. The usage report needed to get access to the given data and then pull the info from it and put it in.
This reverses the pattern and adds (if available) the cloud test run id to the usage report.
Future work can pull a bunch of the other parts of it out. For example:
Currently all the above are still done in the usage report code, but that is not necessary.
This also will allow additional usage reporting without the need to propagate this data through getters to the usage report, and instead just push it from the place it is used.
Allowing potentially reporting usages that we are interested to remove in a more generic and easy way.
Checklist
make lint
) and all checks pass.make tests
) and all tests pass.Related PR(s)/Issue(s)