Implement support to collect Usage dynamically #3917

mstoykov · 2024-08-28T07:55:32Z

What?

Implement support to collect Usage dynamically

Why?

Previously Usage collection happened in one place in a pull way. The usage report needed to get access to the given data and then pull the info from it and put it in.

This reverses the pattern and adds (if available) the cloud test run id to the usage report.

Future work can pull a bunch of the other parts of it out. For example:

used modules can now be reported from the modules
outputs can also report their usage
same for executors

Currently all the above are still done in the usage report code, but that is not necessary.

This also will allow additional usage reporting without the need to propagate this data through getters to the usage report, and instead just push it from the place it is used.

Allowing potentially reporting usages that we are interested to remove in a more generic and easy way.

Checklist

I have performed a self-review of my code.
I have added tests for my changes.
I have run linter locally (make lint) and all checks pass.
I have run tests locally (make tests) and all tests pass.
I have commented on my code, particularly in hard-to-understand areas.

Related PR(s)/Issue(s)

mstoykov · 2024-08-28T08:08:43Z

usage/usage.go

+// Package usage implements usage tracking for k6 in order to figure what is being used within a given execution
+package usage
+
+import (
+	"strings"
+	"sync"
+)
+
+// Usage is a way to collect usage data for within k6
+type Usage struct {
+	l *sync.Mutex
+	m map[string]any
+}
+
+// New returns a new empty Usage ready to be used
+func New() *Usage {
+	return &Usage{
+		l: new(sync.Mutex),
+		m: make(map[string]any),
+	}
+}
+


I started with the idea of either reusing expvar or the new golang telemetry , but both of those are global and I didn't really want to add global state to k6.

btw it seems at least loki does use expvar , but I am not certain how much this can apply to us as well

mstoykov · 2024-08-28T08:13:20Z

usage/usage.go

+}
+
+// String appends the provided value to a slice of strings that is the value.
+// If called only a single time, the value will be just a string not a slice


If we drop this part - a lot of the code becomes a lot nicer as we only have 2 types.

Also in theory we can go the more typesafe way and actually have separate counters and strings which might help as well ... it will just mean they need to be merged at a later point.

I specifically stopped trying to polish this part as I did a couple of tries and all of them had problems and decided on going with reproducing the current report as well as possible.

mstoykov · 2024-08-28T08:13:56Z

output/cloud/output.go

@@ -341,6 +345,8 @@ func (out *Output) startVersionedOutput() error {
 	}
 	var err error

+	out.usage.String("output.cloud.test_run_id", out.testRunID)


Suggested change

out.usage.String("output.cloud.test_run_id", out.testRunID)

out.usage.String("cloud/test_run_id", out.testRunID)

This likely is the better name

codebien

That's a good improvement, thanks for raising the topic.

I haven't reviewed the pull request yet, but there are two things I would like to see before:

A comment on why we can't reuse the already available OpenTelmetry dependency. Of course, we need to abstract it, but isn't possible to have the implementation mostly based on open telemetry primitives?
Any new additional package added as internal. If we plan to have it as a public API, we might graduate it after, but I would start with it only internally available.

mstoykov · 2024-08-28T15:18:14Z

A comment on why we can't reuse the already available OpenTelmetry dependency. Of course, we need to abstract it, but isn't possible to have the implementation mostly based on open telemetry primitives?

Maybe it is possible, I looked into the APIs now .... but for example stuff such as modules or outputs will require ... some changes or (ab)using the API in strange ways, especially if we want to keep the same end format we send.

In particular how are you going to get a slice of strings as a value from either a metric, a trace or a log(which arguably are the best candidate and also the one that isn't stable). In theory we can have module/k6 with value 1 and module/k6/http with value 1 and that just means that they were used. That definitely will work, but IMO doesn't really benefit us.

Arguably having traces, logs and or metrics that will not be useful to users but just for usage will either have to:

contaminate user data
have 2 different streams - one for usage, one for users. Which IMO will be confusing.

Any new additional package added as internal. If we plan to have it as a public API, we might graduate it after, but I would start with it only internally available.

I am in theory not against this, but part of the whole idea here was for this to be also usable in extensions. And especially with the possibility of some of what we consider core functionality moving to extensions or at separate repos, or even the ones that are in separate repos - browser for example.

The idea behind this change is that instead of k6 core and the report itself knowing and caring about all the things we want to report on usage and then pull them. It will get things pushed to it and then it will use it.

I am not strictly against putting this in internal, but I think we will probably want it moved very soon after, so not certian it will help much 🤷

olegbespalov

Generally looks good to me! Left a few non-blocking questions/suggestions

cmd/report.go

olegbespalov · 2024-08-29T08:17:54Z

usage/usage.go

+	case []string:
+		u.m[k] = append(oldV, v)
+	default:
+		// TODO: error, panic?, nothing, log?


I do prefer log, but agree that it can be tuned later

usage/usage.go

joanlopez · 2024-08-29T09:04:53Z

Hey @mstoykov, thanks for the proposal! I haven't reviewed the PR in-depth yet, cause I guess there's some polishing still remaining. But here are my 2cts on the proposal:

Generally speaking, I like this reversed approach. I remember myself (among others) doing similar kind of reversal in Grafana codebase, and I remember that being very positive for decoupling the whole usage data collection from the rest of the services. Subtleties apart, I think this is the way to go.

That said,

Maybe it is possible, I looked into the APIs now .... but for example stuff such as modules or outputs will require ... some changes or (ab)using the API in strange ways, especially if we want to keep the same end format we send.

In particular how are you going to get a slice of strings as a value from either a metric, a trace or a log(which arguably are the best candidate and also the one that isn't stable). In theory we can have module/k6 with value 1 and module/k6/http with value 1 and that just means that they were used. That definitely will work, but IMO doesn't really benefit us.

Arguably having traces, logs and or metrics that will not be useful to users but just for usage will either have to:

contaminate user data

have 2 different streams - one for usage, one for users. Which IMO will be confusing.

I agree , I think we should clearly differentiate what's k6 observability (logs, metrics and traces), meant to be used for whoever that operates k6 - either us (cloud) or users (on their computers/infra); from usage data collection, only used by us, to back our decisions on data.

Despite the partial overlap, I think long-term it will be better to keep both separate, to make our lives easier.

I am in theory not against this, but part of the whole idea here was for this to be also usable in extensions. And especially with the possibility of some of what we consider core functionality moving to extensions or at separate repos, or even the ones that are in separate repos - browser for example.

The idea behind this change is that instead of k6 core and the report itself knowing and caring about all the things we want to report on usage and then pull them. It will get things pushed to it and then it will use it.

I am not strictly against putting this in internal, but I think we will probably want it moved very soon after, so not certain it will help much 🤷

I can see both of your arguments - my only concern here is the common tradeoff between breaking changes on a exposed API vs endless iterations towards a perfect API, so I'd suggest to try to ship a decently good, exposed, API for usage data collection asap and iterate over the next couple of releases before v1.0.

Perhaps we should try to use it from some of the extensions, so we can try to early identify potential breaking changes and/or unsatisfied needs from the API.

mstoykov · 2024-08-29T10:07:19Z

Seems like there is agreement on the general idea, so let's try to make this change happen with as little future regrets as possible.

Currently I identify 3 API questions that can't be fixed with "we will add it later":

String and Strings

The current API having Strings and String, exist in order to support making the whole report only through the methods and have the same layout and types. Removing String is possible if we decide that.

The top level string values currently existing will not be done through this API but just added to the map before returning
cloud/test_run_id (or whatever it ends up being called) is a slice of string - usualyl with one value. In theory you can do -o cloud -o cloud twice on k6 and ... it will create two test runs actually.

Or if we are okay with changing the layout:

We can remove it have the top level string fields also be slices.

or

We can have the backend support things being either slices or a single string. For example modules or outputs might have only a single value, and without Strings it will be a single value, but if there are multipel both Strings and String will end up with it being a slice.

I personally feel like 1 seems like the best option - it has the least amount of trouble and removes 1 of the methods here while allowing us to return it if we want. Three of the fields are actually constants we have access to either way and hte last one is from the executionState which might not be the worst thing to keep around.

Count currently takes int64

This does allow someone to potentially decrease a counter ... which seems strange on second iteration. It might be better to Rename it to Uint64 and change the type.

The map that is returned currently supports only 1 level of grouping with `/`

Arguably we can remove the grouping, again it will mean changes for the output this time due to the executors key.

I kind of like the one level though and it is part of hte golang telemetry.

I didn't really want to make it work for multiple levels, but that might be a good idea as well.

Arguably this could be done in hte backend service but it might add too much processing required there, so I prefer to keep it in the actual code here.

Error/log something else on coflicts

I kind of prefer to just return an error and kick this down to the user of the API - I expect this will practically never happen.

Otherwise we need to give a logger to the Usage.

Panicking seems like a bad option here IMO.

Perhaps we should try to use it from some of the extensions, so we can try to early identify potential breaking changes and/or unsatisfied needs from the API.

If anyone has a particular extension wiht usage we want to track that might be a great idea.

joanlopez · 2024-08-29T13:17:21Z

Here's my take:

Currently I identify 3 API questions that can't be fixed with "we will add it later":

String and Strings

...

I personally feel like 1 seems like the best option - it has the least amount of trouble and removes 1 of the methods here while allowing us to return it if we want. Three of the fields are actually constants we have access to either way and hte last one is from the executionState which might not be the worst thing to keep around.

I agree, in fact, we could always introduce something to store single strings later, right?
Cause, internally the map will remain with values typed as any, and from the consumer point of view is more or less the same, I guess.

Count currently takes int64

...

Also agree 👍🏻 Moreover, I'm also wondering if it would make sense to have a similar method but just to Set (instead of "counting" or "adding"). Particularly for values that doesn't change over the executed, or that are reported at the end, I guess it's more semantically correct.

The map that is returned currently supports only 1 level of grouping with /

...

No strong opinions here. Ideally I'd prefer to avoid breaking changes on the resulting report, of course, but I guess that in the worse case we can have some values to be added to the report (map) without explicitly being exposed through the API 🤔

Error/log something else on conflicts

...

I have contradictory feelings here, because partly I feel like the errors that may happen here should theoretically never happen and if so be captured as soon as possible, and so I'd suggest to panic. But it's also true, and not less important, that if this becomes a public API, there could be conflicts organically, and those could even be only discoverable at runtime (depending on the extensions are enabled, for instance).

So, I guess I'd probably suggest to return errors from the API, and perhaps panicking on our code when calling those, if any error.

Perhaps we should try to use it from some of the extensions, so we can try to early identify potential breaking changes and/or unsatisfied needs from the API.

If anyone has a particular extension wiht usage we want to track that might be a great idea.

I'd be happy to add it to https://github.com/joanlopez/xk6-aws, for instance to analyze which services are being used more.

However, I feel like nobody is using that extension yet (didn't exposed much except for listing it in the docs site, and it's "competing" with our official jslib), so not very useful. But, perhaps we could ask browser folks?

Finally, one extra thought, partly related to the errors topic, is: should we reserve some keys? So, for instance nobody can re-write the `k6_version` attribute? For instance, by forbidding it at the API level, and setting it "manually" in k6, at report creation time.

usage/usage.go

joanlopez · 2024-09-03T14:46:10Z

cmd/report.go

 	for _, ec := range execScheduler.GetExecutorConfigs() {
-		executors[ec.GetType()]++
+		err := u.Uint64("executors/"+ec.GetType(), 1)
+		if err != nil { // TODO report it as an error but still send the report?


Couldn't we panic here? 🤔

As I stated before, I think it's better to make Uint64 to return the error, because it's like a library, and especially if we have the idea of exposing that report/usage package to extensions.

But, from the consumer perspective, at this point, this should technically never fail, right?

I really prefer that k6 never panics - especially not on usage reporting.

But, from the consumer perspective, at this point, this should technically never fail, right?

There is none zero chance someone will use this API and add a string with exucutors/ramping-arrival-rate - is this likely, no. I would argue it will likely be a try at breaking something. But I still don't think it is justified to panic.

Panicking in this case just means that we will get a users test stopped, potentially with a lot of confusion.

I would prefer if this usage accumulation also happens outside of here for all cases - this likely won't be one of them.

I can remove the error handling as a whole for this case as well, so we again get a report even if this happens.

Well, yeah... That's why I also suggested to have some reserved keywords - of course it should never panic, and even less for a non-critical feature like usage reporting, but having reports with basic information k6_version faked will also be confusing - as you said will likely be a try at breaking something.

Overall, and to be clear, my intention when suggesting to panic here, is to early capture any programming mistake, if the panic can be "forced" from "outside", then I agree is a no-go.

I can remove the error handling as a whole for this case as well, so we again get a report even if this happens.

Perhaps a good idea, yeah. Although, to be honest, I don't feel really attracted by any approach. I wrote the comment because you left the TODO, but any approach that looks good for you will most likely do it for me as well 👍🏻

some reserved keywords -

There in practice are - all of the ones we add directly to the map.

For me having reserved words is strange as ... well it just means we now centralize the usage reporting by having some names codefied in a central place ...

I will drop the error reporting on this and a different TODO, will also try to move the reporting away from thsi function for the cases where this is possible.

cmd/report.go

joanlopez · 2024-09-03T15:09:40Z

usage/usage_test.go

+	require.NoError(t, u.Strings("test3", "some"))
+	require.NoError(t, u.Strings("test3/one", "one"))
+
+	m, err := u.Map()
+	require.ErrorContains(t, err,
+		"key test3 was expected to be a map[string]any but was []string")


Please, don't consider this message blocking, because I don't have any immediate suggestion/alternative.

But, from my perspective, this is a bit confusing, cause I'd expect from u.Strings("test3/one", "one") point of view, it to fail if later won't be included in the final report. Otherwise, how I'm supposed to know it? 🤔

I don't know how to fix this without remaking the map on each call - which might be better 🤷 Just wasn't particularly nice when we had 3 types (number, string or slice, slice of strings)

Will see if now it works slightly better.

mstoykov · 2024-09-05T14:08:39Z

Threading usage through k6 turned out to not be so easy ... due to tests.

I am not certain I like the particular way it ended up being, but arguably I prefer this than having to export module resolver for a runner to the cmd package ... 😬

The above can be removed in a later PR, I really don't think I should keep adding changes to this one, I might even be for taking back 0b49b3f, although it is probably a pretty important we do that.

I will try to make a refactoring PR on actually not needing to do all the changes the next time we add a field like this. The predominant reasons was rerunning tests with an archive in js package is repeated everywhere ...

Previously Usage collection happened in one place in a pull way. The usage report needed to get access to the given data and then pull the info from it and put it in. This reverses the pattern and adds (if available) the cloud test run id to the usage report. Future work can pull a bunch of the other parts of it out. For example: 1. used modules can now be reported from the modules 2. outputs can also report their usage 3. same for executors Currently all the above are still done in the usage report code, but that is not necessary. This also will allow additional usage reporting without the need to propagate this data through getters to the usage report, and instead just push it from the place it is used. Allowing potentially reporting usages that we are interested to remove in a more generic and easy way.

Co-authored-by: Oleg Bespalov <oleg.bespalov@grafana.com>

Co-authored-by: Joan López de la Franca Beltran <5459617+joanlopez@users.noreply.github.com>

mstoykov added the evaluation needed proposal needs to be validated or tested before fully implementing it in k6 label Aug 28, 2024

mstoykov added this to the v0.54.0 milestone Aug 28, 2024

mstoykov requested a review from a team as a code owner August 28, 2024 07:55

mstoykov requested review from olegbespalov and joanlopez and removed request for a team August 28, 2024 07:55

mstoykov commented Aug 28, 2024

View reviewed changes

mstoykov force-pushed the usageRefactor branch from 9f371dc to 63175cd Compare August 28, 2024 08:09

mstoykov requested a review from codebien August 28, 2024 08:10

mstoykov mentioned this pull request Aug 28, 2024

Collect the Cloud test run ID #3883

Closed

mstoykov commented Aug 28, 2024

View reviewed changes

codebien reviewed Aug 28, 2024

View reviewed changes

olegbespalov previously approved these changes Aug 29, 2024

View reviewed changes

mstoykov dismissed olegbespalov’s stale review via f0bde38 August 30, 2024 13:32

olegbespalov previously approved these changes Sep 2, 2024

View reviewed changes

usage/usage.go Outdated Show resolved Hide resolved

usage/usage.go Outdated Show resolved Hide resolved

mstoykov dismissed olegbespalov’s stale review via 05e9fd6 September 2, 2024 16:27

olegbespalov previously approved these changes Sep 2, 2024

View reviewed changes

joanlopez reviewed Sep 3, 2024

View reviewed changes

cmd/report.go Outdated Show resolved Hide resolved

mstoykov dismissed olegbespalov’s stale review via 5233211 September 3, 2024 14:57

joanlopez reviewed Sep 3, 2024

View reviewed changes

mstoykov self-assigned this Sep 4, 2024

mstoykov requested review from joanlopez and olegbespalov September 5, 2024 14:08

mstoykov removed the evaluation needed proposal needs to be validated or tested before fully implementing it in k6 label Sep 6, 2024

mstoykov mentioned this pull request Sep 6, 2024

js: DRY creation of a runner from archive in the tests #3935

Merged

5 tasks

olegbespalov previously approved these changes Sep 10, 2024

View reviewed changes

mstoykov and others added 10 commits September 10, 2024 10:55

Count->Uint64

64574f3

Drop String, leave only Strings

43a981c

usage: return errors and huge simplification

ae4f851

Update usage/usage.go

daf17e9

Co-authored-by: Oleg Bespalov <oleg.bespalov@grafana.com>

Update usage/usage.go

da37a86

Co-authored-by: Oleg Bespalov <oleg.bespalov@grafana.com>

Apply suggestions from code review

894436d

Co-authored-by: Joan López de la Franca Beltran <5459617+joanlopez@users.noreply.github.com>

usage: move outputs reporting

9462625

usage: move modules reporting

1c88b78

usage: refactor to do levels earlier

9ddef76

mstoykov dismissed olegbespalov’s stale review via 9ddef76 September 10, 2024 08:04

mstoykov force-pushed the usageRefactor branch from c155d50 to 9ddef76 Compare September 10, 2024 08:04

mstoykov requested a review from olegbespalov September 10, 2024 08:06

olegbespalov approved these changes Sep 10, 2024

View reviewed changes

joanlopez approved these changes Sep 10, 2024

View reviewed changes

mstoykov merged commit 791803f into master Sep 10, 2024
23 checks passed

mstoykov deleted the usageRefactor branch September 10, 2024 09:59

BrewTestBot mentioned this pull request Sep 30, 2024

k6 0.54.0 Homebrew/homebrew-core#192373

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement support to collect Usage dynamically #3917

Implement support to collect Usage dynamically #3917

mstoykov commented Aug 28, 2024

mstoykov Aug 28, 2024

mstoykov Aug 28, 2024

mstoykov Aug 28, 2024

codebien left a comment •

edited

Loading

mstoykov commented Aug 28, 2024

olegbespalov left a comment

olegbespalov Aug 29, 2024

joanlopez commented Aug 29, 2024 •

edited

Loading

mstoykov commented Aug 29, 2024

joanlopez commented Aug 29, 2024

String and Strings

Count currently takes int64

The map that is returned currently supports only 1 level of grouping with `/`

Error/log something else on conflicts

joanlopez Sep 3, 2024

mstoykov Sep 3, 2024

joanlopez Sep 3, 2024

mstoykov Sep 3, 2024

joanlopez Sep 3, 2024 •

edited

Loading

mstoykov Sep 3, 2024

mstoykov commented Sep 5, 2024

	out.usage.String("output.cloud.test_run_id", out.testRunID)
	out.usage.String("cloud/test_run_id", out.testRunID)

Implement support to collect Usage dynamically #3917

Implement support to collect Usage dynamically #3917

Conversation

mstoykov commented Aug 28, 2024

What?

Why?

Checklist

Related PR(s)/Issue(s)

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

codebien left a comment • edited Loading

Choose a reason for hiding this comment

mstoykov commented Aug 28, 2024

olegbespalov left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

joanlopez commented Aug 29, 2024 • edited Loading

mstoykov commented Aug 29, 2024

String and Strings

Count currently takes int64

The map that is returned currently supports only 1 level of grouping with /

Error/log something else on coflicts

joanlopez commented Aug 29, 2024

String and Strings

Count currently takes int64

The map that is returned currently supports only 1 level of grouping with /

Error/log something else on conflicts

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

joanlopez Sep 3, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mstoykov commented Sep 5, 2024

codebien left a comment •

edited

Loading

joanlopez commented Aug 29, 2024 •

edited

Loading

The map that is returned currently supports only 1 level of grouping with `/`

The map that is returned currently supports only 1 level of grouping with `/`

joanlopez Sep 3, 2024 •

edited

Loading