Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement support to collect Usage dynamically #3917

Merged
merged 10 commits into from
Sep 10, 2024
Merged

Implement support to collect Usage dynamically #3917

merged 10 commits into from
Sep 10, 2024

Conversation

mstoykov
Copy link
Contributor

What?

Implement support to collect Usage dynamically

Why?

Previously Usage collection happened in one place in a pull way. The usage report needed to get access to the given data and then pull the info from it and put it in.

This reverses the pattern and adds (if available) the cloud test run id to the usage report.

Future work can pull a bunch of the other parts of it out. For example:

  1. used modules can now be reported from the modules
  2. outputs can also report their usage
  3. same for executors

Currently all the above are still done in the usage report code, but that is not necessary.

This also will allow additional usage reporting without the need to propagate this data through getters to the usage report, and instead just push it from the place it is used.

Allowing potentially reporting usages that we are interested to remove in a more generic and easy way.

Checklist

  • I have performed a self-review of my code.
  • I have added tests for my changes.
  • I have run linter locally (make lint) and all checks pass.
  • I have run tests locally (make tests) and all tests pass.
  • I have commented on my code, particularly in hard-to-understand areas.

Related PR(s)/Issue(s)

@mstoykov mstoykov added the evaluation needed proposal needs to be validated or tested before fully implementing it in k6 label Aug 28, 2024
@mstoykov mstoykov added this to the v0.54.0 milestone Aug 28, 2024
@mstoykov mstoykov requested a review from a team as a code owner August 28, 2024 07:55
@mstoykov mstoykov requested review from olegbespalov and joanlopez and removed request for a team August 28, 2024 07:55
usage/usage.go Outdated
Comment on lines 1 to 23
// Package usage implements usage tracking for k6 in order to figure what is being used within a given execution
package usage

import (
"strings"
"sync"
)

// Usage is a way to collect usage data for within k6
type Usage struct {
l *sync.Mutex
m map[string]any
}

// New returns a new empty Usage ready to be used
func New() *Usage {
return &Usage{
l: new(sync.Mutex),
m: make(map[string]any),
}
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started with the idea of either reusing expvar or the new golang telemetry , but both of those are global and I didn't really want to add global state to k6.

btw it seems at least loki does use expvar , but I am not certain how much this can apply to us as well

usage/usage.go Outdated
}

// String appends the provided value to a slice of strings that is the value.
// If called only a single time, the value will be just a string not a slice
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we drop this part - a lot of the code becomes a lot nicer as we only have 2 types.

Also in theory we can go the more typesafe way and actually have separate counters and strings which might help as well ... it will just mean they need to be merged at a later point.

I specifically stopped trying to polish this part as I did a couple of tries and all of them had problems and decided on going with reproducing the current report as well as possible.

@@ -341,6 +345,8 @@ func (out *Output) startVersionedOutput() error {
}
var err error

out.usage.String("output.cloud.test_run_id", out.testRunID)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
out.usage.String("output.cloud.test_run_id", out.testRunID)
out.usage.String("cloud/test_run_id", out.testRunID)

This likely is the better name

Copy link
Contributor

@codebien codebien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good improvement, thanks for raising the topic.

I haven't reviewed the pull request yet, but there are two things I would like to see before:

  1. A comment on why we can't reuse the already available OpenTelmetry dependency. Of course, we need to abstract it, but isn't possible to have the implementation mostly based on open telemetry primitives?
  2. Any new additional package added as internal. If we plan to have it as a public API, we might graduate it after, but I would start with it only internally available.

@mstoykov
Copy link
Contributor Author

A comment on why we can't reuse the already available OpenTelmetry dependency. Of course, we need to abstract it, but isn't possible to have the implementation mostly based on open telemetry primitives?

Maybe it is possible, I looked into the APIs now .... but for example stuff such as modules or outputs will require ... some changes or (ab)using the API in strange ways, especially if we want to keep the same end format we send.

In particular how are you going to get a slice of strings as a value from either a metric, a trace or a log(which arguably are the best candidate and also the one that isn't stable). In theory we can have module/k6 with value 1 and module/k6/http with value 1 and that just means that they were used. That definitely will work, but IMO doesn't really benefit us.

Arguably having traces, logs and or metrics that will not be useful to users but just for usage will either have to:

  1. contaminate user data
  2. have 2 different streams - one for usage, one for users. Which IMO will be confusing.

Any new additional package added as internal. If we plan to have it as a public API, we might graduate it after, but I would start with it only internally available.

I am in theory not against this, but part of the whole idea here was for this to be also usable in extensions. And especially with the possibility of some of what we consider core functionality moving to extensions or at separate repos, or even the ones that are in separate repos - browser for example.

The idea behind this change is that instead of k6 core and the report itself knowing and caring about all the things we want to report on usage and then pull them. It will get things pushed to it and then it will use it.

I am not strictly against putting this in internal, but I think we will probably want it moved very soon after, so not certian it will help much 🤷

olegbespalov
olegbespalov previously approved these changes Aug 29, 2024
Copy link
Contributor

@olegbespalov olegbespalov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good to me! Left a few non-blocking questions/suggestions

cmd/report.go Outdated Show resolved Hide resolved
usage/usage.go Outdated
case []string:
u.m[k] = append(oldV, v)
default:
// TODO: error, panic?, nothing, log?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do prefer log, but agree that it can be tuned later

usage/usage.go Show resolved Hide resolved
@joanlopez
Copy link
Contributor

joanlopez commented Aug 29, 2024

Hey @mstoykov, thanks for the proposal! I haven't reviewed the PR in-depth yet, cause I guess there's some polishing still remaining. But here are my 2cts on the proposal:

  • Generally speaking, I like this reversed approach. I remember myself (among others) doing similar kind of reversal in Grafana codebase, and I remember that being very positive for decoupling the whole usage data collection from the rest of the services. Subtleties apart, I think this is the way to go.

That said,

Maybe it is possible, I looked into the APIs now .... but for example stuff such as modules or outputs will require ... some changes or (ab)using the API in strange ways, especially if we want to keep the same end format we send.

In particular how are you going to get a slice of strings as a value from either a metric, a trace or a log(which arguably are the best candidate and also the one that isn't stable). In theory we can have module/k6 with value 1 and module/k6/http with value 1 and that just means that they were used. That definitely will work, but IMO doesn't really benefit us.

Arguably having traces, logs and or metrics that will not be useful to users but just for usage will either have to:

  1. contaminate user data
  2. have 2 different streams - one for usage, one for users. Which IMO will be confusing.

I agree , I think we should clearly differentiate what's k6 observability (logs, metrics and traces), meant to be used for whoever that operates k6 - either us (cloud) or users (on their computers/infra); from usage data collection, only used by us, to back our decisions on data.

Despite the partial overlap, I think long-term it will be better to keep both separate, to make our lives easier.

I am in theory not against this, but part of the whole idea here was for this to be also usable in extensions. And especially with the possibility of some of what we consider core functionality moving to extensions or at separate repos, or even the ones that are in separate repos - browser for example.

The idea behind this change is that instead of k6 core and the report itself knowing and caring about all the things we want to report on usage and then pull them. It will get things pushed to it and then it will use it.

I am not strictly against putting this in internal, but I think we will probably want it moved very soon after, so not certain it will help much 🤷

I can see both of your arguments - my only concern here is the common tradeoff between breaking changes on a exposed API vs endless iterations towards a perfect API, so I'd suggest to try to ship a decently good, exposed, API for usage data collection asap and iterate over the next couple of releases before v1.0.

Perhaps we should try to use it from some of the extensions, so we can try to early identify potential breaking changes and/or unsatisfied needs from the API.

@mstoykov
Copy link
Contributor Author

Seems like there is agreement on the general idea, so let's try to make this change happen with as little future regrets as possible.

Currently I identify 3 API questions that can't be fixed with "we will add it later":

String and Strings

The current API having Strings and String, exist in order to support making the whole report only through the methods and have the same layout and types. Removing String is possible if we decide that.

  1. The top level string values currently existing will not be done through this API but just added to the map before returning
  2. cloud/test_run_id (or whatever it ends up being called) is a slice of string - usualyl with one value. In theory you can do -o cloud -o cloud twice on k6 and ... it will create two test runs actually.

Or if we are okay with changing the layout:

We can remove it have the top level string fields also be slices.

or

We can have the backend support things being either slices or a single string. For example modules or outputs might have only a single value, and without Strings it will be a single value, but if there are multipel both Strings and String will end up with it being a slice.

I personally feel like 1 seems like the best option - it has the least amount of trouble and removes 1 of the methods here while allowing us to return it if we want. Three of the fields are actually constants we have access to either way and hte last one is from the executionState which might not be the worst thing to keep around.

Count currently takes int64

This does allow someone to potentially decrease a counter ... which seems strange on second iteration. It might be better to Rename it to Uint64 and change the type.

The map that is returned currently supports only 1 level of grouping with /

Arguably we can remove the grouping, again it will mean changes for the output this time due to the executors key.

I kind of like the one level though and it is part of hte golang telemetry.

I didn't really want to make it work for multiple levels, but that might be a good idea as well.

Arguably this could be done in hte backend service but it might add too much processing required there, so I prefer to keep it in the actual code here.

Error/log something else on coflicts

I kind of prefer to just return an error and kick this down to the user of the API - I expect this will practically never happen.

Otherwise we need to give a logger to the Usage.

Panicking seems like a bad option here IMO.

Perhaps we should try to use it from some of the extensions, so we can try to early identify potential breaking changes and/or unsatisfied needs from the API.

If anyone has a particular extension wiht usage we want to track that might be a great idea.

@joanlopez
Copy link
Contributor

Here's my take:

Currently I identify 3 API questions that can't be fixed with "we will add it later":

String and Strings

...

I personally feel like 1 seems like the best option - it has the least amount of trouble and removes 1 of the methods here while allowing us to return it if we want. Three of the fields are actually constants we have access to either way and hte last one is from the executionState which might not be the worst thing to keep around.

I agree, in fact, we could always introduce something to store single strings later, right?
Cause, internally the map will remain with values typed as any, and from the consumer point of view is more or less the same, I guess.

Count currently takes int64

...

Also agree 👍🏻 Moreover, I'm also wondering if it would make sense to have a similar method but just to Set (instead of "counting" or "adding"). Particularly for values that doesn't change over the executed, or that are reported at the end, I guess it's more semantically correct.

The map that is returned currently supports only 1 level of grouping with /

...

No strong opinions here. Ideally I'd prefer to avoid breaking changes on the resulting report, of course, but I guess that in the worse case we can have some values to be added to the report (map) without explicitly being exposed through the API 🤔

Error/log something else on conflicts

...

I have contradictory feelings here, because partly I feel like the errors that may happen here should theoretically never happen and if so be captured as soon as possible, and so I'd suggest to panic. But it's also true, and not less important, that if this becomes a public API, there could be conflicts organically, and those could even be only discoverable at runtime (depending on the extensions are enabled, for instance).

So, I guess I'd probably suggest to return errors from the API, and perhaps panicking on our code when calling those, if any error.

Perhaps we should try to use it from some of the extensions, so we can try to early identify potential breaking changes and/or unsatisfied needs from the API.

If anyone has a particular extension wiht usage we want to track that might be a great idea.

I'd be happy to add it to https://github.com/joanlopez/xk6-aws, for instance to analyze which services are being used more.

However, I feel like nobody is using that extension yet (didn't exposed much except for listing it in the docs site, and it's "competing" with our official jslib), so not very useful. But, perhaps we could ask browser folks?


Finally, one extra thought, partly related to the errors topic, is: should we reserve some keys? So, for instance nobody can re-write the `k6_version` attribute? For instance, by forbidding it at the API level, and setting it "manually" in k6, at report creation time.

olegbespalov
olegbespalov previously approved these changes Sep 2, 2024
usage/usage.go Outdated Show resolved Hide resolved
usage/usage.go Outdated Show resolved Hide resolved
olegbespalov
olegbespalov previously approved these changes Sep 2, 2024
cmd/report.go Outdated
for _, ec := range execScheduler.GetExecutorConfigs() {
executors[ec.GetType()]++
err := u.Uint64("executors/"+ec.GetType(), 1)
if err != nil { // TODO report it as an error but still send the report?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't we panic here? 🤔

As I stated before, I think it's better to make Uint64 to return the error, because it's like a library, and especially if we have the idea of exposing that report/usage package to extensions.

But, from the consumer perspective, at this point, this should technically never fail, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really prefer that k6 never panics - especially not on usage reporting.

But, from the consumer perspective, at this point, this should technically never fail, right?

There is none zero chance someone will use this API and add a string with exucutors/ramping-arrival-rate - is this likely, no. I would argue it will likely be a try at breaking something. But I still don't think it is justified to panic.

Panicking in this case just means that we will get a users test stopped, potentially with a lot of confusion.

I would prefer if this usage accumulation also happens outside of here for all cases - this likely won't be one of them.

I can remove the error handling as a whole for this case as well, so we again get a report even if this happens.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, yeah... That's why I also suggested to have some reserved keywords - of course it should never panic, and even less for a non-critical feature like usage reporting, but having reports with basic information k6_version faked will also be confusing - as you said will likely be a try at breaking something.

Overall, and to be clear, my intention when suggesting to panic here, is to early capture any programming mistake, if the panic can be "forced" from "outside", then I agree is a no-go.

I can remove the error handling as a whole for this case as well, so we again get a report even if this happens.

Perhaps a good idea, yeah. Although, to be honest, I don't feel really attracted by any approach. I wrote the comment because you left the TODO, but any approach that looks good for you will most likely do it for me as well 👍🏻

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some reserved keywords -

There in practice are - all of the ones we add directly to the map.

For me having reserved words is strange as ... well it just means we now centralize the usage reporting by having some names codefied in a central place ...

I will drop the error reporting on this and a different TODO, will also try to move the reporting away from thsi function for the cases where this is possible.

cmd/report.go Outdated Show resolved Hide resolved
Comment on lines 23 to 28
require.NoError(t, u.Strings("test3", "some"))
require.NoError(t, u.Strings("test3/one", "one"))

m, err := u.Map()
require.ErrorContains(t, err,
"key test3 was expected to be a map[string]any but was []string")
Copy link
Contributor

@joanlopez joanlopez Sep 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, don't consider this message blocking, because I don't have any immediate suggestion/alternative.

But, from my perspective, this is a bit confusing, cause I'd expect from u.Strings("test3/one", "one") point of view, it to fail if later won't be included in the final report. Otherwise, how I'm supposed to know it? 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know how to fix this without remaking the map on each call - which might be better 🤷 Just wasn't particularly nice when we had 3 types (number, string or slice, slice of strings)

Will see if now it works slightly better.

@mstoykov mstoykov self-assigned this Sep 4, 2024
@mstoykov
Copy link
Contributor Author

mstoykov commented Sep 5, 2024

Threading usage through k6 turned out to not be so easy ... due to tests.

I am not certain I like the particular way it ended up being, but arguably I prefer this than having to export module resolver for a runner to the cmd package ... 😬

The above can be removed in a later PR, I really don't think I should keep adding changes to this one, I might even be for taking back 0b49b3f, although it is probably a pretty important we do that.

I will try to make a refactoring PR on actually not needing to do all the changes the next time we add a field like this. The predominant reasons was rerunning tests with an archive in js package is repeated everywhere ...

@mstoykov mstoykov removed the evaluation needed proposal needs to be validated or tested before fully implementing it in k6 label Sep 6, 2024
olegbespalov
olegbespalov previously approved these changes Sep 10, 2024
mstoykov and others added 10 commits September 10, 2024 10:55
Previously Usage collection happened in one place in a pull way. The
usage report needed to get access to the given data and then pull the
info from it and put it in.

This reverses the pattern and adds (if available) the cloud test run id
to the usage report.

Future work can pull a bunch of the other parts of it out. For example:
1. used modules can now be reported from the modules
2. outputs can also report their usage
3. same for executors

Currently all the above are still done in the usage report code, but
that is not necessary.

This also will allow additional usage reporting without the need to
propagate this data through getters to the usage report, and instead
just push it from the place it is used.

Allowing potentially reporting usages that we are interested to remove
in a more generic and easy way.
Co-authored-by: Oleg Bespalov <oleg.bespalov@grafana.com>
Co-authored-by: Oleg Bespalov <oleg.bespalov@grafana.com>
Co-authored-by: Joan López de la Franca Beltran <5459617+joanlopez@users.noreply.github.com>
@mstoykov mstoykov merged commit 791803f into master Sep 10, 2024
23 checks passed
@mstoykov mstoykov deleted the usageRefactor branch September 10, 2024 09:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants