Add output option for csv format #1067
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1067 +/- ##
==========================================
- Coverage 72.79% 72.34% -0.45%
==========================================
Files 133 134 +1
Lines 9905 9966 +61
==========================================
Hits 7210 7210
- Misses 2278 2339 +61
Partials 417 417
Continue to review full report at Codecov.
Codecov Report
@@ Coverage Diff @@
## master #1067 +/- ##
==========================================
- Coverage 72.79% 72.73% -0.06%
==========================================
Files 133 134 +1
Lines 9905 9995 +90
==========================================
+ Hits 7210 7270 +60
- Misses 2278 2303 +25
- Partials 417 422 +5
Continue to review full report at Codecov.
Thanks for this pull request! I've noted some issues inline in the code, but besides fixing those, can you also add some unit tests to this PR? Since the `New()` function accepts an `afero.Fs` object, mocking the FS for them should be fairly easy.
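For reference, a minimal sketch of how an in-memory afero filesystem can back such a test (the constructor call itself is omitted, since its exact signature is defined by this PR; the assertion part is just an illustration):

```go
package csv_test

import (
	"testing"

	"github.com/spf13/afero"
	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

func TestWritesGoToMockedFs(t *testing.T) {
	// In-memory filesystem: whatever the collector writes through it
	// stays in memory, so the test never touches the real disk.
	fs := afero.NewMemMapFs()

	// In a real test you would construct the collector with this fs and
	// feed it samples; here we only demonstrate the read-back/assert part.
	require.NoError(t, afero.WriteFile(fs, "out.csv",
		[]byte("metric_name,timestamp,metric_value\n"), 0644))

	content, err := afero.ReadFile(fs, "out.csv")
	require.NoError(t, err)
	assert.Contains(t, string(content), "metric_name")
}
```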
stats/csv/collector.go
Outdated
func (c *Collector) Run(ctx context.Context) {
	log.WithField("filename", c.fname).Debug("CSV: Writing CSV metrics")
	<-ctx.Done()
	_ = c.outfile.Close()
Since nowhere in `Collect()` is it checked whether the context is done, I'm not completely sure there isn't a race condition here. What happens if `Collect()` is still writing to the file while we're closing it here?
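For illustration only, one way such a race could be ruled out is to guard both the writes and the final close with a mutex; a minimal sketch, with made-up field names rather than the PR's actual ones:

```go
package csvoutput

import (
	"context"
	"encoding/csv"
	"os"
	"sync"
)

// Collector is a minimal stand-in for the PR's collector; the field names
// here are illustrative, not the actual ones from stats/csv/collector.go.
type Collector struct {
	lock    sync.Mutex
	closed  bool
	writer  *csv.Writer
	outfile *os.File
}

// Run waits for the context to be done, then flushes and closes the file
// while holding the lock, so no Collect call can write concurrently.
func (c *Collector) Run(ctx context.Context) {
	<-ctx.Done()
	c.lock.Lock()
	defer c.lock.Unlock()
	c.writer.Flush()
	_ = c.outfile.Close()
	c.closed = true
}

// Collect writes rows under the same lock and drops samples that arrive
// after the output has been closed.
func (c *Collector) Collect(rows [][]string) {
	c.lock.Lock()
	defer c.lock.Unlock()
	if c.closed {
		return
	}
	for _, row := range rows {
		_ = c.writer.Write(row)
	}
}
```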
stats/csv/collector.go
Outdated
row = append(row, fmt.Sprintf("%f", sample.Value))
sampleTags := sample.Tags.CloneTags()

for _, tag := range resTags {
This way of implementing the tags in the CSV format would mean that any custom extra tags attached to the metrics will be silently discarded, which would probably surprise and annoy a lot of users. `conf.SystemTags`, which you pass to the constructor, is just a list of the keys for tags that k6 internally emits - users can add their own custom ones.
I can see two ways of fixing this:
- have an option in the constructor that specifically allows users to add extra columns with their custom metric tags
- have a final column "extra tags" that just contains any extra tags, either as a JSON value or as a url-encoded `key1=val1&key2=val2` map (probably better, since quotes can be escaped) - see the sketch after this comment
I took a look at the original issue, but one line per metric sample (like how you've currently done it) seems the better approach to me.
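To make the url-encoded variant concrete, here is a standalone sketch using Go's net/url package (the split between system and extra tags is assumed for illustration, not taken from the PR):

```go
package main

import (
	"fmt"
	"net/url"
)

// encodeExtraTags renders any tag that is not a known system tag as a
// single url-encoded string, e.g. "group=my%20group&myTag=1".
func encodeExtraTags(tags map[string]string, systemTags map[string]bool) string {
	extra := url.Values{}
	for key, value := range tags {
		if !systemTags[key] {
			extra.Set(key, value)
		}
	}
	return extra.Encode() // keys are sorted and values are escaped
}

func main() {
	tags := map[string]string{"method": "GET", "my tag": `va"lue`}
	system := map[string]bool{"method": true}
	fmt.Println(encodeExtraTags(tags, system)) // my+tag=va%22lue
}
```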
Codecov Report
@@ Coverage Diff @@
## master #1067 +/- ##
==========================================
+ Coverage 72.79% 72.86% +0.07%
==========================================
Files 133 135 +2
Lines 9905 10056 +151
==========================================
+ Hits 7210 7327 +117
- Misses 2278 2301 +23
- Partials 417 428 +11
Continue to review full report at Codecov.
@na--, thank you for your input! I've made some changes and added some tests. I couldn't avoid using the `CloneTags()` function, because there's no other way to get the extra tags supplied by the user. Also, @golangcibot has found some errors in my code, but I don't understand what they are or how to fix them - if you could point me in the right direction, that would be great. Thanks!
@na--, thanks for the advice. I wrapped the code inside the range loops in lambdas, and now scopelint doesn't complain about it. The only issue that remains is that I use a long string of CSV data to test that the collector writes correct CSV to the file.
Split it with a concatenation, or add a […]. You can install […].
Fixed it. Thanks!
Thank you as well for working on this! 🙂 I'll do another top-to-bottom review of the code in the next few days.
Sorry for the long delay - we were finishing and releasing 0.25.0, and soon 0.25.1 😭
Looks good, but some changes are required IMO.
I noticed you decided that calling a lambda is a good way to keep your variables from being changed by the for loop. While this will definitely work, it is preferable to just shadow the variable with the same name, such as `varname := varname` - this way you get a new variable that the for loop won't change underneath you, without adding an extra function call and level of indentation. I have commented on that in a few places, but decided to add it here as well (see the sketch below).
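For reference, a standalone illustration of the two patterns in a test-style loop (not the PR's actual test code):

```go
package example

import "testing"

func TestShadowing(t *testing.T) {
	testdata := map[string][]string{
		"one tag":  {"foo"},
		"two tags": {"foo", "bar"},
	}

	// Instead of wrapping the body in func(testname string, tags []string) {...}(testname, tags),
	// shadow the loop variables - the closure passed to t.Run then captures
	// per-iteration copies, and scopelint stays quiet.
	for testname, tags := range testdata {
		testname, tags := testname, tags // per-iteration copies
		t.Run(testname, func(t *testing.T) {
			t.Parallel() // runs after the loop advances, so the copies matter
			if len(tags) == 0 {
				t.Errorf("%s: expected at least one tag", testname)
			}
		})
	}
}
```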
As a whole I like the PR - it has the benefit over the other outputs that we don't translate all the k6 metrics to some other format and then write 20k metrics at the same time, but just translate one at a time, which is definitely better, especially given my testing around #1084.
I did a quick "scientific" benchmark where I ran this script:
import { Counter } from "k6/metrics";
var c = new Counter("awesome");
export default function() {
    c.add(1);
}
With both this code and the JSON output. The results for 2 seconds and 1 VU (this script generates a lot of metrics) were:
csv: iterations 124940 (62468.459215/s) and a 27M file
json: iterations 60376 (29844.35324/s) and a 36M file
So around twice the performance, and more than twice better storage per written metric.
The funny thing is that the iteration duration is not different (or not by enough), but it's apparently just much faster to write the CSV files, so it runs much better ... or maybe the JSON one is much buggier :(
While looking at this PR I started wondering whether both the JSON and the CSV output wouldn't benefit from gzip (or other compression) on the fly, as it would lower the amount of data written to disk. @na--, what do you think? A future PR? (A rough sketch of the idea follows.)
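Not part of this PR, but as a rough sketch of the compression idea: wrapping the output file in a compress/gzip writer is essentially all it takes, as long as the gzip writer is also flushed and closed on shutdown (column names below are made up for the example):

```go
package main

import (
	"compress/gzip"
	"encoding/csv"
	"log"
	"os"
)

func main() {
	f, err := os.Create("metrics.csv.gz")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// All CSV rows pass through the gzip writer before hitting the disk.
	gz := gzip.NewWriter(f)
	defer gz.Close() // flushes the remaining compressed data

	w := csv.NewWriter(gz)
	defer w.Flush()

	if err := w.Write([]string{"metric_name", "timestamp", "metric_value"}); err != nil {
		log.Fatal(err)
	}
	if err := w.Write([]string{"http_reqs", "1567087200", "1.000000"}); err != nil {
		log.Fatal(err)
	}
}
```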
stats/csv/collector.go
Outdated
"github.com/loadimpact/k6/lib"
"github.com/loadimpact/k6/stats"
log "github.com/sirupsen/logrus"
I know that you probably got this from the other collectors, but we actually have an issue (#1016) about not renaming `logrus` to `log`, so I would prefer if the new code doesn't use the alias ;)
stats/csv/collector.go
Outdated
)

const (
	saveInterval = 1 * time.Second
I would really prefer if this were configurable.
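Purely to illustrate that suggestion (all names here are hypothetical, not taken from the PR or from k6's actual config types): the constant could become a field on the collector's config, with a fallback to the current default:

```go
package main

import (
	"fmt"
	"time"
)

const defaultSaveInterval = 1 * time.Second

// Config is a hypothetical stand-in for the collector's configuration.
type Config struct {
	// SaveInterval controls how often buffered samples are flushed to disk.
	// Zero means "use the default".
	SaveInterval time.Duration
}

// saveInterval returns the configured interval, or the default if unset.
func (c Config) saveInterval() time.Duration {
	if c.SaveInterval > 0 {
		return c.SaveInterval
	}
	return defaultSaveInterval
}

func main() {
	fmt.Println(Config{}.saveInterval())                              // 1s
	fmt.Println(Config{SaveInterval: 5 * time.Second}.saveInterval()) // 5s
}
```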
stats/csv/collector.go
Outdated
if err != nil {
	log.WithField("filename", c.fname).Error("CSV: Error writing to file")
}
}(sample)
I don't understand why you are making a lambda and then calling it ... Are you worried that the value of `sample` will change before `SampleToRow()` finishes? I don't think that is possible, but if it were, it would be much better to do `sample := sample`.
stats/csv/collector.go
Outdated
// Link returns a dummy string, it's only included to satisfy the lib.Collector interface
func (c *Collector) Link() string {
	return ""
Maybe you can return the `fname`, although I'm not sure that would be all that useful.
stats/csv/collector.go
Outdated
func SampleToRow(sample *stats.Sample, resTags []string, ignoredTags []string) []string {
	if sample == nil {
		return nil
	}
Can this even happen? And if it does, what will `csvWriter.Write` do?
break
}
prev = true
}
I don't think `extra` is needed at all :). You can probably try to use `sort.SearchStrings` on both `resTags` and `ignoredTags`, if you sort them as well. I'm not certain whether this will have beneficial results, but it could be benchmarked (see the sketch below).
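For reference, a minimal standalone sketch of the membership check that `sort.SearchStrings` enables on a sorted slice (the tag names are made up for the example):

```go
package main

import (
	"fmt"
	"sort"
)

// contains reports whether sorted contains x, using binary search.
// The slice must already be sorted in ascending order.
func contains(sorted []string, x string) bool {
	i := sort.SearchStrings(sorted, x)
	return i < len(sorted) && sorted[i] == x
}

func main() {
	resTags := []string{"url", "method", "status", "name"}
	sort.Strings(resTags) // SearchStrings requires a sorted slice

	fmt.Println(contains(resTags, "status")) // true
	fmt.Println(contains(resTags, "group"))  // false
}
```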
stats/csv/collector_test.go
Outdated
sort.Strings(collector.ignoredTags)
assert.Equal(t, expected.ignoredTags, collector.ignoredTags)
})
}(configs[i], expected[i])
You could just do `config, expected := configs[i], expected[i]` instead of the function call.
It will be both much shorter and much easier to maintain - with the function, if you want to add a new field, you now need to add it twice.
stats/csv/collector_test.go
Outdated
}),
},
}
t.Run("Collect", func(t *testing.T) {
You don't need to call t.Run if you are not going to have at least two subtests in a test. The same goes for the other places in this file where t.Run is called just once in a test. This is not a problem, but it just adds more indentation.
stats/csv/collector_test.go
Outdated
}

for testname, tags := range testdata {
	func(testname string, tags []string) {
This function call can be replaced by `testname, tags := testname, tags`.
stats/csv/collector.go
Outdated
return nil
}

row := []string{}
You do know how many columns you will have (3 + len(resTags) + 1 for the extra tags), so you can do `row := make([]string, 0, 3+len(resTags)+1)` and not change any of the other code.
On a similar note, you can probably reuse the same slice over and over again, as you are just populating it and then writing it. This would be a big performance gain IMHO, as it would practically remove a big chunk of the allocations (see the sketch below).
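A standalone sketch of the reuse idea (not the PR's code, and with made-up sample values); encoding/csv writes the fields out to its buffered writer before Write returns, so refilling the same slice between calls is safe:

```go
package main

import (
	"encoding/csv"
	"fmt"
	"os"
)

func main() {
	w := csv.NewWriter(os.Stdout)

	samples := [][3]string{
		{"http_reqs", "1567087200", "1.000000"},
		{"http_reqs", "1567087201", "1.000000"},
	}

	// One row slice, allocated once and refilled for every sample.
	row := make([]string, 0, 3)
	for _, s := range samples {
		row = row[:0] // reset the length, keep the capacity
		row = append(row, s[0], s[1], s[2])
		if err := w.Write(row); err != nil {
			fmt.Fprintln(os.Stderr, err)
			return
		}
	}
	w.Flush()
}
```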
Thank you for the code review! I've made some changes to the code. Please check it when you have the time.
Hi, I've added some more tests and fixed the error you pointed out. Please review it when you have the time.
LGTM! @na-- will need to take a look as well, and we are probably not going to merge it for a while, but we will probably try to put it in a release together with #1114 and #1113.
I did some benchmarking and it seems to be doing much better than the JSON output. Maybe in the future we can add gzip compression to the CSV output as well. If you find time, maybe you can add it here, but it's not needed.
If you decide to, you can also possibly remove all the `append`s in `SampleToRow`, as you can calculate all the indexes, and using append will (even with all the optimizations in Go) generate some amount of new slices even if it doesn't generate a new underlying array (see the sketch below).
I instantiated the slice with an initial length and removed the `append` calls. Should be good now.
Thank you for all the hard work!!! 🎉
LGTM, thanks for your contribution!
Remember to clean up/squash the commits before the final merge (not necessarily into one commit, but whatever makes sense).
Hi, I've noticed that the JSON output takes up a lot of disk space for long-running tests. I found issue #321, where a CSV output option was discussed, and decided to implement it. It works similarly to the JSON output, writing one sample per row. It uses about half as much space as JSON, so I decided to open a pull request for it.
Maybe someone can suggest a way to transform the samples so that there is one line per HTTP request. Then I will make the changes and reopen the pull request.
I'm new to the Go language, so I'm open to criticism.