-
Notifications
You must be signed in to change notification settings - Fork 196
metrics: add fio storage metrics test #650
metrics: add fio storage metrics test #650
Conversation
usage=$(cat << EOF | ||
Usage: $0 [-h] [options] | ||
Description: | ||
This script gathers a number of metrics for use in the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: odd initial indent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Took me a while to figure out what you mean..... you mean you'd rather see Description
with no indent, and everything below pulled left one tab (and the same for the Options
) ? Sure, if that is what you mean, I can do that.
I copied this from another metrics file I think (maybe launch_times.sh). I see other scripts are hard-left aligned (like the ci static checks). I have no particular preference...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's difficult to know how it will display when you run it, but the word "This" in line 1 of your description is indented much further than the other lines "report" (first word of line 2), "configured" (first word of line 3), etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I don't see that on either github, vim or when I run it - but, now you point it out I looked at the raw github source view, and I think I see there is a space/tab disparity - thx!
metrics/report/grabdata.sh
Outdated
local OPTIND | ||
while getopts "adhnst" opt;do | ||
case ${opt} in | ||
a) # all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could prolly delete # all
as that's clear from the line below I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thx - looks like I missed that one.
metrics/report/grabdata.sh
Outdated
export PAYLOAD_SLEEP="10" | ||
export PAYLOAD="mysql" | ||
PAYLOAD_ARGS=" --innodb_use_native_aio=0 --disable-log-bin" | ||
PAYLOAD_RUNTIME_ARGS=" -m 4G -e MYSQL_ALLOW_EMPTY_PASSWORD=1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seeing these memory sizes, do we document the minimum machine spec somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't. I'll see what I can add to the README.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also.... if you don't have the memory, then the tests will fairly gracefully fail, and just produce a '0 containers launched' for that item. I guess there is a chance the stats will then either fall foul to a divide-by-zero or maybe produce an 'infinite memory' error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aww - no fun. What about the https://github.com/tiagoad/suicide-linux/blob/master/bash.bashrc approach? :)
metrics/report/grabdata.sh
Outdated
# a default timeout of 300s - if KSM has not settled down | ||
# by then, just take the measurement. | ||
# 'auto' mode should detect when KSM has settled automatically. | ||
bash density/docker_memory_usage.sh 20 300 auto |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd try to avoid duplicating the argument values in the comments (here and below) as it'll require extra maintenance and vigilance to ensure they remain in-sync (and let's face it, they are bound to go out of sync one day ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, will see if I can reword to drop the numbers from the comments.
metrics/report/grabdata.sh
Outdated
bash density/footprint_data.sh | ||
|
||
# elasticsearch - large container | ||
export PAYLOAD_SLEEP="10" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Que?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I'll comment that we need to wait some time for elastic to come up and settle before we both take the measurement or launch another one.
metrics/report/grabdata.sh
Outdated
export PAYLOAD_SLEEP="10" | ||
export PAYLOAD="elasticsearch" | ||
PAYLOAD_ARGS=" " | ||
PAYLOAD_RUNTIME_ARGS=" -m 8G" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Crikey - see comment above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, c'mon, you can almost run that on your lappy ;-) Yeah, Elastic is hungry, which is why we use it as the large container example. Even on a moderate sized desktop (32G), we only manage to run 3 or so.
|
||
library(ggplot2) # ability to plot nicely | ||
library(gridExtra) # So we can plot multiple graphs together | ||
suppressMessages(suppressWarnings(library(ggpubr))) # for ggtexttable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: odd comment indentation.
# 95 and 99 percentiles, and it is much easier to visually group those to | ||
# each runtime if we use this colourmap. | ||
scale_colour_brewer(palette="Paired") | ||
# it would be nice to use the same legend theme as the other plots on this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a FIXME
? Can it be removed / put into an issue rather than leaving here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's more here so that if/when somebody views a report and thinks 'why are the legends not all the same', and come to fix it in the source, they find the explanation sat here waiting.
I'd rather not Issue it, as nobody is likely to fix it, and in a year somebody will close the Issue as stale, so it's just noise out in an Issue.
It is sadly, afaict, quite a lot of fiddly work to get the legends overlaid in a nice way that adapts dynamically to how many items there are in the list - it's already a little 'wonky' as it is.
percentile=single_blocksize$percentile, | ||
blocksize=single_blocksize$blocksize, | ||
runtime=single_blocksize$runtime), | ||
FUN=mean |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Odd indent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to flow this differently, but didn't find something that really pleased my eye. Any thoughts - the obvious one is to pull the list
close )
onto its own line, but it sort of looked all lonely out there... is this any nicer to your eyes?
clat_line=aggregate(
single_blocksize$ms,
by=list(
percentile=single_blocksize$percentile,
blocksize=single_blocksize$blocksize,
runtime=single_blocksize$runtime
),
FUN=mean
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh - I'd initially missed the closing bracket after runtime=
! But yeah, your new format looks much clearer to me - ta ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Since I hadn't seen that bracket, it appeared that FUN=mean
needed to be indented below runtime=
fwiw)
@grahamwhaley , I ran the test
Is this behavior expected? Also, I see that the
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Json outputs should be fixed
Hi @GabyCT . I'll also check for the valid JSON syntax. My editors built in syntax checker didn't show anything when I did check, and R seemed happy to import and process the data - but, I'll hit it with some harder JSON verification tools and see what they say. Sadly, if this is something |
Hi @GabyCT - when you say the JSON is not valid - I ran yq and jsonlint-php (whatever ubuntu insatlled as 'jsonlint-php'), and neither of them complained or error'd. Was it these sort of lines that you noticed? : "99.990000" : 0,
"0.00" : 0,
"0.00" : 0,
"0.00" : 0
} Sadly, they come straight out of |
@grahamwhaley , yes I saw those lines, I see that with |
Hi @GabyCT Right. Yeah, that tool says 'syntax error', but reading around some more (a bit of stackoverflow for instance), it is:
So, I think I have to say, yes, it would be nice if we could fix this, but I don't think there is an easy way, and presently I believe that the duplicate entries are benign (they all have the same values anyhow). Sorry, I don't think I can fix this one. I will push a change for the fallocate error in a minute though :-) |
pushed. And let's see if the CIs are still grumpy after this |
2a77da1
to
a6cc2bc
Compare
@grahamwhaley ok sure not a problem. let me know when I can test again to see that the error is gone |
OK @GabyCT - looks like the push landed (github is having a chug right now). The CIs have restarted.
etc. I tested with 9p/dm/runc - but yes please, a re-test would be good. |
@grahamwhaley , I re-run the test and I see that the error is gone :) |
@grahamwhaley just one last question, seeing the |
Hi @GabyCT Fair question :-)
but I didn't really want to hard code those, and the data can be extracted moderately simply (but I admit, not quite trivially), using JSON tools. In the reportgen we use $ jq '.Raw | .[] | .jobs | map(.read) | map(.bw) | add' fio-randread-4k.json
304012 Yes, it took a bit of reading to figure that out (i've not stroked We could add cooked results, but tbh, I suspect we'd just invoke the |
a6cc2bc
to
853192d
Compare
Rebased and pushed as the CI fail was too old to rekick. |
Measure storage I/O bandwidth, latency and IOPS using | ||
this [test](https://github.com/kata-containers/tests/blob/master/metrics/storage/fio.sh). | ||
|
||
This test measures using separate randome read and write tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/randome/random?
metrics/report/README.md
Outdated
@@ -27,6 +27,10 @@ Repeat this process if you want to compare multiple sets of results. Note, the | |||
report generation scripts process all subfolders of `tests/metrics/results` when | |||
generating the report. | |||
|
|||
> **Note:** By default, the `grabdata.sh` script ties to launch some moderately |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/ties/tries
metrics/storage/README.md
Outdated
@@ -0,0 +1,20 @@ | |||
# Kata Containers storage I/O tests | |||
|
|||
The metrics tests in this directory are desinged to be used to utilised in the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/designed/designed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/utilised/utilized :D
@intelkevinputnam @klynnrif can you please take a look at the |
metrics/storage/README.md
Outdated
# Kata Containers storage I/O tests | ||
|
||
The metrics tests in this directory are desinged to be used to utilised in the | ||
assessment of storage IO. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line 3 - "used to assess storage IO."
metrics/storage/README.md
Outdated
## `blogbench` test | ||
|
||
The `blogbench` script is based on the [blogbench program](https://www.pureftpd.org/project/blogbench) | ||
which is designed to emulate a busy blog web server, with a number of concurrent thresds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line 19 - ", which emulates a busy blog server with a number of concurrent threads"
# By default run a time base test | ||
FIO_TIMEBASED=${FIO_TIMEBASED:-1} | ||
# And 60s seems a good balance to get us repeatable numbers in not too long a time. | ||
FIO_RUNTIME=${FIO_RUNTIME:-60} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we consider adding ramp_time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can do. I know we have used it before in some fio
testing.
I've added it in as an over-rideable, and set the default to 0s.
It's unclear to me just what it 'warms up' and in what circumstances - mostly I can see it would be caches, and mostly we are interested in un-cached setups and results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with all the other grammar edits made in addition to one other suggested edit. Thanks!
metrics/report/README.md
Outdated
@@ -27,6 +27,10 @@ Repeat this process if you want to compare multiple sets of results. Note, the | |||
report generation scripts process all subfolders of `tests/metrics/results` when | |||
generating the report. | |||
|
|||
> **Note:** By default, the `grabdata.sh` script ties to launch some moderately | |||
> large containers (8Gbyte RAM), so may fail to produce some results on a memory |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
large containers (i.e. 8Gbyte RAM) and might fail to produce some results on a memory ...
Add `fio` based storage IO metrics test. Allow some flexibility in configuraiton through the use of environment variables. Fixes: kata-containers#631 Signed-off-by: Graham Whaley <graham.whaley@intel.com>
Allow configuration, via the commandline, of which tests the grabdata.sh script runs. By default it will run them all. Added as some of the tests can take a long time, and sometimes you are really only interested in one or a few of the tests. To save having to edit the test before running, add a test selection capability to the commandline. Signed-off-by: Graham Whaley <graham.whaley@intel.com>
Add the fio test, in default mode, to the report gen grabdata script storage section. For reference, on my machine the default fio test (random read and write in direct mode with a small set of blocksizes) takes about 10minutes to run. Signed-off-by: Graham Whaley <graham.whaley@intel.com>
853192d
to
cbf49e6
Compare
all typos and grammatical issues should be addressed now. That was obviously not my best spelling day - iirc, I was pushing it out last thing on a Friday ;-) |
@grahamwhaley - yep I think that might have been an attack of Warsaw's Second Law. We should implement that! Now that github has removed the punchcard graph (:sob:) nobody would notice right? :smile: |
ooh, green green... anybody? (@jodh-intel ;-)) |
Add an
fio
based storage metrics test.By default it will perform uncached (O_DIRECT) read and write tests.
It can be configured through environment variables to modify a number of
its parameters (including direct/buffered, random, linear, r/w/rw tests, number of jobs etc.
The one fixed parameter is that it always performs the tests upon a single file.
Also add invocation of the test and use of the resulting data to the report generation scripts.