
Fix InfluxDB using too much memory #1113

Merged 2 commits into master on Sep 3, 2019

Conversation

@mstoykov (Contributor) commented on Aug 13, 2019

Text for release notes:

InfluxDB refactoring:
The InfluxDB output has been updated to use less memory and to send smaller, more consistent chunks of data to InfluxDB, so that batches don't get dropped. This is done primarily by sending the data in parallel, which turned out to be better for performance, and, more importantly, by queuing the data in separate batches so that a single write doesn't cover a very long time period. The client library was also upgraded, which further decreased memory usage.

Two new options were added (illustrated in the sketch below):
K6_INFLUXDB_PUSH_INTERVAL - configures the interval at which the collected data is queued to be sent to InfluxDB. The default is "1s".
K6_INFLUXDB_CONCURRENT_WRITES - configures the number of concurrent write calls to InfluxDB. When this limit is reached, further writes are queued and executed once a slot frees up. The default is 10.
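
A minimal, self-contained sketch of how envconfig-style tags pick these variables up (it uses the kelseyhightower/envconfig library and plain Go types purely for illustration; the actual k6 config wraps its fields in null types, as the struct shown later in this thread does):

package main

import (
	"fmt"
	"time"

	"github.com/kelseyhightower/envconfig"
)

// Hypothetical, simplified stand-in for the output config; the real k6
// struct uses null.Int / types.NullDuration so that unset values can be
// distinguished from zero values.
type influxConfig struct {
	PushInterval     time.Duration `envconfig:"INFLUXDB_PUSH_INTERVAL" default:"1s"`
	ConcurrentWrites int           `envconfig:"INFLUXDB_CONCURRENT_WRITES" default:"10"`
}

func main() {
	var cfg influxConfig
	// With the "k6" prefix, envconfig reads K6_INFLUXDB_PUSH_INTERVAL and
	// K6_INFLUXDB_CONCURRENT_WRITES from the environment, falling back to
	// the defaults above.
	if err := envconfig.Process("k6", &cfg); err != nil {
		panic(err)
	}
	fmt.Println(cfg.PushInterval, cfg.ConcurrentWrites)
}

Running this with K6_INFLUXDB_PUSH_INTERVAL=5s set would print "5s 10".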

@mstoykov requested a review from na-- on Aug 13, 2019

@codecov-io commented on Aug 13, 2019

Codecov Report

Merging #1113 into master will increase coverage by 0.26%.
The diff coverage is 58.97%.


@@            Coverage Diff             @@
##           master    #1113      +/-   ##
==========================================
+ Coverage   73.39%   73.65%   +0.26%     
==========================================
  Files         142      142              
  Lines       10323    10353      +30     
==========================================
+ Hits         7577     7626      +49     
+ Misses       2303     2281      -22     
- Partials      443      446       +3
Impacted Files               Coverage Δ
stats/influxdb/util.go       78.26% <ø> (ø) ⬆️
stats/influxdb/config.go     49.03% <33.33%> (-5.62%) ⬇️
stats/influxdb/collector.go  76.85% <88.88%> (+37.26%) ⬆️

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 14c8177...04e1379.

@mstoykov mentioned this pull request on Aug 14, 2019
@na-- changed the title from "Fix/influxdb using too much memory" to "Fix InfluxDB using too much memory" on Aug 14, 2019
@mstoykov mentioned this pull request on Aug 28, 2019
@na-- requested reviews from cuonglm and imiric on Aug 28, 2019
@cuonglm previously approved these changes on Aug 29, 2019
@na-- added this to the v0.26.0 milestone on Aug 29, 2019
@na-- previously approved these changes on Aug 29, 2019
@na-- (Member) left a comment:

LGTM, though please add something to the PR description that we can use in the release notes later.

@imiric (Contributor) left a comment:

LGTM, just a small typo fix is missing.

Resolved review threads: stats/influxdb/collector.go (outdated), stats/influxdb/collector_test.go.

An inline comment thread on the new fields in stats/influxdb/config.go:
	Insecure         null.Bool          `json:"insecure,omitempty" envconfig:"INFLUXDB_INSECURE"`
	PayloadSize      null.Int           `json:"payloadSize,omitempty" envconfig:"INFLUXDB_PAYLOAD_SIZE"`
	PushInterval     types.NullDuration `json:"pushInterval,omitempty" envconfig:"INFLUXDB_PUSH_INTERVAL"`
	ConcurrentWrites null.Int           `json:"concurrentWrites,omitempty" envconfig:"INFLUXDB_CONCURRENT_WRITES"`
Comment from a Contributor:
Somewhat unrelated to this PR, but do we know if omitempty is correctly handled by guregu/null?

Their README mentions it as a bug and there's an open PR that apparently fixes it.
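
For illustration, a minimal sketch of the problem (assuming guregu/null v3; this is not code from the PR): encoding/json's omitempty never treats a struct value as "empty", so an unset null.Int is marshaled as null instead of being omitted.

package main

import (
	"encoding/json"
	"fmt"

	"gopkg.in/guregu/null.v3"
)

type cfg struct {
	// omitempty has no effect here: null.Int is a struct, and encoding/json
	// only omits zero-valued basic types, empty slices/maps, and nil pointers.
	PayloadSize null.Int `json:"payloadSize,omitempty"`
}

func main() {
	out, _ := json.Marshal(cfg{})
	fmt.Println(string(out)) // prints {"payloadSize":null}, not {}
}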

Reply from a Member:

🤦‍♂️ yeah... this is problematic, though we probably have bigger issues than the influxdb config 😞 I think this might be the result of the same bug: #883 (comment)

Reply from @mstoykov (Contributor, Author):

It looks like it ...
Maybe after #1070 we should fork guregu, merge that PR, and use the fork through Go modules? Because that library looks pretty dead to me ... We can, of course, first ping the author on whether he will at least review the PR in question.

@mstoykov dismissed stale reviews from na-- and cuonglm via cf60f03 on Aug 29, 2019
@mstoykov force-pushed the fix/influxdbUsingTooMuchMemory branch from d5a653a to cf60f03 on Aug 29, 2019
Previously, k6 would write to InfluxDB every second, but if a write took more than 1 second, it wouldn't start a second write in parallel and would instead wait for the first one to finish. This generally led to the write times getting bigger and bigger as more and more data accumulated, until the maximum body size InfluxDB accepts was reached, at which point InfluxDB returned an error and k6 dropped that data.

With this commit, a configurable number of parallel writes (10 by default) are started, again every 1 second (now also configurable). Additionally, when we hit the limit of 10 concurrent writes, instead of sending all the data that has accumulated, we just queue the newly generated samples. This should considerably help with not hitting InfluxDB's maximum body size.
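
A minimal Go sketch of the described scheme (hypothetical names; not the actual k6 collector code): a ticker cuts off a batch every push interval, and a buffered channel acts as a semaphore capping the writes in flight, so samples simply queue up while all slots are busy.

package main

import (
	"fmt"
	"time"
)

type sample struct{ value float64 }

// writeBatch stands in for the actual HTTP write to InfluxDB.
func writeBatch(batch []sample) {
	time.Sleep(2 * time.Second) // pretend InfluxDB is slow
	fmt.Printf("wrote %d samples\n", len(batch))
}

func run(samples <-chan sample, pushInterval time.Duration, concurrentWrites int) {
	semaphore := make(chan struct{}, concurrentWrites)
	ticker := time.NewTicker(pushInterval)
	defer ticker.Stop()

	var buffer []sample
	for {
		select {
		case s, ok := <-samples:
			if !ok {
				if len(buffer) > 0 {
					writeBatch(buffer) // final synchronous flush on shutdown
				}
				return
			}
			buffer = append(buffer, s) // queue samples between pushes
		case <-ticker.C:
			if len(buffer) == 0 {
				continue
			}
			batch := buffer
			buffer = nil
			semaphore <- struct{}{} // blocks once all write slots are in use
			go func() {
				defer func() { <-semaphore }()
				writeBatch(batch)
			}()
		}
	}
}

func main() {
	samples := make(chan sample)
	go func() {
		for i := 0; i < 3000; i++ {
			samples <- sample{value: float64(i)}
			time.Sleep(time.Millisecond)
		}
		close(samples)
	}()
	run(samples, 1*time.Second, 10)
}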

I tested with a simple script doing batch requests for an empty local file with 40 VUs. Without any output, it was getting 8.1K RPS with 650MB of memory usage.

Before this commit, RAM usage was ~5.7GB at 5736 RPS, and practically all the data got lost unless you raised InfluxDB's maximum body size; even then, a lot of data was lost while memory usage kept going up.

After this commit, RAM usage was ~2.4GB (or less in some of the tests) at 6273 RPS, and no data was lost.

Even with this commit, a 2-hour run of that simple script dies after 1 hour and 35 minutes, using around 15GB (the test system has 16). I can't be sure about data loss, as InfluxDB ate 32GB of memory trying to visualize the results and I had to kill it ;(.

Some problems with this solution are that:
1. We use a lot of goroutines if things start slowing down - probably not a big problem, but still a good idea to fix.
2. We could probably batch the data better if we kept all the unsent samples together and cut them into chunks of, say, 50k samples.
3. By far the biggest: because the writes are slow, when the test is stopped (with Ctrl+C) or finishes naturally, waiting for those pending writes can take a considerable amount of time - in the example above, the 4-minute tests generally took around 5 minutes :(

All of these can be better handled with some more sophisticated queueing code at a later time.

closes #1081, fixes #1100, fixes #182
This also upgrades the InfluxDB client library to the latest version and adds a benchmark, which shows somewhat fewer allocations:

name                       old time/op    new time/op    delta
Influxdb1Second-4            3.56ms ± 0%    3.56ms ± 1%     ~       (p=0.841 n=5+5)
Influxdb2Second-4            12.1ms ± 0%    12.1ms ± 0%     ~       (p=0.095 n=5+5)
Influxdb100Milliseconds-4     341µs ± 1%     333µs ± 1%   -2.34%   (p=0.008 n=5+5)

name                       old alloc/op   new alloc/op   delta
Influxdb1Second-4            19.1kB ± 0%    16.6kB ± 0%  -13.37%   (p=0.008 n=5+5)
Influxdb2Second-4            18.8kB ± 0%    16.2kB ± 0%  -13.63%   (p=0.016 n=5+4)
Influxdb100Milliseconds-4    18.7kB ± 0%    16.1kB ± 0%  -13.72%   (p=0.008 n=5+5)

name                       old allocs/op  new allocs/op  delta
Influxdb1Second-4               291 ± 0%       231 ± 0%  -20.62%   (p=0.008 n=5+5)
Influxdb2Second-4               291 ± 0%       231 ± 0%  -20.62%   (p=0.008 n=5+5)
Influxdb100Milliseconds-4       291 ± 0%       231 ± 0%  -20.62%   (p=0.008 n=5+5)

The same test as in fce3884 gets 6909 RPS with 2.5GB of memory usage, so a considerable bump in RPS for a small amount of additional memory.
@mstoykov force-pushed the fix/influxdbUsingTooMuchMemory branch from cf60f03 to 04e1379 on Aug 29, 2019
@imiric self-requested a review on Aug 29, 2019
@mstoykov merged commit d385166 into master on Sep 3, 2019
@mstoykov deleted the fix/influxdbUsingTooMuchMemory branch on Sep 3, 2019
@mstoykov mentioned this pull request on Jul 9, 2020
@mstoykov mentioned this pull request on Oct 3, 2020