fix(lua transform): Explicitly call GC in `lua` transform #1990

ghost · 2020-03-05T13:36:24Z

Closes #1721.

This PR adds an explicit call to the garbage collector to the `lua` transform. It fixes the growing RAM consumption reported in #1721. As I understand it, the Lua runtime sometimes did not run the GC automatically, which is why memory usage showed a leak-like pattern.

Now the RAM usage of Vector with the `lua` transform stays flat, even at high event rates.

The explicit call doesn't significantly change the benchmarks, because the GC runs only on every 16th invocation of the transform rather than after each one.
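To illustrate the "every 16th invocation" idea, here is a minimal sketch that uses only the standard library. It is not the code from this PR: `PeriodicGc`, `GC_INTERVAL`, and `after_invocation` are hypothetical names, and the actual collection is replaced by a counter so the example runs without a Lua binding.

```rust
/// Hypothetical interval; the discussion in this PR uses 16.
const GC_INTERVAL: usize = 16;

/// Stand-in for a transform that owns a Lua state (assumed, not Vector's type).
struct PeriodicGc {
    invocations: usize,
    collections: usize,
}

impl PeriodicGc {
    fn new() -> Self {
        Self { invocations: 0, collections: 0 }
    }

    /// Call once per processed event. Returns true when a collection ran.
    fn after_invocation(&mut self) -> bool {
        self.invocations += 1;
        if self.invocations % GC_INTERVAL == 0 {
            // A real transform would ask the Lua runtime for a collection here;
            // this sketch only counts how often that would happen.
            self.collections += 1;
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut gc = PeriodicGc::new();
    for _ in 0..100 {
        gc.after_invocation();
    }
    println!("collections after 100 invocations: {}", gc.collections);
}
```

A larger interval lowers the per-event overhead at the cost of letting more garbage accumulate between collections.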

Signed-off-by: Alexander Rodin <rodin.alexander@gmail.com>

binarylogic · 2020-03-05T14:57:06Z

Nice find, I’m glad we’ve fixed this.

MOZGIII

Nice catch! LGTM!!

binarylogic · 2020-03-05T16:48:19Z

I’m curious why you chose 16? Would you expect a performance hit for high volume streams? Ex: >5k events per second.

lukesteensen

Were you able to reproduce the original issue and see that this addresses it? I'm a bit surprised Lua would have this issue.

I'd also be curious to measure the overhead here in high-volume pipelines. GC can definitely be time-consuming and we want to find a reasonable tradeoff between memory use and throughput. Running every 16 invocations might be a bit often.

ghost · 2020-03-05T19:28:12Z

@lukesteensen

> Were you able to reproduce the original issue and see that this addresses it? I'm a bit surprised Lua would have this issue.

I was able to reproduce the issue with the following config:

data_dir = "/var/lib/vector/"
dns_servers = []

[log_schema]
message_key = "message"
timestamp_key = "timestamp"
host_key = "host"

[sources.source0]
max_length = 102400
type = "stdin"

[transforms.transform0]
inputs = ["source0"]
type = "lua"
source = """
  event["count"] = 1
"""

[sinks.sink0]
healthcheck = true
inputs = ["transform0"]
type = "console"
encoding = "json"

[sinks.sink0.buffer]
type = "memory"
max_events = 500
when_full = "block"

and the following command:

cat /dev/urandom | base64 | vector --config vector.toml > /dev/null

Within a minute, Vector's RAM consumption grew to a few gigabytes.

> I'd also be curious to measure the overhead here in high-volume pipelines. GC can definitely be time-consuming and we want to find a reasonable tradeoff between memory use and throughput. Running every 16 invocations might be a bit often.

@binarylogic

> I’m curious why you chose 16? Would you expect a performance hit for high volume streams? Ex: >5k events per second.

I measured the performance with the same config using the `pv` utility (each line produced by `base64` is 76 characters long here):

cat /dev/urandom | base64 | vector --config vector.toml | head -n3000000 | pv > /dev/null

For master and this branch the results were nearly identical: 3M events were processed in 28 seconds on master and 27 seconds on this branch, which is around 105k events per second.
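As a quick check of the arithmetic behind that estimate, the sketch below divides the event count by the elapsed time for both runs. The helper name `events_per_second` is made up for illustration; the inputs are the figures quoted above.

```rust
/// Average throughput for a run; hypothetical helper for illustration only.
fn events_per_second(events: u64, seconds: u64) -> f64 {
    events as f64 / seconds as f64
}

fn main() {
    // Figures quoted in the comment: 3M events, 28 s on master, 27 s on the branch.
    let master = events_per_second(3_000_000, 28);
    let branch = events_per_second(3_000_000, 27);
    println!("master: {:.0} events/s", master);
    println!("branch: {:.0} events/s", branch);
}
```

Both runs land between roughly 107k and 111k events per second, in line with the rounded figure quoted in the comment.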

So it seems to be reasonably fast even with 16, but the interval could be made larger too (or exposed to the user). One of the reasons I chose 16 is that it allowed testing the change with `cargo bench`; with larger numbers the GC might never have been invoked in the benchmarks.

lukesteensen

Nice job! Now I'm curious why it didn't show up in my tests...

binarylogic · 2020-03-05T21:41:17Z

Once this is merged we should cut a point release (0.8.2) and I can notify affected users.

* fix(lua transform): Explicitly call GC in `lua` transform
  Signed-off-by: Alexander Rodin <rodin.alexander@gmail.com>
* Call GC not after each invocation
  Signed-off-by: Alexander Rodin <rodin.alexander@gmail.com>

fix(lua transform): Explicitly call GC in lua transform

d5c4f39

Signed-off-by: Alexander Rodin <rodin.alexander@gmail.com>

ghost force-pushed the lua-gc branch from 0fa3dcc to e5f9455 Compare March 5, 2020 13:50

Call GC not after each invocation

252d355

Signed-off-by: Alexander Rodin <rodin.alexander@gmail.com>

ghost force-pushed the lua-gc branch from e5f9455 to 252d355 Compare March 5, 2020 13:51

binarylogic requested a review from MOZGIII March 5, 2020 14:56

binarylogic assigned MOZGIII Mar 5, 2020

MOZGIII approved these changes Mar 5, 2020

lukesteensen reviewed Mar 5, 2020

lukesteensen approved these changes Mar 5, 2020

ghost merged commit 65aa39f into master Mar 6, 2020

ghost mentioned this pull request Mar 6, 2020

chore: Prepare v0.8.2 release #1995

Merged

ghost deleted the lua-gc branch March 6, 2020 15:29

binarylogic mentioned this pull request Apr 10, 2020

Trying to understand what vector is doing with memory (0.6.0) #1496

Closed

This pull request was closed.
