enabled the memory buffer causes high memory usage #8998

Closed
wgb1990 opened this issue Sep 2, 2021 · 3 comments
Labels: domain: performance, type: bug


wgb1990 commented Sep 2, 2021

Vector Version

0.16.1

Vector Configuration File

[api]
enabled = true
address = "0.0.0.0:8686"

[sources]

  [sources.metrics]
  type = "internal_metrics"
  scrape_interval_secs = 2.0

  [sources.apps_log]
  type = "kafka"
  bootstrap_servers = ""
  group_id = "dw-log-forward-vector-gid"
  topics = [
    "dw-log-forward"
  ]

[transforms]

  [transforms.remap_message]
  type = "remap"
  inputs = [
    "apps_log"
  ]
  source = ". = parse_json!(string!(.message))\n.eventTime = to_int!(.timestamp)\n"

  [transforms.output_loki]
  type = "remap"
  inputs = [
    "remap_message"
  ]
  source = ".aggregator = get_env_var!(\"POD_NAME\")\ndel(.domain)\ndel(.eventTime)\n"

  [transforms.output]
  type = "route"
  inputs = [
    "remap_message"
  ]

    [transforms.output.route]
    java_gc_log = ".collectorType == \"gc\""
    java_error_log = ".level == \"ERROR\""

[sinks]

  [sinks.prometheus_exporter]
  type = "prometheus_exporter"
  inputs = [
    "metrics"
  ]
  address = "0.0.0.0:9598"
  default_namespace = "service"

  [sinks.kafka_gc]
  type = "kafka"
  inputs = [
    "output.java_gc_log"
  ]
  bootstrap_servers = ""
  compression = "gzip"
  topic = "java_gc_log"

    [sinks.kafka_gc.encoding]
    codec = "json"
    timestamp = "unix"

    [sinks.kafka_gc.healthcheck]
    enabled = true

  [sinks.kafka_error]
  type = "kafka"
  inputs = [
    "output.java_error_log"
  ]
  bootstrap_servers = ""
  compression = "gzip"
  topic = "exception_log_prd"

    [sinks.kafka_error.encoding]
    codec = "json"
    timestamp = "unix"

    [sinks.kafka_error.healthcheck]
    enabled = true

  [sinks.loki]
  type = "loki"
  inputs = [
    "output_loki"
  ]
  endpoint = ""
  tenant_id = "{{ tenantId }}"
  remove_label_fields = true
  out_of_order_action = "rewrite_timestamp"

    [sinks.loki.batch]
    max_bytes = 30490000
    max_events = 7000
    timeout_secs = 1

    [sinks.loki.buffer]
    type = "memory"
    max_events = 10240000

    [sinks.loki.request]
    concurrency = 10240000

    [sinks.loki.labels]
    service = "{{ service }}"
    hostname = "{{ hostname }}"
    level = "{{ level }}"
    collectorType = "{{ collectorType }}"
    aggregator = "{{ aggregator }}"

    [sinks.loki.encoding]
    codec = "json"
    timestamp_format = "rfc3339"

    [sinks.loki.healthcheck]
    enabled = true

Debug Output

Expected Behavior

Memory consumption stays within the normal range, and events are sent to Loki storage normally.

Actual Behavior

Events accumulate in the buffer and eventually cause the pod to run out of memory and restart.

Example Data

Vector plays the role of an aggregator here; three nodes handle about 6,000 events per second.

Additional INFO

Memory begins to grow at some point. It appears that the memory buffer accumulates a large number of events, which eventually causes sending to Loki to stall.
[Two screenshots: memory-usage graphs, values shown as percentages]

References


wgb1990 commented Sep 3, 2021

I suspect that Vector's input throughput is greater than its output throughput, resulting in the OOM. After expanding from 3 nodes to 9 nodes, the problem no longer appears.

wgb1990 changed the title from "enabled the memory buffer causes high memory consumption" to "enabled the memory buffer causes high memory usage" on Sep 6, 2021

jszwedko commented Nov 9, 2021

Hi @wgb1990 !

Apologies for the long delay in response.

A few things jumped out from your configuration:

    [sinks.loki.batch]
    max_bytes = 30490000
    max_events = 7000
    timeout_secs = 1

This will cause Vector to create batches of up to 7000 events or ~30 MB. The number of concurrent batches will be related to the number of partitions; for the loki sink, one partition is created per unique set of labels:

    [sinks.loki.labels]
    service = "{{ service }}"
    hostname = "{{ hostname }}"
    level = "{{ level }}"
    collectorType = "{{ collectorType }}"
    aggregator = "{{ aggregator }}"
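
As a rough, purely illustrative calculation (your actual label cardinality isn't shown here): if those five templated labels were to produce, say, 100 unique combinations, the in-flight batches alone could account for roughly

    100 partitions × ~30 MB max batch ≈ 3 GB

before the buffer is even considered.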

For:

    [sinks.loki.buffer]
    type = "memory"
    max_events = 10240000

Depending on your average event size, this could end up allocating a large amount of memory as well. For example, if we assume your average event is 1 KB, the buffer alone could grow to ~10 GB.
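
Not a tuned recommendation for your workload, but as a sketch (assuming the standard memory-buffer options, max_events and when_full), a much smaller bound combined with back-pressure keeps that worst case in the tens of megabytes instead:

    [sinks.loki.buffer]
    type = "memory"
    # ~10,000 events at ~1 KB each is on the order of 10 MB buffered
    max_events = 10000
    # when full, block (apply back-pressure to the Kafka source) rather than drop events
    when_full = "block"

If back-pressure into Kafka isn't acceptable, a disk buffer (type = "disk" with a max_size in bytes) trades memory for disk instead.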

Does this additional context help? Your graphs only show percentages, so I can't tell what Vector's RSS is in absolute terms.

jszwedko added the "domain: performance" label on Nov 9, 2021
jszwedko commented

Closing this due to lack of response to the last comment. Feel free to re-open though.
