Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

High memory usage with NATS source on 0.21.2 #12834

Open
mikhailantoshkin opened this issue May 24, 2022 · 3 comments
Open

High memory usage with NATS source on 0.21.2 #12834

mikhailantoshkin opened this issue May 24, 2022 · 3 comments
Labels
meta: blocked Anything that is blocked to the point where it cannot be worked on. source: nats Anything `nats` source related type: bug A code related bug.

Comments

@mikhailantoshkin
Copy link
Contributor

mikhailantoshkin commented May 24, 2022

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

I've been stress testing Vector to see how NATS + Vector will compare to our current setup of Rabbitmq + Logstash in terms of throughput and resources and observed weird memory usage pattern with NATS source. When hitting some arbitual RPS threshold (seems to be around 100k on my machine) memory usage shoots up until system hits swap and dies
image_2022-05-24_17-22-21

After stopping logs generation Vector takes a while to finish processing messages, but memory is not being reclaimed by the system

Configuration

healthchecks.require_healthy = true

[api]
enabled = true
address = "0.0.0.0:8686"
playground = true

[sources.nats]
type = "nats"
url = "nats://172.10.0.1:4222"
subject = "msg.test"
connection_name = "vector"


[sinks.bh]
type="blackhole"
inputs = ["nats"]

Version

vector 0.21.2 (x86_64-unknown-linux-gnu)

Debug Output

{"timestamp":"2022-05-24T13:09:08.417653Z","level":"INFO","message":"Log level is enabled.","level":"\"vector=info,codec=info,vrl=info,file_source=info,tower_limit=trace,rdkafka=info,buffers=info,kube=info\"","target":"vector::app"}
{"timestamp":"2022-05-24T13:09:08.417721Z","level":"INFO","message":"Loading configs.","paths":"[\"vector.toml\"]","target":"vector::app"}
{"timestamp":"2022-05-24T13:09:09.238561Z","level":"INFO","message":"Running healthchecks.","target":"vector::topology::running"}
{"timestamp":"2022-05-24T13:09:09.238597Z","level":"INFO","message":"Healthcheck: Passed.","target":"vector::topology::builder"}
{"timestamp":"2022-05-24T13:09:09.238609Z","level":"INFO","message":"All healthchecks passed.","target":"vector::topology::running"}
{"timestamp":"2022-05-24T13:09:09.238714Z","level":"INFO","message":"Vector has started.","debug":"false","version":"0.21.2","arch":"x86_64","build_id":"none","target":"vector"}
{"timestamp":"2022-05-24T13:09:09.238739Z","level":"INFO","message":"API server running.","address":"0.0.0.0:8686","playground":"http://0.0.0.0:8686/playground","target":"vector::internal_events::api"}
{"timestamp":"2022-05-24T13:09:09.239721Z","level":"INFO","message":"Total events collected","events":0,"raw_bytes_collected":0,"target":"vector::sinks::blackhole::sink"}
{"timestamp":"2022-05-24T13:09:10.239852Z","level":"INFO","message":"Total events collected","events":0,"raw_bytes_collected":0,"target":"vector::sinks::blackhole::sink"}
{"timestamp":"2022-05-24T13:09:11.240189Z","level":"INFO","message":"Total events collected","events":0,"raw_bytes_collected":0,"target":"vector::sinks::blackhole::sink"}
{"timestamp":"2022-05-24T13:09:12.239427Z","level":"INFO","message":"Total events collected","events":0,"raw_bytes_collected":0,"target":"vector::sinks::blackhole::sink"}
{"timestamp":"2022-05-24T13:09:13.239828Z","level":"INFO","message":"Total events collected","events":0,"raw_bytes_collected":0,"target":"vector::sinks::blackhole::sink"}
{"timestamp":"2022-05-24T13:09:14.239980Z","level":"INFO","message":"Total events collected","events":0,"raw_bytes_collected":0,"target":"vector::sinks::blackhole::sink"}
{"timestamp":"2022-05-24T13:09:15.239541Z","level":"INFO","message":"Total events collected","events":0,"raw_bytes_collected":0,"target":"vector::sinks::blackhole::sink"}
{"timestamp":"2022-05-24T13:09:16.239452Z","level":"INFO","message":"Total events collected","events":0,"raw_bytes_collected":0,"target":"vector::sinks::blackhole::sink"}
{"timestamp":"2022-05-24T13:09:17.239557Z","level":"INFO","message":"Total events collected","events":0,"raw_bytes_collected":0,"target":"vector::sinks::blackhole::sink"}
{"timestamp":"2022-05-24T13:09:18.239804Z","level":"INFO","message":"Total events collected","events":0,"raw_bytes_collected":0,"target":"vector::sinks::blackhole::sink"}
{"timestamp":"2022-05-24T13:09:19.240166Z","level":"INFO","message":"Total events collected","events":0,"raw_bytes_collected":0,"target":"vector::sinks::blackhole::sink"}
{"timestamp":"2022-05-24T13:09:20.239335Z","level":"INFO","message":"Total events collected","events":0,"raw_bytes_collected":0,"target":"vector::sinks::blackhole::sink"}
{"timestamp":"2022-05-24T13:09:21.239732Z","level":"INFO","message":"Total events collected","events":0,"raw_bytes_collected":0,"target":"vector::sinks::blackhole::sink"}
{"timestamp":"2022-05-24T13:09:22.240085Z","level":"INFO","message":"Total events collected","events":0,"raw_bytes_collected":0,"target":"vector::sinks::blackhole::sink"}
{"timestamp":"2022-05-24T13:09:23.240066Z","level":"INFO","message":"Total events collected","events":27453,"raw_bytes_collected":25037136,"target":"vector::sinks::blackhole::sink"}
{"timestamp":"2022-05-24T13:09:24.239417Z","level":"INFO","message":"Total events collected","events":153416,"raw_bytes_collected":139915392,"target":"vector::sinks::blackhole::sink"}
{"timestamp":"2022-05-24T13:09:25.239620Z","level":"INFO","message":"Total events collected","events":317720,"raw_bytes_collected":289760640,"target":"vector::sinks::blackhole::sink"}
{"timestamp":"2022-05-24T13:09:26.239716Z","level":"INFO","message":"Total events collected","events":485151,"raw_bytes_collected":442457712,"target":"vector::sinks::blackhole::sink"}
{"timestamp":"2022-05-24T13:09:27.239869Z","level":"INFO","message":"Total events collected","events":651284,"raw_bytes_collected":593971008,"target":"vector::sinks::blackhole::sink"}
{"timestamp":"2022-05-24T13:09:28.239852Z","level":"INFO","message":"Total events collected","events":818455,"raw_bytes_collected":746430960,"target":"vector::sinks::blackhole::sink"}
{"timestamp":"2022-05-24T13:09:29.239525Z","level":"INFO","message":"Total events collected","events":984852,"raw_bytes_collected":898185024,"target":"vector::sinks::blackhole::sink"}
{"timestamp":"2022-05-24T13:09:30.239656Z","level":"INFO","message":"Total events collected","events":1151766,"raw_bytes_collected":1050410592,"target":"vector::sinks::blackhole::sink"}
{"timestamp":"2022-05-24T13:09:31.239845Z","level":"INFO","message":"Total events collected","events":1297651,"raw_bytes_collected":1183457712,"target":"vector::sinks::blackhole::sink"}
{"timestamp":"2022-05-24T13:09:32.240140Z","level":"INFO","message":"Total events collected","events":1297651,"raw_bytes_collected":1183457712,"target":"vector::sinks::blackhole::sink"}
{"timestamp":"2022-05-24T13:09:33.239604Z","level":"INFO","message":"Total events collected","events":1297651,"raw_bytes_collected":1183457712,"target":"vector::sinks::blackhole::sink"}
{"timestamp":"2022-05-24T13:09:34.239967Z","level":"INFO","message":"Total events collected","events":1297651,"raw_bytes_collected":1183457712,"target":"vector::sinks::blackhole::sink"}
{"timestamp":"2022-05-24T13:09:35.239670Z","level":"INFO","message":"Total events collected","events":1297651,"raw_bytes_collected":1183457712,"target":"vector::sinks::blackhole::sink"}
{"timestamp":"2022-05-24T13:09:36.239878Z","level":"INFO","message":"Total events collected","events":1297651,"raw_bytes_collected":1183457712,"target":"vector::sinks::blackhole::sink"}
{"timestamp":"2022-05-24T13:09:37.239480Z","level":"INFO","message":"Total events collected","events":1297651,"raw_bytes_collected":1183457712,"target":"vector::sinks::blackhole::sink"}
{"timestamp":"2022-05-24T13:09:38.239656Z","level":"INFO","message":"Total events collected","events":1297651,"raw_bytes_collected":1183457712,"target":"vector::sinks::blackhole::sink"}

Example Data

{"action":null,"additional_information":null,"box_id":"e5bd05e7-8972-4850-81a6-204a26ee4ca3","category":null,"create_time":"2022-05-23T21:04:13.013361","event_id":1070002,"justification":null,"host_id":"e5bd05e7-8972-4850-81a6-204a26ee4ca3","message":"test_message","protocol":null,"receive_time":"2022-05-24T08:04:25.330940","request_id":null,"severity":"low","source_ip":null,"source_mac":null,"source_id":null,"source_hostname":null,"source_port":null,"target_ip":null,"target_mac":null,"target_id":null,"target_hostname":null,"target_port":null,"user":null}

Additional Context

Log output with RUST_BACKTRACE and -vvv is 64M compressed (4.9G uncompressed), so I've put it on my Google Drive, it's available here

I'm building vector from source and running it in docker. Build command is

    CFLAGS="" \
    CHANNEL="stable" \
    TARGET=x86_64-unknown-linux-gnu \
    FEATURES="api,sources-demo_logs,sources-http,sources-syslog,sources-nats,transforms-filter,transforms-remap,transforms-reduce,transforms-route,sinks-console,sinks-elasticsearch,sinks-blackhole" \
    scripts/build.sh ; \

If there is anything I can provide to help further - please let me know

Thank you

References

No response

@mikhailantoshkin
Copy link
Contributor Author

UPD: seems like it is not a problem with Vector, but with nats crate, related issue

@jszwedko jszwedko added the source: nats Anything `nats` source related label May 25, 2022
@jszwedko
Copy link
Member

Thanks for digging into that and isolating the issue @mikhailantoshkin ! I'll follow along on that nats issue.

@jszwedko jszwedko added the meta: blocked Anything that is blocked to the point where it cannot be worked on. label Dec 23, 2022
@kong62
Copy link

kong62 commented May 29, 2023

filebeat -> kafka -> vector -> kafka

When encountering a large number of logs caused by a large number of QPS for the first time, the memory rises sharply, and the memory is never released afterwards, Final OOMKillerd

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
meta: blocked Anything that is blocked to the point where it cannot be worked on. source: nats Anything `nats` source related type: bug A code related bug.
Projects
None yet
Development

No branches or pull requests

3 participants