syslog source AWS NLB healthcheck tcp memory leak #17923

Closed
christophemorio opened this issue Jul 10, 2023 · 2 comments
Labels
type: bug A code related bug.

Comments

@christophemorio

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

On AWS EKS, a syslog TCP source exposed through a LoadBalancer Service of type NLB shows slowly but continuously increasing memory usage.
The rate of increase correlates with the number of nodes.

In a nutshell, by default a health check is sent every 30s to each node:
NLB --> all nodes kube-proxy tcp/31xxx --> vector pods tcp/9514

It appears that the TCP health checks made by the AWS NLB cause a memory leak.
As a workaround, a kube-proxy-less override moves the health check away from the syslog TCP port, and memory usage then stays flat:

apiVersion: v1
kind: Service
spec:
  externalTrafficPolicy: Local
  ...
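
For anyone who wants to reproduce the traffic pattern without an NLB, here is a minimal sketch. It assumes that a TCP health check amounts to opening a connection and closing it straight away without sending data, which matches the "Accepted a new connection" / "Connection closed" pairs in the debug output. The function names `tcp_probe` and `run_probes` are made up for this example.

```python
import socket


def tcp_probe(host, port, timeout=2.0):
    """Open a TCP connection and close it at once, sending no data.

    Returns True if the connection was established, False otherwise.
    Each call uses a fresh ephemeral source port, so the listener sees
    a different peer address every time.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def run_probes(host, port, count):
    """Run `count` probes in sequence and return how many connected."""
    return sum(1 for _ in range(count) if tcp_probe(host, port))
```

Calling `run_probes` against the address where the syslog source is reachable (tcp/9514 in the configuration of this issue), while watching the pod's memory, should show whether connect-and-close traffic alone is enough to trigger the growth.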

Configuration

---
apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    app.kubernetes.io/name: vector-test
  name: vector-test-config
  namespace: default
data:
  main-config.yml: |
    ---
      sources:
        syslog_source:
          type: syslog
          address: 0.0.0.0:9514
          mode: tcp
      sinks:
        debug_file:
          type: file
          inputs:
            - syslog_source
          encoding:
            codec: json
          path: /tmp/syslog.log

---
apiVersion: v1
kind: Pod
metadata:
  labels:
    app.kubernetes.io/name: vector-test
    vector.dev/exclude: "true"
  name: vector-test
  namespace: default
spec:
  containers:
  - args:
    - --config-dir
    - /etc/vector/
    env:
    - name: VECTOR_LOG
      value: debug
    image: docker.io/timberio/vector:0.31.0-debian
    name: vector
    ports:
    - containerPort: 9514
      name: syslog
      protocol: TCP
    resources:
      limits:
        cpu: "1"
        memory: 128Mi
      requests:
        cpu: "1"
        memory: 128Mi
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
    volumeMounts:
    - mountPath: /etc/vector/
      name: config
      readOnly: true
  volumes:
  - name: config
    projected:
      sources:
      - configMap:
          name: vector-test-config

---
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
  labels:
    app.kubernetes.io/name: vector-test
  name: vector-test
  namespace: default
spec:
  # Workaround, set traffic policy to 'Local' to avoid memory increase:
  # externalTrafficPolicy: Local
  ports:
  - name: syslog
    port: 9514
    protocol: TCP
    targetPort: 9514
  selector:
    app.kubernetes.io/name: vector-test
  type: LoadBalancer

Version

0.31.0

Debug Output

2023-07-10T07:55:38.618432Z DEBUG source{component_kind="source" component_id=syslog_source component_type=syslog component_name=syslog_source}:connection{peer_addr=172.16.22.59:21598}: vector::sources::util::net::tcp: Accepted a new connection. peer_addr=172.16.22.59:21598
2023-07-10T07:55:38.618505Z DEBUG source{component_kind="source" component_id=syslog_source component_type=syslog component_name=syslog_source}:connection{peer_addr=172.16.22.59:21598}: vector::sources::util::net::tcp: Connection closed.


Example Data

No response

Additional Context

No response

References

No response
@christophemorio added the "type: bug" label on Jul 10, 2023
@dsmith3197
Contributor

This was likely due to the peer_addr tag that was added to internal metrics. Can you try upgrading to the latest version of Vector (v0.35.0) and let us know if that resolves the issue (#18982)?
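
To illustrate why a per-connection tag can look like a memory leak, here is a toy model. It is not Vector's metrics code: `count_series` and the sample addresses are hypothetical, and the only assumption is that a metrics registry keeps one entry per distinct set of tag values.

```python
from collections import Counter


def count_series(peer_addrs, tag_peer_addr):
    """Return how many distinct metric series a toy registry ends up holding.

    With tag_peer_addr=True every distinct "ip:port" creates its own series;
    with tag_peer_addr=False all connections share a single series.
    """
    registry = Counter()
    for addr in peer_addrs:
        if tag_peer_addr:
            labels = ("syslog_source", addr)
        else:
            labels = ("syslog_source",)
        registry[labels] += 1
    return len(registry)


# Hypothetical health-check connections: same node IP, new source port each time.
probes = [f"172.16.22.59:{20000 + i}" for i in range(1000)]
```

In this model, `count_series(probes, True)` grows with the number of connections while `count_series(probes, False)` stays at one, which would be consistent with memory growth scaling with the number of nodes sending health checks.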

@christophemorio
Author

Thanks for the heads-up. We are currently testing the latest version, and the issue is indeed gone.
🙇🏼
