Getting Inconsistent Timestamp error on sumologic-metrics-collector pods #3601

Open
vignesh-codes opened this issue Mar 18, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@vignesh-codes

Describe the bug

I have been testing out your new OtelCol metrics collector, which replaces Prometheus. It scrapes the required metrics and everything works as expected. However, I found the following error messages in the pod logs.

2024-03-18T10:51:06.034Z	warn	internal/transaction.go:149	failed to add datapoint	{"kind": "receiver", "name": "prometheus", "data_type": "metrics", "error": "inconsistent timestamps on metric points for metric container_file_descriptors", "metric_name": "container_file_descriptors", "labels": "{__name__=\"container_file_descriptors\", endpoint=\"https-metrics\", instance=\"ip-10-130-247-135.ec2.internal\", job=\"kubelet\", metrics_path=\"/metrics/cadvisor\", node=\"ip-10-130-247-135.ec2.internal\", service=\"kube-prometheus-stack-kubelet\"}"}
2024-03-18T10:51:10.536Z	warn	internal/transaction.go:149	failed to add datapoint	{"kind": "receiver", "name": "prometheus", "data_type": "metrics", "error": "inconsistent timestamps on metric points for metric container_tasks_state", "metric_name": "container_tasks_state", "labels": "{__name__=\"container_tasks_state\", endpoint=\"https-metrics\", instance=\"ip-10-130-150-118.ec2.internal\", job=\"kubelet\", metrics_path=\"/metrics/cadvisor\", node=\"ip-10-130-150-118.ec2.internal\", service=\"kube-prometheus-stack-kubelet\", state=\"uninterruptible\"}"}
2024-03-18T10:51:12.904Z	warn	internal/transaction.go:123	Failed to scrape Prometheus endpoint	{"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_timestamp": 1710759072826, "target_labels": "{__name__=\"up\", instance=\"10.130.230.84:8181\", job=\"kubernetes-pods\"}"}

Configuration used for collection (e.g. user-values.yaml for Helm):

sumologic:
  metrics:
      enabled: true
      remoteWriteProxy:
        enabled: false
      collector:
        ### Otel metrics collector. Replaces Prometheus.
        ## To enable, you need opentelemetry-operator enabled as well.
        otelcol:
          enabled: true

          ## Configure image for Opentelemetry Collector
          image:
            # tag: "0.85.0-sumo-0-fips"
            pullPolicy: IfNotPresent
            imagePullSecrets:
              - name: artifactory-secret

          ## Default scrape interval
          scrapeInterval: 30s

          ## Option to turn autoscaling on for otelcol and specify params for HPA.
          ## Autoscaling needs metrics-server to access cpu metrics.
          autoscaling:
            # enabled: false
            minReplicas: 2
            maxReplicas: 10
            targetCPUUtilizationPercentage: 70
            targetMemoryUtilizationPercentage: 70

          nodeSelector: {}

          ## Add custom annotations only to metrics otelcol sts pods
          podAnnotations: {}
          ## Add custom labels only to metrics otelcol sts pods
          podLabels: {}

          ## Option to define priorityClassName to assign a priority class to pods.
          priorityClassName:

          replicaCount: 1
          tolerations:
          - key: kaas.xxx.io/pool
            operator: Equal
            value: management
          resources:
            limits:
              memory: 16Gi
              cpu: 1
            requests:
              memory: 3Gi
              cpu: 500m

          livenessProbe:
            failureThreshold: 10

          ## Selector for ServiceMonitors used for target discovery. By default, this selects resources created by this Chart.
          ## See https://github.com/open-telemetry/opentelemetry-operator/blob/main/docs/api.md#opentelemetrycollectorspectargetallocatorprometheuscr
          serviceMonitorSelector:
            k8c.collection.enabled: "true"

          ## Selector for PodMonitors used for target discovery. By default, this selects resources created by this Chart.
          ## See https://github.com/open-telemetry/opentelemetry-operator/blob/main/docs/api.md#opentelemetrycollectorspectargetallocatorprometheuscr
          # podMonitorSelector:

          securityContext:
            ## The group ID of all processes in the statefulset containers. This can be anything, but it does need to be set.
            ## The default is 0 (root), and containers don't have write permissions for volumes in that case.
            fsGroup: 999
          tolerations: []

          affinity: {}

          ## Configuration for kubelet metrics
          kubelet:
            enabled: false
          ## Configuration for cAdvisor metrics
          cAdvisor:
            enabled: false

          ## Enable collection of metrics from Pods annotated with prometheus.io/* keys.
          ## See https://help.sumologic.com/docs/send-data/kubernetes/collecting-metrics#application-metrics-are-exposed-one-endpoint-scenario for more information.
          annotatedPods:
            enabled: true

          ## Allocation strategy for the scrape target allocator. Valid values are: least-weighted and consistent-hashing.
          ## See: https://github.com/open-telemetry/opentelemetry-operator/blob/main/docs/api.md#opentelemetrycollectorspectargetallocator
          # allocationStrategy: least-weighted

          config:
            ## Directly alter the OT configuration. The value of this key should be a dictionary, that will
            ## be directly merged with the generated configuration, overriding existing values.
            ## For example:
            # override:
            #   processors:
            #     batch:
            #       send_batch_size: 512
            ## will change the batch size of the pipeline.
            ##
            ## WARNING: This field is not subject to backwards-compatibility guarantees offered by the rest
            ## of this chart. It involves implementation details that may change even in minor versions.
            ## Use with caution, and consider opening an issue, so your customization can be added in a safer way.
            merge:
              receivers:
                prometheus:
                  config:
                    scrape_configs:
                    - job_name: teleport
                      metrics_path: '/metrics'
                      kubernetes_sd_configs:
                        - role: pod
                          selectors:
                            - role: pod
                              label: "application=teleport-agent"
                      relabel_configs:
                        - source_labels: [__meta_kubernetes_pod_container_port_number]
                          action: keep
                          regex: 3000
                        - source_labels: [__meta_kubernetes_namespace]
                          action: replace
                          target_label: namespace
                        - source_labels: [__meta_kubernetes_pod_name]
                          action: replace
                          target_label: pod
                      metric_relabel_configs:
                        - source_labels: [__name__]
                          regex: "(?:rx|tx|go_.*|process_.*|promhttp_metric_handler_requests_.*)"
                          action: drop
                    - job_name: 'kubernetes-pods'
                      kubernetes_sd_configs:
                        - role: pod
                      relabel_configs:
                        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
                          action: keep
                          regex: true
                        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
                          action: replace
                          target_label: __metrics_path__
                          regex: (.+)
                    - job_name: otc
                      metrics_path: '/metrics'
                      kubernetes_sd_configs:
                        - role: pod
                          selectors:
                            - role: pod
                              label: "app=sumologic-otelcol-metrics"
                      relabel_configs:
                        - source_labels: [__meta_kubernetes_pod_container_port_number]
                          action: keep
                          regex: 8888
                        - source_labels: [__meta_kubernetes_namespace]
                          action: replace
                          target_label: namespace
                        - source_labels: [__meta_kubernetes_pod_name]
                          action: replace
                          target_label: pod
                    - job_name: kube-api-blackbox
                      scrape_interval: 1m
                      scrape_timeout: 30s
                      metrics_path: '/probe'
                      params:
                        module: [ http_post_2xx ]
                      http_sd_configs:
                        - url: 'http://perfmon.xxx-perfmon.svc.cluster.local:8081'
                      relabel_configs:
                        - source_labels: [ __address__ ]
                          target_label: __param_target
                        - source_labels: [ __param_target ]
                          target_label: instance
                        - target_label: __address__
                          replacement: prometheus-blackbox-exporter.xxx-perfmon.svc.cluster.local:9116
                        - source_labels: [__param_target]
                          regex: '(.+)site=(.+)'
                          target_label: __site__
                          replacement: '${2}'
                        - source_labels: [__site__]
                          regex: '(.+)&connect(.+)'
                          target_label: site
                        - source_labels: [__param_target]
                          regex: '(.+)realm=(.+)'
                          target_label: __realm__
                          replacement: '${2}'
                        - source_labels: [__realm__]
                          regex: '(.+)&domain=(.+)'
                          target_label: realm
                        - source_labels: [__param_target]
                          regex: '(.+)domain=(.+)'
                          target_label: domain
                          replacement: '${2}'
                      metric_relabel_configs:
                        - source_labels: [ __name__ ]
                          regex: "(?:probe_success|probe_http_status_code|probe_duration_seconds|scrape_duration_seconds|up)"
                          action: keep
                    - job_name: pod-annotations
                      kubernetes_sd_configs:
                      - role: pod
                      relabel_configs:
                      - action: keep
                        regex: true
                        source_labels:
                        - __meta_kubernetes_pod_annotation_prometheus_io_scrape
                      - action: replace
                        regex: (.+)
                        source_labels:
                        - __meta_kubernetes_pod_annotation_prometheus_io_path
                        target_label: __metrics_path__
                      - action: replace
                        regex: ([^:]+)(?::\d+)?;(\d+)
                        replacement: $1:$2
                        source_labels:
                        - __address__
                        - __meta_kubernetes_pod_annotation_prometheus_io_port
                        target_label: __address__
                      - action: replace
                        regex: (.*)
                        replacement: $1
                        separator: ;
                        source_labels:
                        - __metrics_path__
                        target_label: endpoint
                      - action: replace
                        source_labels:
                        - __meta_kubernetes_namespace
                        target_label: namespace
                      - action: labelmap
                        regex: __meta_kubernetes_pod_label_(.+)
                      - action: replace
                        regex: (.*)
                        replacement: $1
                        separator: ;
                        source_labels:
                        - __meta_kubernetes_pod_name
                        target_label: pod
                    - authorization:
                        credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
                      honor_labels: true
                      job_name: cadvisor
                      kubernetes_sd_configs:
                      - role: node
                      metric_relabel_configs:
                      - action: replace
                        regex: .*
                        replacement: kubelet
                        source_labels:
                        - __name__
                        target_label: job
                      - action: keep
                        regex: (?:container_cpu_usage_seconds_total|container_memory_working_set_bytes|container_fs_usage_bytes|container_fs_limit_bytes|container_cpu_cfs_throttled_seconds_total|container_network_receive_bytes_total|container_network_transmit_bytes_total)
                        source_labels:
                        - __name__
                      - action: drop
                        regex: (?:container_cpu_usage_seconds_total|container_memory_working_set_bytes|container_fs_usage_bytes|container_fs_limit_bytes);$
                        source_labels:
                        - __name__
                        - container
                      - action: labelmap
                        regex: container_name
                        replacement: container
                      - action: drop
                        regex: POD
                        source_labels:
                        - container
                      - action: labeldrop
                        regex: (id|name)
                      metrics_path: /metrics/cadvisor
                      relabel_configs:
                      - replacement: https-metrics
                        target_label: endpoint
                      - action: replace
                        source_labels:
                        - __metrics_path__
                        target_label: metrics_path
                      - action: replace
                        source_labels:
                        - __address__
                        target_label: instance
                      scheme: https
                      tls_config:
                        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
                        insecure_skip_verify: true
            ## Completely override existing config and replace it with the contents of this value.
            ## The value of this key should be a dictionary, that will replace the normal configuration.
            ## This is an advanced feature, use with caution, and review the generated configuration first.
            override: {}

          ## Configuration specific for target allocator
          targetAllocator:
            resources: {}

To Reproduce
Just the default scrape rules under otelcol cause the issue. I used sumologic.metrics.collector.otelcol.config.merge, which contains my custom jobs plus the default cadvisor job.

Expected behavior
The error should not be thrown.

Environment

% kubectl version --short
Flag --short has been deprecated, and will be removed in the future. The --short output will become the default.
Client Version: v1.26.8
Kustomize Version: v4.5.7
Server Version: v1.26.13-eks-508b6b3
OTELCOL version

Version: 4.4.0
image: public.ecr.aws/sumologic/sumologic-otel-collector:0.92.0-sumo-0

  • Cloud provider: AWS

Anything else we need to know
I am not entirely sure what is causing the "inconsistent timestamps on metric points for metric container_file_descriptors" errors in the logs. Does it cause loss of metrics data? Is there anything wrong with my configuration?

@vignesh-codes vignesh-codes added the bug Something isn't working label Mar 18, 2024
@aboguszewski-sumo aboguszewski-sumo self-assigned this Apr 3, 2024
@aboguszewski-sumo
Contributor

@vignesh-codes
After investigating the receiver's code, it seems likely that the scraped data contains two points for the same metric with different timestamps. Can you check whether that's the case?
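One way to check this would be to capture a raw scrape of the cadvisor endpoint (for example with kubectl get --raw "/api/v1/nodes/<node>/proxy/metrics/cadvisor") and look for metric families whose samples carry more than one explicit timestamp in a single scrape. A minimal sketch, assuming Prometheus text exposition format; the helper name and the sample data are mine, not part of the chart or the receiver:

```python
from collections import defaultdict

def find_inconsistent_timestamps(exposition_text):
    """Return names of metrics whose samples carry more than one explicit timestamp."""
    seen = defaultdict(set)
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comment lines
        parts = line.rsplit(" ", 2)  # -> [name{labels}, value, timestamp] when a timestamp is present
        if len(parts) == 3 and parts[2].lstrip("-").isdigit():
            name = parts[0].split("{", 1)[0]
            seen[name].add(parts[2])
    return sorted(name for name, stamps in seen.items() if len(stamps) > 1)

# Hypothetical scrape: two points of container_tasks_state, 500 ms apart.
sample = """\
container_tasks_state{state="sleeping"} 0 1710759066000
container_tasks_state{state="running"} 1 1710759066500
container_memory_working_set_bytes 123 1710759066000
"""
print(find_inconsistent_timestamps(sample))  # prints ['container_tasks_state']
```

If this reports names matching the ones in the warn logs (container_file_descriptors, container_tasks_state), that would confirm the scraped payload itself is the source of the inconsistency rather than the receiver configuration.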

@aboguszewski-sumo aboguszewski-sumo removed their assignment Apr 8, 2024