Describe the bug
I have been testing out your new OtelCol for metrics, replacing Prometheus. It is scraping the required metrics and they are working as expected. However, I am seeing the error messages described below in the pod logs.
Configuration
Configuration used for Collection (e.g. user-values.yaml for Helm):
sumologic:
  metrics:
    enabled: true
    remoteWriteProxy:
      enabled: false
    collector:
      ### Otel metrics collector. Replaces Prometheus.
      ## To enable, you need opentelemetry-operator enabled as well.
      otelcol:
        enabled: true
        ## Configure image for Opentelemetry Collector
        image:
          # tag: "0.85.0-sumo-0-fips"
          pullPolicy: IfNotPresent
        imagePullSecrets:
          - name: artifactory-secret
        ## Default scrape interval
        scrapeInterval: 30s
        ## Option to turn autoscaling on for otelcol and specify params for HPA.
        ## Autoscaling needs metrics-server to access cpu metrics.
        autoscaling:
          # enabled: false
          minReplicas: 2
          maxReplicas: 10
          targetCPUUtilizationPercentage: 70
          targetMemoryUtilizationPercentage: 70
        nodeSelector: {}
        ## Add custom annotations only to merics otelcol sts pods
        podAnnotations: {}
        ## Add custom labels only to metrics otelcol sts pods
        podLabels: {}
        ## Option to define priorityClassName to assign a priority class to pods.
        priorityClassName:
        replicaCount: 1
        tolerations:
          - key: kaas.xxx.io/pool
            operator: Equal
            value: management
        resources:
          limits:
            memory: 16Gi
            cpu: 1
          requests:
            memory: 3Gi
            cpu: 500m
        livenessProbe:
          failureThreshold: 10
        ## Selector for ServiceMonitors used for target discovery. By default, this selects resources created by this Chart.
        ## See https://github.com/open-telemetry/opentelemetry-operator/blob/main/docs/api.md#opentelemetrycollectorspectargetallocatorprometheuscr
        serviceMonitorSelector:
          k8c.collection.enabled: "true"
        ## Selector for PodMonitors used for target discovery. By default, this selects resources created by this Chart.
        ## See https://github.com/open-telemetry/opentelemetry-operator/blob/main/docs/api.md#opentelemetrycollectorspectargetallocatorprometheuscr
        # podMonitorSelector:
        securityContext:
          ## The group ID of all processes in the statefulset containers. This can be anything, but it does need to be set.
          ## The default is 0 (root), and containers don't have write permissions for volumes in that case.
          fsGroup: 999
        tolerations: []
        affinity: {}
        ## Configuration for kubelet metrics
        kubelet:
          enabled: false
        ## Configuration for cAdvisor metrics
        cAdvisor:
          enabled: false
        ## Enable collection of metrics from Pods annotated with prometheus.io/* keys.
        ## See https://help.sumologic.com/docs/send-data/kubernetes/collecting-metrics#application-metrics-are-exposed-one-endpoint-scenario for more information.
        annotatedPods:
          enabled: true
        ## Allocation strategy for the scrape target allocator. Valid values are: least-weighted and consistent-hashing.
        ## See: https://github.com/open-telemetry/opentelemetry-operator/blob/main/docs/api.md#opentelemetrycollectorspectargetallocator
        # allocationStrategy: least-weighted
        config:
          ## Directly alter the OT configuration. The value of this key should be a dictionary, that will
          ## be directly merged with the generated configuration, overriding existing values.
          ## For example:
          # override:
          #   processors:
          #     batch:
          #       send_batch_size: 512
          ## will change the batch size of the pipeline.
          ##
          ## WARNING: This field is not subject to backwards-compatibility guarantees offered by the rest
          ## of this chart. It involves implementation details that may change even in minor versions.
          ## Use with caution, and consider opening an issue, so your customization can be added in a safer way.
          merge:
            receivers:
              prometheus:
                config:
                  scrape_configs:
                    - job_name: teleport
                      metrics_path: '/metrics'
                      kubernetes_sd_configs:
                        - role: pod
                          selectors:
                            - role: pod
                              label: "application=teleport-agent"
                      relabel_configs:
                        - source_labels: [__meta_kubernetes_pod_container_port_number]
                          action: keep
                          regex: 3000
                        - source_labels: [__meta_kubernetes_namespace]
                          action: replace
                          target_label: namespace
                        - source_labels: [__meta_kubernetes_pod_name]
                          action: replace
                          target_label: pod
                      metric_relabel_configs:
                        - source_labels: [__name__]
                          regex: "(?:rx|tx|go_.*|process_.*|promhttp_metric_handler_requests_.*)"
                          action: drop
                    - job_name: 'kubernetes-pods'
                      kubernetes_sd_configs:
                        - role: pod
                      relabel_configs:
                        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
                          action: keep
                          regex: true
                        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
                          action: replace
                          target_label: __metrics_path__
                          regex: (.+)
                    - job_name: otc
                      metrics_path: '/metrics'
                      kubernetes_sd_configs:
                        - role: pod
                          selectors:
                            - role: pod
                              label: "app=sumologic-otelcol-metrics"
                      relabel_configs:
                        - source_labels: [__meta_kubernetes_pod_container_port_number]
                          action: keep
                          regex: 8888
                        - source_labels: [__meta_kubernetes_namespace]
                          action: replace
                          target_label: namespace
                        - source_labels: [__meta_kubernetes_pod_name]
                          action: replace
                          target_label: pod
                    - job_name: kube-api-blackbox
                      scrape_interval: 1m
                      scrape_timeout: 30s
                      metrics_path: '/probe'
                      params:
                        module: [ http_post_2xx ]
                      http_sd_configs:
                        - url: 'http://perfmon.xxx-perfmon.svc.cluster.local:8081'
                      relabel_configs:
                        - source_labels: [ __address__ ]
                          target_label: __param_target
                        - source_labels: [ __param_target ]
                          target_label: instance
                        - target_label: __address__
                          replacement: prometheus-blackbox-exporter.xxx-perfmon.svc.cluster.local:9116
                        - source_labels: [__param_target]
                          regex: '(.+)site=(.+)'
                          target_label: __site__
                          replacement: '${2}'
                        - source_labels: [__site__]
                          regex: '(.+)&connect(.+)'
                          target_label: site
                        - source_labels: [__param_target]
                          regex: '(.+)realm=(.+)'
                          target_label: __realm__
                          replacement: '${2}'
                        - source_labels: [__realm__]
                          regex: '(.+)&domain=(.+)'
                          target_label: realm
                        - source_labels: [__param_target]
                          regex: '(.+)domain=(.+)'
                          target_label: domain
                          replacement: '${2}'
                      metric_relabel_configs:
                        - source_labels: [ __name__ ]
                          regex: "(?:probe_success|probe_http_status_code|probe_duration_seconds|scrape_duration_seconds|up)"
                          action: keep
                    - job_name: pod-annotations
                      kubernetes_sd_configs:
                        - role: pod
                      relabel_configs:
                        - action: keep
                          regex: true
                          source_labels:
                            - __meta_kubernetes_pod_annotation_prometheus_io_scrape
                        - action: replace
                          regex: (.+)
                          source_labels:
                            - __meta_kubernetes_pod_annotation_prometheus_io_path
                          target_label: __metrics_path__
                        - action: replace
                          regex: ([^:]+)(?::\d+)?;(\d+)
                          replacement: $1:$2
                          source_labels:
                            - __address__
                            - __meta_kubernetes_pod_annotation_prometheus_io_port
                          target_label: __address__
                        - action: replace
                          regex: (.*)
                          replacement: $1
                          separator: ;
                          source_labels:
                            - __metrics_path__
                          target_label: endpoint
                        - action: replace
                          source_labels:
                            - __meta_kubernetes_namespace
                          target_label: namespace
                        - action: labelmap
                          regex: __meta_kubernetes_pod_label_(.+)
                        - action: replace
                          regex: (.*)
                          replacement: $1
                          separator: ;
                          source_labels:
                            - __meta_kubernetes_pod_name
                          target_label: pod
                    - authorization:
                        credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
                      honor_labels: true
                      job_name: cadvisor
                      kubernetes_sd_configs:
                        - role: node
                      metric_relabel_configs:
                        - action: replace
                          regex: .*
                          replacement: kubelet
                          source_labels:
                            - __name__
                          target_label: job
                        - action: keep
                          regex: (?:container_cpu_usage_seconds_total|container_memory_working_set_bytes|container_fs_usage_bytes|container_fs_limit_bytes|container_cpu_cfs_throttled_seconds_total|container_network_receive_bytes_total|container_network_transmit_bytes_total)
                          source_labels:
                            - __name__
                        - action: drop
                          regex: (?:container_cpu_usage_seconds_total|container_memory_working_set_bytes|container_fs_usage_bytes|container_fs_limit_bytes);$
                          source_labels:
                            - __name__
                            - container
                        - action: labelmap
                          regex: container_name
                          replacement: container
                        - action: drop
                          regex: POD
                          source_labels:
                            - container
                        - action: labeldrop
                          regex: (id|name)
                      metrics_path: /metrics/cadvisor
                      relabel_configs:
                        - replacement: https-metrics
                          target_label: endpoint
                        - action: replace
                          source_labels:
                            - __metrics_path__
                          target_label: metrics_path
                        - action: replace
                          source_labels:
                            - __address__
                          target_label: instance
                      scheme: https
                      tls_config:
                        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
                        insecure_skip_verify: true
          ## Completely override existing config and replace it with the contents of this value.
          ## The value of this key should be a dictionary, that will replace the normal configuration.
          ## This is an advanced feature, use with caution, and review the generated configuration first.
          override: {}
        ## Configuraton specific for target allocator
        targetAllocator:
          resources: {}
To Reproduce
Just the default scrape rules under otelcol cause the issue. I used sumologic.metrics.collector.otelcol.config.merge, which has my custom jobs and the default cadvisor job.
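For reference, this is the shape of that key path, trimmed down (just a sketch; the full contents are in the configuration above):

sumologic:
  metrics:
    collector:
      otelcol:
        config:
          merge:
            receivers:
              prometheus:
                config:
                  scrape_configs:
                    # my custom jobs plus the default cadvisor job, as listed in full above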
Expected behavior
That error should not be thrown
Environment
% kubectl version --short
Flag --short has been deprecated, and will be removed in the future. The --short output will become the default.
Client Version: v1.26.8
Kustomize Version: v4.5.7
Server Version: v1.26.13-eks-508b6b3
OTELCOL version
Version: 4.4.0
image: public.ecr.aws/sumologic/sumologic-otel-collector:0.92.0-sumo-0
Collection version (e.g. helm ls -n sumologic):
Kubernetes version (e.g. kubectl version):
Cloud provider: AWS
Others:
Anything else we need to know
I am not entirely sure what is causing the "inconsistent timestamps on metric points for metric container_file_descriptors" errors in the logs. Does it cause loss of metrics data? Is there anything wrong with my configuration?
@vignesh-codes
After investigating the receiver's code, it seems likely that there are two metric points with the same name but different timestamps in the scraped data. Can you check whether that is the case?
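One rough way to check from the cluster side (a sketch, not an official procedure; <node-name> is a placeholder, and it assumes the kubelet cAdvisor endpoint is reachable through the API server proxy, matching the metrics_path: /metrics/cadvisor job in your config):

# pull one raw cAdvisor scrape and print label sets that occur more than once
kubectl get --raw "/api/v1/nodes/<node-name>/proxy/metrics/cadvisor" \
  | grep '^container_file_descriptors{' \
  | sed 's/} .*/}/' \
  | sort | uniq -d

Any line printed by uniq -d is a container_file_descriptors series that appears more than once in a single scrape, which would match the duplicate points the error message complains about.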