prometheus is crashing after sidecar injection #107

Closed · Pamir opened this issue Mar 20, 2019 · 19 comments
Labels: question (Further information is requested)

@Pamir (Contributor) commented Mar 20, 2019

Error message: Tailing WAL failed: retrieve last checkpoint: open /data/wal: no such file or directory

export KUBE_NAMESPACE=monitoring
export GCP_PROJECT=<project_name>
export GCP_REGION=us-central1
export KUBE_CLUSTER=standard-cluster-1
export SIDECAR_IMAGE_TAG=release-0.4.0
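
For context, the patch script uses these variables to inject a sidecar container; a minimal sketch of what that container roughly looks like is below (assuming the flag names and image path documented in the sidecar README; the volume name is a placeholder, not taken from this issue):

    # Hypothetical injected sidecar container (sketch, not the exact patch output).
    - name: sidecar
      image: gcr.io/stackdriver-prometheus/stackdriver-prometheus-sidecar:release-0.4.0  # $SIDECAR_IMAGE_TAG
      args:
      - --stackdriver.project-id=<project_name>                   # $GCP_PROJECT
      - --stackdriver.kubernetes.location=us-central1             # $GCP_REGION
      - --stackdriver.kubernetes.cluster-name=standard-cluster-1  # $KUBE_CLUSTER
      - --prometheus.wal-directory=/data/wal                      # default WAL path from the error above
      ports:
      - name: sidecar
        containerPort: 9091   # assumed default web.listen-address port
      volumeMounts:
      - name: prometheus-data # placeholder: must be the volume Prometheus writes its TSDB to
        mountPath: /data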

prometheus-operator values.yaml:

# Default values for prometheus-operator.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

## Provide a name in place of prometheus-operator for `app:` labels
##
nameOverride: ""

## Provide a name to substitute for the full names of resources
##
fullnameOverride: ""

## Labels to apply to all resources
##
commonLabels: {}
# scmhash: abc123
# myLabel: aakkmd

## Create default rules for monitoring the cluster
##
defaultRules:
  create: true
  ## Labels for default rules
  labels: {}
  ## Annotations for default rules
  annotations: {}

##
global:
  rbac:
    create: true
    pspEnabled: true

  ## Reference to one or more secrets to be used when pulling images
  ## ref: https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/
  ##
  imagePullSecrets: []
  # - name: "image-pull-secret"

## Configuration for alertmanager
## ref: https://prometheus.io/docs/alerting/alertmanager/
##
alertmanager:

  ## Deploy alertmanager
  ##
  enabled: true

  ## Service account for Alertmanager to use.
  ## ref: https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/
  ##
  serviceAccount:
    create: true
    name: ""

  ## Configure pod disruption budgets for Alertmanager
  ## ref: https://kubernetes.io/docs/tasks/run-application/configure-pdb/#specifying-a-poddisruptionbudget
  ## This configuration is immutable once created and will require the PDB to be deleted to be changed
  ## https://github.com/kubernetes/kubernetes/issues/45398
  ##
  podDisruptionBudget:
    enabled: false
    minAvailable: 1
    maxUnavailable: ""

  ## Alertmanager configuration directives
  ## ref: https://prometheus.io/docs/alerting/configuration/#configuration-file
  ##      https://prometheus.io/webtools/alerting/routing-tree-editor/
  ##
  config:
    global:
      resolve_timeout: 5m
    route:
      group_by: ['job']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 12h
      receiver: 'null'
      routes:
      - match:
          alertname: DeadMansSwitch
        receiver: 'null'
    receivers:
    - name: 'null'

  ## Alertmanager template files to format alerts
  ## ref: https://prometheus.io/docs/alerting/notifications/
  ##      https://prometheus.io/docs/alerting/notification_examples/
  ##
  templateFiles: {}
  #
  # An example template:
  #   template_1.tmpl: |-
  #       {{ define "cluster" }}{{ .ExternalURL | reReplaceAll ".*alertmanager\\.(.*)" "$1" }}{{ end }}
  #
  #       {{ define "slack.myorg.text" }}
  #       {{- $root := . -}}
  #       {{ range .Alerts }}
  #         *Alert:* {{ .Annotations.summary }} - `{{ .Labels.severity }}`
  #         *Cluster:*  {{ template "cluster" $root }}
  #         *Description:* {{ .Annotations.description }}
  #         *Graph:* <{{ .GeneratorURL }}|:chart_with_upwards_trend:>
  #         *Runbook:* <{{ .Annotations.runbook }}|:spiral_note_pad:>
  #         *Details:*
  #           {{ range .Labels.SortedPairs }} • *{{ .Name }}:* `{{ .Value }}`
  #           {{ end }}

  ingress:
    enabled: false

    annotations: {}

    labels: {}

    ## Hosts must be provided if Ingress is enabled.
    ##
    hosts: []
      # - alertmanager.domain.com

    ## TLS configuration for Alertmanager Ingress
    ## Secret must be manually created in the namespace
    ##
    tls: []
    # - secretName: alertmanager-general-tls
    #   hosts:
    #   - alertmanager.example.com

  ## Configuration for Alertmanager service
  ##
  service:
    annotations: {}
    labels: {}
    clusterIP: ""

  ## Port to expose on each node
  ## Only used if service.type is 'NodePort'
  ##
    nodePort: 30903
  ## List of IP addresses at which the Prometheus server service is available
  ## Ref: https://kubernetes.io/docs/user-guide/services/#external-ips
  ##
    externalIPs: []
    loadBalancerIP: ""
    loadBalancerSourceRanges: []
    ## Service type
    ##
    type: ClusterIP

  ## If true, create a serviceMonitor for alertmanager
  ##
  serviceMonitor:
    selfMonitor: true

  ## Settings affecting alertmanagerSpec
  ## ref: https://github.com/coreos/prometheus-operator/blob/master/Documentation/api.md#alertmanagerspec
  ##
  alertmanagerSpec:
    ## Standard object’s metadata. More info: https://github.com/kubernetes/community/blob/master/contributors/devel/api-conventions.md#metadata
    ## Metadata labels and annotations get propagated to the Alertmanager pods.
    ##
    podMetadata: {}

    ## Image of Alertmanager
    ##
    image:
      repository: quay.io/prometheus/alertmanager
      tag: v0.15.3

    ## Secrets is a list of Secrets in the same namespace as the Alertmanager object, which shall be mounted into the
    ## Alertmanager Pods. The Secrets are mounted into /etc/alertmanager/secrets/.
    ##
    secrets: []

    ## ConfigMaps is a list of ConfigMaps in the same namespace as the Alertmanager object, which shall be mounted into the Alertmanager Pods.
    ## The ConfigMaps are mounted into /etc/alertmanager/configmaps/.
    ##
    configMaps: []

    ## Log level for Alertmanager to be configured with.
    ##
    logLevel: info

    ## Size is the expected size of the alertmanager cluster. The controller will eventually make the size of the
    ## running cluster equal to the expected size.
    replicas: 1

    ## Time duration Alertmanager shall retain data for. Default is '120h', and must match the regular expression
    ## [0-9]+(ms|s|m|h) (milliseconds seconds minutes hours).
    ##
    retention: 120h

    ## Storage is the definition of how storage will be used by the Alertmanager instances.
    ## ref: https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/storage.md
    ##
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: standard
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi
    


    ## The external URL the Alertmanager instances will be available under. This is necessary to generate correct URLs and if Alertmanager is not served from the root of a DNS name.
    ##
    externalUrl:

    ## 	The route prefix Alertmanager registers HTTP handlers for. This is useful, if using ExternalURL and a proxy is rewriting HTTP routes of a request, and the actual ExternalURL is still true,
    ## but the server serves requests under a different route prefix. For example for use with kubectl proxy.
    ##
    routePrefix: /

    ## If set to true all actions on the underlying managed objects are not going to be performed, except for delete actions.
    ##
    paused: false

    ## Define which Nodes the Pods are scheduled on.
    ## ref: https://kubernetes.io/docs/user-guide/node-selection/
    ##
    nodeSelector: {}

    ## Define resources requests and limits for single Pods.
    ## ref: https://kubernetes.io/docs/user-guide/compute-resources/
    ##
    resources: {}
    # requests:
    #   memory: 400Mi

    ## Pod anti-affinity can prevent the scheduler from placing Prometheus replicas on the same node.
    ## The default value "soft" means that the scheduler should *prefer* to not schedule two replica pods onto the same node but no guarantee is provided.
    ## The value "hard" means that the scheduler is *required* to not schedule two replica pods onto the same node.
    ## The value "" will disable pod anti-affinity so that no anti-affinity rules will be configured.
    ##
    podAntiAffinity: ""

    ## If anti-affinity is enabled sets the topologyKey to use for anti-affinity.
    ## This can be changed to, for example, failure-domain.beta.kubernetes.io/zone
    ##
    podAntiAffinityTopologyKey: kubernetes.io/hostname

    ## If specified, the pod's tolerations.
    ## ref: https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
    ##
    tolerations: []
    # - key: "key"
    #   operator: "Equal"
    #   value: "value"
    #   effect: "NoSchedule"

    ## SecurityContext holds pod-level security attributes and common container settings.
    ## This defaults to a non-root user with uid 1000 and gid 2000.
    ## ref: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/
    ##
    securityContext:
      runAsNonRoot: true
      runAsUser: 1000
      fsGroup: 2000

    ## ListenLocal makes the Alertmanager server listen on loopback, so that it does not bind against the Pod IP.
    ## Note this is only for the Alertmanager UI, not the gossip communication.
    ##
    listenLocal: false

    ## Containers allows injecting additional containers. This is meant to allow adding an authentication proxy to an Alertmanager pod.
    ##
    containers: []

    ## Priority class assigned to the Pods
    ##
    priorityClassName: ""

    ## AdditionalPeers allows injecting a set of additional Alertmanagers to peer with to form a highly available cluster.
    ##
    additionalPeers: []

## Using default values from https://github.com/helm/charts/blob/master/stable/grafana/values.yaml
##
grafana:
  enabled: true

  ## Deploy default dashboards.
  ##
  defaultDashboardsEnabled: true

  adminPassword: prom-operator

  ingress:
    ## If true, Prometheus Ingress will be created
    ##
    enabled: false

    ## Annotations for Prometheus Ingress
    ##
    annotations: {}
      # kubernetes.io/ingress.class: nginx
      # kubernetes.io/tls-acme: "true"

    ## Labels to be added to the Ingress
    ##
    labels: {}

    ## Hostnames.
    ## Must be provided if Ingress is enabled.
    ##
    # hosts:
    #   - prometheus.domain.com
    hosts: []

    ## TLS configuration for prometheus Ingress
    ## Secret must be manually created in the namespace
    ##
    tls: []
    # - secretName: prometheus-general-tls
    #   hosts:
    #   - prometheus.example.com

  sidecar:
    dashboards:
      enabled: true
      label: grafana_dashboard
    datasources:
      enabled: true
      label: grafana_datasource

  extraConfigmapMounts: []
  # - name: certs-configmap
  #   mountPath: /etc/grafana/ssl/
  #   configMap: certs-configmap
  #   readOnly: true


## Component scraping the kube api server
##
kubeApiServer:
  enabled: true
  tlsConfig:
    serverName: kubernetes
    insecureSkipVerify: false

  serviceMonitor:
    jobLabel: component
    selector:
      matchLabels:
        component: apiserver
        provider: kubernetes

## Component scraping the kubelet and kubelet-hosted cAdvisor
##
kubelet:
  enabled: true
  namespace: kube-system

  serviceMonitor:
    ## Enable scraping the kubelet over https. For requirements to enable this see
    ## https://github.com/coreos/prometheus-operator/issues/926
    ##
    https: true

## Component scraping the kube controller manager
##
kubeControllerManager:
  enabled: true

  ## If your kube controller manager is not deployed as a pod, specify IPs it can be found on
  ##
  endpoints: []
  # - 10.141.4.22
  # - 10.141.4.23
  # - 10.141.4.24

  ## If using kubeControllerManager.endpoints only the port and targetPort are used
  ##
  service:
    port: 10252
    targetPort: 10252
    selector:
      k8s-app: kube-controller-manager
## Component scraping coreDns. Use either this or kubeDns
##
coreDns:
  enabled: true
  service:
    port: 9153
    targetPort: 9153
    selector:
      k8s-app: coredns

## Component scraping kubeDns. Use either this or coreDns
##
kubeDns:
  enabled: false
  service:
    selector:
      k8s-app: kube-dns
## Component scraping etcd
##
kubeEtcd:
  enabled: true

  ## If your etcd is not deployed as a pod, specify IPs it can be found on
  ##
  endpoints: []
  # - 10.141.4.22
  # - 10.141.4.23
  # - 10.141.4.24

  ## Etcd service. If using kubeEtcd.endpoints only the port and targetPort are used
  ##
  service:
    port: 4001
    targetPort: 4001
    selector:
      k8s-app: etcd-server

  ## Configure secure access to the etcd cluster by loading a secret into prometheus and
  ## specifying security configuration below. For example, with a secret named etcd-client-cert
  ##
  ## serviceMonitor:
  ##   scheme: https
  ##   insecureSkipVerify: false
  ##   serverName: localhost
  ##   caFile: /etc/prometheus/secrets/etcd-client-cert/etcd-ca
  ##   certFile: /etc/prometheus/secrets/etcd-client-cert/etcd-client
  ##   keyFile: /etc/prometheus/secrets/etcd-client-cert/etcd-client-key
  ##
  serviceMonitor:
    scheme: http
    insecureSkipVerify: false
    serverName: ""
    caFile: ""
    certFile: ""
    keyFile: ""


## Component scraping kube scheduler
##
kubeScheduler:
  enabled: true

  ## If your kube scheduler is not deployed as a pod, specify IPs it can be found on
  ##
  endpoints: []
  # - 10.141.4.22
  # - 10.141.4.23
  # - 10.141.4.24

  ## If using kubeScheduler.endpoints only the port and targetPort are used
  ##
  service:
    port: 10251
    targetPort: 10251
    selector:
      k8s-app: kube-scheduler

## Component scraping kube state metrics
##
kubeStateMetrics:
  enabled: true

## Configuration for kube-state-metrics subchart
##
kube-state-metrics:
  rbac:
    create: true
  podSecurityPolicy:
    enabled: true

## Deploy node exporter as a daemonset to all nodes
##
nodeExporter:
  enabled: true

  ## Use the value configured in prometheus-node-exporter.podLabels
  ##
  jobLabel: jobLabel

## Configuration for prometheus-node-exporter subchart
##
prometheus-node-exporter:
  podLabels:
    ## Add the 'node-exporter' label to be used by serviceMonitor to match standard common usage in rules and grafana dashboards
    ##
    jobLabel: node-exporter
  extraArgs:
    - --collector.filesystem.ignored-mount-points=^/(dev|proc|sys|var/lib/docker/.+)($|/)
    - --collector.filesystem.ignored-fs-types=^(autofs|binfmt_misc|cgroup|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|mqueue|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|sysfs|tracefs)$

## Manages Prometheus and Alertmanager components
##
prometheusOperator:
  enabled: true

  ## Service account for the Prometheus Operator to use.
  ## ref: https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/
  ##
  serviceAccount:
    create: true
    name: ""

  ## Configuration for Prometheus operator service
  ##
  service:
    annotations: {}
    labels: {}
    clusterIP: ""

  ## Port to expose on each node
  ## Only used if service.type is 'NodePort'
  ##
    nodePort: 38080


  ## LoadBalancer IP
  ## Only used if service.type is "LoadBalancer"
  ##
    loadBalancerIP: ""
    loadBalancerSourceRanges: []

  ## Service type
  ## NodePort, ClusterIP, LoadBalancer
  ##
    type: ClusterIP

    ## List of IP addresses at which the Prometheus server service is available
    ## Ref: https://kubernetes.io/docs/user-guide/services/#external-ips
    ##
    externalIPs: []

  ## Deploy CRDs used by Prometheus Operator.
  ##
  createCustomResource: true

  ## Customize CRDs API Group
  crdApiGroup: monitoring.coreos.com

  ## Attempt to clean up CRDs created by Prometheus Operator.
  ##
  cleanupCustomResource: false

  ## Labels to add to the operator pod
  ##
  podLabels: {}

  ## Assign a PriorityClassName to pods if set
  # priorityClassName: ""

  ## If true, the operator will create and maintain a service for scraping kubelets
  ## ref: https://github.com/coreos/prometheus-operator/blob/master/helm/prometheus-operator/README.md
  ##
  kubeletService:
    enabled: true
    namespace: kube-system

  ## Create a servicemonitor for the operator
  ##
  serviceMonitor:
    selfMonitor: true

  ## Resource limits & requests
  ##
  resources: {}
  # limits:
  #   cpu: 200m
  #   memory: 200Mi
  # requests:
  #   cpu: 100m
  #   memory: 100Mi

  ## Define which Nodes the Pods are scheduled on.
  ## ref: https://kubernetes.io/docs/user-guide/node-selection/
  ##
  nodeSelector: {}

  ## Tolerations for use with node taints
  ## ref: https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
  ##
  tolerations: []
  # - key: "key"
  #   operator: "Equal"
  #   value: "value"
  #   effect: "NoSchedule"

  ## Assign the prometheus operator to run on specific nodes
  ## ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
  ##
  affinity: {}
  # requiredDuringSchedulingIgnoredDuringExecution:
  #   nodeSelectorTerms:
  #   - matchExpressions:
  #     - key: kubernetes.io/e2e-az-name
  #       operator: In
  #       values:
  #       - e2e-az1
  #       - e2e-az2

  securityContext:
    runAsNonRoot: true
    runAsUser: 65534

  ## Prometheus-operator image
  ##
  image:
    repository: quay.io/coreos/prometheus-operator
    tag: v0.26.0
    pullPolicy: IfNotPresent

  ## Configmap-reload image to use for reloading configmaps
  ##
  configmapReloadImage:
    repository: quay.io/coreos/configmap-reload
    tag: v0.0.1

  ## Prometheus-config-reloader image to use for config and rule reloading
  ##
  prometheusConfigReloaderImage:
    repository: quay.io/coreos/prometheus-config-reloader
    tag: v0.26.0

  ## Hyperkube image to use when cleaning up
  ##
  hyperkubeImage:
    repository: k8s.gcr.io/hyperkube
    tag: v1.12.1
    pullPolicy: IfNotPresent

## Deploy a Prometheus instance
##
prometheus:

  enabled: true

  ## Service account for Prometheuses to use.
  ## ref: https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/
  ##
  serviceAccount:
    create: true
    name: ""

  ## Configuration for Prometheus service
  ##
  service:
    annotations: {}
    labels: {}
    clusterIP: ""

    ## List of IP addresses at which the Prometheus server service is available
    ## Ref: https://kubernetes.io/docs/user-guide/services/#external-ips
    ##
    externalIPs: []

    ## Port to expose on each node
    ## Only used if service.type is 'NodePort'
    ##
    nodePort: 39090

    ## LoadBalancer IP
    ## Only used if service.type is "LoadBalancer"
    loadBalancerIP: ""
    loadBalancerSourceRanges: []
    ## Service type
    ##
    type: ClusterIP

  rbac:
    ## Create role bindings in the specified namespaces, to allow Prometheus monitoring
    ## a role binding in the release namespace will always be created.
    ##
    roleNamespaces:
      - kube-system

  ## Configure pod disruption budgets for Prometheus
  ## ref: https://kubernetes.io/docs/tasks/run-application/configure-pdb/#specifying-a-poddisruptionbudget
  ## This configuration is immutable once created and will require the PDB to be deleted to be changed
  ## https://github.com/kubernetes/kubernetes/issues/45398
  ##
  podDisruptionBudget:
    enabled: false
    minAvailable: 1
    maxUnavailable: ""

  ingress:
    enabled: false
    annotations: {}
    labels: {}

    ## Hostnames.
    ## Must be provided if Ingress is enabled.
    ##
    # hosts:
    #   - prometheus.domain.com
    hosts: []

    ## TLS configuration for Prometheus Ingress
    ## Secret must be manually created in the namespace
    ##
    tls: []
      # - secretName: prometheus-general-tls
      #   hosts:
      #     - prometheus.example.com

  serviceMonitor:
    selfMonitor: true

  ## Settings affecting prometheusSpec
  ## ref: https://github.com/coreos/prometheus-operator/blob/master/Documentation/api.md#prometheusspec
  ##
  prometheusSpec:

    ## Interval between consecutive scrapes.
    ##
    scrapeInterval: ""

    ## Interval between consecutive evaluations.
    ##
    evaluationInterval: ""

    ## ListenLocal makes the Prometheus server listen on loopback, so that it does not bind against the Pod IP.
    ##
    listenLocal: false

    ## Image of Prometheus.
    ##
    image:
      repository: quay.io/prometheus/prometheus
      tag: v2.5.0

    #  repository: quay.io/coreos/prometheus
    #  tag: v2.5.0

    ## Tolerations for use with node taints
    ## ref: https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
    ##
    tolerations: []
    #  - key: "key"
    #    operator: "Equal"
    #    value: "value"
    #    effect: "NoSchedule"

    ## Alertmanagers to which alerts will be sent
    ## ref: https://github.com/coreos/prometheus-operator/blob/master/Documentation/api.md#alertmanagerendpoints
    ##
    ## Default configuration will connect to the alertmanager deployed as part of this release
    ##
    alertingEndpoints: []
    # - name: ""
    #   namespace: ""
    #   port: http
    #   scheme: http

    ## External labels to add to any time series or alerts when communicating with external systems
    ##
    externalLabels: {}

    ## External URL at which Prometheus will be reachable.
    ##
    externalUrl: ""

    ## Define which Nodes the Pods are scheduled on.
    ## ref: https://kubernetes.io/docs/user-guide/node-selection/
    ##
    nodeSelector: {}

    ## Secrets is a list of Secrets in the same namespace as the Prometheus object, which shall be mounted into the Prometheus Pods.
    ## The Secrets are mounted into /etc/prometheus/secrets/. Secrets changes after initial creation of a Prometheus object are not
    ## reflected in the running Pods. To change the secrets mounted into the Prometheus Pods, the object must be deleted and recreated
    ## with the new list of secrets.
    ##
    secrets: []

    ## ConfigMaps is a list of ConfigMaps in the same namespace as the Prometheus object, which shall be mounted into the Prometheus Pods.
    ## The ConfigMaps are mounted into /etc/prometheus/configmaps/.
    ##
    configMaps: []

    ## Namespaces to be selected for PrometheusRules discovery.
    ## If unspecified, only the same namespace as the Prometheus object is in is used.
    ##
    ruleNamespaceSelector: {}

    ## If true, a nil or {} value for prometheus.prometheusSpec.ruleSelector will cause the
    ## prometheus resource to be created with selectors based on values in the helm deployment,
    ## which will also match the PrometheusRule resources created
    ##
    ruleSelectorNilUsesHelmValues: true

    ## Rules CRD selector
    ## ref: https://github.com/coreos/prometheus-operator/blob/master/Documentation/design.md
    ## If unspecified the release `app` and `release` will be used as the label selector
    ## to load rules
    ##
    ruleSelector: {}
    ## Example which select all prometheusrules resources
    ## with label "prometheus" with values any of "example-rules" or "example-rules-2"
    # ruleSelector:
    #   matchExpressions:
    #     - key: prometheus
    #       operator: In
    #       values:
    #         - example-rules
    #         - example-rules-2
    #
    ## Example which select all prometheusrules resources with label "role" set to "example-rules"
    # ruleSelector:
    #   matchLabels:
    #     role: example-rules

    ## If true, a nil or {} value for prometheus.prometheusSpec.serviceMonitorSelector will cause the
    ## prometheus resource to be created with selectors based on values in the helm deployment,
    ## which will also match the servicemonitors created
    ##
    serviceMonitorSelectorNilUsesHelmValues: true

    ## serviceMonitorSelector will limit which servicemonitors are used to create scrape
    ## configs in Prometheus. See serviceMonitorSelectorUseHelmLabels
    ##
    serviceMonitorSelector: {}

    # serviceMonitorSelector:
    #   matchLabels:
    #     prometheus: somelabel

    ## serviceMonitorNamespaceSelector will limit namespaces from which serviceMonitors are used to create scrape
    ## configs in Prometheus. By default all namespaces will be used
    ##
    serviceMonitorNamespaceSelector: {}

    ## How long to retain metrics
    ##
    retention: 10d

    ## If true, the Operator won't process any Prometheus configuration changes
    ##
    paused: false

    ## Number of Prometheus replicas desired
    ##
    replicas: 1

    ## Log level for Prometheus to be configured with
    ##
    logLevel: info

    ## Prefix used to register routes, overriding externalUrl route.
    ## Useful for proxies that rewrite URLs.
    ##
    routePrefix: /

    ## Standard object’s metadata. More info: https://github.com/kubernetes/community/blob/master/contributors/devel/api-conventions.md#metadata
    ## Metadata labels and annotations get propagated to the Prometheus pods.
    ##
    podMetadata: {}
    # labels:
    #   app: prometheus
    #   k8s-app: prometheus

    ## Pod anti-affinity can prevent the scheduler from placing Prometheus replicas on the same node.
    ## The default value "soft" means that the scheduler should *prefer* to not schedule two replica pods onto the same node but no guarantee is provided.
    ## The value "hard" means that the scheduler is *required* to not schedule two replica pods onto the same node.
    ## The value "" will disable pod anti-affinity so that no anti-affinity rules will be configured.
    podAntiAffinity: ""

    ## If anti-affinity is enabled sets the topologyKey to use for anti-affinity.
    ## This can be changed to, for example, failure-domain.beta.kubernetes.io/zone
    ##
    podAntiAffinityTopologyKey: kubernetes.io/hostname

    ## The remote_read spec configuration for Prometheus.
    ## ref: https://github.com/coreos/prometheus-operator/blob/master/Documentation/api.md#remotereadspec
    remoteRead: {}
    # - url: http://remote1/read

    ## The remote_write spec configuration for Prometheus.
    ## ref: https://github.com/coreos/prometheus-operator/blob/master/Documentation/api.md#remotewritespec
    remoteWrite: {}
      # remoteWrite:
      #   - url: http://remote1/push

    ## Resource limits & requests
    ##
    resources: {}
    # requests:
    #   memory: 400Mi

    ## Prometheus StorageSpec for persistent data
    ## ref: https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/storage.md
    ##
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: standard
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi
    

    ## AdditionalScrapeConfigs allows specifying additional Prometheus scrape configurations. Scrape configurations
    ## are appended to the configurations generated by the Prometheus Operator. Job configurations must have the form
    ## as specified in the official Prometheus documentation:
    ## https://prometheus.io/docs/prometheus/latest/configuration/configuration/#<scrape_config>. As scrape configs are
    ## appended, the user is responsible to make sure it is valid. Note that using this feature may expose the possibility
    ## to break upgrades of Prometheus. It is advised to review Prometheus release notes to ensure that no incompatible
    ## scrape configs are going to break Prometheus after the upgrade.
    ##
    ## The scrape configuration example below will find master nodes, provided they have the name .*mst.*, relabel the
    ## port to 2379 and allow etcd scraping provided it is running on all Kubernetes master nodes
    ##
    additionalScrapeConfigs: []
    # - job_name: kube-etcd
    #   kubernetes_sd_configs:
    #     - role: node
    #   scheme: https
    #   tls_config:
    #     ca_file:   /etc/prometheus/secrets/etcd-client-cert/etcd-ca
    #     cert_file: /etc/prometheus/secrets/etcd-client-cert/etcd-client
    #     key_file:  /etc/prometheus/secrets/etcd-client-cert/etcd-client-key
    #   relabel_configs:
    #   - action: labelmap
    #     regex: __meta_kubernetes_node_label_(.+)
    #   - source_labels: [__address__]
    #     action: replace
    #     target_label: __address__
    #     regex: ([^:;]+):(\d+)
    #     replacement: ${1}:2379
    #   - source_labels: [__meta_kubernetes_node_name]
    #     action: keep
    #     regex: .*mst.*
    #   - source_labels: [__meta_kubernetes_node_name]
    #     action: replace
    #     target_label: node
    #     regex: (.*)
    #     replacement: ${1}
    #   metric_relabel_configs:
    #   - regex: (kubernetes_io_hostname|failure_domain_beta_kubernetes_io_region|beta_kubernetes_io_os|beta_kubernetes_io_arch|beta_kubernetes_io_instance_type|failure_domain_beta_kubernetes_io_zone)
    #     action: labeldrop


    ## AdditionalAlertManagerConfigs allows for manual configuration of alertmanager jobs in the form as specified
    ## in the official Prometheus documentation https://prometheus.io/docs/prometheus/latest/configuration/configuration/#<alertmanager_config>.
    ## AlertManager configurations specified are appended to the configurations generated by the Prometheus Operator.
    ## As AlertManager configs are appended, the user is responsible to make sure it is valid. Note that using this
    ## feature may expose the possibility to break upgrades of Prometheus. It is advised to review Prometheus release
    ## notes to ensure that no incompatible AlertManager configs are going to break Prometheus after the upgrade.
    ##
    additionalAlertManagerConfigs: []
    # - consul_sd_configs:
    #   - server: consul.dev.test:8500
    #     scheme: http
    #     datacenter: dev
    #     tag_separator: ','
    #     services:
    #       - metrics-prometheus-alertmanager

    ## AdditionalAlertRelabelConfigs allows specifying Prometheus alert relabel configurations. Alert relabel configurations specified are appended
    ## to the configurations generated by the Prometheus Operator. Alert relabel configurations specified must have the form as specified in the
    ## official Prometheus documentation: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#alert_relabel_configs.
    ## As alert relabel configs are appended, the user is responsible to make sure it is valid. Note that using this feature may expose the
    ## possibility to break upgrades of Prometheus. It is advised to review Prometheus release notes to ensure that no incompatible alert relabel
    ## configs are going to break Prometheus after the upgrade.
    ##
    additionalAlertRelabelConfigs: []
    # - separator: ;
    #   regex: prometheus_replica
    #   replacement: $1
    #   action: labeldrop

    ## SecurityContext holds pod-level security attributes and common container settings.
    ## This defaults to non root user with uid 1000 and gid 2000.
    ## https://github.com/coreos/prometheus-operator/blob/master/Documentation/api.md
    ##
    securityContext:
      runAsNonRoot: true
      runAsUser: 1000
      fsGroup: 2000

    ## 	Priority class assigned to the Pods
    ##
    priorityClassName: ""

    ## Thanos configuration allows configuring various aspects of a Prometheus server in a Thanos environment.
    ## This section is experimental, it may change significantly without deprecation notice in any release.
    ## This is experimental and may change significantly without backward compatibility in any release.
    ## ref: https://github.com/coreos/prometheus-operator/blob/master/Documentation/api.md#thanosspec
    ##
    thanos: {}

    ## Containers allows injecting additional containers. This is meant to allow adding an authentication proxy to a Prometheus pod.
    ##
    containers: []

    ## Enable additional scrape configs that are managed externally to this chart. Note that the prometheus
    ## will fail to provision if the correct secret does not exist.
    ##
    additionalScrapeConfigsExternal: false

  additionalServiceMonitors: []
  ## Name of the ServiceMonitor to create
  ##
  # - name: ""

    ## Additional labels to set used for the ServiceMonitorSelector. Together with standard labels from
    ## the chart
    ##
    # additionalLabels: {}

    ## Service label for use in assembling a job name of the form <label value>-<port>
    ## If no label is specified, the service name is used.
    ##
    # jobLabel: ""

    ## Label selector for services to which this ServiceMonitor applies
    ##
    # selector: {}

    ## Namespaces from which services are selected
    ##
    # namespaceSelector:
      ## Match any namespace
      ##
      # any: false

      ## Explicit list of namespace names to select
      ##
      # matchNames: []

    ## Endpoints of the selected service to be monitored
    ##
    # endpoints: []
      ## Name of the endpoint's service port
      ## Mutually exclusive with targetPort
      # - port: ""

      ## Name or number of the endpoint's target port
      ## Mutually exclusive with port
      # - targetPort: ""

      ## File containing bearer token to be used when scraping targets
      ##
      #   bearerTokenFile: ""

      ## Interval at which metrics should be scraped
      ##
      #   interval: 30s

      ## HTTP path to scrape for metrics
      ##
      #   path: /metrics

      ## HTTP scheme to use for scraping
      ##
      #   scheme: http

      ## TLS configuration to use when scraping the endpoint
      ##
      #   tlsConfig:

          ## Path to the CA file
          ##
          # caFile: ""

          ## Path to client certificate file
          ##
          # certFile: ""

          ## Skip certificate verification
          ##
          # insecureSkipVerify: false

          ## Path to client key file
          ##
          # keyFile: ""

          ## Server name used to verify host name
          ##
          # serverName: ""
@jkohen (Contributor) commented Mar 20, 2019

The sidecar requires access to the Prometheus WAL, and it's stopping (crashing) because it can't find it. /data/wal is the default location. Please follow the installation instructions to make sure the sidecar's WAL path matches what your Prometheus instance is actually using: https://cloud.google.com/monitoring/kubernetes-engine/prometheus#configuration
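
Put differently, --prometheus.wal-directory has to point at a path that exists inside the volume the sidecar mounts. A minimal sketch of the alignment, assuming the prometheus-operator layout where the Prometheus container keeps its TSDB under /prometheus and both containers share the same data volume (names are illustrative):

    # Prometheus container: TSDB, including wal/, lives under its mount path.
    - name: prometheus
      volumeMounts:
      - name: prometheus-data
        mountPath: /prometheus

    # Sidecar container: same volume, so the WAL appears under the sidecar's mount path.
    - name: sidecar
      args:
      - --prometheus.wal-directory=/data/wal   # <sidecar mountPath>/wal, not just the mountPath
      volumeMounts:
      - name: prometheus-data
        mountPath: /data
        # If the operator mounts the volume with a subPath, the sidecar needs the same subPath,
        # otherwise the WAL ends up at /data/<subPath>/wal instead of /data/wal.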

Let me know if there is something else I can do to help.

@Pamir (Contributor, Author) commented Mar 20, 2019

Hi @jkohen,
I deleted the sidecar from the Prometheus CRD and double-checked: there is a directory at /prometheus/wal.
I had injected the sidecar with patch-operated.sh. The PVC is mounted at /data successfully and /data/wal was there.

Sidecar mount:

    - mountPath: /data
      name: prometheus-prometheus-prometheus-oper-prometheus-db

Prometheus mount:

    - mountPath: /prometheus
      name: prometheus-prometheus-prometheus-oper-prometheus-db

We changed the argument in the script to the line below:

    --prometheus.wal-directory=/data

This time the Prometheus pod initialized successfully, but it got stuck like issue #83 at:

    level=info ts=2019-01-26T15:29:42.652116635Z caller=manager.go:150 component="Prometheus reader" msg="Starting Prometheus reader..."

@jkohen (Contributor) commented Mar 20, 2019

Glad it helped. The sidecar may take a few seconds to start, and then it will actually go silent, so those logs could be fine. Have you checked in Metrics Explorer whether metrics are showing up? See https://cloud.google.com/monitoring/kubernetes-engine/prometheus#viewing_metrics

Also make sure that your Prometheus version is in the compatibility matrix: https://github.com/Stackdriver/stackdriver-prometheus-sidecar#compatibility

@Pamir (Contributor, Author) commented Mar 20, 2019

Hi @jkohen,
Yes, I have already checked the compatibility matrix. We are using Prometheus 2.6 with the sidecar release-0.4.0.

In Metrics Explorer I searched for external/xxx, but with no luck.

@jkohen (Contributor) commented Mar 20, 2019

@StevenYCChou, you looked into #91; can you help us diagnose this?

@Pamir, can you include full logs from the sidecar? Can you share your project ID and cluster name with us so we can take a second look? If you have a Cloud Support contract, please also contact us through that channel to ensure we have all the important information.

@Pamir (Contributor, Author) commented Mar 20, 2019

Hi @StevenYCChou, @jkohen,
I have just created a repo and deployed the sidecar from scratch, with the same problem. The Prometheus values.yaml and all the necessary information are noted in the README file:
https://github.com/Pamir/stackdriver-prometheus-sidecar-configuration

@StevenYCChou (Contributor) commented:

Thanks @Pamir for creating the repo with the files. Let me look into them, and I will get back to you if I have any questions.

@StevenYCChou (Contributor) commented:

Can you double-check the version of Prometheus you use? I ask because the repo you provided uses prometheus-operator v0.26.0, which uses Prometheus v2.5.0 by default, according to the prometheus-operator release page.

Besides that, does your Prometheus server scrape the sidecar for metrics? Could you check what the metrics prometheus_sidecar_samples_processed and prometheus_sidecar_samples_produced report over time? That would tell us whether the WAL is being processed.
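
If the sidecar is not being scraped yet, one way to pick up those metrics is an extra scrape job against the sidecar's own metrics endpoint; a minimal sketch via additionalScrapeConfigs from the values above, assuming the sidecar serves metrics on its default port 9091 and with an illustrative pod label that you would adjust to your deployment:

    additionalScrapeConfigs:
    - job_name: stackdriver-sidecar
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      # Keep only the Prometheus pod(s) carrying the sidecar; adjust the label to your setup.
      - source_labels: [__meta_kubernetes_pod_label_app]
        action: keep
        regex: prometheus
      # Rewrite the target address to the sidecar's metrics port.
      - source_labels: [__address__]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?
        replacement: ${1}:9091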

If you can provide logs from the Prometheus server, that would help me understand more about how it is doing.

@Pamir (Contributor, Author) commented Mar 20, 2019

Hi,
I changed the Prometheus tag to v2.6.1; it has to be compatible with the rest of the Prometheus-related stack.

    image:
      repository: quay.io/prometheus/prometheus
      tag: v2.6.1

In Prometheus there is no such metric.

Prometheus logs:

level=info ts=2019-03-20T16:06:29.250909879Z caller=main.go:243 msg="Starting Prometheus" version="(version=2.6.1, branch=HEAD, revision=b639fe140c1f71b2cbad3fc322b17efe60839e7e)"
level=info ts=2019-03-20T16:06:29.251305026Z caller=main.go:244 build_context="(go=go1.11.4, user=root@4c0e286fe2b3, date=20190115-19:12:04)"
level=info ts=2019-03-20T16:06:29.251463899Z caller=main.go:245 host_details="(Linux 4.14.91+ #1 SMP Wed Jan 23 21:34:58 PST 2019 x86_64 prometheus-prometheus-prometheus-oper-prometheus-0 (none))"
level=info ts=2019-03-20T16:06:29.251626047Z caller=main.go:246 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2019-03-20T16:06:29.251789764Z caller=main.go:247 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2019-03-20T16:06:29.264520992Z caller=main.go:561 msg="Starting TSDB ..."
level=info ts=2019-03-20T16:06:29.265462333Z caller=web.go:429 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2019-03-20T16:06:30.490804858Z caller=main.go:571 msg="TSDB started"
level=info ts=2019-03-20T16:06:30.490989808Z caller=main.go:631 msg="Loading configuration file" filename=/etc/prometheus/config_out/prometheus.env.yaml
level=info ts=2019-03-20T16:06:30.49719739Z caller=kubernetes.go:201 component="discovery manager scrape" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2019-03-20T16:06:30.498149038Z caller=kubernetes.go:201 component="discovery manager scrape" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2019-03-20T16:06:30.498954619Z caller=kubernetes.go:201 component="discovery manager scrape" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2019-03-20T16:06:30.499850854Z caller=kubernetes.go:201 component="discovery manager notify" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2019-03-20T16:06:30.527900759Z caller=main.go:657 msg="Completed loading of configuration file" filename=/etc/prometheus/config_out/prometheus.env.yaml
level=info ts=2019-03-20T16:06:30.527959489Z caller=main.go:530 msg="Server is ready to receive web requests."
level=info ts=2019-03-20T16:06:30.528009778Z caller=main.go:631 msg="Loading configuration file" filename=/etc/prometheus/config_out/prometheus.env.yaml
level=info ts=2019-03-20T16:06:30.533542907Z caller=kubernetes.go:201 component="discovery manager scrape" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2019-03-20T16:06:30.5345548Z caller=kubernetes.go:201 component="discovery manager scrape" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2019-03-20T16:06:30.535641172Z caller=kubernetes.go:201 component="discovery manager scrape" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2019-03-20T16:06:30.536791463Z caller=kubernetes.go:201 component="discovery manager notify" discovery=k8s msg="Using pod service account via in-cluster config"
level=error ts=2019-03-20T16:06:30.541168106Z caller=endpoints.go:130 component="discovery manager scrape" discovery=k8s role=endpoint msg="endpoints informer unable to sync cache"
level=error ts=2019-03-20T16:06:30.541435562Z caller=endpoints.go:130 component="discovery manager scrape" discovery=k8s role=endpoint msg="endpoints informer unable to sync cache"
level=error ts=2019-03-20T16:06:30.54164013Z caller=endpoints.go:130 component="discovery manager scrape" discovery=k8s role=endpoint msg="endpoints informer unable to sync cache"
level=error ts=2019-03-20T16:06:30.546152267Z caller=endpoints.go:130 component="discovery manager notify" discovery=k8s role=endpoint msg="endpoints informer unable to sync cache"
level=info ts=2019-03-20T16:06:30.564805845Z caller=main.go:657 msg="Completed loading of configuration file" filename=/etc/prometheus/config_out/prometheus.env.yaml
level=warn ts=2019-03-20T16:06:38.618351787Z caller=manager.go:414 component="rule manager" group=k8s.rules msg="Evaluating rule failed" rule="record: namespace_name:container_cpu_usage_seconds_total:sum_rate\nexpr: sum by(namespace, label_name) (sum by(namespace, pod_name) (rate(container_cpu_usage_seconds_total{container_name!=\"\",image!=\"\",job=\"kubelet\"}[5m]))\n  * on(namespace, pod_name) group_left(label_name) label_replace(kube_pod_labels{job=\"kube-state-metrics\"},\n  \"pod_name\", \"$1\", \"pod\", \"(.*)\"))\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:06:38.620312289Z caller=manager.go:414 component="rule manager" group=k8s.rules msg="Evaluating rule failed" rule="record: namespace_name:container_memory_usage_bytes:sum\nexpr: sum by(namespace, label_name) (sum by(pod_name, namespace) (container_memory_usage_bytes{container_name!=\"\",image!=\"\",job=\"kubelet\"})\n  * on(namespace, pod_name) group_left(label_name) label_replace(kube_pod_labels{job=\"kube-state-metrics\"},\n  \"pod_name\", \"$1\", \"pod\", \"(.*)\"))\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:06:38.621778884Z caller=manager.go:414 component="rule manager" group=k8s.rules msg="Evaluating rule failed" rule="record: namespace_name:kube_pod_container_resource_requests_memory_bytes:sum\nexpr: sum by(namespace, label_name) (sum by(namespace, pod) (kube_pod_container_resource_requests_memory_bytes{job=\"kube-state-metrics\"})\n  * on(namespace, pod) group_left(label_name) label_replace(kube_pod_labels{job=\"kube-state-metrics\"},\n  \"pod_name\", \"$1\", \"pod\", \"(.*)\"))\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:06:38.623832773Z caller=manager.go:414 component="rule manager" group=k8s.rules msg="Evaluating rule failed" rule="record: namespace_name:kube_pod_container_resource_requests_cpu_cores:sum\nexpr: sum by(namespace, label_name) (sum by(namespace, pod) (kube_pod_container_resource_requests_cpu_cores{job=\"kube-state-metrics\"}\n  and on(pod) kube_pod_status_scheduled{condition=\"true\"}) * on(namespace, pod) group_left(label_name)\n  label_replace(kube_pod_labels{job=\"kube-state-metrics\"}, \"pod_name\", \"$1\", \"pod\",\n  \"(.*)\"))\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:06:52.335365054Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_num_cpu:sum\nexpr: count by(node) (sum by(node, cpu) (node_cpu_seconds_total{job=\"node-exporter\"}\n  * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:))\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:06:52.336208356Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_cpu_utilisation:avg1m\nexpr: 1 - avg by(node) (rate(node_cpu_seconds_total{job=\"node-exporter\",mode=\"idle\"}[1m])\n  * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:)\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:06:52.337040823Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: 'node:node_cpu_saturation_load1:'\nexpr: sum by(node) (node_load1{job=\"node-exporter\"} * on(namespace, pod) group_left(node)\n  node_namespace_pod:kube_pod_info:) / node:node_num_cpu:sum\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:06:52.338014615Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_memory_bytes_available:sum\nexpr: sum by(node) ((node_memory_MemFree_bytes{job=\"node-exporter\"} + node_memory_Cached_bytes{job=\"node-exporter\"}\n  + node_memory_Buffers_bytes{job=\"node-exporter\"}) * on(namespace, pod) group_left(node)\n  node_namespace_pod:kube_pod_info:)\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:06:52.338354231Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_memory_bytes_total:sum\nexpr: sum by(node) (node_memory_MemTotal_bytes{job=\"node-exporter\"} * on(namespace,\n  pod) group_left(node) node_namespace_pod:kube_pod_info:)\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:06:52.339454185Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: 'node:node_memory_utilisation:'\nexpr: 1 - sum by(node) ((node_memory_MemFree_bytes{job=\"node-exporter\"} + node_memory_Cached_bytes{job=\"node-exporter\"}\n  + node_memory_Buffers_bytes{job=\"node-exporter\"}) * on(namespace, pod) group_left(node)\n  node_namespace_pod:kube_pod_info:) / sum by(node) (node_memory_MemTotal_bytes{job=\"node-exporter\"}\n  * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:)\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:06:52.340136082Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_memory_swap_io_bytes:sum_rate\nexpr: 1000 * sum by(node) ((rate(node_vmstat_pgpgin{job=\"node-exporter\"}[1m]) + rate(node_vmstat_pgpgout{job=\"node-exporter\"}[1m]))\n  * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:)\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:06:52.340867685Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_disk_utilisation:avg_irate\nexpr: avg by(node) (irate(node_disk_io_time_seconds_total{device=~\"nvme.+|rbd.+|sd.+|vd.+|xvd.+\",job=\"node-exporter\"}[1m])\n  * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:)\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:06:52.341675329Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_disk_saturation:avg_irate\nexpr: avg by(node) (irate(node_disk_io_time_weighted_seconds_total{device=~\"nvme.+|rbd.+|sd.+|vd.+|xvd.+\",job=\"node-exporter\"}[1m])\n  / 1000 * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:)\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:06:52.344275331Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_net_utilisation:sum_irate\nexpr: sum by(node) ((irate(node_network_receive_bytes_total{device!~\"veth.+\",job=\"node-exporter\"}[1m])\n  + irate(node_network_transmit_bytes_total{device!~\"veth.+\",job=\"node-exporter\"}[1m]))\n  * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:)\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:06:52.345621614Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_net_saturation:sum_irate\nexpr: sum by(node) ((irate(node_network_receive_drop_total{device!~\"veth.+\",job=\"node-exporter\"}[1m])\n  + irate(node_network_transmit_drop_total{device!~\"veth.+\",job=\"node-exporter\"}[1m]))\n  * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:)\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:07:08.617163993Z caller=manager.go:414 component="rule manager" group=k8s.rules msg="Evaluating rule failed" rule="record: namespace_name:container_cpu_usage_seconds_total:sum_rate\nexpr: sum by(namespace, label_name) (sum by(namespace, pod_name) (rate(container_cpu_usage_seconds_total{container_name!=\"\",image!=\"\",job=\"kubelet\"}[5m]))\n  * on(namespace, pod_name) group_left(label_name) label_replace(kube_pod_labels{job=\"kube-state-metrics\"},\n  \"pod_name\", \"$1\", \"pod\", \"(.*)\"))\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:07:08.618591981Z caller=manager.go:414 component="rule manager" group=k8s.rules msg="Evaluating rule failed" rule="record: namespace_name:container_memory_usage_bytes:sum\nexpr: sum by(namespace, label_name) (sum by(pod_name, namespace) (container_memory_usage_bytes{container_name!=\"\",image!=\"\",job=\"kubelet\"})\n  * on(namespace, pod_name) group_left(label_name) label_replace(kube_pod_labels{job=\"kube-state-metrics\"},\n  \"pod_name\", \"$1\", \"pod\", \"(.*)\"))\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:07:08.619682874Z caller=manager.go:414 component="rule manager" group=k8s.rules msg="Evaluating rule failed" rule="record: namespace_name:kube_pod_container_resource_requests_memory_bytes:sum\nexpr: sum by(namespace, label_name) (sum by(namespace, pod) (kube_pod_container_resource_requests_memory_bytes{job=\"kube-state-metrics\"})\n  * on(namespace, pod) group_left(label_name) label_replace(kube_pod_labels{job=\"kube-state-metrics\"},\n  \"pod_name\", \"$1\", \"pod\", \"(.*)\"))\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:07:08.620913891Z caller=manager.go:414 component="rule manager" group=k8s.rules msg="Evaluating rule failed" rule="record: namespace_name:kube_pod_container_resource_requests_cpu_cores:sum\nexpr: sum by(namespace, label_name) (sum by(namespace, pod) (kube_pod_container_resource_requests_cpu_cores{job=\"kube-state-metrics\"}\n  and on(pod) kube_pod_status_scheduled{condition=\"true\"}) * on(namespace, pod) group_left(label_name)\n  label_replace(kube_pod_labels{job=\"kube-state-metrics\"}, \"pod_name\", \"$1\", \"pod\",\n  \"(.*)\"))\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:07:22.336626325Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_num_cpu:sum\nexpr: count by(node) (sum by(node, cpu) (node_cpu_seconds_total{job=\"node-exporter\"}\n  * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:))\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:07:22.337767863Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_cpu_utilisation:avg1m\nexpr: 1 - avg by(node) (rate(node_cpu_seconds_total{job=\"node-exporter\",mode=\"idle\"}[1m])\n  * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:)\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:07:22.338931338Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: 'node:node_cpu_saturation_load1:'\nexpr: sum by(node) (node_load1{job=\"node-exporter\"} * on(namespace, pod) group_left(node)\n  node_namespace_pod:kube_pod_info:) / node:node_num_cpu:sum\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:07:22.340594234Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_memory_bytes_available:sum\nexpr: sum by(node) ((node_memory_MemFree_bytes{job=\"node-exporter\"} + node_memory_Cached_bytes{job=\"node-exporter\"}\n  + node_memory_Buffers_bytes{job=\"node-exporter\"}) * on(namespace, pod) group_left(node)\n  node_namespace_pod:kube_pod_info:)\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:07:22.341176062Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_memory_bytes_total:sum\nexpr: sum by(node) (node_memory_MemTotal_bytes{job=\"node-exporter\"} * on(namespace,\n  pod) group_left(node) node_namespace_pod:kube_pod_info:)\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:07:22.342860135Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: 'node:node_memory_utilisation:'\nexpr: 1 - sum by(node) ((node_memory_MemFree_bytes{job=\"node-exporter\"} + node_memory_Cached_bytes{job=\"node-exporter\"}\n  + node_memory_Buffers_bytes{job=\"node-exporter\"}) * on(namespace, pod) group_left(node)\n  node_namespace_pod:kube_pod_info:) / sum by(node) (node_memory_MemTotal_bytes{job=\"node-exporter\"}\n  * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:)\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:07:22.343732941Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_memory_swap_io_bytes:sum_rate\nexpr: 1000 * sum by(node) ((rate(node_vmstat_pgpgin{job=\"node-exporter\"}[1m]) + rate(node_vmstat_pgpgout{job=\"node-exporter\"}[1m]))\n  * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:)\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:07:22.344867066Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_disk_utilisation:avg_irate\nexpr: avg by(node) (irate(node_disk_io_time_seconds_total{device=~\"nvme.+|rbd.+|sd.+|vd.+|xvd.+\",job=\"node-exporter\"}[1m])\n  * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:)\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:07:22.345811613Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_disk_saturation:avg_irate\nexpr: avg by(node) (irate(node_disk_io_time_weighted_seconds_total{device=~\"nvme.+|rbd.+|sd.+|vd.+|xvd.+\",job=\"node-exporter\"}[1m])\n  / 1000 * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:)\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:07:22.348830137Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_net_utilisation:sum_irate\nexpr: sum by(node) ((irate(node_network_receive_bytes_total{device!~\"veth.+\",job=\"node-exporter\"}[1m])\n  + irate(node_network_transmit_bytes_total{device!~\"veth.+\",job=\"node-exporter\"}[1m]))\n  * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:)\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:07:22.350438228Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_net_saturation:sum_irate\nexpr: sum by(node) ((irate(node_network_receive_drop_total{device!~\"veth.+\",job=\"node-exporter\"}[1m])\n  + irate(node_network_transmit_drop_total{device!~\"veth.+\",job=\"node-exporter\"}[1m]))\n  * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:)\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:07:38.618872594Z caller=manager.go:414 component="rule manager" group=k8s.rules msg="Evaluating rule failed" rule="record: namespace_name:container_cpu_usage_seconds_total:sum_rate\nexpr: sum by(namespace, label_name) (sum by(namespace, pod_name) (rate(container_cpu_usage_seconds_total{container_name!=\"\",image!=\"\",job=\"kubelet\"}[5m]))\n  * on(namespace, pod_name) group_left(label_name) label_replace(kube_pod_labels{job=\"kube-state-metrics\"},\n  \"pod_name\", \"$1\", \"pod\", \"(.*)\"))\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:07:38.620872902Z caller=manager.go:414 component="rule manager" group=k8s.rules msg="Evaluating rule failed" rule="record: namespace_name:container_memory_usage_bytes:sum\nexpr: sum by(namespace, label_name) (sum by(pod_name, namespace) (container_memory_usage_bytes{container_name!=\"\",image!=\"\",job=\"kubelet\"})\n  * on(namespace, pod_name) group_left(label_name) label_replace(kube_pod_labels{job=\"kube-state-metrics\"},\n  \"pod_name\", \"$1\", \"pod\", \"(.*)\"))\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:07:38.622064202Z caller=manager.go:414 component="rule manager" group=k8s.rules msg="Evaluating rule failed" rule="record: namespace_name:kube_pod_container_resource_requests_memory_bytes:sum\nexpr: sum by(namespace, label_name) (sum by(namespace, pod) (kube_pod_container_resource_requests_memory_bytes{job=\"kube-state-metrics\"})\n  * on(namespace, pod) group_left(label_name) label_replace(kube_pod_labels{job=\"kube-state-metrics\"},\n  \"pod_name\", \"$1\", \"pod\", \"(.*)\"))\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:07:38.623839436Z caller=manager.go:414 component="rule manager" group=k8s.rules msg="Evaluating rule failed" rule="record: namespace_name:kube_pod_container_resource_requests_cpu_cores:sum\nexpr: sum by(namespace, label_name) (sum by(namespace, pod) (kube_pod_container_resource_requests_cpu_cores{job=\"kube-state-metrics\"}\n  and on(pod) kube_pod_status_scheduled{condition=\"true\"}) * on(namespace, pod) group_left(label_name)\n  label_replace(kube_pod_labels{job=\"kube-state-metrics\"}, \"pod_name\", \"$1\", \"pod\",\n  \"(.*)\"))\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:07:52.335473188Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_num_cpu:sum\nexpr: count by(node) (sum by(node, cpu) (node_cpu_seconds_total{job=\"node-exporter\"}\n  * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:))\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:07:52.336240421Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_cpu_utilisation:avg1m\nexpr: 1 - avg by(node) (rate(node_cpu_seconds_total{job=\"node-exporter\",mode=\"idle\"}[1m])\n  * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:)\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:07:52.337014063Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: 'node:node_cpu_saturation_load1:'\nexpr: sum by(node) (node_load1{job=\"node-exporter\"} * on(namespace, pod) group_left(node)\n  node_namespace_pod:kube_pod_info:) / node:node_num_cpu:sum\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:07:52.338037011Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_memory_bytes_available:sum\nexpr: sum by(node) ((node_memory_MemFree_bytes{job=\"node-exporter\"} + node_memory_Cached_bytes{job=\"node-exporter\"}\n  + node_memory_Buffers_bytes{job=\"node-exporter\"}) * on(namespace, pod) group_left(node)\n  node_namespace_pod:kube_pod_info:)\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:07:52.338426698Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_memory_bytes_total:sum\nexpr: sum by(node) (node_memory_MemTotal_bytes{job=\"node-exporter\"} * on(namespace,\n  pod) group_left(node) node_namespace_pod:kube_pod_info:)\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:07:52.339497793Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: 'node:node_memory_utilisation:'\nexpr: 1 - sum by(node) ((node_memory_MemFree_bytes{job=\"node-exporter\"} + node_memory_Cached_bytes{job=\"node-exporter\"}\n  + node_memory_Buffers_bytes{job=\"node-exporter\"}) * on(namespace, pod) group_left(node)\n  node_namespace_pod:kube_pod_info:) / sum by(node) (node_memory_MemTotal_bytes{job=\"node-exporter\"}\n  * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:)\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:07:52.340179805Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_memory_swap_io_bytes:sum_rate\nexpr: 1000 * sum by(node) ((rate(node_vmstat_pgpgin{job=\"node-exporter\"}[1m]) + rate(node_vmstat_pgpgout{job=\"node-exporter\"}[1m]))\n  * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:)\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:07:52.341018583Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_disk_utilisation:avg_irate\nexpr: avg by(node) (irate(node_disk_io_time_seconds_total{device=~\"nvme.+|rbd.+|sd.+|vd.+|xvd.+\",job=\"node-exporter\"}[1m])\n  * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:)\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:07:52.341783284Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_disk_saturation:avg_irate\nexpr: avg by(node) (irate(node_disk_io_time_weighted_seconds_total{device=~\"nvme.+|rbd.+|sd.+|vd.+|xvd.+\",job=\"node-exporter\"}[1m])\n  / 1000 * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:)\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:07:52.344472136Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_net_utilisation:sum_irate\nexpr: sum by(node) ((irate(node_network_receive_bytes_total{device!~\"veth.+\",job=\"node-exporter\"}[1m])\n  + irate(node_network_transmit_bytes_total{device!~\"veth.+\",job=\"node-exporter\"}[1m]))\n  * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:)\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:07:52.345879711Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_net_saturation:sum_irate\nexpr: sum by(node) ((irate(node_network_receive_drop_total{device!~\"veth.+\",job=\"node-exporter\"}[1m])\n  + irate(node_network_transmit_drop_total{device!~\"veth.+\",job=\"node-exporter\"}[1m]))\n  * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:)\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:08:08.617967287Z caller=manager.go:414 component="rule manager" group=k8s.rules msg="Evaluating rule failed" rule="record: namespace_name:container_cpu_usage_seconds_total:sum_rate\nexpr: sum by(namespace, label_name) (sum by(namespace, pod_name) (rate(container_cpu_usage_seconds_total{container_name!=\"\",image!=\"\",job=\"kubelet\"}[5m]))\n  * on(namespace, pod_name) group_left(label_name) label_replace(kube_pod_labels{job=\"kube-state-metrics\"},\n  \"pod_name\", \"$1\", \"pod\", \"(.*)\"))\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:08:08.619685384Z caller=manager.go:414 component="rule manager" group=k8s.rules msg="Evaluating rule failed" rule="record: namespace_name:container_memory_usage_bytes:sum\nexpr: sum by(namespace, label_name) (sum by(pod_name, namespace) (container_memory_usage_bytes{container_name!=\"\",image!=\"\",job=\"kubelet\"})\n  * on(namespace, pod_name) group_left(label_name) label_replace(kube_pod_labels{job=\"kube-state-metrics\"},\n  \"pod_name\", \"$1\", \"pod\", \"(.*)\"))\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:08:08.62074718Z caller=manager.go:414 component="rule manager" group=k8s.rules msg="Evaluating rule failed" rule="record: namespace_name:kube_pod_container_resource_requests_memory_bytes:sum\nexpr: sum by(namespace, label_name) (sum by(namespace, pod) (kube_pod_container_resource_requests_memory_bytes{job=\"kube-state-metrics\"})\n  * on(namespace, pod) group_left(label_name) label_replace(kube_pod_labels{job=\"kube-state-metrics\"},\n  \"pod_name\", \"$1\", \"pod\", \"(.*)\"))\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:08:08.62211876Z caller=manager.go:414 component="rule manager" group=k8s.rules msg="Evaluating rule failed" rule="record: namespace_name:kube_pod_container_resource_requests_cpu_cores:sum\nexpr: sum by(namespace, label_name) (sum by(namespace, pod) (kube_pod_container_resource_requests_cpu_cores{job=\"kube-state-metrics\"}\n  and on(pod) kube_pod_status_scheduled{condition=\"true\"}) * on(namespace, pod) group_left(label_name)\n  label_replace(kube_pod_labels{job=\"kube-state-metrics\"}, \"pod_name\", \"$1\", \"pod\",\n  \"(.*)\"))\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:08:22.336260835Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_num_cpu:sum\nexpr: count by(node) (sum by(node, cpu) (node_cpu_seconds_total{job=\"node-exporter\"}\n  * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:))\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:08:22.337413798Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_cpu_utilisation:avg1m\nexpr: 1 - avg by(node) (rate(node_cpu_seconds_total{job=\"node-exporter\",mode=\"idle\"}[1m])\n  * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:)\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:08:22.338575162Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: 'node:node_cpu_saturation_load1:'\nexpr: sum by(node) (node_load1{job=\"node-exporter\"} * on(namespace, pod) group_left(node)\n  node_namespace_pod:kube_pod_info:) / node:node_num_cpu:sum\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:08:22.340188813Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_memory_bytes_available:sum\nexpr: sum by(node) ((node_memory_MemFree_bytes{job=\"node-exporter\"} + node_memory_Cached_bytes{job=\"node-exporter\"}\n  + node_memory_Buffers_bytes{job=\"node-exporter\"}) * on(namespace, pod) group_left(node)\n  node_namespace_pod:kube_pod_info:)\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:08:22.340730892Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_memory_bytes_total:sum\nexpr: sum by(node) (node_memory_MemTotal_bytes{job=\"node-exporter\"} * on(namespace,\n  pod) group_left(node) node_namespace_pod:kube_pod_info:)\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:08:22.342336147Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: 'node:node_memory_utilisation:'\nexpr: 1 - sum by(node) ((node_memory_MemFree_bytes{job=\"node-exporter\"} + node_memory_Cached_bytes{job=\"node-exporter\"}\n  + node_memory_Buffers_bytes{job=\"node-exporter\"}) * on(namespace, pod) group_left(node)\n  node_namespace_pod:kube_pod_info:) / sum by(node) (node_memory_MemTotal_bytes{job=\"node-exporter\"}\n  * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:)\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:08:22.343190297Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_memory_swap_io_bytes:sum_rate\nexpr: 1000 * sum by(node) ((rate(node_vmstat_pgpgin{job=\"node-exporter\"}[1m]) + rate(node_vmstat_pgpgout{job=\"node-exporter\"}[1m]))\n  * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:)\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:08:22.344194325Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_disk_utilisation:avg_irate\nexpr: avg by(node) (irate(node_disk_io_time_seconds_total{device=~\"nvme.+|rbd.+|sd.+|vd.+|xvd.+\",job=\"node-exporter\"}[1m])\n  * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:)\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:08:22.345211592Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_disk_saturation:avg_irate\nexpr: avg by(node) (irate(node_disk_io_time_weighted_seconds_total{device=~\"nvme.+|rbd.+|sd.+|vd.+|xvd.+\",job=\"node-exporter\"}[1m])\n  / 1000 * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:)\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:08:22.348256717Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_net_utilisation:sum_irate\nexpr: sum by(node) ((irate(node_network_receive_bytes_total{device!~\"veth.+\",job=\"node-exporter\"}[1m])\n  + irate(node_network_transmit_bytes_total{device!~\"veth.+\",job=\"node-exporter\"}[1m]))\n  * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:)\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:08:22.349838199Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_net_saturation:sum_irate\nexpr: sum by(node) ((irate(node_network_receive_drop_total{device!~\"veth.+\",job=\"node-exporter\"}[1m])\n  + irate(node_network_transmit_drop_total{device!~\"veth.+\",job=\"node-exporter\"}[1m]))\n  * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:)\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:08:38.617674911Z caller=manager.go:414 component="rule manager" group=k8s.rules msg="Evaluating rule failed" rule="record: namespace_name:container_cpu_usage_seconds_total:sum_rate\nexpr: sum by(namespace, label_name) (sum by(namespace, pod_name) (rate(container_cpu_usage_seconds_total{container_name!=\"\",image!=\"\",job=\"kubelet\"}[5m]))\n  * on(namespace, pod_name) group_left(label_name) label_replace(kube_pod_labels{job=\"kube-state-metrics\"},\n  \"pod_name\", \"$1\", \"pod\", \"(.*)\"))\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:08:38.619520163Z caller=manager.go:414 component="rule manager" group=k8s.rules msg="Evaluating rule failed" rule="record: namespace_name:container_memory_usage_bytes:sum\nexpr: sum by(namespace, label_name) (sum by(pod_name, namespace) (container_memory_usage_bytes{container_name!=\"\",image!=\"\",job=\"kubelet\"})\n  * on(namespace, pod_name) group_left(label_name) label_replace(kube_pod_labels{job=\"kube-state-metrics\"},\n  \"pod_name\", \"$1\", \"pod\", \"(.*)\"))\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:08:38.620360174Z caller=manager.go:414 component="rule manager" group=k8s.rules msg="Evaluating rule failed" rule="record: namespace_name:kube_pod_container_resource_requests_memory_bytes:sum\nexpr: sum by(namespace, label_name) (sum by(namespace, pod) (kube_pod_container_resource_requests_memory_bytes{job=\"kube-state-metrics\"})\n  * on(namespace, pod) group_left(label_name) label_replace(kube_pod_labels{job=\"kube-state-metrics\"},\n  \"pod_name\", \"$1\", \"pod\", \"(.*)\"))\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:08:38.621410921Z caller=manager.go:414 component="rule manager" group=k8s.rules msg="Evaluating rule failed" rule="record: namespace_name:kube_pod_container_resource_requests_cpu_cores:sum\nexpr: sum by(namespace, label_name) (sum by(namespace, pod) (kube_pod_container_resource_requests_cpu_cores{job=\"kube-state-metrics\"}\n  and on(pod) kube_pod_status_scheduled{condition=\"true\"}) * on(namespace, pod) group_left(label_name)\n  label_replace(kube_pod_labels{job=\"kube-state-metrics\"}, \"pod_name\", \"$1\", \"pod\",\n  \"(.*)\"))\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:08:52.336051488Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_num_cpu:sum\nexpr: count by(node) (sum by(node, cpu) (node_cpu_seconds_total{job=\"node-exporter\"}\n  * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:))\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:08:52.33700368Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_cpu_utilisation:avg1m\nexpr: 1 - avg by(node) (rate(node_cpu_seconds_total{job=\"node-exporter\",mode=\"idle\"}[1m])\n  * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:)\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:08:52.338175164Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: 'node:node_cpu_saturation_load1:'\nexpr: sum by(node) (node_load1{job=\"node-exporter\"} * on(namespace, pod) group_left(node)\n  node_namespace_pod:kube_pod_info:) / node:node_num_cpu:sum\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:08:52.339700398Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_memory_bytes_available:sum\nexpr: sum by(node) ((node_memory_MemFree_bytes{job=\"node-exporter\"} + node_memory_Cached_bytes{job=\"node-exporter\"}\n  + node_memory_Buffers_bytes{job=\"node-exporter\"}) * on(namespace, pod) group_left(node)\n  node_namespace_pod:kube_pod_info:)\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:08:52.340089576Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_memory_bytes_total:sum\nexpr: sum by(node) (node_memory_MemTotal_bytes{job=\"node-exporter\"} * on(namespace,\n  pod) group_left(node) node_namespace_pod:kube_pod_info:)\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:08:52.341625543Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: 'node:node_memory_utilisation:'\nexpr: 1 - sum by(node) ((node_memory_MemFree_bytes{job=\"node-exporter\"} + node_memory_Cached_bytes{job=\"node-exporter\"}\n  + node_memory_Buffers_bytes{job=\"node-exporter\"}) * on(namespace, pod) group_left(node)\n  node_namespace_pod:kube_pod_info:) / sum by(node) (node_memory_MemTotal_bytes{job=\"node-exporter\"}\n  * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:)\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:08:52.342535342Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_memory_swap_io_bytes:sum_rate\nexpr: 1000 * sum by(node) ((rate(node_vmstat_pgpgin{job=\"node-exporter\"}[1m]) + rate(node_vmstat_pgpgout{job=\"node-exporter\"}[1m]))\n  * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:)\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:08:52.343451628Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_disk_utilisation:avg_irate\nexpr: avg by(node) (irate(node_disk_io_time_seconds_total{device=~\"nvme.+|rbd.+|sd.+|vd.+|xvd.+\",job=\"node-exporter\"}[1m])\n  * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:)\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:08:52.34436346Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_disk_saturation:avg_irate\nexpr: avg by(node) (irate(node_disk_io_time_weighted_seconds_total{device=~\"nvme.+|rbd.+|sd.+|vd.+|xvd.+\",job=\"node-exporter\"}[1m])\n  / 1000 * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:)\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:08:52.347085982Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_net_utilisation:sum_irate\nexpr: sum by(node) ((irate(node_network_receive_bytes_total{device!~\"veth.+\",job=\"node-exporter\"}[1m])\n  + irate(node_network_transmit_bytes_total{device!~\"veth.+\",job=\"node-exporter\"}[1m]))\n  * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:)\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:08:52.348515036Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_net_saturation:sum_irate\nexpr: sum by(node) ((irate(node_network_receive_drop_total{device!~\"veth.+\",job=\"node-exporter\"}[1m])\n  + irate(node_network_transmit_drop_total{device!~\"veth.+\",job=\"node-exporter\"}[1m]))\n  * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:)\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:09:08.617010359Z caller=manager.go:414 component="rule manager" group=k8s.rules msg="Evaluating rule failed" rule="record: namespace_name:container_cpu_usage_seconds_total:sum_rate\nexpr: sum by(namespace, label_name) (sum by(namespace, pod_name) (rate(container_cpu_usage_seconds_total{container_name!=\"\",image!=\"\",job=\"kubelet\"}[5m]))\n  * on(namespace, pod_name) group_left(label_name) label_replace(kube_pod_labels{job=\"kube-state-metrics\"},\n  \"pod_name\", \"$1\", \"pod\", \"(.*)\"))\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:09:08.618331414Z caller=manager.go:414 component="rule manager" group=k8s.rules msg="Evaluating rule failed" rule="record: namespace_name:container_memory_usage_bytes:sum\nexpr: sum by(namespace, label_name) (sum by(pod_name, namespace) (container_memory_usage_bytes{container_name!=\"\",image!=\"\",job=\"kubelet\"})\n  * on(namespace, pod_name) group_left(label_name) label_replace(kube_pod_labels{job=\"kube-state-metrics\"},\n  \"pod_name\", \"$1\", \"pod\", \"(.*)\"))\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:09:08.619087583Z caller=manager.go:414 component="rule manager" group=k8s.rules msg="Evaluating rule failed" rule="record: namespace_name:kube_pod_container_resource_requests_memory_bytes:sum\nexpr: sum by(namespace, label_name) (sum by(namespace, pod) (kube_pod_container_resource_requests_memory_bytes{job=\"kube-state-metrics\"})\n  * on(namespace, pod) group_left(label_name) label_replace(kube_pod_labels{job=\"kube-state-metrics\"},\n  \"pod_name\", \"$1\", \"pod\", \"(.*)\"))\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:09:08.620094535Z caller=manager.go:414 component="rule manager" group=k8s.rules msg="Evaluating rule failed" rule="record: namespace_name:kube_pod_container_resource_requests_cpu_cores:sum\nexpr: sum by(namespace, label_name) (sum by(namespace, pod) (kube_pod_container_resource_requests_cpu_cores{job=\"kube-state-metrics\"}\n  and on(pod) kube_pod_status_scheduled{condition=\"true\"}) * on(namespace, pod) group_left(label_name)\n  label_replace(kube_pod_labels{job=\"kube-state-metrics\"}, \"pod_name\", \"$1\", \"pod\",\n  \"(.*)\"))\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:09:22.336411127Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_num_cpu:sum\nexpr: count by(node) (sum by(node, cpu) (node_cpu_seconds_total{job=\"node-exporter\"}\n  * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:))\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:09:22.337585007Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_cpu_utilisation:avg1m\nexpr: 1 - avg by(node) (rate(node_cpu_seconds_total{job=\"node-exporter\",mode=\"idle\"}[1m])\n  * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:)\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:09:22.338723522Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: 'node:node_cpu_saturation_load1:'\nexpr: sum by(node) (node_load1{job=\"node-exporter\"} * on(namespace, pod) group_left(node)\n  node_namespace_pod:kube_pod_info:) / node:node_num_cpu:sum\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:09:22.340377835Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_memory_bytes_available:sum\nexpr: sum by(node) ((node_memory_MemFree_bytes{job=\"node-exporter\"} + node_memory_Cached_bytes{job=\"node-exporter\"}\n  + node_memory_Buffers_bytes{job=\"node-exporter\"}) * on(namespace, pod) group_left(node)\n  node_namespace_pod:kube_pod_info:)\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:09:22.340997528Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_memory_bytes_total:sum\nexpr: sum by(node) (node_memory_MemTotal_bytes{job=\"node-exporter\"} * on(namespace,\n  pod) group_left(node) node_namespace_pod:kube_pod_info:)\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:09:22.342647043Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: 'node:node_memory_utilisation:'\nexpr: 1 - sum by(node) ((node_memory_MemFree_bytes{job=\"node-exporter\"} + node_memory_Cached_bytes{job=\"node-exporter\"}\n  + node_memory_Buffers_bytes{job=\"node-exporter\"}) * on(namespace, pod) group_left(node)\n  node_namespace_pod:kube_pod_info:) / sum by(node) (node_memory_MemTotal_bytes{job=\"node-exporter\"}\n  * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:)\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:09:22.343794386Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_memory_swap_io_bytes:sum_rate\nexpr: 1000 * sum by(node) ((rate(node_vmstat_pgpgin{job=\"node-exporter\"}[1m]) + rate(node_vmstat_pgpgout{job=\"node-exporter\"}[1m]))\n  * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:)\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:09:22.344854866Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_disk_utilisation:avg_irate\nexpr: avg by(node) (irate(node_disk_io_time_seconds_total{device=~\"nvme.+|rbd.+|sd.+|vd.+|xvd.+\",job=\"node-exporter\"}[1m])\n  * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:)\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:09:22.345756683Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_disk_saturation:avg_irate\nexpr: avg by(node) (irate(node_disk_io_time_weighted_seconds_total{device=~\"nvme.+|rbd.+|sd.+|vd.+|xvd.+\",job=\"node-exporter\"}[1m])\n  / 1000 * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:)\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:09:22.34840504Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_net_utilisation:sum_irate\nexpr: sum by(node) ((irate(node_network_receive_bytes_total{device!~\"veth.+\",job=\"node-exporter\"}[1m])\n  + irate(node_network_transmit_bytes_total{device!~\"veth.+\",job=\"node-exporter\"}[1m]))\n  * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:)\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:09:22.350135452Z caller=manager.go:414 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_net_saturation:sum_irate\nexpr: sum by(node) ((irate(node_network_receive_drop_total{device!~\"veth.+\",job=\"node-exporter\"}[1m])\n  + irate(node_network_transmit_drop_total{device!~\"veth.+\",job=\"node-exporter\"}[1m]))\n  * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:)\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:09:38.61839528Z caller=manager.go:414 component="rule manager" group=k8s.rules msg="Evaluating rule failed" rule="record: namespace_name:container_cpu_usage_seconds_total:sum_rate\nexpr: sum by(namespace, label_name) (sum by(namespace, pod_name) (rate(container_cpu_usage_seconds_total{container_name!=\"\",image!=\"\",job=\"kubelet\"}[5m]))\n  * on(namespace, pod_name) group_left(label_name) label_replace(kube_pod_labels{job=\"kube-state-metrics\"},\n  \"pod_name\", \"$1\", \"pod\", \"(.*)\"))\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:09:38.61995211Z caller=manager.go:414 component="rule manager" group=k8s.rules msg="Evaluating rule failed" rule="record: namespace_name:container_memory_usage_bytes:sum\nexpr: sum by(namespace, label_name) (sum by(pod_name, namespace) (container_memory_usage_bytes{container_name!=\"\",image!=\"\",job=\"kubelet\"})\n  * on(namespace, pod_name) group_left(label_name) label_replace(kube_pod_labels{job=\"kube-state-metrics\"},\n  \"pod_name\", \"$1\", \"pod\", \"(.*)\"))\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:09:38.620874972Z caller=manager.go:414 component="rule manager" group=k8s.rules msg="Evaluating rule failed" rule="record: namespace_name:kube_pod_container_resource_requests_memory_bytes:sum\nexpr: sum by(namespace, label_name) (sum by(namespace, pod) (kube_pod_container_resource_requests_memory_bytes{job=\"kube-state-metrics\"})\n  * on(namespace, pod) group_left(label_name) label_replace(kube_pod_labels{job=\"kube-state-metrics\"},\n  \"pod_name\", \"$1\", \"pod\", \"(.*)\"))\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:09:38.622114543Z caller=manager.go:414 component="rule manager" group=k8s.rules msg="Evaluating rule failed" rule="record: namespace_name:kube_pod_container_resource_requests_cpu_cores:sum\nexpr: sum by(namespace, label_name) (sum by(namespace, pod) (kube_pod_container_resource_requests_cpu_cores{job=\"kube-state-metrics\"}\n  and on(pod) kube_pod_status_scheduled{condition=\"true\"}) * on(namespace, pod) group_left(label_name)\n  label_replace(kube_pod_labels{job=\"kube-state-metrics\"}, \"pod_name\", \"$1\", \"pod\",\n  \"(.*)\"))\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:10:08.618317398Z caller=manager.go:414 component="rule manager" group=k8s.rules msg="Evaluating rule failed" rule="record: namespace_name:container_cpu_usage_seconds_total:sum_rate\nexpr: sum by(namespace, label_name) (sum by(namespace, pod_name) (rate(container_cpu_usage_seconds_total{container_name!=\"\",image!=\"\",job=\"kubelet\"}[5m]))\n  * on(namespace, pod_name) group_left(label_name) label_replace(kube_pod_labels{job=\"kube-state-metrics\"},\n  \"pod_name\", \"$1\", \"pod\", \"(.*)\"))\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:10:08.620081391Z caller=manager.go:414 component="rule manager" group=k8s.rules msg="Evaluating rule failed" rule="record: namespace_name:container_memory_usage_bytes:sum\nexpr: sum by(namespace, label_name) (sum by(pod_name, namespace) (container_memory_usage_bytes{container_name!=\"\",image!=\"\",job=\"kubelet\"})\n  * on(namespace, pod_name) group_left(label_name) label_replace(kube_pod_labels{job=\"kube-state-metrics\"},\n  \"pod_name\", \"$1\", \"pod\", \"(.*)\"))\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:10:08.621141484Z caller=manager.go:414 component="rule manager" group=k8s.rules msg="Evaluating rule failed" rule="record: namespace_name:kube_pod_container_resource_requests_memory_bytes:sum\nexpr: sum by(namespace, label_name) (sum by(namespace, pod) (kube_pod_container_resource_requests_memory_bytes{job=\"kube-state-metrics\"})\n  * on(namespace, pod) group_left(label_name) label_replace(kube_pod_labels{job=\"kube-state-metrics\"},\n  \"pod_name\", \"$1\", \"pod\", \"(.*)\"))\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:10:08.622398953Z caller=manager.go:414 component="rule manager" group=k8s.rules msg="Evaluating rule failed" rule="record: namespace_name:kube_pod_container_resource_requests_cpu_cores:sum\nexpr: sum by(namespace, label_name) (sum by(namespace, pod) (kube_pod_container_resource_requests_cpu_cores{job=\"kube-state-metrics\"}\n  and on(pod) kube_pod_status_scheduled{condition=\"true\"}) * on(namespace, pod) group_left(label_name)\n  label_replace(kube_pod_labels{job=\"kube-state-metrics\"}, \"pod_name\", \"$1\", \"pod\",\n  \"(.*)\"))\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:10:38.61840961Z caller=manager.go:414 component="rule manager" group=k8s.rules msg="Evaluating rule failed" rule="record: namespace_name:container_cpu_usage_seconds_total:sum_rate\nexpr: sum by(namespace, label_name) (sum by(namespace, pod_name) (rate(container_cpu_usage_seconds_total{container_name!=\"\",image!=\"\",job=\"kubelet\"}[5m]))\n  * on(namespace, pod_name) group_left(label_name) label_replace(kube_pod_labels{job=\"kube-state-metrics\"},\n  \"pod_name\", \"$1\", \"pod\", \"(.*)\"))\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:10:38.619893969Z caller=manager.go:414 component="rule manager" group=k8s.rules msg="Evaluating rule failed" rule="record: namespace_name:container_memory_usage_bytes:sum\nexpr: sum by(namespace, label_name) (sum by(pod_name, namespace) (container_memory_usage_bytes{container_name!=\"\",image!=\"\",job=\"kubelet\"})\n  * on(namespace, pod_name) group_left(label_name) label_replace(kube_pod_labels{job=\"kube-state-metrics\"},\n  \"pod_name\", \"$1\", \"pod\", \"(.*)\"))\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:10:38.620818598Z caller=manager.go:414 component="rule manager" group=k8s.rules msg="Evaluating rule failed" rule="record: namespace_name:kube_pod_container_resource_requests_memory_bytes:sum\nexpr: sum by(namespace, label_name) (sum by(namespace, pod) (kube_pod_container_resource_requests_memory_bytes{job=\"kube-state-metrics\"})\n  * on(namespace, pod) group_left(label_name) label_replace(kube_pod_labels{job=\"kube-state-metrics\"},\n  \"pod_name\", \"$1\", \"pod\", \"(.*)\"))\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:10:38.622523861Z caller=manager.go:414 component="rule manager" group=k8s.rules msg="Evaluating rule failed" rule="record: namespace_name:kube_pod_container_resource_requests_cpu_cores:sum\nexpr: sum by(namespace, label_name) (sum by(namespace, pod) (kube_pod_container_resource_requests_cpu_cores{job=\"kube-state-metrics\"}\n  and on(pod) kube_pod_status_scheduled{condition=\"true\"}) * on(namespace, pod) group_left(label_name)\n  label_replace(kube_pod_labels{job=\"kube-state-metrics\"}, \"pod_name\", \"$1\", \"pod\",\n  \"(.*)\"))\n" err="many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2019-03-20T16:19:00.622211142Z caller=glog.go:86 component=k8s_client_runtime func=Warningf msg="/app/discovery/kubernetes/kubernetes.go:300: watch of *v1.Endpoints ended with: too old resource version: 3211 (3992)"
level=warn ts=2019-03-20T16:22:26.629778436Z caller=glog.go:86 component=k8s_client_runtime func=Warningf msg="/app/discovery/kubernetes/kubernetes.go:300: watch of *v1.Endpoints ended with: too old resource version: 3227 (4637)"
level=warn ts=2019-03-20T16:23:34.603738421Z caller=glog.go:86 component=k8s_client_runtime func=Warningf msg="/app/discovery/kubernetes/kubernetes.go:300: watch of *v1.Endpoints ended with: too old resource version: 3227 (4854)"
level=warn ts=2019-03-20T16:32:23.605593754Z caller=glog.go:86 component=k8s_client_runtime func=Warningf msg="/app/discovery/kubernetes/kubernetes.go:300: watch of *v1.Endpoints ended with: too old resource version: 5578 (6515)"
level=warn ts=2019-03-20T16:33:13.609910007Z caller=glog.go:86 component=k8s_client_runtime func=Warningf msg="/app/discovery/kubernetes/kubernetes.go:300: watch of *v1.Endpoints ended with: too old resource version: 6438 (6673)"
level=warn ts=2019-03-20T16:45:53.658919976Z caller=glog.go:86 component=k8s_client_runtime func=Warningf msg="/app/discovery/kubernetes/kubernetes.go:300: watch of *v1.Endpoints ended with: too old resource version: 8102 (9067)"
level=warn ts=2019-03-20T16:46:57.665994312Z caller=glog.go:86 component=k8s_client_runtime func=Warningf msg="/app/discovery/kubernetes/kubernetes.go:300: watch of *v1.Endpoints ended with: too old resource version: 8744 (9270)"
level=warn ts=2019-03-20T16:48:47.667374914Z caller=glog.go:86 component=k8s_client_runtime func=Warningf msg="/app/discovery/kubernetes/kubernetes.go:300: watch of *v1.Endpoints ended with: too old resource version: 8744 (9615)"
level=warn ts=2019-03-20T16:55:53.680380316Z caller=glog.go:86 component=k8s_client_runtime func=Warningf msg="/app/discovery/kubernetes/kubernetes.go:300: watch of *v1.Endpoints ended with: too old resource version: 10856 (10953)"
level=warn ts=2019-03-20T17:00:17.676745847Z caller=glog.go:86 component=k8s_client_runtime func=Warningf msg="/app/discovery/kubernetes/kubernetes.go:300: watch of *v1.Endpoints ended with: too old resource version: 10657 (11784)"
level=warn ts=2019-03-20T17:03:43.696953478Z caller=glog.go:86 component=k8s_client_runtime func=Warningf msg="/app/discovery/kubernetes/kubernetes.go:300: watch of *v1.Endpoints ended with: too old resource version: 11201 (12428)"
level=warn ts=2019-03-20T17:04:52.690288969Z caller=glog.go:86 component=k8s_client_runtime func=Warningf msg="/app/discovery/kubernetes/kubernetes.go:300: watch of *v1.Endpoints ended with: too old resource version: 12542 (12644)"
level=warn ts=2019-03-20T17:12:29.720012592Z caller=glog.go:86 component=k8s_client_runtime func=Warningf msg="/app/discovery/kubernetes/kubernetes.go:300: watch of *v1.Endpoints ended with: too old resource version: 14020 (14085)"
level=warn ts=2019-03-20T17:16:29.720175616Z caller=glog.go:86 component=k8s_client_runtime func=Warningf msg="/app/discovery/kubernetes/kubernetes.go:300: watch of *v1.Endpoints ended with: too old resource version: 14235 (14837)"
level=warn ts=2019-03-20T17:18:24.711108172Z caller=glog.go:86 component=k8s_client_runtime func=Warningf msg="/app/discovery/kubernetes/kubernetes.go:300: watch of *v1.Endpoints ended with: too old resource version: 13368 (15198)"
level=warn ts=2019-03-20T17:25:51.737177395Z caller=glog.go:86 component=k8s_client_runtime func=Warningf msg="/app/discovery/kubernetes/kubernetes.go:300: watch of *v1.Endpoints ended with: too old resource version: 15669 (16602)"
level=warn ts=2019-03-20T17:31:05.733672162Z caller=glog.go:86 component=k8s_client_runtime func=Warningf msg="/app/discovery/kubernetes/kubernetes.go:300: watch of *v1.Endpoints ended with: too old resource version: 16784 (17591)"
level=warn ts=2019-03-20T17:39:43.842440249Z caller=glog.go:86 component=k8s_client_runtime func=Warningf msg="/app/discovery/kubernetes/kubernetes.go:300: watch of *v1.Endpoints ended with: too old resource version: 19009 (19225)"
level=warn ts=2019-03-20T17:42:55.781859788Z caller=glog.go:86 component=k8s_client_runtime func=Warningf msg="/app/discovery/kubernetes/kubernetes.go:300: watch of *v1.Endpoints ended with: too old resource version: 19009 (19828)"
level=warn ts=2019-03-20T17:46:25.762412517Z caller=glog.go:86 component=k8s_client_runtime func=Warningf msg="/app/discovery/kubernetes/kubernetes.go:300: watch of *v1.Endpoints ended with: too old resource version: 19185 (20488)"
level=warn ts=2019-03-20T17:48:18.91702912Z caller=glog.go:86 component=k8s_client_runtime func=Warningf msg="/app/discovery/kubernetes/kubernetes.go:300: watch of *v1.Endpoints ended with: too old resource version: 20812 (20843)"
level=warn ts=2019-03-20T17:55:00.776387192Z caller=glog.go:86 component=k8s_client_runtime func=Warningf msg="/app/discovery/kubernetes/kubernetes.go:300: watch of *v1.Endpoints ended with: too old resource version: 22075 (22102)"
level=warn ts=2019-03-20T17:57:18.803126854Z caller=glog.go:86 component=k8s_client_runtime func=Warningf msg="/app/discovery/kubernetes/kubernetes.go:300: watch of *v1.Endpoints ended with: too old resource version: 21414 (22538)"
level=warn ts=2019-03-20T18:08:04.80116027Z caller=glog.go:86 component=k8s_client_runtime func=Warningf msg="/app/discovery/kubernetes/kubernetes.go:300: watch of *v1.Endpoints ended with: too old resource version: 23697 (24577)"
level=warn ts=2019-03-20T18:08:53.164768568Z caller=glog.go:86 component=k8s_client_runtime func=Warningf msg="/app/discovery/kubernetes/kubernetes.go:300: watch of *v1.Endpoints ended with: too old resource version: 24683 (24736)"
level=warn ts=2019-03-20T18:12:08.838306619Z caller=glog.go:86 component=k8s_client_runtime func=Warningf msg="/app/discovery/kubernetes/kubernetes.go:300: watch of *v1.Endpoints ended with: too old resource version: 24683 (25351)"

@Pamir
Contributor Author

Pamir commented Mar 20, 2019

Hi @StevenYCChou
With the latest version of the stable prometheus-operator chart on GitHub and the configuration below, it still does not affect the output of the sidecar.

[pamir@Pamirs-MBP prom-sd (⎈ |gke_academic-pier-217920_us-central1-a_standard-cluster-1:monitoring)]$ kubectl get pods -o yaml prometheus-prometheus-prometheus-oper-prometheus-0  | grep image
    image: quay.io/prometheus/prometheus:v2.6.1
    imagePullPolicy: IfNotPresent
    image: quay.io/coreos/prometheus-config-reloader:v0.26.0
    imagePullPolicy: IfNotPresent
    image: gcr.io/stackdriver-prometheus/stackdriver-prometheus-sidecar:release-0.4.0
    imagePullPolicy: Always
    image: quay.io/coreos/configmap-reload:v0.0.1
    imagePullPolicy: IfNotPresent
    image: quay.io/prometheus/prometheus:v2.6.1
    imageID: docker-pullable://quay.io/prometheus/prometheus@sha256:4c011102738c6f61d51a1beea141756463c8257ab3a8175d60a09d1bc1a15c89
    image: quay.io/coreos/prometheus-config-reloader:v0.26.0
    imageID: docker-pullable://quay.io/coreos/prometheus-config-reloader@sha256:1c4f36961c5296e5189d89ed8603791222ecfa9ed90d5dc910036ff3e3a462b8
    image: quay.io/coreos/configmap-reload:v0.0.1
    imageID: docker-pullable://quay.io/coreos/configmap-reload@sha256:e2fd60ff0ae4500a75b80ebaa30e0e7deba9ad107833e8ca53f0047c42c5a057
    image: gcr.io/stackdriver-prometheus/stackdriver-prometheus-sidecar:release-0.4.0
    imageID: docker-pullable://gcr.io/stackdriver-prometheus/stackdriver-prometheus-sidecar@sha256:c9a724f6d70403996700acc88d93feedb7941d41cb6637e50c436f6bb7823406

@jkohen
Contributor

jkohen commented Mar 20, 2019

I see a lot of errors in your Prometheus logs. Is Prometheus working well? What are some working metrics?

Do you have any metrics that aren't recording rules? There are some limitations for recording rules at this time: https://cloud.google.com/monitoring/kubernetes-engine/prometheus#prometheus_integration_issues

@Pamir
Contributor Author

Pamir commented Mar 22, 2019

Hi @jkohen
The recording rules come from stable/prometheus-operator. I am going to fix them and report back with the result.

@StevenYCChou
Contributor

Hi @Pamir, we added debug logs with #110, and they are included in release v0.4.1. If no data shows up in Stackdriver, you can turn on debug logging by following the instructions in the section "No data shows up in Stackdriver" of https://cloud.google.com/monitoring/kubernetes-engine/prometheus#prometheus_integration_issues, and feel free to share the debugging info with us.

@matthewgoslett

I also ran into the "retrieve last checkpoint: open /prometheus/wal" error today, with the pod crashing.

I'm running the full https://github.com/coreos/kube-prometheus stack on GKE. I've pinned my version to v2.6.1 and have mounted a PVC at /prometheus. The WAL definitely exists in there and contains data.

For brevity, I've only included my prometheus/k8s custom resource.

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    prometheus: k8s
  name: k8s
  namespace: monitoring
spec:
  retention: 30d
  alerting:
    alertmanagers:
    - name: alertmanager-main
      namespace: monitoring
      port: web
  baseImage: quay.io/prometheus/prometheus
  nodeSelector:
    beta.kubernetes.io/os: linux
  replicas: 2
  resources:
    requests:
      memory: 400Mi
  ruleSelector:
    matchLabels:
      prometheus: k8s
      role: alert-rules
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: prometheus-k8s
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  version: v2.6.1
  storage:
    volumeClaimTemplate:
      apiVersion: v1
      kind: PersistentVolumeClaim
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi
        storageClassName: standard
  containers:
  - name: stackdriver
    image: gcr.io/stackdriver-prometheus/stackdriver-prometheus-sidecar:0.4.3
    imagePullPolicy: Always
    args: [
      "--stackdriver.project-id=xxxxxxxxxxx",
      "--prometheus.wal-directory=/prometheus/wal",
      "--stackdriver.kubernetes.location=europe-west1",
      "--stackdriver.kubernetes.cluster-name=xxxxxxxxxxx"
    ]
    ports:
      - name: stackdriver
        containerPort: 9091
    volumeMounts:
      - name: prometheus-k8s-db
        mountPath: /prometheus

If I change the wal-directory to /prometheus instead of /prometheus/wal, the container boots just fine, although I suspect this is probably incorrect.

Here is a log from the container:

➜  manifests git:(master) ✗ kubectl logs -f prometheus-k8s-0 stackdriver -n monitoring
level=info ts=2019-06-29T21:01:57.11620962Z caller=main.go:296 msg="Starting Stackdriver Prometheus sidecar" version="(version=HEAD, branch=master, revision=453838cff46ee8a17f7675696a97256475bb39e7)"
level=info ts=2019-06-29T21:01:57.116309721Z caller=main.go:297 build_context="(go=go1.12, user=kbuilder@kokoro-gcp-ubuntu-prod-1535194210, date=20190520-14:47:15)"
level=info ts=2019-06-29T21:01:57.116345497Z caller=main.go:298 host_details="(Linux 4.14.127+ #1 SMP Tue Jun 18 18:32:10 PDT 2019 x86_64 prometheus-k8s-0 (none))"
level=info ts=2019-06-29T21:01:57.116373467Z caller=main.go:299 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2019-06-29T21:01:57.13540026Z caller=main.go:564 msg="Web server started"
level=info ts=2019-06-29T21:01:57.141948721Z caller=main.go:545 msg="Stackdriver client started"
level=info ts=2019-06-29T21:03:00.169674295Z caller=manager.go:153 component="Prometheus reader" msg="Starting Prometheus reader..."

I tried launching with -log.level=debug as per the docs, but it crashes complaining that '-l' is not a valid shorthand argument. I imagine this should be --log.level=debug, although that doesn't add any additional information to the log output.

I'm not able to see any Prometheus data reporting into Stackdriver.

@StevenYCChou
Contributor

Hi @matthewgoslett,

Thanks for the report. I haven't used kube-prometheus yet, so I need to gather some further information.

  1. How do you specify where Prometheus writes its storage (TSDB)? prometheus.wal-directory should be Prometheus' storage.tsdb.path + '/wal'. The default TSDB path is data if you do not specify it in the Prometheus config (see https://prometheus.io/docs/prometheus/latest/storage/#operational-aspects).

Can you check where your Prometheus stores data? (A sketch of how the two paths relate is included after the example below.)

  2. Can you give me more detail on where you applied --log.level=debug? I expect it to be one of the parameters for the sidecar. For example:
- name: stackdriver
  image: gcr.io/stackdriver-prometheus/stackdriver-prometheus-sidecar:0.4.3
  imagePullPolicy: Always
  args: [
    "--stackdriver.project-id=xxxxxxxxxxx",
    "--prometheus.wal-directory=/prometheus/wal",
    "--stackdriver.kubernetes.location=europe-west1",
    "--stackdriver.kubernetes.cluster-name=xxxxxxxxxxx",
    "--log.level=debug"
  ]
   ....
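
For reference, here is a minimal sketch of how the two paths line up when both containers share the Prometheus data volume (the volume name and paths are illustrative, not taken from a specific manifest in this thread; the key point is that --prometheus.wal-directory must resolve to storage.tsdb.path + '/wal' as seen from inside the sidecar):

containers:
- name: prometheus
  image: quay.io/prometheus/prometheus:v2.6.1
  args:
  - "--storage.tsdb.path=/prometheus"              # TSDB root; the WAL lives in /prometheus/wal
  volumeMounts:
  - name: prometheus-data                          # illustrative volume name
    mountPath: /prometheus
- name: stackdriver
  image: gcr.io/stackdriver-prometheus/stackdriver-prometheus-sidecar:0.4.3
  args:
  - "--stackdriver.project-id=xxxxxxxxxxx"
  - "--prometheus.wal-directory=/prometheus/wal"   # storage.tsdb.path + '/wal'
  volumeMounts:
  - name: prometheus-data                          # same volume, same mount path
    mountPath: /prometheus

If the volume is mounted with a subPath, or Prometheus runs with a non-default --storage.tsdb.path, the wal-directory value has to be adjusted to match.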

@jackge007

Hi @StevenYCChou,
I'm running the sidecar with:
--log.level=debug
--prometheus.wal-directory=/prometheus/wal

against Prometheus image quay.io/prometheus/prometheus:v2.7.1, and I get:

level=info ts=2019-07-29T09:21:09.70048798Z caller=main.go:298 host_details="(Linux 4.14.91+ #1 SMP Wed Jan 23 21:34:58 PST 2019 x86_64 prometheus-monitoring-prometheus-oper-prometheus-0 (none))"
level=info ts=2019-07-29T09:21:09.70050804Z caller=main.go:299 fd_limits="(soft=1048576, hard=1048576)"
level=error ts=2019-07-29T09:21:09.703322543Z caller=main.go:385 msg="Tailing WAL failed" err="retrieve last checkpoint: open /prometheus/wal: no such file or directory"

@jkohen jkohen added p1 Priority P1 question Further information is requested labels Aug 6, 2019
@dgdevops

dgdevops commented Apr 29, 2020

Hello @Pamir,
I stumbled upon this ticket while I was struggling to set up the stackdriver-sidecar.
I had exactly the same issue ("/prometheus/wal: no such file or directory"). I found the solution accidentally, and afterwards my prometheus-operator pod came up properly with all containers, including the sidecar, and I could see the metrics in Google Stackdriver right away.

In my prometheus-operator Helm config I had 'subPath' set to 'prometheus-db' in the volumeMounts section. Because of that, the 'prometheus-prometheus-operator-prometheus-db' volume was mounted at /prometheus/prometheus-db, so in the WAL directory flag I had to specify the extended path: --prometheus.wal-directory=/prometheus/prometheus-db/wal

After changing the WAL directory path from /prometheus/wal to /prometheus/prometheus-db/wal, the sidecar came up perfectly and the metrics from Prometheus were sent to Stackdriver.
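
For illustration, a mount arrangement like the following produces that layout (a simplified sketch rather than an exact manifest; the container and volume names are the ones mentioned above, and other fields are omitted):

containers:
- name: prometheus
  # ...image and other fields omitted
  volumeMounts:
  - name: prometheus-prometheus-operator-prometheus-db
    mountPath: /prometheus
    subPath: prometheus-db              # Prometheus writes into <volume>/prometheus-db
- name: stackdriver
  image: gcr.io/stackdriver-prometheus/stackdriver-prometheus-sidecar:0.7.3
  args:
  # the sidecar mounts the whole volume (no subPath), so the WAL sits one level deeper
  - "--prometheus.wal-directory=/prometheus/prometheus-db/wal"
  volumeMounts:
  - name: prometheus-prometheus-operator-prometheus-db
    mountPath: /prometheus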

As I mentioned, I found this out completely by accident. I was trying to shell into the sidecar container while running sidecar version 0.7.3, but I could not because 'sh' was not found in the container, so as a test I re-patched using a much older sidecar version (e.g. 0.3.2).
After the patching was complete and the new sidecar with version 0.3.2 was deployed, I could successfully shell into the sidecar container and check the prometheus volume.
Voila! I could see the prometheus volume mounted at /prometheus/prometheus-db/ with its wal directory inside it.

Hope it helps!
Daniel

@StevenYCChou StevenYCChou removed the p1 Priority P1 label May 6, 2020
@nagarjuna90

Hi,

My Prometheus pod and sidecar container are running, but I don't see any Prometheus-related metrics in Stackdriver.
When I check the logs of the sidecar container in the pod, I get the output below:

level=info ts=2020-07-14T16:23:49.975Z caller=main.go:293 msg="Starting Stackdriver Prometheus sidecar" version="(version=0.7.5, branch=master, revision=c8c0bfb1a5e22f5838eb6bb86608b29ef0eca0ef)"
level=info ts=2020-07-14T16:23:49.975Z caller=main.go:294 build_context="(go=go1.12, user=kbuilder@kokoro-gcp-ubuntu-prod-289585118, date=20200616-14:24:15)"
level=info ts=2020-07-14T16:23:49.975Z caller=main.go:295 host_details="(Linux 4.14.138+ #1 SMP Tue Sep 3 02:58:08 PDT 2019 x86_64 prometheus-655f94f86c-cg6pm (none))"
level=info ts=2020-07-14T16:23:49.975Z caller=main.go:296 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2020-07-14T16:23:49.979Z caller=main.go:598 msg="Web server started"
level=info ts=2020-07-14T16:23:49.980Z caller=main.go:567 msg="Stopping Prometheus reader..."
level=info ts=2020-07-14T16:23:49.980Z caller=queue_manager.go:221 component=queue_manager msg="Stopping remote storage..."

@igorpeshansky
Member

The prometheus sidecar is no longer a recommended Google Cloud solution. It has been superseded by Google Cloud Managed Service for Prometheus.
