Merge branch 'master' into add-aliases
* master:
  Refactor cluster info collector (#536)
  Fix linting that was missed in CI run (#568)
  Grafana dashboard: use new node exporter metric names (#501)
  publish total shards on a node (#535)
  Add additional collector for SLM stats (#558)
  Update common Prometheus files (#565)
  Update build (#562)

Signed-off-by: Steven Cipriano <cipriano@squareup.com>
bobo333 committed May 18, 2022
2 parents 4232fd7 + 9ece896 commit 9ec16c7
Showing 18 changed files with 1,045 additions and 26 deletions.
2 changes: 1 addition & 1 deletion .circleci/config.yml
@@ -6,7 +6,7 @@ executors:
  # This must match .promu.yml.
  golang:
    docker:
      - image: circleci/golang:1.17
      - image: cimg/go:1.18
jobs:
  test:
    executor: golang
6 changes: 6 additions & 0 deletions .github/dependabot.yml
@@ -0,0 +1,6 @@
version: 2
updates:
  - package-ecosystem: "gomod"
    directory: "/"
    schedule:
      interval: "monthly"
3 changes: 3 additions & 0 deletions .github/workflows/golangci-lint.yml
@@ -21,6 +21,9 @@ jobs:
        uses: actions/setup-go@v2
        with:
          go-version: 1.18.x
      - name: Install snmp_exporter/generator dependencies
        run: sudo apt-get update && sudo apt-get -y install libsnmp-dev
        if: github.repository == 'prometheus/snmp_exporter'
      - name: Lint
        uses: golangci/golangci-lint-action@v3.1.0
        with:
2 changes: 1 addition & 1 deletion .promu.yml
@@ -1,6 +1,6 @@
go:
  # This must match .circleci/config.yml.
  version: 1.17
  version: 1.18
repository:
  path: github.com/prometheus-community/elasticsearch_exporter
build:
28 changes: 28 additions & 0 deletions .yamllint
@@ -0,0 +1,28 @@
---
extends: default

rules:
  braces:
    max-spaces-inside: 1
    level: error
  brackets:
    max-spaces-inside: 1
    level: error
  commas: disable
  comments: disable
  comments-indentation: disable
  document-start: disable
  indentation:
    spaces: consistent
    indent-sequences: consistent
  key-duplicates:
    ignore: |
      config/testdata/section_key_dup.bad.yml
  line-length: disable
  truthy:
    ignore: |
      .github/workflows/codeql-analysis.yml
      .github/workflows/funcbench.yml
      .github/workflows/fuzzing.yml
      .github/workflows/prombench.yml
      .github/workflows/golangci-lint.yml
4 changes: 2 additions & 2 deletions CODE_OF_CONDUCT.md
@@ -1,3 +1,3 @@
## Prometheus Community Code of Conduct
# Prometheus Community Code of Conduct

Prometheus follows the [CNCF Code of Conduct](https://github.com/cncf/foundation/blob/master/code-of-conduct.md).
Prometheus follows the [CNCF Code of Conduct](https://github.com/cncf/foundation/blob/main/code-of-conduct.md).
19 changes: 19 additions & 0 deletions README.md
@@ -57,6 +57,7 @@ elasticsearch_exporter --help
| es.no-aliases | 1.0.4rc1 | If true, exclude informational aliases metrics. | false |
| es.shards | 1.0.3rc1 | If true, query stats for all indices in the cluster, including shard-level stats (implies `es.indices=true`). | false |
| es.snapshots | 1.0.4rc1 | If true, query stats for the cluster snapshots. | false |
| es.slm | | If true, query stats for SLM. | false |
| es.timeout | 1.0.2 | Timeout for trying to get stats from Elasticsearch. (ex: 20s) | 5s |
| es.ca | 1.0.2 | Path to PEM file that contains trusted Certificate Authorities for the Elasticsearch connection. | |
| es.client-private-key | 1.0.2 | Path to PEM file that contains the private key for client auth when connecting to Elasticsearch. | |
@@ -87,6 +88,7 @@ es.indices | `indices` `monitor` (per index or `*`) | All actions that are requi
es.indices_settings | `indices` `monitor` (per index or `*`) |
es.shards | not sure if `indices` or `cluster` `monitor` or both |
es.snapshots | `cluster:admin/snapshot/status` and `cluster:admin/repository/get` | [ES Forum Post](https://discuss.elastic.co/t/permissions-for-backup-user-with-x-pack/88057)
es.slm | `read_slm`

Further Information
- [Built-in Users](https://www.elastic.co/guide/en/elastic-stack-overview/7.3/built-in-users.html)
@@ -222,6 +224,23 @@ Further Information
| elasticsearch_clusterinfo_last_retrieval_success_ts | gauge | 1 | Timestamp of the last successful cluster info retrieval
| elasticsearch_clusterinfo_up | gauge | 1 | Up metric for the cluster info collector
| elasticsearch_clusterinfo_version_info | gauge | 6 | Constant metric with ES version information as labels
| elasticsearch_slm_stats_up | gauge | 0 | Up metric for SLM collector
| elasticsearch_slm_stats_total_scrapes | counter | 0 | Number of scrapes for SLM collector
| elasticsearch_slm_stats_json_parse_failures | counter | 0 | JSON parse failures for SLM collector
| elasticsearch_slm_stats_retention_runs_total | counter | 0 | Total retention runs
| elasticsearch_slm_stats_retention_failed_total | counter | 0 | Total failed retention runs
| elasticsearch_slm_stats_retention_timed_out_total | counter | 0 | Total retention run timeouts
| elasticsearch_slm_stats_retention_deletion_time_seconds | gauge | 0 | Retention run deletion time
| elasticsearch_slm_stats_total_snapshots_taken_total | counter | 0 | Total snapshots taken
| elasticsearch_slm_stats_total_snapshots_failed_total | counter | 0 | Total snapshots failed
| elasticsearch_slm_stats_total_snapshots_deleted_total | counter | 0 | Total snapshots deleted
| elasticsearch_slm_stats_snapshots_taken_total | counter | 1 | Snapshots taken by policy
| elasticsearch_slm_stats_snapshots_failed_total | counter | 1 | Snapshots failed by policy
| elasticsearch_slm_stats_snapshots_deleted_total | counter | 1 | Snapshots deleted by policy
| elasticsearch_slm_stats_snapshot_deletion_failures_total | counter | 1 | Snapshot deletion failures by policy
| elasticsearch_slm_stats_operation_mode | gauge | 1 | SLM operation mode (running, stopping, stopped)


### Alerts & Recording Rules

2 changes: 1 addition & 1 deletion SECURITY.md
@@ -3,4 +3,4 @@
The Prometheus security policy, including how to report vulnerabilities, can be
found here:

https://prometheus.io/docs/operating/security/
<https://prometheus.io/docs/operating/security/>
109 changes: 109 additions & 0 deletions collector/cluster_info.go
@@ -0,0 +1,109 @@
// Copyright 2022 The Prometheus Authors
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package collector

import (
	"context"
	"encoding/json"
	"io/ioutil"
	"net/http"
	"net/url"

	"github.com/blang/semver"
	"github.com/go-kit/log"
	"github.com/prometheus/client_golang/prometheus"
)

func init() {
	registerCollector("cluster-info", defaultEnabled, NewClusterInfo)
}

type ClusterInfoCollector struct {
	logger log.Logger
	u      *url.URL
	hc     *http.Client
}

func NewClusterInfo(logger log.Logger, u *url.URL, hc *http.Client) (Collector, error) {
	return &ClusterInfoCollector{
		logger: logger,
		u:      u,
		hc:     hc,
	}, nil
}

var clusterInfoDesc = map[string]*prometheus.Desc{
	"version": prometheus.NewDesc(
		prometheus.BuildFQName(namespace, "", "version"),
		"Elasticsearch version information.",
		[]string{
			"cluster",
			"cluster_uuid",
			"build_date",
			"build_hash",
			"version",
			"lucene_version",
		},
		nil,
	),
}

// ClusterInfoResponse is the cluster info retrievable from the / endpoint
type ClusterInfoResponse struct {
	Name        string      `json:"name"`
	ClusterName string      `json:"cluster_name"`
	ClusterUUID string      `json:"cluster_uuid"`
	Version     VersionInfo `json:"version"`
	Tagline     string      `json:"tagline"`
}

// VersionInfo is the version info retrievable from the / endpoint, embedded in ClusterInfoResponse
type VersionInfo struct {
	Number        semver.Version `json:"number"`
	BuildHash     string         `json:"build_hash"`
	BuildDate     string         `json:"build_date"`
	BuildSnapshot bool           `json:"build_snapshot"`
	LuceneVersion semver.Version `json:"lucene_version"`
}

func (c *ClusterInfoCollector) Update(ctx context.Context, ch chan<- prometheus.Metric) error {
	resp, err := c.hc.Get(c.u.String())
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	b, err := ioutil.ReadAll(resp.Body)
	if err != nil {
		return err
	}
	var info ClusterInfoResponse
	err = json.Unmarshal(b, &info)
	if err != nil {
		return err
	}

	ch <- prometheus.MustNewConstMetric(
		clusterInfoDesc["version"],
		prometheus.GaugeValue,
		1,
		info.ClusterName,
		info.ClusterUUID,
		info.Version.BuildDate,
		info.Version.BuildHash,
		info.Version.Number.String(),
		info.Version.LuceneVersion.String(),
	)

	return nil
}
127 changes: 122 additions & 5 deletions collector/collector.go
@@ -16,6 +16,8 @@ package collector

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/url"
	"sync"
@@ -24,10 +26,26 @@ import (
	"github.com/go-kit/log"
	"github.com/go-kit/log/level"
	"github.com/prometheus/client_golang/prometheus"
	"gopkg.in/alecthomas/kingpin.v2"
)

// Namespace defines the common namespace to be used by all metrics.
const namespace = "elasticsearch"
const (
	// Namespace defines the common namespace to be used by all metrics.
	namespace = "elasticsearch"

	defaultEnabled = true
	// defaultDisabled = false
)

type factoryFunc func(logger log.Logger, u *url.URL, hc *http.Client) (Collector, error)

var (
	factories              = make(map[string]factoryFunc)
	initiatedCollectorsMtx = sync.Mutex{}
	initiatedCollectors    = make(map[string]Collector)
	collectorState         = make(map[string]*bool)
	forcedCollectors       = map[string]bool{} // collectors which have been explicitly enabled or disabled
)

var (
	scrapeDurationDesc = prometheus.NewDesc(
@@ -50,16 +68,92 @@ type Collector interface {
	Update(context.Context, chan<- prometheus.Metric) error
}

func registerCollector(name string, isDefaultEnabled bool, createFunc factoryFunc) {
	var helpDefaultState string
	if isDefaultEnabled {
		helpDefaultState = "enabled"
	} else {
		helpDefaultState = "disabled"
	}

	// Create flag for this collector
	flagName := fmt.Sprintf("collector.%s", name)
	flagHelp := fmt.Sprintf("Enable the %s collector (default: %s).", name, helpDefaultState)
	defaultValue := fmt.Sprintf("%v", isDefaultEnabled)

	flag := kingpin.Flag(flagName, flagHelp).Default(defaultValue).Action(collectorFlagAction(name)).Bool()
	collectorState[name] = flag

	// Register the create function for this collector
	factories[name] = createFunc
}

type ElasticsearchCollector struct {
	Collectors map[string]Collector
	logger     log.Logger
	esURL      *url.URL
	httpClient *http.Client
}

type Option func(*ElasticsearchCollector) error

// NewElasticsearchCollector creates a new ElasticsearchCollector
func NewElasticsearchCollector(logger log.Logger, httpClient *http.Client, esURL *url.URL) (*ElasticsearchCollector, error) {
func NewElasticsearchCollector(logger log.Logger, filters []string, options ...Option) (*ElasticsearchCollector, error) {
	e := &ElasticsearchCollector{logger: logger}
	// Apply options to customize the collector
	for _, o := range options {
		if err := o(e); err != nil {
			return nil, err
		}
	}

	f := make(map[string]bool)
	for _, filter := range filters {
		enabled, exist := collectorState[filter]
		if !exist {
			return nil, fmt.Errorf("missing collector: %s", filter)
		}
		if !*enabled {
			return nil, fmt.Errorf("disabled collector: %s", filter)
		}
		f[filter] = true
	}
	collectors := make(map[string]Collector)
	initiatedCollectorsMtx.Lock()
	defer initiatedCollectorsMtx.Unlock()
	for key, enabled := range collectorState {
		if !*enabled || (len(f) > 0 && !f[key]) {
			continue
		}
		if collector, ok := initiatedCollectors[key]; ok {
			collectors[key] = collector
		} else {
			collector, err := factories[key](log.With(logger, "collector", key), e.esURL, e.httpClient)
			if err != nil {
				return nil, err
			}
			collectors[key] = collector
			initiatedCollectors[key] = collector
		}
	}

	e.Collectors = collectors

	return e, nil
}

	return &ElasticsearchCollector{Collectors: collectors, logger: logger}, nil
func WithElasticsearchURL(esURL *url.URL) Option {
	return func(e *ElasticsearchCollector) error {
		e.esURL = esURL
		return nil
	}
}

func WithHTTPClient(hc *http.Client) Option {
	return func(e *ElasticsearchCollector) error {
		e.httpClient = hc
		return nil
	}
}

// Describe implements the prometheus.Collector interface.
@@ -89,7 +183,11 @@ func execute(ctx context.Context, name string, c Collector, ch chan<- prometheus
	var success float64

	if err != nil {
		_ = level.Error(logger).Log("msg", "collector failed", "name", name, "duration_seconds", duration.Seconds(), "err", err)
		if IsNoDataError(err) {
			_ = level.Debug(logger).Log("msg", "collector returned no data", "name", name, "duration_seconds", duration.Seconds(), "err", err)
		} else {
			_ = level.Error(logger).Log("msg", "collector failed", "name", name, "duration_seconds", duration.Seconds(), "err", err)
		}
		success = 0
	} else {
		_ = level.Debug(logger).Log("msg", "collector succeeded", "name", name, "duration_seconds", duration.Seconds())
@@ -98,3 +196,22 @@ func execute(ctx context.Context, name string, c Collector, ch chan<- prometheus
	ch <- prometheus.MustNewConstMetric(scrapeDurationDesc, prometheus.GaugeValue, duration.Seconds(), name)
	ch <- prometheus.MustNewConstMetric(scrapeSuccessDesc, prometheus.GaugeValue, success, name)
}

// collectorFlagAction generates a new action function for the given collector
// to track whether it has been explicitly enabled or disabled from the command line.
// A new action function is needed for each collector flag because the ParseContext
// does not contain information about which flag called the action.
// See: https://github.com/alecthomas/kingpin/issues/294
func collectorFlagAction(collector string) func(ctx *kingpin.ParseContext) error {
	return func(ctx *kingpin.ParseContext) error {
		forcedCollectors[collector] = true
		return nil
	}
}

// ErrNoData indicates the collector found no data to collect, but had no other error.
var ErrNoData = errors.New("collector returned no data")

func IsNoDataError(err error) bool {
	return err == ErrNoData
}