Add Elasticsearch 5.x output #2332
Conversation
# username = "telegraf"
# password = "mypassword"

# Index Config
when is this evaluated? only when telegraf starts? or does each metric "batch" under the same timestamp get added to its own index? (add that to the doc)
Well, the evaluation is currently done per metric inside the batch... there is a comment in the code about that.
But is it always the case that all metrics in a batch have the same timestamp? If so, I will update this to do the evaluation once per batch only, and add a note to the doc.
nope, that is definitely not guaranteed, please leave it as-is!
but also please document that the index is set per-metric in the README and sample config
Ok, I will send it later today
I've added additional information, let me know if it is enough.
# %H - hour (00..23)
index_name = "telegraf-%Y.%m.%d" # required.

## Template Config
can you provide more information on the template management? it doesn't have to be here, maybe in the README would be sufficient. Links to the Elasticsearch documentation would be extra good.
Yes, template management is about creating a proper template for the index name provided. I will add more info in the doc. These config option names are inspired by similar logstash config options, which should be familiar to anyone who already uses ES.
// warn about ES version
if i, err := strconv.Atoi(strings.Split(esVersion, ".")[0]); err == nil {
    if i < 5 {
        log.Println("W! Elasticsearch version not supported: " + esVersion)
should this be an error? should we continue with the plugin or just return here?
Not sure here, I didn't test enough to really know it is not going to work on older versions. Maybe someone wants to take the risk, hence the warning. But I agree to return if you think it is safer.
I'd prefer to return and not bother trying to support older versions of elastic
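The stricter behavior preferred above could look something like this (a sketch with a hypothetical function name, not the PR's actual code): return an error for pre-5.x clusters so the connect step can abort, instead of only logging a warning.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// checkVersion returns an error when the cluster's major version is
// below 5, so the caller can refuse to start instead of merely warning.
func checkVersion(esVersion string) error {
	major, err := strconv.Atoi(strings.Split(esVersion, ".")[0])
	if err != nil {
		return fmt.Errorf("could not determine Elasticsearch version from %q: %v", esVersion, err)
	}
	if major < 5 {
		return fmt.Errorf("elasticsearch version %s not supported, 5.x or later required", esVersion)
	}
	return nil
}

func main() {
	fmt.Println(checkVersion("2.4.1")) // non-nil error
	fmt.Println(checkVersion("5.1.2")) // <nil>
}
```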
@lpic10 Forgive an ES noob if this doesn't make sense, but should we be taking advantage of elasticsearch types? should we have a configurable
I thought about keeping a different (update)
Just a couple of things I noticed here. In the documentation example you give the field [tag]. In elasticsearch, the [tags] field is an array for adding tags. I think this is kind of confusing, considering I think of tags in elasticsearch as an array of strings and here you are providing key-value pairs. Perhaps a better name for disambiguation? Perhaps putting it in an object called "[telegraf]"? (elasticsearch arrays (tags))

Another convention in any elasticsearch output for log shippers is the ability to add arbitrary fields in the config, so the user can specify [type] => "flying_goats_counter", "[environment]" => "Production", "[host]" => "telegraf_hostname".

Great work.
Hi @berglh, thanks for the interest, hope you find this plugin useful.

> In elasticsearch, the [tags] field is an array for adding tags.

Actually, the tags array comes from logstash. I don't see this being an issue because logstash and telegraf will use different indexes. As you mentioned, the telegraf tags model is a key-value structure.

> Perhaps a better name for disambiguity? Perhaps putting it in an object called "[telegraf]"?

I considered other names and I chose the tag name for its simplicity. It already says what is inside (a tag) and it is short enough to not annoy me when creating a query, e.g. a "host" template in grafana: {"find": "terms", "field": "tag.host"}. If I set some global tags on telegraf, let's say "cluster": "ha-cluster1" and "dc": "west", I think it makes sense to have a query like tag.cluster: ha-cluster1 AND tag.dc: west. If you replace tag with something else I think it gets more confusing. I can make the name of this tag object configurable if needed, but I don't think it would be a good idea because it would make it more difficult to share grafana dashboards, as the queries would differ.

> Another convention in any elasticsearch output for log shippers is to have the ability to add arbitrary fields in the config.

In telegraf you can add tags in the [global_tags] config declaration. In the example above, this would be:

[global_tags]
dc = "west"
cluster = "ha-cluster1"

> I guess another consideration is that some people might want to actually send their metrics via logstash for further enrichment

Probably this is already possible, as logstash supports a graylog input: https://www.elastic.co/guide/en/logstash/current/plugins-inputs-gelf.html, or it can read from a telegraf file output.
Thanks for the verbose reply and clarity on add fields; makes sense from an influxdata perspective. Looking forward to this feature, cheers.
questions about the template
if (a.OverwriteTemplate) || (!templateExists) {
    // Create or update the template
    tmpl := fmt.Sprintf(`
    { "template":"%s*",
should you include in the documentation what this template is? how do you know that this template will work for everyone? should the template be configurable?
The template is kind of optional; the plugin will work without it, and ES would detect most of the types correctly. Even if not optimal, it could still work for querying and graphing the data.
One of the drawbacks of not using this template is having the tags analyzed by ES. In that case, using the default mapping from ES, instead of using tag.host one would have to use tag.host.keyword to access the non-analyzed field.
The idea is that the template provided will have sane defaults/types. Everything can be overridden by another template if the ES admin wants, or this template can be updated manually.
Maybe I can add in the doc that it is possible to check the template created by issuing curl http://localhost:9200/_template/telegraf?pretty (considering the template name is telegraf).
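As a rough illustration of the kind of rule involved (a hedged sketch, not the plugin's actual template): an ES 5.x index template can use a dynamic template entry to map every tag.* field as a non-analyzed keyword, which is what makes tag.host usable directly instead of tag.host.keyword.

```json
{
  "template": "telegraf-*",
  "mappings": {
    "metrics": {
      "dynamic_templates": [
        {
          "tags": {
            "path_match": "tag.*",
            "mapping": { "type": "keyword" }
          }
        }
      ]
    }
  }
}
```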
BTW I need to check a few additional things about the mapping of the number fields. I saw today some fields being mapped as float and others as long, and I need to check if this is correct and/or causes any trouble.
I've made a few updates to the template and added more information in the README.md file.
```json
{
  "@timestamp": "2017-01-01T00:00:00+00:00",
  "input_plugin": "cpu",
```
I'm not super familiar with ES best practices, but this seems a bit redundant to me. Why put "input_plugin": "cpu" in the top-level of the metric if you already have the name of the plugin in "cpu": ...?
That's to make it possible/easier to query/filter for metrics from a particular input. It is not so easy/convenient to query for field names in ES. It can be done by issuing a terms query on _field_names, but I don't know how to do this in grafana/kibana for example (or even if it is possible to do).
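For the record, such a query on _field_names might look like this in the ES query DSL (the field name is purely illustrative):

```json
{
  "query": {
    "terms": { "_field_names": ["system.load1"] }
  }
}
```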
In that case, shouldn't we put a "measurement_name" field in the ES metric? seems like this is more useful than the plugin name. There are many plugins that write more than one measurement name.
Yes, makes sense. Actually I think the current "input_plugin" is already the measurement name; it comes from metric.Name(). I will change the name of the field.
Is this branch ready for testing? In the docs and the output plugins list, Elasticsearch is not visible. What is the correct link to download and build from source?
Hi @ajaybhatnagar, this is working; feel free to test by either cloning and building from the branch I'm pulling from, or by downloading this PR as a patch and applying it on top of master. But be aware things may break, as changes will happen until this is merged.
I have some pending optimizations for the dynamic template, a bugfix for the index name, and a bit more information about the bulk request when there are indexing errors. I should update this PR soon.
Is there another way to inject data into Elasticsearch in an older version? (I am using ES 2.4, as the current Spring Boot data version supports it.)
@fabMrc I have no plans to work on ES 2.x support; possibly there are ways to inject telegraf data into Elasticsearch using logstash or something like that.
Oh, sorry, my mistake. Yes, it works if I manually set the field mapping to "double". Still, the dynamic mapping does not work, so for a newly created index I need to insert a document with a smaller value first for ES to create the index mapping with that field set to "double".
@lpic10 what do you think is the best solution for handling dynamic mapping of large values? It's such an edge case that I think it's OK to truncate the values, but I'll defer to what you think is best.
I think these links should be helpful in deciding the approach, by using dynamic template custom rules.
I decided to keep the mappings with "float" instead of "double" for now, and to truncate values that are too big or too small before sending them to ES.
switch x := v.(type) {
// Truncate values too big/small for Elasticsearch to not complain when creating a dynamic field mapping
case float64:
    if (x > 0 && x > math.MaxFloat32) || (x < 0 && x < math.SmallestNonzeroFloat32) {
the 0 comparison is redundant, and MaxFloat32 is larger than the maximum "long" value in ES. You need to be comparing against math.MaxInt64 and math.MinInt64. And you should also be checking float32 fields.
The 0 comparison I added because I saw truncation occurring when values were 0 (or -0, not sure); I will double check this later today.
Isn't "float32" in go the same as "float" in ES? It seems ES can handle these fine. I will check again whether ES dynamic field mapping blows up with MaxFloat32 and SmallestNonzeroFloat32 (and maybe add a new test).
this part of the error message suggests that int64 is the limit?: "reason": "Numeric value (-9223372036854776000) out of range of long (-9223372036854775808 - 9223372036854775807)"
That happened when I mistakenly changed the template to map all fields as "long".
I just noticed that comparison is complete nonsense; I will try to check this out later.
I couldn't figure out yet exactly what the issue with this dynamic field mapping is, but it seems to happen only when sending large int values, probably because of the way these are sent to ES via JSON. For example, a MaxFloat64 can be sent to ES without issues, possibly because it is sent in scientific notation in the JSON. On the ES side, the dynamic field detection seems to be done by the Jackson JSON parser, and AFAICT it relies on Java native objects for that, but I couldn't go much further on it.
log.Printf("W! Elasticsearch output metric %s truncated (value %v is too big or too small, truncating to %v)", k, x, v)
}
switch {
case v.(float64) > math.MaxInt64:
does this work if v is a float32? Not sure I understand how this compiles, since it's comparing a float64 to an int64?
case v.(float64) < math.MinInt64:
    v = math.MinInt64
default:
    v = float32(v.(float64))
casting everything to float32?
So are we not doing this anymore?
Yes, I want to, but I will have to understand better what is causing the mapping issue in the first place. From what I saw, it is not exactly the size (in bytes) of the field sent, but maybe its representation in the JSON.
@lpic10 what's the final conclusion on handling large numbers? Should we leave it as-is and make a note of it? Should we filter out fields with numbers that are too small or too large?
@sparrc for this issue with large integers, I've added a note in the README. It is still not clear how/if it should be fixed. It is a combination of issues between Go encoding, Jackson on the Java side, and ES trying to figure out a data type.
I am seeing possibly another mapping-related issue: system load data needs to be stored as float, but it seems to be mapped as long and cannot be converted. Should I create another ticket for it?

[2017-03-20T15:27:00,628][DEBUG][o.e.a.b.TransportShardBulkAction] [esdbmonm01] [es5dbmon_200317][0] failed to execute bulk item (index) index {[es5dbmon_200317][metrics][AVrsVB8zSoM5CnqaP7FL], source[{"@timestamp":"2017-03-20T15:27:00Z","measurement_name":"system","system":{"load1":0.08,"load15":0.02,"load5":0.07,"n_cpus":8,"n_users":0},"tag":{"host":"esdbmonm04"}}]}
@ajaybhatnagar it is possible that you are using an old version of the template, which could cause this issue. You can try to set the option to overwrite the template.
Thanks, the template overwrite fixed it.
I think this PR looks more or less good to go, @danielnelson do you want to give it a final review?
Is it released?
@fabMrc it'll be in 1.3, see https://github.com/influxdata/telegraf/blob/master/CHANGELOG.md
Ok nice! Soon or later?
This PR is based on #1875 with several changes:
Comments/reviews are welcome
Required for all PRs: