Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Auto creation of template #639

Merged
merged 2 commits into from
Jan 28, 2016
Merged

Auto creation of template #639

merged 2 commits into from
Jan 28, 2016

Conversation

ruflin
Copy link
Member

@ruflin ruflin commented Jan 6, 2016

Problem

Each beat loads structured documents into elasticsearch. To make sure every document has the correct mapping, before starting a beat the predefined mapping should be loaded. Currently this is a manual step and is often forgotten. This can lead to problems as elasticsearch automatically assumes types. A mapping can't be changed anymore at a later stage.

See also https://github.com/elastic/libbeat/issues/62

Versioning

An additional problem is, that the templates can change with the different versions of a beat. This means having data of 2 different beat versions can lead to problems.

Logstash Template

Logstash provides its own template to elasticsearch as part of the logstash-output-elasticsearch. This is a generic template and does not necessarly cover all cases from the beats. The template applies to all indices starting with logstash-, by default the beats templates apply to all indices starting with beatname-

Proposed solution

To not have to implement all possibilities with the first version and better understand on how the feature will be used, I suggest to split it up in two phases:

Phase 1 - Manual phase / opt in

Phase 1 will be fully backward compatible. It allows the user to load the template if he configures it so. By default, the template will not be loaded.

  • Manually configure template in config file
  • No default behaviour
  • Getting started guide on how to do it

Phase 2 - Automated / versioning

Phase 2 always loads the template on startup and adds versioning for each template, so no conflicts between the different template versions happen.

  • Automatic loading of templates
  • Versioning of templates
  • Disabling by user needed if does not want to have behaviour

Configuration

The configuration will look as following. It is part of the elasticsearch output as a direct connection to elasticsearch is needed to apply the template. People using Logstash for example for filebeat must apply the manually or use the logstasth-elasticsearch-output.

output:
  elasticsearch:

     # A template is used ot set the mapping in elasticsearch
     # By default in phase 1, if this is commented out, no template is loaded
     # These settings can be adjusted to load your own template or overwrite existing ones
     template:

     # Template name. By default the template name is the same as the beatname
     name: "beatname"

     # Path to template file
     # TODO: make this platform specific?
     path:

     # Overwrite existing template
     overwrite: false

This configuration is taken from Logstash: https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html#plugins-outputs-elasticsearch-manage_template

Versioning

To support the different version of templates, each template must be versioned. In the first version, by default templates would not be applied. In a second version when versioning of templates is in place, templates should be loaded automatically. Versioning is required for the automated behaviour as otherwise this would lead very soon to conflicts. With versioning in place, it is possible to keep using older beats running and writing to indices with the old mapping and having a newer beat running at the same time, writing in new indices with the new mapping.

To solve this problem, it is probably not only required to version the templates but also the indices, so the correct templates can be applied. It should be checked with Kibana / Marvel / Logstash on how they are solving this problem.

Logstash

Logstash has with the elasticsearch-logstash-output its own plugin which already applies a template to elasticsearch. This template is a general template and is intended for all indices starting with logstash-. Especially filebeat is normally connected to LS and sends data to ES over the output plugin. Depending on the LS configuration, either the index pattern logstash- or filebeat- is used. The question is which mapping should be used when sending data over LS.

Option 1

  • Only use default logstash mapping
  • Use the logstash- index
  • Add filebeat specific offset field to ls mapping

The disadvantage of option 1 is, that it only applies to filebeat and not the other beats. The assumptionis made, that the other beats send data directly to ES.

 Option 2

  • Apply Filebeat Mapping manually
  • Configure filebeat- index pattern in LS

The main disadvantage here is that the mapping has to be applied manually. People updating a beat will probably not apply the updated (versioned) mapping. The advantage is that it works with all beats.

Suggestion

I would suggest to go with option 2, as filebeat has one of the simplest patterns of all beats, means there will be very rare BC breaks and in the long term, also nodeingest could be used for filebeat.

 Note

Similar to logstash, there should be a generic base pattern provided by libbeat which applies to all beats.


func (beat *Beat) loadTemplate() error {

// TODO: Fin a way to check if flag was even set. If it was set but no path, default path should be used.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't that just template != nil?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that doesn't work for string, ints etc. because of the default which is a string "". I don't think I can set nil as default value?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right, I was hoping it sets the pointer to nil if the flag is not used, but that doesn't seem to be the case. Maybe we'd be better with having the setting in the configuration file? I find that more friendly to packaging, config management, etc.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea. In general I also prefer to all the settings in the config file. The only "issue" here is that this should be done only once. But as it does not have any affect if it is applied multiple times, this should also not be an issue.

As this config belongs to the elasticsearch output, I suggest to also place it there. Something like

output:
  elasticsearch:
    #template: path

If template is not configure, default path is taken. If template is uncommented an no path is provided, template will not be loaded.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tsg One thing that came to my mind is that the template will be loaded multiple times in most cases anyways, as there are multiple beats on different machines ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andrewkroh That's a good idea. I actually have to check if this could be done with one call. As in case lets say 20 beats start up and every beat sends the template, it is quite likely that between a HEAD request and sending the template, and other one already did it.

Also the idea came up to use versioning for the templates.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I get it right, this is also what LS does: https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html#plugins-outputs-elasticsearch-manage_template We might want to check the source code to make sure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if the beats that are sending data to the same Elasticsearch have different versions and different templates? If only one beat is loading the template, it might be that an older beat is expecting an older template.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have to make sure all the beats are running the same version. Otherwise raise an exception.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@monicasarbu The idea here was to version the templates. But I still need to create our concept here.

@ruflin
Copy link
Member Author

ruflin commented Jan 12, 2016

I had a look at the template for logstash-output-elasticsearch and there would be two things which would have to be added:

  • offset type long
  • support for also filebeat-* templates

Currently it is assume, that all templates start with logstash-*. We could recommend people who use filebeat with logstash to use the ls index to solve this issue as I think it is not possible to set 2 template patterns.

reader := bytes.NewReader(content)


client.LoadTemplate(config.Template.Name, reader)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of just forwarding as is, do we want to do some validation?

@tsg
Copy link
Contributor

tsg commented Jan 12, 2016

@ruflin I was thinking we'll just tell people to configure the Beats templates in the logstash-elasticsearch-output. When using the beats with logstash, you need a pretty special logstash configuration anyway.

@ruflin
Copy link
Member Author

ruflin commented Jan 12, 2016

@tsg If we go this route it would probably make sense to move the more generic stuff from the logstash template also into the filebeat template. I assume people using LS will do some transformations etc., means having fields which are not part of our template yet.

@@ -40,7 +40,22 @@ output:
# Scheme and port can be left out and will be set to the default (http and 9200)
# In case you specify and additional path, the scheme is required: http://localhost:9200/path
# IPv6 addresses should always be defined as: https://[2001:db8::1]:9200
hosts: ["localhost:9200"]
hosts: ["192.168.99.100:9200"]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change will be removed again. Only for testing.

@tsg
Copy link
Contributor

tsg commented Jan 12, 2016

@ruflin right, we might want to integrate pretty much all of LS template into Filebeat's template, also in preparation for the ingest node. This would be worth a new discuss ticket?

logp.Debug("test", "Test: %v", config.Template)

// Check if template already exist or should be overwritten
if !esClient.CheckTemplate(config.Template.Name) && config.Template.Overwrite {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a race condition with CheckTemplate and LoadTemplate when using multiple Beats (potentially of different versions) but I don't know of any way to work-around it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think there is a potential conflict. For different versions the solution is probably to version the templates and indices. I recently updated the issue and add a note about the version problem: #639 The initial solution would be to not make it automatic.

@tsg
Copy link
Contributor

tsg commented Jan 13, 2016

Regarding the versioning issue, it sounds to me like the only upgrade-safe way to do it is to encode the "schema" version in the index names, like Marvel does.

I wonder if in Phase 1 we shouldn't load the template by default with overwrite: false. This has the benefit of simplifying the first experience, while still being safe enough against user provided templates.

Regarding the Logstash setup, how about option 3, Filebeat manages the template directly with Elasticsearch. Architecture wise, we said before that it's fine if in a *beat -> Logstash -> Elasticsearch, *beat queries Elasticsearch directly for configuration, topology information, own metrics, etc. I see the template management similar to that. I see a challenge with this option is how to organize the configuration file.

@ruflin
Copy link
Member Author

ruflin commented Jan 13, 2016

I think it is a good idea to go in Phase with overwrite: false but having a warning / error message in log.

The challenge with option 3 is as you mentioned the config file as it cannot depend on the elasticsearch output as it would then also send events to it. But as we have to think about this anyways with configuration and metrics lets add this to it. I like the idea and will check if I can come up with a good solution.

@ruflin ruflin added the in progress Pull request is currently in progress. label Jan 25, 2016
@ruflin
Copy link
Member Author

ruflin commented Jan 25, 2016

We have decided to go with Phase 1 first and overwrite: false by default. PR will be updated.

@ruflin ruflin force-pushed the load-template branch 2 times, most recently from 689f236 to 37dcc78 Compare January 27, 2016 08:40
@ruflin ruflin changed the title Prototype of auto creation of template. Auto creation of template. Jan 27, 2016
@ruflin ruflin changed the title Auto creation of template. Auto creation of template Jan 27, 2016
@ruflin
Copy link
Member Author

ruflin commented Jan 27, 2016

For changing the documentation I created a follow up issue here: #862

@ruflin ruflin added review v2.0.0 libbeat and removed in progress Pull request is currently in progress. labels Jan 27, 2016
@@ -58,6 +58,7 @@ https://github.com/elastic/beats/compare/v1.1.0...master[Check the HEAD diff]
- Update builds to Golang version 1.5.3
- Add ability to override configuration settings using environment variables {issue}114[114]
- Libbeat now always exits through a single exit method for proper cleanup and control {pull}736[736]
- Possibility to create elasticsearch mapping on startup {pull}639[639]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest changing to: Add ability to create Elasticsearch mapping on startup

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

@ruflin
Copy link
Member Author

ruflin commented Jan 28, 2016

@dedemorton Doc adjustments done.

@@ -151,6 +151,9 @@ filebeat:
# Event count spool threshold - forces network flush if exceeded
#spool_size: 2048

# Enable async publisher pipeline in filebeat (Experimental!)
#publish_async: false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this option here by mistake?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was automatically generated, so I think it went missing in one of the previous updates.

@monicasarbu
Copy link
Contributor

LGTM.

monicasarbu added a commit that referenced this pull request Jan 28, 2016
@monicasarbu monicasarbu merged commit 2e9b161 into elastic:master Jan 28, 2016
@andrewkroh
Copy link
Member

LGTM. The organization of a few of the imports don't match up with how goimports organizes them.

@ruflin
Copy link
Member Author

ruflin commented Jan 29, 2016

@andrewkroh Will clean that up in a later PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants