Add EC2, GCE, or DigitalOcean metadata to events #2728
Conversation
```go
go func() { c <- fetchJSON("gce", gceHeaders, gceMetadataURL, gceSchema) }()

var results []result
timeout := time.After(5 * time.Second)
```
I would use a variable like ProviderTimeout instead of using the number 5, in order to make it somehow configurable.
The timeout is now configurable and defaults to 3s. In practice when running on EC2 the request is completed in 2ms.
```go
Timeout: 2 * time.Second,
KeepAlive: 0, // We are only making a single request.
}).Dial,
ResponseHeaderTimeout: 2 * time.Second,
```
@andrewkroh How did you choose 2 here? Should be 2 < 5, right?
I chose 5 to allow the individual requests to either complete or timeout on their own. In the worst case, timeout could take ~4 seconds (2 seconds for connect timeout + 2 seconds for response header timeout). I left a 1 second buffer since it's executing three requests in parallel and they might not all be scheduled immediately.
I changed the timeout implementation to make it configurable, so this code is now different and relies on `http.Client.Timeout`.
@andrewkroh This is a great PR 👍 I only have a few minor comments:
Force-pushed from 0ca4c19 to cd00d89.
I renamed the processor to
Great addition. A few thoughts:
- I would suggest we either split this one processor up into 4 processors or make the type configurable. We could have a processor for each cloud type, plus one which does the auto-detection. Currently, as far as I understand, it checks all 3 options every time? We could also have a config option inside the processor which defines the type.
- Is there a specific reason for the 3s default timeout? If not, I suggest we use the same default as we use for other "metrics" in Metricbeat, which is 10s.
- Namespace: This processor reserves the namespace `cloud`. I'm kind of worried that this could conflict with other things like a potential `cloud` module in Metricbeat or some data in other Beats. As we will face the same problem with other processors, we could use a namespace where we add meta information in general, to prevent similar future conflicts. This namespace could be `meta`. The same namespace could be used in Filebeat for data added by readers, as we face a similar issue there: Add line number counter for multiline #2279. In addition, we could allow processors to define which field the data should be added to.
- This definitely needs a CHANGELOG.md entry :-)
- Do you plan to add docs for this in another PR?
```yaml
description: >
  Name of the cloud provider. Possible values are ec2, gce, or digitalocean.

- name: cloud.instance_id
```
We should use `cloud.instance.id` and `cloud.instance.type` here to follow our naming schema.
After further thought, I changed instance_type to machine_type (following GCE's naming). So I don't think we need to make it `instance.id` now.
```go
digitaloceanMetadataURL = "http://" + digitaloceanMetadataHost + digitaloceanMetadataURI
digitaloceanSchema = s.Schema{
	"instance_id": c.StrFromNum("droplet_id"),
```
`instance.id`
```go
if instance, ok := m["instance"].(map[string]interface{}); ok {
	s.Schema{
		"instance_id": c.StrFromNum("id"),
```
See above for the naming.
```
@@ -4,6 +4,8 @@ import (
	"testing"
	"time"

	"encoding/json"
```
remove newline on top
Fixed.
```yaml
# provider about the host machine. It works on EC2, GCE, and DigitalOcean.
#
#processors:
#-add_cloud_metadata:
```
space after -
Added.
I agree with @ruflin and have either a config option under
This PR introduces some kind of lookup processor, as the x-exec-lookup branch does. For this we introduced some namespacing on filters, e.g.
@monicasarbu @ruflin thanks for reviewing.
This processor runs once at startup and auto-detects the cloud provider using three HTTP requests executed in parallel. If the Beat is running in the cloud then it can usually reach a disposition in ~2ms. If it's not running in the cloud then these requests will usually time out because there is no route to the metadata services, which run on a special link-local IP address. I don't think there is a need to increase the timeout. It should be able to reliably reach a disposition within the 3s window (probably even shorter would be fine). This would allow you to add the processor to all of your deployments, whether they be on-prem or in the cloud, without much of a penalty.
I really don't think this is necessary. Can we put this into master without this feature, let it get used a bit, then see if it is necessary?
This can definitely be a problem. After looking at the exec lookup feature I think I will default this to writing the data under
I'll write the docs in a second PR after the code and behavior is finalized.
@andrewkroh Thanks for the details.
That is the part I missed. I confused the timeout with how often it runs; I thought it updated the metadata every 3s. This also answers the second part about the config options. If it is only run once, there is very low overhead to running all 3.
I don't think we should mix manually added data from the user and machine-generated data. That is why I would prefer NOT to put it there.

For me the only blocker to discuss for this PR is the namespace where the data will be written to.
Force-pushed from cd00d89 to 97b7db1.
I pushed a change to rename the processor to `add_cloud_metadata`.
Force-pushed from 259afb3 to b48acfa.
This PR has been updated based on our discussions.
LGTM. I think we should also add this change to the CHANGELOG.
This introduces a new processor called `add_cloud_metadata` that detects the hosting provider, caches instance metadata, and enriches each event with the data. There is one configuration option named `timeout` that configures the maximum amount of time metadata fetching can run for. The default value is 3s. Sample config:

```
processors:
- add_cloud_metadata:
    #timeout: 3s
```
Force-pushed from b48acfa to 2918621.
Added this to the CHANGELOG.
I'm not sure I agree with the chosen namespaces here. Why use one namespace over another? Why choose this particular one? Considering more processors being added in the future, plus having some more options to add custom fields in Beats (e.g. Filebeat prospector, Packetbeat modules, processors), I'd opt for some general guidelines regarding namespacing here. Not saying one naming is better or worse than another, but mostly striving for consistency and some general agreement here. Instead of using
We've discussed these things on Wednesday and the discussion was about to go long (like it usually does on things like this), so we decided to go with one of the options, knowing that we still have time to change it before this sees the light of day in 5.1. So let's continue the discussion, although this PR is probably not the best place for it.
Sample data from the providers: