Add EC2, GCE, or DigitalOcean metadata to events #2728
Conversation
```go
go func() { c <- fetchJSON("gce", gceHeaders, gceMetadataURL, gceSchema) }()

var results []result
timeout := time.After(5 * time.Second)
```
I would use a variable like ProviderTimeout instead of using the number 5, in order to make it somehow configurable.
The timeout is now configurable and defaults to 3s. In practice when running on EC2 the request is completed in 2ms.
```go
Timeout: 2 * time.Second,
KeepAlive: 0, // We are only making a single request.
}).Dial,
ResponseHeaderTimeout: 2 * time.Second,
```
@andrewkroh How did you choose 2 here? Should be 2 < 5, right?
I chose 5 to allow the individual requests to either complete or timeout on their own. In the worst case, timeout could take ~4 seconds (2 seconds for connect timeout + 2 seconds for response header timeout). I left a 1 second buffer since it's executing three requests in parallel and they might not all be scheduled immediately.
I changed the timeout implementation to make it configurable, so this code is now different and relies on `http.Client.Timeout`.
@andrewkroh This is a great PR 👍 I only have a few minor comments:
Force-pushed from 0ca4c19 to cd00d89.
I renamed the processor to
Great addition. A few thoughts:
- I would suggest we either split this one processor up into 4 processors or make the type configurable. We could have a processor for each cloud type, plus one which does the auto-detection. Currently, as far as I understand, it checks all 3 options every time? We could also have a config option inside the processor which defines the type.
- Is there a specific reason for the 3s default timeout? If not, I suggest we use the same default as we use for other "metrics" in Metricbeat, which is 10s.
- Namespace: This processor reserves the namespace `cloud`. I'm kind of worried that this could conflict with other things like a potential `cloud` module in Metricbeat or some data in other Beats. As we will face the same problem with other processors, we could use a namespace where we add meta information in general, to prevent similar future conflicts. This namespace could be `meta`. The same namespace could be used in Filebeat for data added by readers, as we face a similar issue there: Add line number counter for multiline #2279. In addition, we could allow processors to define which field the data should be added to.
- This definitely needs a CHANGELOG.md entry :-)
- Do you plan to add docs for this in another PR?
```yaml
description: >
  Name of the cloud provider. Possible values are ec2, gce, or digitalocean.

- name: cloud.instance_id
```
We should use `cloud.instance.id` and `cloud.instance.type` here to follow our naming schema.
After further thought, I changed instance_type to machine_type (following GCE's naming). So I don't think we need to make it `instance.id` now.
```go
digitaloceanMetadataURL = "http://" + digitaloceanMetadataHost + digitaloceanMetadataURI
digitaloceanSchema = s.Schema{
	"instance_id": c.StrFromNum("droplet_id"),
```
`instance.id`
```go
if instance, ok := m["instance"].(map[string]interface{}); ok {
	s.Schema{
		"instance_id": c.StrFromNum("id"),
```
See above for the naming.
```
@@ -4,6 +4,8 @@ import (
	"testing"
	"time"

	"encoding/json"
```
remove newline on top
Fixed.
```yaml
# provider about the host machine. It works on EC2, GCE, and DigitalOcean.
#
#processors:
#-add_cloud_metadata:
```
space after -
Added.
I agree with @ruflin and have either a config option under
This PR introduces some kind of lookup processor, as the x-exec-lookup branch does. For this we introduced some namespacing on filters, e.g.
@monicasarbu @ruflin thanks for reviewing.
This processor runs once at startup and auto-detects the cloud provider using three HTTP requests executed in parallel. If the Beat is running in the cloud then it can usually reach a disposition in ~2ms. If it's not running in the cloud then these requests will usually time out because there is no route to the metadata services, which run on a special link-local IP address. I don't think there is a need to increase the timeout. It should be able to reliably reach a disposition within the 3s window (probably even shorter would be fine). This would allow you to add the processor to all of your deployments, whether they be on-prem or in the cloud, without much of a penalty.
I really don't think this is necessary. Can we put this into master without this feature, let it get used a bit, then see if it is necessary?
This can definitely be a problem. After looking at the exec lookup feature I think I will default this to writing the data under
I'll write the docs in a second PR after the code and behavior is finalized.
@andrewkroh Thanks for the details.
That is the part I missed. I confused the timeout with how often it runs; I thought it updated the metadata every 3s. This also answers the second part about the config options. If it is only run once, there is very low overhead to running all 3.
I don't think we should mix manually added data from the user and machine-generated data. That is why I would prefer NOT to put it there.

For me the only blocker to discuss for this PR is the namespace where the data will be written to.
Force-pushed from cd00d89 to 97b7db1.
I pushed a change to rename the processor to `add_cloud_metadata`.
Force-pushed from 259afb3 to b48acfa.
This PR has been updated based on our discussions.
LGTM. I think we should also add this change to the CHANGELOG.
This introduces a new processor called `add_cloud_metadata` that detects the hosting provider, caches instance metadata, and enriches each event with the data. There is one configuration option named `timeout` that configures the maximum amount of time metadata fetching can run for. The default value is 3s. Sample config:

```
processors:
- add_cloud_metadata:
    #timeout: 3s
```
Force-pushed from b48acfa to 2918621.
Added this to the CHANGELOG.
I'm not sure I agree with the chosen namespaces here. Why use one namespace over another? Why choose this particular one? Considering more processors being added in the future, plus having some more options to add custom fields in Beats (e.g. Filebeat prospector, Packetbeat modules, processors), I'd opt for some general guidelines regarding namespacing here. Not saying one naming is better or worse than another, but mostly striving for consistency and some general agreement here. Instead of using
We've discussed these things on Wednesday and the discussion was about to go long (like it usually does on things like this), so we decided to go with one of the options, knowing that we still have time to change it before this sees the light of day in 5.1. So let's continue the discussion, although this PR is probably not the best place for it.
Sample data from the providers: