Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Filebeat kubernetes module. {issue}10812[10812] #10912

Closed
wants to merge 5 commits into from
Closed

Add Filebeat kubernetes module. {issue}10812[10812] #10912

wants to merge 5 commits into from

Conversation

alakahakai
Copy link

@alakahakai alakahakai commented Feb 22, 2019

Add Filebeat kubernetes module. #10812. Closed old PR #10844.

@alakahakai alakahakai requested review from a team as code owners February 22, 2019 19:28
@alakahakai alakahakai requested review from andrewkroh and tsg February 22, 2019 19:28
@elasticmachine
Copy link
Collaborator

Pinging @elastic/secops

Copy link
Member

@jsoriano jsoriano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @alakahakai for this. I think that supporting envoy and coredns is going to be a great addition.

But I think that they should be two different modules, and not linked to kubernetes. Even if both envoy and coredns are services that are usually deployed on kubernetes they can also be run with other orchestrators or even as standalone services.
Filebeat modules should focus on supporting specific services, no matter how they are deployed.

coredns:
enabled: true
envoy:
enabled: true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think they should be two modules, one for coredns and another one for envoy.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the suggestion. Let me see whether doing these two modules separately will allow them to work in kubernetes deployments. Some additional work might still be needed.

description: >
Fields from coredns logs after normalization
fields:
- name: duration
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add descriptions for fields in this file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check out how I handled event.duration across very different sources in #10274. I think that's a pattern we can likely reuse broadly. This will reduce compilations a lot across the board, if we can always use the same snippet.

type: integer

- name: response_flags
type: keyword
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use ECS fields for http requests and responses https://github.com/elastic/ecs#-http-fields

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Jaime's comment here and below. I've added a few more as well.

Is the intent of this module to preserve the original information under envoy.*, and only copy out to the ECS fields?

In most places we've simply moved all of the data to the ECS field directly, and not defined the field for the source naming. Of course you still need to define all of the fields that don't actually map to ECS.

- name: response_flags
type: keyword

- name: bytes_received
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use http.request.bytes for this?

- name: bytes_sent
type: long

- name: duration
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

field: "message"
target_prefix: "coredns"
- drop_fields:
fields: [time]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be better to place all processing in the ingest pipeline.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just like for Zeek, I'm carrying the fight over here ;-)

I like the idea of splitting out the logs here. It lets people use Beat processors that are inherently agent-side. There's no PID in this specific log, but one may want to do reverse DNS on the IPs.

I would directly name the fields following ECS here.

"%{timestamp} [%{event.level}] %{client.ip}:%{client.port} ...

And make sure to copy client.* to source.*. We're trying to always have source.* as a baseline in ECS.

target_prefix: "envoy"
- drop_fields:
fields: [time]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be better to do all processing in the ingest pipeline.

labels:
k8s-app: ingest
---

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are already deploy files in deploy/kubernetes, could this be added there?

var:
- name: paths
default:
- /var/lib/docker/containers/*/*-json.log
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will take logs from all containers running in the system, we probably want to set here some safe default, or nothing at all so setting something is required.
Take into account that if this module is used with kubernetes or docker autodiscover this path will be easily set to the one of the specific envoy container.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see file ingest-k8s.yaml and pull 10911 for addressing this. Note that the file names are using some generated 64-byte id here hence hard to specify without auto-discovery,.

Copy link
Contributor

@webmat webmat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless specifically desired, I think we should as much as possible use the ECS fields directly, without defining the source fields that are mapped cleanly to ECS. Of course all of the fields that don't map should be defined, as you're doing.

Noted a few other minor things.

@@ -244,6 +244,7 @@ https://github.com/elastic/beats/compare/v7.0.0-alpha2...master[Check the HEAD d
- Add ISO8601 timestamp support in syslog metricset. {issue}8716[8716] {pull}10736[10736]
- Add more info to message logged when a duplicated symlink file is found {pull}10845[10845]
- Add Netflow module to enrich flow events with geoip data. {pull}10877[10877]
- Add Filebeat kubernetes module. {issue}10812[10812]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would link the Changelog to this PR which will have all of the action, not the original issue.

- name: upstream_service_time
type: double

- name: forwarded_for
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing for xff in ECS yet

- name: request_id
type: keyword

- name: authority
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is an RFC Authority (name or IP + port), this can go in .address

- name: forwarded_for
type: keyword

- name: http_user_agent
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ECS user_agent.original

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, all the related ECS fields are added via the beats or ingest pipeline processors. The original fields are kept for reference. I guess I should remove these original fields instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. The team has been going back and forth on this.

For all plaintext logs, the cutover has been pretty straightforward and drastic. We parse right into the ECS fields as much as we can, and add custom fields for everything not covered.

For Suricata, we started with the assumption that some pre-existing content or community experience existed, based on the exact Suricata field names & so on, because of the existence of their EVE JSON log format. I'm not sure if that's the case or not.

But since 7.0, we've started "hollowing out" the suricata.eve.* namespace, and replacing all of the fields that map to ECS with field aliases anyway (examples).

So I think we should do the same here. Especially since this is a brand new module. Let's not create legacy we'll have to clean up in 8.0 ;-)

- name: upstream_port
type: integer

- name: message
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ECS message

field: "message"
target_prefix: "coredns"
- drop_fields:
fields: [time]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just like for Zeek, I'm carrying the fight over here ;-)

I like the idea of splitting out the logs here. It lets people use Beat processors that are inherently agent-side. There's no PID in this specific log, but one may want to do reverse DNS on the IPs.

I would directly name the fields following ECS here.

"%{timestamp} [%{event.level}] %{client.ip}:%{client.port} ...

And make sure to copy client.* to source.*. We're trying to always have source.* as a baseline in ECS.

description: >
Fields from coredns logs after normalization
fields:
- name: duration
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check out how I handled event.duration across very different sources in #10274. I think that's a pattern we can likely reuse broadly. This will reduce compilations a lot across the board, if we can always use the same snippet.

"script": {
"lang": "painless",
"source": "ctx.event.created = ctx['@timestamp']; ctx['@timestamp'] = ctx['coredns']['timestamp']; ctx.coredns.remove('timestamp');",
"ignore_failure" : true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should parse the date in ingest node, to ensure the document content is consistent. Even if the Elasticsearch date datatype is able to parse many things, we want the source document itself to be consistent.

It's a thing ECS will have to make more clear in the future, IMO. (See what happened with the Zeek Unix ts)

{
"set": {
"field": "user_agent.original",
"value": "{{envoy.http_user_agent}}",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to copy it around to avoid clobbering anymore. Since 7.0, the UA parser re-outputs the .original.

So you can directly populate user_agent.original and have:

    {
      "user_agent": {
        "field": "user_agent.original",
        "ignore_missing": true
      }
    }

and the right thing will happen :-)

]
env:
- name: ELASTICSEARCH_HOST
value: 192.168.99.1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this IP be here (same for KIBANA_HOST below)? Is this a Kubernetes default for the gateway / your host?

@alakahakai
Copy link
Author

I will try to break this into two separate modules - one for coredns, the other for envoy; And move them to non x-pack. Should additional things required for these to work in Kubernetes, a separate kubernetes module would be added then.

@alakahakai alakahakai added in progress Pull request is currently in progress. and removed review labels Feb 28, 2019
@alakahakai
Copy link
Author

Close this PR. I will open new issues and PRs for coredns and envoy.

@alakahakai alakahakai closed this Feb 28, 2019
@alakahakai alakahakai deleted the issue-10812 branch February 28, 2019 23:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Filebeat Filebeat in progress Pull request is currently in progress.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants