Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate Winlogbeat to ECS, take 2 #10333

Merged
merged 27 commits into from
Feb 5, 2019
Merged

Conversation

webmat
Copy link
Contributor

@webmat webmat commented Jan 24, 2019

Rather than fight with git to push to @ruflin's PR #10169, I'm opening a new one.
See other PR for initial discussions.

In this PR I'll try to take Winlogbeat to the finish line.

Update

Winlogbeat prior to this had over 20 fields directly at the top level of the events (e.g. { "level": "info" }), which is going to be a risk vs introducing ECS fields in the same place.

The Windows event log is also a log transport more than one specific source, of course.

So here's the approach taken, given the two points above:

  • Nest everything that's windows event log specific under winlog.
  • Take the opportunity to remove unnecessary name changes, restoring the original Win Event Log naming as much as possible. For example, "Channel" is now "winlog.channel", instead of the field "log_name".
  • Introduce a very small amount of ECS fields for now. event.created, event.dataset, etc.
  • Not touch EventData at all. The most interesting data is in there, but is 100% source specific. This should be done in steps, and should be done outside of the Winlogbeat shipper itself.

TODO

  • Adjust the event stream
  • Create aliases from the legacy fields to the new fields
  • Adjust the tests to the new names
  • Remove dataset.
  • Document in ecs-migration.yml
  • Documentation
  • Changelog
  • Update dashboard
  • Figure out how to handle Skipping loading dashboards, No directory C:\vagrant\winlogbeat\kibana/7

@webmat webmat requested a review from a team as a code owner January 24, 2019 20:24
@webmat webmat changed the title Migrate Winlogbeat to ECS Migrate Winlogbeat to ECS, take 2 Jan 24, 2019
@@ -61,22 +62,27 @@
The keywords are used to classify an event.

- name: log_name
type: keyword
# This does not exist yet
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have same example values what it contains?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't. I'll try to check out the docs, or better, generate some events

Copy link
Member

@andrewkroh andrewkroh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copied over my comments from the other PR.

}
m.Put("event.id", e.EventIdentifier.ID)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the same as ECS event.id (at least in my understanding). This more like an identifier for the log message in the application. For example 1102 is "The audit log was cleared".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

record_number is closer to event.id but it's only unique given some additional constraints. If we wanted to populate event.id I'd do a fast hash of @timestamp + computer_name + log_name + record_number.

@@ -139,7 +138,7 @@ func (e Record) ToEvent() beat.Event {
// MapStr.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few lines up...

e.TimeCreated.SystemTime is used to populate @timestamp. That's the time the event was originally logged. So it would be nice to populate event.created with the current time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@ruflin ruflin added in progress Pull request is currently in progress. Winlogbeat ecs labels Jan 28, 2019
@ruflin ruflin mentioned this pull request Jan 29, 2019
Copy link
Contributor Author

@webmat webmat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two commits to consider, in tonight's push.

  • c1c3e04b is fully functional. It adds a few more fields for this transition, proposes event.code, etc.
  • bef4b8c4 is me trying to access what's inside event_data, and not succeeding 😆

Please check out the comments I've added in this review and tell me what you think, @ruflin & @andrewkroh.

required: false

# Candidate to add to ECS?
- name: event.code
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Semantics of Windows event ID are useful, they are a log message code, as Andrew explained in his previous comment.

I think it would make sense to have something like this in ECS, even outside of Windows. For example, MySQL uses error codes as well.

@@ -60,23 +72,33 @@
description: >
The keywords are used to classify an event.

- name: channel

- name: provider_name
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just adding the two windows event log fields that were initially mapped like this:

  • channel was log_name. Example values:
    • Application
    • Security
    • Setup
  • provider_name was source_name. Example values:
    • Microsoft-Windows-Security-Auditing
    • Service Control Manager
    • Microsoft-Windows-CEIP

I think the Channel idea may be interesting to map to ECS (e.g. is there a correlation with syslog's facility?), but I'm not sure for the provider name. It's interesting data, but I'm not even sure I would map that to event.origin, which has come up every once in a while. Needs more thought (aka not for FF).

}

m.Put("event.kind", "event")
m.Put("event.dataset", fmt.Sprintf("%v.%v", e.API, strings.ToLower(e.Channel)))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Experiment. Not really distinct datasets in the module.fileset of other beats. But here's what the values would look like, in the real world:

Modern windows:

  • wineventlog.application
  • wineventlog.security
  • wineventlog.system
  • ...

Vista and older, iirc:

  • eventlogging.application
  • eventlogging.security
  • eventlogging.system
  • ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the field / json structure in each even heavily differ? If not I'm not sure if we should use dataset for this.

Question can also be asked differently: Could there be 1 visualisation which contains data from multiple dataset events?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"eventlogging" is a really old (before Vista) limited format for these events. Has way less information than "wineventlog". However afaik "wineventlog" was strictly additions vs "eventlogging".

Comparing between application & security, for example, I'm not sure. Probably somewhat possible (just like you can somewhat visualize apache.access & .error together with some caveats), when you have the same source name.

I understand the semantics are different from what we do elsewhere, as the other places the two positions here are really detailing which source this comes from: [source short name].[type of log generated by the source]`.

But unfortunately in Windows event logs, the closest thing resembling a source name is the provider name, which is really verbose (e.g. "Microsoft-Windows-Security-Auditing").

Perhaps we should just take the event.dataset out for now, and add later, after taking some time to consider how to do this right. The information itself is still present under winlog.*

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ on not using event.dataset for this as it's not consitent with other use cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I answered this from a very different angle than you were asking, actually.

The winlog.* fields except the content of event_data or user_data should always be almost identical, as it's mostly metadata from the win event log pipeline.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, removing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way I read it: There is just one event.dataset if we even populate it. It could be event.dataset: winlogevent or winlog.event or similar. We can still add it later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an interesting idea. But I don't think the value is necessary at the moment (you could still build dashboards by querying other fields). And we might decide there's a better way to populate event.dataset if we were to add "modules" to winlogbeat or fold this whole thing into a "logbeat".

Copy link
Contributor Author

@webmat webmat Feb 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite. It's more like the bullet lists I had pasted above. E.g. recent windows, you'd have a combination of:

  • wineventlog.application
  • wineventlog.security
  • wineventlog.system
  • ...

But the "wineventlog" part never changes for a given host.

m.Put("source_name", e.Provider.Name)
// Why not keep the Windows naming for those?
m.Put("channel", e.Channel)
m.Put("provider_name", e.Provider.Name)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comparing the old Winlogbeat names that were mapped to a different field name vs just keeping the Windows terminology. I ❤️ the Windows name better. Thoughts?

m.Put("channel", e.Channel)
m.Put("provider_name", e.Provider.Name)
m.Put("event.code", e.EventIdentifier.ID)
addOptional(m, "event.action", e.Task)
Copy link
Contributor Author

@webmat webmat Jan 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sample values for Task/event.action:

  • Logon
  • Logoff
  • Special Logon
  • Credential Validation

However this field was set less than 50% of the time, in my playing around...

@@ -139,7 +138,7 @@ func (e Record) ToEvent() beat.Event {
// MapStr.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

addOptional(m, "processor_id", e.Execution.ProcessorID)
addOptional(m, "session_id", e.Execution.SessionID)
addOptional(m, "kernel_time", e.Execution.KernelTime)
addOptional(m, "user_time", e.Execution.UserTime)
addOptional(m, "processor_time", e.Execution.ProcessorTime)

// debugf("EventData field inspection %v", e.EventData.TargetUserName)
// debugf("EventData inspection full %v", e.EventData)
// addOptional(m, "user_dbg.name", e.EventData.TargetUserName)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would like to be able to define this accessor, not sure how, see event.go.

@@ -133,6 +133,7 @@ func (t *TimeCreated) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error
// EventData contains the event data. The EventData section is used if the
// message provider template does not contain a UserData section.
type EventData struct {
// TargetUserName string `xml:"TargetUserName,attr"`
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not familiar with building these structs backed by a document. That's what I tried.

@webmat
Copy link
Contributor Author

webmat commented Jan 30, 2019

Here's two login events as parsed by the code in this PR, in case this helps understand the data.

login-events.zip

There's no data.json files in Winlogbeat, right? May be useful to add that at some point :-)


// Additional mapping ideas
// user.id
// - event_data.TargetUserSid (S-1-5-21-3541430928-2051711210-1391384369-1001)
Copy link
Member

@andrewkroh andrewkroh Jan 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depending on anything in event_data is tricky because the value and meaning of these fields change between sources and even event_ids. But it's possible to do if we research the event manifests and add some conditionals.

I do think I agree that that the current values for user.{name,id} are not correct for ECS and correlation purposes. In most cases these values correspond to the user that the logging process is running as and not to the user that the activity is reporting on. In many cases the user.name is SYSTEM and that's not helpful.

The process_id is the PID of the logging process and again this isn't helpful for when you are trying to find events reported about a process. I think we need a place to put this information that's not process.* (or simply drop it).

[cutting it short, meeting time]

@ruflin
Copy link
Contributor

ruflin commented Jan 31, 2019

@webmat Thanks for the zip. To follow ECS I'm wonder if winlogbeat should populate any fields on the top level or should we start to prefix them with windows or winlog.?

@webmat
Copy link
Contributor Author

webmat commented Jan 31, 2019

@ruflin Yes this is a very good point. I've found it strange to get into this event stream with lots of fields at the top level.

Although I've been hesitant to propose nesting this under a windows-centric name, because why would Windows be nested, and not Linux/Posix?

Also playing into this is the fact that the Posix ones are all nested at various places because there's no single way to do logging in Posix environments. Whereas on Windows, as far as I understand, almost everything uses the windows event log. Akin other logging transports: syslog specific stuff is nested under syslog.* or journald under journald.*.

So in that sense, perhaps we could indeed nest everything under "winlog.*", as the log event transport. And I would really like to take the opportunity to rename "channel" to "channel" (not log_name) and "Provider Name" to "provider_name (not source_name).

@andrewkroh WDYT?

Note that I'll also be talking with one of our resident Windows security expert tomorrow, and discuss some of these things.

The changes we have time to do prior to FF will be limited, but they can set the stage for a more consistent experience in the future.

@webmat
Copy link
Contributor Author

webmat commented Feb 1, 2019

After today's discussion, here's a radical update draft. Note that there's no attempt to set up the aliases yet & so on. I've just:

  • re-nested everything Windows log specific under winlog.*
  • Still kept around the already mapped ECS fields
  • Removed some of the incorrect mappings (e.g. pids were of the event log service, not the event's PID)
  • Updated my comment for the additional crap we can dig out of event_data

And here's a zip file with the same two events types as yesterday, an ssh login and a desktop login.

Let me know what you think.
login-events.zip

@ruflin
Copy link
Contributor

ruflin commented Feb 1, 2019

I'm good with the nesting especially if we have migration aliases around for the old fields. Lets see what @andrewkroh thinks.

@webmat webmat added review and removed in progress Pull request is currently in progress. labels Feb 1, 2019
@webmat
Copy link
Contributor Author

webmat commented Feb 1, 2019

@andrewkroh @ruflin Hey guys, I would say this is feature complete. Please re-read the body of the PR, I've adjusted it to the approach and the thinking I've taken.

Code, fields and tests should be 100% done. If you see a problem in there, it's an actual bug.

Still to do: changelog, ecs-migration.yml (not forgetting the exceptions like type changes, etc), and adjust the documentation.

Copy link
Contributor

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few minor questions, overall LGTM.

addOptional(m, "kernel_time", e.Execution.KernelTime)
addOptional(m, "user_time", e.Execution.UserTime)
addOptional(m, "processor_time", e.Execution.ProcessorTime)
addOptional(win, "process.pid", e.Execution.ProcessID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not ECS?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because that's not the "userland" PID, this is always the PID of the Windows service processing log events.

The interesting process.pid is at event_data.ProcessId, and we've decided not to dig in there right from Go / from the deployed agent.

"record_number": strconv.FormatUint(e.RecordID, 10),
// Windows Log Specific data
win := common.MapStr{
"channel": e.Channel,
"event_id": e.EventIdentifier.ID,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ECS?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a message code (a given kind of message always has the same event id), it's not a unique ID.

So I've actually taken the liberty to map it to event.code, and will propose this gets added to ECS soon :-) So this nested one really should remain there.

You may push back on event.code, however. I'm good if you want to take it out for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whatever you prefer 👍

}

m.Put("event.kind", "event")
m.Put("event.dataset", fmt.Sprintf("%v.%v", e.API, strings.ToLower(e.Channel)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the field / json structure in each even heavily differ? If not I'm not sure if we should use dataset for this.

Question can also be asked differently: Could there be 1 visualisation which contains data from multiple dataset events?

m.Put("event.code", e.EventIdentifier.ID)
addOptional(m, "event.action", e.Task)

m.Put("host.hostname", e.Computer)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should work as winlogbeat is always run on the same host (no remote connection) 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Winlogbeat can't accept remotely, but the Windows Event Log subsystem can do federation. So it may in fact be processing messages that come from remote hosts. I should double-check if Computer will always be the host at the origin of the event.

(host 1: Winlog) => (host 2: Winlog => Winlogbeat)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this host.hostname value get overwritten by the libbeat publisher pipeline?

Like this stuff

// setup 7: add agent metadata
if !config.SkipAgentMetadata {
needsCopy := global.alwaysCopy || global.processors != nil
processors.add(actions.NewAddFields(createAgentFields(info), needsCopy))
}
.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're probably right. I'll remove for now.

But the sad thing is that this value is likely the "true" source of the event. add_host_metadata will set this value to the machine running winlogbeat, and it won't be right, if this windows machine is receiving logs from other machines.

Note that this is a conundrum we've faced everywhere in Beats as well, and don't have a good answer to, yet.

@webmat webmat force-pushed the winlogbeat-ecs branch 2 times, most recently from cf9af57 to 3901390 Compare February 4, 2019 21:48
@webmat
Copy link
Contributor Author

webmat commented Feb 4, 2019

@andrewkroh @ruflin Ok this is ready for final review, including documentation, ecs-mig.yml & so on. There shouldn't be anything odd remaining. If that's the case, it's a bug.

dataset and host.hostname and the debugging statement are removed. I also did some minor cleanup in the field definitions.

@webmat
Copy link
Contributor Author

webmat commented Feb 5, 2019

Ok, I've addressed the latest feedback.

As discussed, the dashboard error was on my size.

Just re-pushed with message_error migrated to error.message, error.code populated, and a dashboard adjusted to the field new names. I imported / re-exported, so the diff isn't clean. But it looks like this:

winlogbeat dashboard

Copy link
Member

@andrewkroh andrewkroh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

It doesn't have to be in this PR, but I do want to have a breaking changes doc page for Winlogbeat that lists these changes (lots of users will need to update their LS configs).

# Fields moved without adjusting the name

- from: activity_id
to: winlog.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's missing the ending.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good eye! Fixed :-)

laser eyes gif

@webmat
Copy link
Contributor Author

webmat commented Feb 5, 2019

Noted for adding a breaking changes page like Packetbeat's.

Ok if this is done post-FF? I'll see if I can get this done tomorrow, but not sure.

@ruflin
Copy link
Contributor

ruflin commented Feb 5, 2019

I'm hoping for the breaking changes we can use #10405

Copy link
Contributor

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Not sure why the dashboard needed adjustments as it should be automatically changed by the migration script. But please don't revert as you already have done it.

For the breaking changes, see my comment above and no rush on this one. Doc changes can go in at any time.

@webmat
Copy link
Contributor Author

webmat commented Feb 5, 2019

Yes, noted separately for the breaking changes docs. Hopefully your script can do most of the work indeed.

@webmat webmat merged commit b35b1ad into elastic:master Feb 5, 2019
@webmat webmat deleted the winlogbeat-ecs branch February 5, 2019 14:10
"attributes": {
"description": "",
"hits": 0,
"kibanaSavedObjectMeta": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@webmat I'm just spotting this now but this a dashboard that is not decoded :-(

webmat pushed a commit to elastic/ecs that referenced this pull request May 1, 2019
- Added `event.code` (See elastic/beats#10333)
- Added `event.sequence` (See #129, elastic/beats#10760)
- Added `event.provider` (See #321)
  - Note: Beats modules currently put the Syslog "programname" in `process.name` which is sometimes accurate, sometimes not (e.g. "kernel"). event.provider would be a better field for this.
- Explain event.module and event.dataset without mentioning Beats
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants