[Auditbeat] System module: Uniquely identify processes, sockets, users, and packages #10463

cwurm · 2019-01-31T15:31:33Z

While putting together dashboards for the Auditbeat system module, I realized that with the current data model it's not possible to visualize the number of processes, sockets, users, and packages, since no field uniquely identifies an entity.

For example, each process can and often will have multiple events of different types (when it starts, when it ends, and when it's reported by a regular state update). At the moment there's no single identifying field to count them properly: process names, executables, args, PIDs, and PPIDs are all non-unique. The same is true for sockets, users, and packages. Only the host dataset already has a host.id field that should be unique.
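To make the counting problem concrete, here is a minimal Go sketch using hypothetical, trimmed-down process events from a single host (the `processEvent` type, its `Kind` labels, and the start-time format are illustrative assumptions, not Auditbeat's event schema). Counting raw events over-counts, counting by name or PID under-counts once a PID is reused, and only the (PID, start time) pair separates the processes that actually ran.

```go
package main

import "fmt"

// processEvent is a hypothetical, trimmed-down process event for one host.
type processEvent struct {
	Kind  string // illustrative labels: "start", "state", "end"
	Name  string
	PID   int
	Start string // process start time; the string format is an assumption
}

// instanceKey pairs a PID with its process start time.
type instanceKey struct {
	PID   int
	Start string
}

// distinctCounts returns the number of distinct process names, PIDs, and
// (PID, start time) pairs that appear in events.
func distinctCounts(events []processEvent) (names, pids, instances int) {
	byName := map[string]struct{}{}
	byPID := map[int]struct{}{}
	byInstance := map[instanceKey]struct{}{}
	for _, e := range events {
		byName[e.Name] = struct{}{}
		byPID[e.PID] = struct{}{}
		byInstance[instanceKey{e.PID, e.Start}] = struct{}{}
	}
	return len(byName), len(byPID), len(byInstance)
}

func main() {
	events := []processEvent{
		// One wget run, reported when it starts, in a state update, and when it ends.
		{"start", "wget", 4242, "2019-01-31T15:00:00Z"},
		{"state", "wget", 4242, "2019-01-31T15:00:00Z"},
		{"end", "wget", 4242, "2019-01-31T15:00:00Z"},
		// A second wget run that happens to be assigned the same PID.
		{"start", "wget", 4242, "2019-01-31T16:00:00Z"},
	}
	names, pids, instances := distinctCounts(events)
	fmt.Println("events:", len(events))
	fmt.Println("distinct names:", names)
	fmt.Println("distinct PIDs:", pids)
	fmt.Println("distinct (PID, start) pairs:", instances)
}
```

Run as-is, it reports 4 events but only 1 distinct name and 1 distinct PID, while the (PID, start) pair yields the 2 processes that actually ran. Across multiple hosts, host.id would also have to be part of the key.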

I'm proposing to introduce new fields that identify those entities.

As a field name, I'm still torn between:

{entity}.id - we already have host.id doing the same for hosts, but unfortunately user.id already exists, so that wouldn't work for users.
{entity}.hash - not used anywhere as far as I know, so no conflicts. But it doesn't follow the host.id convention.

As for the value, I'm thinking of a hash of some of the entity's fields plus the host.id:

Process: pid + start + host.id
Socket: inode + source.ip + source.port + destination.ip + destination.port + host.id. The possibility of inode reuse with the same IP/port combination seems remote.
User: user.id + user.name + host.id. At least on Linux, this is not foolproof, as the user could be deleted and re-created. But I don't think there's much we can do, since on Linux /etc/passwd is just a text file that can theoretically be edited at will. More likely, multiple users share a UID ("virtual" users; some mail servers do this, e.g. Dovecot), in which case our UID-to-username lookups in various places would probably already be off.
Package: name + version + host.id. It would be possible to remove a package and reinstall it with the same name and version but from a different source, and we'd treat it as the same package, but again I don't think there's much we can do about that, at least for now.
Login: Just a note here: the login dataset does not send state, so cardinality is not a problem, since every event is unique. That only holds when all the data comes from the system module, though, not when logins are also reported by the auditd module. For that, we would need an ID that is stable across modules, but I think we can treat that as out of scope for now.


elasticmachine · 2019-01-31T15:31:35Z

Pinging @elastic/secops

tsg · 2019-01-31T16:02:13Z

@webmat @MikePaquette Is there any ECS field that would fit this purpose?

I think hash might also potentially conflict, for example, with file.hash if we ever add something like that.

tsg · 2019-01-31T16:03:58Z

I see now that user.hash exists in ECS:

Unique user hash to correlate information for a user in anonymized form. Useful if user.id or user.name contain confidential information and cannot be used.

Sort of matches, but not quite.

tsg · 2019-01-31T16:15:21Z

How about unique_id? So you'd have process.unique_id that makes it clear how it is different from process.id.

andrewkroh · 2019-01-31T16:34:02Z

I was literally thinking to use entity.id, but some events will have multiple entities. So along the same lines as @tsg's suggestion, how about process.entity_id or generally {thing}.entity_id?

BTW in Metricbeat's windows-service metricset we do this to create a service.id for the dashboards. It's a sha256(MachineGUID + ServiceName)[:10].
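For reference, a minimal Go sketch of the `sha256(MachineGUID + ServiceName)[:10]` pattern. It assumes `[:10]` means the first 10 bytes of the raw digest and that the truncated bytes are hex-encoded; Metricbeat's actual encoding may differ, and `serviceID` is a hypothetical name.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// serviceID hashes MachineGUID + ServiceName with SHA-256 and keeps the
// first 10 bytes of the digest. Hex encoding of those bytes is an
// assumption made for this sketch.
func serviceID(machineGUID, serviceName string) string {
	sum := sha256.Sum256([]byte(machineGUID + serviceName))
	return hex.EncodeToString(sum[:10])
}

func main() {
	fmt.Println(serviceID("hypothetical-machine-guid", "Spooler"))
}
```

Plain concatenation is safe here only because a MachineGUID is normally a fixed-length GUID string; with variable-length fields, a separator or length prefix avoids collisions such as ("ab", "c") versus ("a", "bc").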

FrankHassanabad · 2019-01-31T16:48:35Z

I think this uniquely identifies each instance of a process, i.e. each time it is created, right?

Process: pid + start + host.id

Since you're using the process ID and start time, if I run wget repeatedly on a box, each run will show up with a different identity representing that individual process.

cwurm · 2019-01-31T17:05:11Z

@FrankHassanabad yes, that's the idea

cwurm · 2019-01-31T17:57:31Z

I don't have a strong preference on the field name. entity_id sounds good to me since that's exactly what it is.

Another question is what hashing algorithm we should use. Internally, the Auditbeat system module uses xxhash for change detection (e.g. to detect whether a running process already existed the last time around). However, since this use is more exposed (the value will be stored in Elasticsearch, together with other data that could come from third parties), maybe there's a case to be made for something more "standard", e.g. SHA-1 or SHA-256? SHA comes out of the box in pretty much every programming language, it's FIPS compliant, it's what other tools in the wider ecosystem accept (e.g. it's one of the file hashes VirusTotal accepts), and SHA-1 is the default file hash in Auditbeat's file_integrity module. Here as well, I don't have a strong opinion. Curious what others think.

tsg · 2019-01-31T18:23:57Z

+1 on entity_id.

I don't have a preference for the hash algorithm either. I agree the SHA1 will likely make troubleshooting easier. Is speed the argument for xxhash?

webmat · 2019-01-31T19:32:08Z

I like where this discussion is going. ECS would not define what the content of this field would be, as it can be implementation-specific.

For the naming of the field, I actually quite like entity_id. I think it makes it clear that this is a persistent identity, and doesn't give the incorrect impression that every event should have a distinct/unique value in the field. My vote is for entity_id. This field should be keyword.

I'll add this to the mountain of things to address officially in ECS soon ;-)

webmat · 2019-01-31T19:33:19Z

ping @ruflin new pattern emerging ^

cwurm · 2019-01-31T23:57:07Z

Quoting @tsg: "I don't have a preference for the hash algorithm either. I agree the SHA1 will likely make troubleshooting easier. Is speed the argument for xxhash?"

I think so.

cwurm · 2019-02-01T14:07:51Z

Ok, so how about we use entity_id and SHA-256? Git is moving to it as well.


cwurm added the discuss, Auditbeat, and SecOps labels Jan 31, 2019

cwurm mentioned this issue Feb 1, 2019

[Auditbeat] System module: Add entity_id fields #10500

Merged

cwurm closed this as completed in #10500 Feb 5, 2019

cwurm pushed a commit that referenced this issue Feb 5, 2019

[Auditbeat] System module: Add entity_id fields (#10500)

c047ef7

Implements `{entity}.entity_id` as a SHA-256 hash as proposed in #10463. Closes #10463.

cwurm mentioned this issue Feb 5, 2019

Cherry-pick #10500 to 6.x: [Auditbeat] System module: Add entity_id fields #10569

Closed

cwurm pushed a commit to cwurm/beats that referenced this issue Feb 5, 2019

[Auditbeat] System module: Add entity_id fields (elastic#10500)

6e9d392

Implements `{entity}.entity_id` as a SHA-256 hash as proposed in elastic#10463. Closes elastic#10463. (cherry picked from commit c047ef7)

cwurm mentioned this issue Feb 5, 2019

[Auditbeat] Cherry-pick #10500 to 6.x: System module: Add entity_id fields #10570

Merged

cwurm pushed a commit that referenced this issue Feb 5, 2019

[Auditbeat] System module: Add entity_id fields (#10500) (#10570)

f7c44b1

Implements `{entity}.entity_id` as a SHA-256 hash as proposed in #10463. Closes #10463. (cherry picked from commit c047ef7)

cwurm mentioned this issue Mar 20, 2019

[SecOps][Discuss] Shorten entity IDs #11348

Closed

cwurm mentioned this issue Aug 28, 2019

ECS software packages and runtime dependencies elastic/ecs#532

Merged
