ECS software packages and runtime dependencies #532

Merged
merged 12 commits into from
Oct 2, 2019

Contributor

@simitt simitt commented Aug 27, 2019

Initial draft for getting the process started to define ECS fields for installed software packages / runtime dependencies.

addressing #515

Contributor

@webmat webmat left a comment

This is great work, thanks @simitt!

Left a bunch of comments, sometimes calling out to other folks for opinions. But overall I love the way this looks :-)

Perhaps we could also mention in the field set description that package URLs (the package's specific URL, not the repo's) can be recorded in url.original, when available.

Contributor

ruflin commented Aug 28, 2019

Great idea to get this into ECS. @andrewkroh you might be able to chime in here as I think auditbeat also collects some of this info?


@cwurm cwurm left a comment

Thanks for starting this! Left a few comments.

Two fields I wish I had added for Auditbeat's system/package that could also be relevant here:

  1. Package type or origin. Values: RPM, DPKG, Homebrew, NPM, Rubygem, etc.
  2. Path (where the package is located): e.g. /usr/local/Cellar/go/1.10.3/

Contributor Author

simitt commented Aug 28, 2019

> 1. Package type or origin. Values: RPM, DPKG, Homebrew, NPM, Rubygem, etc.

@cwurm how about adding package.manager for this kind of information?

> 2. Path (where the package is located): e.g. `/usr/local/Cellar/go/1.10.3/`

I like the idea, will add it.


cwurm commented Aug 28, 2019

> 1. Package type or origin. Values: RPM, DPKG, Homebrew, NPM, Rubygem, etc.
>
> @cwurm how about adding package.manager for this kind of information?

I'm not sure I have a strong opinion, but I think package.type would be more general. Not all packages are managed by a package manager, e.g. macOS applications under /Applications or Windows programs under C:\Program Files, both of which can be just a download.

Member

@andrewkroh andrewkroh left a comment

Do you think there should be a release field for the release number used in Red Hat packaging? Or would that be too specific to one packaging type?

Contributor

webmat commented Aug 30, 2019

@andrewkroh Can you give an example of such a release string?

Is this meant to support their extra long version strings, where they mix the original package version + their backporting patches on top?

level: core
type: keyword
description: URL from where the package was installed.
example: https://docs.docker.com/compose/
Contributor

@webmat webmat Aug 30, 2019

Another thing about the URL, I would give an example of a full path to the package archive, all the way to the filename. E.g. "https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.3.1-darwin-x86_64.tar.gz"

If places like Docker Hub don't provide that, perhaps that's acceptable. But I expect this URL to be complete as often as possible. So let's lead by `example:`. Pun intended ;-)

Copy link

After the thread here, this might now refer to the "generic URL" of the package and not the URL where it's installed from (what was remote_repository when this PR started). If so, the description should be updated to say something neutral like "URL of this package" / "URL with more information about this package".

On where the package was installed from - I don't know any data source (DPKG, RPM, Homebrew, Chocolatey) that provides this? So I would rather keep that out of ECS, and we can always add it later when we have a concrete example and need to add it.

Contributor

Ok, if the exact package source URL isn't available, I'll defer to you here.

But I agree we could perhaps flesh out the description of the field a little more.

Contributor

I was initially misunderstanding the purpose of this field. Its description is not clear enough, and the example is misleading (Docker also has a registry of packages in Docker Hub).

Please remove the field from this PR, and reopen a PR with just this field.

When reopening the new PR, please adjust the following things to remove the ambiguity:

  • clarify the description, perhaps something like "Main URL of the software included in the package"
  • change the example so it's aligned with the other examples in the package field set. In other words, this should be Golang's URL, which according to Homebrew, is simply "https://golang.org".

Another discussion point that I'd like to hash out: instead of partially nesting the URL field set here, I think we could use another pattern we've started using elsewhere, and name the field .reference. WDYT?

@andrewkroh
Member

> @andrewkroh Can you give an example of such a release string?

2.6.32-754.el6

In this example, 2.6.32 is the package version and 754.el6 is the release. The version is generally the upstream software version (like the Linux kernel version), and the release is the distribution-specific value that gets rev'ed when a patch is backported.


cwurm commented Sep 2, 2019

> Do you think there should be a release field for the release number used in Red Hat packaging? Or would that be too specific to one packaging type?

I personally think it would be too specific. Only RPM has it. DPKG has a similar one though, called revision (not parsed out by Auditbeat, but libdpkg can do it).

My preference would be to keep ECS simple and not include it at all, with users always welcome to add extra fields that are useful to them (but are not required or even expected). But if we want to have it, I'd rather have something that covers RPM, DPKG, and anything else (and maybe that's release, revision, or something else generic).

Contributor

webmat commented Sep 3, 2019

On the distro package versions, do rpm and dpkg give easy access to the origin/upstream version number as well? In other words, to get the 2.6.32 in Andrew's example, do we need to parse it out of the release version? Or is there another attribute that contains 2.6.32?

If extra work is required to parse out the origin version number, it will get nasty real quick. E.g. what if some origin packages contain - in their version numbers? I wouldn't be surprised at all 😂

So I'm wondering if we need to specify this at all, at this time. Perhaps just having package.version without specifics is enough for now? Platforms that have full release versions will get that in package.version, and platforms that only have "simple" versions will get that in package.version...

If on the other hand we can get the origin version easily from rpm/dpkg, I'd be open to discuss the addition of a second field, for distro's release version as well.

Contributor Author

simitt commented Sep 10, 2019

> Do you think there should be a release field for the release number used in Red Hat packaging? Or would that be too specific to one packaging type?

@andrewkroh have you seen the suggested detailed_version? I originally suggested it with the intention of having fine-grained information for unreleased versions, but maybe this could serve both purposes.

* Move to extended for all fields
* Rename url.original to url.full per comment
* Remove remote_repository for now as per discussion
* Add note how to fill values for license
@andrewkroh
Member

I think detailed_version could work for RHEL packaging assuming we document it should be formatted as %{VERSION}-%{RELEASE}. Like the value of rpm -qp --queryformat '%{VERSION}-%{RELEASE}' example.rpm. Then since we have a separate field for architecture I think this would cover the RPM details.
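
A minimal Python sketch of the `%{VERSION}-%{RELEASE}` format suggested above, using the `2.6.32-754.el6` example from this thread. The function names are hypothetical, and splitting on the last hyphen assumes the release part never contains a hyphen; that is an assumption of this sketch, not something confirmed in this discussion.

```python
from typing import Tuple


def join_version_release(version: str, release: str) -> str:
    """Build a value shaped like `%{VERSION}-%{RELEASE}`."""
    return f"{version}-{release}"


def split_version_release(detailed: str) -> Tuple[str, str]:
    """Split on the LAST hyphen.

    Assumption: the release part contains no hyphen, so any hyphens
    in the upstream version stay on the version side.
    """
    version, sep, release = detailed.rpartition("-")
    if not sep:
        # No hyphen at all: treat the whole string as the version.
        return detailed, ""
    return version, release


version, release = split_version_release("2.6.32-754.el6")
print(version, release)
```

Splitting on the last hyphen rather than the first is a deliberate choice here: it keeps working if an upstream version itself contains a hyphen, as long as the assumption about the release part holds.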

Contributor

@webmat webmat left a comment

I have a few small changes to request, to package.type, package.checksum and package.license.

The fields package.detailed_version and package.url.full require more discussion. Please remove them from this PR for now, and submit a new PR for each.

This way we can get most of "package" merged in short order, and hash out the details of the other two fields in the new PRs.

Contributor Author

simitt commented Oct 2, 2019

I incorporated all the feedback. From my perspective, we need to reach a conclusion on the following points before merging:

@webmat I don't have strong opinions about #532 (comment), please let me know how you prefer to move forward.

@cwurm I gave the package type | origin | manager discussion another thought. Reading package.type I think of .deb, .rpm, .tgz and not of the way the package was installed. Maybe using package.origin or package.install_type (we already use package.install_scope, so that would fit) would be preferable here. WDYT?

@simitt simitt changed the title [WIP] first draft for ECS software packages and runtime dependencies ECS software packages and runtime dependencies Oct 2, 2019
@simitt simitt marked this pull request as ready for review October 2, 2019 07:10

cwurm commented Oct 2, 2019

> Reading package.type I think of .deb, .rpm, .tgz and not of the way the package was installed.

I struggle with that - I'm not sure we could always fill it with a sensible value. What would the value for a Homebrew package be? When installed, it's just a folder in /usr/local/Cellar. Or a Windows program? It's just an .exe in C:\Program Files - but that was not how it was installed and it's probably impossible to know what the installation method was (could have been an .msi, or just copying the .exe).

Contributor

webmat commented Oct 2, 2019

@simitt Thanks for all of the adjustments!

It looks like package.type needs more thought as well. Please take it out of the PR and submit a new one. We'll take the time to hash this one out stress-free ;-)

Contributor

webmat commented Oct 2, 2019

Note: Auditbeat will benefit from the package field set, and currently doesn't seem to capture the package type as an attribute (based on events from a Debian host).

Contributor Author

simitt commented Oct 2, 2019

@webmat I removed package.type, the PR is ready to merge if you approve.

level: extended
type: keyword
description: Package architecture.
example: runtime
Contributor

x86_64

Contributor

webmat commented Oct 2, 2019

Thanks for all the last minute adjustments 🙂


cwurm commented Oct 2, 2019

I think without the package type (deb, rpm, homebrew, etc.) it's hard to distinguish between different packages? For example, there is an elasticsearch package in pretty much every package manager, but we cannot distinguish between them without the package type.
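
To make this point concrete, here is a minimal Python sketch. The two events and their values are hypothetical, and `package.type` is the field proposed in this thread, not one included in this PR.

```python
# Two hypothetical package events that share a name and version but come
# from different package managers.
events = [
    {"package.name": "elasticsearch", "package.version": "7.3.1", "package.type": "deb"},
    {"package.name": "elasticsearch", "package.version": "7.3.1", "package.type": "homebrew"},
]


def distinct_packages(events, fields):
    """Return the set of distinct value tuples for the given field names."""
    return {tuple(event.get(field) for field in fields) for event in events}


# Without the type, both events collapse into a single entry.
without_type = distinct_packages(events, ["package.name", "package.version"])
# With the proposed type field, they remain distinguishable.
with_type = distinct_packages(
    events, ["package.name", "package.version", "package.type"]
)

print(len(without_type), len(with_type))
```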

Also, pretty much every package manager has a URL/homepage for the package (e.g. https://www.elastic.co/products/elasticsearch) and that's very useful for finding out what the package is (ok, elasticsearch is pretty obvious, but there are many obscure packages, e.g. libewf). In my opinion, this field would be a lot more useful than let's say the package architecture (almost always amd64 these days I assume).

Do we have a need to merge this now (i.e. is something waiting and ready to implement these fields), or can we wait and get these useful fields in?

Contributor

webmat commented Oct 2, 2019

@cwurm As far as I can tell, the PR as it stands covers all of what Auditbeat captures about packages, except for entity_id. Is the absence of a place for the URL a blocker for getting this in 1.2?

| Auditbeat 7.4 | ECS |
| --- | --- |
| system.audit.package.arch | package.architecture |
| system.audit.package.entity_id | - |
| system.audit.package.name | package.name |
| system.audit.package.size | package.size |
| system.audit.package.summary | package.description |
| system.audit.package.version | package.version |
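
The mapping listed above can be sketched as a simple rename in Python. This is only an illustration: the flat dotted-key event shape, the `to_ecs` helper, and the sample values are assumptions, not how Auditbeat actually implements it. Fields without an ECS counterpart (here `entity_id`) are left under their original name.

```python
# Auditbeat 7.4 field names mapped to the ECS package fields from this PR.
AUDITBEAT_TO_ECS = {
    "system.audit.package.arch": "package.architecture",
    "system.audit.package.name": "package.name",
    "system.audit.package.size": "package.size",
    "system.audit.package.summary": "package.description",
    "system.audit.package.version": "package.version",
}


def to_ecs(event: dict) -> dict:
    """Rename mapped keys to their ECS names; keep unmapped keys unchanged."""
    return {AUDITBEAT_TO_ECS.get(key, key): value for key, value in event.items()}


# Hypothetical event with flat, dotted keys.
sample = {
    "system.audit.package.name": "go",
    "system.audit.package.version": "1.10.3",
    "system.audit.package.arch": "x86_64",
    "system.audit.package.entity_id": "hypothetical-id",
}

ecs_event = to_ecs(sample)
print(ecs_event)
```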


cwurm commented Oct 2, 2019

@webmat That the package dataset does not have a type field is an oversight and my hope is that when we touch it again we can add it. We can get this into ECS 1.2, then get package.type into 1.3. For the development, it doesn't matter all that much if something is released in ECS, just that it's agreed upon and merged.


6 participants