Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

auditbeat: system/process module backed by quark #42032

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

haesbaert
Copy link
Contributor

@haesbaert haesbaert commented Dec 13, 2024

Proposed commit message

This introduces a new provider for the sytem/process module in linux.

The main motivation is to address some of the limitations of the current implementation. The gosysinfo provider sends state reports by scraping /proc from time to time, so it loses all short lived processes. Some customers also would like to have full telemetry but can't run auditd for various reasons.

As a bonus we get some extra ECS fields that were not available before.

MAIN DIFFERENCES:

  • Publishes every process in the system, regardless of lifespan.
  • Publishes exec events for an existing process (without a fork).
  • Aggregates fork+exec+exit within one event.
  • Adds event.exit_code for processes that exited, can't express exit_time in ECS?
  • Include the original process.args, sysinfo reports args that were fetched when it parsed /proc, so a userland process can masquerade itself. For the initial /proc scraping we report the current value like sysinfo. We can't get the original value since the kernel overwrites it, if you wanna have fun: https://github.com/systemd/systemd/blob/main/src/basic/argv-util.c#L165
  • Adds process.args_count.
  • Adds process.interactive and if true, process.tty.char_device.{major,minor}
  • Attempts to hash all processes, not just long lived ones.
  • Hashing is not rate-limited anymore, but it's cached and refreshed based on metadata. It's a LRU keyed by path and refreshed if the metadata of the file changes, statx(2) if the kernel supports, stat(2) otherwise.
  • No more periodic state reports, only initial batch.
  • No more saving the timestamp of the last state-report in disk.
  • No more /proc parsing during runtime, only on boot.

MISSING:

  • Unify entity id with sessionview.
  • Publish metrics from quark.Stats().
  • Docs.
  • Properly define config options and names.

EXTRA CHANGES:

  • Added statx(2) to seccomp_linux so we can properly use CachedHasher.
  • Updated quark to 0.3 so we have namespace inode numbers.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

Run auditbeat on linux with the following configuration:

auditbeat.modules:

- module: system
  datasets:
    - process
  process.backend: "quark"

Related issues

Integrated PRs related to this

List of previous work done to minimize the size of this PR

Screenshots

Non interactive SSH

Below is a shot of a non interactive ssh session, done with ssh fc39vm /bin/echo hi from quarkio.
It shows the intermediary processes of sshd until we fork the shell and echo, the interesting bits is that we can see a process that forked+execed and then execs again: sshd forks+execs mksh,, which in turn execs /bin/echo, without forking.
ssh_nonint

Comparison against the sysinfo provider for a long lived process:

Here we run a long sleep and just compare the events against the existing provider on 8.14.3:
vs_old

On event.type, event.action and others

I've tried to keep things as close as possible to the old provider, but it's really just a suggestion at this point and it's likely we want to change things

event.type gosysinfo quark
fork start start
fork+exec start [start, change]
short fork+exec+exit N/A [start, change, end]
short fork+exit N/A [start, end]
existing processes info info
exec only N/A change
exec+exit end [change, end]
event.action gosysinfo quark
fork process_started process_started
fork+exec process_started process_started
short fork+exec+exit N/A process_ran
short fork+exit N/A process_ran
existing processes existing_process existing_process
exec only N/A process_changed_image
exec+exit end process_stopped

As you can see, expressing things in event.action is not great, I'm
all open to suggestions, life would be easier if it could be an
array. I've tried to compromise more states into fewer words.
process_changed_image might look a bit weird, but it's less ambiguous
than "executed". Again really open to suggestions here and I have no
strong feelings about it.

event.kind is now always event as there is no more state reports every X seconds.
The initial state report at init remains, but it's also event.

On the state of this PR

This doesn't include the documentation bits, I'd like to do this in a subsequent PR once the naming, config and whatnot is decided.
We should unify process.entity_id with sessionviewer, and we can do it in this PR, worth noting that the gosysinfo backend calculates things differently as well, so this is no worse than that.

I'm going out on holidays, but I'm taking this PR out of draft so that we can start the discussion and interested parties can test it.

This introduces a new provider for the sytem/process module in linux.

The main motivation is to address some of the limitations of the current
implementation. The gosysinfo provider sends state reports by scraping /proc
from time to time, so it loses all short lived processes. Some customers also
would like to have full telemetry but can't run auditd for various reasons.

As a bonus we get some extra ECS fields that were not available before.

MAIN DIFFERENCES:
 * Publishes every process in the system, regardless of lifespan.
 * Publishes exec events for an existing process (without a fork).
 * Aggregates fork+exec+exit within one event.
 * Adds event.exit_code for processes that exited, can't express exit_time in ECS?
 * Include the original process.args, sysinfo reports args that were
   fetched when it parsed /proc, so a userland process can masquerade
   itself. For the initial /proc scraping we report the current value like
   sysinfo. We can't get the original value since the kernel
   overwrites it, if you wanna have fun:
   https://github.com/systemd/systemd/blob/main/src/basic/argv-util.c#L165
 * Adds process.args_count.
 * Adds process.interactive and if true, process.tty.char_device.{major,minor}
 * Attempts to hash all processes, not just long lived ones.
 * Hashing is not rate-limited anymore, but it's cached and refreshed
   based on metadata. It's a LRU keyed by path and refreshed if the
   metadata of the file changes, statx(2) if the kernel supports,
   stat(2) otherwise.
 * No more periodic state reports, only initial batch.
 * No more saving the timestamp of the last state-report in disk.
 * No more /proc parsing during runtime, only on boot.

MISSING:
 * Unify entity id with sessionview.
 * Publish metrics from quark.Stats().
 * Docs.
 * Properly define config options and names.

EXTRA CHANGES:
 * Added statx(2) to seccomp_linux so we can properly use CachedHasher.
 * Updated quark to 0.3 so we have namespace inode numbers.
@haesbaert haesbaert added enhancement Team:Security-Linux Platform Linux Platform Team in Security Solution labels Dec 13, 2024
@botelastic botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Dec 13, 2024
Copy link
Contributor

mergify bot commented Dec 13, 2024

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @haesbaert? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit

Copy link
Contributor

mergify bot commented Dec 13, 2024

backport-8.x has been added to help with the transition to the new branch 8.x.
If you don't need it please use backport-skip label and remove the backport-8.x label.

@mergify mergify bot added the backport-8.x Automated backport to the 8.x branch with mergify label Dec 13, 2024
@haesbaert haesbaert marked this pull request as ready for review December 13, 2024 11:11
@haesbaert haesbaert requested review from a team as code owners December 13, 2024 11:11
@elasticmachine
Copy link
Collaborator

Pinging @elastic/sec-linux-platform (Team:Security-Linux Platform)

if err := c.HasherConfig.Validate(); err != nil {
return err
}
if c.Backend != "quark" && c.Backend != "proc" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be consistent with add_session_metadata in terms of backend option names.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-8.x Automated backport to the 8.x branch with mergify enhancement Team:Security-Linux Platform Linux Platform Team in Security Solution
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants