Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Heartbeat] Setuid to regular user / lower capabilities when possible #27878

Merged
merged 51 commits into from
Oct 13, 2021

Conversation

andrewvc
Copy link
Contributor

@andrewvc andrewvc commented Sep 11, 2021

Possible partial fix for #27648 , this PR:

  1. Detects if the user is running as root then:
  2. Checks to see if an environment variable BEAT_SETUID_AS (set in our Docker.tmpl) is present
  3. Attempts to Setuid , Setgid and Setgroups to that user / groups
  4. Invokes setcap to drop all privileges except NET_RAW+ep

This PR also fixes the broken syscall filtering in heartbeat, some non-syscall strings were breaking that.

With the changes here elastic-agent can still run as root, but the subprocesses can lower their privileges ASAP. This should also make it possible for heartbeat to safely run ICMP pings and synthetics. Synthetics must run as non-root, but ICMP requires NET_RAW. This lets us be consistent in our docs with the recommendation that elastic-agent run as root.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
    - [ ] I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
    - [ ] I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

Test this from the x-pack/heartbeat directory on Linux, not mac (you need linux capabilities). So, docker makes a lot of sense.

Run:

cd x-pack/heartbeat
mage package

Then, checkout https://github.com/elastic/synthetics-demo

Then, cd heartbeat in that repo, and modify run.sh to point at the 8.0.0 version of the docker image, which mage package built earlier. Run that like so:

./run.sh CLOUD_ID CLOUD_AUTH and check the results in elastic cloud

Then, change the username for docker to root and verify it still works.

In the logs you should find the line: 2021-09-13T16:09:31.748Z INFO beater/heartbeat.go:86 Effective user/group ids: 1000/1000, with groups: [], and with capabilities: cap_net_raw=ep indicating that heartbeat is running successfully with the net_raw capability and with the correct userid 1000/1000

@andrewvc andrewvc added bug Heartbeat Team:obs-ds-hosted-services Label for the Observability Hosted Services team Team:Elastic-Agent Label for the Agent team labels Sep 11, 2021
@andrewvc andrewvc requested a review from a team as a code owner September 11, 2021 01:49
@elasticmachine
Copy link
Collaborator

Pinging @elastic/uptime (Team:Uptime)

@elasticmachine
Copy link
Collaborator

Pinging @elastic/agent (Team:Agent)

@botelastic botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Sep 11, 2021
@elasticmachine
Copy link
Collaborator

elasticmachine commented Sep 11, 2021

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2021-10-12T18:39:53.061+0000

  • Duration: 239 min 11 sec

  • Commit: 70fcce6

Test stats 🧪

Test Results
Failed 0
Passed 53759
Skipped 5346
Total 59105

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

Copy link
Contributor

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that this change seems to currently only affect heartbeat. If my understanding is correct the change, heartbeat itself drops its own privilege if the BEAT_LOCAL_USER is set, otherwise nothing happens.

heartbeat/monitors/active/icmp/stdloop.go Outdated Show resolved Hide resolved
x-pack/heartbeat/monitors/browser/source/local.go Outdated Show resolved Hide resolved
"sigaltstack",
"SIGINT",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at comment in line 19 I assume there was never a good reason that these were here except copy/paste from somewhere. We likely should do the same for other beats ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This actually broke the syscall filters. Will update the main issue description in a bit.

x-pack/heartbeat/seccomp_linux.go Outdated Show resolved Hide resolved
dev-tools/notice/overrides.json Outdated Show resolved Hide resolved
@@ -30,6 +30,7 @@ FROM {{ .from }}
# Contains the elastic agent image variant, an empty string for the standard variant
# or "complete" for the bigger one.
ENV ELASTIC_AGENT_IMAGE_VARIANT={{.Variant}}
ENV BEAT_LOCAL_USER={ .user }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we call this just LOCAL_USER? I assume it could apply to any subprocess, it does not have to be a Beat? Or LOCAL_USER_SUBPROCESS?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've renamed it to BEAT_SETUID_AS I think it should be obscure and not something with any risk of clashing.

@andrewvc andrewvc marked this pull request as draft September 13, 2021 12:45
@andrewvc
Copy link
Contributor Author

andrewvc commented Oct 1, 2021

So, after much debugging I figured out where things were going wrong with the build. On arm64 (only?) The {{ beat }} binary is actually a symlink and setcap cannot follow this. Unfortunately there's something strange going on with our build system where stdout/stderr are not logged during a docker build, so I had to figure this out via an emulated arm64 docker build.

At any rate, the latest commit fixes this, and also takes off the arm restriction. See code comments for details.

Copy link
Member

@vigneshshanmugam vigneshshanmugam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Havent ran it locally, But overall code LGTM

func init() {
// Here we set a bunch of linux specific security stuff.
// In the context of a container, where users frequently run as root, we follow BEAT_SETUID_AS to setuid/gid
// and add capabilities to make this actually run as a regular user. This also helps node in synthetics, which
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// and add capabilities to make this actually run as a regular user. This also helps node in synthetics, which
// and add capabilities to make this actually run as a regular user. This also helps Node.js in synthetics, which

// in the container, but we need to repeat that here.
err = syscall.Setgroups([]int{localUserGid, 0})
if err != nil {
return fmt.Errorf("could not prsetgroups: %w", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could not set

heartbeat/security.go Show resolved Hide resolved
return fmt.Errorf("error setting effective setcap: %w", err)
}

// We do not want these capabilities to be inherited by subprocesses
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is for Node.js ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, node doesn't need to do raw network stuff :)

return plugin.Plugin{}, fmt.Errorf("script monitors cannot be run as root! Current UID is %s", curUser.Uid)
// We do not use user.Current() which does not reflect setuid changes!
if syscall.Geteuid() == 0 {
euid := syscall.Geteuid()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this basically a dead code as its 0?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch lol

@@ -35,12 +35,10 @@ func create(name string, cfg *common.Config) (p plugin.Plugin, err error) {
logp.Info("Synthetic browser monitor detected! Please note synthetic monitors are a beta feature!")
})

curUser, err := user.Current()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it need to be here even now? If we havent set BEAT_SETUID_AS, do we run as heartbeat user?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, in that case we're running as root, and we bail with the error message. If that env var isn't set we don't know who to run as and we can't figure it out any other way. Keep in mind heartbeat is only the correct user in the heartbeat docker image. In the elastic-agent container the correct user is elastic-agent

@@ -0,0 +1,12 @@
- name: Todos
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

file added by mistake?

@mergify
Copy link
Contributor

mergify bot commented Oct 6, 2021

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b root-caps upstream/root-caps
git merge upstream/master
git push upstream root-caps

@andrewvc andrewvc merged commit a78a980 into elastic:master Oct 13, 2021
@andrewvc andrewvc deleted the root-caps branch October 13, 2021 00:37
mergify bot pushed a commit that referenced this pull request Oct 13, 2021
…#27878)

partial fix for #27648 , this PR:

Detects if the user is running as root then:
Checks to see if an environment variable BEAT_SETUID_AS (set in our Docker.tmpl) is present
Attempts to Setuid , Setgid and Setgroups to that user / groups
Invokes setcap to drop all privileges except NET_RAW+ep
This PR also fixes the broken syscall filtering in heartbeat, some non-syscall strings were breaking that.

With the changes here elastic-agent can still run as root, but the subprocesses can lower their privileges ASAP. This should also make it possible for heartbeat to safely run ICMP pings and synthetics. Synthetics must run as non-root, but ICMP requires NET_RAW. This lets us be consistent in our docs with the recommendation that elastic-agent run as root.

(cherry picked from commit a78a980)

# Conflicts:
#	NOTICE.txt
#	dev-tools/packaging/packages.yml
#	go.mod
newly12 pushed a commit to newly12/beats that referenced this pull request Oct 13, 2021
…elastic#27878)

partial fix for elastic#27648 , this PR:

Detects if the user is running as root then:
Checks to see if an environment variable BEAT_SETUID_AS (set in our Docker.tmpl) is present
Attempts to Setuid , Setgid and Setgroups to that user / groups
Invokes setcap to drop all privileges except NET_RAW+ep
This PR also fixes the broken syscall filtering in heartbeat, some non-syscall strings were breaking that.

With the changes here elastic-agent can still run as root, but the subprocesses can lower their privileges ASAP. This should also make it possible for heartbeat to safely run ICMP pings and synthetics. Synthetics must run as non-root, but ICMP requires NET_RAW. This lets us be consistent in our docs with the recommendation that elastic-agent run as root.
andrewvc pushed a commit that referenced this pull request Oct 14, 2021
…abilities when possible (#28377)

* [Heartbeat] Setuid to regular user / lower capabilities when possible (#27878)

partial fix for #27648 , this PR:

Detects if the user is running as root then:
Checks to see if an environment variable BEAT_SETUID_AS (set in our Docker.tmpl) is present
Attempts to Setuid , Setgid and Setgroups to that user / groups
Invokes setcap to drop all privileges except NET_RAW+ep
This PR also fixes the broken syscall filtering in heartbeat, some non-syscall strings were breaking that.

With the changes here elastic-agent can still run as root, but the subprocesses can lower their privileges ASAP. This should also make it possible for heartbeat to safely run ICMP pings and synthetics. Synthetics must run as non-root, but ICMP requires NET_RAW. This lets us be consistent in our docs with the recommendation that elastic-agent run as root.
andrewvc added a commit to andrewvc/beats that referenced this pull request Oct 18, 2021
andrewvc added a commit to andrewvc/beats that referenced this pull request Oct 18, 2021
fearful-symmetry pushed a commit that referenced this pull request Oct 20, 2021
* singleton sysinfo host to avoid frequently collecting host info

* add Host object to Stats object

* update changelog

* set procStats.host to nil if any error calling sysinfo.Host()

* Update aws-lambda-go library version to 1.13.3 (#28236)

* [cloud][docker] use the private docker namespace (#28286)

* [7.x] [DOCS] Update api_key example on elasticsearch output (#28288)

* packetbeat/protos/dns: don't render missing A and AAAA addresses from truncated records (#28297)

* seccomp: allow clone3 syscall for x86 (#28117)

clone3 is a linux syscall that is now used by glibc starting version
2.34. It is used when pthread_create() gets called. Current seccomp
filters do not allow this syscall leading to crashes like
runtime/cgo: pthread_create failed: Operation not permitted

See elastic/apm-server#6238 for more details

* Osquerybeat: Improve handling of osquery.autoload file, allow customizations (#28289)

Previously the osquery.autoload file was overwritten every time on
osquerybeat start and stamped with our extension.
After the change we check the content of the file and do not overwrite it on
each osquerybeat start. This allows the user to deploy their own
extensions if their want and start osquery with that.

* Osquerybeat: Runner and Fetcher unit tests (#28290)

* Runner and Fetcher unit tests

* Fix header formatting

* Tweak test

* Update go release version 1.17.1 (#27543)

* format of conditional build tags has changed
* matching of * in regexes was fixed, thus breaking some of our code: golang/go#46123
* iproute package was missing from the new Golang Docker image, thus, we had to add it for our tests
* go.mod file contains separate require directive for transitive dependencies

* Move labels and annotations under kubernetes.namespace. (#27917)

* Move labels and annotations under kubernetes.namespace.

* Remove GCP support from Functionbeat (#28253)

* Fix build tags for Go 1.17 (#28338)

* [Elastic Agent] Add ability to communicate with Kibana through service token (#28096)

* Add ability to communicate with Kibana through service token. Add ability to pass service token to container subcommand.

* Add changelog entry.

* Fix go fmt.

* Add username to ASA Security negotiation log (#26975)

* Add username to ASA Security negotiation log

I added the username user.name field to ASA Security negotiation log line.

* adding support for both formats

* adding changelog entry

* updating geo fields in expected output files

* reverse formatting

* reverting to older version of file

* reverting formatting again

* regenrate golden files again

* remove formatting, ready for review

* fixing missing message due to no newline

* fix dissect pattern to fit correctly

Co-authored-by: Marius Iversen <marius.iversen@elastic.co>

* x-pack/filebeat/module/cisco: loosen time parsing and add group and session type capture (#28325)

* Redis: remove deprecated fields (#28246)

* Redis: remove deprecated fields

* Disable generator tests temporarily (#28362)

* Windows/perfmon metricset -  remove deprecated perfmon.counters configuration (#28282)

* remove deprecated config

* changelog

* [Filebeat] - S3 Input - Add support for only iterating/accessing only… (#28252)

* [Filebeat] - S3 Input - Add support for only iterating/accessing only specific folders or datapaths

* Breaking change for 8.0, namespace_annotations replaced by namespace.annotations (#28230)

* Breaking change for 8.0, namespace_annotations replaced by namespace.annotations

* Take care of namespace being nil

* [Heartbeat] Setuid to regular user / lower capabilities when possible (#27878)

partial fix for #27648 , this PR:

Detects if the user is running as root then:
Checks to see if an environment variable BEAT_SETUID_AS (set in our Docker.tmpl) is present
Attempts to Setuid , Setgid and Setgroups to that user / groups
Invokes setcap to drop all privileges except NET_RAW+ep
This PR also fixes the broken syscall filtering in heartbeat, some non-syscall strings were breaking that.

With the changes here elastic-agent can still run as root, but the subprocesses can lower their privileges ASAP. This should also make it possible for heartbeat to safely run ICMP pings and synthetics. Synthetics must run as non-root, but ICMP requires NET_RAW. This lets us be consistent in our docs with the recommendation that elastic-agent run as root.

* mage fmt

Co-authored-by: kaiyan-sheng <kaiyan.sheng@elastic.co>
Co-authored-by: Victor Martinez <victormartinezrubio@gmail.com>
Co-authored-by: Ugo Sangiorgi <ugo.sangiorgi@elastic.co>
Co-authored-by: Dan Kortschak <90160302+efd6@users.noreply.github.com>
Co-authored-by: Arnaud Lefebvre <a.lefebvre@outlook.fr>
Co-authored-by: Aleksandr Maus <aleksandr.maus@elastic.co>
Co-authored-by: apmmachine <58790750+apmmachine@users.noreply.github.com>
Co-authored-by: Michael Katsoulis <michaelkatsoulis88@gmail.com>
Co-authored-by: Noémi Ványi <kvch@users.noreply.github.com>
Co-authored-by: Blake Rouse <blake.rouse@elastic.co>
Co-authored-by: LaZyDK <dennisperto@gmail.com>
Co-authored-by: Marius Iversen <marius.iversen@elastic.co>
Co-authored-by: Andrea Spacca <andrea.spacca@elastic.co>
Co-authored-by: Mariana Dima <mariana@elastic.co>
Co-authored-by: Andrew Cholakian <andrew@andrewvc.com>
Icedroid pushed a commit to Icedroid/beats that referenced this pull request Nov 1, 2021
…elastic#27878)

partial fix for elastic#27648 , this PR:

Detects if the user is running as root then:
Checks to see if an environment variable BEAT_SETUID_AS (set in our Docker.tmpl) is present
Attempts to Setuid , Setgid and Setgroups to that user / groups
Invokes setcap to drop all privileges except NET_RAW+ep
This PR also fixes the broken syscall filtering in heartbeat, some non-syscall strings were breaking that.

With the changes here elastic-agent can still run as root, but the subprocesses can lower their privileges ASAP. This should also make it possible for heartbeat to safely run ICMP pings and synthetics. Synthetics must run as non-root, but ICMP requires NET_RAW. This lets us be consistent in our docs with the recommendation that elastic-agent run as root.
Icedroid pushed a commit to Icedroid/beats that referenced this pull request Nov 1, 2021
* singleton sysinfo host to avoid frequently collecting host info

* add Host object to Stats object

* update changelog

* set procStats.host to nil if any error calling sysinfo.Host()

* Update aws-lambda-go library version to 1.13.3 (elastic#28236)

* [cloud][docker] use the private docker namespace (elastic#28286)

* [7.x] [DOCS] Update api_key example on elasticsearch output (elastic#28288)

* packetbeat/protos/dns: don't render missing A and AAAA addresses from truncated records (elastic#28297)

* seccomp: allow clone3 syscall for x86 (elastic#28117)

clone3 is a linux syscall that is now used by glibc starting version
2.34. It is used when pthread_create() gets called. Current seccomp
filters do not allow this syscall leading to crashes like
runtime/cgo: pthread_create failed: Operation not permitted

See elastic/apm-server#6238 for more details

* Osquerybeat: Improve handling of osquery.autoload file, allow customizations (elastic#28289)

Previously the osquery.autoload file was overwritten every time on
osquerybeat start and stamped with our extension.
After the change we check the content of the file and do not overwrite it on
each osquerybeat start. This allows the user to deploy their own
extensions if their want and start osquery with that.

* Osquerybeat: Runner and Fetcher unit tests (elastic#28290)

* Runner and Fetcher unit tests

* Fix header formatting

* Tweak test

* Update go release version 1.17.1 (elastic#27543)

* format of conditional build tags has changed
* matching of * in regexes was fixed, thus breaking some of our code: golang/go#46123
* iproute package was missing from the new Golang Docker image, thus, we had to add it for our tests
* go.mod file contains separate require directive for transitive dependencies

* Move labels and annotations under kubernetes.namespace. (elastic#27917)

* Move labels and annotations under kubernetes.namespace.

* Remove GCP support from Functionbeat (elastic#28253)

* Fix build tags for Go 1.17 (elastic#28338)

* [Elastic Agent] Add ability to communicate with Kibana through service token (elastic#28096)

* Add ability to communicate with Kibana through service token. Add ability to pass service token to container subcommand.

* Add changelog entry.

* Fix go fmt.

* Add username to ASA Security negotiation log (elastic#26975)

* Add username to ASA Security negotiation log

I added the username user.name field to ASA Security negotiation log line.

* adding support for both formats

* adding changelog entry

* updating geo fields in expected output files

* reverse formatting

* reverting to older version of file

* reverting formatting again

* regenrate golden files again

* remove formatting, ready for review

* fixing missing message due to no newline

* fix dissect pattern to fit correctly

Co-authored-by: Marius Iversen <marius.iversen@elastic.co>

* x-pack/filebeat/module/cisco: loosen time parsing and add group and session type capture (elastic#28325)

* Redis: remove deprecated fields (elastic#28246)

* Redis: remove deprecated fields

* Disable generator tests temporarily (elastic#28362)

* Windows/perfmon metricset -  remove deprecated perfmon.counters configuration (elastic#28282)

* remove deprecated config

* changelog

* [Filebeat] - S3 Input - Add support for only iterating/accessing only… (elastic#28252)

* [Filebeat] - S3 Input - Add support for only iterating/accessing only specific folders or datapaths

* Breaking change for 8.0, namespace_annotations replaced by namespace.annotations (elastic#28230)

* Breaking change for 8.0, namespace_annotations replaced by namespace.annotations

* Take care of namespace being nil

* [Heartbeat] Setuid to regular user / lower capabilities when possible (elastic#27878)

partial fix for elastic#27648 , this PR:

Detects if the user is running as root then:
Checks to see if an environment variable BEAT_SETUID_AS (set in our Docker.tmpl) is present
Attempts to Setuid , Setgid and Setgroups to that user / groups
Invokes setcap to drop all privileges except NET_RAW+ep
This PR also fixes the broken syscall filtering in heartbeat, some non-syscall strings were breaking that.

With the changes here elastic-agent can still run as root, but the subprocesses can lower their privileges ASAP. This should also make it possible for heartbeat to safely run ICMP pings and synthetics. Synthetics must run as non-root, but ICMP requires NET_RAW. This lets us be consistent in our docs with the recommendation that elastic-agent run as root.

* mage fmt

Co-authored-by: kaiyan-sheng <kaiyan.sheng@elastic.co>
Co-authored-by: Victor Martinez <victormartinezrubio@gmail.com>
Co-authored-by: Ugo Sangiorgi <ugo.sangiorgi@elastic.co>
Co-authored-by: Dan Kortschak <90160302+efd6@users.noreply.github.com>
Co-authored-by: Arnaud Lefebvre <a.lefebvre@outlook.fr>
Co-authored-by: Aleksandr Maus <aleksandr.maus@elastic.co>
Co-authored-by: apmmachine <58790750+apmmachine@users.noreply.github.com>
Co-authored-by: Michael Katsoulis <michaelkatsoulis88@gmail.com>
Co-authored-by: Noémi Ványi <kvch@users.noreply.github.com>
Co-authored-by: Blake Rouse <blake.rouse@elastic.co>
Co-authored-by: LaZyDK <dennisperto@gmail.com>
Co-authored-by: Marius Iversen <marius.iversen@elastic.co>
Co-authored-by: Andrea Spacca <andrea.spacca@elastic.co>
Co-authored-by: Mariana Dima <mariana@elastic.co>
Co-authored-by: Andrew Cholakian <andrew@andrewvc.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-v7.16.0 Automated backport with mergify bug Heartbeat Team:Elastic-Agent Label for the Agent team Team:obs-ds-hosted-services Label for the Observability Hosted Services team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants