CLI: Restrict os/arch for secure validators, add flag for insecure mode #7073

mrcnski · 2023-04-13T13:54:01Z

PULL REQUEST

Overview

Summary: Running --validator on a platform other than Linux x86-64 requires the --insecure-validator-i-know-what-i-do flag.

Due to paritytech/polkadot-sdk#882 becoming high-priority for parathreads, we are now forced to provide a secure validator mode only for Linux x86-64 (to start).

We will still support other platforms with an --insecure-validator-i-know-what-i-do flag. (Naming follows interpreted-i-know-what-i-do.)

We may want to wait for paritytech/polkadot-sdk#882 before merging, but it seems at this point pretty certain that platform-specific code is a requirement for securely running the PVF workers.

TODO

Add flag to zombienet (so existing commands don't break) (@pepoviola can we minimize breakage?)
Anywhere else?
Documentation (where? validators' guide?)
Require landlock to be enabled for secure-mode.

Related Issues

See paritytech/polkadot-sdk#882

Closes paritytech/polkadot-sdk#881

Due to #4718 becoming high-priority for parathreads, we are now forced to provide a secure validator mode only for Linux x86-64 (to start). We will still support MacOS with an `--insecure-validator-i-know-what-i-do` flag. (Naming follows [`interpreted-i-know-what-i-do`](https://github.com/paritytech/substrate/blob//client/cli/src/arg_enums.rs#L58).) See https://github.com/paritytech/polkadot/issues/4718#issuecomment-1484137059 Closes https://github.com/paritytech/polkadot/issues/4720

koute

Hmm.... I'm not entirely sure we want to immediately do this? Are we going to insta-enable the sandboxing? I was thinking that maybe we could do something like this:

Add sandboxing support, but keep it disabled by default.
Get the people from the thousand validators program to test it for us for one release.
Then if it all goes well switch it on by default and add this flag.

But maybe this is too cautiousness? In case there would be any problems people could always disable sandboxing if they want, but on the other hand it's probably not a good look to tell people to run with this parameter because we didn't test it out enough to ensure it's completely compatible with everything.

koute · 2023-04-13T15:42:16Z

cli/src/cli.rs

@@ -118,6 +118,10 @@ pub struct RunCmd {
 	#[arg(long)]
 	pub beefy: bool,

+	/// Allows the validator to run insecurely if they know what they're doing.


This is visible to the user in --help, right? In which case a slightly longer explanation would be nice. (Or as I've said in the other comment, maybe a link to somewhere where it's explained in more detail.)

koute · 2023-04-13T15:49:45Z

cli/src/command.rs

+	#[cfg(not(target_os = "linux"))]
+	let result = Err("Must be on Linux to run a validator securely.".into());
+
+	#[cfg(all(target_os = "linux", not(target_arch = "x86_64")))]
+	let result = Err("Must be on x86_64 to run a validator securely.".into());


Maybe a little bit less jargon-y? Normal users might not necessarily know what x86_64 is.

Suggested change

#[cfg(not(target_os = "linux"))]

let result = Err("Must be on Linux to run a validator securely.".into());

#[cfg(all(target_os = "linux", not(target_arch = "x86_64")))]

let result = Err("Must be on x86_64 to run a validator securely.".into());

#[cfg(not(target_os = "linux"))]

let result = Err("Running a validator is only supported on Linux.".into());

#[cfg(all(target_os = "linux", not(target_arch = "x86_64")))]

let result = Err("Running a validator is only supported on CPUs from the x86_64 family (usually Intel or AMD).".into());

koute · 2023-04-13T15:52:50Z

cli/src/error.rs

@@ -57,4 +57,7 @@ pub enum Error {

 	#[error("This subcommand is only available when compiled with `{feature}`")]
 	FeatureNotEnabled { feature: &'static str },
+
+	#[error("Insecure validator: {0} Run with --insecure-validator-i-know-what-i-do if you understand and accept the risks of running insecurely.")]


"Insecure validator" can be kinda vague, maybe something like this?

Suggested change

#[error("Insecure validator: {0} Run with --insecure-validator-i-know-what-i-do if you understand and accept the risks of running insecurely.")]

#[error("Your system cannot securely run a validator: {0}\nYou can ignore this requirement by using the `--insecure-validator-i-know-what-i-do` command line argument if you understand and accept the risks of running insecurely.")]

Also, I think in this case it'd be nice to maybe e.g. put a link to the docs with a longer explanation of why this is and what will be the consequences when running in unsecure mode.

Like the idea, just don't know where the doc should go. Maybe on the validators guide?

mrcnski · 2023-04-13T16:47:30Z

Good point @koute. I just thought, since the upcoming os/arch requirement is certain, we could start enforcing it early -- or at least start informing validators and getting the docs ready. And people seem confident that essentially all validators are already on Linux amd64 (though I still haven't seen any data myself).

Add sandboxing support, but keep it disabled by default.

Get the people from the thousand validators program to test it for us for one release.

Then if it all goes well switch it on by default and add this flag.

That is a really good idea. I am worried about turning on sandboxing and having legitimate syscalls blocked on some real PVFs. One concern though is that for validators, it might seem too risky to beta test a new feature that could get them slashed?

This PR can certainly wait until sandboxing, no strong preference here as to when it's merged. Can mark as DNM if we want.

bkchr · 2023-04-13T19:52:15Z

I'm with @koute. I would also propose to close the pr, it will just get stale and recreating it will be very easy.

That is a really good idea. I am worried about turning on sandboxing and having legitimate syscalls blocked on some real PVFs. One concern though is that for validators, it might seem too risky to beta test a new feature that could get them slashed?

That is a good point. I think at the beginning we should run with some kind of training wheels, aka: https://github.com/pmem/syscall_intercept We could intercept all the syscalls and let each unknown syscall print some error message.

mrcnski · 2023-04-14T09:12:22Z

Thanks! To be clear, this PR doesn't enable sandboxing, it just restricts the os/arch for validators without the insecure flag.

That is a good point. I think at the beginning we should run with some kind of training wheels, aka: pmem/syscall_intercept We could intercept all the syscalls and let each unknown syscall print some error message.

That looks like a C library and not very easy to use? seccomp already provides a log-only mode so we can use that right now for any syscalls that pass through the filter. Validators could then grep e.g. kern.log for the syscalls and share with us. Or we can grep it ourselves in the worker process after execution, and raise an error trace if there's something there.

We could provide e.g. a --sandbox-mode flag with only one possible value for now, like log-syscalls. This would totally safe. Later we can add another option like restrict-syscalls and make that the default. We can add more options as we expand the sandboxing, and make whatever the strongest one is the default.

koute · 2023-04-14T10:16:09Z

We could intercept all the syscalls and let each unknown syscall print some error message.

We could, but the problem is - how do we get the users to report that to us if the only thing it does is print out a message? If a tree falls down in a forest and no one is there, does it make a sound? :P

That looks like a C library and not very easy to use?

I think Basti meant to just adopt a similar concept, not to actually use that library. Especially since from what I can see that library works by monkeypatching the process' memory, which is not what we want.

mrcnski · 2023-04-14T10:27:02Z

We could, but the problem is - how do we get the users to report that to us if the only thing it does is print out a message? If a tree falls down in a forest and no one is there, does it make a sound? :P

The message could say: "in the future this error may result in a dispute, please report to the Polkadot development team!"

bkchr · 2023-04-14T11:09:55Z

Thanks! To be clear, this PR doesn't enable sandboxing, it just restricts the os/arch for validators without the insecure flag.

I know :P

I think Basti meant to just adopt a similar concept, not to actually use that library. Especially since from what I can see that library works by monkeypatching the process' memory, which is not what we want.

No, I meant to use this monkeypatching :P My 5 min google search told me that you can not replace syscall that easily as it is already loaded before anything else, which makes sense. But if there exist some kind of logging for seccomp, it sounds nice! If we want to abuse the 1kv program any way, we can send the data about accessed syscalls to the telemetry. Maybe we can let seccomp log somewhere else or we may need to parse the kernel log 🙈

mrcnski · 2023-04-14T12:06:34Z

Maybe we can let seccomp log somewhere else or we may need to parse the kernel log 🙈

I don't think we can change where it logs. ☹️ I also don't know if the log location is always the same on each machine. On my GCP instance it's kern.log but the man page says it goes in the audit log, which I thought was audit.log.

@koute Is it possible to use the trace action to log from a tracer?

koute · 2023-05-11T04:50:27Z

@koute Is it possible to use the trace action to log from a tracer?

Well, with SECCOMP_RET_TRACE the ptrace host will get notified about the call so it can do whatever it wants with it, including logging it wherever it wants.

Polkadot-Forum · 2023-05-17T11:55:46Z

This pull request has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/ux-of-distributing-multiple-binaries-take-2/2854/22

mrcnski · 2023-06-01T19:53:18Z

I realized some things related to this work:

For determinism, we should fully require whatever security features (like landlock) we have, in order to run secure-mode.
On the other hand, too many validators on insecure-mode can also be a source of indeterminism.
So we should be careful not to have some crazy unrealistic standards for running in secure-mode.
We should monitor with telemetry how many validators are secure vs. insecure. This shouldn't use the public telemetry server.

See #7303 (comment) for more informations.

mrcnski requested review from koute, ordian, bkchr and acatangiu April 13, 2023 13:54

koute reviewed Apr 13, 2023

View reviewed changes

mrcnski self-assigned this Apr 13, 2023

mrcnski mentioned this pull request Aug 24, 2023

PVF host: sandbox/harden worker process paritytech/polkadot-sdk#882

Closed

mrcnski marked this pull request as draft June 1, 2023 14:50

mrcnski mentioned this pull request Jun 1, 2023

PVF: add landlock sandboxing #7303

Merged

3 tasks

mrcnski mentioned this pull request Jun 6, 2023

Secure Mode Announcement w3f/polkadot-wiki#4881

Closed

mrcnski mentioned this pull request Jul 23, 2023

98.6% OF DEVELOPERS CANNOT REVIEW THIS PR! [read more...] #7337

Merged

mrcnski mentioned this pull request Sep 7, 2023

Secure validator mode paritytech/polkadot-sdk#1444

Closed

8 tasks

mrcnski mentioned this pull request Nov 24, 2023

PVF: Add Secure Validator Mode paritytech/polkadot-sdk#2486

Merged

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CLI: Restrict os/arch for secure validators, add flag for insecure mode #7073

CLI: Restrict os/arch for secure validators, add flag for insecure mode #7073

mrcnski commented Apr 13, 2023 •

edited

Loading

koute left a comment •

edited

Loading

koute Apr 13, 2023

koute Apr 13, 2023

koute Apr 13, 2023

mrcnski Apr 13, 2023

mrcnski commented Apr 13, 2023

bkchr commented Apr 13, 2023

mrcnski commented Apr 14, 2023

koute commented Apr 14, 2023

mrcnski commented Apr 14, 2023

bkchr commented Apr 14, 2023

mrcnski commented Apr 14, 2023

koute commented May 11, 2023

Polkadot-Forum commented May 17, 2023

mrcnski commented Jun 1, 2023

	#[error("Insecure validator: {0} Run with --insecure-validator-i-know-what-i-do if you understand and accept the risks of running insecurely.")]
	#[error("Your system cannot securely run a validator: {0}\nYou can ignore this requirement by using the `--insecure-validator-i-know-what-i-do` command line argument if you understand and accept the risks of running insecurely.")]

CLI: Restrict os/arch for secure validators, add flag for insecure mode #7073

Are you sure you want to change the base?

CLI: Restrict os/arch for secure validators, add flag for insecure mode #7073

Conversation

mrcnski commented Apr 13, 2023 • edited Loading

PULL REQUEST

Overview

TODO

Related Issues

koute left a comment • edited Loading

Choose a reason for hiding this comment

koute Apr 13, 2023

Choose a reason for hiding this comment

koute Apr 13, 2023

Choose a reason for hiding this comment

koute Apr 13, 2023

Choose a reason for hiding this comment

mrcnski Apr 13, 2023

Choose a reason for hiding this comment

mrcnski commented Apr 13, 2023

bkchr commented Apr 13, 2023

mrcnski commented Apr 14, 2023

koute commented Apr 14, 2023

mrcnski commented Apr 14, 2023

bkchr commented Apr 14, 2023

mrcnski commented Apr 14, 2023

koute commented May 11, 2023

Polkadot-Forum commented May 17, 2023

mrcnski commented Jun 1, 2023

mrcnski commented Apr 13, 2023 •

edited

Loading

koute left a comment •

edited

Loading