engine: default location for reading/writing jwt secrets #297

lightclient · 2022-08-25T22:08:51Z

Currently, if a user wants to casually run an EL and a CL on the same host without the aid of any virtualization, they must specify the --jwt-secret flag for at least one client. Example:

$ geth --sepolia

$ lighthouse beacon --network sepolia --execution-jwt $HOME/.ethereum/sepolia/geth/jwtsecret

I think this is both unnecessary and confusing behavior for users. Ideally, the user should be able to bring up an EL and a CL in any order and have them communicate by default.

The preferred behavior would be:

$ geth --sepolia

$ lighthouse beacon --network sepolia

--

This PR expands the authentication specification to define a list of standard locations that clients should read and write JWT secrets. It follows the XDG Base Directory Specification for Linux and the directories-rs conversions for cross-platform support. The resulting path list is:

For Linux:
- $XDG_DATA_DIR/ethereum/engine/jwt.hex
- $HOME/.local/share/ethereum/engine/jwt.hex

For Mac:
- $HOME/Library/Application Support/Ethereum/Engine/jwt.hex

For Windows:
- {FOLDERID_RoamingAppData}/Ethereum/Engine/jwt.hex
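
As a rough illustration of the path list above, here is a minimal Python sketch. The function name `default_jwt_paths` and its parameters are hypothetical. It assumes the Linux variable is the one the XDG spec actually defines (`XDG_DATA_HOME`), and that `{FOLDERID_RoamingAppData}` can be approximated by the `APPDATA` environment variable on Windows.

```python
import os
import sys
from pathlib import Path


def default_jwt_paths(platform=None, env=None, home=None):
    """Return candidate jwt.hex locations in search order (hypothetical helper)."""
    platform = sys.platform if platform is None else platform
    env = os.environ if env is None else env
    home = Path(home) if home is not None else Path.home()

    if platform == "darwin":
        return [home / "Library" / "Application Support" / "Ethereum" / "Engine" / "jwt.hex"]

    if platform.startswith("win"):
        # Assumption: APPDATA points at the roaming application data folder.
        appdata = env.get("APPDATA")
        base = Path(appdata) if appdata else home / "AppData" / "Roaming"
        return [base / "Ethereum" / "Engine" / "jwt.hex"]

    # Linux and other Unix-like systems: XDG location first, then its documented default.
    paths = []
    xdg_data_home = env.get("XDG_DATA_HOME")
    if xdg_data_home:
        paths.append(Path(xdg_data_home) / "ethereum" / "engine" / "jwt.hex")
    paths.append(home / ".local" / "share" / "ethereum" / "engine" / "jwt.hex")
    return paths
```

Calling `default_jwt_paths()` with no arguments uses the current platform and environment; the parameters exist only so the logic can be exercised for other platforms.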

Rationale

Differentiating networks

I'm not sure if it is necessary to differentiate between networks with the JWT. The assumption is that the adversary in this scenario does not have filesystem access and therefore the key is secure.

Multiple clients

Since the path is standard, the token may be from any client for any network. This shouldn't be an issue since the format is specified. It's possible that there is a race to generate the JWT. I'm open to adding some locking concept, but it feels like that is overkill as this is targeted at the casual user w/o a sophisticated setup (e.g. mostly manual).
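
One way to sidestep the generation race without an explicit lock is to publish the secret atomically, so that whichever client gets there first wins and the other simply reads the result. The sketch below is only an illustration under assumptions: `load_or_create_secret` is a hypothetical name, and it relies on the filesystem supporting hard links via `os.link`.

```python
import os
import secrets
import tempfile


def load_or_create_secret(path):
    """Return the hex secret at `path`, creating it if no other process has yet."""
    directory = os.path.dirname(path) or "."
    os.makedirs(directory, exist_ok=True)

    # Write a complete candidate secret to a private temp file first
    # (mkstemp creates it readable and writable only by the current user).
    token = secrets.token_hex(32)  # 32 random bytes -> 64 hex characters
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(token)
        try:
            # Linking fails if `path` already exists, so only one writer can win,
            # and readers never observe a partially written file.
            os.link(tmp, path)
        except FileExistsError:
            with open(path) as f:
                return f.read().strip()
        return token
    finally:
        os.unlink(tmp)
```

If two clients call this at the same moment, both end up with the same secret: the loser discards its own candidate and reads the winner's file.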

Reading secrets from client data directories

This is the current behavior as far as I'm aware. If clients want to do a little extra work, they should be able to poke about standard directories for a JWT secret. This makes this PR backwards compatible with the current situation with the possibility of clients being smart enough to avoid asking the user to specify a path directly. For example, suppose the user starts Nethermind first and the EL is unable to write a secret in the $XDG_DATA_HOME directory, so it falls back to its own default data directory. Next the user starts Nimbus. Nimbus could be smart enough to know what the default Nethermind data directory is, search it for a key, and attempt to connect on 8551.

Only CL can search other directories

Because the CL can actually connect to the local EL, it is possible for it to affirm the key it has found is valid. If an EL were able to search other directories outside the explicit list, it's not obvious the CL would find the same key or find the same key in the same order.

djrtwo

I think this is a reasonable tradeoff in UX

src/engine/authentication.md

MicahZoltu · 2022-08-26T06:48:17Z

src/engine/authentication.md

 If such a parameter _is_ given, but the file cannot be read, or does not contain a hex-encoded key of `256` bits, the client should treat this as an error: either abort the startup, or show error and continue without exposing the authenticated port.

+If such a parameter is not given, the client **MUST** attempt to read the secret from the default paths defined below, in the order they are listed.  If a secret is found, but the file cannot be read, does not contain a hex-encoded key of `256` bit, or is rejected by the other client, the client should continue searching. The CL **MAY** search other locations, such as default EL data directories.
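
The search behavior proposed in the added paragraph could look roughly like the following Python sketch. The names `parse_secret`, `find_secret`, and the `accepted` callback (standing in for "is rejected by the other client") are hypothetical, and the candidate list is supplied by the caller.

```python
from pathlib import Path


def parse_secret(text):
    """Return 32 secret bytes, or None if text is not a hex-encoded 256-bit key."""
    cleaned = text.strip()
    if len(cleaned) != 64:
        return None
    try:
        secret = bytes.fromhex(cleaned)
    except ValueError:
        return None
    # fromhex tolerates embedded whitespace, so re-check the decoded length.
    return secret if len(secret) == 32 else None


def find_secret(candidates, accepted=lambda secret: True):
    """Try each path in order; skip unreadable, malformed, or rejected secrets."""
    for path in candidates:
        try:
            text = Path(path).read_text()
        except (OSError, UnicodeDecodeError):
            continue  # missing or unreadable: keep searching
        secret = parse_secret(text)
        if secret is not None and accepted(secret):
            return secret
    return None
```

A CL could pass a callback that attempts an authenticated request against the local EL, which matches the idea that only the side able to verify a key should search beyond the standard list.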


As mentioned below, I'm not a fan of the MUST here. A high security client may reasonably refuse to read secrets from disk and demand that they be read from a secure secret location in order to properly start. Such a client shouldn't be in violation of the specification in that section. I think a SHOULD would be appropriate here and address the need that I believe you are trying to address.

I think a "high security" client should just flag this configuration with lots of warnings. If I knew all current clients would implement this if it is listed as SHOULD, I would be inclined to change it. But generally, I think this is an important UX improvement and we should force clients to follow suit.

If it is sufficiently useful to users, then all clients will implement it without being forced to. I generally am pretty 👎 on sacrificing security for usability. We should always try to find ways to improve usability, but never at the cost of security.

A "high security client" should not even start if it is configured in a way that is insecure. This sort of client is valuable for businesses and enterprise shops where they want to make sure human error cannot result in a security compromise. Those clients wouldn't be using this feature at all, as they would have their own key management system that injects the keys just-in-time (likely as part of whatever process is launching the client).

src/engine/authentication.md

arnetheduck · 2022-09-01T06:35:24Z

A few points:

- a single BN can drive multiple EL:s - it's fine that all EL:s use the same jwt secret but they too must be careful not to overwrite each other
- for the single-machine setup, it is doubtful whether the JWT secret adds any significant security: from a technical point of view, a socket bound to localhost cannot accept "outside" connections, meaning that the only difference is the "filesystem" security - i.e. processes running as a different user may be thwarted by the additional token exchange, but we have to weigh this against the complexity and failure modes that having a file introduces. The alternative here is that we simply allow connecting without JWT as long as the socket is bound to localhost, which achieves the "simple single-machine setup" with less of a hurdle (notably, the engine api is already sitting on a separate port from the "user-level" JSON-RPC interface, so the two live in different "security domains").
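
To make the localhost alternative concrete, here is a small Python sketch of the check an EL might perform on its bind address. `jwt_required` is a hypothetical name, and treating the literal hostname `localhost` as loopback is an assumption of this sketch, not something any client is known to do.

```python
import ipaddress


def jwt_required(bind_host):
    """Return False only when the engine API is bound to a loopback address."""
    if bind_host == "localhost":
        # Assumption: "localhost" resolves to a loopback address on this machine.
        return False
    try:
        return not ipaddress.ip_address(bind_host).is_loopback
    except ValueError:
        # Not an IP literal (some other hostname): stay conservative and require JWT.
        return True
```

Under this rule, binding to `127.0.0.1` or `::1` would skip authentication, while `0.0.0.0` or any routable address would still require it.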

MicahZoltu · 2022-09-11T07:44:59Z

Why was this closed?

lightclient · 2022-09-12T08:09:54Z

@MicahZoltu #302 replaces it

lightclient · 2023-07-17T21:45:55Z

Reopening this PR as it seems like we've reached a bit of a dead end on #302.

The tldr; is that we really shouldn't rely on the security policy of browsers to ensure a malicious webpage isn't able to control clients. In #302, if an EL does not have the correct CORS policy it may be possible to control it with a webpage.

lightclient · 2023-07-20T14:48:08Z

Discussed on ACD 166, a few comments:

- will the permissions work out correctly on all platforms if different processes need to write
- adds some additional complexity, probably not a show stopper though

smartprogrammer93 · 2023-07-20T14:52:59Z

As discussed on ACD 166, I support having a default value since it will not break any existing setups, and it will remove complexity for users who just want to run a node at home.
The additional complexity @lightclient is talking about here is that users' setups might not work with the default value, and users would then ask the team for support, which I don't see as a big problem. I believe that having a default value is better for most users than not having it.

MicahZoltu · 2023-07-21T06:07:44Z

Is there an already existing piece of data that is machine unique and not externally available on all major operating systems? If there is, perhaps we could instead use that for the entropy in default secret generation rather than writing a new file.

On Windows there is HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Cryptography\MachineGuid which is randomly generated at Windows install time and doesn't change over the life of the installation. It appears Linux has /etc/machine-id which serves a similar purpose, and I would assume that OSX also has that or something similar.

One caveat here is that for installations running inside a container/VM you wouldn't get a matching ID, but IIUC this feature is designed specifically for people doing simple host-level installs so that is out of scope anyway.
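
A rough Python sketch of this idea, under stated assumptions: the machine identifier is passed in as a string (for example the contents of `/etc/machine-id` on Linux), and the secret is derived by hashing it with a fixed label. The label `DOMAIN` and the function name are hypothetical, and using SHA-256 here is this sketch's choice rather than anything proposed in the thread.

```python
import hashlib

# Hypothetical label so the derived value differs from other uses of the machine id.
DOMAIN = b"engine-api-jwt-secret"


def secret_from_machine_id(machine_id):
    """Derive a 256-bit hex secret deterministically from a machine identifier."""
    material = machine_id.strip().lower().encode("ascii")
    return hashlib.sha256(DOMAIN + b":" + material).hexdigest()
```

Both clients would compute the same value without writing a file, but so would any local process that can read the identifier, which is in line with the assumption that the adversary has no access to the host.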

tbenr · 2023-07-21T07:07:42Z

It appears Linux has /etc/machine-id which serves a similar purpose, and I would assume that OSX also has that or something similar.

Seems tricky on OSX: https://www.npmjs.com/package/node-machine-id

The nice thing about this approach is that, by not writing a file in the home folder that the user isn't aware of, it removes the risk of accidental deletion of the file.
But doing tricky things like accessing the registry on Windows or running weird logic on OSX seems like too much to me.

tbenr · 2023-07-21T07:25:15Z

I still have a feeling that this could lead to confusion and potential problems.
What worries me is relying on a default that has a non-zero chance of breaking at runtime (CL and EL ending up loading different secrets) instead of forcing the user to prepare the file well in advance (with the currently required param).

MicahZoltu · 2023-07-21T07:55:27Z

But doing tricky things like accessing the registry in windows or running weird logic on OSX seems like too much to me.

By "tricky" here you mean you have to use OS specific APIs and you can't just use platform agnostic APIs like file access (which are basically the same across platforms and abstracted away in most languages)?

tbenr · 2023-07-21T08:28:13Z

By "tricky" here you mean

Yes, exactly. With the risk of falling down the rabbit hole of which OS version/flavour supports the given technique.

smartprogrammer93 · 2023-07-21T08:36:31Z

Have to agree with @tbenr here. Having different paths to check for each OS is complex enough, in my opinion. Further complications are just unnecessary for such a small feature.

rubo

This PR needs a revision in regard to Linux environment variable names. Currently, there are 3 different variable names mentioned for Linux and what's worse, 2 of them are invalid:

- XDG_DATA_DIR in the PR description. This is invalid. It should be XDG_DATA_HOME instead. The XDG spec defines XDG_DATA_DIRS as well but that's not what's needed here.
- XDG_DATA_HOME in the PR description
- XDG_CACHE_DIR in authentication.md#L47. This is invalid. It should be XDG_CACHE_HOME instead.

With this said, we have XDG_CACHE_HOME vs XDG_DATA_HOME. Which one then?

rubo · 2023-08-03T18:32:31Z

src/engine/authentication.md

+### Default JWT secret locations
+
+For Linux:
+* `$XDG_CACHE_DIR/ethereum/engine/jwt.hex`


Should be either of these instead:

$XDG_CACHE_HOME/... and $HOME/.cache/...

$XDG_DATA_HOME/... and $HOME/.local/share/...

Pretty sure CACHE is correct here because it can be regenerated as needed, and should NOT be included in any backup processes.

Suggested change

* `$XDG_CACHE_DIR/ethereum/engine/jwt.hex`

* `$XDG_CACHE_HOME/ethereum/engine/jwt.hex`

djrtwo reviewed Aug 26, 2022
