Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Encapsulate WebSocketClient in PolykeyClient #582

Merged
merged 1 commit into from
Oct 21, 2023

Conversation

addievo
Copy link
Contributor

@addievo addievo commented Oct 13, 2023

Description

This PR aims to encapsulate WebSocketClient in PolykeyClient.

At the same time Options usage has to be limited to only PolykeyAgent and PolykeyClient.

Because:

  1. Subdomains aren't really meant to have "options". They are just groups of classes closely related, that may or may not be factored out in a separate package one day.
  2. Classes themselves have options - more correctly as parameters. As parameters, they should be top level, not nested within a options. Why? Because you can just do X.createX({ param1, param2 });, no need to do X.createX({ options: { param1, param2 }); as the options is superfluous.
  3. But why does PolykeyAgent and PolykeyClient have options? The reason is simply because there is so many parameters, that it is useful to have a separate type to keep track of them all. But this is exceptional to PolykeyAgent and PolykeyClient, it should not be considered the default way of creating classes in our codebase. And this type PolykeyAgentOptions and PolykeyClientOptions will be used in various places, in particular tests that need to run many different variants of the Polykey program.
  4. There should necessarily be an exact relationship between the parameters in each domain class, and the options of PolykeyAgent and PolykeyClient. We want to retain some flexibility here.

Thus we will design which will be put into src/types.ts:

type PolykeyAgentOptions
type PolykeyClientOptions

But all other domains will not do such a thing.

Tasks

  • 1. Align PolykeyClientOptions with PolykeyAgentOptions
  • 2. Ensure that subdomains are just flattened parameters, not options which is unique only to PolykeyClient and PolykeyAgent.
  • 3. Encapsulate WebSocketClient into PolykeyClient and brought back nodeId as a necessary parameter, verification of the client service certificate is now done within PolykeyClient.
  • 4. Made PolykeyClient a @timedCancellable operation with the timeout defaulted to config.defaultsSystem.clientConnectTimeoutTime. This propagates down to the await WebSocketClient.createWebSocketClient.
  • 5. Created password ops limits and mem limits defaults in the config using the raw value and added tests to ensure that the raw values is maintained. It is now configured at the top level.
  • 6. Update all tests using PolykeyAgent and PolykeyClient and subdomains which was relying upon options, these call signatures are all changed.
  • 7. Test that the timed cancellable works for PolykeyClient.

Final checklist

  • Domain specific tests
  • Full tests
  • Updated inline-comment documentation
  • Lint fixed
  • Squash and rebased
  • Sanity check the final build

@ghost
Copy link

ghost commented Oct 13, 2023

👇 Click on the image for a new way to code review

Review these changes using an interactive CodeSee Map

Legend

CodeSee Map legend

@addievo addievo changed the title fix: encapuslate WebSocketClient in PolykeyClient.ts fix: encapuslate WebSocketClient in PolykeyClient.ts and increase Github AuthenticationTimeout Oct 13, 2023
@addievo addievo self-assigned this Oct 13, 2023
@tegefaulkes
Copy link
Contributor

Don't forget to update all code and test that depend on the PolykeyClient. That may just be the PolykeyClient tests in tests/PolykeyClient.test.ts but check for all usages in the code base.

@CMCDragonkai
Copy link
Member

Make 4. a separate PR, keep this PR to only PK client encapsulating WS client.

@CMCDragonkai
Copy link
Member

Can you address all comments.

@CMCDragonkai
Copy link
Member

Rebase, squash down to 1 commit, and test what you're doing here, fix all tests.

@CMCDragonkai
Copy link
Member

Note in the future, we don't use reverts in feature branches. You're just supposed to move ahead, and eventually squash down.

@CMCDragonkai CMCDragonkai changed the title fix: encapuslate WebSocketClient in PolykeyClient.ts and increase Github AuthenticationTimeout Encapsulate WebSocketClient in PolykeyClient Oct 17, 2023
@tegefaulkes
Copy link
Contributor

Summary of all failing tests
 FAIL  tests/PolykeyClient.test.ts (8.94 s)
  ● PolykeyClient › preserving and destroying session state

    TypeError: Cannot destructure property 'address' of 'undefined' as it is undefined.

      64 |       fresh,
      65 |     });
    > 66 |     const ws: WebSocketClient = new WebSocketClient();
         |                                 ^
      67 |     const rpcClientClient = new RPCClient({
      68 |       manifest: clientClientManifest,
      69 |       streamFactory,

      at new WebSocketClient (node_modules/@matrixai/ws/src/WebSocketClient.ts:292:5)
      at new WebSocketClient (node_modules/@matrixai/events/src/Evented.ts:55:9)
      at new WebSocketClient (node_modules/@matrixai/async-init/src/CreateDestroy.ts:49:26)
      at Function.createPolykeyClient (src/PolykeyClient.ts:66:33)
      at Object.<anonymous> (tests/PolykeyClient.test.ts:67:22)

  ● PolykeyClient › end to end with authentication logic

    TypeError: Cannot destructure property 'address' of 'undefined' as it is undefined.

      64 |       fresh,
      65 |     });
    > 66 |     const ws: WebSocketClient = new WebSocketClient();
         |                                 ^
      67 |     const rpcClientClient = new RPCClient({
      68 |       manifest: clientClientManifest,
      69 |       streamFactory,

      at new WebSocketClient (node_modules/@matrixai/ws/src/WebSocketClient.ts:292:5)
      at new WebSocketClient (node_modules/@matrixai/events/src/Evented.ts:55:9)
      at new WebSocketClient (node_modules/@matrixai/async-init/src/CreateDestroy.ts:49:26)
      at Function.createPolykeyClient (src/PolykeyClient.ts:66:33)
      at Object.<anonymous> (tests/PolykeyClient.test.ts:97:22)


Test Suites: 1 failed, 70 passed, 71 total
Tests:       2 failed, 5 skipped, 10 todo, 580 passed, 597 total
Snapshots:   0 total
Time:        193.513 s

Only the PolykeyClient.test.ts test is failing. Not surprising since...

  1. You haven't actually updated the tests using PolykeyClient.
  2. You change is actually broken. WebSocketClient is CreateDestroy, so you shouldn't be constructing it directly with new WebSocketClient(). You should be using the async-init methods that are relevant.

@addievo addievo force-pushed the feature-websocket_client-encapsulation branch from 4079fdf to 2c0ca01 Compare October 17, 2023 05:00
@CMCDragonkai
Copy link
Member

Needs rebase.

package.json Outdated Show resolved Hide resolved
package-lock.json Outdated Show resolved Hide resolved
src/PolykeyClient.ts Outdated Show resolved Hide resolved
@addievo
Copy link
Contributor Author

addievo commented Oct 17, 2023

image

@CMCDragonkai
Copy link
Member

Don't resolve unless you've pushed.

@addievo addievo force-pushed the feature-websocket_client-encapsulation branch from 2c0ca01 to 56c3139 Compare October 17, 2023 06:03
@tegefaulkes
Copy link
Contributor

I was going to look into this but it's actually pretty relevant to this PR.

The src/client/utils.ts verifyServerCertificateChain may have a problem. When verifying with an old NodeId, the custom verification is failing even though the old NodeId and new NodeId exist in the cert chain.

There needs to be some tests added to test the custom verification logic for certain cases.

  1. Verification success.
  2. Verification failure.
  3. Verification succeeds using old node ID that is still valid in the cert chain.

@CMCDragonkai
Copy link
Member

This PR is not done.

  1. The PolykeyClientOptions lacks a clear spec.
  2. It needs to be considered in line with PolykeyAgentOptions
  3. It needs to be defaulted using the utils.mergeObject
  4. The rpc option naming between client and server needs to be aligned.
  5. The ws options need to be filled automatically except for a few transport level defaults.

@addievo addievo assigned CMCDragonkai and addievo and unassigned addievo Oct 19, 2023
@addievo addievo requested a review from tegefaulkes October 19, 2023 03:55
Copy link
Contributor

@tegefaulkes tegefaulkes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few things to address.

src/PolykeyAgent.ts Show resolved Hide resolved
src/PolykeyClient.ts Outdated Show resolved Hide resolved
src/PolykeyClient.ts Outdated Show resolved Hide resolved
src/PolykeyClient.ts Outdated Show resolved Hide resolved
src/PolykeyClient.ts Outdated Show resolved Hide resolved
src/PolykeyClient.ts Outdated Show resolved Hide resolved
src/PolykeyClient.ts Outdated Show resolved Hide resolved
src/PolykeyClient.ts Outdated Show resolved Hide resolved
tests/PolykeyClient.test.ts Outdated Show resolved Hide resolved
Comment on lines 88 to 92
tokenBase: config.paths.tokenBase,
rpc: {
callTimeoutTime: config.defaultsSystem.rpcCallTimeoutTime,
parserBufferSize: config.defaultsSystem.rpcParserBufferSize,
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless they need to be set for the test to work, don't bother setting them. Less noise for the test.

src/PolykeyAgent.ts Outdated Show resolved Hide resolved
@CMCDragonkai
Copy link
Member

Taking over this PR. I'm going to standardise the option names between the agent and client. And it's important to know that domain options are not composed into the agent/client options. They are abstracted from each other.

@CMCDragonkai
Copy link
Member

All options must be top-level to ensure it's easy to compare between PolykeyAgent and PolykeyClient.

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Oct 20, 2023

The reason is like this.

  1. Subdomains aren't really meant to have "options". They are just groups of classes closely related, that may or may not be factored out in a separate package one day.
  2. Classes themselves have options - more correctly as parameters. As parameters, they should be top level, not nested within a options. Why? Because you can just do X.createX({ param1, param2 });, no need to do X.createX({ options: { param1, param2 }); as the options is superfluous.
  3. But why does PolykeyAgent and PolykeyClient have options? The reason is simply because there is so many parameters, that it is useful to have a separate type to keep track of them all. But this is exceptional to PolykeyAgent and PolykeyClient, it should not be considered the default way of creating classes in our codebase. And this type PolykeyAgentOptions and PolykeyClientOptions will be used in various places, in particular tests that need to run many different variants of the Polykey program.
  4. There should necessarily be an exact relationship between the parameters in each domain class, and the options of PolykeyAgent and PolykeyClient. We want to retain some flexibility here.

@CMCDragonkai
Copy link
Member

I'm reverting: 0f81b34

@CMCDragonkai
Copy link
Member

Reverting my old BootstrapOptions change too.

[nix-shell:~/Projects/Polykey/src]$ ag 'Options'
PolykeyAgent.ts
48:type PolykeyAgentOptions = {
111:    // Options
119:    options?: DeepPartial<PolykeyAgentOptions>;
847:export type { PolykeyAgentOptions };

keys/KeyRing.ts
53:    ...startOptions
88:    await keyRing.start(startOptions);
161:    const { fresh = false, ...setupKeyPairOptions } = options;
171:      await this.setupKeyPair(setupKeyPairOptions);

PolykeyClient.ts
21:type PolykeyClientOptions = {
60:    options?: DeepPartial<PolykeyClientOptions>;
173:export type { PolykeyClientOptions };

vaults/VaultOps.ts
11:type FileOptions = {
145:  fileOptions?: FileOptions,
150:      await efs.rmdir(secretName, fileOptions);
171:  fileOptions?: FileOptions,
174:  const recursive = !!fileOptions?.recursive;
178:      await efs.mkdir(dirPath, fileOptions);

sigchain/Sigchain.ts
299:    const orderOptions =
301:    let seekOptions:
306:      seekOptions =
317:      ...orderOptions,
318:      ...seekOptions,
347:    const orderOptions =
349:    let seekOptions:
354:      seekOptions =
365:      ...orderOptions,
366:      ...seekOptions,

All removed now except of course PolykeyAgentOptions.

Oh yea and this is another reason to use ag, you can actually copy the result as a comment, and you can't do that in your IDE. Use your terminal more @addievo @tegefaulkes!

@CMCDragonkai
Copy link
Member

I noticed that in PolykeyAgent, the connectTimeoutTime isn't being used.

The connectTimeoutTime is supposed to be the timeout to connect to the server, and separate from the keep alive.

Now PolykeyAgent runs the server side of the client service, so it doesn't actually connect to other clients. However when establishing the reverse side of connections you may still want a timeout in case of protocol logic.

In QUIC:

      await connection.start(
        {
          data,
          remoteInfo,
        },
        { timer: this.minIdleTimeout },
      );

Is used on the server side.

@amydevs how is this done in WS?

@CMCDragonkai
Copy link
Member

@amydevs Ok so:

  1. config.clientConnectTimeout - is used as the default of PolykeyClientOptions.connectTimeoutTime
  2. This is now removed from PolykeyAgentOptions and instead moved into PolykeyClientOptions.
  3. This option is used as part of the ctx { timer: optionsDefaulted.connectTimeoutTime } we creating the WebSocketClient. This makes sense, as it is the timeout for the transport connecting to the client service.
  4. The rpcCallTimeoutTime however is instead being set to streamKeepAliveTimeoutTime for RPCClient. I think this option needs to be renamed. It's too confusing what it means. I think it can be updated Update handler and caller timeouts to be able to overwrite server and client default timeouts regardless of value. js-rpc#47

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Oct 20, 2023

    const webSocketClient = await WebSocketClient.createWebSocketClient(
      {
        host: host,
        port: port,
        config: {
          verifyPeer: true,
          keepAliveTimeoutTime: optionsDefaulted.keepAliveTimeoutTime,
          keepAliveIntervalTime: optionsDefaulted.keepAliveIntervalTime,
        },
        logger: logger.getChild(WebSocketClient.name),
      },
      {
        // this makes sense
        timer: optionsDefaulted.connectTimeoutTime,
      },
    );
    const rpcClient = new RPCClient({
      manifest: clientClientManifest,
      streamFactory: () => webSocketClient.connection.newStream(),
      middlewareFactory: rpcMiddleware.defaultClientMiddlewareWrapper(
        clientMiddleware.middlewareClient(session),
        optionsDefaulted.rpcParserBufferSize,
      ),
      toError: networkUtils.toError,
      // this should be renamed
      streamKeepAliveTimeoutTime: optionsDefaulted.rpcCallTimeoutTime,
      logger: logger.getChild(RPCClient.name),
    });

@CMCDragonkai
Copy link
Member

The ops limits numbers are:

[nix-shell:~/Projects/Polykey]$ node
Welcome to Node.js v20.5.1.
Type ".help" for more information.
> const sodium = require('sodium-native')
undefined
> sodium.crypto_pwhash_OPSLIMIT_MIN
1
> sodium.crypto_pwhash_OPSLIMIT_MAX
4294967295
> sodium.crypto_pwhash_OPSLIMIT_INTERACTIVE
2
> sodium.crypto_pwhash_OPSLIMIT_MODERATE
3
> sodium.crypto_pwhash_OPSLIMIT_SENSITIVE
4

So actually instead of loading keys/utils/password into config.ts, I'll just set the raw number. I need to put some tests into tests/keys/utils/password.test.ts to ensure these numbers are maintained.

@CMCDragonkai
Copy link
Member

Actually we can just the strings:

type PasswordOpsLimitChoice = 'min' | 'max' | 'interactive' | 'moderate' | 'sensitive';

type PasswordMemLimitChoice = 'min' | 'max' | 'interactive' | 'moderate' | 'sensitive';

Can derive them from the password option keys.

@CMCDragonkai
Copy link
Member

Ok time to fix up the tests.

In one of the tests PolykeyClient.test.ts we have end to end with authentication logic.

Here it is written:

    const options: PolykeyClientOptions = {
      nodePath: nodePath,
      wsConfig: {
        verifyPeer: true,
        verifyCallback: async (certs) => {
          await clientUtils.verifyServerCertificateChain(
            [pkAgent.keyRing.getNodeId()],
            certs,
          );
        },
      },
    };

This says it is verifying the server certificate of the client service.

I'd expect that this would be done automatically, since the PolykeyClient doesn't actually know what the node ID is.

Right now because the WebSocketClient is encapsulated, I've just switched on verifyPeer. I suspect this would not work because the verification of the client service needs to do it without referring the traditional x509 verification since we're dealing with self-signed node certificates.

So shouldn't this verifyCallback be embedded into PolykeyClient then?

@CMCDragonkai
Copy link
Member

The clientConnectTimeoutTime is now the default value for @timedCancellable, so PolykeyClient is now a timed cancellable thing. The main thing is WebSocketClient.createWebSocketClient.

I've also recovered the nodeId: NodeId that was removed way before. I don't remember why you did this commit @tegefaulkes 7e050e8.

@CMCDragonkai
Copy link
Member

The PolykeyClient.test.ts now passes.

One thing I noticed was that it is possible to establish a connection to an agent without application-level authentication, because authentication occurs at the RPC level.

This is similar to being able to establish a TCP connection without authentication. Which is sort of what is happening here in this case.

It's interesting that agent to agent connections P2P wise, this is not possible, we do mutual TLS, thus ensuring the connection is authenticated from the beginning.

But for the client service, we cannot use MTLS because clients may have access to the root certificate of the agent. Thus they are using the root password or recovery code or a session token in their RPC requests.

I'm just wondering if we can move down this auth to the connection layer itself, to avoid malicious connections just hanging on to the client service. One quick way to deal with this is with a timeout on the PK client that has no RPC request activity, but this can be complicated because who knows, maybe I do want a long-running PK client connection.

If the auth were to be pushed down to the connection, this could work since websocket connections are just HTTP 1.1 connections atm. However transport layer wise, we don't have a defined interface for such an interaction. It would fit into the HTTP 1.1 basic authentication since you could set a basic token as part of the initial HTTP request.

@amydevs can you track this to create an issue.

* only `PolykeyAgentOptions` and `PolykeyClientOptions` should be used,
  all other subdomains should use flattened options
* updated all uses of subdomains to stop using `options`
* `PolykeyClient` encapsulates certificate verification behaviour
* `PolykeyClient` creation is now timed cancellable
* new utilities to deal with `NodeId` - `isNodeId`, `assertNodeId`, `generateNodeId`
@CMCDragonkai CMCDragonkai force-pushed the feature-websocket_client-encapsulation branch from abaeb82 to e57c36e Compare October 21, 2023 01:17
@CMCDragonkai
Copy link
Member

@addievo make sure to review this PR, so you can see the real scope of this particular issue, it was definitely more complex than originally thought. Because this is a the root of the entire project, you now have to consider global integrity of the entire application, which means you have to consider all sorts of trade offs here. So this is a good learning exercise.

@CMCDragonkai CMCDragonkai merged commit 426b5dd into staging Oct 21, 2023
@@ -122,11 +240,14 @@ class PolykeyClient {
this.logger.info(`Stopped ${this.constructor.name}`);
}

public async destroy() {
public async destroy({ force = false }: { force?: boolean }) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to make this default to false. I don't actually think force should be true by default. There is a special case in our transports that does this, but at an application level we should know why we are forcing it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tegefaulkes @amydevs @addievo Still need to consider what happens for PolykeyAgent.

@amydevs
Copy link
Contributor

amydevs commented Oct 22, 2023

The PolykeyClient.test.ts now passes.

One thing I noticed was that it is possible to establish a connection to an agent without application-level authentication, because authentication occurs at the RPC level.

This is similar to being able to establish a TCP connection without authentication. Which is sort of what is happening here in this case.

It's interesting that agent to agent connections P2P wise, this is not possible, we do mutual TLS, thus ensuring the connection is authenticated from the beginning.

But for the client service, we cannot use MTLS because clients may have access to the root certificate of the agent. Thus they are using the root password or recovery code or a session token in their RPC requests.

I'm just wondering if we can move down this auth to the connection layer itself, to avoid malicious connections just hanging on to the client service. One quick way to deal with this is with a timeout on the PK client that has no RPC request activity, but this can be complicated because who knows, maybe I do want a long-running PK client connection.

If the auth were to be pushed down to the connection, this could work since websocket connections are just HTTP 1.1 connections atm. However transport layer wise, we don't have a defined interface for such an interaction. It would fit into the HTTP 1.1 basic authentication since you could set a basic token as part of the initial HTTP request.

@amydevs can you track this to create an issue.

i feel like verifyCallback should be changed to also pass through the entire request if HTTP headers are going to be used for auth rather than MTLS @CMCDragonkai

@addievo
Copy link
Contributor Author

addievo commented Oct 24, 2023

So theres a prob with CommandStop and CommandStatus, if status is "NULL", then nodeID/Host/Port can be undefined, but PKC expects them regardless, causing issues. What do I do here?

async function processClientStatus(
  nodePath: string,
  nodeId?: NodeId,
  clientHost?: string,
  clientPort?: number,
  fs = require('fs'),
  logger = new Logger(processClientStatus.name),
): Promise<
  | {
      statusInfo: StatusStarting | StatusStopping | StatusDead;
      status: Status;
      nodeId: NodeId | undefined;
      clientHost: string | undefined;
      clientPort: number | undefined;
    }
  | {
      statusInfo: StatusLive;
      status: Status;
      nodeId: NodeId;
      clientHost: string;
      clientPort: number;
    }
  | {
      statusInfo: undefined;
      status: undefined;
      nodeId: NodeId;
      clientHost: string;
      clientPort: number;
    }
> {
  if (nodeId != null && clientHost != null && clientPort != null) {
    return {
      statusInfo: undefined,
      status: undefined,
      nodeId,
      clientHost,
      clientPort,
    };
  } else if (nodeId == null && clientHost == null && clientPort == null) {
    const statusPath = path.join(nodePath, config.paths.statusBase);
    const statusLockPath = path.join(nodePath, config.paths.statusLockBase);
    const status = new Status({
      statusPath,
      statusLockPath,
      fs,
      logger: logger.getChild(Status.name),
    });
    const statusInfo = await status.readStatus();
    if (statusInfo == null) {
      return {
        statusInfo: { status: 'DEAD', data: {} },
        status,
        nodeId: undefined,
        clientHost: undefined,
        clientPort: undefined,
      };
    } else if (statusInfo.status === 'LIVE') {
      nodeId = statusInfo.data.nodeId;
      clientHost = statusInfo.data.clientHost;
      clientPort = statusInfo.data.clientPort;
      return {
        statusInfo,
        status,
        nodeId,
        clientHost,
        clientPort,
      };
    } else {
      return {
        statusInfo,
        status,
        nodeId: undefined,
        clientHost: undefined,
        clientPort: undefined,
      };
    }
  } else {
    const errorMsg = arrayZip(
      [nodeId, clientHost, clientPort],
      [
        'missing node ID, provide it with --node-id or PK_NODE_ID',
        'missing client host, provide it with --client-host or PK_CLIENT_HOST',
        'missing client port, provide it with --client-port or PK_CLIENT_PORT',
      ],
    )
      .flatMap(([option, msg]) => {
        if (option == null) {
          return [msg];
        } else {
          return [];
        }
      })
      .join('; ');
    throw new errors.ErrorPolykeyCLIClientOptions(errorMsg);
  }
}

Problem domain

In CommandStop

    try {
        pkClient = await PolykeyClient.createPolykeyClient({
          nodeId: clientStatus.nodeId,
          host: clientStatus.clientHost,
          port: clientStatus.clientPort,
          options: {
            nodePath: options.nodePath,
          },
          logger: this.logger.getChild(PolykeyClient.name),
        });

x | undefined is not assignable to x

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants