Updating remote node information after root keypair and `NodeID` has changed. #386

tegefaulkes · 2022-06-21T04:04:13Z

Specification

The root certificate of nodes and by extension their root key pair and NodeID is allowed to change. There are 3 methods in the KeyManager that does this;

renewRootKeyPair, Generates a new root keyPair, adds it to the cert chain and is signed by the previous certificate. This creates a new NodeId ensures that the old NodeId is still valid.
resetRootKeyPair, Throws out the old cert chain and generates a new on from scratch. The new cert is self signed. This is a new NodeId and the old one is now Invalid.
resetRootCert. This keeps the old root keypair but generates the cert chain from scratch. This means the NodeId remains the same but the cert chain is a new one from scratch.

What we want to handle is if a nodes starts using a new NodeId but it's old NodeId remains valid within it's cert chain. When dealing with these nodes we need to treat it as the valid old NodeId while updating any information we have about it to use it's new NodeId.

This means we will have to check the full cert chain when connecting to a node. Determine if the NodeId we're expecting is an older node in the cert chain. If it is older we need to obtain the new NodeId of the node and start using that.

The following domains need to be updated to handle the NodeId of a remote node changing.

sigchain This stores any claims between nodes. I think a new claim would need to be generated and reference the old one. but this wlll need to be looked into.
GestaltGraph This keeps track of nodes within their gestalts. If a node's ID changes then it needs to be updated here.
ACL stores permissions allowed for each node. If a node's ID changes then it needs to be updated here.
NodeGraph, I think that the new node will be added to the NodeGraph automatically through the usual operation. We may need to remove the old NodeId from the graph. Since the node can be contacted via all it's NodeIds in the cert chain it's possible for a node to exist in the graph as multiple NodeIds. So we only want to keep the newest NodeId for the node.

It's also possible that when a node's ID changes it's old NodeId and information will slowly exit the network as it is replaced with the new NodeId. As a result the node will become un-discoverable with the old ID though it may be unlikely. We may need a way to discover a node ID via it's old nodeId by having nodes keep information about the ID change when we update our information on it. That way we can inform other nodes about the change when they come looking for it.

Additional context

Relates to Test communications when root key pair is renewed and NodeID is changed #317
Seed Agent Cluster Auto-Configuration #403 - auto configuration of the seed cluster also has to deal with NodeId changes
Problem with tests/bin/keys/reset.test.ts and tests/bin/keys/renew.test.ts stopping the PolykeyAgent #347 - never figured out the core problem for test failures during key renewal
Integrate Observables to coordinate push-based Dataflow (with reactive properties) #444 - Main big issue about considering observable and push-based dataflow, however that's about inter-domain reactivity, whereas this is about node to node reactivity

Tasks

Add a way to detect when a nodes root certificate, key and ID has changed when we connect to it.
Update any information about the node when we detect the change. These changes will need to be applied to the following domains.
1. SigChain
2. GestaltGraph
3. ACL
4. NodeGraph
Add ability to make note of the change to inform other nodes of the change.

The text was updated successfully, but these errors were encountered:

CMCDragonkai · 2022-08-01T13:11:14Z

I actually still don't like the fact that we get our NodeId from this.keyManager.getNodeId().

Obviously the KeyManager is the SoT for the root key, but it has a code smell.

A Node is an abstraction around the KeyManager, therefore the KeyManager should be supplying its functionality to a Node class or something representing the current node.

That class can then provide some of the common functionality required...

But NodeId is a derived property, and is likely to change over time. We would need things to instead be "observing" the NodeId value.

I think some research into observables might be relevant.

You can get the current Node Id, or you can register to observe the NodeId. I imagine some changes to EventBus may be relevant too.

This then allows our domain models to be reactive to change. Right now they are only reactive to function calls.

Consider the Push/Pull concept:

                      Data Flow
        ┌─────────────────────────────────────┐
        │                                     │
┌───────┴───────┐                    ┌────────▼───────┐
│               │                    │                │
│ Desired State ├────────Push────────► Realised State │
│               │                    │                │
└───────┬───────┘                    └────────▲───────┘
        │                                     │
        └─────────────────────────────────────┘
                    Control Flow



                      Data Flow
        ┌─────────────────────────────────────┐
        │                                     │
┌───────┴───────┐                    ┌────────▼───────┐
│               │                    │                │
│ Desired State ├────────Pull────────► Realised State │
│               │                    │                │
└───────▲───────┘                    └────────┬───────┘
        │                                     │
        └─────────────────────────────────────┘
                    Control Flow

We could setup a push based reactivity where NodeId changes pushes out to the other domains. And this may apply to several areas in the code.

CMCDragonkai · 2022-10-10T06:26:08Z

In #466, I've renamed these methods to (in CertManager):

renewCertWithNewKeyPair

resetCertWithNewKeyPair

resetCertWithExistingKeyPair

These names are more descriptive as to the intention of what these are meant to do.

During operational usage, only renewCertWithNewKeyPair is used. The other 2 are more used for diagnostics and debugging.

The user may decide however to "break" the chain... and use the the other 2 functions.

If they do, we have to plan for how this impacts the sigchain. It may result in a breaking the sigchain since all previous claims are no longer valid or can be verified.

Specifically resetCertWithNewKeyPair basically means there's no cryptographic link from the previous key node identity to the new identity. It's almost like starting fresh in the identity space of Polykey.

However resetCertWithExistingKeyPair is curious. Because although you break the link with previous identities. You do you have still retain the current identity, which means, any claims signed by the current identity is still valid.

We may need to be able to automatically garbage collect invalid claims on the sigchain... or we still preserve this data, but make the consumers robust against this sort of situation by ignoring claims are that are no longer cryptographically valid.

CMCDragonkai · 2022-10-12T07:35:43Z

See: #446 (comment) for discussion regarding what happens during certificate renewal.

CMCDragonkai · 2022-10-15T12:02:38Z

There are many domains that currently don't work when the NodeId changes. For example the sigchain creates a claim ID generator, the generator must be updated with the new NodeId when it has been changed. To ensure this occurs, we basically have to subscribe to these changes. However lacking rxjs integration right now, we have to do this with an explicit update function on these objects, and then have this be set up in the PolykeyAgent on the cert change data.

      this.events.on(
        PolykeyAgent.eventSymbols.CertManager,
        async (data: CertManagerChangeData) => {
          this.logger.info(`${KeyRing.name} change propagating`);
          await this.status.updateStatusLive({
            nodeId: data.nodeId,
          });
          await this.nodeManager.resetBuckets();
          // Update the sigchain
          await this.sigchain.onKeyRingChange();
          const tlsConfig: TLSConfig = {
            keyPrivatePem: keysUtils.privateKeyToPEM(
              data.keyPair.privateKey,
            ),
            certChainPem: await this.certManager.getCertPEMsChainPEM(),
          };
          this.grpcServerClient.setTLSConfig(tlsConfig);
          this.proxy.setTLSConfig(tlsConfig);
          this.logger.info(`${KeyRing.name} change propagated`);
        },
      );

For now, this is a short term fix until we properly integrate rxjs observables into our class objects.

It makes more sense for these objects to subscribe to the specific data. And for the ability to "emit" change events to their properties in the realm of the domain objects themselves.

So in the future I expect Sigchain to subscribe to a property like this.keyRing.nodeId$.

CMCDragonkai · 2022-10-15T12:05:18Z

About the $ suffix naming convention: https://stackoverflow.com/questions/70053021/should-i-use-sign-suffix-for-subjects-in-the-rxjs-and-angular

CMCDragonkai · 2023-09-12T08:07:14Z

@tegefaulkes can you confirm if #552 will end up fixing this issue? If so, change #552 to fix this issue instead of related.

CMCDragonkai · 2023-09-12T08:08:20Z

I think even after this is technically done, extensive testing would need to be done to ensure that this change propagation is in fact robust and also works across the network. But we are likely get there only at testnet 7 rather than testnet 6 #551

CMCDragonkai · 2023-10-18T06:49:27Z

Relevant?

tegefaulkes · 2023-10-18T23:28:28Z

I think it's still relevant. No direct work has been done to address this. We can handle a node when it's ID has changed but currently we don't really update anything about the ID change.

CMCDragonkai · 2023-11-05T23:39:35Z

This has changed from using the EventBus to using js-events where all the domains are evented. However this behaviour has not been verified yet. So PK nodes atm cannot handle nodes that it has previously connected to, but the NodeId has changed?

CMCDragonkai · 2023-11-10T18:06:56Z

I was refactoring NodeGraph in #618, and I noticed the usage of this.keyRing.getNodeId() alot. While this allows NodeGraph to always get the latest NodeId, of the node, it also means it's pull oriented and also requires a separate system to remember to call NodeGraph.resetBuckets when the node's own NodeId changes.

It's also a bit awkward to require a huge dependency on the KeyRing only to get the NodeId. This seems like a code smell. At the very least we should be able to decouple the node's own NodeId as a sort of reactive property that is driven by changes in the system. So it's something to think about.

tegefaulkes added the development Standard development label Jun 21, 2022

tegefaulkes mentioned this issue Jun 21, 2022

Test communications when root key pair is renewed and NodeID is changed #317

Closed

CMCDragonkai mentioned this issue Jun 30, 2022

Problem with tests/bin/keys/reset.test.ts and tests/bin/keys/renew.test.ts stopping the PolykeyAgent #347

Closed

teebirdy added the r&d:polykey:core activity 3 Peer to Peer Federated Hierarchy label Jul 24, 2022

This was referenced Sep 29, 2022

Updating Crypto to use WebCrypto API and to replace RSA with ECC #446

Merged

Slice out KeyManager into 2 parts: KeyRing, CertificateManager #472

Closed

Integrate Observables to coordinate push-based Dataflow (with reactive properties) #444

Open

CMCDragonkai assigned tegefaulkes Jul 10, 2023

CMCDragonkai mentioned this issue Sep 12, 2023

Consolidating Configuration Entropy and Nodes and Client Services Refactoring #552

Merged

15 tasks

CMCDragonkai mentioned this issue Nov 5, 2023

CLI Beta Launch MatrixAI/Polykey-CLI#40

Closed

amydevs mentioned this issue Nov 13, 2023

Setup audit domain for tracking user/action events and metrics #628

Closed

CMCDragonkai mentioned this issue Nov 14, 2023

Change tests depending on KeyRing change propagation to stop using sleep, and instead use mocks or multi-event barrier #594

Open

tegefaulkes removed their assignment Jul 31, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Updating remote node information after root keypair and `NodeID` has changed. #386

Updating remote node information after root keypair and `NodeID` has changed. #386

tegefaulkes commented Jun 21, 2022 •

edited by CMCDragonkai

Loading

CMCDragonkai commented Aug 1, 2022

CMCDragonkai commented Oct 10, 2022 •

edited

Loading

CMCDragonkai commented Oct 12, 2022

CMCDragonkai commented Oct 15, 2022 •

edited

Loading

CMCDragonkai commented Oct 15, 2022

CMCDragonkai commented Sep 12, 2023

CMCDragonkai commented Sep 12, 2023 •

edited

Loading

CMCDragonkai commented Oct 18, 2023

tegefaulkes commented Oct 18, 2023

CMCDragonkai commented Nov 5, 2023

CMCDragonkai commented Nov 10, 2023

Updating remote node information after root keypair and NodeID has changed. #386

Updating remote node information after root keypair and NodeID has changed. #386

Comments

tegefaulkes commented Jun 21, 2022 • edited by CMCDragonkai Loading

Specification

Additional context

Tasks

CMCDragonkai commented Aug 1, 2022

CMCDragonkai commented Oct 10, 2022 • edited Loading

CMCDragonkai commented Oct 12, 2022

CMCDragonkai commented Oct 15, 2022 • edited Loading

CMCDragonkai commented Oct 15, 2022

CMCDragonkai commented Sep 12, 2023

CMCDragonkai commented Sep 12, 2023 • edited Loading

CMCDragonkai commented Oct 18, 2023

tegefaulkes commented Oct 18, 2023

CMCDragonkai commented Nov 5, 2023

CMCDragonkai commented Nov 10, 2023

Updating remote node information after root keypair and `NodeID` has changed. #386

Updating remote node information after root keypair and `NodeID` has changed. #386

tegefaulkes commented Jun 21, 2022 •

edited by CMCDragonkai

Loading

CMCDragonkai commented Oct 10, 2022 •

edited

Loading

CMCDragonkai commented Oct 15, 2022 •

edited

Loading

CMCDragonkai commented Sep 12, 2023 •

edited

Loading