Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updating remote node information after root keypair and NodeID has changed. #386

Open
tegefaulkes opened this issue Jun 21, 2022 · 11 comments
Labels
development Standard development r&d:polykey:core activity 3 Peer to Peer Federated Hierarchy

Comments

@tegefaulkes
Copy link
Contributor

tegefaulkes commented Jun 21, 2022

Specification

The root certificate of nodes and by extension their root key pair and NodeID is allowed to change. There are 3 methods in the KeyManager that does this;

  • renewRootKeyPair, Generates a new root keyPair, adds it to the cert chain and is signed by the previous certificate. This creates a new NodeId ensures that the old NodeId is still valid.
  • resetRootKeyPair, Throws out the old cert chain and generates a new on from scratch. The new cert is self signed. This is a new NodeId and the old one is now Invalid.
  • resetRootCert. This keeps the old root keypair but generates the cert chain from scratch. This means the NodeId remains the same but the cert chain is a new one from scratch.

What we want to handle is if a nodes starts using a new NodeId but it's old NodeId remains valid within it's cert chain. When dealing with these nodes we need to treat it as the valid old NodeId while updating any information we have about it to use it's new NodeId.

This means we will have to check the full cert chain when connecting to a node. Determine if the NodeId we're expecting is an older node in the cert chain. If it is older we need to obtain the new NodeId of the node and start using that.

The following domains need to be updated to handle the NodeId of a remote node changing.

  • sigchain This stores any claims between nodes. I think a new claim would need to be generated and reference the old one. but this wlll need to be looked into.
  • GestaltGraph This keeps track of nodes within their gestalts. If a node's ID changes then it needs to be updated here.
  • ACL stores permissions allowed for each node. If a node's ID changes then it needs to be updated here.
  • NodeGraph, I think that the new node will be added to the NodeGraph automatically through the usual operation. We may need to remove the old NodeId from the graph. Since the node can be contacted via all it's NodeIds in the cert chain it's possible for a node to exist in the graph as multiple NodeIds. So we only want to keep the newest NodeId for the node.

It's also possible that when a node's ID changes it's old NodeId and information will slowly exit the network as it is replaced with the new NodeId. As a result the node will become un-discoverable with the old ID though it may be unlikely. We may need a way to discover a node ID via it's old nodeId by having nodes keep information about the ID change when we update our information on it. That way we can inform other nodes about the change when they come looking for it.

Additional context

Tasks

  1. Add a way to detect when a nodes root certificate, key and ID has changed when we connect to it.
  2. Update any information about the node when we detect the change. These changes will need to be applied to the following domains.
    1. SigChain
    2. GestaltGraph
    3. ACL
    4. NodeGraph
  3. Add ability to make note of the change to inform other nodes of the change.
@CMCDragonkai
Copy link
Member

I actually still don't like the fact that we get our NodeId from this.keyManager.getNodeId().

Obviously the KeyManager is the SoT for the root key, but it has a code smell.

A Node is an abstraction around the KeyManager, therefore the KeyManager should be supplying its functionality to a Node class or something representing the current node.

That class can then provide some of the common functionality required...

But NodeId is a derived property, and is likely to change over time. We would need things to instead be "observing" the NodeId value.

I think some research into observables might be relevant.

You can get the current Node Id, or you can register to observe the NodeId. I imagine some changes to EventBus may be relevant too.

This then allows our domain models to be reactive to change. Right now they are only reactive to function calls.

Consider the Push/Pull concept:

                      Data Flow
        ┌─────────────────────────────────────┐
        │                                     │
┌───────┴───────┐                    ┌────────▼───────┐
│               │                    │                │
│ Desired State ├────────Push────────► Realised State │
│               │                    │                │
└───────┬───────┘                    └────────▲───────┘
        │                                     │
        └─────────────────────────────────────┘
                    Control Flow



                      Data Flow
        ┌─────────────────────────────────────┐
        │                                     │
┌───────┴───────┐                    ┌────────▼───────┐
│               │                    │                │
│ Desired State ├────────Pull────────► Realised State │
│               │                    │                │
└───────▲───────┘                    └────────┬───────┘
        │                                     │
        └─────────────────────────────────────┘
                    Control Flow

We could setup a push based reactivity where NodeId changes pushes out to the other domains. And this may apply to several areas in the code.

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Oct 10, 2022

In #466, I've renamed these methods to (in CertManager):

renewCertWithNewKeyPair

resetCertWithNewKeyPair

resetCertWithExistingKeyPair

These names are more descriptive as to the intention of what these are meant to do.

During operational usage, only renewCertWithNewKeyPair is used. The other 2 are more used for diagnostics and debugging.

The user may decide however to "break" the chain... and use the the other 2 functions.

If they do, we have to plan for how this impacts the sigchain. It may result in a breaking the sigchain since all previous claims are no longer valid or can be verified.

Specifically resetCertWithNewKeyPair basically means there's no cryptographic link from the previous key node identity to the new identity. It's almost like starting fresh in the identity space of Polykey.

However resetCertWithExistingKeyPair is curious. Because although you break the link with previous identities. You do you have still retain the current identity, which means, any claims signed by the current identity is still valid.

We may need to be able to automatically garbage collect invalid claims on the sigchain... or we still preserve this data, but make the consumers robust against this sort of situation by ignoring claims are that are no longer cryptographically valid.

@CMCDragonkai
Copy link
Member

See: #446 (comment) for discussion regarding what happens during certificate renewal.

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Oct 15, 2022

There are many domains that currently don't work when the NodeId changes. For example the sigchain creates a claim ID generator, the generator must be updated with the new NodeId when it has been changed. To ensure this occurs, we basically have to subscribe to these changes. However lacking rxjs integration right now, we have to do this with an explicit update function on these objects, and then have this be set up in the PolykeyAgent on the cert change data.

      this.events.on(
        PolykeyAgent.eventSymbols.CertManager,
        async (data: CertManagerChangeData) => {
          this.logger.info(`${KeyRing.name} change propagating`);
          await this.status.updateStatusLive({
            nodeId: data.nodeId,
          });
          await this.nodeManager.resetBuckets();
          // Update the sigchain
          await this.sigchain.onKeyRingChange();
          const tlsConfig: TLSConfig = {
            keyPrivatePem: keysUtils.privateKeyToPEM(
              data.keyPair.privateKey,
            ),
            certChainPem: await this.certManager.getCertPEMsChainPEM(),
          };
          this.grpcServerClient.setTLSConfig(tlsConfig);
          this.proxy.setTLSConfig(tlsConfig);
          this.logger.info(`${KeyRing.name} change propagated`);
        },
      );

For now, this is a short term fix until we properly integrate rxjs observables into our class objects.

It makes more sense for these objects to subscribe to the specific data. And for the ability to "emit" change events to their properties in the realm of the domain objects themselves.

So in the future I expect Sigchain to subscribe to a property like this.keyRing.nodeId$.

@CMCDragonkai
Copy link
Member

@CMCDragonkai
Copy link
Member

@tegefaulkes can you confirm if #552 will end up fixing this issue? If so, change #552 to fix this issue instead of related.

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Sep 12, 2023

I think even after this is technically done, extensive testing would need to be done to ensure that this change propagation is in fact robust and also works across the network. But we are likely get there only at testnet 7 rather than testnet 6 #551

@CMCDragonkai
Copy link
Member

Relevant?

@tegefaulkes
Copy link
Contributor Author

I think it's still relevant. No direct work has been done to address this. We can handle a node when it's ID has changed but currently we don't really update anything about the ID change.

@CMCDragonkai
Copy link
Member

This has changed from using the EventBus to using js-events where all the domains are evented. However this behaviour has not been verified yet. So PK nodes atm cannot handle nodes that it has previously connected to, but the NodeId has changed?

@CMCDragonkai
Copy link
Member

I was refactoring NodeGraph in #618, and I noticed the usage of this.keyRing.getNodeId() alot. While this allows NodeGraph to always get the latest NodeId, of the node, it also means it's pull oriented and also requires a separate system to remember to call NodeGraph.resetBuckets when the node's own NodeId changes.

It's also a bit awkward to require a huge dependency on the KeyRing only to get the NodeId. This seems like a code smell. At the very least we should be able to decouple the node's own NodeId as a sort of reactive property that is driven by changes in the system. So it's something to think about.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
development Standard development r&d:polykey:core activity 3 Peer to Peer Federated Hierarchy
Development

No branches or pull requests

3 participants