Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gestalt Link Schema Refactoring - Derived from JOSE replacement #492

Closed
CMCDragonkai opened this issue Nov 6, 2022 · 5 comments · Fixed by #446
Closed

Gestalt Link Schema Refactoring - Derived from JOSE replacement #492

CMCDragonkai opened this issue Nov 6, 2022 · 5 comments · Fixed by #446
Assignees
Labels
development Standard development r&d:polykey:core activity 3 Peer to Peer Federated Hierarchy

Comments

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Nov 6, 2022

Specification

The major change to the GG is how the vertexes and edges are stored.

Originally the vertexes we stored as collection of records. Each record's key would be the neighbouring vertex and the record's value would be null.

This was put into the matrix level path, and this was suited to sparse adjacency matrices.

Now the GG stores the vertexes as pairs, where both vertexes are lifted to the DB key.

Some vertexes do not have any neighbours, in these cases, the GG stores only the vertex with the value being null.

  /**
   * Gestalt adjacency matrix represented as a collection vertex pairs.
   * Each vertex can be `GestaltNodeKey` or `GestaltIdentityKey`.
   * These are the allowable structures:
   * `GestaltGraph/matrix/{GestaltKey} -> null`
   * `GestaltGraph/matrix/{GestaltKey}/{GestaltKey} -> {raw(GestaltLinkId)}`
   */
  public readonly dbMatrixPath: Readonly<LevelPath> = [this.constructor.name, 'matrix'];

The allowed structures are:

  • Singleton node
  • Singleton identity
  • Node to node
  • Node to identity
  • Identity to node

Identity to identity is not allowed.

The previous GG design kept all the link information within the node information and identity information for each vertex. This is now replaced by having each vertex pair store a GestaltLinkId. This is a uniquely generated random ID that points to a GestaltLink.

The GestaltLink represents the edge information between vertexes. They wrap the SignedClaim<ClaimLinkIdentity> and SignedClaim<ClaimLinkNode> with additional required metadata. This is far more efficient than storing the entire chain of claims with each vertex, and this way we are able to provide the claim which "proves" the link between vertexes.

  /**
   * Gestalt links.
   * `GestaltGraph/links/{GestaltLinkId} -> {json(GestaltLink)}`
   */
  public readonly dbLinksPath: Readonly<LevelPath> = [this.constructor.name, 'links'];

From these 2 changes, we derive the full impact on the GestaltGraph and downstream usage. Let's go through each of them.

Internal GestaltKey Type

In the old GG, the GestaltKey was a JSON serialisation of the GestaltId. And the GestaltId was a tagged union of structures wrapping NodeId or ProviderId and IdentityId.

The GestaltKey is used because it is necessary to disambiguate between the 2 kinds of vertexes that would be stored in under the same level path matrix. The disambiguation relies on a "prefix". In the old GG, this was done through JSON serialisation of an ADT with type key. This has been changed to using a direct prefix with some buffer concatenation.

3 major changes here:

  1. The DB does not store the encoded versions of the NodeId in its key path. This is important because, if we ever change the encoding algorithm, we would not be able to reference old NodeId. Therefore the DB must never use encoded IDs in its key paths.
  2. Instead of using JSON serialisation everywhere, where possible we have used an tuple encoding instead with a prefix as the first element. This is contrasted to using a type key to form tagged unions for richer structures. I believe this tuple structure of ['node', NodeId] and ['identity', ProviderIdentityId] is more suitable for "linear" data, especially data to be used as keys.
  3. The ProviderIdentityId is a tuple of [ProviderId, IdentityId].

The new GestaltKey is just an opaque buffer now. And we have toGestaltKey and fromGestaltKey utility functions to convert between the GestaltKey and the GestaltId.

So just remember GestaltKey is an "internal" type, only to be used within the GestaltGraph. It is never to be used outside the GestaltGraph, which means all input and output types with respect to the GestaltGraph is always the GestaltId or encoded versions of GestaltId. This is why the GestaltKey utility functions are part of gestalts/utils and not ids/index.

The Gestalt type now uses GestaltIdEncoded rather than GestaltKey, because that is just a prefixed versions of NodeIdEncoded and ProviderIdentityIdEncoded, and these are external types.

Binary Data in the DB

The storing of the links into the DB means dealing with the signature buffers in SignedClaim. The solution to this is the same as we do everywhere else. We have to use the JSON encoding of the buffer, but then create XJSON variants of the types we are storing in the DB. When putting to the DB, we can use X, but when we are getting from the B, we have to use XJSON types. Then use a utility function to convert the JSON buffers back to regular buffers. You can see in fromGestaltLinkJSON. This problem occurs because our DB doesn't support structured types with buffers. There's an issue on js-db to integrate a drop-in replacement to JSON, using something like BSON or CBOR so that we don't have to do this JSON encoding work for more common types in JS.

GestaltGraph's ACL

The way the GG uses the ACL does not change. However in the process of developing the above design, I realised that the ACL schema could be improved by introducing a GraphId (could also known as a GestaltId but this conflicts with the existing type GestaltId representing the union of GestaltNodeId and GestaltIdentityId). This GraphId would be a uniquely randomly generated ID that points to a disjoint subgraph in the GG. Then permissions can be assigned to each GraphId. This eliminates the need for PermId and having to maintain a map of all the NodeId in a gestalt to that common PermId.

This could have downstream impacts too, because now it's possible to have a single identifier that refers to a "gestalt", rather than trying to use "any of the vertexes in a gestalt". Many algorithms such as getGestalts can be made more efficient by using the the GraphId instead. This kind of improvement can be done later.

Traversing a Gestalt

Almost all the methods in the GG make use of traversing a gestalt. This is now centralised under 1 single protected method called: GestaltGraph.getGestaltByKey.

Originally this was done through 2 methods GestaltGraph.getGestaltByKey and GestaltGraph.traverseGestalt. These 2 methods are now longer necessary, both functionalities have become part of the new GestaltGraph.getGestaltByKey.

Take note that it takes a visited set which gets mutated during execution. This visited set can be used outside of the function such as in the case of getGestalts.

Setting and Unsetting Vertexes

Setting a vertex and unsetting vertexes are very very similar to before.

You can see this with the already working GestaltGraph.setNode and GestaltGraph.setIdentity.

When vertexes are first set, they are always set as a singleton gestalt.

This means under the matrix level path, only 1 GestaltKey is used and the value is null.

Note that setNode and setIdentity are capable of updating the node or identity information. But they only set the matrix level path if they haven't already been set.

This means even as vertexes are being linked or unlinked, one can independently update their node information or identity information.

However do note that concurreny conflicts can occur here, but this should be bubbled back up to the user for a retry policy.

When a vertex is unset, their permissions in the ACL must be deleted, and all their links must also be broken. This means you have to iterate over all its neighbours and remove those entries in matrix. But you must also remove themselves from the matrix if they are a singleton gestalt.

GestaltGraph.setNode(
  nodeInfo: GestaltNodeInfo,
  tran?: DBTransaction,
): Promise<['node', NodeId]>
;
GestaltGraph.setIdentity(
  identityInfo: GestaltIdentityInfo,
  tran?: DBTransaction,
): Promise<['identity', ProviderIdentityId]>;

GestaltGraph.unsetNode(
  nodeId: NodeId,
  tran?: DBTransaction,
): Promise<void>;

GestaltGraph.unsetIdentity(
  providerIdentityId: ProviderIdentityId,
  tran?: DBTransaction,
): Promise<void>;

// Calls one of `setNode` or `setIdentity`
GestaltGraph.setVertex(
  gestaltInfo: GestaltInfo,
  tran?: DBTransaction,
): Promise<GestaltId>;

// Calls one of `unsetNode` or `unsetIdentity`
GestaltGraph.unsetVertex(
  gestaltId: GestaltId,
  tran?: DBTransaction,
): Promise<void>;

Linking and Unlinking Vertexes

Linking vertexes establishes vertex pair in the matrix level path.

It is also capable of updating the node or identity information.

These are the methods to be updated:

GestaltGraph.linkNodeAndNode(
  nodeInfo1: GestaltNodeInfo,
  nodeInfo2: GestaltNodeInfo,
  linkNode: Omit<GestaltLinkNode, 'id'>,
  tran?: DBTransaction,
): Promise<GestaltLinkId>;

GestaltGraph.linkNodeAndIdentity(
  nodeInfo: GestaltNodeInfo,
  identityInfo: GestaltIdentityInfo,
  linkIdentity: Omit<GestaltLinkIdentity, 'id'>,
  tran?: DBTransaction,
): Promise<GestaltLinkId>;

// Overloaded version of linkNodeAndNode and linkNodeAndIdentity
GestaltGraph.linkVertexAndVertex(
  gestaltInfo1: GestaltInfo,
  gestaltInfo2: GestaltInfo,
  link: [GestaltLink[0], Omit<GestaltLink[1], 'id'>]
  tran?: DBTransaction,
): Promise<GestaltLinkId>;

GestaltGraph.unlinkNodeAndNode(
  nodeId1: NodeId,
  nodeId2: NodeId,
  tran?: DBTransaction,
): Promise<void>;

GestaltGraph.unlinkNodeAndIdentity(
  nodeId: NodeId,
  providerIdentityId: ProviderIdentityId,
  tran?: DBTransaction,
): Promise<void>;

// Overlaoded version of unlinkNodeAndNode and unlinkNodeAndIdentity
GestaltGraph.unlinkVertexAndVertex(
  gestaltId1: GestaltId,
  gestaltId2: GestaltId,
  tran?: DBTransaction,
): Promise<void>;

The linking methods now also take a GestaltLink and it will check that the link's claim data matches the vertexes being linked. Currently only checkLinkNodeMatches utility function is available. One must be built also for node to identity links.

However it will not be checking the signatures of the claims in the links. It is expected that the signatures will already be checked by whatever is submitting the link to the GG. This is likely the job of the discovery domain.

Linking vertexes together has several effects:

  1. Update the permission with the union of the 2 gestalt permissions in the ACL.
  2. Update the link information.
  3. Set the vertex pair.
  4. Remove any singleton entry.
  5. Update node or identity information.
  6. Return new GestaltLinkId.

In the case that the vertexes are already linked, it will only:

  1. Update the link information.
  2. Update node or identity information.
  3. Return existing GestaltLinkId.

Gestalt Actions

The GG depends on the ACL. This means ACL changes actually occur in the GG, not directly on the ACL.

The gestalt actions should remain the same. Only adjust according to the new data schema in GG.

These are the relevant methods to be re-enabled:

GestaltGraph.getGestaltActions(
  gestaltId: GestaltId,
  tran?: DBTransaction
): Promise<GestaltActions | undefined>:

GestaltGraph.setGestaltActions(
  gestaltId: GestaltId,
  action: GestaltAction,
  tran?: DBTransaction
): Promise<void>:

GestaltGraph.unsetGestaltActions(
  gestaltId: GestaltId,
  action: GestaltAction,
  tran?: DBTransaction
): Promise<void>:

Getting Information

Methods to create/update:

GestaltGraph.getGestalts(
  tran?: DBTransaction
): AsyncGenerator<Gestalt>;

GestaltGraph.getGestalt(
  gestaltId: GestaltId,
  tran?: DBTransaction
): Promise<Gestalt | undefined>;

GestaltGraph.getNode(
  nodeId: NodeId,
  tran?: DBTransaction,
): Promise<GestaltNodeInfo | undefined>;

GestaltGraph.getIdentity(
  providerIdentityId: ProviderIdentityId,
  tran?: DBtransaction,
): Promise<GestaltIdentityInfo | undefined>;

// Overloaded getVertex

GestaltGraph.getVertex(
  gestaltId: ['node', NodeId],
  tran?: DBTransaction,
): Promise<['node', GestaltNodeInfo] | undefined>;
GestaltGraph.getVertex(
  gestaltId: ['identity', ProviderIdentityId],
  tran?: DBTransaction,
): Promise<['identity', GestaltIdentityInfo] | undefined>;
GestaltGraph.getVertex(
  gestaltId: GestaltId,
  tran?: DBTransaction,
): Promise<GestaltInfo | undefined>;

// Overloaded getLink

GestaltGraph.getLink(
  linkId: GestaltLinkId,
  tran?: DBTransaction,
): Promise<GestaltLink | undefined>;
GestaltGraph.getLink(
  gestaltId1: ['node', NodeId],
  gestaltId2: ['node', NodeId],
  tran?: DBTransaction,
): Promise<['node', GestaltNodeLink] | undefined>;
GestaltGraph.getLink(
  gestaltId1: ['identity', ProviderIdentityId],
  gestaltId2: ['node', NodeId],
  tran?: DBTransaction,
): Promise<['identity', GestaltIdentityLink] | undefined>;
GestaltGraph.getLink(
  gestaltId1: ['node', NodeId],
  gestaltId2: ['identity', ProviderIdentityId],
  tran?: DBTransaction,
): Promise<['identity', GestaltIdentityLink] | undefined>;
GestaltGraph.getLink(
  gestaltId1: GestaltId,
  gestaltId2: GestaltId,
  tran?: DBTransaction,
): Promise<GestaltLink | undefined>;

Usage

What we would expect is that the links are being discovered by discovery and these links are being checked, and then the necessary updates are made on the GG.

The GG is also how the discovery starts its discovery process. At the moment there is no timestamp to any of the GG data, except perhaps the links.

Future improvements should add an expiry/timestamp all the data in the GG so we know how old the data is. When it was last verified. This can aid the discovery and deciding what to prioritize for discovery.

Additional context

Tasks

  1. Refer to the spec above, and implement the spec with respect to the GestaltGraph.ts
  2. Add in fast-check testing to the methods of GestaltGraph, attempt model testing similar to the CertManager.test.ts.
  3. Remove all old tests that are unnecessary and clunky.
  4. Update any relevant GRPC handlers that are using the GG.
  5. Move parseGestaltId into parseGestaltIdHuman in gestalts/utils as per https://github.com/MatrixAI/MatrixAI-Graph/issues/44
@tegefaulkes
Copy link
Contributor

All the methods have been implemented now. Next step is updating the tests.

@CMCDragonkai
Copy link
Member Author

Adding a note here about GraphId. The number of operations don't change between using ACL with NodeId and GestaltGraph with GraphId. You still have to store GraphId -> GestaltId and GestaltId -> GraphId. And when you are joining 2 gestalts together, one of the gestalt's GraphId has to be re-assigned in a transaction. Ideally you choose the smallest gestalt to do so, however it's not really possible to know which gestalt is the smallest unless you maintain a counter for the number of vertices in the gestalt. This is of course possible once you do have a notion of GraphId, whereas you couldn't do this with the current ACL design.

So in the future we should switch to using GraphId but it is not a priority right now.

@CMCDragonkai
Copy link
Member Author

Testing the GG should use the fast check system. Pretty much from now on, all of our tests should be fast check driven. It's tooling is vastly superior to writing individual unit tests.

@CMCDragonkai
Copy link
Member Author

CMCDragonkai commented Nov 25, 2022

Example of a test prop for setNode.

  testProp.only(
    'getNode, setNode and unsetNode',
    [
      testsIdsUtils.nodeIdArb.chain(testsGestaltsUtils.gestaltNodeInfoArb)
    ],
    async (gestaltNodeInfo) => {
      const gestaltGraph = await GestaltGraph.createGestaltGraph({
        db,
        acl,
        logger,
      });
      expect(await gestaltGraph.setNode(gestaltNodeInfo)).toEqual([
        'node',
        gestaltNodeInfo.nodeId,
      ]);
      expect(await gestaltGraph.getNode(gestaltNodeInfo.nodeId)).toEqual(
        gestaltNodeInfo
      );
      await gestaltGraph.unsetNode(gestaltNodeInfo.nodeId);
      expect(await gestaltGraph.getNode(gestaltNodeInfo.nodeId)).toBeUndefined();
      await gestaltGraph.stop();
    }
  );

@CMCDragonkai
Copy link
Member Author

I've added GestaltGraph.getLinkById to get a gestalt link by an gestalt link id.

@CMCDragonkai CMCDragonkai added the r&d:polykey:core activity 3 Peer to Peer Federated Hierarchy label Jul 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
development Standard development r&d:polykey:core activity 3 Peer to Peer Federated Hierarchy
Development

Successfully merging a pull request may close this issue.

2 participants