Gestalt Link Schema Refactoring - Derived from JOSE replacement #492
All the methods have been implemented now. Next step is updating the tests.
Adding a note here about … So in the future we should switch to using …
Testing the GG should use the `fast-check` system. Pretty much from now on, all of our tests should be `fast-check` driven. Its tooling is vastly superior to writing individual unit tests.
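Example of a test prop:

```ts
// Module paths are illustrative and may differ in the repo
import { testProp } from '@fast-check/jest';
import GestaltGraph from '@/gestalts/GestaltGraph';
import * as testsIdsUtils from '../ids/utils';
import * as testsGestaltsUtils from './utils';

// `db`, `acl` and `logger` are assumed to be created in a surrounding `beforeEach`
testProp.only(
  'getNode, setNode and unsetNode',
  [
    // Generate a NodeId, then a GestaltNodeInfo for that NodeId
    // (`gestaltNodeInfoArb` is assumed to be `NodeId => Arbitrary<GestaltNodeInfo>`)
    testsIdsUtils.nodeIdArb.chain(testsGestaltsUtils.gestaltNodeInfoArb),
  ],
  async (gestaltNodeInfo) => {
    const gestaltGraph = await GestaltGraph.createGestaltGraph({
      db,
      acl,
      logger,
    });
    try {
      // setNode returns the GestaltId of the vertex
      expect(await gestaltGraph.setNode(gestaltNodeInfo)).toEqual([
        'node',
        gestaltNodeInfo.nodeId,
      ]);
      expect(await gestaltGraph.getNode(gestaltNodeInfo.nodeId)).toEqual(
        gestaltNodeInfo,
      );
      await gestaltGraph.unsetNode(gestaltNodeInfo.nodeId);
      expect(
        await gestaltGraph.getNode(gestaltNodeInfo.nodeId),
      ).toBeUndefined();
    } finally {
      // Stop the graph even if an expectation fails
      await gestaltGraph.stop();
    }
  },
);
```

I've added …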
Specification
The major change to the GG is how the vertexes and edges are stored.
Originally the vertexes were stored as a collection of records. Each record's key would be the neighbouring vertex and the record's value would be `null`. This was put into the `matrix` level path, and it was suited to sparse adjacency matrices. Now the GG stores the vertexes as pairs, where both vertexes are lifted to the DB key.
Some vertexes do not have any neighbours; in these cases, the GG stores only the vertex, with the value being `null`. The allowed structures are:
Identity to identity is not allowed.
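A minimal sketch of this key layout, under stated assumptions: the `matrix` level path is modelled as an in-memory `Map` whose keys are JSON-serialised key paths, vertexes are plain strings rather than real `GestaltKey`s, both orderings of a pair are written (an assumption, so that either vertex can list its neighbours), and the pair's value is a link ID as described in the next paragraph. `Matrix` and its methods are hypothetical names, not the GG's API.

```typescript
// Hypothetical model of the `matrix` level path. Each DB key path is an
// array of vertex keys, serialised with JSON so it can key a Map.
// Vertex keys like "node:a" or "identity:github.com:alice" stand in for
// the real binary GestaltKey.
type VertexKey = string;

const isIdentity = (v: VertexKey): boolean => v.startsWith('identity:');

class Matrix {
  // [v] -> null marks a singleton gestalt; [v, w] -> link ID marks an edge
  private readonly entries = new Map<string, string | null>();

  get size(): number {
    return this.entries.size;
  }

  // True if `v` is the first element of any key path (a DB prefix scan)
  hasVertex(v: VertexKey): boolean {
    for (const path of this.entries.keys()) {
      if ((JSON.parse(path) as VertexKey[])[0] === v) return true;
    }
    return false;
  }

  // A newly set vertex becomes a singleton gestalt
  setVertex(v: VertexKey): void {
    if (!this.hasVertex(v)) this.entries.set(JSON.stringify([v]), null);
  }

  setPair(v: VertexKey, w: VertexKey, linkId: string): void {
    if (isIdentity(v) && isIdentity(w)) {
      throw new Error('Identity to identity is not allowed');
    }
    // Vertexes with a neighbour are no longer singletons
    this.entries.delete(JSON.stringify([v]));
    this.entries.delete(JSON.stringify([w]));
    // Store both orderings so either vertex can find its neighbours
    this.entries.set(JSON.stringify([v, w]), linkId);
    this.entries.set(JSON.stringify([w, v]), linkId);
  }

  neighbours(v: VertexKey): VertexKey[] {
    const result: VertexKey[] = [];
    for (const path of this.entries.keys()) {
      const keys = JSON.parse(path) as VertexKey[];
      if (keys[0] === v && keys.length === 2) result.push(keys[1]);
    }
    return result;
  }
}
```

In a real DB, `hasVertex` and `neighbours` would be a prefix iteration over the `matrix` level path rather than a full scan.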
The previous GG design kept all the link information within the node information and identity information for each vertex. This is now replaced by having each vertex pair store a `GestaltLinkId`. This is a uniquely generated random ID that points to a `GestaltLink`.
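The indirection can be sketched as follows. This is a hypothetical model: string vertex keys joined with `|` stand in for DB key paths, a counter stands in for the random ID generator, and `claim` is a placeholder for the wrapped `SignedClaim`. It illustrates that each edge's link data is stored once, while the vertex pair only holds its `GestaltLinkId`.

```typescript
// Hypothetical sketch of the GestaltLinkId indirection. The types are
// simplified stand-ins, not Polykey's real GestaltLink definitions.
type GestaltLinkId = string;

interface GestaltLink {
  id: GestaltLinkId;
  // Stand-in for SignedClaim<ClaimLinkNode> or SignedClaim<ClaimLinkIdentity>
  claim: string;
}

class LinkStore {
  // Vertex pair "a|b" -> GestaltLinkId (the `matrix` value)
  private readonly matrix = new Map<string, GestaltLinkId>();
  // GestaltLinkId -> GestaltLink, stored exactly once per edge
  private readonly links = new Map<GestaltLinkId, GestaltLink>();
  private counter = 0;

  // Counter-based stand-in for a random ID generator
  private newLinkId(): GestaltLinkId {
    this.counter += 1;
    return `link-${this.counter}`;
  }

  link(a: string, b: string, claim: string): GestaltLinkId {
    const id = this.newLinkId();
    this.links.set(id, { id, claim });
    // Both orderings point at the same single copy of the link
    this.matrix.set(`${a}|${b}`, id);
    this.matrix.set(`${b}|${a}`, id);
    return id;
  }

  getLink(a: string, b: string): GestaltLink | undefined {
    const id = this.matrix.get(`${a}|${b}`);
    return id === undefined ? undefined : this.links.get(id);
  }
}
```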
The `GestaltLink` represents the edge information between vertexes. It wraps the `SignedClaim<ClaimLinkIdentity>` and `SignedClaim<ClaimLinkNode>` with additional required metadata. This is far more efficient than storing the entire chain of claims with each vertex, and this way we are able to provide the claim which "proves" the link between vertexes.

From these 2 changes, we derive the full impact on the `GestaltGraph` and downstream usage. Let's go through each of them.

Internal `GestaltKey` Type

In the old GG, the `GestaltKey` was a JSON serialisation of the `GestaltId`. And the `GestaltId` was a tagged union of structures wrapping `NodeId` or `ProviderId` and `IdentityId`.

The `GestaltKey` is used because it is necessary to disambiguate between the 2 kinds of vertexes that would be stored under the same `matrix` level path. The disambiguation relies on a "prefix". In the old GG, this was done through JSON serialisation of an ADT with a `type` key. This has been changed to using a direct prefix with some buffer concatenation.

3 major changes here:
- … `NodeId` in its key path. This is important because, if we ever change the encoding algorithm, we would not be able to reference old `NodeId`s. Therefore the DB must never use encoded IDs in its key paths.
- … `type` key to form tagged unions for richer structures. I believe this tuple structure of `['node', NodeId]` and `['identity', ProviderIdentityId]` is more suitable for "linear" data, especially data to be used as keys. `ProviderIdentityId` is a tuple of `[ProviderId, IdentityId]`.

The new `GestaltKey` is just an opaque buffer now. We have `toGestaltKey` and `fromGestaltKey` utility functions to convert between the `GestaltKey` and the `GestaltId`.
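A minimal sketch of what prefix-plus-concatenation could look like. Assumptions: a one-byte prefix (`0x00` for nodes, `0x01` for identities), raw `NodeId` bytes for nodes, and a 4-byte big-endian length of the `ProviderId` to split it from the `IdentityId`. The real `toGestaltKey`/`fromGestaltKey` in `gestalts/utils` may use a different layout. The node key uses raw bytes rather than an encoded `NodeId`, in line with the rule above.

```typescript
// Hypothetical prefix-based GestaltKey codec. The prefix bytes and the
// identity layout are assumptions, not Polykey's actual encoding.
type NodeId = Uint8Array; // raw bytes, never the encoded string form
type ProviderIdentityId = [string, string]; // [ProviderId, IdentityId]
type GestaltId = ['node', NodeId] | ['identity', ProviderIdentityId];
type GestaltKey = Uint8Array; // opaque; only used inside the GestaltGraph

const NODE_PREFIX = 0x00;
const IDENTITY_PREFIX = 0x01;

function toGestaltKey(gestaltId: GestaltId): GestaltKey {
  if (gestaltId[0] === 'node') {
    // [prefix][raw NodeId bytes]
    const key = new Uint8Array(1 + gestaltId[1].length);
    key[0] = NODE_PREFIX;
    key.set(gestaltId[1], 1);
    return key;
  }
  const encoder = new TextEncoder();
  const provider = encoder.encode(gestaltId[1][0]);
  const identity = encoder.encode(gestaltId[1][1]);
  // [prefix][4-byte provider length][provider bytes][identity bytes]
  const key = new Uint8Array(5 + provider.length + identity.length);
  key[0] = IDENTITY_PREFIX;
  new DataView(key.buffer).setUint32(1, provider.length);
  key.set(provider, 5);
  key.set(identity, 5 + provider.length);
  return key;
}

function fromGestaltKey(key: GestaltKey): GestaltId {
  if (key[0] === NODE_PREFIX) {
    return ['node', key.slice(1)];
  }
  if (key[0] === IDENTITY_PREFIX) {
    const view = new DataView(key.buffer, key.byteOffset, key.byteLength);
    const providerLength = view.getUint32(1);
    const decoder = new TextDecoder();
    const providerId = decoder.decode(key.subarray(5, 5 + providerLength));
    const identityId = decoder.decode(key.subarray(5 + providerLength));
    return ['identity', [providerId, identityId]];
  }
  throw new TypeError('Unknown GestaltKey prefix');
}
```

Length-prefixing the `ProviderId` avoids reserving a separator character that an `IdentityId` string could otherwise contain.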
So just remember `GestaltKey` is an "internal" type, only to be used within the `GestaltGraph`. It is never to be used outside the `GestaltGraph`, which means all input and output types with respect to the `GestaltGraph` are always the `GestaltId` or encoded versions of `GestaltId`. This is why the `GestaltKey` utility functions are part of `gestalts/utils` and not `ids/index`.

The `Gestalt` type now uses `GestaltIdEncoded` rather than `GestaltKey`, because that is just a prefixed version of `NodeIdEncoded` and `ProviderIdentityIdEncoded`, and these are external types.

Binary Data in the DB
Storing the links in the DB means dealing with the signature buffers in `SignedClaim`. The solution to this is the same as everywhere else: we use the JSON encoding of the buffer, and create `XJSON` variants of the types we are storing in the DB. When putting to the DB, we can use `X`, but when getting from the DB, we have to use the `XJSON` types, then use a utility function to convert the JSON buffers back to regular buffers. You can see this in `fromGestaltLinkJSON`.
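A sketch of the `X`/`XJSON` round trip, assuming a Node.js runtime and a DB stand-in that only round-trips values through `JSON.stringify`/`JSON.parse`. `SignedLink`, `SignedLinkJSON` and `fromSignedLinkJSON` are hypothetical, simplified stand-ins for `GestaltLink`, `GestaltLinkJSON` and `fromGestaltLinkJSON`. It relies on `Buffer#toJSON`, which `JSON.stringify` calls and which produces `{ type: 'Buffer', data: [...] }`.

```typescript
// Hypothetical X / XJSON sketch (Node.js). The type and function names are
// simplified stand-ins for GestaltLink / GestaltLinkJSON / fromGestaltLinkJSON.
interface SignedLink {
  payload: string;
  signature: Buffer;
}

// JSON.stringify calls Buffer#toJSON, giving { type: 'Buffer', data: [...] }
interface SignedLinkJSON {
  payload: string;
  signature: { type: 'Buffer'; data: number[] };
}

// Stand-in for a DB that can only store JSON-serialisable values
const store = new Map<string, string>();

function putLink(key: string, link: SignedLink): void {
  // Putting: we can pass X directly
  store.set(key, JSON.stringify(link));
}

function getLink(key: string): SignedLinkJSON | undefined {
  // Getting: what comes back is XJSON, with the buffer as plain JSON
  const raw = store.get(key);
  return raw === undefined ? undefined : (JSON.parse(raw) as SignedLinkJSON);
}

// Convert the JSON buffer back into a real Buffer
function fromSignedLinkJSON(json: SignedLinkJSON): SignedLink {
  return { ...json, signature: Buffer.from(json.signature.data) };
}
```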
This problem occurs because our DB doesn't support structured types with buffers. There's an issue on js-db to integrate a drop-in replacement for JSON, using something like BSON or CBOR, so that we don't have to do this JSON encoding work for more common types in JS.

GestaltGraph's ACL

The way the GG uses the ACL does not change. However, in the process of developing the above design, I realised that the ACL schema could be improved by introducing a `GraphId` (it could also be known as a `GestaltId`, but this conflicts with the existing type `GestaltId` representing the union of `GestaltNodeId` and `GestaltIdentityId`). This `GraphId` would be a uniquely, randomly generated ID that points to a disjoint subgraph in the GG. Then permissions can be assigned to each `GraphId`. This eliminates the need for `PermId` and for maintaining a map of all the `NodeId`s in a gestalt to that common `PermId`.
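The proposal can be sketched as below. This only illustrates an idea that has not been implemented: `GraphAcl`, `assign`, `setAction` and `hasAction` are hypothetical names, the action names are example values, and how a vertex finds its `GraphId` (a plain `Map` here) is an assumption. The property it shows is that a permission is written once per `GraphId` rather than once per `NodeId`.

```typescript
// Hypothetical sketch of the proposed GraphId scheme. Permissions are
// stored once per GraphId; how a vertex finds its GraphId (a plain Map
// here) is an assumption of this sketch.
type VertexKey = string;
type GraphId = string;
type Action = 'notify' | 'scan'; // example action names

class GraphAcl {
  private readonly graphIds = new Map<VertexKey, GraphId>();
  private readonly perms = new Map<GraphId, Set<Action>>();

  assign(vertex: VertexKey, graphId: GraphId): void {
    this.graphIds.set(vertex, graphId);
  }

  // A single write applies to every vertex in the gestalt
  setAction(graphId: GraphId, action: Action): void {
    const actions = this.perms.get(graphId) ?? new Set<Action>();
    actions.add(action);
    this.perms.set(graphId, actions);
  }

  hasAction(vertex: VertexKey, action: Action): boolean {
    const graphId = this.graphIds.get(vertex);
    if (graphId === undefined) return false;
    return this.perms.get(graphId)?.has(action) ?? false;
  }
}
```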
This could have downstream impacts too, because now it's possible to have a single identifier that refers to a "gestalt", rather than trying to use "any of the vertexes in a gestalt". Many algorithms such as `getGestalts` can be made more efficient by using the `GraphId` instead. This kind of improvement can be done later.

Traversing a Gestalt

Almost all the methods in the GG make use of traversing a gestalt. This is now centralised under a single protected method: `GestaltGraph.getGestaltByKey`.

Originally this was done through 2 methods, `GestaltGraph.getGestaltByKey` and `GestaltGraph.traverseGestalt`. These 2 methods are no longer necessary; both functionalities have become part of the new `GestaltGraph.getGestaltByKey`.

Take note that it takes a `visited` set which gets mutated during execution. This `visited` set can be used outside of the function, such as in the case of `getGestalts`.
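A minimal sketch of the shared-`visited` pattern, assuming an in-memory adjacency `Map` in place of iterating the `matrix` level path. `getGestaltByKey` here is a standalone, hypothetical stand-in for the protected method (it returns vertex keys rather than a `Gestalt`), and breadth-first order is a choice made for the sketch. Passing one `visited` set through every call is what lets `getGestalts` skip vertexes that already belong to a gestalt it has collected.

```typescript
// Hypothetical sketch of traversal with a caller-owned `visited` set.
// The adjacency Map stands in for iterating the `matrix` level path.
type VertexKey = string;
type Adjacency = Map<VertexKey, VertexKey[]>;

// Breadth-first traversal of the gestalt containing `start`.
// `visited` is mutated so the caller can see every vertex that was reached.
function getGestaltByKey(
  graph: Adjacency,
  start: VertexKey,
  visited: Set<VertexKey> = new Set(),
): VertexKey[] {
  const gestalt: VertexKey[] = [];
  const queue: VertexKey[] = [start];
  visited.add(start);
  while (queue.length > 0) {
    const vertex = queue.shift()!;
    gestalt.push(vertex);
    for (const neighbour of graph.get(vertex) ?? []) {
      if (!visited.has(neighbour)) {
        visited.add(neighbour);
        queue.push(neighbour);
      }
    }
  }
  return gestalt;
}

// Each vertex is traversed at most once across all gestalts
function getGestalts(graph: Adjacency): VertexKey[][] {
  const visited = new Set<VertexKey>();
  const gestalts: VertexKey[][] = [];
  for (const vertex of graph.keys()) {
    if (visited.has(vertex)) continue;
    gestalts.push(getGestaltByKey(graph, vertex, visited));
  }
  return gestalts;
}
```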
Setting and Unsetting Vertexes

Setting and unsetting vertexes are very similar to before.
You can see this with the already working `GestaltGraph.setNode` and `GestaltGraph.setIdentity`.

When vertexes are first set, they are always set as a singleton gestalt.
This means under the `matrix` level path, only 1 `GestaltKey` is used and the value is `null`.

Note that `setNode` and `setIdentity` are capable of updating the node or identity information. But they only set the `matrix` level path for a vertex if it hasn't already been set. This means even as vertexes are being linked or unlinked, one can independently update their node information or identity information.
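The set-only-if-absent behaviour can be sketched as follows. Assumptions: the `matrix` is modelled as a `Map` keyed by JSON-serialised key paths (`[v]` maps to `null` for a singleton, `[v, w]` to a link ID), node information lives in a separate `Map`, and `NodeInfo.lastSeen` is an illustrative field only. `MiniGraph` is a hypothetical name; the return value mirrors the `['node', nodeId]` shape asserted in the test prop earlier in this thread.

```typescript
// Hypothetical sketch of setNode semantics: node information is always
// written, but the `matrix` entry is only created for a vertex that has no
// matrix entries yet, so existing links are left untouched.
interface NodeInfo {
  nodeId: string;
  lastSeen: number; // illustrative field only
}

class MiniGraph {
  readonly nodes = new Map<string, NodeInfo>();
  // JSON key path -> value: [v] -> null (singleton), [v, w] -> link ID
  readonly matrix = new Map<string, string | null>();

  // True if any key path starts with `v` (a prefix scan in a real DB)
  private inMatrix(v: string): boolean {
    for (const path of this.matrix.keys()) {
      if ((JSON.parse(path) as string[])[0] === v) return true;
    }
    return false;
  }

  setNode(info: NodeInfo): ['node', string] {
    // Always update the node information
    this.nodes.set(info.nodeId, info);
    // Only a vertex not yet in the matrix becomes a singleton gestalt
    if (!this.inMatrix(info.nodeId)) {
      this.matrix.set(JSON.stringify([info.nodeId]), null);
    }
    return ['node', info.nodeId];
  }
}
```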
However, do note that concurrency conflicts can occur here; these should be bubbled back up to the user for a retry policy.
When a vertex is unset, its permissions in the ACL must be deleted, and all its links must also be broken. This means you have to iterate over all its neighbours and remove those entries in `matrix`. But you must also remove the vertex itself from the `matrix` if it is a singleton gestalt.

Linking and Unlinking Vertexes

Linking vertexes establishes a vertex pair in the `matrix` level path.
It is also capable of updating the node or identity information.
These are the methods to be updated:
The linking methods now also take a `GestaltLink` and will check that the link's claim data matches the vertexes being linked. Currently only the `checkLinkNodeMatches` utility function is available. One must also be built for node to identity links.
However, it will not be checking the signatures of the claims in the links. It is expected that the signatures will already have been checked by whatever is submitting the link to the GG. This is likely the job of the `discovery` domain.

Linking vertexes together has several effects:
- … `GestaltLinkId`.

In the case that the vertexes are already linked, it will only:
- … `GestaltLinkId`.
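A possible shape for the missing node-to-identity check, as a hypothetical sketch only. The `ClaimLinkIdentity` fields (`iss` for the claiming node, `sub` for the identity), the `encodeProviderIdentityId` helper and `checkLinkIdentityMatches` itself are assumptions made for illustration and may differ from the real claim types. As described above, it only compares the claim's data with the vertexes being linked and does not verify signatures.

```typescript
// Hypothetical node-to-identity counterpart to checkLinkNodeMatches.
// The claim fields and the ID encoding below are assumptions.
type NodeIdEncoded = string;
type ProviderIdentityId = [string, string]; // [ProviderId, IdentityId]

interface ClaimLinkIdentity {
  iss: NodeIdEncoded; // the node making the claim
  sub: string; // the encoded ProviderIdentityId being claimed
}

// Assumed encoding, used only by this sketch
function encodeProviderIdentityId(id: ProviderIdentityId): string {
  return JSON.stringify(id);
}

// Checks the claim's data against the two vertexes being linked.
// Signatures are assumed to be verified upstream (e.g. by discovery).
function checkLinkIdentityMatches(
  nodeId: NodeIdEncoded,
  providerIdentityId: ProviderIdentityId,
  claim: ClaimLinkIdentity,
): boolean {
  return (
    claim.iss === nodeId &&
    claim.sub === encodeProviderIdentityId(providerIdentityId)
  );
}
```

A linking method would call this before writing the vertex pair, and reject the link when it returns `false`.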
Gestalt Actions

The GG depends on the ACL. This means ACL changes actually occur in the GG, not directly on the ACL.
The gestalt actions should remain the same. Only adjust them according to the new data schema in the GG.
These are the relevant methods to be re-enabled:

Getting Information

Methods to create/update:

Usage

What we would expect is that links are discovered by the `discovery` domain, these links are checked, and then the necessary updates are made on the GG.
The GG is also how `discovery` starts its discovery process. At the moment there is no timestamp on any of the GG data, except perhaps the links.
Future improvements should add an expiry/timestamp to all the data in the GG so we know how old the data is and when it was last verified. This can aid discovery in deciding what to prioritize.
Additional context

- … `tokens` domain and specialise tokens for Sigchain, Notifications, Identities and Sessions #481

Tasks

- … `GestaltGraph.ts`
- … `GestaltGraph`, attempt model testing similar to the `CertManager.test.ts`.
- … GG.
- … `parseGestaltId` into `parseGestaltIdHuman` in `gestalts/utils` as per https://github.com/MatrixAI/MatrixAI-Graph/issues/44