Replace JOSE with our own tokens
domain and specialise tokens for Sigchain, Notifications, Identities and Sessions
#481
Comments
The hashing of the claims should be using multihash and not just a raw SHA256 hash. We can combine this with our new hashing utilities provided by the keys domain. |
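A minimal sketch of what wrapping a raw SHA256 digest as a multihash involves, assuming the 32-byte digest has already been produced elsewhere (for example by the keys domain hashing utilities). The helper names `toMultihashSha256` and `toMultibaseBase64Url` are hypothetical and not part of the codebase; the constants come from the multicodec and multibase tables (`0x12` for `sha2-256`, prefix `u` for unpadded base64url).

```typescript
// Multicodec code for sha2-256 and its digest size in bytes.
const SHA2_256_CODE = 0x12;
const SHA2_256_SIZE = 32;

/**
 * Prefixes a raw SHA-256 digest with <code><length>.
 * Both values are below 128, so each varint fits in a single byte.
 */
function toMultihashSha256(digest: Uint8Array): Uint8Array {
  if (digest.length !== SHA2_256_SIZE) {
    throw new RangeError(`expected a ${SHA2_256_SIZE} byte digest`);
  }
  const out = new Uint8Array(2 + digest.length);
  out[0] = SHA2_256_CODE;
  out[1] = digest.length;
  out.set(digest, 2);
  return out;
}

const B64URL_ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_';

/** Encodes bytes as unpadded base64url with the multibase prefix `u`. */
function toMultibaseBase64Url(bytes: Uint8Array): string {
  let out = 'u';
  for (let i = 0; i < bytes.length; i += 3) {
    const b0 = bytes[i];
    const b1 = i + 1 < bytes.length ? bytes[i + 1] : 0;
    const b2 = i + 2 < bytes.length ? bytes[i + 2] : 0;
    out += B64URL_ALPHABET[b0 >> 2];
    out += B64URL_ALPHABET[((b0 & 0x03) << 4) | (b1 >> 4)];
    if (i + 1 < bytes.length) {
      out += B64URL_ALPHABET[((b1 & 0x0f) << 2) | (b2 >> 6)];
    }
    if (i + 2 < bytes.length) {
      out += B64URL_ALPHABET[b2 & 0x3f];
    }
  }
  return out;
}
```

The self-describing prefix is what allows the hash algorithm to be swapped later without changing the field that carries the digest.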
Ok so we now have:
These use libsodium to do the hashing. I also have Now the next issue is that multiformats now is only used in 2 places:
In And so it's left to claims utils that uses it and The main thing now is multihashing. Which if we want to re-use our multiformats structure, we can actually re-use our keys functions for that purpose. https://github.com/multiformats/js-multiformats#multihash-hashers
So the question is whether these derivations should sit inside the claims or token utilities, or whether they should be put into more general utilities somewhere. That way it's possible to use them in other places where we may want to do multihashing. |
I've replaced multiformats usage in |
For now I'll put the multihashing derivations into |
The current
Basically something like:
However I want to digress about the JWS spec. JWS already has some form of algorithm agility through the In the general format, it may look like this:
Unlike JWE there is no shared header, so headers have to be specific to each signature. This means our Now a question is whether we can use Now that we cannot put it into a shared header, and the fact that we are using multibase encoding for the So anything that is locked down in the JWS spec will be multibase and/or multihash. So I propose we use the following:
The selected multibase will just be base64url similar to the rest of the JWS spec. While the multihash can be SHA256 for now. Another thing I want to touch on is that JWS has both The Both are case insensitive. It is recommended to omit This is something we can use later, like There are some recommendations to use Finally for the
Examples possible algs:
|
Ok so I've come up with realisations here.
Consistency of JWS/JWT (token)
All the claims and JWS usage right now are all over the place, and we need to make it more consistent. Here's an idea. Let's start with a generic
/**
* Token based on JWT specification.
* All properties are "claims" and they are all optional.
* The entire POJO is put into the payload for signing.
*/
type Token = {
iss?: string;
sub?: string;
aud?: string | Array<string>;
exp?: number;
nbf?: number;
iat?: number;
jti?: string;
[key: string]: any;
};
The above is based on the JWT specification. Specifically, it is just the payload. This payload is what gets signed. Everything else is just metadata, although a portion of the metadata is also signed. Then we can have specialised "tokens" for various usages. I suspect the
/**
* Signed token based on General JWS specification.
*/
type TokenSigned = {
payload: string;
signatures: Array<{
signature: string;
protected: string;
}>;
};
In particular we know that the protected header, which is the base64url encoding of the JSON string, will contain something like this:
Atm, sodium-native only exposes the blake2b algorithm and the HMAC-SHA512-256 (truncated 512). Even though libsodium has The HS512256 is just what is exposed by I think since we're going to do this for now, let's just use In that case... our multidigest work needs a bit of a tweak. Atm, we don't have blake2b, and I'm not sure which of these https://github.com/multiformats/multicodec/blob/master/table.csv are actually specific to libsodium. The default output size is 32 bytes or 256 bits. Therefore I think it is this one: Subsequently we then need to create "specialised" tokens for all the different usecases we have for them. Here are some draft tokens I've synthesised from the current situation in the codebase:
The old types were originally defined in the
I believe the new iteration is far clearer and uses standardised terminology. Note that The One important thing about the The session token doesn't have a This is why the generic token interface does not have any required properties. The The claim tokens are used by both The session token type will still be in The
Both functions will produce a Note that doubly signed claims will be made a lot simpler. They will be stored on both sigchains. The |
Ok I've replaced So now keys utils exposes We don't go straight to producing a signed token structure because it can be mutated. Instead one may combine these things together to produce a It seems that a safe way of doing this is to construct a rich object representing a token, with the ability to add and remove signatures at will, and the ability to produce the serialised representation. That way, one would not be able to just mutate the token payload, or just add random signatures in. It would encapsulate the mechanics of tokens internally. |
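A minimal sketch of the "rich object" idea, under stated assumptions: `TokenSketch`, `SignFn` and the method names are hypothetical, the signing function is injected rather than taken from the keys utilities, and the signing input is simplified to the payload JSON alone (a real JWS signs over the protected header and the payload).

```typescript
type TokenPayloadSketch = { [key: string]: unknown };
// Hypothetical signer: takes the signing input and returns an encoded signature.
type SignFn = (signingInput: string) => string;

class TokenSketch<P extends TokenPayloadSketch> {
  // The serialisation is fixed at construction, so later mutation of the
  // caller's object cannot change what gets signed.
  protected readonly payloadJSON: string;
  // Signatures are keyed by key ID, so each signer appears at most once.
  protected readonly signatures = new Map<string, string>();

  constructor(payload: P) {
    this.payloadJSON = JSON.stringify(payload);
  }

  /** Returns a fresh copy each time, so internal state is never exposed. */
  get payload(): P {
    return JSON.parse(this.payloadJSON) as P;
  }

  sign(kid: string, signFn: SignFn): void {
    if (this.signatures.has(kid)) {
      throw new Error(`token is already signed by ${kid}`);
    }
    this.signatures.set(kid, signFn(this.payloadJSON));
  }

  removeSignature(kid: string): boolean {
    return this.signatures.delete(kid);
  }

  /** Produces the serialised representation as a plain snapshot. */
  toSigned(): {
    payload: string;
    signatures: Array<{ kid: string; signature: string }>;
  } {
    return {
      payload: this.payloadJSON,
      signatures: Array.from(this.signatures, ([kid, signature]) => ({
        kid,
        signature,
      })),
    };
  }
}
```

Because `toSigned` returns a snapshot, mutating the returned structure does not affect the token object, which is the encapsulation described above.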
So I've renamed them to be more clear:
MAC is clearer than hash. Since we are not just producing a digest, but a MAC which ensures integrity and authenticity. |
We now have a It exposes several methods:
During verification, the default behaviour is to iterate over the signatures to verify the token. This means that if the algorithm or signature doesn't match, it just goes to the next one. This may not be efficient if there are lots of signatures on a token. However this is not likely to be the case, so at most This So for now there is no "general" token manager. Just like how there is no "general" key manager. Such things would only come into play during secure computation. Both of which would have to be built on top of the vault system to take advantage of high-level secret exchange like git sync #385. There are of course limitations here... especially if the FS paradigm is too limiting for such secrets. But generally high level UIs tend to fit into the FS metaphor. But we could imagine a KeyManager and TokenManager sitting on top of the vault system. |
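The default verification behaviour described above can be sketched as a linear scan. This is an illustration only: `verifyAny`, `SignatureEntry` and `VerifierSketch` are hypothetical names, and the actual cryptographic check is injected as a function.

```typescript
type SignatureEntry = { alg: string; signature: string };
type VerifierSketch = {
  alg: string;
  // Hypothetical check of one signature against the payload.
  verify: (payload: string, signature: string) => boolean;
};

/**
 * Returns true as soon as any signature verifies.
 * Entries with a different algorithm, or whose signature does not match,
 * are skipped rather than treated as errors.
 */
function verifyAny(
  payload: string,
  signatures: Array<SignatureEntry>,
  verifier: VerifierSketch,
): boolean {
  for (const entry of signatures) {
    if (entry.alg !== verifier.alg) continue;
    if (verifier.verify(payload, entry.signature)) return true;
  }
  return false;
}
```

The cost is linear in the number of signatures, which is acceptable while tokens carry only a handful of them.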
So therefore |
JWKs are just a particular variant of a Token payload. The token payload has some "registered claims", but these are all optional. So a payload is ultimately just any structured data that can be represented by JSON. The token payload can be signed, which turns it into a JWS. A token payload can be encrypted, which turns it into a JWE. A token payload can be signed then encrypted, which means it becomes a JWS, then becomes a JWE. This allows arbitrary nesting of signing and encryption. The expected "payload" when signing a JWT was meant to be a base64url of the JSON stringification. When it's a JWE though, we also have different representations for the JWS if we want to use it as a payload like compact, general and flattened. So I'm not sure if it matters which representation you are encrypting. Because JWE only desires that the plaintext is some octet-sequence string. This could mean that it depends on the
The spec seems to recommend that when nesting a JWT, the cty should be |
Ok I've actually worked out what all these So for now we will have |
The The next step is testing for the We also need to adapt the In particular the sigchain shouldn't actually be storing the Inside the sigchain, it makes sense to leave the payload unencoded, so it will be just JSON stringified. However if the JSON data contains binary data, then this becomes complicated. We know that Another idea is that the sigchain can store the signatures separately. After all the token format isn't really designed for efficient storage. It's possible that the payload can be just JSON, while the signature should be stored more efficiently elsewhere. Anyway the internal storage format of sigchain just needs to be flexible and optimised, it doesn't have to be the same as the token format. |
The difference between
Not all tokens are claims. But all claims are tokens. Claims are signed. But the base type of claims is just the structured data. The signed version of the data is
interface Claim extends TokenPayload {
jti: ClaimIdEncoded;
iat: number;
nbf: number;
prev: string | null;
seq: number;
}
interface ClaimSigned<T extends Claim> extends TokenSigned {
payload: T;
signatures: Array<{
protected: {
alg: 'EdDSA';
kid: NodeIdEncoded;
[key: string]: any;
};
signature: Signature;
}>;
}
And the payloads:
interface ClaimLinkIdentity extends Claim {
iss: NodeIdEncoded;
sub: ProviderIdentityId;
}
The |
So the sigchain doesn't have to store things exactly as a JWT. That was a source of complexity before, where the data being stored was also encoded, or there was an encoding/decoding process for the entire JWT spec. Instead the Sigchain can store claims and sigchain in a normalised fashion. These are also the un-encoded versions.
Regarding the However we won't exactly know what is the "oldest" index, it would require looking up the previous number first, before then making a put. This is not some global counter like ID generators or a sequence. The alternative is to store the entire array in one go. But as we saw before with the other domains, we are storing things in a more normalised way to lift the data structure to the DB's knowledge. This does mean it's possible to create new claims, and then sign claims from the past. It's also possible to remove signatures too... but this should also be disallowed by the interface of the sigchain. What does it mean to add a new signature to a past claim? Does it change the semantics of the claim? I don't think so. It's ok to add new claims, and add new signatures to existing claims. Things still work. However it should not be allowed to remove claims or remove signatures. Only forward movement is allowed. Of course, one can always destroy the data as well, that changes things a bit. No replication consensus just yet. If we do later want to do a blockchain consensus on this, I think then signing past claims would not be allowed. Yes, because the |
In the sigchain, I'm currently reviewing the I can see there's a challenge here in terms of managing intermediate claims or doubly signed claims, or claims that need to be signed by multiple parties. In the current staging branch, this involves several additional methods and types for incrementally processing the claim. And this translates to having a method like I don't like the way this is structured. We can simplify this procedure. The first thing to understand is that adding claims to the sigchain is a serialised process. Only one can be done at a time. The second thing to realise is that one cannot sign or manipulate signatures of existing claims in the sigchain. The sigchain is "immutable". It's append only. This is enforced through the hashing property, which contains the hash of the previous signed claim including all the signatures. This means adding a claim must be transactional, and therefore any doubly or multi-party signed claim must be fully signed before the claim is entered into the With the availability of transactions and locks we can do this now with just one method. This method can be Once the callback is executed and finishes we can then put the claim in the DB and commit the transaction. The transaction will also lock to ensure serialised usage here. Doubly signed processes will need to have timed cancellable deadlines so they can cancel the operation. Similarly one can cancel the transaction by throwing an exception while in the transaction. However if one just calls |
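A minimal in-memory sketch of the single-method approach, with loud assumptions: `SigchainSketch`, `ChainClaim` and `signingHook` are hypothetical names, a promise chain stands in for the DB transaction and lock, and the digest function is injected instead of using the real hashing utilities. Timed cancellation is left out.

```typescript
type ChainClaim = {
  seq: number;
  prev: string | null;
  data: unknown;
  signatures: Array<string>;
};

class SigchainSketch {
  protected claims: Array<ChainClaim> = [];
  // Promise chain acting as a lock: each append waits for the previous one.
  protected tail: Promise<unknown> = Promise.resolve();
  protected digest: (claim: ChainClaim) => string;

  constructor(digest: (claim: ChainClaim) => string) {
    this.digest = digest;
  }

  get length(): number {
    return this.claims.length;
  }

  /**
   * Appends one claim. `signingHook` receives the draft claim and may add
   * any number of signatures (e.g. a cross-signing round trip) before it
   * is stored. Throwing inside the hook aborts the append.
   */
  async addClaim(
    data: unknown,
    signingHook: (draft: ChainClaim) => Promise<void>,
  ): Promise<ChainClaim> {
    const run = async (): Promise<ChainClaim> => {
      const draft: ChainClaim = {
        seq: this.claims.length + 1,
        // The link covers the previous claim including all of its signatures.
        prev:
          this.claims.length > 0
            ? this.digest(this.claims[this.claims.length - 1])
            : null,
        data,
        signatures: [],
      };
      await signingHook(draft);
      // Store a copy so the caller's reference cannot alter the chain.
      const stored = JSON.parse(JSON.stringify(draft)) as ChainClaim;
      this.claims.push(stored);
      return JSON.parse(JSON.stringify(stored)) as ChainClaim;
    };
    const result = this.tail.then(run);
    // Keep the lock chain alive even if this append was aborted.
    this.tail = result.catch(() => undefined);
    return result;
  }
}
```

Since all signatures are collected inside the hook, nothing needs to touch a claim after it has been appended, which keeps the chain append-only.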
The callback is called
This takes the token which is now parameterised to But now by exposing the This means, our |
I'm removing the Will need to add indexes into the sigchain now based on usecase to know what kind of information we need to operate over. |
One of the weird things that was done was that
/**
* Serialized version of a node's sigchain.
* Currently used for storage in the gestalt graph.
*/
type ChainData = Record<ClaimIdEncoded, Claim>;
/**
* Serialized version of a node's sigchain, but with the claims as
* Should be used when needing to transport ChainData, such that the claims can
* be verified without having to be re-encoded as ClaimEncoded types.
*/
type ChainDataEncoded = Record<ClaimIdEncoded, ClaimEncoded>;
You can see here that this would not be scalable. The chain data could grow forever. Why do we need the entire chain data in the node info? Surely there's only some information that is relevant...? Apparently this information is then stored in the gestalt graph. So I have a feeling this information is being used for the gestalt linking. However this is of course not efficient. |
Ok we need to do some intermediate testing of the new sigchain for now before we proceed. This would give us the ability to know whether our |
Then we have to figure out the sigchain integration into the |
Ok the sigchain now has fast-check property tests on all of its methods. We can now proceed with sigchain integration into We have the ability to link node to node, and link node to identity. Things we want to ask the sigchain:
Without revocations, and assuming all claims are only link tokens, it's sufficient to just download the entire sigchain data and use that. That might explain why the entire chain data was passed around. However with revocations it makes sense that we would want to index the sigchain by connections to other nodes or identities. Then one can look up the index instead. Should our indexing be hardcoded or something that is configurable by the user? In the tasks system, tasks can be arbitrarily indexed by paths. However this makes less sense if we have specific operations on the sigchain for looking up these identity claims. Right now the sigchain can take any input claim data. But when indexing, it only makes sense to do that for specific claims, and ignore the other kinds of claim types. We may need to change the Automatic indexing on the DB was meant to allow us to easily create indexes instead of doing it manually, but even more automatic would be something that creates indexes over the entire structure of the data. That might look like indexing every key. Note that indexing something right now makes the key and value unencrypted because keys cannot be encrypted atm. Ok I think we start with a set of allowed claim types. Then index them appropriately. The sigchain can provide domain specific operations, as we expect for identity operations, and in the future other kinds of operations for other kinds of claims. |
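A small sketch of indexing only an allowed set of claim types, assuming each stored claim carries a type discriminator. The `typ` field, the `ClaimLinkNode` name, `ClaimRecord` and `ClaimIndexSketch` are all hypothetical; an in-memory `Map` stands in for a DB index level.

```typescript
type ClaimRecord = { jti: string; typ: string; sub?: string };

// Only link claims are indexed; every other claim type is stored unindexed.
const INDEXED_TYPES = new Set<string>(['ClaimLinkNode', 'ClaimLinkIdentity']);

class ClaimIndexSketch {
  // Subject (the linked node or identity) -> claim IDs in insertion order.
  protected bySubject = new Map<string, Array<string>>();

  add(claim: ClaimRecord): void {
    if (!INDEXED_TYPES.has(claim.typ) || claim.sub == null) return;
    const ids = this.bySubject.get(claim.sub) ?? [];
    ids.push(claim.jti);
    this.bySubject.set(claim.sub, ids);
  }

  /** Claim IDs linking to `sub`, oldest first; empty if none. */
  lookup(sub: string): Array<string> {
    return [...(this.bySubject.get(sub) ?? [])];
  }
}
```

With an index like this, answering "which claims link to this node or identity" no longer requires downloading and scanning the whole chain, and a later revocation claim type could be indexed the same way.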
The
It is used by It is in fact not used in the There's a part of the code in I'm going to change the name of that variable to ensure that we can see that this is mostly a gestalt graph sort of thing. It ends up being used as part of:
Which is then placed into:
This would imply that during our GG currently stores a static Comparing to
So the problem here is the GG is storing a static data, that was discovered. This data should be considered "stale" as soon as the data was acquired. And we should also figure out exactly what kind of data we actually need here. Given that I think the main idea here is that the GG stores
The identity info is meant to be a record mapping an "identity claim ID" to the identity claim. The node info is meant to contain the claims that the node is also claiming. I think these 2 types make more sense to put it into the Both types are only used by These types are not relevant to the |
The However the The reason is because a These properties are not part of the claim itself, because the claim would not have this information before it is posted. This means anything actually returning For example the methods of Furthermore the Therefore these methods would have to change. Firstly they should be returning something that contains the Secondly the additional metadata shouldn't be part of the same structure being called a claim, it's not actually part of the claim. It's just metadata returned as part of the response that can be useful. So what I'm doing is this:
This has now resulted in the GH provider using This schema verification is a bit of a problem, because it should be defined as part of the payloads in this case So next thing is to update the One thing that is different is that we are not publishing At the same time, we would expect any published token to be the signed token, so we would not bother validating non-signed tokens. So all the schemas would all be verifying "signed tokens". |
There's a problem with publishing the It's the same problem that sigchain has with JSON encoding of the binary signatures (which is why I created the In the sigchain we ended up storing the JSON encoding of the buffer. So the problem is this, we have So what we really want is something in between. Something where the payload isn't encoded into base64url, but where the signatures are encoded into base64url. This means JWS is just kind of not well designed for this usecase - look this is just not human readable:
We're already augmenting the JWS spec. So I guess at this point we might as well go one step further. We will have a "human readable" format. The main reason why JWS general format/JSON format still base64url encodes the payload is due to this #481 (comment). As in JWS's payload could be non-JSON, it could be anything else. But for our usecase, the payload is JSON and in fact, it's meant to be a proper JWT. But the end result is that we get a non-human readable payload. There should have been a JWT format (in JWS) that had human readable messages. Maybe something like "Super-General" format, not just General format. So... we go from compact, to flattened, to general, to human readable format. In such a format, one would argue that the payload shouldn't be encoded, and the protected headers shouldn't be encoded. Nothing should be encoded EXCEPT the binary signatures and MAC codes. Let's create a new format for JWT-JWS, the human readable format. This format doesn't have anything base64url encoded except for signatures. |
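A sketch of converting a General-format signed token into such a human readable form, where only the signature stays base64url encoded. The type and function names (`TokenSignedEncodedSketch`, `TokenSignedReadableSketch`, `toReadable`) are hypothetical, and the decoder is hand-rolled so the example has no dependencies.

```typescript
type TokenSignedEncodedSketch = {
  payload: string; // base64url of the JSON payload
  signatures: Array<{ protected: string; signature: string }>;
};
type TokenSignedReadableSketch = {
  payload: Record<string, unknown>;
  signatures: Array<{ protected: Record<string, unknown>; signature: string }>;
};

const B64URL_CHARS =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_';

/** Decodes unpadded base64url into a UTF-8 string. */
function base64UrlToString(input: string): string {
  let acc = 0;
  let bits = 0;
  let escaped = '';
  for (let i = 0; i < input.length; i++) {
    const value = B64URL_CHARS.indexOf(input[i]);
    if (value < 0) throw new Error(`invalid base64url character at ${i}`);
    acc = ((acc << 6) | value) & 0xfff;
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      const byte = (acc >> bits) & 0xff;
      // Percent-escape each byte so decodeURIComponent does the UTF-8 decode.
      escaped += '%' + (byte < 16 ? '0' : '') + byte.toString(16);
    }
  }
  return decodeURIComponent(escaped);
}

function toReadable(token: TokenSignedEncodedSketch): TokenSignedReadableSketch {
  return {
    payload: JSON.parse(base64UrlToString(token.payload)),
    signatures: token.signatures.map((s) => ({
      protected: JSON.parse(base64UrlToString(s.protected)),
      // Binary data is the only part that remains encoded.
      signature: s.signature,
    })),
  };
}
```

One caveat with any readable form: the signature was computed over the exact encoded bytes, so verification has to re-encode the payload and header deterministically or keep the original encoding alongside.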
I think for the tokens and claims, these should be rolled into the validation routines. We have no real need to have them as JSON schema, since these are never going to be exposed at an API-level (like I have done before with OpenAPI). JSON schema can be useful once JSON RPC gets provided (or a variant of JSON like CBOR to enable binary data RPC) |
I think top-level await will be needed before we can really make use of the JSON schemas well, and mostly at the highest-level validation routines involving the API where we get a good RoI for doing something like OpenAPI. This means all the JSON schemas in This can be done in later work though as I'm continuing down the path of fixing up token usage. If we proceed with this we may want to either centralise all the parsers into the validation domain, since they currently make use of all the decoding functions anyway, or we can spread out the parsers across the domains. It depends... In many cases we want to be able to do They shouldn't be importing the validation utils as validation is a highly connected module and that means any breakage in any part of the code will break everything depending on the validation utilities. So replacing JSON schemas with validation parsers should also end up decentralising the parsers across the codebase, while the validation domain can re-export those parsers so they can be used in one place. The The MatrixAI/MatrixAI-Graph#44 has been created to address input validation refactoring and the decentralising of the validation domain. |
It looks like the |
With regards to MatrixAI/MatrixAI-Graph#44, one important thing to realise is that JSON schemas are limited to JSON data, not to arbitrary JS data. Arbitrary JS POJOs have to use validation parsers, not the JSON schema. This is another reason it only makes sense to have JSON schemas on the program boundaries, the actual serialised JSON data going in and out, rather than any arbitrary data. Thus Only |
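To illustrate the distinction, here is a sketch of a validation parser written as a TypeScript assertion function over arbitrary JS data, using the base claim fields listed earlier in the thread. `ClaimSketch` and `assertClaimSketch` are hypothetical names and the checks are deliberately shallow (no ID decoding).

```typescript
type ClaimSketch = {
  jti: string;
  iat: number;
  nbf: number;
  seq: number;
  prev: string | null;
};

/**
 * Narrows unknown data to ClaimSketch or throws.
 * Works on any JS value, not only on data that came from JSON.
 */
function assertClaimSketch(data: unknown): asserts data is ClaimSketch {
  if (typeof data !== 'object' || data === null) {
    throw new TypeError('claim must be an object');
  }
  const claim = data as Record<string, unknown>;
  if (typeof claim['jti'] !== 'string') {
    throw new TypeError('`jti` must be a string');
  }
  for (const key of ['iat', 'nbf', 'seq']) {
    const value = claim[key];
    if (typeof value !== 'number' || !Number.isFinite(value)) {
      throw new TypeError(`\`${key}\` must be a finite number`);
    }
  }
  if (claim['prev'] !== null && typeof claim['prev'] !== 'string') {
    throw new TypeError('`prev` must be a string or null');
  }
}
```

After a successful call the compiler treats the value as `ClaimSketch`, so parsers like this can live next to the types in each domain and be re-exported from one place.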
I've started on this idiom:
The I'm using the |
The |
I had to change to using |
All the JSON schema validations for singly signed or doubly signed claims are removed. Instead the It's also that we aren't special casing any of these claims. Instead claims are built up on top of tokens, and all token operations still work on claims. |
If we have types |
The normal |
These grpc related utilities are left over here for posterity until I can adapt them to the new structure:
/**
* Constructs a CrossSignMessage (for GRPC transfer) from a singly-signed claim
* and/or a doubly-signed claim.
*/
function createCrossSignMessage({
singlySignedClaim = undefined,
doublySignedClaim = undefined,
}: {
singlySignedClaim?: ClaimIntermediary;
doublySignedClaim?: ClaimEncoded;
}): nodesPB.CrossSign {
const crossSignMessage = new nodesPB.CrossSign();
// Construct the singly signed claim message
if (singlySignedClaim != null) {
// Should never be reached, but for type safety
if (singlySignedClaim.payload == null) {
throw new claimsErrors.ErrorClaimsUndefinedClaimPayload();
}
const singlyMessage = new nodesPB.ClaimIntermediary();
singlyMessage.setPayload(singlySignedClaim.payload);
const singlySignatureMessage = new nodesPB.Signature();
singlySignatureMessage.setProtected(singlySignedClaim.signature.protected!);
singlySignatureMessage.setSignature(singlySignedClaim.signature.signature);
singlyMessage.setSignature(singlySignatureMessage);
crossSignMessage.setSinglySignedClaim(singlyMessage);
}
// Construct the doubly signed claim message
if (doublySignedClaim != null) {
// Should never be reached, but for type safety
if (doublySignedClaim.payload == null) {
throw new claimsErrors.ErrorClaimsUndefinedClaimPayload();
}
const doublyMessage = new nodesPB.AgentClaim();
doublyMessage.setPayload(doublySignedClaim.payload);
for (const s of doublySignedClaim.signatures) {
const signatureMessage = new nodesPB.Signature();
signatureMessage.setProtected(s.protected!);
signatureMessage.setSignature(s.signature);
doublyMessage.getSignaturesList().push(signatureMessage);
}
crossSignMessage.setDoublySignedClaim(doublyMessage);
}
return crossSignMessage;
}
/**
* Reconstructs a ClaimIntermediary object from a ClaimIntermediaryMessage (i.e.
* after GRPC transport).
*/
function reconstructClaimIntermediary(
intermediaryMsg: nodesPB.ClaimIntermediary,
): ClaimIntermediary {
const signatureMsg = intermediaryMsg.getSignature();
if (signatureMsg == null) {
    throw new claimsErrors.ErrorUndefinedSignature();
}
const claim: ClaimIntermediary = {
payload: intermediaryMsg.getPayload(),
signature: {
protected: signatureMsg.getProtected(),
signature: signatureMsg.getSignature(),
},
};
return claim;
}
/**
* Reconstructs a ClaimEncoded object from a ClaimMessage (i.e. after GRPC
* transport).
*/
function reconstructClaimEncoded(claimMsg: nodesPB.AgentClaim): ClaimEncoded {
const claim: ClaimEncoded = {
payload: claimMsg.getPayload(),
signatures: claimMsg.getSignaturesList().map((signatureMsg) => {
return {
protected: signatureMsg.getProtected(),
signature: signatureMsg.getSignature(),
};
}),
};
return claim;
} |
The |
The
/**
* Gestalt adjacency matrix represented as a collection of each vertex
* mapping to the set of adjacent vertexes.
* Kind of like: `{ a: { b, c }, b: { a, c }, c: { a, b } }`.
* Each vertex can be `GestaltNodeKey` or `GestaltIdentityKey`.
* `GestaltGraph/matrix/{GestaltKey} -> {json(GestaltKeySet)}`
*/
protected gestaltGraphMatrixDbPath: LevelPath = [
this.constructor.name,
'matrix',
];
/**
* Node information
* `GestaltGraph/nodes/{GestaltNodeKey} -> {json(GestaltNodeInfo)}`
*/
protected gestaltGraphNodesDbPath: LevelPath = [
this.constructor.name,
'nodes',
];
/**
* Identity information
* `GestaltGraph/identities/{GestaltIdentityKey} -> {json(GestaltIdentityInfo)}`
*/
protected gestaltGraphIdentitiesDbPath: LevelPath = [
this.constructor.name,
'identities',
];
Now it is possible to actually make the We could instead make use of the multilevels now and do something like this:
This would enable us to manipulate each gestalt without having to load the entire JSON structure. The It could also be applied to the |
The purpose of the The In a way, this is because the ACL has to apply to whole gestalts and the ACL isn't aware of the gestalts, but the gestalts is aware of the ACL. This dataflow relationship was always a bit iffy, and it could be reversed if gestalt changes can be observed, and ACL permissions could subscribe to that... But that is to be solved later. |
Assuming we rename These methods:
All take the One thing I don't like about this is that there's nothing ensuring consistency between the information in If the I think the original idea, is that for any link between the gestalt graph, there must be a corresponding cryptolink/claim that is also recorded by the gestalt graph. This means rather than just storing the vertexes (and associating adjacency by position), we actually need to store their edges too. The edge information must be first-class and be derived or equal to the cryptolink claims. |
If we move to using:
It would then be possible to store edge information where that However this information may be duplicated.
We've solved this before by a point of indirection like in the ACL. A common ID where these vertex pairs map to an
Then it's also possible to GC the edges too, the edges are bidirectional anyway. So deleting any one vertex pair, also deletes the opposite pair. This now gives us an opportunity to store information about each vertex, and also information about each edge. This means we don't store an entire copy of the sigchain into each vertex's info. In fact, I'm not entirely sure what should be put in to the vertex info at this point in time. The Now the edges themselves can store information, and we can derive a structure from We could store the entire claim, since they just contain information like:
However, no signature data is available here. Should we store the signatures as well? I feel like we should be doing this. That would then look like this:
Now there is a problem similar to the In the |
If the edge info were to be stored directly into the DB without JSON encoding, it may be quite complex due to the nested structure of the claims. So I think we should stick with JSON in that case. As for the node info and identity info, they can include One thing about the edge info that does require a bit of change. One problem is that link identity claims have additional metadata that cannot be in the claim itself. This is reflected by a new type:
This means, the edge info isn't just an We may need to specialise edge types like we do with vertex types. So it could be something like:
We would have to ensure that we can differentiate the 2 kinds of links as well. |
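The indirection and the specialised edge types discussed above can be sketched in memory as follows. Everything here is an assumption for illustration: `GestaltEdgesSketch`, `EdgeInfoSketch`, the `kind` discriminator and the `url` metadata field are hypothetical, and `Map`s stand in for DB levels.

```typescript
// The two kinds of links are differentiated by a discriminator.
type EdgeInfoSketch =
  | { kind: 'node-node'; claimId: string }
  | { kind: 'node-identity'; claimId: string; url?: string };

class GestaltEdgesSketch {
  // Adjacency: vertex -> adjacent vertex -> shared edge ID.
  protected matrix = new Map<string, Map<string, number>>();
  // Edge ID -> edge info, stored once for both directions.
  protected edges = new Map<number, EdgeInfoSketch>();
  protected nextEdgeId = 0;

  get edgeCount(): number {
    return this.edges.size;
  }

  link(a: string, b: string, info: EdgeInfoSketch): void {
    // Replace any existing edge so stale info is not left behind.
    this.unlink(a, b);
    const edgeId = this.nextEdgeId++;
    this.edges.set(edgeId, info);
    this.adjacent(a).set(b, edgeId);
    this.adjacent(b).set(a, edgeId);
  }

  /** Removing either direction removes both directions and the edge info. */
  unlink(a: string, b: string): boolean {
    const edgeId = this.matrix.get(a)?.get(b);
    if (edgeId == null) return false;
    this.matrix.get(a)?.delete(b);
    this.matrix.get(b)?.delete(a);
    this.edges.delete(edgeId);
    return true;
  }

  getEdge(a: string, b: string): EdgeInfoSketch | undefined {
    const edgeId = this.matrix.get(a)?.get(b);
    return edgeId == null ? undefined : this.edges.get(edgeId);
  }

  protected adjacent(vertex: string): Map<string, number> {
    let row = this.matrix.get(vertex);
    if (row == null) {
      row = new Map();
      this.matrix.set(vertex, row);
    }
    return row;
  }
}
```

Because both directions share one edge ID, the edge info is never duplicated and deleting either vertex pair cleans up the other.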
Also thinking of simplifying the
|
These types are now in
And we are proceeding to update the We shouldn't be storing the encoded versions of the
Remember the actual key in the DB gets base128 encoded and it is expected to be a buffer, so style 2 is likely to be more efficient anyway. If we use style 2, we may not have any use for the However we can see that |
So we are using Utility functions have changed their names to avoid confusion with the validation procedures in MatrixAI/MatrixAI-Graph#44. |
With the change in the gestalts graph structure, one must be able to also represent a singleton gestalt, a vertex with no adjacencies.
Furthermore, if we want to ensure that the Prior issue for this is: #319. |
It's possible to create an index like This could be used instead of the ACL right now where multiple NodeIds point to a shared At that point, it would be possible to create permissions for a specific gestalt, and use the Of course this would be out of scope, but it may reduce the complexity between ACL and GG. The |
I've had to create special types |
The GG is now centralising a lot of its methods on the The GG now also checks that the link data is accurate, but it does not check the signatures during its linking operations. So what this means now is that cryptolinks are first class structures in the GG. When the GG gives back gestalts, the entire structure is still It would be understood that the edges being stored are the latest edges. And if discovery discovers that the edges are no longer correct, the GG would excise that vertex. Signature verification has to be done separately by the discovery. It has to fetch the claims and then check them. However it will always be out of date, because it's possible revocations could occur. For cryptolinks between nodes and identities, one must always check the identity. Therefore there's always a sort of timestamp according to what has been discovered. So far, no timestamp is being set for the gestalt graph data at this point in time. When using the GG, it would be important for the node's own gestalt to be maintained, which means discovery should also be discovering its own gestalt. That would all be addressed once discovery is factored into this JOSE move. In the future, the GG and ACL should be reviewed to see if the graphs themselves can have a unique ID, and for the ACL to set permissions on the graph's unique ID, rather than setting it for every node ID in the graph. That seems like a more elegant structure. Still not entirely settled on the design between gestalt and ACL, and whether ACL should be subsumed entirely under gestalts. |
Specification
With the recent crypto update in #446, we have found that the JOSE library is no longer compatible, not even with the webcrypto polyfill.
Since JOSE is easier to replace than x509, we have decided to remove JOSE entirely.
This affects several domains:
- sigchain - claims here are general JWS structures, with multiple signatures
- identities - claims are being put on identity providers, but this was developed before we understood JOSE and JWS, these can now use specialised tokens
- notifications - using JWS to send "signed" messages, the encryption of these notification messages is reliant on P2P E2E encryption, but we don't store the messages as encrypted, they are stored unencrypted, but subsequently disk-encrypted, however we retain the signature of the messages as they are useful
- sessions - using JWS MAC authenticated tokens as a short-lived session token for authentication (remember you always exchange long-lived credentials for short-lived credentials)
This is the relationship of JOSE to what we are doing:
Additional context
Tasks
- claims to tokens, we will have a generic "Token" structure
- TokenClaim or TokenSession can then be used by various domains like sigchain, identities, notifications and sessions